-----------------------------------------------------------------------
;  Copyright (C) 1995
;  Associated Universities, Inc. Washington DC, USA.
;
;  This program is free software; you can redistribute it and/or
;  modify it under the terms of the GNU General Public License as
;  published by the Free Software Foundation; either version 2 of
;  the License, or (at your option) any later version.
;
;  This program is distributed in the hope that it will be useful,
;  but WITHOUT ANY WARRANTY; without even the implied warranty of
;  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;  GNU General Public License for more details.
;
;  You should have received a copy of the GNU General Public
;  License along with this program; if not, write to the Free
;  Software Foundation, Inc., 675 Massachusetts Ave, Cambridge,
;  MA 02139, USA.
;
;  Correspondence concerning AIPS should be addressed as follows:
;          Internet email: aipsmail@nrao.edu.
;          Postal address: AIPS Project Office
;                          National Radio Astronomy Observatory
;                          520 Edgemont Road
;                          Charlottesville, VA 22903-2475 USA
-----------------------------------------------------------------------


            Installing AIPS on an IBM RISC System 6000 and

      Performance Results for Convex C220 and SUN Sparc Computers



                  Eric W. Greisen, Mark Calabretta

                Australia Telescope National Facility
                          16 July 1990



I. Introduction

     On 18 and 19 June, 1990, we installed AIPS on an IBM RISC System
6000 computer located at IBM's North Sydney office building.  This
turned out to be a remarkably straightforward operation, with a
minimum of minor hitches, which we will describe below.  The source
code was ported via a QIC150 tape written by 'tar' from the ATNF
(Australia Telescope National Facility) version of AIPS release
15APR90 maintained on CSIRO's Convex 220 in Marsfield, Australia.
The DDT data for the small and medium tests were also written on
a second QIC150 tape in binary form (via 'tar') from the Convex 220.
In other words, the data were transported as an AIPS user's binary
file system from one architecture to another.  All parameters of
the files, so far as we could tell, remained intact in this process.
Therefore, the Convex and the IBM systems use identical binary
formats for double- and single-precision floating-point numbers,
integers, and characters.  This ensures compatibility with SUN
workstations as well, since they are used at ATNF with the Convex
with complete compatibility.  The DDT data were computed for the
15OCT87 release of AIPS and were not totally compatible with the
current programs.  This is reflected in somewhat larger differences
between the master and test images when computed on both the
Convex and the IBM.

     The system was compiled and linked on a non-standard RISC
System model 520, but then moved to a standard model 320.  This
desk-top computer had 24 Mbytes memory, two 320-Mbyte internal disk
drives, a QIC150 tape drive, and a large color monitor.  The
operating system was release OS: 9021A2.  At the suggestion of
IBM personnel, all Fortran and C routines were compiled with
full optimization.  The two AIPS data areas were on the same disk
drive.

     For comparison, DDT was also run on the Convex and SUN
computers owned by ATNF in Marsfield.  The Convex is a model C220
with 128 Mbytes of memory running under OS 8.0.  The Convex is an
older version and does not have the ESP (enhanced scalar processor)
unit.  The disks used were 4-way striped with file/fragment sizes
of 64K/8K.  DDT was run on an empty Convex with one and with two
processors enabled.  A limited set of routines (including most Q
routines) is compiled vector/parallel on the Convex, but the rest
are compiled with optimization level "O0" ("basic block scalar").
One of the SUN computers used was a Sparc-1 (4/60) running SUNOS
4.0.3c on 16 Mbytes of memory.  The AIPS disks were provided via
NFS by the Convex.  Because of the speed of the Convex, this is
quite usable, but the results for the SUN quoted below are
degraded by this.  As a test, DDT was run both with an empty
Convex and with a lightly-loaded Convex.  The other SUN tested
was a Sparc model 4/370 equipped with 24 Mbytes of memory and a
TAAC board and also running SUNOS 4.0.3c.  On the 4/370, only
the small DDT test was used.  Both SUNs were run with SunView
windows and a single cpu meter display.  The code is compiled
with no optimization on SUN Sparcs.


II. DDT Results


     DDT compares computed images with master images and reports
the peak and rms difference in units of bits with respect to
the peak of the master image.  The results for the IBM 6000/320,
Convex C220, and SUN Sparc-1 were:

       Test            Peak                   Rms
                Convex  IBM    SUNs    Convex  IBM    SUNs
  Small:
       UVMAP     13.0   13.1   13.3     18.8   18.2   18.8
       UVBEAM    10.1   10.2    9.9     16.2   15.8   16.2
       APCLN     18.8   13.6   14.1     24.1   16.4   20.7
       APRES     16.9   17.0   17.0     22.2   22.4   22.4
       MXMAP     12.6   12.3   12.9     18.5   18.1   18.7
       MXBEAM    13.9   11.1   13.7     19.3   17.7   19.3
       MXCLN     14.8   14.3   14.3     17.6   17.6   17.6
       VTESS      4.1    4.1    4.1     10.9   10.9   10.9

  Medium:
       UVMAP     13.5   12.9   13.5     17.8   18.0   18.1
       UVBEAM    14.0   13.0   13.9     17.8   18.3   18.5
       APCLN     17.3   12.0   11.8     23.7   14.8   14.7
       APRES     15.4   15.4   15.4     21.2   21.4   21.3
       MXMAP     13.0   13.3   12.6     17.8   18.0   18.1
       MXBEAM    14.3   13.0   13.9     17.8   18.4   18.5
       MXCLN     10.3   10.0   10.3     14.2   14.2   14.2
       VTESS      3.3    3.3    3.3     10.8   10.8   10.8

which indicate that all four machines are computing basically
correct results.  The low numbers of bits in VTESS are due to a
change in algorithm since the master data were computed.  The
surprisingly high number of bits for APCLN on the Convex is not
explained, but there are software differences between the
scalar routines used on the IBM and SUN and the vector routines
used on the Convex.
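
     As a rough guide to reading the table, the sketch below treats
"bits" as the base-2 logarithm of the ratio of the master peak to the
difference.  That definition is an interpretation of the description
above and is not taken from the DDT source; the function name
agreement_bits and the four-pixel sample data are hypothetical.

```python
import math

def agreement_bits(master, test):
    """Return (peak_bits, rms_bits) for two equal-length pixel sequences.

    Assumed definition (not taken from the DDT source):
        bits = log2(peak of master / size of the difference)
    """
    if len(master) != len(test) or not master:
        raise ValueError("images must be non-empty and the same size")
    diffs = [t - m for m, t in zip(master, test)]
    peak_master = max(abs(m) for m in master)
    if peak_master == 0.0:
        raise ValueError("master image has no non-zero pixel")
    peak_diff = max(abs(d) for d in diffs)
    if peak_diff == 0.0:
        return math.inf, math.inf  # identical images
    rms_diff = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return (math.log2(peak_master / peak_diff),
            math.log2(peak_master / rms_diff))

# Made-up data: master peak 1.0, one pixel of four off by 2**-13.
master = [1.0, 0.5, 0.25, 0.0]
test = [1.0, 0.5 + 2.0 ** -13, 0.25, 0.0]
peak_bits, rms_bits = agreement_bits(master, test)
print(f"peak {peak_bits:.1f} bits, rms {rms_bits:.1f} bits")
```

Under this reading, a larger number of bits means closer agreement
with the master image, and the rms figure is never smaller than the
peak figure.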

     The tables below summarize the performance results for the
computers.  The Convex numbers are labeled C210 for one-processor
and C220 for two-processor configurations.  The latter reports
120 seconds of cpu time for every minute of real time.  This
makes the cpu times roughly those of a C210, but the real times
are still those of the C220.  The SUN results are labeled 4/60
and 4/370 for runs with an empty Convex file server and 4/60*
for a lightly-loaded Convex.  It is clear that the results for
the SUNs may be very misleading.  All executable and data files
for the SUNs lie on the Convex and are served via NFS.  The swap
areas plus "/" and "/usr" areas are on local SUN disk.  The real
times for smaller tasks are heavily affected by NFS.  The heavily
computational tasks such as APCLN, MX (clean), and ASCAL did use
100 per cent of the SUN cpus for lengthy periods.  Note that NFS
also competes with the DDT tasks themselves for the use of the
SUN cpu.  The comparison of the 4/60 and 4/60* columns shows just
the added cost to the SUN cpu time for light loading on the file
serving computer.  This cost alone is significant.  Therefore
the numbers below represent those for a SUN in a commonly used
networking configuration, not those for a stand-alone machine.

     The reported cpu times in seconds for the small test were

                 Convex     IBM            SUN
              C210   C220   320    4/370    4/60   4/60*
  UVSRT       3.28   3.34   1.93    6.81    8.88    9.22
  UVSRT       3.42   3.48   2.05    7.04    8.88    9.37
  UVDIF       1.39   1.41   1.24    4.33    7.14    7.11
  UVDIF       1.38   1.42   1.25    4.33    7.23    7.15
  CCMRG       1.97   2.03   0.91    2.48    3.04    3.23
  SUBIM       1.21   1.23   0.76    1.75    2.14    2.11
  SUBIM       1.21   1.21   0.73    1.69    2.06    2.23
  COMB        1.26   1.26   0.75    1.95    2.32    2.41
  COMB        1.26   1.27   0.79    1.99    2.41    2.46
  COMB        1.29   1.31   0.85    1.94    2.53    2.47
  COMB        1.26   1.27   0.88    1.95    2.42    2.65
  COMB        1.29   1.32   0.83    1.96    2.39    2.57
  COMB        1.27   1.29   0.87    1.97    2.42    2.49
  COMB        1.21   1.24   0.80    1.88    2.36    2.63
  COMB        1.33   1.36   0.85    2.11    2.51    2.79

  UVMAP       5.04   5.14   4.97   21.72   27.50   28.61
  APCLN cln  14.25  14.52  57.97  169.23  209.22  218.31
  APCLN res   3.36   3.47   3.05   14.33   17.79   18.81
  ASCAL       8.05   8.26  24.54   82.29  112.05  116.65
  MX map      7.42   7.65   5.47   24.17   30.07   31.28
  MX clean   30.82  31.94  82.54  292.34  370.10  383.55
  VTESS      16.98  17.61  15.54   77.93   96.51   99.64

  DDT comp   21            14      32      35      41

For the medium test the cpu times were

                  Convex       IBM            SUN
               C210    C220    320    4/370    4/60   4/60*
  UVSRT        4.55    4.63    2.88                   14.17
  UVSRT        4.76    4.87    2.99                   14.42
  UVDIF        2.13    2.16    1.99                   11.63
  UVDIF        2.25    2.29    2.13                   12.11
  CCMRG        3.10    3.19    1.60                    6.34
  SUBIM        1.93    1.94    1.97                    5.22
  SUBIM        1.91    1.96    1.98                    5.08
  COMB         2.02    2.05    1.90                    5.16
  COMB         2.02    2.06    1.94                    5.14
  COMB         2.07    2.10    1.94                    5.45
  COMB         2.03    2.06    1.99                    5.51
  COMB         2.06    2.10    2.04                    5.20
  COMB         2.04    2.09    2.00                    5.26
  COMB         1.99    2.02    1.96                    5.12
  COMB         2.10    2.15    1.98                    5.35
				
  UVMAP       10.87   11.08   14.22                   90.30
  APCLN cln   77.14   78.36  517.12                 1910.02
  APCLN res   10.56   10.84   11.91                   80.71
  ASCAL       17.70   17.92   74.83                  376.64
  MX map      17.42   17.93   17.26                  107.47
  MX clean   112.96  115.64  586.12                 2371.67
  VTESS       39.18   40.00   48.26                  319.66

where the first group lists scalar programs and the second lists
vector/parallel programs.  The "DDT comp" line gives the approximate
time required by the AIPS program itself to compile the POPS
source for the test contained in DDTLOAD.001.  The IBM outperforms
the Convex on purely scalar codes by a small amount.  On codes
dominated by IO the IBM does rather better than the Convex in terms
of cpu times.  On codes involving large vectorizable computations,
however, the IBM 320 can be up to 6 times slower than the Convex.
VTESS, which is only moderately vectorized, shows comparable times
for both machines.  The networked SUN 4/60 is 5 or 6 times slower
than the IBM on the heavily computational jobs and 2.5 to 5 times
slower on the small jobs.  The SUN 4/370 is about 25% faster than
the 4/60.  The cost of lightly loading the server cpu is about 3%
added cpu to the SUN 4/60 computer being served.
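
     The ratios quoted above can be recomputed from the tables.  The
sketch below does this for the vector/parallel group of the medium
test, using cpu seconds copied from the C210 and IBM 320 columns.  The
names MEDIUM_CPU and slowdown are hypothetical and serve only this
illustration.

```python
# cpu seconds from the medium-test table: (Convex C210, IBM 320)
MEDIUM_CPU = {
    "UVMAP": (10.87, 14.22),
    "APCLN cln": (77.14, 517.12),
    "APCLN res": (10.56, 11.91),
    "ASCAL": (17.70, 74.83),
    "MX map": (17.42, 17.26),
    "MX clean": (112.96, 586.12),
    "VTESS": (39.18, 48.26),
}

def slowdown(times):
    """Map each task to IBM cpu / Convex cpu (above 1: the IBM is slower)."""
    return {task: ibm / convex for task, (convex, ibm) in times.items()}

ratios = slowdown(MEDIUM_CPU)
# List the tasks from the largest ratio to the smallest.
for task, ratio in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task:<10s} {ratio:5.2f}")
```

On these figures APCLN cln gives the largest ratio, about 6.7, while
MX map comes out marginally below 1.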

     The real times, each only good to 1 second accuracy at best,
are listed below.  REMEMBER THAT THE SUN IS TOTALLY DEPENDENT ON
NFS for AIPS, which very obviously affected the total real and cpu
times.  The reported real times in seconds for the small DDT were

                 Convex     IBM          SUN
               C210  C220   320  4/370  4/60  4/60*
     UVSRT        6     6     7     52    57     86
     UVSRT        5     6     7     50    57     66
     UVDIF        4     4     7     11    15     21
     UVDIF        3     3     3     10    15     19
     CCMRG        4     4     2     20    19     22
     SUBIM        2     2     1     11    11     13
     SUBIM        1     1     1     11    20     22
     COMB         2     2     2     12    12     24
     COMB         2     2     2     12    16     21
     COMB         2     1     2     12    20     22
     COMB         2     1     2     13    13     19
     COMB         2     2     2     12    20     14
     COMB         2     2     2     11    17     14
     COMB         2     2     2     12    21     23
     COMB         2     2     1     12    16     30
	
     UVMAP        7     7     9     63    74     96
     APCLN cln   17    17    60    201   246    315
     APCLN res    6     6     4     40    39     48
     ASCAL       10    10    26    111   146    180
     MX map      11    10     8     66    80     98
     MX clean    36    34    89    398   486    595
     VTESS       25    25    20    155   179    222

     DDT comp    36          36    202   194    258

And for the medium DDT test

                 Convex     IBM          SUN
               C210  C220   320  4/370  4/60  4/60*
     UVSRT        7     6    11                 120
     UVSRT        7     7    10                 129
     UVDIF        4     4     3                  24
     UVDIF        4     4     9                  29
     CCMRG        6     6     3                  36
     SUBIM        2     3     3                  32
     SUBIM        3     3     2                  33
     COMB         3     2     5                  37
     COMB         3     3     4                  29
     COMB         2     3     5                  39
     COMB         3     3     4                  38
     COMB         3     3     5                  36
     COMB         2     3     5                  42
     COMB         3     3     5                  38
     COMB         3     3     5                  34
	
     UVMAP       14    14    26                 301
     APCLN cln   84    76   529                2467
     APCLN res   14    13    17                 175
     ASCAL       20    19    77                 511
     MX map      22    22    25                 263
     MX clean   124   113   611                3373
     VTESS       47    46    68                 659

These tables make it clear that the IBM IO system is also quite
fast, although the superiority of the Convex IO is hinted at by
the numbers for UVSRT and COMB.  The cpu/real ratios were as
high as 0.97 for the medium ASCAL, APCLN, and MX(clean) on the
IBM and a very good 0.71 on the medium VTESS.  The highest
cpu/real ratios for the networked SUN 4/60 were about 0.80 for
the cleans (APCLN and MX) and ASCAL and as low as 0.16 for
the heavy IO task UVSRT.  The networked 4/370 had somewhat lower
cpu/real ratios than did the networked 4/60.  A light load on
the Convex cost about 22% additional real time to the 4/60 being
served.  The use of parallel cpus on the Convex actually costs a
small amount of cpu time for all tasks, with only small (<= 10%)
improvements in real time for the vector/parallel tasks.  Note
that the medium cleans on the C220 achieved "cpu/real" ratios
greater than 1.
These results suggest that the AIPS algorithms are optimised for
vector rather than parallel execution.
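
     The cpu/real ratios discussed above follow directly from the
two sets of tables.  The sketch below recomputes a few of them for
the medium test; the figures are copied from the IBM 320 and Convex
C220 columns, and the names MEDIUM and cpu_real_ratios are
hypothetical.  Since the real times are good to about 1 second at
best, the ratios are approximate.

```python
# (cpu seconds, real seconds) for the medium test, from the tables above.
MEDIUM = {
    "IBM 320": {
        "APCLN cln": (517.12, 529),
        "ASCAL": (74.83, 77),
        "MX clean": (586.12, 611),
        "VTESS": (48.26, 68),
    },
    "Convex C220": {
        "APCLN cln": (78.36, 76),
        "ASCAL": (17.92, 19),
        "MX clean": (115.64, 113),
        "VTESS": (40.00, 46),
    },
}

def cpu_real_ratios(table):
    """Return {machine: {task: cpu / real}} for a table shaped like MEDIUM."""
    return {
        machine: {task: cpu / real for task, (cpu, real) in tasks.items()}
        for machine, tasks in table.items()
    }

ratios = cpu_real_ratios(MEDIUM)
for machine, tasks in ratios.items():
    for task, ratio in tasks.items():
        print(f"{machine:<12s} {task:<10s} {ratio:4.2f}")
```

A ratio above 1 is possible only where the reported cpu time is
summed over more than one processor, as on the C220.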


III. Source code modifications required for the IBM


     Source code which was not thought to be system dependent
was not modified during this test except to correct simple
errors in the AIPS code.  The worst of these were errors due
to the code overhaul's modification of file formats.  FILINI
and PRTAC do not correctly support the new accounting file
format (and FILINI may have other troubles as well).  The IBM
will compile character strings of length between 1 and 500
by default.  We invoked a compiler option ('-qcharlen=32767')
to avoid that limitation and the compiler option '-qextname'
to produce external names with trailing underscore characters
assumed in AIPS C routines.  We compiled all subroutines and
tasks, since we were told that the compiler is particularly
good at reporting problems.  Not surprisingly, it reported a
host of "errors" in $APLVMS routines COMPLOT, GRISUB, and
ZETASUBS and several local ATNF routines.  Tasks UVFND, CLCOR,
TABED, UNCAL, SETJY, and TABEX were found to have (illegal)
branches into IF-THEN-ELSE or DO-loop structures and GETJY
repeats a variable name in an EQUIVALENCE group.  None of
these are used by DDT and corrections have been submitted to
the Charlottesville AIPS group.

     The IBM RISC System 6000 AIX operating system is touted
as a Bell System V system with a nearly complete addition of
Berkeley functions.  To test this, we began by feeding the SUN
collection of Z routines to the compilers.  The problems
reported by the compilers were:

1. ZDCHIC.C   system variable NBBY not declared in <sys/param.h>.
              We replaced it with its local value (8).

2. ZCREA2.C   struct type "statfs" is not recognized.  We
              simply copied in the $APLUNIX version instead.

3. ZDATE.C    struct type "tm" not defined in <sys/time.h> on the
              IBM.  We replaced it with the $APLUNIX version which
              invokes <time.h>.

4. ZTIME.C    as ZDATE.C.

5. ZTXMA2.C   the Berkeley routine worked after struct type "direct"
              was renamed "dirent".

6. ZTKILL.C   the Berkeley routine worked after struct type "direct"
              was renamed "dirent".

7. ZACTV9.C   function "signal" returns a pointer to a function
              returning void on the IBM.  Changing the declaration of
              "onint" fixed the mismatch of pointer types.  The IBM
              expected a "fork" rather than a "vfork" so we used the
              $APLBELL version.

8. ZSPAWN.C   a local routine invoked "vfork".  We replaced it with
              a stubbed Fortran routine to save trouble.  A standard
              UNIX routine using "fork" may need to be created to
              support the ATNF added functions.

9. MAXPATH    ZDAOPN.C, ZTPOP2.C, ZLOCK.C, ZCREA2.C and others use
              a parameter named MAXPATH.  This occurs in an IBM
              system include and we should replace ours with a new
              spelling to eliminate a warning from the compiler.

10. ZTTYIO.C  or its friends had some trouble opening/closing units
              5 and 6.  This produced annoying messages only a few
              times and we ignored them as minor and presumably
              correctable.

11. ZTACT2.C  the Berkeley routine compiled after struct type "direct"
              was renamed "dirent".  HOWEVER, unlike the Convex and
              SUN and other Unix systems, the IBM returns either
              no error or "errno == EPERM" (not owner) when a
              "kill(*pid,0)" is attempted on a Zombie process.  This
              causes AIPS to think a task is still busy and therefore
              to suspend itself indefinitely waiting on it.  This
              problem must be solved in a proper way before the IBM
              can be regarded as having passed the acceptance tests.
              We bypassed the problem, however, by issuing a "wait"
              in ZTACT2.C in order to run the DDT which always wants
              a wait anyway.
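
     The Zombie problem in item 11 can be illustrated outside AIPS.
The sketch below is not the ZTACT2.C code and uses Python's os module
rather than C.  It assumes a POSIX system and that the process being
tested is a child of the caller, and it uses a non-blocking "waitpid"
rather than the plain "wait" mentioned above.  The name
task_is_active is hypothetical.

```python
import os
import time

def task_is_active(pid):
    """Return True if child process pid still appears to be running.

    Assumption: pid is a child of the caller.  An exited child is
    reaped first, so the signal-0 probe never sees a Zombie.
    """
    try:
        # WNOHANG: return at once.  Getting pid back means the child
        # had exited and its Zombie entry has now been collected.
        reaped, _status = os.waitpid(pid, os.WNOHANG)
        if reaped == pid:
            return False
    except ChildProcessError:
        pass  # not our child, or already reaped: fall through to the probe
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks the pid
    except ProcessLookupError:
        return False  # ESRCH: no such process
    except PermissionError:
        return True  # EPERM: the process exists under another owner
    return True

# Demonstration: start a child that exits at once, then poll for it.
child = os.fork()
if child == 0:
    os._exit(0)
deadline = time.time() + 5.0
while task_is_active(child) and time.time() < deadline:
    time.sleep(0.01)
print("child still active:", task_is_active(child))
```

Reaping before probing means the answer does not depend on what
"kill" reports for a Zombie, which is the point on which the systems
described above differ.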


IV. Conclusions


     The IBM RISC System 6000 series appears to be a quite
functional choice for running AIPS.  The unit outperforms a
Convex C220 marginally on scalar code and does well (but not
THAT well) on intensive vector problems.  We did not test the
AIPS tape tasks as that poses a more complicated installation
problem.  Exabyte drives are available and we would be surprised
if there were any severe problem with AIPS tapes on the IBM.
We also did not test the display capabilities of the machine.
IBM supports the X Window System version 11, but does not use
the same toolkit as is used by SUN.  Thus, some work will be
required to provide full AIPS TV functionality on IBM RISC
computers.  The IBM is completely compatible in binary data
formats with the SUN and Convex and will therefore be useful
in networked environments.

     In addition to a model 320, IBM currently sells a model
530 with a 40 per cent faster clock rate and other architectural
differences which should enhance performance by more than simply
the clock speed.  We hope to test that model in the near future.
A still faster model, the 540, is due to be available soon.  It
differs from the 530 only in the clock speed, so its performance
will be predictable from the measurements on the 530.

     One problem with Zombie tasks must be solved before the
System 6000 can be fully used by AIPS.  Anyone purchasing such
an IBM for use with AIPS should make acceptance contingent on
tape IO functionality within AIPS and should also take into
account the present uncertainty regarding the TV display.

     Anyone using, or planning to use, a SUN to reduce data with
AIPS with all disks provided via NFS should consider that decision
carefully.  Proper DDT tests on a stand-alone SUN should be made
to delineate the magnitude of the degradation due to NFS.

     PLEASE NOTE that this description does not constitute any
sort of endorsement of these products by the National Radio
Astronomy Observatory, the Australia Telescope National Facility,
or the CSIRO.  The IBM RISC System 6000 series is not now, and
may never be, supported by the NRAO-released versions of AIPS.

