-----------------------------------------------------------------------
; Copyright (C) 1995
; Associated Universities, Inc. Washington DC, USA.
;
; This program is free software; you can redistribute it and/or
; modify it under the terms of the GNU General Public License as
; published by the Free Software Foundation; either version 2 of
; the License, or (at your option) any later version.
;
; This program is distributed in the hope that it will be useful,
; but WITHOUT ANY WARRANTY; without even the implied warranty of
; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
; GNU General Public License for more details.
;
; You should have received a copy of the GNU General Public
; License along with this program; if not, write to the Free
; Software Foundation, Inc., 675 Massachusetts Ave, Cambridge,
; MA 02139, USA.
;
; Correspondence concerning AIPS should be addressed as follows:
;         Internet email: aipsmail@nrao.edu.
;         Postal address: AIPS Project Office
;                         National Radio Astronomy Observatory
;                         520 Edgemont Road
;                         Charlottesville, VA 22903-2475 USA
-----------------------------------------------------------------------

Installing AIPS on an IBM RISC System 6000
and Performance Results for Convex C220 and SUN Sparc Computers

Eric W. Greisen, Mark Calabretta
Australia Telescope National Facility
16 July 1990

I. Introduction

On 18 and 19 June, 1990, we installed AIPS on an IBM RISC System 6000 computer located in IBM's North Sydney office building. This turned out to be a remarkably straightforward operation with only a few minor hitches, which are described below. The source code was ported on a QIC150 tape, written with 'tar', of the ATNF (Australia Telescope National Facility) version of AIPS release 15APR90 maintained on CSIRO's Convex C220 in Marsfield, Australia. The DDT data for the small and medium tests were written in binary form (also with 'tar') from the Convex C220 onto a second QIC150 tape.
In other words, the data were transported as an AIPS user's binary file system from one architecture to another. As far as we could tell, all parameters of the files remained intact in this process, which implies that the Convex and the IBM use identical binary formats for double- and single-precision floating-point numbers, integers, and characters. This ensures compatibility with SUN workstations as well, since SUNs are used at ATNF with the Convex with complete compatibility. The DDT data were computed for the 15OCT87 release of AIPS and are not totally compatible with the current programs. This is reflected in somewhat larger differences between the master and test images, whether computed on the Convex or on the IBM.

The system was compiled and linked on a non-standard RISC System model 520, but then moved to a standard model 320. This desk-top computer had 24 Mbytes of memory, two 320-Mbyte internal disk drives, a QIC150 tape drive, and a large color monitor. The operating system release was 9021A2. At the suggestion of IBM personnel, all Fortran and C routines were compiled with full optimization. The two AIPS data areas were on the same disk drive.

For comparison, DDT was also run on the Convex and SUN computers owned by ATNF in Marsfield. The Convex is a model C220 with 128 Mbytes of memory running under OS 8.0. It is an older version and does not have the ESP (enhanced scalar processor) unit. The disks used were 4-way striped with file/fragment sizes of 64K/8K. DDT was run on an otherwise empty Convex with one and with two processors enabled. A limited set of routines (including most Q routines) is compiled vector/parallel on the Convex; the rest are compiled with optimization level "O0" ("basic block scalar").

One of the SUN computers used was a Sparc-1 (4/60) running SUNOS 4.0.3c with 16 Mbytes of memory. The AIPS disks were provided via NFS by the Convex.
Because the Convex is fast, this arrangement is quite usable, but it degrades the SUN results quoted below. As a test, DDT was run both with an empty Convex and with a lightly-loaded Convex. The other SUN tested was a Sparc model 4/370 equipped with 24 Mbytes of memory and a TAAC board, also running SUNOS 4.0.3c. On the 4/370, only the small DDT test was run. Both SUNs were run with SunView windows and a single cpu meter display. The code is compiled with no optimization on SUN Sparcs.

II. DDT Results

DDT compares computed images with master images and reports the peak and rms differences in units of bits with respect to the peak of the master image. The results for the IBM 6000/320, Convex C220, and SUN Sparcs were:

                    Peak                    Rms
Test        Convex     IBM    SUNs  Convex     IBM    SUNs
Small:
  UVMAP       13.0    13.1    13.3    18.8    18.2    18.8
  UVBEAM      10.1    10.2     9.9    16.2    15.8    16.2
  APCLN       18.8    13.6    14.1    24.1    16.4    20.7
  APRES       16.9    17.0    17.0    22.2    22.4    22.4
  MXMAP       12.6    12.3    12.9    18.5    18.1    18.7
  MXBEAM      13.9    11.1    13.7    19.3    17.7    19.3
  MXCLN       14.8    14.3    14.3    17.6    17.6    17.6
  VTESS        4.1     4.1     4.1    10.9    10.9    10.9
Medium:
  UVMAP       13.5    12.9    13.5    17.8    18.0    18.1
  UVBEAM      14.0    13.0    13.9    17.8    18.3    18.5
  APCLN       17.3    12.0    11.8    23.7    14.8    14.7
  APRES       15.4    15.4    15.4    21.2    21.4    21.3
  MXMAP       13.0    13.3    12.6    17.8    18.0    18.1
  MXBEAM      14.3    13.0    13.9    17.8    18.4    18.5
  MXCLN       10.3    10.0    10.3    14.2    14.2    14.2
  VTESS        3.3     3.3     3.3    10.8    10.8    10.8

which indicate that all of the machines are computing basically correct results. The low numbers of bits in VTESS are due to a change in algorithm since the master data were computed. The surprisingly high number of bits for APCLN on the Convex has not been explained; note, however, that the IBM and SUN use scalar routines, whereas the Convex uses different, vector routines.

The tables below summarize the performance results for the computers. The Convex numbers are labeled C210 for the one-processor and C220 for the two-processor configuration.
The latter reports 120 seconds of cpu time for every minute of real time, since the time used by both processors is counted. This makes the cpu times roughly those of a C210, but the real times are still those of the C220. The SUN results are labeled 4/60 and 4/370 for runs with an empty Convex file server and 4/60* for a lightly-loaded Convex. It must be stressed that the results for the SUNs may be very misleading. All executable and data files for the SUNs lie on the Convex and are served via NFS; only the swap areas and the "/" and "/usr" areas are on local SUN disks. The real times for smaller tasks are heavily affected by NFS. The heavily computational tasks such as APCLN, MX (clean), and ASCAL did use 100 per cent of the SUN cpus for lengthy periods. Note that NFS also competes with the DDT tasks themselves for the use of the SUN cpu. The comparison of the 4/60 and 4/60* columns shows the added cost to the SUN cpu time of a light load on the file-serving computer alone, and this cost is significant. The numbers below therefore represent a SUN in a commonly used networking configuration, not a stand-alone machine.
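
The cpu/real ratios and percentage costs quoted later in this memo follow from simple arithmetic on the tables below. The C sketch that follows makes that arithmetic explicit. The function names cpu_real_ratio and percent_increase are hypothetical (they are not part of AIPS or DDT), and we assume the memo's ratios were formed simply by dividing a table's cpu time by the corresponding real time.

```c
/* Hypothetical helpers (not part of AIPS or DDT) showing the arithmetic
 * assumed to lie behind the ratios and percentages quoted in this memo. */

/* cpu seconds divided by elapsed (real) seconds.  A value above 1 is
 * possible when cpu time is summed over more than one processor, as on
 * the two-processor C220. */
double cpu_real_ratio(double cpu_seconds, double real_seconds)
{
    return cpu_seconds / real_seconds;
}

/* Percentage by which `loaded` exceeds `baseline`, e.g. the extra cpu
 * time used by a SUN 4/60 when its NFS server is lightly loaded. */
double percent_increase(double baseline, double loaded)
{
    return 100.0 * (loaded - baseline) / baseline;
}
```

With numbers from the tables, cpu_real_ratio(74.83, 77.0) gives about 0.97 for the medium ASCAL on the IBM, cpu_real_ratio(115.64, 113.0) gives about 1.02 for the medium MX clean on the C220, and percent_increase(96.51, 99.64) gives about 3.2 per cent extra cpu for the small VTESS on the 4/60 when the Convex is lightly loaded.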
The reported cpu times in seconds for the small test were

                  Convex            IBM            SUN
Task             C210     C220      320    4/370     4/60    4/60*
  UVSRT          3.28     3.34     1.93     6.81     8.88     9.22
  UVSRT          3.42     3.48     2.05     7.04     8.88     9.37
  UVDIF          1.39     1.41     1.24     4.33     7.14     7.11
  UVDIF          1.38     1.42     1.25     4.33     7.23     7.15
  CCMRG          1.97     2.03     0.91     2.48     3.04     3.23
  SUBIM          1.21     1.23     0.76     1.75     2.14     2.11
  SUBIM          1.21     1.21     0.73     1.69     2.06     2.23
  COMB           1.26     1.26     0.75     1.95     2.32     2.41
  COMB           1.26     1.27     0.79     1.99     2.41     2.46
  COMB           1.29     1.31     0.85     1.94     2.53     2.47
  COMB           1.26     1.27     0.88     1.95     2.42     2.65
  COMB           1.29     1.32     0.83     1.96     2.39     2.57
  COMB           1.27     1.29     0.87     1.97     2.42     2.49
  COMB           1.21     1.24     0.80     1.88     2.36     2.63
  COMB           1.33     1.36     0.85     2.11     2.51     2.79

  UVMAP          5.04     5.14     4.97    21.72    27.50    28.61
  APCLN cln     14.25    14.52    57.97   169.23   209.22   218.31
  APCLN res      3.36     3.47     3.05    14.33    17.79    18.81
  ASCAL          8.05     8.26    24.54    82.29   112.05   116.65
  MX map         7.42     7.65     5.47    24.17    30.07    31.28
  MX clean      30.82    31.94    82.54   292.34   370.10   383.55
  VTESS         16.98    17.61    15.54    77.93    96.51    99.64

  DDT comp:   21   14   32   35   41

For the medium test the cpu times were

                  Convex            IBM      SUN
Task             C210     C220      320     4/60
  UVSRT          4.55     4.63     2.88    14.17
  UVSRT          4.76     4.87     2.99    14.42
  UVDIF          2.13     2.16     1.99    11.63
  UVDIF          2.25     2.29     2.13    12.11
  CCMRG          3.10     3.19     1.60     6.34
  SUBIM          1.93     1.94     1.97     5.22
  SUBIM          1.91     1.96     1.98     5.08
  COMB           2.02     2.05     1.90     5.16
  COMB           2.02     2.06     1.94     5.14
  COMB           2.07     2.10     1.94     5.45
  COMB           2.03     2.06     1.99     5.51
  COMB           2.06     2.10     2.04     5.20
  COMB           2.04     2.09     2.00     5.26
  COMB           1.99     2.02     1.96     5.12
  COMB           2.10     2.15     1.98     5.35

  UVMAP         10.87    11.08    14.22    90.30
  APCLN cln     77.14    78.36   517.12  1910.02
  APCLN res     10.56    10.84    11.91    80.71
  ASCAL         17.70    17.92    74.83   376.64
  MX map        17.42    17.93    17.26   107.47
  MX clean     112.96   115.64   586.12  2371.67
  VTESS         39.18    40.00    48.26   319.66

where the first group lists scalar programs and the second lists vector/parallel programs. The "DDT comp" line gives the approximate time required by the AIPS program itself to compile the POPS source for the test contained in DDTLOAD.001.

The IBM outperforms the Convex on purely scalar codes by a small amount.
On codes dominated by IO, the IBM does rather better than the Convex in terms of cpu time. On codes involving large vectorizable computations, however, the IBM 320 can be up to about 6.7 times slower than the Convex (APCLN clean, medium test). VTESS, which is only moderately vectorized, shows comparable times on both machines. The networked SUN 4/60 is 5 or 6 times slower than the IBM on the heavily computational jobs and 2.5 to 5 times slower on the small jobs. The SUN 4/370 is about 25% faster than the 4/60. Lightly loading the server cpu adds about 3% to the cpu time of the SUN 4/60 being served.

The real times, each accurate to 1 second at best, are listed below. REMEMBER THAT THE SUN IS TOTALLY DEPENDENT ON NFS FOR AIPS, which very obviously affected both the real and the cpu times. The reported real times in seconds for the small DDT were

                  Convex            IBM            SUN
Task             C210     C220      320    4/370     4/60    4/60*
  UVSRT             6        6        7       52       57       86
  UVSRT             5        6        7       50       57       66
  UVDIF             4        4        7       11       15       21
  UVDIF             3        3        3       10       15       19
  CCMRG             4        4        2       20       19       22
  SUBIM             2        2        1       11       11       13
  SUBIM             1        1        1       11       20       22
  COMB              2        2        2       12       12       24
  COMB              2        2        2       12       16       21
  COMB              2        1        2       12       20       22
  COMB              2        1        2       13       13       19
  COMB              2        2        2       12       20       14
  COMB              2        2        2       11       17       14
  COMB              2        2        2       12       21       23
  COMB              2        2        1       12       16       30

  UVMAP             7        7        9       63       74       96
  APCLN cln        17       17       60      201      246      315
  APCLN res         6        6        4       40       39       48
  ASCAL            10       10       26      111      146      180
  MX map           11       10        8       66       80       98
  MX clean         36       34       89      398      486      595
  VTESS            25       25       20      155      179      222

  DDT comp:   36   36  202  194  258

And for the medium DDT test

                  Convex            IBM      SUN
Task             C210     C220      320     4/60
  UVSRT             7        6       11      120
  UVSRT             7        7       10      129
  UVDIF             4        4        3       24
  UVDIF             4        4        9       29
  CCMRG             6        6        3       36
  SUBIM             2        3        3       32
  SUBIM             3        3        2       33
  COMB              3        2        5       37
  COMB              3        3        4       29
  COMB              2        3        5       39
  COMB              3        3        4       38
  COMB              3        3        5       36
  COMB              2        3        5       42
  COMB              3        3        5       38
  COMB              3        3        5       34

  UVMAP            14       14       26      301
  APCLN cln        84       76      529     2467
  APCLN res        14       13       17      175
  ASCAL            20       19       77      511
  MX map           22       22       25      263
  MX clean        124      113      611     3373
  VTESS            47       46       68      659

These tables make it clear that the IBM IO system is also quite fast, although the superiority of the Convex IO is hinted at by the numbers for UVSRT
and COMB. The cpu/real ratios on the IBM were as high as 0.97 for the medium ASCAL, APCLN, and MX (clean) and a very good 0.71 for the medium VTESS. The highest cpu/real ratios for the networked SUN 4/60 were about 0.80, for the cleans (APCLN and MX) and ASCAL, and the lowest was 0.16, for the IO-heavy task UVSRT. The networked 4/370 had somewhat lower cpu/real ratios than the networked 4/60. A light load on the Convex cost the 4/60 being served about 22% additional real time. The use of parallel cpus on the Convex actually costs a small amount of cpu time for all tasks, with only small (<= 10%) improvements in real time for the vector/parallel tasks. Note that the medium cleans achieved "cpu/real" ratios greater than 1. These results suggest that the AIPS algorithms are optimised for vector rather than parallel execution.

III. Source code modifications required for the IBM

Source code which was not thought to be system dependent was not modified during this test, except to correct simple errors in the AIPS code. The worst of these were errors due to the changes in file formats made in the code overhaul: FILINI and PRTAC do not correctly support the new accounting file format (and FILINI may have other troubles as well).

By default, the IBM compiler accepts character strings only of length 1 to 500. We used the compiler option '-qcharlen=32767' to avoid that limitation and the option '-qextname' to produce external names with the trailing underscore characters that the AIPS C routines assume. We compiled all subroutines and tasks, since we were told that the compiler is particularly good at reporting problems. Not surprisingly, it reported a host of "errors" in the $APLVMS routines COMPLOT, GRISUB, and ZETASUBS and in several local ATNF routines. Tasks UVFND, CLCOR, TABED, UNCAL, SETJY, and TABEX were found to have (illegal) branches into IF-THEN-ELSE or DO-loop structures, and GETJY repeats a variable name in an EQUIVALENCE group.
None of these is used by DDT, and corrections have been submitted to the Charlottesville AIPS group.

The IBM RISC System 6000 AIX operating system is touted as a Bell System V system with a nearly complete addition of Berkeley functions. To test this, we began by feeding the SUN collection of Z routines to the compilers. The problems reported by the compilers were:

 1. ZDCHIC.C   The system variable NBBY is not declared in the expected system include file. We replaced it with its local value (8).
 2. ZCREA2.C   The struct type "statfs" is not recognized. We simply copied in the $APLUNIX version instead.
 3. ZDATE.C    The struct type "tm" is not defined in the expected include file on the IBM. We replaced the routine with the $APLUNIX version, which uses a different system include file.
 4. ZTIME.C    As for ZDATE.C.
 5. ZTXMA2.C   The Berkeley routine worked after struct type "direct" was renamed "dirent".
 6. ZTKILL.C   The Berkeley routine worked after struct type "direct" was renamed "dirent".
 7. ZACTV9.C   Function "signal" returns a pointer to null on the IBM; changing the declaration of "onint" fixed the mismatch of pointer types. The IBM also expected a "fork" rather than a "vfork", so we used the $APLBELL version.
 8. ZSPAWN.C   This local routine invoked "vfork". To save trouble, we replaced it with a stubbed Fortran routine. A standard UNIX routine using "fork" may need to be created to support the functions added at ATNF.
 9. MAXPATH    ZDAOPN.C, ZTPOP2.C, ZLOCK.C, ZCREA2.C, and others use a parameter named MAXPATH. The same name occurs in an IBM system include file, so we should give ours a new spelling to eliminate a compiler warning.
10. ZTTYIO.C   This routine, or its companions, had some trouble opening and closing units 5 and 6. This produced annoying messages only a few times, and we ignored them as minor and presumably correctable.
11. ZTACT2.C   The Berkeley routine compiled after struct type "direct" was renamed "dirent". HOWEVER, unlike the Convex, the SUN, and other Unix systems, the IBM returns either no error or "errno == EPERM" (not owner) when "kill(*pid,0)" is attempted on a Zombie process.
               This causes AIPS to think that the task is still busy and therefore to suspend itself indefinitely while waiting for it. This problem must be solved properly before the IBM can be regarded as having passed the acceptance tests. To run DDT, which always wants a wait anyway, we bypassed the problem by issuing a "wait" in ZTACT2.C.

IV. Conclusions

The IBM RISC System 6000 series appears to be a quite functional choice for running AIPS. The unit marginally outperforms a Convex C220 on scalar code and does well (but not THAT well) on intensive vector problems.

We did not test the AIPS tape tasks, as they pose a more complicated installation problem. Exabyte drives are available, and we would be surprised if there were any severe problem with AIPS tapes on the IBM. We also did not test the display capabilities of the machine. IBM supports version 11 of the X Window System, but does not use the same toolkit as SUN. Thus, some work will be required to provide full AIPS TV functionality on IBM RISC computers.

The IBM is completely compatible in binary data formats with the SUN and the Convex and will therefore be useful in networked environments. In addition to the model 320, IBM currently sells a model 530 with a 40 per cent faster clock rate and other architectural differences which should enhance performance by more than the clock speed alone. We hope to test that model in the near future. A still faster model, the 540, is due to be available soon. It differs from the 530 only in clock speed, so its performance will be predictable from the measurements on the 530.

One problem with Zombie tasks must be solved before the System 6000 can be fully used by AIPS. Anyone purchasing such an IBM for use with AIPS should make acceptance contingent on tape IO functionality within AIPS and should also take into account the present uncertainty regarding the TV display.
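
The proper fix for the Zombie problem in ZTACT2.C is left open above. One conventional POSIX approach, assuming the AIPS process that polls a task is also that task's parent, is to poll with waitpid() and the WNOHANG option: this reaps a finished (Zombie) child and reports it as done, whatever kill(pid, 0) does with Zombies on a given Unix variant. The C sketch below is not the ZTACT2.C code; the function name task_is_active is hypothetical, and for processes that are not children of the caller it falls back to the kill(pid, 0) probe, which still has the IBM ambiguity described above.

```c
#define _POSIX_C_SOURCE 200809L   /* expose waitpid, kill, fork, ... */

#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not part of AIPS: returns 1 if process `pid` is
 * still running and 0 if it has finished.  Assumes the caller is the
 * parent of `pid`; otherwise it falls back to the kill(pid, 0) probe. */
int task_is_active(pid_t pid)
{
    int status;
    pid_t r;

    /* Non-blocking status check; retry if interrupted by a signal. */
    do {
        r = waitpid(pid, &status, WNOHANG);
    } while (r == -1 && errno == EINTR);

    if (r == pid)
        return 0;          /* child had exited; its Zombie is now reaped */
    if (r == 0)
        return 1;          /* child exists and has not yet exited */

    /* r == -1, typically ECHILD: pid is not (or no longer) our child.
     * kill(pid, 0) sends no signal; it only checks whether pid exists. */
    if (kill(pid, 0) == 0)
        return 1;
    return errno == EPERM; /* EPERM: exists but owned by another user */
}
```

Because the Zombie is reaped by the first call that sees it, later calls get ECHILD from waitpid() and ESRCH from kill(), so they also report the task as finished (unless the process id has meanwhile been reused by the system).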
Anyone using, or planning to use, a SUN to reduce data with AIPS with all disks provided via NFS should consider that decision carefully. Proper DDT tests on a stand-alone SUN should be made to delineate the magnitude of the degradation due to NFS.

PLEASE NOTE that this description does not constitute any sort of endorsement of these products by the National Radio Astronomy Observatory, the Australia Telescope National Facility, or the CSIRO. The IBM RISC System 6000 series is not now, and may never be, supported by the NRAO-released versions of AIPS.