%-----------------------------------------------------------------------
%; Copyright (C) 1995
%; Associated Universities, Inc. Washington DC, USA.
%;
%; This program is free software; you can redistribute it and/or
%; modify it under the terms of the GNU General Public License as
%; published by the Free Software Foundation; either version 2 of
%; the License, or (at your option) any later version.
%;
%; This program is distributed in the hope that it will be useful,
%; but WITHOUT ANY WARRANTY; without even the implied warranty of
%; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
%; GNU General Public License for more details.
%;
%; You should have received a copy of the GNU General Public
%; License along with this program; if not, write to the Free
%; Software Foundation, Inc., 675 Massachusetts Ave, Cambridge,
%; MA 02139, USA.
%;
%; Correspondence concerning AIPS should be addressed as follows:
%; Internet email: aipsmail@nrao.edu.
%; Postal address: AIPS Project Office
%; National Radio Astronomy Observatory
%; 520 Edgemont Road
%; Charlottesville, VA 22903-2475 USA
%-----------------------------------------------------------------------
% AIPS memo 71 -- Pat Murphy, 910408.
% Revised 910411 -- Bob Burns' comments
\input AMEMO.MAC % Aips memo macros in DOCTXT
\font\eightrm=cmr8 % for footnotes
\font\eightit=cmti8 % ditto
\def\author{Patrick P.~Murphy}
\def\revisors{}
\def\mdate{April 8, 1991}
\def\ftitle{A Comparison of DDT Results}
\def\ftitlemore{IBM RS/6000 and Convex C--1}
\def\stitle{DDT Results: IBM RS/6000 and Convex C--1}
\def\memnumb{7} \def\memnumc{1} % memo #71
\memobegin
\subtit{ INTRODUCTION}
This memo documents the results of running the \AIPS\ benchmark suite
of programs --- the DDT\footnote*{\eightrm The
so-called {\eightit dirty dozen tests\/}.} tests --- on the IBM RS/6000
series of workstations and servers, and on the Convex C--1. The
systems tested were:\medskip
\item{$\bullet$} IBM Model 530: 48 Megabytes, 200--500 Megabytes free disk space
\item{$\bullet$} IBM Model 540: 64 Megabytes, 70--200 Megabytes free disk space
\item{$\bullet$} Convex C--1: 32 Megabytes, 400 Megabytes free disk space,
integral vector processor
\medskip
\noindent In addition, an IBM RS/6000 model 320 with 24 Megabytes of
memory and about 400 Megabytes of available disk space was tested for
a brief period during the course of the tests described herein.
Unfortunately, by the time the specific test had stabilized, this
system was no longer available. However, it is possible to use some
of the early tests to infer the approximate performance of this
system.
All IBM systems were running version 3.1 of the {\it AIX} operating
system. The model 320 was a loaner workstation from IBM in Richmond
(thanks to George Latimer), the 530 is the Charlottesville {\tt lemur}
workstation, and the 540 is the {\tt polaris} server. The Convex
system is {\tt nrao1}.
Close to 30 separate tests were done on these systems, ranging
from the small DDT suite, through subsets of the large, to the
full-blown large test. Two main versions of \AIPS\ were used: {\tt
15JAN91} and {\tt 15JUL91}, although at one point a {\tt 15APR91}
version of {\tt MX.EXE} was tried for comparative purposes.
The main purpose of the many tests was to eliminate spurious
effects from the results. The effects in question ranged from
incorrect values of {\tt EDGSKP} and the AP size to competition for
CPU cycles from runaway emacs ``zombie'' processes.
It should be stressed that the version of \AIPS\ used for most of
these tests ({\tt 15JUL91}) was a {\it frozen snapshot\/} of the
continually changing \AIPS\ ``TST'' version that exists on {\tt nrao1}
and the many machines throughout NRAO served by the \AIPS\ midnight
jobs. This snapshot was taken on about February 20. The intent here
of course is to prevent day-to-day changes made by \AIPS\ programmers
from affecting the results. There is nothing special about either the
date or the characteristics of the \AIPS\ code resulting from this
snapshot. Among the IBM systems, the exact same binaries were used in
running the program. These were initially compiled on the
RS/6000--320 and moved to the other systems as needed, although the
final figures presented here resulted from the identical code rebuilt
on the RS/6000--540 and moved to the 530 later. It is assumed that a
rebuild of the code on each RS/6000 system would not significantly
affect the results.
In the final results, the compilation characteristics of the \AIPS\ %
code were either identical or as near as possible to identical among
the different systems tested. The system was compiled without
debugging information and with no optimization, except for the {\tt
QPSAP} routines, which were as fully optimized as the native compilers
would allow. The AIX Fortran compiler has only one optimization
setting, whereas the Convex Fortran compiler has several. The value
used on the Convex was {\tt -O2}.
\subtit{ ACCURACY RESULTS}
For the purposes of comparison, the medium DDT was run on all systems
with the same size pseudo-array processor (1280 kwords). All the
relevant input parameters were the same for each run and the {\tt
EDGSKP} parameter was set to 8. The accuracy results were identical
for all three IBM tests. The conclusion one can draw from these
figures is that there are few significant differences in accuracy
between the Convex and the IBM RS/6000 series. If anything, the IBM
seems to give better overall results than the Convex, although the
difference is slight. The results are presented in Table 1.
\bigskip
\vbox{\settabs 7 \columns
\+& {\bf Table 1}. Accuracy Results, 15JUL91 MEDIUM test\cr
\+& on IBM RS/6000 and Convex C--1 systems.\cr
\+& \cr
\+& & \hfill RS & 6000 & \hfill Convex & C--1 & \cr
\+& Task & \hfill Peak Bits & \hfill Rms Bits & \hfill Peak Bits & \hfill Rms Bits & \cr
\+&\hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill & \cr
\+& UVMAP & \hfill 14.97 & \hfill 18.25 & \hfill 14.63 & \hfill 17.90 &\cr
\+& UVBEAM & \hfill 15.15 & \hfill 18.68 & \hfill 14.52 & \hfill 17.89 &\cr
\+& APCLN & \hfill 11.78 & \hfill 14.73 & \hfill 11.92 & \hfill 14.61 &\cr
\+& APRES & \hfill 15.27 & \hfill 21.26 & \hfill 15.37 & \hfill 21.20 &\cr
\+& MXMAP & \hfill 14.85 & \hfill 18.25 & \hfill 14.36 & \hfill 17.91 &\cr
\+& MXBEAM & \hfill 15.28 & \hfill 18.68 & \hfill 14.68 & \hfill 17.90 &\cr
\+& MXCLN & \hfill 9.99 & \hfill 14.19 & \hfill 10.29 & \hfill 14.18 &\cr
\+& VTESS & \hfill 21.31 & \hfill 29.34 & \hfill 23.12 & \hfill 30.85 &\cr
\+&\hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill &\cr}
\medskip
In the course of the trial runs, it was determined that reducing the
AP size to 256 kwords resulted in slightly poorer accuracy results for
both {\tt MXMAP} (14.25/18.24) and {\tt MXBEAM} (14.83/18.67), on the
RS/6000 systems.
Additional accuracy results were obtained for all DDT test sizes
with the 15JAN91 version of \AIPS. These runs were performed on the
model 540 with {\tt EDGSKP} set to 4, 8, or 16 as appropriate, and a
256 kword AP.\bigskip
\vbox{\settabs 7 \columns
\+{\bf Table 2}. Accuracy results, 15JAN91 tests, RS6000/540. \cr
\+ \cr
\+ Task & & Large & & Medium & & Small &\cr
\+ & \hfill Peak & \hfill Rms & \hfill Peak & \hfill Rms & \hfill Peak & \hfill Rms &\cr
\+\hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill &\cr
\+ UVMAP & \hfill 12.33 & \hfill 17.29 & \hfill 14.97 & \hfill 18.25 & \hfill 13.49 & \hfill 19.33 &\cr
\+ UVBEAM & \hfill 15.12 & \hfill 17.75 & \hfill 15.15 & \hfill 18.68 & \hfill 10.83 & \hfill 16.71 &\cr
\+ APCLN & \hfill 10.95 & \hfill 16.79 & \hfill 11.78 & \hfill 14.73 & \hfill 13.74 & \hfill 16.38 &\cr
\+ APRES & \hfill 14.11 & \hfill 20.62 & \hfill 15.27 & \hfill 21.26 & \hfill 17.03 & \hfill 22.43 &\cr
\+ MXMAP & \hfill 13.88 & \hfill 18.92 & \hfill 14.25 & \hfill 18.24 & \hfill 13.04 & \hfill 18.97 &\cr
\+ MXBEAM & \hfill 15.10 & \hfill 19.59 & \hfill 14.83 & \hfill 18.67 & \hfill 14.61 & \hfill 19.72 &\cr
\+ MXCLN & \hfill 9.47 & \hfill 15.86 & \hfill 9.99 & \hfill 14.19 & \hfill 13.80 & \hfill 17.53 &\cr
\+ VTESS & \hfill 18.71 & \hfill 27.37 & \hfill 21.31 & \hfill 29.34 & \hfill 21.74 & \hfill 28.84 &\cr
\+\hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill & \hrulefill & \cr}\bigskip
\medskip
\subtit{ TIMING RESULTS}
\medskip
Table 3 below summarizes the results of the timing tests on the
Convex and on the IBM RS/6000 models 540 and 530. The numbers are derived from the
printed values reported by \AIPS\ on termination of the various tasks.
The numbers for tasks {\tt UVSRT} and {\tt UVDIF} represent the {\it
sum\/} of the two times those tasks are run in the DDT suite, and
those for {\tt COMB} represent the sum of the eight separate runs of
that task within the DDT procedure.
Note that the scratch disk used in all cases was the same as that used
for the data --- the IBM internal disk.
\bigskip
%%%%%%%%%%%%%%%%%%% Here's the original, wrong numbers. Does show
%%%%%%%%%%%%%%%%%%% scaling of rs320, however. %%%%%%%%%%%%%%%%%%%%%
%\vbox{\settabs 9 \columns
%\+ {\bf Table 3}. Times for MEDIUM DDT tests, \AIPS\ 15JUL91. \cr
%\+ \cr
%\hrule \vskip 1.5pt \hrule \vskip 2pt
%\+Task & \hfill RS&6000/320 & \hfill RS&6000/530 & \hfill RS&6000/540 & \hfill Con&vex C--1 &\cr
%\+ & \hfill CPU & \hfill Wall & \hfill CPU & \hfill Wall & \hfill CPU & \hfill Wall & \hfill CPU & \hfill Wall &\cr
%\vskip 3pt \hrule \vskip 3pt
%\+UVSRT & \hfill 12.00 & \hfill 26 & \hfill 10.24 & \hfill 35 & \hfill 7.63 & \hfill 18 & \hfill 36.72 & \hfill 50 &\cr
%\+UVDIF & \hfill 7.00 & \hfill 8 & \hfill 5.51 & \hfill 12 & \hfill 4.59 & \hfill 10 & \hfill 19.04 & \hfill 33 &\cr
%\+CCMRG & \hfill 3.01 & \hfill 9 & \hfill 2.90 & \hfill 14 & \hfill 1.97 & \hfill 5 & \hfill 10.80 & \hfill 18 &\cr
%\+SUBIM & \hfill 2.10 & \hfill 6 & \hfill 1.91 & \hfill 8 & \hfill 1.37 & \hfill 4 & \hfill 6.50 & \hfill 8 &\cr
%\+COMB & \hfill 18.29 & \hfill 47 & \hfill 15.16 & \hfill 80 & \hfill 11.58 & \hfill 39 & \hfill 60.58 & \hfill 85 &\cr
%\vskip 6pt
%\+UVMAP & \hfill 12.80 & \hfill 27 & \hfill 10.46 & \hfill 31 & \hfill 8.05 & \hfill 21 & \hfill 31.16 & \hfill 40 &\cr
%\+APCLN & \hfill 545.95 & \hfill 563 & \hfill 428.60 & \hfill 444 & \hfill 355.92 & \hfill 364 & \hfill 200.95 & \hfill 240 &\cr
%\+APRES & \hfill 10.84 & \hfill 17 & \hfill 8.62 & \hfill 16 & \hfill 6.76 & \hfill 11 & \hfill 25.88 & \hfill 34 &\cr
%\+ASCAL & \hfill 166.54 & \hfill 173 & \hfill 124.79 & \hfill 135 & \hfill 103.32 & \hfill 107 & \hfill 130.70 & \hfill 152 &\cr
%\+MXMAP & \hfill 20.24 & \hfill 35 & \hfill 16.24 & \hfill 41 & \hfill 12.93 & \hfill 25 & \hfill 46.48 & \hfill 60 &\cr
%\+MXCLN & \hfill 618.15 & \hfill 661 & \hfill 481.89 & \hfill 513 & \hfill 402.42 & \hfill 430 & \hfill 294.04 & \hfill 349 &\cr
%\+VTESS & \hfill 46.35 & \hfill 65 & \hfill 36.87 & \hfill 65 & \hfill 29.76 & \hfill 41 & \hfill 108.14 & \hfill 135 &\cr
%\vskip 6pt \hrule \vskip 3pt
%\+Total:& \hfill 1463.27 &\hfill 1637 & \hfill 1143.19 & \hfill 1394 & \hfill 946.30 & \hfill 1075 & \hfill 970.99 & \hfill 1204 &\cr
%\+Ratio:& \hfill 1.00 & \hfill 1.00 & \hfill 0.78 & \hfill 0.85 & \hfill 0.65 & \hfill 0.66 & \hfill 0.66 & \hfill 0.73 &\cr
%\vskip 3pt
%\+Speedup:&\hfill 1.00 & \hfill 1.00 & \hfill 1.28 & \hfill 1.17 & \hfill 1.55 & \hfill 1.52 & \hfill 1.51 & \hfill 1.36 &\cr
%\+-wrt C1:&\hfill 0.66 & \hfill 0.73 & \hfill 0.85 & \hfill 0.86 & \hfill 1.03 & \hfill 1.12 & \hfill 1.00 & \hfill 1.00 &\cr
%}
%%%%%%%%%%%%%%%%%%%%%%%% Revised table, after binaries rebuilt %%%%%%%%%%%%%%%%%%%%%%%%%%%
\vbox{\settabs 7 \columns
\+ {\bf Table 3}. Times for MEDIUM DDT tests, \AIPS\ 15JUL91. \cr
\+ \cr
\hrule \vskip 1.5pt \hrule \vskip 2pt
\+Task & \hfill RS&6000/530 &\hfill RS&6000/540 & \hfill Con&vex C--1 &\cr
\+ & \hfill CPU & \hfill Wall & \hfill CPU & \hfill Wall & \hfill CPU & \hfill Wall &\cr
\vskip 3pt \hrule \vskip 3pt
\+UVSRT & \hfill 9.24 & \hfill 26 & \hfill 7.48 & \hfill 17 & \hfill 36.72 & \hfill 50 &\cr
\+UVDIF & \hfill 5.51 & \hfill 24 & \hfill 4.57 & \hfill 9 & \hfill 19.04 & \hfill 33 &\cr
\+CCMRG & \hfill 2.43 & \hfill 8 & \hfill 2.03 & \hfill 5 & \hfill 10.80 & \hfill 18 &\cr
\+SUBIM & \hfill 1.74 & \hfill 5 & \hfill 1.35 & \hfill 4 & \hfill 6.50 & \hfill 8 &\cr
\+COMB & \hfill 13.76 & \hfill 62 & \hfill 11.54 & \hfill 31 & \hfill 60.58 & \hfill 85 &\cr
\vskip 6pt
\+UVMAP & \hfill 10.03 & \hfill 26 & \hfill 8.17 & \hfill 20 & \hfill 31.16 & \hfill 40 &\cr
\+APCLN & \hfill 230.68 & \hfill 242 & \hfill 192.56 & \hfill 203 & \hfill 200.95 & \hfill 240 &\cr
\+APRES & \hfill 8.39 & \hfill 14 & \hfill 6.82 & \hfill 11 & \hfill 25.88 & \hfill 34 &\cr
\+ASCAL & \hfill 123.85 & \hfill 129 & \hfill 103.43 & \hfill 108 & \hfill 130.70 & \hfill 152 &\cr
\+MXMAP & \hfill 15.95 & \hfill 31 & \hfill 13.22 & \hfill 25 & \hfill 46.48 & \hfill 60 &\cr
\+MXCLN & \hfill 284.95 & \hfill 312 & \hfill 237.94 & \hfill 260 & \hfill 294.04 & \hfill 349 &\cr
\+VTESS & \hfill 36.64 & \hfill 54 & \hfill 30.26 & \hfill 42 & \hfill 108.14 & \hfill 135 &\cr
\vskip 6pt \hrule \vskip 3pt
\+Total:& \hfill 743.17 & \hfill 933 & \hfill 619.37 & \hfill 735 & \hfill 970.99 & \hfill 1204 &\cr
\vskip 3pt
\+Units of C1's:
& \hfill 1.31 & \hfill 1.29 & \hfill 1.57 & \hfill 1.64 & \hfill 1.00 & \hfill 1.00 &\cr
}\medskip
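The final row of Table 3 appears to be the Convex C--1 total divided by
the corresponding total for each system, so that values above 1.00
indicate a system faster than the C--1. This reading is inferred from
the tabulated totals rather than stated explicitly; for the CPU columns
it gives:

```tex
$$\eqalign{
\hbox{RS/6000 model 530:}\quad & 970.99 / 743.17 \approx 1.31 \cr
\hbox{RS/6000 model 540:}\quad & 970.99 / 619.37 \approx 1.57 \cr}$$
```

The wallclock columns follow in the same way, for example
$1204/933 \approx 1.29$ for the model 530.
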
From earlier results, one can arrive at a rough figure for the
relative performance of the IBM RS/6000 model 320. This system was
able to run the {\tt MXCLN} phase of the DDT in 368.12 seconds. Using
this result to scale the model 320 CPU time relative to the other IBM
RS/6000 models, we can infer a performance factor of roughly 1.01
relative to a Convex C--1 over all the tests in CPU time.
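
One way to reproduce this estimate is sketched here. It assumes that
the {\tt MXCLN} ratio between models carries over to every other task,
and it takes the model 540 column of Table 3 as the reference; the
choice of reference model is an assumption made for illustration.

```tex
$$\eqalign{
368.12 / 237.94 &\approx 1.547
  \qquad \hbox{(model 320 relative to model 540, {\tt MXCLN} CPU)} \cr
1.547 \times 619.37 &\approx 958
  \qquad \hbox{(estimated model 320 total CPU, seconds)} \cr
970.99 / 958 &\approx 1.01
  \qquad \hbox{(in units of the C--1)} \cr}$$
```

Scaling from the model 530 column instead
($368.12/284.95 \approx 1.292$) leads to the same figure of about 1.01.
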
The figures above clearly show that the IBM RS/6000 systems are
considerably faster than the Convex at the typically scalar tasks, but
the Convex of course excels in the tasks such as {\tt APCLN} and {\tt
MXCLN} that vectorize well on its architecture. Despite this, the
Convex can only beat the RS/6000 model 530 on one task ({\tt APCLN})
in CPU usage, and even there the wallclock time is almost equal.
\bigskip
\subtit{ OPTIMIZATION TIMING}
While running the {\tt 15JAN91} version of \AIPS\ on the RS6000/540
system, various levels of optimization were used while running the
large DDT test several times. The levels were:\medskip
\item{$\bullet$} No optimization at all, everything compiled with debug on
\item{$\bullet$} Only the Q routines optimized
\item{$\bullet$} Everything optimized
\medskip
\noindent The only other difference was that {\tt EDGSKP} was
accidentally set to 4 instead of 16 for the non-optimized (first)
case. It is very unlikely that this error would affect the numbers
presented below to any large extent. The conclusions would certainly
not be affected at all. The AP size used for these tests was 256 kwords.
\medskip
\vbox{\settabs 7 \columns
\+{\bf Table 4}. Times for Large DDT, 15JAN91, RS6000/540. \cr
\+ \cr
\+ Task & \hfill No Opti& mization & \hfill Q opt.& only & \hfill Everyth& ing optimized &\cr
\+ & \hfill CPU & \hfill Wall & \hfill CPU & \hfill Wall & \hfill CPU & \hfill Wall &\cr
\vskip 6pt \hrule \vskip 3pt
\+UVSRT & \hfill 46.12 & \hfill 80 & \hfill 45.38 & \hfill 70 & \hfill 19.27 & \hfill 76 &\cr
\+UVDIF & \hfill 25.81 & \hfill 37 & \hfill 22.81 & \hfill 36 & \hfill 22.76 & \hfill 23 & \cr
\+CCMRG & \hfill 5.50 & \hfill 9 & \hfill 5.34 & \hfill 9 & \hfill 5.37 & \hfill 9 & \cr
\+SUBIM & \hfill 4.53 & \hfill 9 & \hfill 2.73 & \hfill 9 & \hfill 2.80 & \hfill 9 & \cr
\+COMB & \hfill 37.35 & \hfill 113 & \hfill 23.66 & \hfill 113 & \hfill 23.37 & \hfill 114 & \cr
\vskip 6pt
\+UVMAP & \hfill 136.82 & \hfill 160 & \hfill 35.50 & \hfill 65 & \hfill 35.54 & \hfill 66 & \cr
\+APCLN & \hfill 2337.29 & \hfill 2361 & \hfill 1121.75 & \hfill 1145 & \hfill 1121.86 & \hfill 1145 & \cr
\+APRES & \hfill 144.24 & \hfill 154 & \hfill 32.50 & \hfill 43 & \hfill 32.51 & \hfill 43 & \cr
\+ASCAL & \hfill 3816.36 & \hfill 3849 & \hfill 1949.32 & \hfill 1965 & \hfill 1949.07 & \hfill 1965 & \cr
\+MXMAP & \hfill 202.52 & \hfill 238 & \hfill 53.05 & \hfill 81 & \hfill 52.99 & \hfill 76 & \cr
\+MXCLN & \hfill 3061.16 & \hfill 3143 & \hfill 1245.92 & \hfill 1301 & \hfill 1247.44 & \hfill 1307 & \cr
\+VTESS & \hfill 651.97 & \hfill 696 & \hfill 172.31 & \hfill 216 & \hfill 172.19 & \hfill 215 & \cr
\vskip 6pt \hrule \vskip 3pt
\+Total:&\hfill 10469.67 &\hfill 10849 & \hfill 4710.27 & \hfill 5053 & \hfill 4685.17 & \hfill 5048 & \cr
\+Ratio:& \hfill 1.00 & \hfill 1.00 & \hfill 0.45 & \hfill 0.47 & \hfill 0.45 & \hfill 0.47 & \cr
\+Speedup:&\hfill 1.00 & \hfill 1.00 & \hfill 2.22 & \hfill 2.15 & \hfill 2.23 & \hfill 2.15 & \cr}\medskip
\bigskip
\noindent As with the previous tests, the results of the two runs
of UVSRT, the two of UVDIF, and the eight of COMB are summed together.
The conclusions are clear. Optimizing the key set of ``Q'' routines
makes a major difference to both CPU and elapsed times for most of the
\AIPS\ tasks, reducing the totals by more than half. Applying the
optimization to the rest of the system makes little or no further
difference to either elapsed or CPU times.
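
The Ratio and Speedup rows of Table 4 are consistent with dividing each
optimized total by the unoptimized total, and the reverse; this is
inferred from the totals, not stated in the table. The lines below work
this through for CPU time with only the Q routines optimized, together
with two individual tasks chosen here purely as illustrations:

```tex
$$\eqalign{
\hbox{Ratio:}\quad & 4710.27 / 10469.67 \approx 0.45 \cr
\hbox{Speedup:}\quad & 10469.67 / 4710.27 \approx 2.22 \cr
\hbox{{\tt UVMAP} speedup:}\quad & 136.82 / 35.50 \approx 3.85 \cr
\hbox{{\tt CCMRG} speedup:}\quad & 5.50 / 5.34 \approx 1.03 \cr}$$
```

The spread between tasks such as {\tt UVMAP} and {\tt CCMRG} is why the
gain applies to most, rather than all, of the tasks.
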
\bigskip
\subtit{ EFFECTS OF NFS}
The following table shows the details of one DDT medium run, performed
on the rs530 with {\it everything\/} on an NFS-mounted disk: the
executable binaries, the data disks, and the scratch files. The
partition in question was on the rs540 disk. \bigskip
\vbox{\settabs 5 \columns
\+& {\bf Table 5}. Times for MEDIUM DDT tests, \AIPS\ 15JUL91. \cr
\+& IBM RS6000/530, data and binaries on NFS disk. \cr
\+ \cr
\+&\hrulefill & \hrulefill & \hrulefill & \cr
\+&Task & \hfill CPU & \hfill Wall & \cr
\+&\hrulefill & \hrulefill & \hrulefill & \cr
\+&UVSRT & \hfill 10.50 & \hfill 156 & \cr
\+&UVDIF & \hfill 6.25 & \hfill 24 & \cr
\+&CCMRG & \hfill 3.69 & \hfill 39 & \cr
\+&SUBIM & \hfill 2.56 & \hfill 22 & \cr
\+&COMB & \hfill 23.50 & \hfill 183 & \cr
\vskip 6pt
\+&UVMAP & \hfill 13.68 & \hfill 90 & \cr
\+&APCLN & \hfill 240.63 & \hfill 376 & \cr
\+&APRES & \hfill 11.62 & \hfill 63 & \cr
\+&ASCAL & \hfill 126.93 & \hfill 204 & \cr
\+&MXMAP & \hfill 21.70 & \hfill 130 & \cr
\+&MXCLN & \hfill 302.38 & \hfill 611 & \cr
\+&VTESS & \hfill 42.17 & \hfill 190 & \cr
\+&\hrulefill & \hrulefill & \hrulefill & \cr
\+&Total:& \hfill 805.61 & \hfill 2088 & \cr
\vskip 3pt
\+&Units of C1's:
& \hfill 1.21 & \hfill 0.58 & \cr
}\medskip
The results from this test indicate a slight degradation in CPU
time (805.61 seconds against 743.17 for the model 530 in Table 3, or
about 8\%); the origin of this effect is unclear. It is not
surprising that the wallclock time degrades significantly, more than
doubling from 933 to 2088 seconds; this of
course shows clearly that such a setup is severely network-bound. For
reference, this test was performed at an off-peak hour, with both
systems almost completely unloaded and very little network traffic.
\end