F.4 Storing data on tape

Images and uv databases are written to magnetic tape by FITTP for archival purposes and for transfer to other computers and sites. Three FITS-standard formats are available, controlled through the adverb FORMAT. The preferred format is 32-bit floating point (IEEE standard) format. There are no dynamic range limitations in this format and, on many modern computers, no bit manipulation is required since they use IEEE floating internally.

Of the two integer formats, there is little reason to use the 32-bit integer since it poses dynamic range, re-scaling, and other problems with no saving in space. The 16-bit integer format uses 16-bit signed 2’s complement integers to represent the data. Such numbers are limited to the range -32768 to 32767. FITTP has to find the maximum and minimum in the image and then scale the data to fit in this numeric range. For images of limited dynamic range, this format is perfectly adequate. In fact, FITAB offers the option to reduce the dynamic range even further with the QUANTIZE adverb. For images written to FITS disk files, this allows for better compression before the files are transmitted over the Internet. For high-dynamic range images, the 16-bit format may not be adequate. (The integer formats are no longer allowed for uv data. More than one user has reduced all his “good” spectral channels to pure 0 by scaling all the uv data to include one really horrendously bad sample.) A less important benefit of the floating point format is that the numbers representing your data are recorded exactly on tape as they are stored on disk; there are no “quantization errors”. This may be important for software development.
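The effect of 16-bit scaling can be illustrated with a small sketch. It assumes a plain linear mapping of the data minimum and maximum onto the signed 16-bit range, recorded in the style of the FITS BSCALE and BZERO keywords (value = BSCALE × integer + BZERO). The function names are hypothetical, and the exact scaling used inside FITTP may differ in detail.

```python
def scale_to_int16(values):
    """Map floats linearly onto the signed 16-bit range.

    Returns (ints, bscale, bzero) with value ~= bscale * int + bzero.
    This is an illustrative mapping, not FITTP's actual code.
    """
    lo, hi = min(values), max(values)
    # Spread the data range over the 65535 steps from -32768 to 32767.
    bscale = (hi - lo) / 65535.0 or 1.0  # fall back to 1.0 for constant data
    bzero = lo + 32768.0 * bscale
    ints = [max(-32768, min(32767, round((v - bzero) / bscale))) for v in values]
    return ints, bscale, bzero


def restore(ints, bscale, bzero):
    """Recover approximate floating values from the scaled integers."""
    return [bscale * i + bzero for i in ints]


# Data of limited dynamic range: the quantization error is tiny.
good = [0.010, -0.020, 0.015, 0.005]
ints, bscale, bzero = scale_to_int16(good)
restored = restore(ints, bscale, bzero)
print("step size:", bscale)
print("largest error:", max(abs(a - b) for a, b in zip(good, restored)))

# The same data plus one horrendously bad sample: the step size is now
# so large that every good value lands on the same integer.
bad = good + [1.0e9]
ints_bad, bscale_bad, bzero_bad = scale_to_int16(bad)
restored_bad = restore(ints_bad, bscale_bad, bzero_bad)
print("step size with bad sample:", bscale_bad)
print("good samples after scaling:", ints_bad[:4])
```

In the second case the four good samples become indistinguishable, which is the kind of loss described above for uv data scaled to include a single bad sample.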

The preceding paragraphs do not tell the full story, however. The portion of the FITS standard used by FITTP does not allow for uv data on tape in a compressed format. Instead, FITTP expands the data into the uncompressed form and then writes the data on tape. In the conversion, the real and imaginary values that were stored in one packed number are expanded into three real values — one each for real, imaginary and weight terms — and the weight and scale random parameters are removed since they are no longer required. Consequently, the compressed data are expanded to

$$\frac{(\#\,\text{random parameters} - 2) + [(\#\,\text{pol}) \times (\#\,\text{IFs}) \times (\#\,\text{frequencies}) \times 3]}{(\#\,\text{random parameters}) + [(\#\,\text{pol}) \times (\#\,\text{IFs}) \times (\#\,\text{frequencies})]}$$
times the original size (where # random parameters is the original number in the compressed database).

As an example, let us consider a multi-source spectral-line database stored on disk in compressed format. The data set has seven channels each at 2 IFs with 2 polarizations. There are nine random parameters and 834031 visibilities. From §F.1, we can calculate the size of the uv file to be 123 Mbytes. (Remember, this doesn’t include any of the extension files, some of which might be several Mbytes in size.) Before the file is written to tape in 32-bit floating format, it is first expanded by a factor of

$$\frac{(9 - 2) + [2 \times 2 \times 7 \times 3]}{9 + [2 \times 2 \times 7]} = \frac{91}{37} \approx 2.459.$$
Consequently, the data will occupy
$$123 \times 2.459\ \text{Mbytes} \approx 302\ \text{Mbytes}$$
on tape. In other words, this database and all the associated extension files will not fit on a standard, 6250 bpi tape even using BLOCKING = 10, but modern tape technology solves this problem.
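The arithmetic of this example can be checked with a short sketch. It assumes 4 bytes per stored value, both in the compressed disk file and in the 32-bit floating tape format; that assumption reproduces the 123 Mbytes quoted for the disk file. The function names are hypothetical and are not part of AIPS. With nine random parameters and 2 × 2 × 7 correlations, the formula gives 37 values per compressed visibility and 91 per expanded one.

```python
BYTES_PER_VALUE = 4  # assumed: 32-bit words on disk and on tape


def values_per_visibility(n_random, n_pol, n_if, n_chan, compressed):
    """Number of stored values in one visibility record."""
    n_corr = n_pol * n_if * n_chan
    if compressed:
        # One packed number per correlation; weight and scale are
        # carried as two of the random parameters.
        return n_random + n_corr
    # Real, imaginary and weight per correlation; the weight and scale
    # random parameters are dropped.
    return (n_random - 2) + 3 * n_corr


def expansion_factor(n_random, n_pol, n_if, n_chan):
    """Ratio of expanded size to compressed size."""
    packed = values_per_visibility(n_random, n_pol, n_if, n_chan, True)
    expanded = values_per_visibility(n_random, n_pol, n_if, n_chan, False)
    return expanded / packed


# Numbers from the example: 9 random parameters, 2 polarizations,
# 2 IFs, 7 channels, 834031 visibilities.
n_vis = 834031
packed_bytes = n_vis * values_per_visibility(9, 2, 2, 7, True) * BYTES_PER_VALUE
expanded_bytes = n_vis * values_per_visibility(9, 2, 2, 7, False) * BYTES_PER_VALUE
factor = expansion_factor(9, 2, 2, 7)

print(f"compressed on disk: {packed_bytes / 1e6:.1f} Mbytes")
print(f"expansion factor:   {factor:.3f}")
print(f"expanded on tape:   {expanded_bytes / 1e6:.1f} Mbytes")
```

Working from unrounded byte counts avoids compounding the rounding of the disk size to a whole number of Mbytes. Extension files are not included in either figure.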

Note that FITTP writes history file data into the FITS header and writes table extension files as extensions after the main image or data set within the same tape file. Plot (PL) and slice (SL) files are not saved to tape.

Task FITAB uses “binary tables” to represent visibility data rather than the old, mildly deprecated “random groups” form of the FITS format. This has several advantages for uv data. It allows compressed data to be recorded in that form exactly (except for byte-order questions which should not concern the reader). It also allows the data and attached tables to be divided into pieces, which reduces the size of the files to be copied over the Internet and makes copying and tape storage somewhat more reliable. The principal disadvantage of FITAB uv data output is that only two packages can read it so far. The one “outside” package that reads the format is Obit, available from Bill Cotton at NRAO Charlottesville.

F.4.1 DAT and Exabyte tapes

The arrival of modern tape technologies has hastened the demise of 9-track tapes. First Exabyte (8mm) and then DAT (4mm) have provided much higher storage capacities than the 9-track tapes and have also provided faster seeks between file marks and greater data reliability. The new technologies are very much cheaper as well, in part because they have been adopted by the PC market. They are both technically quite complex internally. The DAT tape has a “system log” area at the beginning which allows for the fast seeks. It is a bit fragile, however, since it is updated when the tape is unloaded and hence can be incorrect if there is an unfortunate power failure. Both technologies are still evolving and both now offer various data encoding/compression options. Unfortunately, the data compression techniques vary considerably with tape model and manufacturer and hence should not be used to archive or transport data. The data are blocked on the tapes by means known only to the manufacturers and are not significantly under user control. It is still probably good to use a large BLOCKING, but only for I/O transfer reasons. The EOF marks can be expensive on these tape devices.

Exabytes at low density have a capacity of about 2.2 Gbytes on a 112m tape and use about 1 Mbyte (or maybe even 4 Mbytes) for each EOF mark. The large size of the EOF limits the number of files you can write rather significantly. The EOFs are also slow to process mechanically. Exabytes at high density have a capacity of 4.5 Gbytes on a 112m tape and use 48 Kbytes per EOF mark. DATs have a capacity of 2.0 Gbytes on a 90m tape, but also come in 60m and 120m sizes. The EOF mark size is not readily available, but is probably no more than 48 Kbytes. The early warning of the end-of-medium is 40 Mbytes before the actual end of tape.
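To see how much the EOF mark size matters, the following sketch estimates how many files of a given size fit on a tape when each file is followed by one EOF mark. The capacities and EOF sizes are those quoted above; the 500 Kbyte file size is a hypothetical example, decimal units (1 Mbyte = 10^6 bytes) are assumed, and the estimate ignores the end-of-medium early warning and any blocking overhead internal to the drive.

```python
KBYTE = 1_000          # decimal units are an assumption
MBYTE = 1_000 * KBYTE


def files_that_fit(capacity_bytes, file_bytes, eof_bytes):
    """Whole files that fit when each file is followed by one EOF mark."""
    return capacity_bytes // (file_bytes + eof_bytes)


def eof_overhead(file_bytes, eof_bytes):
    """Fraction of the used tape taken up by EOF marks."""
    return eof_bytes / (file_bytes + eof_bytes)


FILE_SIZE = 500 * KBYTE  # hypothetical small image file

# (capacity, EOF mark size) for the drives described in the text
drives = {
    "Exabyte low density, 1 Mbyte EOF": (2_200 * MBYTE, 1 * MBYTE),
    "Exabyte low density, 4 Mbyte EOF": (2_200 * MBYTE, 4 * MBYTE),
    "Exabyte high density, 48 Kbyte EOF": (4_500 * MBYTE, 48 * KBYTE),
}

for name, (capacity, eof) in drives.items():
    n_files = files_that_fit(capacity, FILE_SIZE, eof)
    share = eof_overhead(FILE_SIZE, eof)
    print(f"{name}: {n_files} files, {share:.0%} of tape used by EOF marks")
```

For small files the low-density EOF marks consume more tape than the data themselves, so combining many small images into fewer, larger tape files makes much better use of such a tape.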