The main motivation for processing ALMA data in quasi-real time is to optimize the scientific efficiency of the array. The instrument will be dynamically scheduled (Section 4), so an evaluation of the data quality must be available very soon after data taking, using visibilities in quasi-real time, so that the array can switch projects if the current one is not matched to the actual observing conditions. Although information on the array behavior is readily available from monitoring atmospheric data (such as the water content and its fluctuations), more valuable information can be obtained by monitoring the atmospheric phase itself (using phase calibrators). Finally, it is important to determine quite soon whether a project's goals are being attained; a natural first step is to calibrate the data as completely as feasible and to evaluate the quality of that calibration. Because the instantaneous u,v coverage is reasonably good, pipeline data processing can include not only calibration but also imaging, using if possible the best solution to the inversion problem: diagnostic tools could eventually select between competing methods. Another motivation is to make the instrument more accessible to first-time users by producing images in a quasi-automated mode.
The required data quality level will be specified by the astronomers in their proposals, so that the output of the data pipeline can be used to decide when a project is completed. The criterion can be, for example, a certain rms noise level at a given spatial resolution, a dynamic range, the achieved angular resolution, or possibly the translation of these into more technical specifications such as rms phase uncertainties on calibrators, bandpass calibration accuracy, or tolerance on sidelobe levels in the synthesized beam.
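As an illustration of how such proposal-level criteria might be compared with pipeline output, here is a minimal sketch. All class, field and function names (`QualityGoals`, `ImageStats`, `unmet_goals`, `project_completed`) and the chosen units are assumptions made for this example; they are not part of any ALMA specification.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class QualityGoals:
    """Hypothetical per-project goals taken from the proposal (None = not requested)."""
    max_rms_noise_mjy: Optional[float] = None   # ceiling on image rms noise
    min_dynamic_range: Optional[float] = None   # floor on peak / rms
    max_beam_arcsec: Optional[float] = None     # required angular resolution


@dataclass
class ImageStats:
    """Quantities the pipeline is assumed to measure on the image obtained so far."""
    rms_noise_mjy: float
    peak_mjy: float
    beam_arcsec: float


def unmet_goals(goals: QualityGoals, stats: ImageStats) -> List[str]:
    """Return the names of the requested goals that are not yet reached."""
    unmet = []
    if goals.max_rms_noise_mjy is not None and stats.rms_noise_mjy > goals.max_rms_noise_mjy:
        unmet.append("rms_noise")
    if goals.min_dynamic_range is not None:
        dynamic_range = stats.peak_mjy / stats.rms_noise_mjy
        if dynamic_range < goals.min_dynamic_range:
            unmet.append("dynamic_range")
    if goals.max_beam_arcsec is not None and stats.beam_arcsec > goals.max_beam_arcsec:
        unmet.append("angular_resolution")
    return unmet


def project_completed(goals: QualityGoals, stats: ImageStats) -> bool:
    """A project counts as completed once no requested goal remains unmet."""
    return not unmet_goals(goals, stats)
```

Returning the list of unmet goals, rather than a bare yes/no, would let a scheduler see which criterion is still outstanding; that design choice is also an assumption of this sketch.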
The pipeline must be able to process systematically nearly all of the measurements obtained with the array in a fully automated procedure. Its output will constitute a data archive with rather homogeneous properties. However, the quality of these results will not necessarily be optimal: human intervention will often be required to improve it. The final, improved results should also be archived, but in a separate database of reduced data. Comparison between these two databases can be very useful to optimize the observing and data reduction procedures. Being often much smaller than the raw database, the two databases will be much more easily manageable and accessible through the Internet. This should maximize the use of ALMA observations, e.g. for the preparation of new projects by the proposing astronomers or, when the data become publicly accessible, for direct scientific use in an astronomical context different from that of the original proposal.
Pipeline data processing will also enhance the efficiency of
interactive observing, either by the astronomer if so requested
in the proposal, or by the staff during technical time. The data
pipeline will not only give the astronomer the possibility of
adjusting the observing strategy in quasi-real time according to the
results, but also of running projects more efficiently, in line with
their scientific objectives. High level
specifications could actually be given during Phase 2 of the
proposal submission procedure. To illustrate this, let us consider a
proposal with the following requirements: some wide region of the
sky must be imaged in the continuum in the 1.2 mm window; this wide
field imaging must reach a certain specified rms sensitivity; all
compact sources found above 5σ in that image have to be
imaged in pointed mode, one field per source, at higher frequency
down to again a certain sensitivity limit such that their spatial
morphology can be investigated. Such a high level of specifications
implies the need for high level measurement tools as part of the data
pipeline such as a source extractor to blindly find sources and their
positions; in this case the observing procedure will include several
observing modes (mosaicing, pointed observations, multi-frequency)
and it will be set dynamically, on the basis of results obtained by the
data pipeline during the sequence of observations.
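The wide-field example above can be sketched as a simple threshold search followed by the generation of follow-up requests. This is only an illustration under stated assumptions: the image is a plain 2-D list of pixel values, the rms is already known, a source is any strict local maximum above the threshold, and the names `find_compact_sources` and `plan_follow_up` are hypothetical. A real source extractor would be considerably more elaborate.

```python
from typing import Dict, List, Sequence, Tuple

Source = Tuple[int, int, float]  # (row, column, peak value)


def find_compact_sources(image: Sequence[Sequence[float]], rms: float,
                         n_sigma: float = 5.0) -> List[Source]:
    """Blindly find pixels above n_sigma * rms that are strict local maxima.

    Assumption of this sketch: a plateau of equal pixel values is not
    reported, because no pixel in it is strictly brighter than its neighbours.
    """
    threshold = n_sigma * rms
    n_rows, n_cols = len(image), len(image[0])
    sources = []
    for r in range(n_rows):
        for c in range(n_cols):
            value = image[r][c]
            if value <= threshold:
                continue
            # Compare with the (up to) eight surrounding pixels.
            neighbours = [
                image[rr][cc]
                for rr in range(max(r - 1, 0), min(r + 2, n_rows))
                for cc in range(max(c - 1, 0), min(c + 2, n_cols))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                sources.append((r, c, value))
    return sources


def plan_follow_up(sources: List[Source]) -> List[Dict[str, object]]:
    """Turn detections into pointed-mode requests, one field per source, brightest first."""
    ordered = sorted(sources, key=lambda s: s[2], reverse=True)
    return [{"mode": "pointed", "pixel": (r, c)} for r, c, _ in ordered]
```

In a dynamically set procedure, the list returned by `plan_follow_up` would be the input to the next observing step; how such a list is actually passed to the observing process is not specified here.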
The basic components in the array calibration are the baseline calibration and the pointing and focus determinations. Their results must be fed back as soon as possible so that the real time system can take them into account. The operator must have the flexibility to suspend a sequence of activities, such as those described in a data reduction procedure, and to resume from the state in which the activity was suspended (plus any manual modifications). The operator may change, for example, to another pointing source or phase calibrator. These actions may have effects on the ongoing observing procedure. It must be possible to modify the level of interaction at any time, from fully automated continuous processing, through prompts requesting inputs with sensible visible defaults, to prompts with no default (i.e. fully manual).
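The three levels of interaction described above could be modelled as follows. This is a minimal sketch: the enumeration `InteractionLevel`, the function `resolve_parameter` and the injected `ask` callable are hypothetical names introduced for illustration only.

```python
from enum import Enum
from typing import Callable


class InteractionLevel(Enum):
    AUTOMATED = "automated"                      # never prompt, always use the default
    PROMPT_WITH_DEFAULT = "prompt_with_default"  # prompt, showing a sensible default
    MANUAL = "manual"                            # prompt, no default offered


def resolve_parameter(name: str, default: str, level: InteractionLevel,
                      ask: Callable[[str], str]) -> str:
    """Return the value to use for one parameter at the given interaction level.

    `ask` is any function that shows a prompt and returns the operator's
    answer (for example the built-in `input`); it is injected so that the
    level can be changed at any time without touching the processing code.
    """
    if level is InteractionLevel.AUTOMATED:
        return default
    if level is InteractionLevel.PROMPT_WITH_DEFAULT:
        answer = ask(f"{name} [{default}]: ").strip()
        return answer if answer else default
    # Fully manual: an empty answer is not acceptable.
    answer = ask(f"{name}: ").strip()
    if not answer:
        raise ValueError(f"a value for {name!r} is required in manual mode")
    return answer
```

Because the level is passed on every call, an operator-facing layer could switch between modes in the middle of a procedure, for instance when changing to another phase calibrator.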
The basic components here are phase and amplitude calibrations and their interpolation between calibrator observations. Results (e.g. rms phases and seeing information) must be fed back both to the scheduler and to the observing processes. The pipeline must also be able to self-calibrate the data when possible.
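The interpolation step can be sketched as below. The assumptions are loud ones: a single complex gain per calibrator scan (for one baseline or antenna), non-zero gains, linear interpolation of amplitude and phase in time, and a phase change of less than 180 degrees between consecutive scans. The function names are hypothetical, and the scheme is a simple stand-in, not the method the pipeline is required to use.

```python
import cmath
import math
from bisect import bisect_right
from typing import Sequence


def interpolate_gain(times: Sequence[float], gains: Sequence[complex], t: float) -> complex:
    """Interpolate a complex gain at time t from calibrator solutions.

    `times` must be sorted; outside the covered range the nearest
    solution is held constant.
    """
    if t <= times[0]:
        return gains[0]
    if t >= times[-1]:
        return gains[-1]
    i = bisect_right(times, t) - 1
    g0, g1 = gains[i], gains[i + 1]
    frac = (t - times[i]) / (times[i + 1] - times[i])
    amplitude = abs(g0) + frac * (abs(g1) - abs(g0))
    # Shortest phase step between the two solutions (avoids 2*pi wraps).
    phase_step = cmath.phase(g1 / g0)
    return cmath.rect(amplitude, cmath.phase(g0) + frac * phase_step)


def calibrate(visibility: complex, t: float,
              times: Sequence[float], gains: Sequence[complex]) -> complex:
    """Correct one visibility by the gain interpolated at its time stamp."""
    return visibility / interpolate_gain(times, gains, t)


def rms_phase_deg(gains: Sequence[complex]) -> float:
    """Rms of the calibrator phases about their circular mean, in degrees."""
    mean_phase = cmath.phase(sum(g / abs(g) for g in gains))
    residuals = [cmath.phase(g * cmath.rect(1.0, -mean_phase)) for g in gains]
    return math.degrees(math.sqrt(sum(r * r for r in residuals) / len(residuals)))
```

A value such as the one returned by `rms_phase_deg` is the kind of quantity that could be reported to the scheduler as a measure of the current phase stability.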
For continuum projects the data pipeline must subtract the atmospheric contribution, in a way that depends of course on the actual observing mode. For line data it must subtract measurements obtained on an OFF position if needed and normalize by gains to scale the data into temperature units; it must also subtract spectral baselines. The pipeline must be able to grid the data for imaging. It must display the results at the various stages. If these total power measurements are obtained by a sub-array while the other antennas are used for the cross correlations on the same target, the calibrations should be able to proceed in parallel, so that both data sets are ready to be combined when the imaging stage begins.
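For the line case, the OFF subtraction, temperature scaling and baseline removal can be sketched as follows. The scaling used, T = Tsys (ON - OFF) / OFF, is the conventional position-switching relation and is an assumption of this example, as are the first-order baseline, the use of a single system temperature for all channels, and the function names.

```python
from typing import List, Sequence


def on_off_to_temperature(on: Sequence[float], off: Sequence[float],
                          tsys: float) -> List[float]:
    """Scale ON and OFF spectra (same channels, arbitrary units) into temperature units."""
    return [tsys * (a - b) / b for a, b in zip(on, off)]


def subtract_linear_baseline(spectrum: Sequence[float],
                             line_free: Sequence[int]) -> List[float]:
    """Fit a straight line to the line-free channels and subtract it everywhere.

    `line_free` lists the channel indices assumed to contain no line
    emission; at least two distinct channels are needed.
    """
    xs = list(line_free)
    ys = [spectrum[i] for i in xs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Ordinary least-squares slope and intercept.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [value - (intercept + slope * i) for i, value in enumerate(spectrum)]
```

A typical use would chain the two steps, calling `subtract_linear_baseline(on_off_to_temperature(on, off, tsys), line_free)` before the spectra are gridded for imaging.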
The pipeline must produce continuum and/or line images of the calibrated data obtained so far. It must be possible to visualize these images, interactively in the case of interactive observations. The pipeline should also be able to compare redundant data (obtained simultaneously or not) to better assess the data quality. It must be possible to feed these interactive measurements back to the scheduler or to the observing process, if relevant. The images should be deconvolved using the most appropriate algorithm; for complex images, for which there is no guarantee of a single optimum algorithm, it is desirable to allow several algorithms to compete. The imaging pipeline must be able to produce images that include zero and short spacings. In any case it must return information about the robustness of the results in those cases where a unique method is not available.
The pipeline interacts with a number of actors in the system. It also plays an important role in the sequence diagram for the array activity. The following actors interact with the data pipeline: