%% This is file `elsarticle-template-2-harv.tex',
%%
%% Copyright 2009 Elsevier Ltd
%%
%% This file is part of the 'Elsarticle Bundle'.
%% ---------------------------------------------
%%
%% It may be distributed under the conditions of the LaTeX Project Public
%% License, either version 1.2 of this license or (at your option) any
%% later version. The latest version of this license is in
%% http://www.latex-project.org/lppl.txt
%% and version 1.2 or later is part of all distributions of LaTeX
%% version 1999/12/01 or later.
%%
%% The list of all files belonging to the 'Elsarticle Bundle' is
%% given in the file `manifest.txt'.
%%
%% Template article for Elsevier's document class `elsarticle'
%% with harvard style bibliographic references
%%
%% $Id: elsarticle-template-2-harv.tex 155 2009-10-08 05:35:05Z rishi $
%% $URL: http://lenova.river-valley.com/svn/elsbst/trunk/elsarticle-template-2-harv.tex $
%%
%%\documentclass[preprint,authoryear,12pt]{elsarticle}
%% Use the option review to obtain double line spacing
%% \documentclass[authoryear,preprint,review,12pt]{elsarticle}
%% Use the options 1p,twocolumn; 3p; 3p,twocolumn; 5p; or 5p,twocolumn
%% for a journal layout:
%% Astronomy & Computing uses 5p
%% \documentclass[final,authoryear,5p,times]{elsarticle}
\documentclass[final,authoryear,5p,times,twocolumn]{elsarticle}
%% if you use PostScript figures in your article
%% use the graphics package for simple commands
%% \usepackage{graphics}
%% or use the graphicx package for more complicated commands
\usepackage{graphicx}
%% or use the epsfig package if you prefer to use the old commands
%% \usepackage{epsfig}
%% The amssymb package provides various useful mathematical symbols
\usepackage{amssymb}
%% The amsthm package provides extended theorem environments
%% \usepackage{amsthm}
\usepackage[pdftex,pdfpagemode={UseOutlines},bookmarks,bookmarksopen,colorlinks,linkcolor={blue},citecolor={green},urlcolor={red}]{hyperref}
\usepackage{hypernat}
%% Alternatives to hyperref for testing
%\usepackage{url}
%\newcommand{\htmladdnormallinkfoot}[2]{#1\footnote{\texttt{#2}}}
%\newcommand{\htmladdnormallink}[1]{\texttt{#1}}
%\newcommand{\href}[2]{\texttt{#2}}
%% The lineno packages adds line numbers. Start line numbering with
%% \begin{linenumbers}, end it with \end{linenumbers}. Or switch it on
%% for the whole article with \linenumbers after \end{frontmatter}.
%% \usepackage{lineno}
%% natbib.sty is loaded by default. However, natbib options can be
%% provided with \biboptions{...} command. Following options are
%% valid:
%% round - round parentheses are used (default)
%% square - square brackets are used [option]
%% curly - curly braces are used {option}
%% angle - angle brackets are used <option>
%% semicolon - multiple citations separated by semi-colon (default)
%% colon - same as semicolon, an earlier confusion
%% comma - separated by comma
%% authoryear - selects author-year citations (default)
%% numbers- selects numerical citations
%% super - numerical citations as superscripts
%% sort - sorts multiple citations according to order in ref. list
%% sort&compress - like sort, but also compresses numerical citations
%% compress - compresses without sorting
%% longnamesfirst - makes first citation full author list
%%
%% \biboptions{longnamesfirst,comma}
% \biboptions{}
\journal{Astronomy \& Computing}
%% Make single quotes look right in verbatim mode
\usepackage{upquote}
\usepackage{upgreek}
\usepackage{color}
\usepackage{listings}
\definecolor{mygreen}{rgb}{0,0.6,0}
\definecolor{mygray}{rgb}{0.5,0.5,0.5}
\definecolor{mymauve}{rgb}{0.58,0,0.82}
\lstset{ %
backgroundcolor=\color{white}, % choose the background color
basicstyle=\footnotesize\ttfamily, % size of fonts used for the code
breaklines=true, % automatic line breaking only at whitespace
captionpos=b, % sets the caption-position to bottom
commentstyle=\color{mygreen}, % comment style
escapeinside={\%*}{*)}, % if you want to add LaTeX within your code
keywordstyle=\color{blue}, % keyword style
stringstyle=\color{mymauve}, % string literal style
}
% Aim to be consistent, and correct, about how we refer to sections
\newcommand*\secref[1]{Sect.~\ref{#1}}
\newcommand*\appref[1]{\ref{#1}}
\begin{document}
\begin{frontmatter}
%% Title, authors and addresses
%% use the tnoteref command within \title for footnotes;
%% use the tnotetext command for the associated footnote;
%% use the fnref command within \author or \address for footnotes;
%% use the fntext command for the associated footnote;
%% use the corref command within \author for corresponding author footnotes;
%% use the cortext command for the associated footnote;
%% use the ead command for the email address,
%% and the form \ead[url] for the home page:
%%
%% \title{Title\tnoteref{label1}}
%% \tnotetext[label1]{}
%% \author{Name\corref{cor1}\fnref{label2}}
%% \ead{email address}
%% \ead[url]{home page}
%% \fntext[label2]{}
%% \cortext[cor1]{}
%% \address{Address\fnref{label3}}
%% \fntext[label3]{}
\title{ORAC-DR: A generic data reduction pipeline infrastructure}
%% use optional labels to link authors explicitly to addresses:
%% \author[label1,label2]{<author name>}
%% \address[label1]{<address>}
%% \address[label2]{<address>}
\author[jac]{Tim Jenness\corref{cor1}\fnref{timj}}
\ead{[email protected]}
\author[jac]{Frossie Economou\fnref{fe}}
\cortext[cor1]{Corresponding author}
\fntext[timj]{Present address: Department of Astronomy, Cornell University, Ithaca,
NY 14853, USA}
\fntext[fe]{Present address: LSST Project Office, 933 N.\ Cherry Ave, Tucson, AZ 85721, USA}
\address[jac]{Joint Astronomy Centre, 660 N.\ A`oh\=ok\=u Place, Hilo, HI
96720, USA}
\begin{abstract}
%% Text of abstract
ORAC-DR is a general purpose data reduction pipeline system designed
to be instrument and observatory agnostic. The pipeline works with
instruments as varied as infrared integral field units, imaging
arrays and spectrographs, and sub-millimeter heterodyne arrays \&
continuum cameras. This paper describes the architecture of the
pipeline system and the implementation of the core
infrastructure. We finish by discussing the lessons learned since
the initial deployment of the pipeline system in the late 1990s.
\end{abstract}
\begin{keyword}
%% keywords here, in the form: keyword \sep keyword
%% MSC codes here, in the form: \MSC code \sep code
%% or \MSC[2008] code \sep code (2000 is the default)
data reduction pipelines \sep techniques: miscellaneous \sep methods:
data analysis
\end{keyword}
\end{frontmatter}
% \linenumbers
%% Journal abbreviations
\newcommand{\mnras}{MNRAS}
\newcommand{\aap}{A\&A}
\newcommand{\aaps}{A\&AS}
\newcommand{\pasp}{PASP}
\newcommand{\apj}{ApJ}
\newcommand{\apjs}{ApJS}
\newcommand{\qjras}{QJRAS}
\newcommand{\an}{Astron.\ Nach.}
\newcommand{\ijimw}{Int.\ J.\ Infrared \& Millimeter Waves}
\newcommand{\procspie}{Proc.\ SPIE}
\newcommand{\aspconf}{ASP Conf. Ser.}
%% Applications
%% Misc
\newcommand{\recipe}{\emph{Recipe}}
\newcommand{\recipes}{\emph{Recipes}}
\newcommand{\primitive}{\emph{Primitive}}
\newcommand{\primitives}{\emph{Primitives}}
\newcommand{\Frame}{\emph{Frame}}
\newcommand{\Group}{\emph{Group}}
\newcommand{\Index}{\emph{index}}
\newcommand{\oracdr}{\textsc{orac-dr}}
\newcommand{\cgsdr}{\textsc{cgs}{\footnotesize 4}\textsc{dr}}
%% Links
\newcommand{\ascl}[1]{\href{http://www.ascl.net/#1}{ascl:#1}}
%% main text
\section{Introduction}
In the early 1990s each instrument delivered to the United Kingdom
Infrared Telescope (UKIRT) and the James Clerk Maxwell Telescope (JCMT) came
with its own distinct data reduction system that reused very little
code from previous instruments. In part this was due to the rapid
change in hardware and software technologies during the period, but it
was also driven by the instrument projects being delivered
by independent project teams with no standardisation requirements
being imposed by the observatory. The observatories were required to
support the delivered code and, as operations budgets shrank, the need
to use a single infrastructure became more apparent.
\cgsdr\
\citep[][\ascl{1406.013}]{1992ASPC...25..479S,1996ASPC...87..223D} was
the archetypal instrument-specific on-line data reduction system at
UKIRT. The move from VMS to UNIX in the acquisition environment coupled
with plans for rapid instrument development of UFTI
\citep{2003SPIE.4841..901R}, MICHELLE \citep{1993ASPC...41..401G} and
UIST \citep{2004SPIE.5492.1160R}, led to a decision to revamp the
pipeline infrastructure at UKIRT \citep{1998ASPC..145..196E}. In the
same time period the SCUBA instrument \citep{1999MNRAS.303..659H} was
being delivered to the JCMT. SCUBA had an on-line data reduction
system developed on VMS that was difficult to modify and ultimately
was capable solely of simple quick-look functionality. There was no explicit
data reduction pipeline and this provided the opportunity to develop a
truly instrument agnostic pipeline capable of supporting different
imaging modes and wavelength regimes.
The Observatory Reduction and Acquisition Control Data Reduction pipeline
\citep[\oracdr;][\ascl{1310.001}]{1999ASPC..172...11E,2008AN....329..295C} was
the resulting system. In the sections that follow we present an
overview of the architectural design and then describe the pipeline
implementation. We finish by detailing lessons learned during the
lifetime of the project.
\section{Architecture}
The general architecture of the \oracdr\ system has been described
elsewhere \citep{1999ASPC..172...11E,2008AN....329..295C}. To
summarize, the system is split into discrete units with well-defined
interfaces. The recipes define the processing steps that are required
using abstract language and no obvious software code. These recipes
are expanded into executable code by a parser and this code is
executed with the current state of the input data file objects and
calibration system. The recipes call out to external ``algorithm
engines'' using a standardized calling interface and it is these
engines that contain the detailed knowledge of how to process pixel
data. In all the currently supported instruments the algorithm
engines are from the Starlink software collection
\citep[][\ascl{1110.012}]{2014ASPC..485..391C} and use the ADAM
messaging system \citep{1992ASPC...25..126A}, but this is not
required by the \oracdr\ design.
A key part of the architecture is that the pipeline can function
entirely in a data-driven manner. All information required to reduce
the data correctly must be available in the metadata of the input data
files. This requires a systems engineering approach to observatory
operations where the metadata are treated as equals to the science
pixel data \citep[see e.g.,][for an overview of the JCMT and UKIRT
approach]{2011tfa..confE..42J} and all observing modes are designed
with observation preparation and data reduction in mind.
\section{Implementation}
In this section we discuss the core components of the pipeline
infrastructure. The algorithms themselves are pluggable parts of the
architecture and are not considered further. The only requirement is
that the algorithm code must be callable either directly from
Perl or over a messaging interface supported by Perl.
\subsection{Data Detection}
The first step in reducing data is determining which data should be
processed. \oracdr\ separates data detection from pipeline processing,
allowing for a number of different schemes for locating files. In
on-line mode the pipeline is set up to assume an incremental delivery
of data throughout the period the pipeline is running. Here we
describe the most commonly-used options.
\subsubsection{Flag files}
The initial default scheme was to check whether a new file with the
expected naming convention had appeared on disk. Whilst this can work
if the appearance of the data file is instantaneous (for example, it
is written to a temporary location and then renamed), it is all too
easy to attempt to read a file that is being written to. Modifying
legacy acquisition systems to do atomic file renames proved to be
difficult and instead a ``flag'' file system was used.
A flag file was historically a zero-length file created as soon as the
observation was completed and the raw data file was closed. The
pipeline would look for the appearance of the flag file (it would be
able to use a heuristic to know the name of the file in advance and
also look a few files ahead in case the acquisition system had crashed) and
use that to trigger processing of the primary data file.
As more complex instruments arrived capable of writing multiple files
for a single observation (either in parallel or sequentially)
\citep{2009MNRAS.399.1026B,2013MNRAS.430.2513H} the flag system was
modified so that the pipeline monitors a single flag file, with the
names of the relevant data files stored inside it (one file per
line). For the instruments writing files sequentially the pipeline
is able to determine the new files that have been added to the file
since the previous check.
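
The bookkeeping involved can be illustrated with a short sketch: the
pipeline remembers how many entries of each flag file it has already
processed and, on the next poll, returns only the entries added since
then (the subroutine and persistent counter below are hypothetical;
the real pipeline wraps this logic in its data-detection classes):
\begin{lstlisting}[language=perl]
use strict;
use warnings;

# Remember how many lines of each flag file have already been handled
# so that only newly-appended data files are returned on each poll.
# (Illustrative sketch only; subroutine name is hypothetical.)
my %seen;

sub new_files_from_flag {
  my ($flagfile) = @_;
  return () unless -e $flagfile;
  open my $fh, '<', $flagfile
    or die "Cannot read $flagfile: $!";
  chomp( my @entries = <$fh> );
  close $fh;
  my $already = $seen{$flagfile} // 0;
  $seen{$flagfile} = scalar @entries;
  return @entries[ $already .. $#entries ];
}

# Typical poll loop:
#   my @new = new_files_from_flag( $flagfile );
#   process_frame( $_ ) for @new;
\end{lstlisting}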
Historically, synchronization delays over NFS mounts caused
difficulties: the flag file would appear before the actual data file
was visible to the NFS client computer. On modern systems this
behavior no longer occurs.
\subsubsection{Parameter monitoring}
The SCUBA-2 quick look pipeline \citep{2005ASPC..347..585G} had a
requirement to be able to detect files taken at a rate of
approximately 1\,Hz for stare observations. This was impractical using
a single-threaded data detection system embedded in the pipeline process and
using the file system. Therefore, for SCUBA-2 quick-look processing the
pipeline uses a separate process that continually monitors the
four data acquisition computers using the DRAMA messaging system
\citep{1995SPIE.2479...62B}. When all four sub-arrays indicate that a
matched data set is available the monitored data are written to disk
and a flag file created. Since these data are ephemeral there is a
slight change to flag file behavior in that the pipeline will take
ownership of data it finds by renaming the flag file. If that happens
the pipeline will be responsible for cleaning up; whereas if the
pipeline does not handle the data before the next quick look image
arrives the gathering process will remove the flag file and delete the
data before making the new data available.
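
The ownership handshake reduces to an atomic rename of the flag file;
a minimal sketch (the subroutine and file naming are purely
illustrative) is:
\begin{lstlisting}[language=perl]
use strict;
use warnings;

# Claim a quick-look data set by renaming its flag file. rename() is
# atomic within a filesystem, so only one process can succeed; the
# winner then owns (and must later delete) the associated data files.
sub claim_quicklook {
  my ($flagfile) = @_;
  my $claimed = "$flagfile.claimed.$$";
  return rename( $flagfile, $claimed ) ? $claimed : undef;
}
\end{lstlisting}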
\subsection{File format conversion}
Once files have been found they are first sent to the format
conversion library. The instrument infrastructure defines what the
external format of each file is expected to be and also the internal format
expected by the reduction system. The format conversion system knows
how to convert the files to the necessary form. This does not always
involve a change in low level format (such as FITS to NDF) but can
handle changes to instrument acquisition systems such as converting
HDS files spread across header and exposure files into a single HDS
container matching the modern UKIRT layout.
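
A minimal sketch of this kind of dispatch is shown below (assuming a
configured Starlink environment with the \textsc{convert} package's
\texttt{fits2ndf} command available; the real conversion library
handles many more cases, including the HDS re-packing just described):
\begin{lstlisting}[language=perl]
use strict;
use warnings;

# Convert a raw file to the internal format expected by the reduction
# system. A no-op for files that are already NDFs; FITS files are
# converted with the Starlink CONVERT fits2ndf task. Sketch only.
sub to_internal_format {
  my ($file) = @_;
  return $file if $file =~ /\.sdf$/i;   # already an NDF
  if ( $file =~ /^(.+)\.fits?$/i ) {
    my $ndf = "$1.sdf";
    system( "fits2ndf", "in=$file", "out=$ndf" ) == 0
      or die "fits2ndf failed for $file";
    return $ndf;
  }
  die "Do not know how to convert $file";
}
\end{lstlisting}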
\subsection{Recipe Parser}
A \recipe\ is the top-level view of the data processing steps
required to reduce some data. The requirements were that the recipe
should be easily editable by an instrument scientist without having to
understand the code, the \recipe\ should be easily understandable by
using plain language, and it should be possible to reorganize steps
easily. Furthermore, there was a need to allow \recipes\ to be edited
``on the fly'' without having to restart the pipeline. The next data file
to be picked up would be processed using the modified version of the
\recipe; this is very important during instrument commissioning. An
example, simplified, imaging \recipe\ is shown in Fig.\
\ref{fig:recipe}. Each of these steps can be given parameters to
modify their behavior. The expectation was that these \recipes\ would
be loadable into a Recipe Editor GUI tool, although such a tool was
never implemented.
\begin{figure}
{
\small
\begin{verbatim}
_SUBTRACT_DARK_
_DIVIDE_BY_FLAT_
_BIAS_CORRECT_GROUP_
_APPLY_DISTORTION_TRANSFORMATION_
_GENERATE_OFFSETS_JITTER_
_MAKE_MOSAIC_ FILLBAD=1 RESAMPLE=1
\end{verbatim}
}
\caption{A simplified imaging \recipe. Note that the individual steps
make sense scientifically and it is clear how to change the order or
remove steps. The \texttt{\_MAKE\_MOSAIC\_} step includes override
parameters.}
\label{fig:recipe}
\end{figure}
Each of the steps in a \recipe\ is known as a
\primitive. The \primitives\ contain the Perl source code and can
themselves call other \primitives\ if required.
The parser's core job is to read the \recipe\ and replace the mentions of \primitives\
with subroutine calls to the source code for each \primitive. For each
\primitive\ the parser keeps a cache containing the compiled form of
the \primitive\ as a code reference, the modification time associated
with the \primitive\ source file when it was last read, and the full
text of the \primitive\ for debugging purposes. Whenever a \primitive\
code reference is about to be executed the modification time is
checked to decide whether the \primitive\ needs to be re-read.
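
A simplified sketch of this caching scheme is shown below (the hash
layout and subroutine name are illustrative; the real parser also
wraps the \primitive\ source with the entry and exit code described
below before compiling it):
\begin{lstlisting}[language=perl]
use strict;
use warnings;

# Cache of compiled primitives keyed by name: the code reference, the
# modification time of the source when it was compiled, and the text.
my %primcache;

sub compiled_primitive {
  my ( $name, $path ) = @_;
  my @st = stat $path
    or die "Cannot stat primitive source $path: $!";
  my $mtime = $st[9];

  my $entry = $primcache{$name};
  if ( !$entry || $entry->{mtime} < $mtime ) {
    open my $fh, '<', $path or die "Cannot read $path: $!";
    my $text = do { local $/; <$fh> };
    close $fh;

    # The real parser inserts the header and footer code first.
    my $code = eval "sub { $text }";
    die "Error compiling $name: $@" if $@;

    $entry = { code => $code, mtime => $mtime, text => $text };
    $primcache{$name} = $entry;
  }
  return $entry->{code};
}
\end{lstlisting}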
\begin{figure*}[t]
\begin{lstlisting}[language=perl,numbers=left]
sub {
my $_PRIM_DEPTH_ = shift;
my $_PRIM_CALLERS_ = shift;
$_PRIM_DEPTH_++;
die "Primitive depth very high ($_PRIM_DEPTH_). Possible recursive primitive"
if $_PRIM_DEPTH_ > 10;
ORAC::Recipe::Execution->current_primitive( "_TEST_PRIMITIVE_", $_PRIM_CALLERS_);
my $Frm = shift;
my $Grp = shift;
my $Cal = shift;
my $Display = shift;
my $Mon = shift;
my $ORAC_Recipe_Info = shift;
my $ORAC_PRIMITIVE = "_TEST_PRIMITIVE_";
my $DEBUG = 0;
my %_TEST_PRIMITIVE_ = @_;
my $_PRIM_ARGS_ = \%_TEST_PRIMITIVE_;
my $_PRIM_ARGS_STRING_ = ORAC::General::convert_args_to_string( %$_PRIM_ARGS_ );
orac_loginfo( 'Primitive Arguments' => $_PRIM_ARGS_STRING_ );
my $_PRIM_EPOCH_ = &Time::HiRes::gettimeofday();
orac_logkey("_TEST_PRIMITIVE_");
$Frm->set_app_name(Primitive=>"_TEST_PRIMITIVE_") if $Frm->can("set_app_name");
my %RECPARS = %{$ORAC_Recipe_Info->{Parameters}};
my %Mon;
if (tied %$Mon) {
my $obj = tied %$Mon;
tie %Mon, ref($obj), $obj; # re-tie
} else {
%Mon = %$Mon;
}
#line 1 _TEST_PRIMITIVE_
...
\end{lstlisting}
\caption{Code added to the start of each \primitive. In this case
the \primitive\ is called \texttt{\_TEST\_PRIMITIVE\_}. In
debugging mode additional lines would be present. This code has
been generated with the 2014A version of \oracdr. Line 1 indicates
that this is an anonymous subroutine code reference and not a
named subroutine. Lines 2 to 13 read the (internal) subroutine
arguments and lines 16 to 19 process the user-supplied primitive
arguments. Lines 19 to 22 set up timing, logging and GUI updates
and line 23 processes \recipe\ parameters. Lines 24 to 30 deal with
the algorithm engines and finally line 31 resets the line counter
so that error messages from user-supplied \primitive\ code report
the line number as expected by the programmer. The user-supplied
\primitive\ code would be inserted from line 32 onwards and then
be followed by the code inserted to complete the primitive.}
\label{fig:primhead}
\end{figure*}
The parser is also responsible for adding additional code at the
start of the \primitive\ to allow it to integrate into the general
pipeline infrastructure. This code includes:
\begin{itemize}
\item Handling of state objects that are passed through the subroutine
argument stack and parsing of parameters passed to the \primitive\
by the caller. These arguments are designed not to be language-specific;
they use a simple \texttt{KEYWORD=VALUE} syntax and cannot be handled
directly by the Perl interpreter.
\item Trapping for primitive call recursion.
\item Debugging information
such as timers to allow profile information to be
collected, and entry and exit log messages to indicate exactly when
a routine is in use.
\item Callbacks to GUI code to indicate which \primitive\ is
currently active.
\end{itemize}
Fig.\ \ref{fig:primhead} shows an example of the subroutine entry code generated automatically by the
parser. The design is such that adding new code to the entry and exit of each
\primitive\ can be done in a few lines with little overhead.
Calling algorithm engines is a very common occurrence and is also where
most of the time is spent during \recipe\ execution. In order to
minimize repetitive coding for error conditions and to allow for profiling, calls to
algorithm engines are surrounded by code to automatically handle these
conditions. For example:
\begin{lstlisting}[language=perl]
$Mon{monolith}->obeyw("action","arg1=a");
\end{lstlisting}
is expanded significantly and becomes something akin to:
\begin{lstlisting}[language=perl]
my $OBEYW_STATUS=$Mon{monolith}->obeyw("action",
"arg1=a");
if ($OBEYW_STATUS != ORAC__OK) {
orac_err ("Error in obeyw: $OBEYW_STATUS\n");
my $obeyw_args = "arg1=a arg2=b";
orac_print("Arguments were: ","blue");
orac_print("$obeyw_args\n\n","red");
if ($OBEYW_STATUS == ORAC__BADENG) {
orac_err("Monolith monolith has died.");
delete $Mon{monolith};
}
return $OBEYW_STATUS;
}
#line 5 _TEST_PRIMITIVE_
\end{lstlisting}
where much code has been left out of this example for
clarity\footnote{The \texttt{oracdr\_parse\_recipe} command can be run
to provide a complete translation of a \recipe.} and in particular the
logging and profiling code are missing. As for the previous example,
the line counter is reset when the added code ends.
\subsection{Recipe Parameters}
The general behavior of a recipe can be controlled by editing it and
adjusting the parameters passed to the \primitives. A much more
flexible scheme is available which allows the person running the
pipeline to specify a \recipe\ configuration file that can be used to
control the behavior of \recipe\ selection and how a \recipe\ behaves.
The configuration file is a text file written in the INI
format. Although it is possible for the \recipe\ to be specified on
the command line, that \recipe\ would then be used for all the files being
reduced in the same batch, and this is not an efficient way to
permanently change the \recipe\ name. Changing the file header is not
always possible so the configuration file can be written to allow
per-object selection of \recipes. For example,
\begin{quote}
\begin{verbatim}
[RECIPES_SCIENCE]
OBJECT1=REDUCE_SCIENCE
OBJECT2=REDUCE_FAINT_SOURCE
A.*=BRIGHT_COMPACT
\end{verbatim}
\end{quote}
would select \texttt{REDUCE\_SCIENCE} whenever a \emph{science}
observation of OBJECT1 is encountered but choose
\texttt{REDUCE\_FAINT\_SOURCE} whenever OBJECT2 is found. The third
line is an example of a regular expression that can be used to select
recipes based on a more general pattern match of the object name. This relies
on header translation functioning to find the observation type and
object name correctly. This sort of configuration is quite common when the
Observing Tool has not been set up to switch recipes.
Once a \recipe\ has been selected it can be configured as simple
key-value pairs:
\begin{quote}
\begin{verbatim}
[REDUCE_SCIENCE]
PARAM1 = value1
PARAM2 = value2
[REDUCE_SCIENCE:A.*]
PARAM1 = value3
\end{verbatim}
\end{quote}
and here, again, the parameters selected can be controlled by a
regular expression on the object name. The final set of parameters are
made available to the primitives in a hash (see
Fig.~\ref{fig:primhead} line 23).
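
Within a \primitive\ the selected parameters are then available as
ordinary hash entries in \texttt{\%RECPARS}; a short sketch of typical
usage (the parameter name and default are invented for illustration)
is:
\begin{lstlisting}[language=perl]
# Inside a primitive: recipe parameters arrive in the %RECPARS hash
# set up by the parser. The parameter name here is illustrative.
my $thresh = exists $RECPARS{PARAM1} ? $RECPARS{PARAM1} : 5.0;
orac_print "Using clipping threshold of $thresh sigma\n";
\end{lstlisting}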
\subsection{Recipe Execution}
\label{sec:exec}
Once a set of files has been found the header is read to determine
how the data should be reduced. Files from the same observation are
read into what is known as a \Frame\ object. This object contains all
the metadata and pipeline context and, given that the current
algorithm engines require files to be written, the name of the
currently active intermediate file (or files for observations that
either consist of multiple files or which generate multiple
intermediate files). In some cases, such as for ACSIS, a single
observation can generate multiple files that are independent and in
these cases multiple \Frame\ objects are created and they are
processed independently. There is also a \Group\ object which
contains the collection of \Frame\ objects that the pipeline should
combine. The hierarchy for the \Frame\ class is shown in
Fig.~\ref{fig:frameclass}.
\begin{figure*}
\includegraphics[width=\textwidth]{frame-class-hierarchy}
\caption{Hierarchy for the \Frame\ class. The location of the ESO
classes reflects an historical anomaly associated with the early
development of the ESO instrument testing. LCO and Gemini
instruments also inherit from UKIRT, indicating that the UKIRT
functionality is not as UKIRT-centric as the name would suggest. The
second row contains file format specific routines. Most \Frame\
classes inherit from NDF and this is because most of the FITS-based
implementations do on-the-fly format conversion to NDF before the
\Frame\ object is instantiated.}
\label{fig:frameclass}
\end{figure*}
The pipeline will have been initialized to expect a particular instrument and
the resulting \Frame\ and \Group\ objects will be instrument-specific subclasses.
The \Frame\ object contains sufficient information to allow the
pipeline to work out which \recipe\ should be used to reduce the
data. The \recipe\ itself is located by looking through a search path
and modifiers can be specified to select recipe variants. For example,
if the recipe would normally be \texttt{REDUCE\_SCIENCE} the pipeline
can be configured to prefer a recipe suffix of \texttt{\_QL} to
enable a quick-look version of a recipe to be selected at the summit
whilst selecting the full recipe when running off-line.
The top-level \recipe\ is parsed and is then evaluated in the
parent pipeline context using the Perl \texttt{eval} function. The
\recipe\ is called with the relevant \Frame, and \Group\ objects along
with other context (see Fig.\ \ref{fig:primhead} for an example). The
reason we use \texttt{eval} rather than running the recipe in a
distinct process is to allow the recipe to update the state. As
discussed in section \ref{sec:onvoff}, the pipeline is designed to
function in an incremental mode where data are reduced as they arrive,
with group co-adding either happening incrementally or waiting for a
set cadence to complete. This requires that the group processing stage
knows the current state of the \Group\ object and of the contributing
\Frame\ objects. Launching an external process to execute the
recipe each time new data arrived would significantly complicate the
architecture.
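
Schematically, with all error recovery and context handling omitted,
the execution step amounts to something like the following sketch (the
subroutine and its argument list are illustrative):
\begin{lstlisting}[language=perl]
use strict;
use warnings;

# Compile the parsed recipe into an anonymous subroutine and call it in
# the pipeline's own process, so that any state stored in the Frame and
# Group objects persists for the next invocation. Sketch only.
sub run_recipe {
  my ( $recipe_code, $Frm, $Grp, $Cal, $Display, $Mon ) = @_;

  my $compiled = eval "sub { $recipe_code }";
  die "Recipe failed to compile: $@" if $@;

  eval { $compiled->( $Frm, $Grp, $Cal, $Display, $Mon ) };
  die "Recipe execution failed: $@" if $@;
}
\end{lstlisting}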
As noted in the previous section, the \recipe\ is parsed incrementally
and the decision on whether to re-read a \primitive\ is deferred until
that \primitive\ is required. This is important for instruments such
as MICHELLE and UIST which can observe in multiple modes
(spectroscopy, imaging, IFU), sometimes
requiring a single recipe invocation to call \primitives\ optimized
for the different modes. The execution environment handles this by
allowing a caller to set the instrument mode and this dynamically
adjusts the \primitive\ selection code.
\subsection{Header Translation}
As more instruments were added to \oracdr\ it quickly became apparent
that many of the \primitives\ were being adjusted to support different
variants of FITS headers through the use of repetitive if/then/else
constructs. This was making it harder to support the code and it was
decided to modify the \primitives\ to use standardized headers. When a
new \Frame\ object is created the headers are immediately translated
to standard form and both the original and translated headers are
available to \primitive\ authors.
The code to do the translation was felt to be fairly generic and was
written to be a standalone
module\footnote{\texttt{Astro::FITS::HdrTrans}, available on
CPAN}. Each instrument header maps to a single translation class
with a class hierarchy that allows, for example, JCMT instruments to
inherit knowledge of shared JCMT headers without requiring that the
translations be duplicated. Each class is passed the input header and
reports whether the class can process it, and it is an error for multiple
classes to be able to process a single header. A method exists for each
target generic header and has the form:
\begin{lstlisting}[language=perl]
$genericvalue = $class->_to_GENERIC_NAME(\%fits);
\end{lstlisting}
where, for example the method to calculate the start airmass would be
\texttt{\_to\_AIRMASS\_START}. The simple unit mappings (where there
is a one-to-one mapping of an instrument header to a generic header
without requiring changes to units) are defined as simple Perl hashes
but at compile-time the corresponding methods are generated so that
there is no difference in interface for these cases. Complex mappings
that may involve multiple input FITS headers, are written as explicit
conversion methods.
The header translation system can also reverse the mapping such that a
set of generic headers can be converted back into instrument-specific
form. This can be particularly useful when required to update a header
during processing.
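
Assuming the documented \texttt{Astro::FITS::HdrTrans} interface, the
calling convention looks roughly like this sketch:
\begin{lstlisting}[language=perl]
use strict;
use warnings;
use Astro::FITS::HdrTrans qw/ translate_from_FITS
                              translate_to_FITS /;

# Convert a hash of instrument-specific FITS headers into the generic
# form used by the primitives (e.g. AIRMASS_START), and back again.
sub to_generic {
  my ($fits) = @_;              # hashref of raw headers
  my %generic = translate_from_FITS( $fits );
  return \%generic;
}

sub to_instrument {
  my ($generic) = @_;           # hashref of generic headers
  my %fits = translate_to_FITS( $generic );
  return \%fits;
}
\end{lstlisting}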
\subsection{Calibration System}
During \Frame\ processing it is necessary to make use of calibration
frames or parameters derived from calibration observations. The early
design focused entirely on how to solve the problem of selecting the
most suitable calibration frame for a particular science observation
without requiring the instrument scientist to write code or understand
the internals of the pipeline. The solution that was adopted involves
two distinct operations: filing calibration results and querying those results.
When a calibration image is reduced (using the same pipeline
environment as science frames) the results of the processing are
registered with the calibration system. Information such as the name
of the file, the wavelength, and the observing mode are all stored in the \Index.
In the current system the \Index\ is a text file on disk that is cached by
the pipeline but the design would be no different if an SQL database
were used instead; no \primitives\ would need to be modified to switch
to an SQL backend. The only requirement is that the \Index\ is
persistent over pipeline restarts (which may happen a lot during
instrument commissioning).
The second half of the problem was to provide a rules-based system.
A calibration rule simply indicates how a header in the science data
must relate to a header in the calibration database in order for the
calibration to be flagged as suitable. The following is an excerpt
from a rules file for an imaging instrument dark calibration:
\begin{quote}
{\small
\begin{verbatim}
OBSTYPE eq 'DARK'
MODE eq $Hdr{MODE}
EXP_TIME == $Hdr{EXP_TIME}
MEANCOUNT
\end{verbatim}
}
\end{quote}
Each row in the rules file is evaluated in turn by replacing the
unadorned keyword with the corresponding calibration value read from
the \Index, with \texttt{\$Hdr} referring to the science
header. In the above example the
calibration would match if the exposure times and observing readout
mode match and the calibration itself is a dark.
These rules are evaluated using the Perl \texttt{eval} command
so the full Perl interpreter is available. The following common idiom
is used to filter out calibration observations that are too old:
\begin{quote}
{\small
\begin{verbatim}
ORACTIME ; abs( ORACTIME - $Hdr{ORACTIME} ) < 1.0
\end{verbatim}
}
\end{quote}
where we make use of the fact that a bare number or quoted string on
its own is effectively a no-op and the real logic begins after the
statement separator.\footnote{\texttt{ORACTIME} is the standardized
representation of the observation time of the \Frame.}
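
A much-simplified sketch of how a single rule might be evaluated is
shown below (the keyword substitution and quoting here are
illustrative; the real calibration classes are considerably more
careful):
\begin{lstlisting}[language=perl]
use strict;
use warnings;

our %Hdr;   # science headers, visible to the evaluated rule

# Evaluate one rule against an entry from the calibration index and
# the science headers. Bare keywords are replaced by the indexed
# values; $Hdr{...} references are left for the science data.
sub rule_ok {
  my ( $rule, $index_entry, $science_hdr ) = @_;
  local %Hdr = %$science_hdr;

  ( my $expr = $rule ) =~
    s/(?<![\{\$])\b([A-Z][A-Z0-9_]*)\b/
      exists $index_entry->{$1} ? "'$index_entry->{$1}'" : $1/gex;

  my $ok = eval $expr;
  warn "Rule '$rule' failed to evaluate: $@" if $@;
  return $ok ? 1 : 0;
}

# e.g. rule_ok( q{EXP_TIME == $Hdr{EXP_TIME}},
#               { EXP_TIME => 10 }, { EXP_TIME => 10 } );  # true
\end{lstlisting}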
The rules file itself represents the schema of the database in
that for every line in the rules file, information from that
calibration is stored in the \Index. In the example above,
\texttt{MEANCOUNT} is not used in the rules processing but the
presence of this item means that the corresponding value will be
extracted from the header of the calibration image and registered in
the calibration database.
The calibration selection system can behave differently in off-line
mode as the full set of calibrations can be made available and
calibrations taken after the current observation may be relevant. Each
instrument's calibration class can decide whether this is an
appropriate behavior.
The calibration system can be modified by a command-line argument at
run time to allow the user to decide which behavior to use. For
example, with the SCUBA pipeline \citep{1999ASPC..172..171J} the user
can decide which opacity calibration scheme they require from a number
of options.
One of the more controversial aspects of the calibration system was
that the UKIRT pipelines would stop and refuse to reduce data if no
suitable calibration frame had been taken previously (such as a dark
taken in the wrong mode or with the wrong exposure). This sometimes
led to people reporting that the pipeline had crashed (and so was
unstable) but the purpose was to force the observer to stop and think
about their observing run and ensure that they did not take many hours
of data with their calibration observations being taken in a manner
incompatible with the science data. A pro-active pipeline helped to
prevent this and also made it easier to support flexible scheduling
\citep{2002ASPC..281..488E,2004SPIE.5493...24A} without fearing that
the data were unreducible.
This hard-line approach to requiring fully calibrated observations,
even if the PI's specific science goals did not require it, was
adopted in anticipation of the emergence of science data archives as
an important source of data for scientific papers. Casting the PI not
as the data owner, but rather as somebody to whom observatory data from
the public domain are leased for the length of their proprietary
period, means an observation is only complete if it is fully
calibratable. In that way, the value of the telescope time is
maximised by making the dataset useful to the widest range of its
potential uses. To this end, the authors favor a model where, for
flexibly-scheduled PI-led facilities, calibration time is not deducted
from the PI's allocation.
\subsection{Configurable Display System}
\begin{figure*}
{\small
\begin{verbatim}
# Send raw frame to first Gaia window
num type=image tool=gaia region=0 window=0 autoscale=1 zautoscale=1
raw type=image tool=gaia region=0 window=0 autoscale=1 zautoscale=1
# Send dark-subtracted frame to first Gaia window
dk type=image tool=gaia region=0 window=0 autoscale=1 zautoscale=1
# Send flatfielded frame to first Gaia window
ff type=image tool=gaia region=0 window=0 autoscale=1 zautoscale=1
# Send mosaic frame to second Gaia window
g_mos type=image tool=gaia region=0 window=1 autoscale=1 zautoscale=1
# Send polarimetry vectorplot (intensity image) to KAPVIEW window
I type=vector tool=kapview region=0 window=0 autoscale=1 zautoscale=1
\end{verbatim}
}
\caption{Sample display configuration file used as the default for the
UFTI instrument.}
\label{fig:disp}
\end{figure*}
On-line pipelines are most useful when results are displayed to the
observer. One complication with pipeline display is that different
observers are interested in different intermediate data products or
wish the final data products to be displayed in a particular
way. Display logic such as this cannot be embedded directly in
\primitives; all a \primitive\ can do is indicate that a particular
product \emph{could} be displayed and leave it to a different system
to decide \emph{whether} the product should be displayed and how to
do so.
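
From the point of view of a \primitive\ this is a single call to the
display object, along the following lines (the exact method name and
arguments in this sketch may differ from the current implementation):
\begin{lstlisting}[language=perl]
# Inside a primitive: offer the current Frame to the display system.
# Whether and how the product is shown is decided entirely by the
# display configuration file, not by the primitive.
$Display->display_data( $Frm ) if defined $Display;
\end{lstlisting}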
The display system uses the \oracdr\ file naming convention to
determine relevance. Usually, the text after the last underscore,
referred to as the file suffix, is used to indicate the reduction step
that generated the file: \texttt{mos} for mosaic, \texttt{dk} for
dark, etc. When a \Frame\ or \Group\ is passed to the display system
the file suffix and, optionally, a \Group\ versus \Frame\ indicator,
are used to form an identifier which is compared with the entries in
the display configuration file (Fig.\ \ref{fig:disp}). For each row
containing a matching identifier the files will be passed to the
specific display tool. Different plot types are available such as
image, spectrum, histogram, and vector plot and also a specific mode
for plotting a 1-dimensional dataset over a corresponding model. Additional
parameters can be used to control placement within a viewport and how
auto-scaling is handled. The display system currently supports \textsc{gaia}
\citep[][\ascl{1403.024}]{2009ASPC..411..575D} and \textsc{kappa}
\citep[][\ascl{1403.022}]{SUN95} as well as the historical P4 tool
(part of \cgsdr\ \citep{SUN27} and an important influence on the
design).
Originally the display commands would be handled within the \recipe\
execution environment and would block the processing until the display
was complete. This can take a non-negligible amount of time and for the
SCUBA-2 pipeline to meet its performance goals this delay was
unacceptable. The architecture was therefore modified to allow the
display system running from within the \recipe\ to register the
display request but for a separate process to be monitoring these
requests and triggering the display.
\subsection{Support modules}
As well as the systems described above there are general support
modules that provide standardized interfaces for message output, log file
creation and temporary file handling.
The message output layer is required
to allow messages from the algorithm engines and from the \primitives\
to be sent to the right location. This might be a GUI, the terminal or
a log file (or all at once) and supports different messaging levels to
distinguish verbose messages from normal messages and
warnings. Internally this is implemented as a tied object that
emulates the file handle API and contains multiple objects to allow
messages to be sent to multiple locations.
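
For \primitive\ authors the message layer reduces to a small set of
output routines; \texttt{orac\_print} and \texttt{orac\_err} appear in
the earlier examples, while the warning-level routine shown here is
assumed for illustration:
\begin{lstlisting}[language=perl]
# Messages go through the message layer rather than print(), so they
# can be routed to the terminal, a log file, and/or the GUI.
orac_print "Coadding 4 frames into the group mosaic\n";
orac_print "Using fast mosaicking mode\n", "blue";  # optional colour
orac_warn  "No seeing estimate available; using default\n";
orac_err   "No suitable dark frame found\n";
\end{lstlisting}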
Log files are a standard requirement for storing information of
interest to the scientist about the processing such as
quality assurance parameters or photometry results. The pipeline
controls the opening of these files in a standard way so that the
primitive writer simply has to worry about the content.
With the current algorithm engines there are many intermediate files
and most of them are temporary. The allocation of filenames is handled
by the infrastructure and they are cleaned up automatically unless the
pipeline is configured in debugging mode to retain them.
\section{Supporting New Instruments}
An important part of the \oracdr\ philosophy is to make adding new
instruments as painless as possible and re-use as much of the
existing code as possible. The work required obviously depends on the
type of instrument. An infrared array will be straightforward as many
of the \recipes\ will work with only minor adjustments. Adding support
for an X-Ray telescope or radio interferometer would require
significantly more work on the recipes.
To add a new instrument the following items must be considered:
\begin{itemize}
\item How are new data presented to the pipeline? \oracdr\ supports a
number of different data detection schemes but cannot cover every option.
\item What is the file format? All the current \recipes\ use Starlink
algorithm engines that require NDF \citep{ndfjenness} and if FITS
files are detected the infrastructure converts them to NDF before
handing them to the rest of the system. If the raw data are in HDF5,
or use a very complex data model on top of FITS, new code will have
to be written to support this.
\item How to map the metadata to the internal expectations of the
pipeline? A new module would be needed for \texttt{Astro::FITS::HdrTrans}.
\item Does it need new \recipes/\primitives? This depends on how close
the instrument is to an instrument already supported. The \recipe\
parser can be configured to search in instrument-specific
sub-directories and, for example, the Las Cumbres Observatory
imaging recipes use the standard \primitives\ in many cases but also
provide bespoke versions that handle the idiosyncrasies of their
instrumentation.
\end{itemize}
Once this has been decided new subclasses will have to be written to
encode specialist behavior for \Frame\ and \Group\ objects and the
calibration system, along with the instrument initialization class
that declares the supported calibrations and algorithm engines.
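
As an indication of the scale of the task, a new \Frame\ subclass
often needs to override only a handful of instrument-specific methods.
The skeleton below is purely illustrative (the instrument, file naming
convention, and method name are invented; the base class is assumed to
follow the NDF branch of Fig.~\ref{fig:frameclass}):
\begin{lstlisting}[language=perl]
package ORAC::Frame::MYCAM;    # hypothetical new instrument

use strict;
use warnings;
use base qw/ ORAC::Frame::NDF /;   # NDF branch of the hierarchy

# Construct the raw file name from the UT date prefix and observation
# number. Method name and naming convention are illustrative only.
sub file_from_bits {
  my ( $self, $prefix, $obsnum ) = @_;
  return sprintf( "mycam_%s_%05d.sdf", $prefix, $obsnum );
}

1;
\end{lstlisting}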
\section{Lessons Learned}
\subsection{Language choice can hinder adoption}
In 1998 the best choice of dynamic ``scripting'' language for an astronomy project was
still an open question, with the main choices being Perl and
Tcl/Tk, and Python a distant third
\citep{1995ComPh...9...57A,1999ASPC..172..494J,1999ASPC..172..483B,2000ASPC..216...91J}.
Tcl/Tk had already been adopted by Starlink
\citep{1995ASPC...77..395T}, STScI \citep{1998SPIE.3349...89D},
SDSS \citep{1996ASPC..101..248S} and ESO \citep{1996ASPC..101..396H,1995ASPC...77...58C} and
would have been the safest choice, but at the time it was felt that
the popularity of Tcl/Tk was peaking. Perl was chosen as it was a language
gaining in popularity and the development team were proficient in
it; the team were also developing the Perl Data Language \citep[PDL;][]{PDL},
which promised easy handling of array data, something Tcl/Tk was
incapable of handling.
Over the next decade and a half, beginning with the advent of \texttt{pyraf}
\citep[][\ascl{1207.010}]{2000ASPC..216...59G,2006hstc.conf..437G}
and culminating in Astropy \citep{2013A&A...558A..33A},
Python became the dominant language for astronomy,
becoming the \emph{lingua franca} for new students in astronomy and
the default scripting interface for new data reduction systems such
as those for ALMA
\citep{2007ASPC..376..127M} and LSST \citep{2010SPIE.7740E..15A}.
In this environment, whilst \oracdr\ received much interest from other
observatories, the use of Perl rather than Python became a
deal-breaker given the skill sets of development groups. During this
period only two additional observatories adopted the pipeline: the
Anglo-Australian Observatory for IRIS2 \citep{2004SPIE.5492..998T} and Las Cumbres
Observatory for their imaging pipeline \citep{2013PASP..125.1031B}.
The core design concepts were not at issue; indeed, Gemini adopted the
key features of the \oracdr\ design in their Gemini Recipe System
\citep{2014ASPC..485..359L}. With approximately 100,000 lines of Perl code in
\oracdr\footnote{For infrastructure and \primitives, but counting code only, with comments adding more than
100,000 lines to that
number. Blank line count not included, nor are support modules from CPAN
required by the pipeline but distributed separately.} it
is impractical to rewrite it all in Python given that the system does
work as designed.
Of course, a language must be chosen without the benefit of hindsight
but it is instructive to see how the best choice for a particular
moment can have significant consequences 15 years later.
\subsection{In-memory versus intermediate files}
When \oracdr\ was being designed the choice was between IRAF
\citep[][\ascl{9911.002}]{2012ASPC..461..595F} and Starlink for the algorithm engine.
At the time the answer was that Starlink messaging and error reporting were
significantly more robust and allowed the \primitives\ to adjust their
processing based on specific error states (such as there being too few
stars in the field to solve the mosaicking offsets). Additionally,
Starlink supported variance propagation and a structured data format.
From a software
engineering perspective Starlink was clearly the correct choice but it
turned out to be yet another reason why \oracdr\ could not be adopted
by other telescopes. Both these environments relied on each command
reading data from a disk file, processing it in some way and then
writing the results out to either the same or a new file. Many of
these routines were optimized for environments where the science data
was comparable in size to the available RAM and went to great lengths
to read the data in chunks to minimize swapping. It was also not
feasible to rewrite these algorithms (which had been well tested) in
the Perl Data Language, or even to turn the low-level libraries into Perl
function calls, so the penalty involved in continually reading
and writing to the disk was deemed an acceptable trade-off.
As it turns out, the entire debate of Starlink versus IRAF is somewhat