Pipeline Engine, henceforth referred to as the engine, is a Java-based framework that links sequential activities, human and computer, into a defined process flow and manages how the data moves from step to step based on the results of each step. In most laboratories, some processes (or pipelines) are carried out automatically without any human intervention, while others, such as QC or other inspections that require a person to make judgments or draw ROIs, are necessarily done by hand. The process flow is defined in an XML document called the pipeline descriptor, and the executables are defined in separate XML documents called resource descriptors. With a catalog of resource descriptors, one can start creating pipelines.
A process is a collection of steps to be executed, and a step is a collection of resources (human or computer) to be invoked. When an executable is invoked, the engine monitors its exit status and, if the executable ran successfully, continues on to the next resource or step. If the executable fails, the engine terminates the entire pipeline and sends an email notification, unless the step is explicitly set to continue on failure (using the attribute continueOnFailure). To pause the pipeline temporarily for human intervention, use the attribute awaitApprovalToProceed: the current step is executed, and the engine then stops without executing any later step. The pipeline can be restarted from the next step at a later time.
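The control flow described above can be sketched in a few lines of Python. This is only an illustration of the behaviour of continueOnFailure and awaitApprovalToProceed, not the engine's actual Java implementation; the names Step, run_pipeline, start_at and notify are hypothetical, and an exit status of 0 is assumed to mean success.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Step:
    """Hypothetical stand-in for a pipeline step."""
    id: str
    run: Callable[[], int]             # returns an exit status; 0 is assumed to mean success
    continue_on_failure: bool = False  # mirrors the continueOnFailure attribute
    await_approval: bool = False       # mirrors the awaitApprovalToProceed attribute


def run_pipeline(steps: List[Step],
                 start_at: Optional[str] = None,
                 notify: Callable[[str], None] = print) -> Tuple[str, Optional[str]]:
    """Run steps in order and return (state, step id).

    state is "COMPLETE", "FAILED" (with the failing step id) or
    "PAUSED" (with the id of the step to restart from).
    """
    if start_at is not None and start_at not in [s.id for s in steps]:
        raise ValueError(f"unknown step id: {start_at}")
    started = start_at is None
    for index, step in enumerate(steps):
        if not started:
            if step.id != start_at:
                continue  # skip steps before the restart point
            started = True
        status = step.run()
        if status != 0 and not step.continue_on_failure:
            # Stand-in for the email notification sent by the engine.
            notify(f"Pipeline aborted: step {step.id} exited with status {status}")
            return ("FAILED", step.id)
        if step.await_approval and index + 1 < len(steps):
            # Stop after the current step; a later call can resume at the next one.
            return ("PAUSED", steps[index + 1].id)
    return ("COMPLETE", None)
```

A paused run returns the id of the next step, which can be passed back as start_at to resume the pipeline later.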
The resource descriptor document defines an executable in terms of its input arguments and outputs. The engine is launched using one of two executables, PipelineRunner or XnatPipelineLauncher. The difference between the two is that XnatPipelineLauncher updates the XNAT workflow element as the steps execute.
The engine comes bundled with a tool ResourceImporter which generates the Resource Descriptors (see
XML-based pipeline and resource descriptors. Parameters can be passed to the pipeline descriptor either in a separate parameter document or as arguments at the command prompt. The engine uses the features of XPath, and hence within a pipeline one can create loops and use if conditions. The engine also has goto capabilities.
Executables are launched either locally or on a remote machine using ssh. The engine is not yet capable of transferring data to the remote machine, but if the machines share a common file system, jobs can be executed remotely.
Each step of the pipeline is monitored. If a step fails to execute, the engine aborts the pipeline, unless this is overridden with continueOnFailure, and can send a notification by email.
A pipeline can be restarted at any step. A step could be another pipeline (called a pipelet). In this case the engine exports all the parameters to the pipelet and launches the pipelet.
The engine captures provenance information as it executes the steps.
The engine can interact with XNAT to update the workflow element as it completes a step.
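The local and remote (ssh) launching mentioned in the feature list above can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's own code: the function names build_launch_argv and launch and the host name used in the usage note are hypothetical, and the sketch assumes the remote machine sees the same file system, as the text requires.

```python
import shlex
import subprocess
from typing import List, Optional


def build_launch_argv(command: List[str], host: Optional[str] = None) -> List[str]:
    """Return the argv to execute: the command itself, or the command wrapped in ssh."""
    if host is None:
        return list(command)
    # ssh takes the remote command as a single string, so quote each
    # argument to keep paths containing spaces intact on the remote shell.
    return ["ssh", host, shlex.join(command)]


def launch(command: List[str], host: Optional[str] = None) -> int:
    """Run the command locally or over ssh and return its exit status."""
    return subprocess.run(build_launch_argv(command, host)).returncode
```

For example, build_launch_argv(["ls", "-l", "/data/dir1"], host="node1") yields an ssh invocation for the hypothetical host node1, while omitting host returns the command unchanged for local execution.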
In the following paragraphs, we will analyze the various tags in the pipeline descriptor used by XNAT to transfer a session from the Pre-Archive folder to the permanent archive. This file, Transfer.xml, is located in the catalog/xnat_tools folder.
Notice the element
<outputFileNamePrefix>
^concat(/Pipeline/parameters/parameter[name='destinationDir']/values/unique/text(),'/',/Pipeline/name/text())^
</outputFileNamePrefix>
outputFileNamePrefix is an optional tag that specifies the prefix for the output files the engine generates, namely the log file, the err file, and an internal XML representation of the pipeline. In the XML snippet above, the text uses the XPath string function concat and the XPath expression /Pipeline/parameters/parameter[name='destinationDir']/values/unique/text(), and is enclosed within ^ (caret) characters. The ^ symbol tells the engine to resolve the XPath expression before launching. In fact, the engine first reads the XML document, resolves all XPath statements, then resolves all the steps against the resource descriptors, and only then launches the steps.
The commented portion of the XML contains the following parameter element:
<parameter>
  <name>destinationDir</name>
  <values>
    <unique>/data/dest</unique>
  </values>
  <description>The path to the destination directory</description>
</parameter>
[Figure: Schema Representation of the Parameter element]
This snippet describes a parameter for the pipeline. The name of the parameter is destinationDir. Its value in this example is unique (a single value) and is specified as /data/dest. Note that because no ^ (caret) symbol surrounds the text, the engine takes the content as is. One can also specify a list of values for a parameter.
Snippet from /catalog/xnat_tools/Transfer.xml:
<step id="1" description="Copy files from prearc to archive">
  <resource name="AntCopy" location="%PIPELINE_DIR_PATH%/catalog/ant-tools">
    <argument id="source">
      <value>
        ^/Pipeline/parameters/parameter[name='sourceDir']/values/unique/text()^
      </value>
    </argument>
    <argument id="destination">
      <value>
        ^/Pipeline/parameters/parameter[name='destinationDir']/values/unique/text()^
      </value>
    </argument>
    <argument id="overwrite"/>
  </resource>
</step>
[Figure: Schema Representation of the Step element]
A step is a collection of actions to be performed. These could be calls to executables or email notifications (for example, step id=5 of the transfer pipeline, Transfer.xml above, is an email notification). A step could also be a pipelet, which is another pipeline; the engine exports all the parameters of the invoking pipeline to the pipelet. The engine captures provenance information as it executes the steps, and one can define the outputs (typically files) that a step will generate. These can then be used to populate XNAT.
Going back to step 1 of the transfer pipeline: when the engine executes this step, after resolving the XPath values, the contents of the resource tag are reconciled against the resource descriptor. The attributes step.resource.name and step.resource.location link the step.resource to the resource descriptor. The elements Resource.name and Resource.location in the resource descriptor, in turn, together point to the executable that this resource invokes.
The step 1 snippet from /catalog/xnat_tools/Transfer.xml shown above is matched with the resource descriptor from /catalog/ant-tools/AntCopy.xml.
Consider the element Resource.input.argument.
<pip:argument id="source">
<pip:name>src</pip:name>
<pip:description>Source directory to copy from</pip:description>
</pip:argument>
This defines an input argument of the executable represented by the resource descriptor AntCopy.xml. The attribute id is used in the Pipeline Descriptor to refer to the argument (See Step above). Let us assume that the value for the parameter sourceDir was /data/dir1 and the value for the parameter destinationDir was /data/dir2. Then the command that the engine would execute would be:
%PIPELINE_DIR_PATH%/ant-tools/bin/AntCopy -src /data/dir1 -dest /data/dir2 -overwrite
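The reconciliation of step arguments against the resource descriptor can be sketched as a simple lookup from argument id to command-line name. This is an illustrative sketch, not the engine's code: build_command is a hypothetical function, the "-" prefix is assumed from the command shown above, and the names dest and overwrite are inferred from that command (only the mapping source to src appears in the descriptor excerpt).

```python
from typing import Dict, List, Optional, Tuple


def build_command(executable: str,
                  name_by_id: Dict[str, str],
                  step_args: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Translate step argument ids into the executable's command-line arguments.

    name_by_id maps an argument id (as used in the pipeline descriptor)
    to the argument name defined in the resource descriptor.
    A value of None denotes a flag without a value, such as overwrite.
    """
    argv = [executable]
    for arg_id, value in step_args:
        if arg_id not in name_by_id:
            raise KeyError(f"argument id not defined by the resource: {arg_id}")
        argv.append("-" + name_by_id[arg_id])
        if value is not None:
            argv.append(value)
    return argv


# Mapping assumed for AntCopy; "src" comes from the descriptor excerpt,
# "dest" and "overwrite" are inferred from the command line above.
ANT_COPY_ARGS = {"source": "src", "destination": "dest", "overwrite": "overwrite"}

command = build_command(
    "%PIPELINE_DIR_PATH%/ant-tools/bin/AntCopy",
    ANT_COPY_ARGS,
    [("source", "/data/dir1"), ("destination", "/data/dir2"), ("overwrite", None)],
)
print(" ".join(command))
```

Keeping the id-to-name mapping in the resource descriptor means a pipeline descriptor never has to know the executable's actual flag names.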
More info on Resource descriptor
A pipeline can be launched using either the PipelineRunner or the XnatPipelineLauncher script. These scripts are located in the bin folder. As mentioned before, the difference between the two is that XnatPipelineLauncher is XNAT-aware and hence updates the workflow element as it executes the steps. Each script needs the name of the pipeline to be executed. A pipeline descriptor may define the required parameters, or one can pass the parameters as command-line arguments to PipelineRunner or XnatPipelineLauncher.
All calls to org.nrg.pipeline.XnatPipelineLauncher.launch() method are intercepted by PIPELINE_HOME/bin/schedule. You could overload this method to :
Pipeline engine supports the Sun Grid Engine (SGE). The class org.nrg.pipeline.client.PipelineJobSubmitter, using the DRMAA API, interacts with the SGE. These are the steps that are performed by the PipelineJobSubmitter:
Integrating SGE and XNAT: Set up the machine that hosts Tomcat as a submit host.