XML Overview for Developers

Module descriptions

To see how to create a module that describes one of your executables, let’s take a look at a simple example.

$ ~/bin/program -input inFile.dat -output outFile.dat

The program has one input parameter and one output parameter, with a switch preceding each one. To create Pipeline XML that describes this, let’s start with the file prolog and version number:

<?xml version="1.0" encoding="UTF-8"?>
<pipeline version=".1">
</pipeline>

Every pipeline file needs to be wrapped in a Module Group, so we need to add that into the file as well.

<?xml version="1.0" encoding="UTF-8"?>
<pipeline version=".1">
  <moduleGroup>
  </moduleGroup>
</pipeline>

Now, if we wanted to describe an entire workflow, we would populate the <moduleGroup> element with identifying attributes that will be covered later. However, because we want to describe a single executable, we need to place a <module> element inside the <moduleGroup> and fill in the relevant attributes.

<?xml version="1.0" encoding="UTF-8"?>
<pipeline version=".1">
  <moduleGroup>
    <module name="My Example Program"
            description="This program does amazing cool stuff."
            location="pipeline://localhost//bin/program">
    </module>
  </moduleGroup>
</pipeline>

That information is a good start for our module, but we can be more descriptive to help out our users. For example, we can add information about the package and package version by using the ‘package’ and ‘version’ attributes in the module element. You can also note the executable version by using the ‘executableVersion’ attribute.
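As a sketch, the <module> element with these three attributes filled in might look like the following. The values (AIR, 2.4, 1.3) are the ones used in the complete listing later on this page; they are illustrative, not required values.

```xml
<!-- package, version and executableVersion sit alongside the existing attributes -->
<module name="My Example Program"
        description="This program does amazing cool stuff."
        location="pipeline://localhost//bin/program"
        package="AIR"
        version="2.4"
        executableVersion="1.3">
</module>
```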

Now, let’s add some information about the author(s) of the executable that we’re describing. To do this, add an <executableAuthors> element underneath the <module> element, with as many <author> elements inside as necessary.
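A short sketch of this block, using the element name <executableAuthors> and the author attributes (fullName, email, website) as they appear in the complete listing later on this page:

```xml
<!-- Placed directly inside the <module> element -->
<executableAuthors>
  <author fullName="Arash Payan"
          email="SPAMASAURUS@loni.usc.edu"
          website="http://www.arashpayan.com" />
  <author fullName="Linus Torvalds"
          email="SPAMASAURUS@kernel.org"
          website="http://www.kernel.org" />
</executableAuthors>
```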

Optionally, you can also add information about the person/people who described/created the module definition (the XML that you are writing) of the executable. To do this, you just add an <authors> element with multiple <author> elements underneath it.
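For instance, taking the two describers from the complete listing later on this page, the block might look like this. Judging by the second entry, attributes other than fullName appear to be optional, though that is an inference from the listing rather than a stated rule.

```xml
<!-- Placed directly inside the <module> element -->
<authors>
  <author fullName="Ms. Module Describer 1" email="describer1@loni.usc.edu" />
  <author fullName="Mr. Module Describer 2" />
</authors>
```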

Your program may have been based on some publications that you or someone else has written. To add them to the XML, add a <citations> element with as many <citation> elements underneath it as necessary. The text of a citation doesn’t need to be formatted in any particular way, but if your PubMed IDs are formatted appropriately (e.g. PMID: 1873403), the Pipeline GUI will provide a link for users to go directly to that paper on the PubMed website. If you enter a DOI in the citation text, the Pipeline will make it linkable as well.
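As an illustration, here is a <citations> block holding one of the citations from the complete listing later on this page. Note the trailing "PMID: 9448779", written in the form described above.

```xml
<!-- Placed directly inside the <module> element -->
<citations>
  <citation>Woods, R.P., Grafton, S.T., Watson, J.D., Sicotte, N.L., and Mazziotta, J.C. (1998b). Automated image registration: II. Intersubject validation of linear and nonlinear models. J Comput Assist Tomogr 22, 153-165. PMID: 9448780</citation>
</citations>
```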

Within the Pipeline GUI, users can search through the library of tools by typing in keywords that will be searched against various attributes (name, authors, description, binary location, etc.) of the module definitions. It will also search against tags that the module describer adds. To tag our module definition, just add a <tag> element for each tag you want to add.
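In the complete listing later on this page, the tags sit directly inside the <module> element, one element per tag. The tag values below are the ones used there.

```xml
<!-- One <tag> element per keyword -->
<tag>converter</tag>
<tag>translator</tag>
```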

Parameters

Now that we’ve thoroughly described the program’s functionality, we’ll learn how to define our inputs and outputs. If you recall from our example program at the top of the page, we had just one input and one output, with the input coming first on the command line. Let’s add the input parameter first by using the <input> element.

<?xml version="1.0" encoding="UTF-8"?>
<pipeline version=".1">
  <moduleGroup>
    <module name="My Example Program"
            description="This program does amazing cool stuff."
            location="pipeline://localhost//bin/program"
            package="AIR"
            version="2.4"
            executableVersion="1.3">
      <authors>
        <author fullName="Ms. Module Describer 1" email="describer1@loni.usc.edu" />
        <author fullName="Mr. Module Describer 2" />
      </authors>
      <executableAuthors>
        <author fullName="Arash Payan" email="SPAMASAURUS@loni.usc.edu" website="http://www.arashpayan.com" />
        <author fullName="Linus Torvalds" email="SPAMASAURUS@kernel.org" website="http://www.kernel.org" />
      </executableAuthors>
      <citations>
        <citation>Woods, R.P., Grafton, S.T., Holmes, C.J., Cherry, S.R., and Mazziotta, J.C. (1998a). Automated image registration: I. General methods and intrasubject, intramodality validation. J Comput Assist Tomogr V22, 139-152. PMID: 9448779</citation>
        <citation>Woods, R.P., Grafton, S.T., Watson, J.D., Sicotte, N.L., and Mazziotta, J.C. (1998b). Automated image registration: II. Intersubject validation of linear and nonlinear models. J Comput Assist Tomogr 22, 153-165. PMID: 9448780</citation>
      </citations>
      <tag>converter</tag>
      <tag>translator</tag>
      <input name="Input"
             description="The input file to our program."
             enabled="true" required="true" order="0"
             switch="-input" switchSpaced="true">
        <format type="File" cardinality="1">
          <fileTypes>
            <filetype name="AIR file" extension="air" description="AIR Linear transformation" />
            <filetype name="MNC file" extension="mnc" description="MNC file" />
            <filetype name="Text file" extension="txt" description="Text file" />
            <filetype name="Analyze Image" extension="img" description="Analyze Image file">
              <need>hdr</need>
            </filetype>
          </fileTypes>
        </format>
      </input>
      <output name="Output"
              description="The output file produced by the program."
              enabled="true" required="true" order="1"
              switch="-output" switchSpaced="true">
        <format type="File" cardinality="1">
          <fileTypes>
            <filetype name="XFM file" extension="xfm" description="MNC transformation" />
          </fileTypes>
        </format>
      </output>
    </module>
  </moduleGroup>
</pipeline>

Most of the attributes on the <input> element are self-explanatory. The ‘order’ attribute specifies the position at which a parameter should appear on the command line (0-indexed). The ‘enabled’ attribute tells the Pipeline whether that particular parameter is enabled for use in the workflow (required parameters will always be enabled, even if the ‘enabled’ attribute is set to false in the XML). The ‘switchSpaced’ attribute is usually true, but can be set to false if the parameter you’re trying to describe does not want a space between its switch and its arguments. For example, a parameter that’s specified with a -k might need to be followed immediately by an integer (-k0 or -k1 or -k2). In this case, you would want ‘switchSpaced’ to be set to false.
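A minimal sketch of the -k case might look like the following. The parameter name, description, order and the choice of the Number type are hypothetical, chosen only to illustrate ‘switchSpaced’; the attribute names themselves are the ones used in the listing above.

```xml
<!-- Hypothetical parameter: only switch and switchSpaced matter for this illustration -->
<input name="K Value"
       description="An integer that must follow -k with no space."
       enabled="true" required="false" order="2"
       switch="-k" switchSpaced="false">
  <format type="Number" cardinality="1" />
</input>
```

With ‘switchSpaced’ set to false, an argument of 2 should be rendered as -k2 rather than -k 2.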

In the <format> element underneath the <input> and <output> elements there are two attributes to worry about (for now). The first one is the ‘type’ attribute, which describes the type of data the parameter accepts/produces. The types supported by the Pipeline are

File
Directory
String
Number
Enumerated

The ‘cardinality’ attribute specifies how many arguments the parameter requires after the switch (e.g. the parameter requires 3 filenames, or 2 numbers). Acceptable values are below

-2    'n' cardinality
-1    Infinite cardinality
0     Does not need any arguments
1     Needs 1 argument
2     Needs 2 arguments
3     Needs 3 arguments
...

The cardinality of an output parameter cannot be infinite (-1), and the cardinality of an input parameter cannot be ‘n’ (-2). An ‘n’ cardinality parameter has an argument count equal to the number of input arguments given to its ‘base’ parameter. For example, let’s say we create an input parameter with infinite cardinality named ‘Input’, and an output parameter named ‘Output’ (we’re very creative here) with cardinality ‘n’ and a cardinality base of ‘Input’. If the user binds 22 files to the ‘Input’ parameter, then the Pipeline will expect the program to create 22 outputs for the ‘Output’ parameter. The XML for this example input/output pair is the following:
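A minimal sketch of such a pair, assuming the base is named through a cardinalityBase attribute on the <format> element. That attribute name is an assumption and should be checked against the XSD; the descriptions and file types are illustrative and reuse entries from the listing above.

```xml
<!-- Sketch only: 'cardinalityBase' is an assumed attribute name; verify it against the XSD -->
<input name="Input"
       description="The input files to our program."
       enabled="true" required="true" order="0"
       switch="-input" switchSpaced="true">
  <!-- -1: accepts any number of files -->
  <format type="File" cardinality="-1">
    <fileTypes>
      <filetype name="Text file" extension="txt" description="Text file" />
    </fileTypes>
  </format>
</input>
<output name="Output"
        description="One output file per input file."
        enabled="true" required="true" order="1"
        switch="-output" switchSpaced="true">
  <!-- -2: 'n' cardinality, tied to the argument count of 'Input' -->
  <format type="File" cardinality="-2" cardinalityBase="Input">
    <fileTypes>
      <filetype name="XFM file" extension="xfm" description="MNC transformation" />
    </fileTypes>
  </format>
</output>
```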

If you specify the parameter to be of type ‘File’ you’ll need to specify the accepted filetype(s) inside the <format> element. NOTE: Inputs can accept multiple file types but outputs can only have 1. The <filetype> element is pretty simple. It has 3 attributes and optional <need> elements. The attributes are
name The human readable name of the file type
extension The extension of the filetype (no ‘.’ required)
description A short description of the file type
<need> A nested element (you can have multiple) that contains the extension of any file needed by this type

Transformations

Assume we have a program that takes an input and creates two outputs. One of the outputs is explicitly defined by the user, and the other is automatically created by the program when the ‘-analysis’ flag is used, like so:

$ ~/bin/program -input inFile.dat -output outFile.dat -analysis

The outputs of this program are:

outFile.dat
outFile.analysis

Transformations describe the form of inputs and outputs in the Pipeline by taking statically defined strings (e.g. ‘inFile.dat’, ‘outFile.dat’) and manipulating them to derive descriptions of other inputs and outputs. In the example above we had two outputs, and the second one could not be referenced because it is never named on the command line. Using transformations, though, we can easily describe the second output:

<format type="File" base="outFile.dat">
  <transform order="1" op="subtract">dat</transform>
  <transform order="2" op="append">analysis</transform>
</format>

The first transformation deletes the string ‘dat’ from the end of the base string, which in this case is outFile.dat, leaving ‘outFile.’. The second transformation then appends the string ‘analysis’ to the result of the first, giving ‘outFile.analysis’. These two transformations can be assigned to a parameter, which will then be able to reference the second output that was previously out of reach. There are 4 types of transformations that can be used to describe parameters:

subtract
prepend
append
replace

Usage of those four transformations should be sufficient for describing most parameters. ‘Delete’ and ‘insert’ should be implemented using the ‘replace’ transformation. Also note that the second output of our example was a ‘File’ specified in the ‘format’ element.
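As a further sketch, a ‘prepend’ transformation might be written as follows, assuming it takes its string in the element body the same way ‘append’ does in the example above. That symmetry is an assumption, and the ‘backup_’ prefix is hypothetical.

```xml
<!-- Assumption: 'prepend' takes its string the same way 'append' does -->
<format type="File" base="outFile.dat">
  <transform order="1" op="prepend">backup_</transform>
</format>
```

Under that assumption, the described file name would be backup_outFile.dat.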

XSD file

The XSD file for the Pipeline XML schema is also available for download here.

The XSD for Provenance files is available for download here.
