1. Introduction

Setting up your own Pipeline server is a great way to remotely take advantage of the power of a cluster or just a dedicated computer with many helpful programs installed on it. More importantly, it lets many people use that power through the easy-to-use interface of the Pipeline client.

The LONI Pipeline is a workflow processing application that can be used to wrap any executable for use in the environment. No change to your programs (adding code, implementing interfaces, etc.) is required for you to use the program within the LONI Pipeline. All that you need is an understanding of the program’s command line usage and you can begin using it in the Pipeline. For a visual introduction to using the Pipeline, you can take a look at some screencasts which cover a variety of different Pipeline topics.

This booklet gives an introduction to the LONI Pipeline and provides installation instructions. It includes several concrete and complete neuroimaging processing examples, from designing modules to executing workflows to viewing results.

Click here to read the LONI Pipeline handbook in PDF format.

 

Introduction

This is a listing of the commonly used Pipeline terms and their definitions.

Terms

Annotation – Notes that you can add to a workflow to remind yourself of pertinent information.

Cache – A directory in which the application creates intermediate output files, streams, and log files.

Command String – The exact command that was submitted by the Pipeline to the underlying operating system for execution.

Connection Manager – The dialog box which holds connections that you have created to Pipeline servers.

Data Sink – A special module that takes one or more output values and can be used as the output destination of one or more modules.

Data Source – A special module that takes one or more input values and can be used as the input source of one or more modules.

DRMAA – Distributed Resource Management Application API. A specification, with library implementations, that allows applications to submit and control jobs on one or more DRM systems.

Executable – A file whose contents are meant to be interpreted as a program by a computer.

IDA – An acronym that stands for Image Database Archive. Along with being one of the protocols that the LONI Pipeline supports, IDA offers the following benefits:

  • De-identification – Addresses government regulations for protection of human subject privacy
  • Data Transmission – Data is transmitted over the internet using Hyper-Text Transfer Protocol with SSL encryption (HTTPS)
  • Storage – Data is archived on a fault-tolerant storage area network (SAN), providing near 24/7 availability

Execution Dialog – A dialog which shows important messages printed by the workflow at different times during its execution.

Module – The smallest unit of a Pipeline workflow. Specifically, it is a chunk of XML that describes an executable and its inputs and outputs. It can be created by a user and placed directly into a workflow, or a user can drag and drop predefined modules from the library of any server to which they are connected.

Module Definition – The collection of information including the executable author, name, package, version, description and parameter names that must be specified for each executable to create a module that can be used in the Pipeline. Once a module definition has been created for an executable, it can be saved in the library and reused by other users.

Module Group – A collection of modules. The Pipeline can abstract a Module Group to be represented as a single module in a workflow.

Output Log – A collection of messages which are printed out to the output stream of the application.

Package – A suite of module definitions which are interrelated.

Parameter – An input or output to a module.

Personal Library – A place to store or save workflows and modules for easy access through the Pipeline.

Pipeline – An environment to develop workflows for data processing, independent of data location, program location, and platform.

Validation – Occurs automatically when the user requests that a workflow be executed. Validation entails:

  • Verifying the existence of inputs, outputs and executables
  • Cycle detection
  • File cardinality checks
  • File type checking

Depending on workflow content, connection status and other variables, the validation may be more or less complex. Please refer to Pipeline documentation on validation for more information.
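Two of the checks listed above, cycle detection and verifying that inputs exist, can be illustrated with a short Python sketch. The workflow representation used here (a dictionary mapping each module name to the modules it feeds) and the function names are hypothetical and chosen only for illustration; this is not a description of how the Pipeline implements validation internally.

```python
import os


def has_cycle(graph):
    """Return True if the directed graph contains a cycle.

    `graph` maps each module name to the list of modules it feeds.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    colour = {node: WHITE for node in nodes}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, []):
            if colour[nxt] == GREY:  # back edge: we returned to the current path
                return True
            if colour[nxt] == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour[n] == WHITE and visit(n) for n in nodes)


def missing_inputs(paths):
    """Return the input paths that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]
```

A workflow whose connections form a loop (for example A feeds B, B feeds C, and C feeds A) would be reported by `has_cycle`, while a simple chain would not.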

Workflow – A set of connected modules that performs analysis or simply processes input data.

  1. Introduction
  2. Installation
    1. Requirements
    2. Downloading
    3. Setup and launching
  3. Interface overview
    1. Connection manager
  4. Building a workflow
    1. Dragging in modules
    2. Connecting modules
    3. Setting parameter values
    4. Processing multiple inputs
    5. Enable/Disable parameters
    6. Saving a workflow
  5. Execution
    1. Executing a workflow
    2. Viewing output

1. Introduction

This Quick Start Guide to the LONI Pipeline covers the fundamentals of building a Pipeline. For a more detailed description of Pipeline features, please see the User Guide.

2. Installation

2.1 Requirements

The only requirement of the Pipeline client is an installation of JRE 1.6 or higher, which can be downloaded from Oracle. In terms of memory consumption, it’s unlikely that you’ll need to worry about having sufficient RAM to run the Pipeline.

2.2 Downloading

To get the latest version of the LONI Pipeline, go to the Pipeline web site and click on the download link in the navbar at the top. A LONI account is required to download LONI software; if you don’t have one, you can fill out an application for it.

2.3 Setup and launching

OS X: To install the program, double click the disk image file you downloaded, and drag the LONI Pipeline application into the Applications folder. Once the program is done copying you can unmount (eject) the disk image and throw it in the trash. To start the Pipeline, just go to your Applications folder and double-click on the LONI Pipeline application.

Windows: To install on Windows, double-click the installer and follow the on-screen instructions. Once it finishes installing, you can throw away the installer and launch the program from Start menu->Programs->LONI Pipeline.

Linux/Unix: Extract the contents of the file to a location on disk, and execute the PipelineGUI script. Make sure you have the java binary in your path.
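Before launching the client, it can help to confirm that a java binary is on the path and that it meets the JRE 1.6 requirement. The following Python sketch shows one way to do this; the helper names are hypothetical, and the version parsing assumes the usual `version "..."` line printed by `java -version`.

```python
import re
import shutil


def java_major_version(version_output):
    """Extract the major Java version from `java -version` output.

    Handles both the legacy "1.x" scheme (1.6 -> 6) and the newer
    scheme (11.0.2 -> 11). Returns None if no version string is found.
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    first, second = int(match.group(1)), match.group(2)
    if first == 1 and second is not None:
        return int(second)
    return first


def java_on_path():
    """Return the full path of the java binary, or None if it is not on PATH."""
    return shutil.which("java")
```

Note that `java -version` writes its report to the error stream, so a script that calls it needs to capture stderr rather than stdout before passing the text to `java_major_version`.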

3. Interface overview

3.1 Connecting to Pipeline servers

If you need to connect to different Pipeline servers, go to the ‘Window’ menu and click on ‘Connections…’. Alternatively, you can click on the disconnected circles at the bottom right of the window, and in the popup menu click on ‘Connections…’.

In here you can add a connection to any Pipeline server that you want to access. If you don’t know of any servers you can add the LONI Pipeline server (cranium.loni.usc.edu) but you will need to apply for a LONI cranium account to actually connect to it. Please note this account is different from the general LONI account. Once you’ve entered the connection, go ahead and click ‘Connect’ then close the dialog. After 30 seconds or so you’ll notice that your server library has been populated with tools from the server.

4. Building a workflow

Open a new workflow by going to File->New.

4.1 Dragging in modules

Go to the server library at the left and expand the desired package. Click on a module and drag it into the workflow canvas that you just opened. Repeat this step for all other modules that you need.

4.2 Connecting modules

Each module in a workflow can have some inputs and outputs. The inputs are on the top, and the outputs on the bottom. Connect the modules by clicking on the output parameter of a module and then dragging the mouse pointer to the following module’s input parameter.

Module connection

When you attempt to make a connection, the Pipeline does some initial checking to make sure the connection is valid. For example, it won’t let you connect a file type parameter to a number type parameter, or connect an output to another output.
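The kind of check described above can be sketched in a few lines of Python. The `Parameter` class and `can_connect` function are hypothetical illustrations, not part of the Pipeline; the real client may apply additional rules (for example, on file types).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameter:
    name: str
    direction: str  # "input" or "output"
    kind: str       # "File", "Directory", "String", "Number" or "Enumerated"


def can_connect(source, target):
    """Allow only output -> input connections between parameters of the same type."""
    if source.direction != "output" or target.direction != "input":
        return False
    return source.kind == target.kind
```

With this rule, a File output can feed a File input, but not a Number input or another output.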

4.3 Setting parameter values

Now, specify values for the input parameters of each module which do NOT have a connection to a previous module. Double click on the input parameter and select the input value, making sure to choose an input that correctly matches the parameter type (File, Directory, String, Number or Enumerated). Also, File parameters can require a specific file type, so make sure to check this too if necessary.

Once you’ve set the inputs, you’ll want to specify a destination for the output of the final module. Double-click on the output parameter and specify the path where you want the output(s) to be written to.

Note that you can mix data that is located on your computer and the computer that the server resides on, and the Pipeline will take care of moving data back and forth for you. For example, the input to a module could be located on your local drive, but you could set the output to be written to some location on the Pipeline server or vice versa.

4.4 Processing multiple inputs

One of the strengths of the LONI Pipeline is its ability to simplify processing of multiple pieces of data, by using the same workflow you use to process a single input. In order to do this, you can create a Data Source and use it to feed a list of inputs into the first module. Right click on any blank space in the workflow canvas and select ‘Add Data Source’. In the dialog that opens enter some information about the data source, and then click on the ‘Data’ tab. From here, you can click on ‘Add files’ at the bottom of the dialog and add multiple files into the list, or you can just type in the path to a file manually. Note that at the bottom there is an option for a server in case you want the data source to represent data on another computer.
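Conceptually, a Data Source with several entries makes the same module run once per entry. The Python sketch below illustrates that idea by building one command line per input; the executable path, the output naming scheme, and the function name are all assumptions made for this example and do not reflect how the Pipeline actually names or schedules its jobs.

```python
import shlex


def expand_data_source(executable, inputs, output_dir):
    """Build one command line per input file, as if the same module ran once per input."""
    commands = []
    for path in inputs:
        name = path.rsplit("/", 1)[-1]
        out_path = f"{output_dir}/{name}.out"
        # Quote each part so paths containing spaces stay intact.
        commands.append(" ".join(shlex.quote(part) for part in (executable, path, out_path)))
    return commands
```

Feeding two files into this function yields two independent commands, which is why a workflow built for a single input needs no changes to process a list.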

4.5 Enable/Disable parameters

Most modules have 2-3 required parameters on them, and several more optional parameters. If you want to exercise any of those additional options, simply double-click on the module and you’ll see a list of all the required and optional parameters for that module. For each additional option you want to use just click on the box on the left side of its name to enable it. Conversely, to disable it click on the box again. Notice that you are not able to disable parameters that are required.

4.6 Saving a workflow

In order to save a workflow, go to File->Save.

5. Execution

5.1 Executing a workflow

Once you’ve completed your workflow, you can execute the workflow by simply clicking on the ‘Play’ button at the bottom of the workflow area. If the program needs a connection to a server, it will prompt you for a username and password. If you’ve already stored a username and password to the server in your list of connections, then it will automatically connect for you.

Once all necessary connections have been made and validation has completed, the workflow will begin to execute.

5.2 Viewing output

As the modules continue executing you can view the output and error streams of any completed module. You can bring up the log viewer by going to Window->Log Viewer or, more easily, by right-clicking on the module that you want to view information about and clicking on ‘Show Output Logs’. This will bring up the log viewer and set its focus on the module that was clicked.

  1. Distributed Pipeline Server Installation Utility
    1. Requirements
    2. Warning
    3. Downloading
    4. GUI
      1. Start the Installer
      2. Select Components
      3. Install Grid Engine
      4. Install Pipeline
      5. Install Neuro Imaging Tools
      6. Install Bioinformatics Tools
      7. Finish Install
      8. Start the Server
    5. Command Line Installation
    6. Troubleshoot
  2. Conventional Installation (without DPS utility)
    1. Requirements
    2. Downloading
    3. Starting the server

2.1 Distributed Pipeline Server Installation Utility

The Distributed Pipeline Server Installer is a GUI installer that allows you to install and configure 3 types of resources – backend grid management resources (Grid Engine), the Pipeline server, and a number of computational imaging and informatics software tools. After successfully running the installer, you will have a running Pipeline server with grid engine managing jobs on your machine(s), imaging and informatics software tools installed, as well as a set of predefined workflows and modules in your server library.

2.1.1 Requirements

The requirements for the Pipeline server installation can be found on the Distributed Pipeline Server Installer page.

Warning: If any of the requirements are not met, there may be unexpected behavior in the installer (e.g. hanging, crashing). If you have any questions, please contact pipeline@loni.usc.edu

A complete installation (including grid engine, the Pipeline server, and all software tools) can take several hours. However, this is mostly because some of the tools take a long time to download (e.g. FSL can take up to 6 hours, depending on your internet speed). If you skip the tools or have already downloaded the ones that require manual download, the total installation time is less than 30 minutes.

2.1.2 Warning

When you run the DPS installation utility to install the Pipeline server, the underlying scripts will edit the firewall rules to open up the Pipeline port for connections from clients. Be forewarned that these changes can cause unexpected results on your system. We recommend backing up your iptables before starting the installation. In the future, this automatic configuration step will be made more robust.
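The recommended backup can be taken with the standard `iptables-save` command. The Python sketch below wraps that command; the function name and the destination path are hypothetical, and the `run` parameter exists only so the function can be tried out without touching a real firewall.

```python
import subprocess


def backup_iptables(dest_path, run=subprocess.run):
    """Save the current iptables rules to `dest_path` (must be run as root).

    `run` is injectable so the function can be exercised without
    touching the firewall.
    """
    result = run(["iptables-save"], check=True, capture_output=True, text=True)
    with open(dest_path, "w") as handle:
        handle.write(result.stdout)
    return dest_path
```

If the installer's changes cause problems, the saved rules can be loaded again as root with `iptables-restore`, which reads the rules file from standard input.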

2.1.3 Downloading

Download the installer from the Pipeline website, under Downloads > Distributed Pipeline Server Installer.

2.1.4 GUI

The graphical interface of the DPS utility simplifies the installation experience for the user by hiding unessential details and only asking the user for minimal configuration preferences. The steps are documented below and are accompanied by screenshots.

2.1.4.1 Start the Installer

To start the installer, open a terminal, change directories to the directory where the installer file is located, and type

su root
tar -zxvf pipelineServerInstaller.tar.gz
cd pipelineServerInstaller
./launchInstaller.sh

2.1.4.2 Select Components

After reading and agreeing to the license, you will be asked for an installation location and what components you want to install:

You can select any* or all of the components. The installer will guide you through all the steps needed for the components you select.

* For example, if you have already installed SGE before launching this installer, then deselect the Oracle Grid Engine component. Likewise, if you only want to install the latest tools, you can select the Neuro Imaging Tools component and uncheck the rest.

The installer will verify the Shared File System Location given. This location must be on NFS if the server is set to use a grid. The shared file system is used by the Pipeline server to store intermediate files of workflows and to install Grid Engine and Tools.

2.1.4.3 Install Grid Engine

In this section you can configure Grid Engine installation. You can specify an installation location, cluster name (which uniquely identifies a specific Grid Engine cluster), spool directory (for spooling data), and execution hosts (hosts that execute the tasks (jobs)). You can leave installation location, cluster name and spool directory as they are, but you must provide a list of hostnames. You must provide fully qualified domain names, so something like “host1”, “localhost” or “127.0.0.1” is not allowed.
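To avoid having the installer reject the host list, the hostnames can be checked beforehand. The following Python sketch applies the rule stated above (fully qualified names only; no short names, “localhost”, or IP addresses). The function name is hypothetical, and the installer's own check may differ in detail.

```python
import ipaddress


def is_acceptable_hostname(host):
    """Return True for fully qualified names such as "node1.example.org".

    Rejects IP addresses, "localhost" and single-label short names.
    """
    host = host.strip().rstrip(".")
    try:
        ipaddress.ip_address(host)
        return False  # a literal IP address is not a domain name
    except ValueError:
        pass
    labels = host.split(".")
    if len(labels) < 2 or labels[0].lower() == "localhost":
        return False
    return all(label and len(label) <= 63 for label in labels)
```

The sketch checks only the form of the name; it does not verify that the name resolves or that the host is reachable.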

2.1.4.4 Install Pipeline

In this section you can configure the Pipeline server. You can specify an installation directory, the Pipeline server address, the port, and the user that will run the Pipeline server process. The username must already exist, and you have the option to have the sudoers file modified to accommodate privilege escalation.

User authentication lets you specify the authentication mechanism for the Pipeline server. If you already have NIS configured (there are plenty of online resources on configuring an NIS server and client), it’s recommended to select the NIS option. Otherwise, you can select the SSH Based option, which runs the ssh command to test the provided credentials. You can also choose No Authentication to let anybody connect to your server. This option should only be used for testing and on a server with limited internal network access.

If the modify sudoers file option is selected, the installer will modify the operating system’s sudoers file so that the Pipeline server user will be able to sudo as any user, except root and the optional list of users provided. For example, if you have some user that can sudo as root, then this user should be listed as an exception, so that the Pipeline user will not be able to gain root access.

Install Pipeline with SGE already installed

If you already have SGE installed and the SGE_ROOT variable is defined on your system, you can skip SGE installation by unchecking the Oracle Grid Engine checkbox from step 3 (General Configuration). The Pipeline configuration window will now have an additional checkbox to “Enable Grid Submission” which needs to be selected if you want to use Pipeline with your pre-installed SGE.

Upon checking the “Enable Grid Submission” checkbox, you will need to select a grid plugin. In order to communicate with SGE, Pipeline uses Grid Plugins. LONI provides two plugins for SGE: the JGDI Plugin and the DRMAA Plugin. If you are using SGE, we highly recommend the JGDI Plugin, as it supports more Pipeline features and is more reliable. You can choose the DRMAA Plugin if you have another DRMAA-compatible grid manager installed and want to integrate Pipeline with it.

The last step is to choose the submission queue. The installer will list all of your available queues and you have to pick one for Pipeline. If you don’t have a special queue already set up for Pipeline then you can use the default queue of SGE (all.q). If you do not have any queues defined in SGE, you will have to create one yourself.

Installing Pipeline without SGE

If you don’t have SGE installed, and you uncheck the Oracle Grid Engine checkbox from step 3 (General Configuration), the installer will install Pipeline without Grid Engine. All jobs submitted to the Pipeline server will run locally on the server. You have to be careful with the number of jobs submitted to the server, as a high number of jobs will negatively affect the server’s performance. Please see Maximum number of threads for active jobs if you want to set limits on the number of parallel running jobs.

2.1.4.5 Install Neuro Imaging Tools

In this section you can select which imaging software tools and server library files to install.

There are two components that can be selected for each NeuroImaging tool:
     • the tool itself (binaries, executables, and scripts)
     • the modules/workflows (.pipe files) associated with that tool.

You may select either or both options for any tool, but please note that you can only install workflows for tools that are already installed or that you have selected to install. If you select to install the workflows for a tool but not the tool itself, and the tool cannot be found in the default installation directory (shared file system path + “tools”), then you will be prompted to provide the location where that tool is installed. If you find yourself here by mistake, click back and modify your selection.

If the installation type for a tool is “Automatic”, it will be installed automatically without the need for user input. Some tools are marked as “Semi-Automatic” (e.g. FSL and FreeSurfer), which means that they require you to manually download the installer files for that tool from the developer’s website. This is because of the licensing restriction imposed on the software. For these types of tools, you will be shown a window after clicking ‘Next’ which contains instructions on what website to visit, which files to install, and any other requirements for that tool.

When you satisfy all the requirements for a tool, it will begin installing in the background immediately. A green check mark will appear next to that tool in the drop menu, indicating that you have provided the necessary information and can move on to the next tool. You may preemptively cancel the installation of a tool by clicking ‘Don’t install’ at the bottom of the window. When all tools are either installing or cancelled, this window will close automatically.

Install the tools without installing Pipeline or SGE

If, at a later time, you want to install updated versions of some tools, you can install them without installing the Pipeline and/or SGE. Simply check only the NeuroImaging Tools in the general configuration section of the installer, then click Next and it will go directly to the tools installation step, skipping the Pipeline and SGE installation steps.

Please note that NeuroImaging tools can only be installed if you also selected to install the Pipeline Server or already have the server installed. If you select to install these tools without selecting to install the server, and the preferences.xml file cannot be found in its default location, a browse button will appear so that the location to your preferences.xml file can be provided. If you don’t have a preferences file, it means you have not installed the server yet and it should be selected during the installation process.

2.1.4.6 Install Bioinformatics Tools

The process for installing Bioinformatics Tools is the same as NeuroImaging (outlined in the previous step) except there are currently no “Semi-Automatic” tools in this section. Note this is the final step before the Pipeline installation utility takes over and starts to download/install files so only hit ‘Install’ if you are sure that all of your previous settings are correct.

Install the tools without installing Pipeline or SGE

Just as with NeuroImaging tools, Bioinformatics tools can only be installed if you also selected to install the Pipeline Server or already have the server installed. If you select to install these tools without selecting to install the server, and the preferences.xml file cannot be found in its default location, a browse button will appear so that the location to your preferences.xml file can be provided. If you selected NeuroImaging tools as well and already indicated the path to your preferences file in the NI Tools Configuration panel, you will not see this button.

2.1.4.7 Finish Install

After the installation has successfully completed, you will be shown a summary screen. Clicking the Finish button with “Start the LONI Pipeline Server” checked will exit the installer and launch the Pipeline server. You can also check the “Start Client to validate the installation” option to launch the client and test run a workflow.

Additionally, you may want to configure advanced server preferences by clicking on “Configure the server with advanced options…”. This will automatically open the server configuration tool, where you can edit the details of your server.

If you have any questions, please contact pipeline@loni.usc.edu

2.1.4.8 Start the Server

If you checked the “Start the LONI Pipeline Server” option on the summary page of the installation, the Pipeline server process will be started. To check the logs of the Pipeline server, go to the Pipeline server’s directory (/usr/pipeline by default), specified in the Install Pipeline step. You will find files called outputStream.log and errorStream.log, which store output and error stream information. You can verify if the server started successfully by checking the contents of the outputStream.log file. It should look something like this:

[ 1/6 ] Connecting to Persistence Database..............DONE [117ms]
[ 2/6 ] Starting server on port 8001....................DONE [1152ms]
[ 3/6 ] Loading server library..........................DONE [31ms]
[ 4/6 ] Loading server packages info....................DONE [7ms]
[ 5/6 ] Checking to resume backlogged workflows.........DONE [0ms]
[ 6/6 ] Checking to resume active workflows.............DONE [0ms]
[ SUCCESS ] Server started.
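The check described above can be automated by scanning outputStream.log for the step lines and the final success marker. The Python sketch below assumes the log format shown in the sample; the function name is hypothetical, and a real log may contain additional lines, which this parser simply ignores.

```python
import re

# Matches lines such as "[ 2/6 ] Starting server on port 8001....DONE [1152ms]".
STEP_PATTERN = re.compile(r"^\[\s*(\d+)/(\d+)\s*\]\s*(.*?)\.+(\w+)\s*\[(\d+)ms\]\s*$")


def parse_startup_log(text):
    """Return (steps, started), where steps is a list of (description, status) pairs."""
    steps = []
    started = False
    for line in text.splitlines():
        match = STEP_PATTERN.match(line.strip())
        if match:
            steps.append((match.group(3), match.group(4)))
        elif line.strip().startswith("[ SUCCESS ]"):
            started = True
    return steps, started
```

A caller could read the file from the Pipeline server's directory, pass its contents to `parse_startup_log`, and treat a missing success marker as a sign to look at errorStream.log.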

You can stop and start the Pipeline server by calling (root access required):

/etc/init.d/pipeline stop
/etc/init.d/pipeline start

The Pipeline and persistence database will be started/stopped in order, and the pipeline user will run these processes.

If you don’t have root access, you can stop and start the Pipeline server as the pipeline user. It will be equivalent to the init.d method above. To stop and start the Pipeline server, go to the Pipeline server’s directory and type

./killServer.sh
./launchServer.sh

Always check whether the server has started successfully by viewing the outputStream.log file. If it shows an error related to the persistence database, you can stop and start the persistence database process by typing:

./db/stopDB.sh
./db/startDB.sh

After the persistence database has been restarted, restart the Pipeline server as noted above.

2.1.5 Command Line Installation

An alternative to using the GUI to install the Pipeline server is an automated method that relies on a configuration file. All of the fields that are entered via the GUI are represented within a hierarchical XML file. A default configuration file is included in the distribution directory (dist/install_files) of the installer, which you can download from the Pipeline website. After you set up your configuration file, you can run the installation in automatic mode by typing the following into your shell:

tar -zxvf pipelineServerInstaller.tar.gz
cd pipelineServerInstaller
./launchInstaller.sh -auto dist/install_files/DefaultInstallationPreferencesFile.xml

A complete template for the XML file can be found here. If you use this template as a starting point, note that it has a lot of placeholders and is not set up to run “as is”, so you would have to make many modifications. For reference, each of the tags is documented below:

  • DistributedPipelineServerInstaller: root tag, contains all other tags
  • SharedFileSystemPath: path to a directory that is shared (via NFS) between the host running the Pipeline server and qmaster, admin, and execution hosts of SGE
  • JDKLocation: only include this tag if you don’t already have Oracle JDK running on the host where you’re installing the Pipeline server; the value should be the path to the JDK RPM, which you can install from the Oracle page
  • PipelineServer: use attribute enabled="true" to indicate that you would like to install the Pipeline server; the children of this element will specify information about the server installation
    • InstallLocation: specifies location where Pipeline server is to be installed
    • Hostname: specifies the hostname of the host where Pipeline server is being installed
    • Port: specifies port on which the Pipeline server will be accepting connections from clients
    • Username: specifies user that will be running the Pipeline server
    • TempDir: specifies a directory where Pipeline modules will write intermediate files
    • ScratchDir: specifies a scratch directory where sample workflows will write their outputs; this value then becomes available to users through the pre-defined ${tempdir} variable, documented here
    • GridSubmission: use attribute enabled="true" to indicate that you would like the Pipeline to submit jobs via grid engine to execution hosts; otherwise, the jobs will be run locally on the host running the Pipeline server
      • GridPlugin: options are JGDI or DRMAA
      • GridSubmissionQueue: the SGE queue where Pipeline should submit its jobs
    • UsePrivilegeEscalation: options are true or false; privilege escalation is documented here
    • DBInstallLocation: path to a directory where you would like to install the Pipeline database; if it doesn’t exist, it will be created by the installer
    • StartPipelilneOnSystemStartup: set value to true if you would like to configure the system to start the Pipeline server on startup; false, otherwise
    • AuthenticationModule: options are SSH, NIS, and NoAuth; these are documented here
    • ModifySudoers: use attribute enabled="true" to indicate that you want to add the Pipeline user to the sudoers list
      • SuperUsers: comma-separated list of users that you don’t want the Pipeline server to sudo as (default: root)
    • MemoryAllocation: specify the amount of memory you would like to allocate to the Pipeline server/database, in megabytes
  • PreferencesPath: if the Pipeline server is not being installed (i.e., the PipelineServer element is missing or has attribute enabled="false"), then the user must specify the path to the Pipeline server preferences file (by default, the path is /usr/pipeline/preferences.xml); if the Pipeline server is being installed, you can omit this element.
  • SGE: use attribute enabled="true" to indicate that you would like to install Son of Grid Engine; the tags that follow will describe some of the preferences for the installation; you can find documentation on SGE here
    • SGERoot: path to directory where you would like to install SGE (default: /usr/local/sge)
    • SGECluster: name of cluster that you would like to install (default: cluster)
    • SubmitHosts: specify hostnames of machines which will be configured to handle job submission and control; you can do this using one hostname per Host element, as children of the SubmitHosts element
    • ExecHosts: specify hostnames of machines which will be execution hosts; use same format as for SubmitHosts
    • AdminHosts: specify hostnames of machines that will be used for SGE administration purposes; use same format as for SubmitHosts
    • AdminUsername: user that will serve as SGE administrator
    • SpoolDir: path to a directory that will be used for spooling during installation
    • Queue: use attribute configure="true" to indicate that you would like to configure a queue at the end of SGE installation; this is documented here
      • Name: the name of the new queue that you would like to configure
      • Hosts: the hosts that you would like to add to the queue
      • Slots: the slots that you would like to add to the queue (the difference between hosts and slots is documented here)
  • Tools: use the attribute enabled="true" to indicate that you would like to install some tools; also use the path attribute to specify the directory where you would like to install the tools (note that this should be in an NFS-shared directory)
    • NeuroImagingTools: use the attribute enabled="true" to indicate that you would like to install one or more NeuroImaging tools; true/false values for the all_executables and all_serverlibs tags indicate that you want to install the executables and/or .pipe files for all NeuroImaging tools, regardless of what values each tool is set to.
      • Available neuroimaging tools: AFNI, AIR, BrainSuite, FSL, FreeSurfer, LONI, MINC, ITK, DTK, GAMMA, and SPM; for each of these, the executables="true" attribute is used to activate the tool installation and the serverlib="true" attribute is used to activate the .pipe files for that tool; note that FSL, FreeSurfer, and DTK require that the user specify a sub element, namely ArchivePath, whose value is the path to the archive file, downloaded manually from the software website.
    • BioinformaticsTools: same attributes as NeuroImagingTools tag
      • Available bioinformatics tools: EMBOSS, Picard, MSA, BATWING, BayesAss, Formatomatic, GENEPOP, Migrate, GWASS, MrFAST, Bowtie, SamTools, PLINK, MAQ, miBLAST; again, the enabled attributes can be used to indicate activation or deactivation of installation for each of these elements
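To make the nesting of the tags listed above more concrete, the following Python sketch builds a small server-only configuration using the tag names from the list. It is an illustration under assumptions, not a file the installer is guaranteed to accept: the function name and the example values are hypothetical, several documented elements are omitted, and using enabled="false" on GridSubmission, SGE and Tools is an assumption based on how the PipelineServer attribute is described. The template shipped with the installer remains the authoritative reference.

```python
import xml.etree.ElementTree as ET


def build_minimal_config(shared_path, hostname, port, username):
    """Build a server-only configuration using tag names from the list above."""
    root = ET.Element("DistributedPipelineServerInstaller")
    ET.SubElement(root, "SharedFileSystemPath").text = shared_path

    server = ET.SubElement(root, "PipelineServer", enabled="true")
    ET.SubElement(server, "InstallLocation").text = "/usr/pipeline"
    ET.SubElement(server, "Hostname").text = hostname
    ET.SubElement(server, "Port").text = str(port)
    ET.SubElement(server, "Username").text = username
    # Assumption: enabled="false" keeps jobs local instead of submitting to a grid.
    ET.SubElement(server, "GridSubmission", enabled="false")
    ET.SubElement(server, "AuthenticationModule").text = "SSH"

    # Assumption: enabled="false" skips the Grid Engine and tool installations.
    ET.SubElement(root, "SGE", enabled="false")
    ET.SubElement(root, "Tools", enabled="false")
    return ET.tostring(root, encoding="unicode")
```

Generating the file from a script like this can be convenient when the same configuration has to be produced for several hosts with only the hostname or port changing.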

2.1.6 Troubleshoot

The following is a list of common problems and their explanations:

– The provided directory seems not to be a network file shared (NFS) directory.
The installer verifies the shared file system location that you provide. This location must be on NFS if the server is set to use a grid. The Pipeline server uses the shared file system to store intermediate files of workflows, and the installer uses it to install Grid Engine, NeuroImaging, and Bioinformatics Tools.

– For a Grid Engine installation, the local hostname cannot be “localhost” and the IP address cannot be of the form 127.0.*.*
You must provide fully qualified domain names as hostnames (such as “host1”); “localhost” and “127.0.0.1” are not allowed.
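The rule above can be checked before running the installer. The following is a minimal sketch, not part of the installer itself: the function names is_loopback_name and check_host are hypothetical, and the sketch assumes that any address beginning with 127. should be treated as loopback.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Pipeline installer).
# Succeeds when the argument is "localhost", a localhost.* name,
# or an address starting with 127. (assumed here to mean loopback).
is_loopback_name() {
    case "$1" in
        localhost|localhost.*|127.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print a one-line verdict for a hostname or IP address.
check_host() {
    if is_loopback_name "$1"; then
        echo "rejected: $1"
    else
        echo "ok: $1"
    fi
}

# Check the name this machine reports for itself.
check_host "$(uname -n)"
```

A "rejected" verdict suggests the machine's hostname should be changed before a Grid Engine installation is attempted. An "ok" verdict only means the name is not a loopback name; it does not confirm that other hosts can resolve it.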

– Cannot enable Grid submission as SGE doesn’t have any queue.
If you do not have any queue defined in SGE, you have to create one yourself, then recheck the “Enable Grid submission” checkbox and select the queue.

– Why can’t I connect to the server?
If the Pipeline server is running but your client cannot connect to it (it shows a “Server not found” message), check your firewall settings and make sure that port 8001 is open.
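One way to test whether the port is reachable is sketched below. It assumes that bash (with its /dev/tcp redirection feature) and the coreutils timeout command are available; port_status is a hypothetical helper name, and the default of localhost is only a placeholder for your server's hostname.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a TCP connection to host:port succeeds.
# Relies on bash's /dev/tcp redirection; gives up after 5 seconds.
port_status() {
    host=$1
    port=$2
    if timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null; then
        echo "open"
    else
        echo "closed or filtered"
    fi
}

# Usage: pass the server hostname and port; defaults to localhost and 8001.
port_status "${1:-localhost}" "${2:-8001}"
```

Run from the client machine with the server's hostname as the first argument. A result of "closed or filtered" while the server is running points to a firewall between the two machines.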

– Why is my first workflow taking so long?
When you have SGE installed and you submit jobs for the first time, it may take a long time for the jobs to start running. This is because SGE initially sees the compute nodes as heavily loaded; as time passes, the load information is updated and becomes more accurate.

2.2 Conventional Installation (without DPS utility)

If you’d like to install the Pipeline server by hand, here are some instructions on how to get started. Note that if you choose this route, you’ll have to carry out quite a bit of configuration on your own. This is only recommended if you’ve done it before or have a thorough understanding of the inner workings of the Pipeline server. Otherwise, use the DPS utility.

2.2.1 Requirements

The Pipeline server can run on any system that is supported by JRE 1.6 or higher, so the first thing to do is head over to the official Java website to download the latest JRE/JDK. If you run the server on Windows, you will not be able to use privilege escalation (you might not even need or want it). Also, the Failover feature is supported only on Unix/Linux systems. All other features are available on all platforms.
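To confirm that an installed Java meets the 1.6 minimum, the version string reported by java -version can be parsed. This is a rough sketch: java_feature_version and meets_pipeline_requirement are hypothetical helper names, and it assumes the usual version formats, "1.x.y" for older releases and "x.y.z" for newer ones.

```shell
#!/bin/sh
# Hypothetical helper: print the feature release number of a Java version string.
# "1.6.0_45" gives 6, "11.0.2" gives 11.
java_feature_version() {
    major=${1%%.*}
    major=${major%%[!0-9]*}
    if [ "$major" = "1" ]; then
        rest=${1#*.}
        minor=${rest%%.*}
        echo "${minor%%[!0-9]*}"
    else
        echo "$major"
    fi
}

# Succeeds when the version string is 1.6 or newer.
meets_pipeline_requirement() {
    [ "$(java_feature_version "$1")" -ge 6 ]
}

# Report on the java found in PATH, if any (java -version writes to stderr).
if command -v java >/dev/null 2>&1; then
    installed=$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)
    echo "Detected Java version: $installed"
else
    echo "java not found in PATH"
fi
```

The parsing is kept separate from the call to java so that meets_pipeline_requirement can be applied to any version string, for example one collected from a remote host.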

The amount of memory required varies based on the load you will expect on the server, but for a reference point, as of summer 2010, the main Pipeline server running at LONI has been set to accept a max load of 620 jobs, and its memory footprint hovers between 50-300MB depending on the load and garbage collection scheme.

2.2.2 Downloading

Head over to the Pipeline download page and download the latest version of the program for Linux/Unix. The server and the client are both in the same jar file, so you only need to change the Main entry point when starting up the server. Extract the contents of the download to the location where you want to install the server.

2.2.3 Starting the server

Now let’s start the server for the first time. Get to a prompt and switch to the directory where you copied the Pipeline.jar and lib directory and type:

$ java -classpath Pipeline.jar server.Main

Assuming you have java in your path, you should see the following output in your terminal window:

[ 1/6 ] Connecting to Persistence Database..............DONE [61ms]
[ 2/6 ] Starting server on port 8001....................DONE [747ms]
[ 3/6 ] Loading server library..........................DONE [336ms]
[ 4/6 ] Loading server packages info....................DONE [2ms]
[ 5/6 ] Checking to resume backlogged workflows.........DONE [46ms]
[ 6/6 ] Checking to resume active workflows.............DONE [0ms]
[ SUCCESS ] Server started.

That’s not enough to have a fully functional server yet, but we’re a step closer, so go ahead and break out of the process by hitting Ctrl-C, and then let’s begin the configuration process.
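For repeated launches, the start command shown earlier can be wrapped in a small pre-flight check. This is only a sketch: missing_server_files and start_server are hypothetical helper names, and the only assumption taken from the text is that Pipeline.jar and the lib directory sit side by side in the install directory.

```shell
#!/bin/sh
# Hypothetical helper: print each required item missing from an install
# directory, one per line; print nothing when the directory looks complete.
missing_server_files() {
    dir=$1
    [ -f "$dir/Pipeline.jar" ] || echo "Pipeline.jar"
    [ -d "$dir/lib" ] || echo "lib"
}

# Hypothetical helper: start the server from the given directory
# (default: the current directory), but only if nothing is missing.
start_server() {
    dir=${1:-.}
    missing=$(missing_server_files "$dir" | tr '\n' ' ')
    if [ -n "$missing" ]; then
        echo "cannot start, missing in $dir: $missing" >&2
        return 1
    fi
    (cd "$dir" && java -classpath Pipeline.jar server.Main)
}

# Example: start_server /path/to/pipeline   (placeholder path)
```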


Download Pipeline handbook, V1.5 (.pdf, 12.8 MB), based on Pipeline version 6.2.1. Download Pipeline brochure (.pdf, 8.5 MB).

  1. Requirements
  2. Downloading
  3. Setup and launching

2.1 Requirements

The only requirement of the Pipeline client is an installation of JRE 1.6 or higher, which can be downloaded from Oracle. Note to Linux users: your system may have java installed by default, but it may not be Oracle’s version. To check which version of java you are running, open a terminal and type java -version. If the output does not include something like “Java HotSpot(TM)”, then you need to download and install Java from Oracle.
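The HotSpot check described above can be scripted as follows. This is a minimal sketch; is_hotspot is a hypothetical helper name, and it looks only for the literal text quoted in the paragraph above.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the given `java -version` text
# contains the literal string "Java HotSpot(TM)".
is_hotspot() {
    printf '%s\n' "$1" | grep -qF 'Java HotSpot(TM)'
}

# Inspect the java found in PATH, if any (java -version writes to stderr).
if command -v java >/dev/null 2>&1; then
    if is_hotspot "$(java -version 2>&1)"; then
        echo "HotSpot JVM detected"
    else
        echo "java is installed, but it does not report HotSpot"
    fi
else
    echo "java not found in PATH"
fi
```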

In terms of memory consumption, it’s unlikely that you’ll need to worry about having sufficient RAM to run the Pipeline.

2.2 Downloading

To get the latest version of the LONI Pipeline, go to the Pipeline web site and click on the download link in the navbar at the top.

2.3 Setup and launching

OS X: To install the program, double click the disk image file you downloaded, and drag the LONI Pipeline application into the Applications folder. Once the program is done copying you can unmount (eject) the disk image and throw it in the trash. To start the Pipeline, just go to your Applications folder and double-click on the LONI Pipeline application.

Windows: To install on Windows, double-click the installer and follow the on-screen instructions. Once it finishes installing, you can throw away the installer and launch the program from Start menu->Programs->LONI Pipeline.

Linux/Unix: Extract the contents of the file to a location on disk, and execute the PipelineGUI script. Make sure you have the java binary in your path.


Table of Contents

  1. Introduction
  2. Installation
    1. Requirements
    2. Downloading
    3. Setup and launching
  3. Interface overview
    1. Server library
    2. Personal library
    3. Workflow area
    4. Connection manager
    5. Provenance editor
    6. Preferences
    7. Search feature
    8. Checking for latest updates
    9. Starting GUI from command line
    10. Running from the command line
      1. Submitting workflows from command line
      2. Managing workflows from command line
  4. Building a workflow
    1. Dragging in modules
    2. Connecting modules
      1. Smartline
    3. Setting parameter values
    4. Data sources and data sinks
    5. Cloud sources and cloud sinks
    6. Study module
      1. Input data tab
      2. Grouping tab
      3. Matrix tab
    7. Conditionals
      1. File conditions example
      2. Arithmetical/Comparison example
      3. Metadata conditions example
    8. Web service modules
    9. Transformer modules
    10. Remote file browser
    11. Processing multiple inputs
    12. Enable/Disable parameters
    13. Annotations
    14. Variables
    15. IDA
    16. NDAR
    17. XNAT
    18. Cloud storage
    19. Server changer
  5. Execution
    1. Validation
    2. Executing a workflow
    3. Client disconnect/reconnect
    4. Server status
    5. Pausing a workflow
    6. Stopping a workflow
    7. Restart a module
    8. Module statuses
    9. Viewing output
    10. Debugging execution
    11. Report a bug
  6. Creating modules
    1. Module definition
      1. Info tab
        1. General module information
        2. Citation information
      2. Parameters tab
        1. General parameter information
        2. Parameter types
        3. File types
        4. Parameter arguments size
        5. Advanced parameter information
          1. Select dependencies
          2. Transformations
          3. Output/Error stream extraction
          4. Metadata extraction
          5. Output list file
      3. Execution tab
        1. Executable location
        2. Advanced options
      4. Metadata tab – Metadata Augmentation
    2. Alternative methods
      1. From help file
      2. Module Suggest
    3. Module groups
  7. Advanced Topics
    1. Syncing Execution Flow
    2. Exporting Pipeline Workflow to Script
    3. Remote GUI Invocation
    4. Workflow Diff Utility

Table of Contents

  1. Introduction
  2. Installation
    1. GUI Installer (Distributed Pipeline Server Installer)
      1. Requirements
      2. Downloading
      3. Start the Installer
        1. Select Components
        2. Install Grid Engine
        3. Install Pipeline
        4. Install Tools
        5. Finish Install
        6. Start the Server
      4. Troubleshoot
    2. Command Line Installation
      1. Requirements
      2. Downloading
      3. Starting the server
  3. Configuration
    1. General
      1. Hostname and port
      2. Temporary directory
      3. Scratch directory
      4. Log file location
      5. Use privilege escalation and Enable guests
      6. Persistence
      7. Days to persist session status
      8. History directory
      9. Crawler persistence URL
      10. Server library
    2. Grid
      1. Grid engine native specification
      2. Job name prefix
      3. Grid complex resource attributes
      4. Grid Variables Policy
      5. Max number of parallel submission threads
      6. Max number of resubmissions for “error stated” jobs
      7. Grid total slots
      8. Array jobs
      9. Grid plugin
      10. Finished job retrieval method
      11. Pipeline user is a grid engine admin
      12. Grid job accounting
    3. Access
      1. Server admins
      2. Directory access control
      3. User management
      4. Workflow management
    4. Mappings
      1. Packages
      2. Executables
      3. Utilities
    5. Advanced
      1. Failover
      2. Log email
      3. Network
      4. Maximum number of threads for active jobs
      5. HTTP query server
      6. Automatically clean up old files
      7. Maximum number of metadata threads
      8. Warn when free disk space is low
      9. Server status
      10. Directory source recursive timeout
      11. External network access queue
      12. Validation warning
      13. Check and verify output files
      14. Test server library
  4. Authentication
    1. Authentication Quickstart
  5. Monitor and Manage
  6. Grid Plugin API
  7. Grid Stat Plugin API