Installation

As the pipeline relies on several components of a standard Linux distribution, it only works on Linux systems, preferably with as much RAM and hard drive space available as possible. It can easily consume RAM in excess of 12GB and will probably require at least 3-4 times as much storage space as the untranslated nucleotide database you are interested in searching. Being a Python program, the pipeline naturally requires Python to run (it should be included in any standard Linux distribution). The pipeline was developed with Python 2.4.3 (#1, Jun 11 2009, 14:09:37) and has also been tested with Python 2.6.6; it should work with versions up to 2.7.x, but this I cannot guarantee :).
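
You can check which Python interpreter is on your PATH, and which version it is, by running:

$ python -V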

The pipeline is distributed as a single gzipped tar archive and is ‘installed’ simply by extracting its contents to a directory of your choice. It can be convenient to add this directory to your PATH, or to create a symbolic link to the program in your ~/bin directory. The following files are included in the archive:

qnrpipeline.py   Contains the pipeline, i.e. the "program"
fluff.py         Helper module with functions for the pipeline
README.pdf       This document in PDF format
README.txt       This document in text format
tutorial/        A directory containing files for the tutorial example
hmm/             A directory containing the HMM of known PMQR sequences

Here is an example of how to ‘install’ the pipeline:

$ tar -xf qnrpipeline-0.8067.tar.gz
$ ln -s qnrpipeline-0.8067/qnrpipeline.py ~/bin/qnrpipeline.py
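
If you prefer adding the directory to your PATH instead of creating a symbolic link, something like the following lines in your ~/.bash_profile should do it (adjust the path to wherever you extracted the archive):

PATH=$PATH:/path/to/qnrpipeline-0.8067
export PATH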

In some cases it might be necessary to make the program executable by running:

$ chmod +x qnrpipeline-0.8067/qnrpipeline.py

Required software

Make sure that the following programs are installed and available on PATH:

  • HMMER version 3.0 or above,
  • NCBI BLASTclust version 2.2.23 or above (from the legacy BLAST package),
  • MAFFT version 6.811b or above,
  • cdbfasta version 0.99 or above.

Please make sure that these are available and working as expected in your environment. To prepare the database for use with the pipeline (e.g. translating it into amino-acid sequence), the EMBOSS suite can be used.
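
A quick way to verify this is to ask the shell where each program lives; note that the binary names used here (hmmsearch for HMMER, blastclust, mafft and cdbfasta) are what I would expect on a typical installation and may differ slightly on your system:

$ which hmmsearch blastclust mafft cdbfasta

Each program that is found prints its full path; anything that comes back empty needs to be installed or added to your PATH. As a sketch of the database preparation step, the transeq program from EMBOSS can translate a nucleotide FASTA file into all six reading frames (the file names here are just placeholders):

$ transeq -sequence database.nt.fasta -outseq database.aa.fasta -frame 6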

Make sure that the legacy BLAST suite is installed properly; in particular, take care to ensure that the BLASTMAT environment variable is defined, as BLASTclust needs it to run. This can be tested by running:

$ echo $BLASTMAT

The output should be the path to where the BLAST substitution matrices are stored. If it is not set, add the following to your ~/.bash_profile (make sure to enter the correct path for your system):

BLASTMAT=/path/to/blast-2.2.25/data/
export BLASTMAT

It is also possible to set the variable directly on the command line, which applies to the current shell session only.
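
For example, to set it for the current session only (reusing the example path from above):

$ export BLASTMAT=/path/to/blast-2.2.25/data/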
