PASTA tutorial

Configuration

The behavior of PASTA is controlled by a large number of command-line arguments. To make the program easier to use, default values for these arguments can be written to a configuration file, so that they do not need to be specified on the command line. This is especially useful for options that rarely change, such as the installation paths of external programs. This section describes how to modify the supplied example configuration file to suit your environment.

We will assume you downloaded the software and uncompressed it into a directory $PASTA_HOME. Go to the $PASTA_HOME/doc/ subdirectory and edit the configuration file conf.txt with a text editor (e.g., vi, nano). Each line in the configuration file specifies the default value for one command-line option, using the format name=value, where name is the command-line argument without the leading ‘-’ character. For example, the command-line argument -o mm9 can be specified in the configuration file as o=mm9.
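The name=value format maps directly onto command-line arguments. The Python sketch below illustrates that mapping. It is not PASTA's own parser: the function names parse_conf and to_cli_args are hypothetical, and skipping blank lines and lines starting with '#' is an assumption, since this tutorial does not say whether PASTA accepts comments.

```python
def parse_conf(text):
    """Parse PASTA-style name=value lines into a dict (illustrative sketch).

    Assumptions (not from the PASTA docs): blank lines and lines starting
    with '#' are skipped, and only the first '=' separates name from value,
    so a value may itself contain '='.
    """
    options = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if "=" not in line:
            raise ValueError(f"line {lineno}: expected name=value, got {raw!r}")
        name, value = line.split("=", 1)
        options[name.strip()] = value.strip()
    return options


def to_cli_args(options):
    """Restore the leading '-': {'o': 'mm9'} becomes ['-o', 'mm9']."""
    args = []
    for name, value in options.items():
        args += [f"-{name}", value]
    return args


if __name__ == "__main__":
    sample = """
i=/home/user/bowtieindex/mouse/mm9
s=/home/user/software/bowtie-1.0.0/bowtie
o=mm9
r=/home/user/genomes
"""
    opts = parse_conf(sample)
    print(opts["o"])                       # mm9
    print(to_cli_args({"o": opts["o"]}))   # ['-o', 'mm9']
```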

At a minimum, you should set the following options:

i=/home/user/bowtieindex/mouse/mm9
s=/home/user/software/bowtie-1.0.0/bowtie
o=mm9
r=/home/user/genomes

These options are explained in detail below:

  1. The i option (corresponding to the -i command-line argument) specifies the basename of the bowtie index files. In this example, the index files have the prefix mm9 and are stored in the directory /home/user/bowtieindex/mouse/, whose contents may look like this:

    mm9.1.ebwt mm9.2.ebwt mm9.3.ebwt mm9.4.ebwt mm9.rev.1.ebwt mm9.rev.2.ebwt

  2. The s option (corresponding to the -s command-line argument) specifies the path of the bowtie executable. In this example, the bowtie executable is /home/user/software/bowtie-1.0.0/bowtie.
  3. The o option (corresponding to the -o command-line argument) specifies the organism of interest. This will usually be a UCSC genome identifier such as ‘hg19’ or ‘mm9’.
  4. The r option specifies the ‘parent’ directory containing the reference sequences for the various organisms used by PASTA. This directory should contain a subdirectory with the same name as the value of the o option, which in turn contains one FASTA file for each chromosome. In this example, the /home/user/genomes/mm9 directory should contain:

    chr1.fa chr2.fa chr3.fa chr4.fa chr5.fa chr6.fa chr7.fa chr8.fa chr9.fa chr10.fa chr11.fa chr12.fa chr13.fa chr14.fa chr15.fa chr16.fa chr17.fa chr18.fa chr19.fa chrX.fa chrY.fa chrM.fa
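Before running PASTA, it can help to confirm that the four options above point at what PASTA expects. The Python sketch below is a hypothetical helper (check_pasta_conf is not part of PASTA) that takes the option values as a dict such as {"i": ..., "s": ..., "o": ..., "r": ...} and returns a list of problems. It checks only for the six index files shown in the example listing and for at least one chr*.fa file; it does not verify that every chromosome is present.

```python
import os
from pathlib import Path

# The six bowtie index files from the example listing,
# e.g. mm9.1.ebwt ... mm9.rev.2.ebwt for the basename .../mm9.
EBWT_SUFFIXES = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                 ".rev.1.ebwt", ".rev.2.ebwt"]


def check_pasta_conf(options):
    """Return human-readable problems (an empty list if all checks pass).

    Hypothetical helper, not part of PASTA. `options` maps the option
    names i, s, o and r to the values set in conf.txt.
    """
    problems = []

    # i: basename of the bowtie index; each index file is basename + suffix.
    index = options.get("i")
    if not index:
        problems.append("option i (bowtie index basename) is not set")
    else:
        for suffix in EBWT_SUFFIXES:
            if not Path(index + suffix).is_file():
                problems.append(f"missing bowtie index file {index + suffix}")

    # s: path of the bowtie executable; it must exist and be executable.
    bowtie = options.get("s")
    if not bowtie:
        problems.append("option s (bowtie executable) is not set")
    elif not (Path(bowtie).is_file() and os.access(bowtie, os.X_OK)):
        problems.append(f"bowtie not found or not executable: {bowtie}")

    # r and o: <r>/<o>/ must be a directory holding per-chromosome FASTA files.
    ref, org = options.get("r"), options.get("o")
    if not ref or not org:
        problems.append("options r (reference directory) and o (organism) must both be set")
    else:
        genome_dir = Path(ref) / org
        if not genome_dir.is_dir():
            problems.append(f"reference directory not found: {genome_dir}")
        elif not any(genome_dir.glob("chr*.fa")):
            problems.append(f"no chr*.fa files in {genome_dir}")

    return problems
```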

Test run

After editing the configuration file, you can test the program on the included example dataset. Return to the $PASTA_HOME directory and execute ./sample.sh. You should see output similar to the following:

$ ./sample.sh
==================================
This sample script will run pasta
using the following command line:

/home/shaojun/software/pasta-0.95/pasta \
-f1 /home/shaojun/software/pasta-0.95/sample/read1.fastq \
-f2 /home/shaojun/software/pasta-0.95/sample/read2.fastq \
-d /home/shaojun/software/pasta-0.95/output/ \
-conf /home/shaojun/software/pasta-0.95/doc/conf.txt \
-debug /home/shaojun/software/pasta-0.95/output/debug.log
==================================
Press enter to run pasta, or ctrl-c to quit.

==================================================================
* PASTA 0.95 - Patterned Alignments *
* for Splicing and Transcriptome Analysis *
* *
* © 2011, Shaojun Tang & Alberto Riva, University of Florida *
* Contact Shaojun Tang (sjtang@ufl.edu) with any problems. *
* *
* Developed with: International Allegro CL Enterprise Edition *
* 8.2 [64-bit Linux (x86-64)] (May 7, 2012 12:32) *
* © 1985-2011, Franz Inc, Oakland, CA, USA. All Rights Reserved. *
==================================================================

Reading configuration file /home/shaojun/software/pasta-0.95/doc/conf.txt.

Starting analysis at 12/16/2013 22:40:34
Reads file: /home/shaojun/software/pasta-0.95/sample/read1.fastq
Paired reads file: /home/shaojun/software/pasta-0.95/sample/read2.fastq
Read length: 40
Output directory: /home/shaojun/software/pasta-0.95/output/

Step 1: [12/16/2013 17:40:34] Reading and indexing short reads.
Total number of reads: 2500
Step 2: [12/16/2013 17:40:37] Starting paired-end short reads alignment.
Running bowtie...
Step 3: [12/16/2013 17:44:07] Extracting unaligned read pairs.
Total number of mappable reads: 1314
Step 4: [12/16/2013 17:44:08] Starting stepwise alignment with first set of paired reads.
Step 5: [12/16/2013 17:46:05] Starting stepwise alignment with second set of paired reads.
Step 6: [12/16/2013 17:48:02] Combining results on stepwise alignments of paired reads.
Step 7: [12/16/2013 17:48:12] Re-organizing reads files (just some cleanup).
Step 8: [12/16/2013 17:48:24] Performing dynamic local alignment on remaining reads.
Step 9: [12/16/2013 17:49:18] Generating final splice junctions.
Total number of splice junctions detected: 308
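Each Step line in the output above carries a timestamp, so the time spent in each stage can be read off a saved copy of the output. The Python sketch below assumes the lines keep exactly the "Step N: [MM/DD/YYYY HH:MM:SS] description" form shown above; step_durations is a hypothetical helper, not part of PASTA.

```python
import re
from datetime import datetime

# Matches lines such as:
# Step 2: [12/16/2013 17:40:37] Starting paired-end short reads alignment.
STEP_RE = re.compile(r"^Step (\d+): \[(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2})\] (.*)$")


def step_durations(log_text):
    """Return (step number, description, seconds) for each step that has a successor."""
    steps = []
    for line in log_text.splitlines():
        m = STEP_RE.match(line.strip())
        if m:
            started = datetime.strptime(m.group(2), "%m/%d/%Y %H:%M:%S")
            steps.append((int(m.group(1)), m.group(3), started))
    # A step is taken to end when the next one starts, so the last step
    # (which has no following timestamp) is not reported.
    return [
        (num, desc, (next_start - start).total_seconds())
        for (num, desc, start), (_, _, next_start) in zip(steps, steps[1:])
    ]


if __name__ == "__main__":
    excerpt = """\
Step 1: [12/16/2013 17:40:34] Reading and indexing short reads.
Total number of reads: 2500
Step 2: [12/16/2013 17:40:37] Starting paired-end short reads alignment.
Running bowtie...
Step 3: [12/16/2013 17:44:07] Extracting unaligned read pairs.
"""
    for num, desc, secs in step_durations(excerpt):
        print(f"Step {num}: {secs:.0f} s  {desc}")
```

For the sample run above, this attributes 210 seconds to step 2, the bowtie alignment.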

Running PASTA

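You can use the command line printed by sample.sh above as a template. Options already set in the configuration file (such as -o) can be omitted, since the file supplies their default values: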

$PASTA_HOME/pasta \
-f1 … path to main fastq file … \
-f2 … path to paired fastq file … \
-d … path to output directory … \
-conf … path to configuration file … \
-debug … path to debugging logfile … \
-o … organism code …

Run pasta -help all to print the full list of available command-line options.
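If you drive PASTA from a script, the same template can be assembled as an argument list. This Python sketch uses only the options shown in the template; pasta_command and run_pasta are hypothetical helper names, and the sketch assumes pasta reports failure through a non-zero exit status. The example paths mirror those used by sample.sh.

```python
import os
import shlex
import subprocess


def pasta_command(pasta_home, fastq1, fastq2, outdir, conf, debug_log, organism=None):
    """Build the pasta argument list from the template (hypothetical helper)."""
    cmd = [os.path.join(pasta_home, "pasta"),
           "-f1", fastq1,
           "-f2", fastq2,
           "-d", outdir,          # passed through unchanged, trailing '/' included
           "-conf", conf,
           "-debug", debug_log]
    # -o may be omitted when the configuration file already sets o=...
    if organism is not None:
        cmd += ["-o", organism]
    return cmd


def run_pasta(cmd):
    # check=True raises CalledProcessError on a non-zero exit status
    # (assumption: pasta signals failure through its exit code).
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    home = os.environ.get("PASTA_HOME", ".")
    cmd = pasta_command(
        home,
        f"{home}/sample/read1.fastq",
        f"{home}/sample/read2.fastq",
        f"{home}/output/",
        f"{home}/doc/conf.txt",
        f"{home}/output/debug.log",
    )
    print(shlex.join(cmd))  # shell-quoted, ready to copy into a terminal
    # run_pasta(cmd)        # uncomment to actually launch pasta
```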