PASTA2 is the second module in the PASTA computational pipeline for the analysis of transcriptome data from RNA-Seq experiments. Building on the splice junction predictions generated by PASTA1, PASTA2 automatically builds alternatively spliced gene models and identifies differentially expressed isoforms in case-control experiments.
PASTA2 is distributed as a 64-bit command-line executable for GNU/Linux. The downloadable package contains the program, documentation, and sample data. Source code is available upon request.
PASTA2 requires the reference sequence for the genome under analysis. The reference sequence should be stored as one file per chromosome, named chrN.fa where N is the chromosome number.
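If the genome is only available as a single multi-FASTA file, it has to be split into per-chromosome files first. The following is a minimal sketch of one way to do that, not part of the PASTA2 package. The function name split_fasta is hypothetical, and the sketch assumes that each FASTA header starts with the chromosome name (for example ">chr1") and that names are unique within the file.

```shell
# Hypothetical helper (not shipped with PASTA2): split a multi-FASTA genome
# into one file per sequence, named <header-name>.fa (e.g. chr1.fa).
# Assumption: the first word of each header is the chromosome name, e.g. ">chr1".
split_fasta() {
    genome="$1"   # input multi-FASTA file
    outdir="$2"   # directory that will receive the chrN.fa files
    mkdir -p "$outdir"
    awk -v out="$outdir" '
        # On a header line, close the previous file and start a new one.
        /^>/ { if (f) close(f); name = substr($1, 2); f = out "/" name ".fa" }
        # Write every line (header included) to the current output file.
        f    { print > f }
    ' "$genome"
}
```

A call such as split_fasta genome.fa /home/user/data/ReferenceGenome/mm9/ would then populate the reference directory; the file name genome.fa is only a placeholder.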
Installation and testing
Extract the package in a suitable directory using the following command:
tar xvf pasta2-1.0.tar.gz
This will create a pasta2-1.0/ directory containing the pasta script and three subdirectories.
The bin/ directory contains compiled code for the PASTA2 program; please do not modify anything in it.
The sample/ directory contains two sample files for testing PASTA2: exonjunctions.txt (a tab-delimited file containing splice junction coordinates in the format “chr start end expression-level”) and alignments.txt (default alignment output from Bowtie).
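Before running PASTA2 on your own data, it can help to confirm that a junctions file follows the four-column layout described above. The check below is only a sketch: the function name check_junctions is hypothetical, and it assumes that start and end are non-negative integers. It makes no assumption about the content of the expression-level column.

```shell
# Hypothetical check (not part of PASTA2): report lines of a junctions file
# that do not have four tab-separated fields "chr start end expression-level".
# Assumption: start and end are non-negative integers.
check_junctions() {
    awk -F '\t' '
        NF != 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ {
            printf "line %d: unexpected format: %s\n", NR, $0
            bad = 1
        }
        # Exit status is 1 if any line failed the check, 0 otherwise.
        END { exit bad + 0 }
    ' "$1"
}
```

For example, check_junctions sample/exonjunctions.txt prints nothing and returns success if every line has the expected layout.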
To run the sample script, go to the pasta2-1.0 directory and run sample.sh, passing the full path to the directory containing the mouse genome:
./sample.sh /dir/mouse/mm9/
Examine sample.sh to see how it invokes the PASTA2 program.
To run an analysis, use the bin/pasta2 script, supplying options on the command line and/or in a configuration file. The basic syntax of the pasta2 program is the following:
pasta2 [options] -dir dir -refseq referenceDir -exonjunction junctionsFile -mapping mappingFile
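As an illustration of this syntax, the sketch below checks that the required directories exist and then prints the corresponding command line without executing it. The function name pasta2_cmd is hypothetical and not part of the package; only the four options shown in the syntax line are used, and the test for chr*.fa files is an assumption based on the naming convention for reference sequences.

```shell
# Hypothetical wrapper (not shipped with PASTA2): validate inputs, then print
# the pasta2 command line. Uses only the four options from the basic syntax.
pasta2_cmd() {
    dir="$1"; refseq="$2"; junctions="$3"; mapping="$4"
    [ -d "$dir" ]    || { echo "missing directory: $dir" >&2; return 1; }
    [ -d "$refseq" ] || { echo "missing reference directory: $refseq" >&2; return 1; }
    # Assumption: the reference directory holds per-chromosome files named chrN.fa.
    ls "$refseq"/chr*.fa >/dev/null 2>&1 \
        || { echo "no chr*.fa files in $refseq" >&2; return 1; }
    # Print the command instead of running it, so it can be reviewed first.
    echo pasta2 -dir "$dir" -refseq "$refseq" -exonjunction "$junctions" -mapping "$mapping"
}
```

The printed line can be inspected and then pasted into the shell once the paths look right.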
A complete description of all PASTA2 arguments can be found in this file, or obtained by calling pasta2 -help all. A short description of the arguments follows (when two consecutive arguments appear before a description, they are aliases of each other):
- Path to the directory where the results will be stored.
- Path to the directory where input files are stored.
- -exonjunction: The name of the input file containing predicted splice junctions.
- -mapping: The name of the input file containing RNA-Seq alignment results.
- -refseq: Location of the reference sequence files.
- Prefix for output files.
- Minimum gene coverage.
-dir /home/user/rnaseq/data/ \
-exonjunction exonjunctions.txt \
-mapping alignments.txt \
-refseq /home/user/data/ReferenceGenome/mm9/ \