dibig_tools is a collection of useful tools and scripts for HiPerGator. To use this module, execute:

module load dibig_tools

This module currently includes the following tools:

  • FASTools
  • MethylMapper
  • submit.py – User-friendly replacement for qsub
  • actor.py – Self-documenting scripts for HiPerGator
  • split-sam.sh – Split large SAM/BAM files
  • csvtoxls.py – Convert delimited files to Excel spreadsheets

The submit.py command

submit.py is a Python script that replaces the standard qsub command for submitting jobs to HiPerGator. Its main advantage is that it makes it easy to pass command-line arguments to the submitted script. To display usage instructions, call it with no arguments:

$ submit.py
usage: submit.py [-after jobid] [-done name] scriptName [arguments…]
Submits script “scriptName” using qsub, passing the values of “arguments” to the script as $arg1, $arg2, $arg3, etc.
If “-after” is specified, the script will run after the job indicated by “jobid” has terminated successfully. Multiple “-after” arguments may be specified.
If “-done” is specified, the script will create a file called “name” when it terminates. This can be used to detect that execution of the script has finished.
This command returns the id of the submitted job, which is suitable as the -after argument for a subsequent job. For example:
STEP1=`submit.py step1.sh`
STEP2=`submit.py -after $STEP1 step2.sh`
Copyright (c) 2013, University of Florida
A. Riva (ariva@ufl.edu)
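
The “-done” option lends itself to a simple polling loop in a driver script. The following is a minimal sketch, not part of dibig_tools: the helper name wait_for_file, its timeout and interval arguments, and the file name step1.done are all hypothetical, and it assumes the marker file is created in a directory the driver script can see.

```shell
# wait_for_file FILE [TIMEOUT] [INTERVAL]
# Hypothetical helper (not part of dibig_tools): polls until FILE exists.
# Returns 0 once the file is found, 1 if TIMEOUT seconds pass first.
wait_for_file() {
  wf_file=$1
  wf_timeout=${2:-3600}   # give up after this many seconds
  wf_interval=${3:-10}    # seconds to sleep between checks
  wf_waited=0
  while [ ! -e "$wf_file" ]; do
    if [ "$wf_waited" -ge "$wf_timeout" ]; then
      return 1
    fi
    sleep "$wf_interval"
    wf_waited=$((wf_waited + wf_interval))
  done
  return 0
}

# Example use on HiPerGator (commented out: needs submit.py and a job script):
#   submit.py -done step1.done step1.sh
#   wait_for_file step1.done 7200 30 && echo "step1 finished"
```

When the next step is itself a job, the “-after” option is usually the simpler choice; polling is mainly useful when a non-job script has to wait for the result.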

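For example, a script that runs the grep command as a HiPerGator job could be written as follows. Because submit.py passes its arguments as $arg1, $arg2, etc., the search string arrives as $arg1 and the file name as $arg2:

grep "$arg1" "$arg2"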

Assuming the script above is saved as grep.qsub, it could be called as follows:

$ submit.py grep.qsub word filename.txt
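
A job script can also check that it received the arguments it expects before doing any work. The following is a sketch of one way to do this, relying only on the $arg1 and $arg2 variables described in the submit.py usage text; the function name grep_job and the exit code 2 are illustrative choices, not conventions of dibig_tools.

```shell
# grep_job: search for the string in $arg1 inside the file named by $arg2.
# The function name and the return code 2 are hypothetical choices.
grep_job() {
  if [ -z "${arg1:-}" ] || [ -z "${arg2:-}" ]; then
    echo "usage: submit.py grep.qsub string filename" >&2
    return 2
  fi
  # "--" keeps a search string that starts with "-" from being read as an option.
  grep -- "$arg1" "$arg2"
}

# Run the search only when an argument was supplied, so that sourcing
# this file (for example, to test the function) has no side effects.
if [ -n "${arg1:-}" ]; then
  grep_job
fi
```

Quoting the variables keeps search strings and file names that contain spaces intact.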

The split-sam.sh command

The following is the documentation for split-sam.sh:

split-sam.sh – Split SAM or BAM files into multiple, smaller files for easier processing.
Usage: /apps/dibig_tools/1.0/bin/split-sam.sh inputfile [nlines] [prefix] [npost]
inputfile – required, should be either a .sam or .bam file.
nlines – number of lines in each output file (default: 1000000).
prefix – prefix of the output files (default: ‘part-’).
npost – number of letters to use in filenames postfix (see below).
Output file names are generated by concatenating ‘prefix’ with the strings aaa, aab, aac, etc. This allows for at most 1000 output files. If this is not sufficient, you can increase the number of letters used with the npost argument. For example, if the value for that argument is 4, the strings will be aaaa, aaab, aaac, etc (10000 max output files). Output files are written in SAM format.
After alignment, the resulting BAM files can be concatenated together with the samtools ‘cat’ or ‘merge’ commands. Please see samtools documentation for details.
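
The split and merge steps described above can be wrapped in small helper functions. This is only a sketch under several assumptions: split-sam.sh and samtools are on the PATH (for example after loading the relevant modules), the per-part BAM files are produced by a separate, site-specific step, and the names run, split_parts, merge_parts and DRY_RUN are hypothetical. Setting DRY_RUN=1 prints each command instead of executing it, which makes it possible to check the command lines before submitting real work.

```shell
# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# split_parts INPUT [NLINES] [PREFIX]
# Calls split-sam.sh with its documented positional arguments,
# using the documented defaults when NLINES or PREFIX is omitted.
split_parts() {
  run split-sam.sh "$1" "${2:-1000000}" "${3:-part-}"
}

# merge_parts OUTPUT.bam PART.bam...
# Combines the processed parts with "samtools merge" (output file first).
merge_parts() {
  mp_out=$1
  shift
  run samtools merge "$mp_out" "$@"
}

# Example (commented out: needs real input files):
#   DRY_RUN=1
#   split_parts reads.bam 500000 chunk-
#   merge_parts merged.bam chunk-aaa.bam chunk-aab.bam
```

samtools merge expects its inputs to be sorted in the same order; for simple concatenation, samtools cat may be the better fit, as the samtools documentation explains.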

The csvtoxls.py command

A command-line tool to convert one or more delimited files to an Excel spreadsheet. Full documentation is available on the program’s GitHub page.