CSCALL

CSCALL is a program to detect single-base conversion events in short reads. It is specifically designed to analyze the results of sodium bisulfite sequencing experiments, and is optimized to work on large genomes and multiple BAM files at once. Moreover, CSCALL can detect conversion events at a number of different types of sites (e.g. CG, GC, CHG, CHH).

Requirements

CSCALL requires samtools to operate. If the samtools executable is not in the path, its location can be specified with the -samtools option.
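
The check described above can be scripted before launching CSCALL. The following Python sketch is only an illustration: the function name and the fallback path are hypothetical, and it assumes that the -samtools option is followed by the location of the executable.

```python
import shutil


def samtools_arguments(samtools_path=None, which=shutil.which):
    """Return the extra CSCALL arguments needed to locate samtools.

    samtools_path is a hypothetical fallback location supplied by the caller.
    The 'which' parameter exists only so the lookup can be replaced in tests.
    """
    if which("samtools") is not None:
        # samtools was found on the PATH: no extra option is needed.
        return []
    if samtools_path is None:
        raise FileNotFoundError("samtools is not on the PATH; pass its location")
    # Assumption: -samtools takes the executable's location as its value.
    return ["-samtools", samtools_path]
```

The returned list is empty when samtools is found on the PATH, so it can be appended to any CSCALL argument list unconditionally.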

Getting help

To get help use the -help command:

  cscall -help what

where what can be one of:

  • an individual command-line option, to get help on that option;
  • the name of a group of options, to get help on all the options in the group;
  • nothing, in which case a general help message will be printed.

Groups of options are: Commands, Arguments, and Other. In particular, the Commands group displays the seven top-level commands that can be used in CSCALL:

  help   - get help
  sites  - display list of known methylation contexts
  build  - create genome index
  desc   - describe contents of genome index
  filter - remove low-quality reads from FASTQ files
  call   - call methylation from BAM files
  align  - extract alignments for high-coverage regions

Genome indexing

CSCALL is able to detect C-T conversion events in a large number of contexts (e.g. CG, CHG). The following command displays the list of contexts known to CSCALL:

  cscall -sites

CSCALL requires an index of the genome to be built for each context to be analyzed. The index is created with the following command:

  cscall -build -r reference -s site

where reference is a file containing the reference genome in FASTA format and site is the type of site for which the index should be built. For example, to build an index of CG sites in the human genome, and assuming that the genome is in file hg38.fa, use:

  cscall -build -r hg38.fa -s CG

By default the index will be written to a file called hg38-CG.bin, unless a different filename is specified with the -o option. An optional report containing the number of sites found in each chromosome can be written using the -report option. Index files for several sites can be created with a single call by specifying multiple sites after -s (in this case, the -o option cannot be used). For example:

  cscall -build -r hg38.fa -s CG CHG CHH
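
When index building is part of a larger pipeline, the same command line can be assembled programmatically. This is a minimal Python sketch that uses only the -build, -r and -s options shown above. The function names and the dry_run switch are illustrative and not part of CSCALL, and the executable is assumed to be reachable as cscall.

```python
import subprocess


def build_index_command(reference, sites, cscall="cscall"):
    """Assemble the argument list for 'cscall -build' with one or more sites."""
    if isinstance(sites, str):
        sites = [sites]  # accept a single site given as a plain string
    if not sites:
        raise ValueError("at least one site is required")
    return [cscall, "-build", "-r", reference, "-s", *sites]


def build_indexes(reference, sites, dry_run=True):
    """Print the command (dry run) or execute it, then return it."""
    command = build_index_command(reference, sites)
    if dry_run:
        print(" ".join(command))
    else:
        # check=True raises CalledProcessError if cscall exits with an error.
        subprocess.run(command, check=True)
    return command
```

With the defaults, build_indexes("hg38.fa", ["CG", "CHG", "CHH"]) only prints the command. Passing dry_run=False runs it.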

In order to ensure that the index used matches the reference genome used, the index file stores the pathname of the reference genome file it was built on, and its MD5 checksum. This means that if the reference file is modified in any way, the index will have to be rebuilt. The -desc command can be used to display meta-information stored in the index file:

  $ cscall -desc /path/to/hg38-CG.bin
  ;;; CSCALL v1.0 - Converted Site Caller
  ;;; (c) 2016-2017, A. Riva (ariva@ufl.edu), DiBiG, ICBR Bioinformatics, University of Florida

  Index file /path/to/hg38-CG.bin
  Site:             CG
  For reference:    /path/to/hg38.fa
  Reference md5sum: 3c43d1af415a3ee471f981b734265a92
  Created:          06/27/2017 13:44:06
  Binary version:   1
  Verified index file /path/to/hg38-CG.bin
    for site CG
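
To check by hand whether a reference file still matches an index, the "Reference md5sum" line can be compared with a freshly computed checksum. This Python sketch parses the "Key: value" lines of the output shown above. It assumes that the stored value is the plain MD5 digest of the bytes of the reference file, which the documentation does not state explicitly. All function names are illustrative.

```python
import hashlib


def parse_desc_output(text):
    """Collect the 'Key: value' lines printed by 'cscall -desc' into a dict."""
    info = {}
    for line in text.splitlines():
        # Skip the ';;;' banner and lines that carry no 'Key: value' pair.
        if line.lstrip().startswith(";;;") or ":" not in line:
            continue
        key, _, value = line.partition(":")  # split on the first colon only
        info[key.strip()] = value.strip()
    return info


def file_md5(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def reference_matches(info, reference_path):
    """True if the file's MD5 equals the checksum recorded in the index."""
    # Assumption: the index stores the MD5 of the raw reference file contents.
    return file_md5(reference_path) == info.get("Reference md5sum")
```

Reading the file in chunks keeps memory use low for genome-sized FASTA files.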

Methylation calling

The -call command performs methylation calling. It takes three required arguments: -b to specify the BAM files to be analyzed (multiple BAM files can be specified, as separate arguments), -r to indicate the reference genome, and -i to indicate the index file. Output is sent to standard output, or to a file specified with the -o option. For example:

  cscall -call -r hg38.fa -i hg38-CG.bin -b aligned.bam -o CG-meth.bed

The output file is a tab-delimited file with one row for each potentially converted site in the genome, and fourteen columns. The contents of the columns are:

  1. Chromosome;
  2. Start position of the site;
  3. End position of the site (always equal to the previous field plus the length of the site);
  4. Methylation rate at the site position (field 6 divided by field 5);
  5. Total coverage (number of analyzed Cs);
  6. Number of methylated Cs at the site position;
  7. (ignore);
  8. (ignore);
  9. The character ‘+’;
  10. Number of Cs analyzed on top strand;
  11. Number of methylated Cs on top strand;
  12. The character ‘-’;
  13. Number of Cs analyzed on bottom strand;
  14. Number of methylated Cs on bottom strand.
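
The column layout above can be read with a few lines of Python. This is a sketch under stated assumptions: the file has no header row, the count fields are integers, and the rate is a decimal number. The SiteCall type, the function name, and the sample row used below are invented for illustration and are not produced by CSCALL.

```python
from typing import NamedTuple


class SiteCall(NamedTuple):
    chrom: str
    start: int
    end: int
    rate: float            # field 4, as written in the file
    coverage: int          # field 5
    methylated: int        # field 6
    top_coverage: int      # field 10
    top_methylated: int    # field 11
    bottom_coverage: int   # field 13
    bottom_methylated: int # field 14


def parse_call_line(line):
    """Convert one tab-delimited output row into a SiteCall record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 14:
        raise ValueError(f"expected 14 columns, found {len(fields)}")
    # Fields 7 and 8 are ignored; fields 9 and 12 hold the strand characters.
    return SiteCall(
        fields[0], int(fields[1]), int(fields[2]), float(fields[3]),
        int(fields[4]), int(fields[5]),
        int(fields[9]), int(fields[10]),
        int(fields[12]), int(fields[13]),
    )


# Invented row; the "." placeholders in fields 7 and 8 are an assumption.
sample_row = "chr1\t10468\t10470\t0.75\t8\t6\t.\t.\t+\t5\t4\t-\t3\t2"
call = parse_call_line(sample_row)
# Recompute the rate from the counts (field 6 divided by field 5).
recomputed_rate = call.methylated / call.coverage
```

Keeping the per-strand counts as separate fields makes it easy to apply strand-specific filters after parsing.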

The -call command accepts a large number of optional arguments that control its behavior (e.g., handling of replicates, filtering by depth of coverage). Please use the -help command to display all optional arguments.


Download

CSCALL is available for the GNU/Linux platform. To download it, fill out the download form with a valid e-mail address and your affiliation. Please note that this software is free for academic use. For all other uses, please contact ariva@ufl.edu.