Developed by:
NPACT is an Open Source Project. It was originally designed by Luciano Brocchieri, Assistant Professor at the Department of Molecular Genetics & Microbiology and Genetics Institute, University of Florida, and engineered by Accelerated Data Works, Inc., Gainesville, FL, under contract with the University of Florida, in collaboration with Luciano Brocchieri.
About NPACT
NPACT is a computational and graphical representation tool for gene identification and sequence annotation. It uses a new quantitative approach that extends the principle of visual frame analysis (Bibb et al. 1984) to identify, within the input sequence, segments of any length that show statistically significant 3-base compositional periodicities and are associated with an ORF structure (Oden and Brocchieri, 2015, Bioinformatics 31: 3254-61).
N-PACT produces a multi-page graphical output representing, along sequence positions, frame-specific GC content (or optionally content of any nucleotide combination) for frame analysis (Ishikawa and Hotta 1999; Brocchieri et al. 2005), published gene annotations (when available), hits, and newly identified ORFs with 3-base compositional periodicities.
Availability
N-PACT is available as a web-based application at http://genome.ufl.edu/npact/. NPACT can also be downloaded and installed locally (see “Development Mode” instructions in the README.md file) from GitHub using the following link:
See NPACT-usage for more details.
Input to N-PACT consists of a nucleotide sequence file in GenBank format that, at a minimum, must include a header with the field DEFINITION (describing the sequence) and the field ORIGIN, followed by lines containing sequence data. Starting from the line following ORIGIN, all alphabetic characters are interpreted as sequence characters. Any alphabetic character other than A/a, C/c, G/g, T/t, U/u is treated as unknown. Non-alphabetic characters are ignored. Optional annotated coding sequences are read from the “CDS” fields, following the GenBank format. Gene names are read from the qualifiers “gene” (first choice) or “locus_tag” (second choice).
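The sequence-reading rules above can be illustrated with a minimal Python sketch. It is not N-PACT's own parser: the function name `read_origin_sequence` and the choice of `N` as the placeholder for unknown characters are assumptions made for this example, and header fields such as DEFINITION and CDS are not validated or read.

```python
def read_origin_sequence(genbank_text: str) -> str:
    """Return the sequence found after the ORIGIN line, upper-cased.

    Illustrative only: unknown letters are mapped to 'N' (an assumed
    placeholder); U is kept as U rather than converted to T.
    """
    lines = genbank_text.splitlines()
    start = None
    for idx, line in enumerate(lines):
        if line.startswith("ORIGIN"):
            start = idx + 1  # sequence data begins on the following line
            break
    if start is None:
        raise ValueError("no ORIGIN line found")

    known = set("ACGTU")
    bases = []
    for line in lines[start:]:
        for ch in line:
            if not ch.isalpha():
                continue  # digits, spaces, and the closing '//' are ignored
            up = ch.upper()
            bases.append(up if up in known else "N")
    return "".join(bases)
```

For instance, a record whose sequence lines read `1 acgtnn ACGU` and `11 ry--12` would yield a 12-character string in which the four ambiguous letters become the placeholder.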
N-profile and prediction graphical output
N-PACT produces a multi-page graphical output in PostScript or PDF format, containing the following information:
Newly Identified ORFs. These are ORFs newly identified by the N-PACT procedure (not included in the submitted CDS annotations) or ORFs that may significantly modify the start of translation of an annotated CDS. ORFs are represented as pointing in the 5′-to-3′ orientation (see Coloring Scheme below). Newly identified ORFs are in bold-face, whereas those matching coding regions that correspond to published genes but have a modified starting position are displayed in a lighter color. The names assigned to the newly identified ORFs are in the form:
For example: H-2335*A, G-123-t, where:
[H/G] identifies the type of hit:
H: H-type hit, identified using a scoring scheme based on compositional expectations of coding regions.
G: G-type hit, identified by any non-random association of base type with codon position.
####: A unique identification number followed by:
[-/*] identifies the E-value:
-: E-value > 0.01 (weaker compositional periodicity).
*: E-value <= 0.01 (stronger compositional periodicity).
[A/a/G/g/T/t/C/c/W/w] indicates the potential start-of-translation codon closest to the 5′-end of the hit. Upper case indicates that the predicted codon is within 15 codons of the start of the hit (and up to three codons within the hit), lower case indicates that it is more distant. The potential start-of-translation codons considered are, in the order:
ATG (A/a) or GTG (G/g)
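As an illustration of the naming convention described above, the following Python sketch splits an ORF name into its parts. The regular expression is inferred from the two examples given (H-2335*A and G-123-t) and is an assumption, as are the names `OrfName` and `parse_orf_name`; none of this is part of N-PACT itself.

```python
import re
from typing import NamedTuple

# Pattern inferred from the examples "H-2335*A" and "G-123-t" (assumption).
NAME_RE = re.compile(r"^([HG])-(\d+)([-*])([AaGgTtCcWw])$")


class OrfName(NamedTuple):
    hit_type: str             # "H" or "G"
    number: int               # unique identification number
    strong_periodicity: bool  # True for "*", False for "-"
    start_code: str           # start-codon letter, upper-cased
    start_near_hit: bool      # True when the letter was upper case


def parse_orf_name(name: str) -> OrfName:
    """Split a name such as 'H-2335*A' into its documented components."""
    match = NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a recognized ORF name: {name!r}")
    hit_type, number, flag, code = match.groups()
    return OrfName(hit_type, int(number), flag == "*", code.upper(), code.isupper())
```

With this sketch, `parse_orf_name("G-123-t")` describes a G-type hit numbered 123, with the weaker periodicity flag and a GTG-type or other start codon coded `T` lying farther from the start of the hit.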
Input file CDS. Coding regions reported in the annotation of the GenBank input file, from the CDS field (see Coloring Scheme below).
Hits. Sequence segments with compositional periodicity, drawn as colored segments on two tracks corresponding to the direct strand (upper track) and the complementary strand (lower track). See Coloring Scheme below for color assignment.
N-Profiles. By default, these curves represent GC content profiles (S-Profiles) within a moving window of 201 nucleotides. From the Configure Analysis page, the user can choose to display any other nucleotide combination (e.g., purines: A+G), with the understanding that any combination other than C+G or A+T will show nucleotide frequencies on the direct strand and the frequency of the complementary bases on the complementary strand. Each N-profile represents percent frequencies within a subsequence composed of every third nucleotide of the input sequence, starting from position 1 (red), 2 (green), or 3 (blue), respectively. A light-gray curve indicates the average composition within the corresponding interval.
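To make the N-profile definition concrete, here is a minimal Python sketch that computes the three frame-specific percent-content curves on the direct strand. It is an illustration under stated assumptions, not N-PACT's implementation: the function name `n_profiles`, the `step` parameter, the convention of reporting each window by its central position, and the treatment of unknown characters (counted in the window but never as a match) are all hypothetical choices.

```python
def n_profiles(seq, bases="CG", window=201, step=1):
    """Frame-specific percent content of `bases` in a sliding window.

    Returns (centers, profiles): `centers` holds the 1-based central
    position of each window; profiles[k] holds the percent content at
    positions k+1, k+4, k+7, ... (k = 0, 1, 2) inside that window.
    `step` is an assumed parameter controlling how far the window moves.
    """
    seq = seq.upper()
    target = set(bases.upper())
    centers = []
    profiles = ([], [], [])
    for left in range(0, len(seq) - window + 1, step):
        hits = [0, 0, 0]
        totals = [0, 0, 0]
        for pos in range(left, left + window):
            phase = pos % 3  # 0-based index, so phase 0 is position 1, 4, 7, ...
            totals[phase] += 1
            if seq[pos] in target:
                hits[phase] += 1
        centers.append(left + window // 2 + 1)
        for k in range(3):
            profiles[k].append(100.0 * hits[k] / totals[k])
    return centers, profiles
```

With the default window of 201 nucleotides, each of the three curves is computed from 67 positions per window. Passing `bases="AG"` gives the purine profile for the direct strand; the complementary-strand convention described above is not handled in this sketch.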
The coloring scheme used throughout the graphical output follows these rules. The nucleotide sub-sequences starting from position 1, 2, or 3 of the sequence (that is, positions i, i + 3, i + 6, and so on, for i = 1, 2, or 3) are colored red (R), green (G), and blue (B), respectively. Annotated genes, newly identified ORFs, and hits are colored as the sub-sequence in phase with their third codon positions. An alternative coloring scheme is available.
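The default coloring rule can be sketched in a few lines of Python. The function names `phase_color` and `feature_color` are hypothetical, and the handling of the complementary strand (taking the third codon position as `end - 2`, since the first codon is read from the highest coordinate) is an assumption of this example rather than something stated in the documentation.

```python
PHASE_COLORS = ("red", "green", "blue")


def phase_color(position: int) -> str:
    """Color of the sub-sequence that contains a 1-based sequence position."""
    if position < 1:
        raise ValueError("positions are 1-based")
    return PHASE_COLORS[(position - 1) % 3]


def feature_color(start: int, end: int, strand: str = "+") -> str:
    """Color of a gene, ORF, or hit from the phase of its third codon position.

    `start` and `end` are 1-based inclusive coordinates on the direct strand.
    The complementary-strand case is an assumed convention.
    """
    third = start + 2 if strand == "+" else end - 2
    return phase_color(third)
```

For example, a direct-strand feature spanning positions 1 to 9 has third codon positions 3, 6, and 9, which fall in the sub-sequence starting at position 3, so it would be drawn in blue.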
- Brocchieri L (2016) Discovering Elusive Small Genes. J Phylogen Evolution Biol 4:e120. doi:10.4172/2329-9002.1000e120
- Brocchieri L (2015) Where are the genes missing from prokaryotic genomes? J Phylogen Evolution Biol 3:e114. doi:10.4172/2329-9002.1000e114
- Oden S and Brocchieri L (2015) Quantitative frame analysis for the annotation of GC-rich (and other) prokaryotic genomes. An application to Anaeromyxobacter dehalogenans. Bioinformatics 31:3254-61. doi:10.1093/bioinformatics/btv339
- Brocchieri L, Kledal TN, Karlin S, Mocarski ES (2005) Predicting coding potential from genome sequence: application to betaherpesviruses infecting rats and mice. J Virol 79(12):7570-96.