Binding site detection with spiking neural networks
BioSpique is a suite of software tools for transcription factor binding site (TFBS) modeling and discovery. At the heart of the suite is an event-driven spiking neural network simulator (eSSNNS) based on the third generation of artificial neural networks, which can process spatio-temporal data. Binding sites may be characterized by complex patterns of nucleotides, not only by the frequency of each nucleotide at each position. Adaptive spiking neurons can be used to test the hypothesis that such complex patterns underlie TFBS data and can be exploited for successful TFBS detection.
BioSpique is distributed as a GNU/Linux command-line tool.
TFBS detection with BioSpique is a two-step process. First, a TFBS model for a specific TF is created with eSSNNS and refined with CHC-HS. Then, the evolved TFBS model can be used with TFBS-Scan to detect novel TFBSs for that TF in a genomic sequence. The suite consists of three programs:
- eSSNNS: An event-driven spiking neural network (SNN) simulator used for TFBS model generation. Artificial neurons, called spiking neurons, are simulated based on precise spike timing events. Inputs and outputs are spike times.
- CHC-HS: A variation on the CHC genetic algorithm (developed by Larry Eshelman), used for TFBS model refinement. The user needs to define a problem-specific fitness function to run CHC-HS. A TFBS fitness function is used to optimize the parameters of TFBS models, such as neuron and synapse parameters.
- TFBS-Scan: A TFBS scanner used for testing the generated TFBS model. It scans DNA sequences provided as FASTA files and classifies sites as TFBS or non-TFBS.
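To illustrate what "event-driven" means in the first item above, here is a minimal Python sketch of a single spiking neuron whose state is updated only when an input spike arrives. It is not the eSSNNS implementation: the leaky integrate-and-fire model, the function name simulate_neuron, and the parameters tau and threshold are assumptions chosen for illustration.

```python
import heapq
import math


def simulate_neuron(input_spikes, weights, tau=10.0, threshold=1.0):
    """Event-driven simulation of one leaky integrate-and-fire neuron.

    Illustrative assumption only; not the neuron model used by eSSNNS.
    input_spikes: list of (time_ms, synapse_index) events.
    weights: synaptic weight for each synapse index.
    Returns the list of output spike times.
    """
    events = list(input_spikes)
    heapq.heapify(events)  # a priority queue delivers events in time order
    potential = 0.0
    last_time = 0.0
    output_spikes = []
    while events:
        time, synapse = heapq.heappop(events)
        # Apply the exponential leak for the whole gap since the last event,
        # so no computation is spent on the time between spikes.
        potential *= math.exp(-(time - last_time) / tau)
        potential += weights[synapse]
        last_time = time
        if potential >= threshold:
            output_spikes.append(time)
            potential = 0.0  # reset the membrane potential after firing
    return output_spikes


if __name__ == "__main__":
    # Two closely timed inputs sum past the threshold; the late one does not.
    print(simulate_neuron([(1.0, 0), (2.0, 1), (50.0, 0)], [0.6, 0.6]))
```

Both the inputs and the output are spike times, matching the description above. The neuron fires only when input spikes arrive close enough together in time, which is how temporal patterns can be distinguished.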
BioSpique also contains three Python utility scripts. The first extracts identifier and DNA sequence information from an alignment file from the TRANSFAC database. The second generates random DNA sequences to be used by TFBS-Scan. The third inserts known binding sites at specific positions into a random DNA sequence generated with the second script. Furthermore, a gnuplot script is included for visualizing spikes, membrane potentials and synaptic component potentials of spiking neurons.
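The idea behind the second and third scripts can be sketched as follows. This is not the code shipped in the scripts/ directory: the function names random_dna and plant_sites are hypothetical, the background is assumed to use uniform base frequencies, and a site is assumed to overwrite the background at its position so that the sequence length stays the same.

```python
import random


def random_dna(length, seed=None):
    """Return a random DNA string with uniform base frequencies (an assumption)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))


def plant_sites(background, sites):
    """Overwrite the background with known sites at 0-based positions.

    sites: dict mapping start position -> binding site string.
    Overwriting rather than lengthening the sequence is an assumption.
    """
    seq = list(background)
    for position, site in sorted(sites.items()):
        end = position + len(site)
        if position < 0 or end > len(seq):
            raise ValueError(f"site at {position} does not fit in the sequence")
        seq[position:end] = site  # same-length slice assignment, one base per slot
    return "".join(seq)


if __name__ == "__main__":
    background = random_dna(60, seed=1)
    # The two site strings are arbitrary examples, not TRANSFAC entries.
    planted = plant_sites(background, {5: "TATAAA", 30: "GGGCGG"})
    print(background)
    print(planted)
```

Because the planted positions are known, a sequence built this way gives a ground truth against which the sites reported by a scanner can be compared.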
After downloading the compressed BioSpique package, execute the following command:
tar -xvf biospique-1.0.tar.gz
This will create a directory called biospique-1.0/, which contains six subdirectories (bin/, data/, doc/, lib/, programs/ and scripts/). The programs/ and lib/ directories contain the three main executables (e-ssnns, chc-hs, tfbs-scan) and libraries for OS X and Linux. Please do not modify anything in these two subdirectories, or the programs will not work correctly. The doc/ directory contains the BioSpique manual, and the data/ directory contains example files used by the demonstrations for the three programs. Finally, the scripts/ directory contains the three utility Python scripts and the gnuplot script.
The second step of the installation is to go to the bin/ directory and run the install script.
This will create three shell scripts (called e-ssnns, chc-hs, tfbs-scan) that invoke the correct executables for the OS you are running. These scripts contain absolute pathnames, so you can move them to a directory in your path, if desired; alternatively you can add the bin/ directory to your path. See the messages printed by the install script for more information.
See the BioSpique manual (in the doc/ directory) for a description of the command-line arguments and a demonstration using sample data.
- Evolving Spiking Neural Networks for Predicting Transcription Factor Binding Sites
Sichtig H, Schaffer JD, Riva A.
In: Proceedings of the IEEE International Joint Conference on Neural Networks (IJCNN 2010) and World Congress on Computational Intelligence (WCCI 2010). Barcelona, Spain. 2010
- An SNN-GA Approach for the Prediction of Transcription Factor Binding Sites
Sichtig H, Riva A.
In: Proceedings of Bioinformatics for Regulatory Genomics (BioRegSIG) and ISMB 2010. Boston, MA. 2010