This is the direction genomic studies are taking. Take note: scientists are
studying DNA coding and repair and describing its functionality quantitatively
and algorithmically, applying the principles and methods of information theory
and coding theory.
Is a Genome a Codeword of an Error-Correcting Code?
Since a genome is a discrete
sequence, the elements of which belong to a set of four letters, the question
as to whether or not there is an error-correcting code underlying DNA sequences
is unavoidable. The most common approach to answering this question is to
propose a methodology to verify the existence of such a code. However, none of
the methodologies proposed so far, although quite clever, has achieved that
goal. In a recent work, we showed that DNA sequences can be identified as codewords
in a class of cyclic error-correcting codes known as Hamming codes. In this
paper, we show that a complete intron-exon gene, and even a plasmid genome, can
be identified as a Hamming code codeword as well. Although this does not
constitute a definitive proof that there is an error-correcting code underlying
DNA sequences, it is the first evidence in this direction.
Citation: Faria LCB, Rocha ASL, Kleinschmidt JH, Silva-Filho
MC, Bim E, Herai RH, et al. (2012) Is a Genome a Codeword of an
Error-Correcting Code? PLoS ONE 7(5): e36644. doi:10.1371/journal.pone.0036644
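For anyone who has not met Hamming codes before, here is a minimal sketch of the idea of a "codeword". It uses the textbook binary Hamming(7,4) code, which is my own choice for illustration; it is not the construction used in the paper, which deals with four-letter DNA sequences. The function names are hypothetical. The point is only that a valid codeword satisfies every parity check (syndrome 0), and that a single flipped symbol can be located and repaired.

```python
# Textbook binary Hamming(7,4): 4 data bits, 3 parity bits.
# Illustrative only; NOT the code construction used by Faria et al.
# Positions are numbered 1..7; parity bits sit at positions 1, 2 and 4.

def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # checks positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # checks positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # checks positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]

def syndrome(word):
    """Return 0 for a valid codeword, otherwise the 1-based position
    of the flipped bit (assuming at most one bit was flipped)."""
    s = 0
    for position, bit in enumerate(word, start=1):
        if bit:
            s ^= position
    return s

def correct(word):
    """Repair a single-bit error, if any, and return the fixed word."""
    fixed = list(word)
    s = syndrome(fixed)
    if s:
        fixed[s - 1] ^= 1
    return fixed

codeword = hamming74_encode([1, 0, 1, 1])
damaged = list(codeword)
damaged[4] ^= 1                       # simulate one "mutation" at position 5
print(syndrome(codeword), syndrome(damaged), correct(damaged) == codeword)
```

Asking whether a sequence "is a codeword" amounts to asking whether its syndrome is zero under some code; the paper's claim is that such a code can be found for real DNA sequences.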
Error correction algorithms for DNA repair
Maintaining the integrity of genetic material is achieved
through DNA repair, a process in the cell in which damage is continually
monitored and corrected. In many organisms the genes and proteins that
participate in this process have been identified, but with a few exceptions the
role of many of them is given only descriptively. Understanding the role of
genetic and protein interactions during DNA-repair and its mathematical
formulation is obscured not only by the limitations of the existing experimental
methods, but also by deficiencies of their underlying theoretical frameworks or
lack thereof.
In this project we study an information-theoretic
framework for DNA repair that views it as an error correction system.
We are applying the principles and methods of information theory and coding
theory to incorporate phenomena observed on different levels of abstraction of
the genomic error correction system. This method for rigorous treatment of
DNA-repair enables describing its functionality quantitatively and
algorithmically.
Prof. Bane Vasic
Vida Ravanmher
Prof. David Galbraith
Prof. Michael Marcellin
“The model
encompasses the different structures of the error correction system and
interactions not only among its different levels but also among other sub-systems
in the cell. In order to understand such a complex system, specialized repair
mechanisms, which have been the primary object of research in the past, must be
considered in the context of the global error correction machinery. The
proposed framework for rigorous treatment of DNA-repair enables describing its
functionality quantitatively and algorithmically.”
“Error-correction coding theory of DNA repair,” Bane Vasic, David W. Galbraith,
Shashi Kiran Chilappagari and Michael W. Marcellin.
Here are some of the
potential benefits of such studies. Cofty, you might find these interesting:
Abstract: All cancers are caused by errors
(mutations) in the DNA sequence that cannot be detected or corrected by the
body’s repair mechanisms. A similar problem is faced in telecommunications when
sequences of information are transmitted over a noisy channel, introducing
multiple random errors. To overcome this problem, redundant bits are added to
each sequence before transmission which will help correct the errors later in
the receiver. This is called “Error Correction Coding” (ECC), a
well-established area in communications which started in 1948 and has been
perfected over time. There are immense functional similarities between the body
cell’s error correction mechanisms and the error correction techniques used in
telecommunications. This research exploits these similarities and combines
statistical methods with the powerful toolbox of algebraic error control coding
to understand and then improve the body’s repair mechanisms which hold the key
to treating cancer and other genetic diseases. The three components of
algebraic error correction are the encoder, channel, and decoder. Similarly, a
genetic encoder, channel, and decoder are defined in this proposal. A
probabilistic model is first derived for the genetic channel through applying
statistical inference to the available data of DNA mutations across the
spectrum of human cancer types. Then the genetic decoder which is in charge of
correcting the DNA errors is analyzed. In this approach, the decoder is divided
into two components: “DNA repair mechanisms” and the “gene interactions
network” which activates such mechanisms. Two very effective analytical tools
are borrowed from coding theory, namely message passing and density evolution,
and applied to study the global and local error correction mechanisms in the
cells. The two tasks above provide the fundamental knowledge of how error
correction is carried out in the cells. Given this knowledge, a systematic
approach is proposed to use genome editing techniques more effectively.
Particularly, target genes are identified in cancer cells to be knocked out of
the DNA for the purpose of making the cells vulnerable to a particular drug.
Project Title: Error Correction for the Code of Life in A New Era of Genome Editing
Partnering Institutions and Investigators: Wichita State University,
Ali Eslami (PI)
Funding Agency: Flossie E. West Memorial Foundation
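To make the encoder / channel / decoder picture from the abstract concrete, here is a small sketch of the telecommunications side of the analogy. Everything in it is an assumption for illustration: it uses bits rather than nucleotides, a simple repetition code with majority voting rather than the algebraic codes or message-passing decoders the proposal mentions, and a channel that flips each bit independently. It does not model the proposal's genetic encoder, channel, or decoder.

```python
import random

# Toy encoder / channel / decoder chain (illustrative assumptions only):
# a repetition code over bits, not the proposal's genetic model.

def encode(bits, n=3):
    """Encoder: repeat every bit n times (n odd) to add redundancy."""
    return [b for b in bits for _ in range(n)]

def channel(bits, flip_prob, rng):
    """Noisy channel: flip each bit independently with probability flip_prob."""
    return [b ^ 1 if rng.random() < flip_prob else b for b in bits]

def decode(bits, n=3):
    """Decoder: majority vote over each block of n received bits."""
    return [1 if sum(bits[i:i + n]) > n // 2 else 0
            for i in range(0, len(bits), n)]

rng = random.Random(0)                # fixed seed so runs are repeatable
message = [1, 0, 1, 1, 0]
received = channel(encode(message), flip_prob=0.05, rng=rng)
print("recovered correctly:", decode(received) == message)
```

With three copies per bit the decoder survives any single flip inside a block but fails if two of the three copies are hit, which is the basic trade-off between redundancy and error tolerance that the abstract alludes to.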