DNA Motifs Detection Algorithms in Long Sequences
Abstract— The identification of DNA motifs remains an active
challenge for researchers in the bioinformatics domain.
Considerable effort in this area has been concentrated on
understanding the evolution of the genome by identifying the
DNA binding sites for transcription factors. Advances in
genome sequencing have led to the appearance of numerous
computational methods for finding short DNA segments,
known as motifs. In this study, we present some of the
existing computational methods and evaluate their
performance on long sequences.
Keywords - Computational methods, motif-finding algorithms,
binding sites, bioinformatics, motif localization
I. INTRODUCTION
In recent years, a considerable number of algorithms have
been designed for identifying novel regulatory elements in
DNA sequences. Most of these algorithms were developed to
detect regulatory elements by examining the regulatory
regions of several co-regulated genes that belong to a
single genome [1]. The algorithms
identify the regulatory regions and search for overrepresented
motifs which are classified as potential candidates for
regulatory elements. Some of the algorithms use phylogenetic
footprinting to identify well conserved sites among regulatory
regions. Most of the algorithms operate on multiple
sequences or an entire gene. The gene is the fundamental unit
of inherited information in deoxyribonucleic acid (DNA) and is
defined as a section of the base sequence that is used as a
template for the copying process called transcription. Gene
expression is regulated by transcription factors, which
activate or inhibit the process of transcription. An
important challenge in molecular biology is
to completely understand the mechanism that regulates gene
expression. A major task in this challenge is to identify the
DNA binding sites for transcription factors. Computational
methods are expected to offer promising solutions and, as a
consequence, researchers have invested considerable effort
in these methods.
The problem of detecting regulatory elements may be stated
as follows: given a group of N sequences, find a pattern M of
length l that occurs frequently. If the pattern M occurs
exactly in each of the N sequences, then a simple enumeration
of the l-letter patterns that appear in the sequences yields
the regulatory element. The main difficulty in the analysis
of DNA sequences is that the regulatory elements we search
for may contain nucleotide mutations, so their occurrences
need not match exactly.
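The problem statement above can be illustrated with a minimal
Python sketch. It is not one of the algorithms evaluated in this
study. The function names and the mismatch parameter d (the
maximum number of mutated positions tolerated per occurrence) are
assumptions introduced only for this example, and the brute-force
search over all 4^l candidates is practical only for small l.

```python
from itertools import product


def kmers(seq, l):
    """Return the set of all length-l substrings of seq."""
    return {seq[i:i + l] for i in range(len(seq) - l + 1)}


def exact_common_patterns(seqs, l):
    """Patterns of length l that occur exactly in every sequence."""
    common = kmers(seqs[0], l)
    for s in seqs[1:]:
        common &= kmers(s, l)
    return common


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_with_mismatches(pattern, seq, d):
    """True if some window of seq is within d mismatches of pattern."""
    l = len(pattern)
    return any(hamming(pattern, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def approximate_common_patterns(seqs, l, d):
    """Candidates of length l found in every sequence with <= d mismatches.

    Assumption for illustration: every one of the 4**l possible
    patterns is tested, which is exponential in l.
    """
    return [cand for cand in map("".join, product("ACGT", repeat=l))
            if all(occurs_with_mismatches(cand, s, d) for s in seqs)]


# Toy data (made up): the second group carries one mutation per site.
clean = ["ACGTTGCA", "TTACGTAA", "GGACGTCC"]
mutated = ["ACGTTGCA", "TTACCTAA", "GGAGGTCC"]
print(exact_common_patterns(clean, 4))
print(exact_common_patterns(mutated, 4))
print("ACGT" in approximate_common_patterns(mutated, 4, 1))
```

With the toy data, exact enumeration recovers the shared pattern
only when no site is mutated; once a single nucleotide changes in
two of the sequences, the exact intersection is empty and a
mismatch-tolerant search is needed.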
In the analysis of the mechanism that regulates gene
expression, sequence motifs have become very important. A
DNA motif can be defined as a short, recurring pattern in DNA
that is presumed to have some biological function; often it
is a binding site for a transcription factor (TF) [2]. Some
motifs are involved in complex processes that occur at the
RNA level, including ribosome binding, mRNA processing and
transcription termination [3]. Motifs are relatively short:
they are between five and twenty base pairs (bp) long and can
be located in different genes or even within the same gene.
Besides this classification based on length, there are two
special types of motifs that are recognized: palindromic motifs
and spaced dyad (gapped) motifs [4]. A motif is called
palindromic if it matches its complementary base sequence
read backwards (for example, ‘TCTCGCGAGA’ is a
palindromic motif). Motifs that are formed from two short,
well-conserved sites, usually separated by a spacer,
are called spaced dyad (gapped) motifs. Because the
transcription factor (TF) usually binds as a dimer, the gap is
often located in the middle of the motif. Typically, the
positions where the TF binds to the DNA are well conserved and
have a length between three and five base pairs.
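The two special motif types described above can be checked with
a short Python sketch. The function names, the half-sites and
the spacer range used below are hypothetical and chosen only for
illustration. The half-sites must match exactly here, whereas a
practical search would also tolerate mismatches.

```python
# Translation table mapping each base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Complementary strand of seq, read backwards."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(motif):
    """True if the motif equals its own reverse complement."""
    return motif == reverse_complement(motif)


def find_spaced_dyads(seq, left, right, min_gap, max_gap):
    """Return (start, gap_length) for each occurrence of
    left + spacer + right, with min_gap <= len(spacer) <= max_gap."""
    hits = []
    for i in range(len(seq) - len(left) + 1):
        if seq[i:i + len(left)] != left:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap
            if seq[j:j + len(right)] == right:
                hits.append((i, gap))
    return hits


print(is_palindromic("TCTCGCGAGA"))   # the example given in the text
print(is_palindromic("ACGTT"))

# Made-up sequence: half-sites CGG and CCG separated by 11 bases.
seq = "AA" + "CGG" + "ATATATATATA" + "CCG" + "TT"
print(find_spaced_dyads(seq, "CGG", "CCG", 8, 12))
```

Treating the spacer as a range of allowed lengths, rather than a
fixed length, reflects the statement that the two conserved sites
are usually, but not always identically, separated.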
In the past, methods like footprinting and gel-shift or
reporter construct assays [5] were used for determining binding
sites. Nowadays, computational methods are widely used
to find overrepresented DNA patterns in a sequence or
set of sequences. A motif is overrepresented if it occurs
more often in the analyzed sequence than would be
expected by chance [4]. Most motif-finding algorithms
perform well on lower organisms (including
yeast), but their performance is relatively poor on
higher organisms. Some recent algorithms, which use
phylogenetic footprinting, have proved to be more efficient in motif
detection for genomic sequences [5].
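The notion of overrepresentation defined above can be made
concrete with a small Python sketch. The background model is an
assumption made only for this example: bases are treated as
independent draws with the frequencies observed in the analyzed
sequence. Published tools generally use richer background models
and proper significance tests, so the ratio below is a rough
indicator and not a statistic taken from the cited works.

```python
from collections import Counter


def count_occurrences(seq, motif):
    """Count exact occurrences of motif in seq, overlaps included."""
    k = len(motif)
    return sum(seq[i:i + k] == motif for i in range(len(seq) - k + 1))


def expected_occurrences(seq, motif):
    """Expected count under the assumed independent-base background."""
    n = len(seq)
    freq = Counter(seq)
    p = 1.0
    for base in motif:
        p *= freq[base] / n          # probability of this base by chance
    return (n - len(motif) + 1) * p  # number of windows times match probability


def overrepresentation_ratio(seq, motif):
    """Observed count divided by expected count (0.0 if never expected)."""
    expected = expected_occurrences(seq, motif)
    if expected == 0:
        return 0.0
    return count_occurrences(seq, motif) / expected


# Made-up sequence in which ACGT is repeated ten times.
seq = "ACGT" * 10
print(count_occurrences(seq, "ACGT"))
print(expected_occurrences(seq, "ACGT"))
print(overrepresentation_ratio(seq, "ACGT") > 1)
```

A ratio well above one suggests that the pattern occurs more
often than chance alone would explain under this simple model.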