Data mining and bioinformatics

seminar class · 05-05-2011, 12:21 PM

Machine Learning for Analyzing Genome-Wide Expression Profiles and Proteomics Data Sets
Biological research is becoming increasingly database driven, motivated, in part, by the advent of large-scale functional genomics and proteomics experiments such as those comprehensively measuring gene expression. These provide a wealth of information on each of the thousands of proteins encoded by a genome. Consequently, a challenge in bioinformatics is integrating databases to connect this disparate information as well as performing large-scale studies to collectively analyze many different data sets. This approach represents a paradigm shift away from traditional single-gene biology, and it often involves statistical analyses focusing on the occurrence of particular features (e.g., folds, functions, interactions, pseudogenes, or localization) in a large population of proteins. Moreover, the explicit application of machine learning techniques can be used to discover trends and patterns in the underlying data. In this article, we give several examples of these techniques in a genomic context: clustering methods to organize microarray expression data, support vector machines to predict protein function, Bayesian networks to predict subcellular localization, and decision trees to optimize target selection for high-throughput proteomics.
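As a rough illustration of the first technique listed above (clustering to organize expression data), here is a minimal Python sketch. It is not taken from the article: the choice of average-linkage agglomerative clustering with 1 - Pearson correlation as the distance is an assumption, and the gene names and expression values are invented toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles.

    Assumes neither profile is constant (a zero variance would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def cluster_profiles(profiles, k):
    """Merge the two closest clusters until only k clusters remain.

    Distance between clusters is the average of (1 - Pearson r)
    over all pairs of genes drawn from the two clusters.
    """
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(1 - pearson(profiles[a], profiles[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical expression levels at four time points (toy values).
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, 2.1, 2.9, 4.2],
    "geneC": [4.0, 3.0, 2.0, 1.0],
    "geneD": [3.9, 3.2, 1.8, 1.1],
}

clusters = cluster_profiles(profiles, 2)
print(clusters)
```

With this toy data the two rising profiles end up in one cluster and the two falling profiles in the other. A correlation-based distance groups genes by the shape of their profile rather than by absolute expression level, which is why it is a common choice for this kind of data; for real microarray sets a library implementation would be more practical than this quadratic loop.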
Biological Research Is Database Oriented
Databases have defined the information structure of molecular biology for over a decade, archiving thousands of protein and nucleotide sequences and three-dimensional (3-D) structures. As large-scale genomics and proteomics move to the forefront of biological research, the role of databases has become more significant than ever. The current landscape of biological databases includes large public archives, such as GenBank, DDBJ, and EMBL for nucleic acid sequences [1]; PIR and SWISS-PROT for protein sequences [2]; and the Protein Data Bank for 3-D protein structure coordinate sets [3]. Another source of sequence data is dbEST [4], a division of GenBank storing expressed sequence tags (ESTs) from cell lines, which provide information about gene expression in various tissues. Databases such as these have been steadily accumulating gene sequences and protein structures for more than a decade, which are submitted on a per-instance basis from disparate laboratories in the biological sciences community. In addition to these general repositories of biomolecular data, specialized systems have been developed that extend its interpretation by providing a context for individual sequences and structures. The SCOP, CATH, and FSSP [5] databases classify proteins based on structural similarity, Pfam and ProtoMap [6] identify families of proteins based on sequence homology, while PartsList and GeneCensus [7] give dynamic reports on the occurrence of protein families in various genomes. Databases have also been developed to provide comprehensive access to sequence, expression, and functional data for all the known genes of specific model organisms

Download full report
http://papers.gersteinlabe-print/integ-datamine-ieee/integ-datamine-ieee.pdf
