BioInfOmics; or from Genomics via Transcriptomics and Proteomics to ProteoGenomics Jens Allmer Molecular Biology and Genetics, Izmir Institute of Technology.

Author: 鹤 孟

1 BioInfOmics; or from Genomics via Transcriptomics and Proteomics to ProteoGenomics
Jens Allmer, Molecular Biology and Genetics, Izmir Institute of Technology
Boğaziçi University,

2 A Journey To Bioinformatics
Education: MSc, University of Münster, PhD, University Jena, University of Pennsylvania, University of Münster, Post Doc
Work Experience:
University of Münster (Research Assistant)
University Jena (Research Assistant)
University of Pennsylvania (Visiting Scholar)
University of Münster (Research Assistant/Post Doc)
İzmir University of Economics (Instructor)
İzmir Institute of Technology (Assistant Professor)
İzmir Institute of Technology (Associate Professor)

3 Topics Studied at jLab Bioinformatics
Genomics (Visam, Caner, Mehmet, Tesfa, Fulya): Sequence Assembly, Sequence Annotation
Transcriptomics (Duygu, Hamid, Çağrı, Çağlayan, Volkan): miRNA Gene Prediction, miRNA Targeting, miRNA Regulatory Networks
Proteomics (Canan, Savaş, Ulaş, Aybuge, Şule, Belgin): De novo sequencing, Database search
Proteogenomics (Canan, Mehmet, Elif, Cem): Annotation of human genome

4 Sequence Assembly (Visam)

5 Genome Assembly Quality (Visam)

6 Sequence Cleaning Problem
The Percentage of Sequences Cleaned (columns in order: rawUV, cleanUV, appUV)
Every 600th EST: 31.00, 30.94, 31.79
P. Somniferum EST: 17.26, 18.03
Artificial Data: 87.50, 75.00, 100.00

7 Solution (Caner Bağcı)
Reverse engineered UniVec from NCBI
NCBI is relatively non-redundant
Contains all relevant vectors
Contains relevant circularization
Problems with EST data and adapters persist

8 Functional Annotation

9 Visualization

10

11 Prediction of MicroRNA Genes

12 MicroRNA Regulatory Network T. Gondii (Volkan)

13 Cancer Related Regulative miRNA Network

14 Full Putative Regulatory Network (Duygu, Hamid, Canan)
Setenay, Buğra (Çakabey Lisesi):
3rd place in Turkey at the Acıbadem Üniversitesi "Fikrine Sağlık" project competition
3rd place in Turkey in the biology category at the MEF Schools İstanbul project competition
Earned the right to exhibit at the TÜBİTAK Aegean regional exhibition

15 Some Hairpin Features
Figure labels: Dicer, Drosha, Mature miRNA

16 Sequence Based Features

17 Thermodynamic Features

18 Some Features are Correlated

19 MicroRNA Gene Prediction
De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures, Ng & Mishra 2007
MiPred: classification of real and pseudo microRNA precursors using random forest prediction model with combined features, Jiang et al. 2007
Identifying Human MicroRNAs, Bentwich 2008
MiRenSVM: towards better prediction of microRNA precursors using an ensemble SVM classifier with multi-loop features, Ding et al. 2010

20 Ng & Mishra 2007 6 more not shown here

21 Jiang et al. 2007 6 more not shown here

22 Bentwich 2008

23 Ding et al. 2010

24 Accuracy measurements for human miRNAs (positive dataset) and pseudo miRNAs (negative dataset)
Accuracy Values
Studies               Best    Average   Standard Deviation
Ng and Mishra 2007    0.930   0.895     0.060
Bentwich 2008         0.986   0.983     0.002
Ding et al. 2010      0.996   0.599     0.198
Jiang et al. 2007     0.910   0.877     0.018

25 Comparison of four Ab Initio MicroRNA Prediction Tools
Duygu Saçar and Jens Allmer
Column headers: Classifier; Sensitivity, Specificity, CA (miRBase Entries and miRTarBase Entries)
SVM: 0.85, 0.86, 0.94, 0.92, 0.93
Naïve Bayes: 0.90, 0.82, 0.91
Logistic Regression
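The sensitivity, specificity, and CA columns above can be related to confusion-matrix counts. The following is a minimal sketch, assuming that CA stands for classification accuracy (the slide does not spell it out); the function name and the counts are hypothetical and are not taken from the study.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: real miRNA hairpins predicted as miRNA / as pseudo hairpin
    tn/fp: pseudo hairpins predicted as pseudo hairpin / as miRNA
    """
    sensitivity = tp / (tp + fn)                 # fraction of positives recovered
    specificity = tn / (tn + fp)                 # fraction of negatives rejected
    accuracy = (tp + tn) / (tp + fp + tn + fn)   # assumed meaning of "CA"
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only
sens, spec, acc = classification_metrics(tp=90, fp=20, tn=80, fn=10)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Reporting all three values matters when the positive and negative datasets differ in size, because accuracy alone can then hide a poor sensitivity or specificity.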

26 One Genome, Different Proteomes, Different Phenotypes

27 Practical Proteomics
Peng J and Gygi SP (2001) “Proteomics: The Move to Mixtures”, J. Mass Spectrometry 36(10):1083
Figure: mass spectrum, Relative Abundance (0 to 100) against m/z (400 to 2000)
Protein Identification e.g.: gi| |gb|AAQ

28 Peptide Sequence Matches
Database Search Strategies: MS vs Sequence; MS/MS vs Sequence; MS/MS vs theo MS/MS; MS/MS vs MS/MS Library; PST vs Sequence; De Novo vs Sequence; Profile vs Sequence
De Novo Sequencing
Figure labels: Example, Algorithm, Sequence
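The "MS/MS vs theo MS/MS" strategy can be illustrated with a minimal sketch that computes singly charged b- and y-ion m/z values for an unmodified peptide and counts how many of them appear in an observed peak list. The function names, the 0.5 Da tolerance, and the shared-peak score are assumptions chosen for illustration; tools such as X!Tandem, OMSSA, or Sequest use their own, more elaborate scoring. The residue masses are standard monoisotopic values.

```python
# Standard monoisotopic residue masses in Da
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O
PROTON = 1.00728   # charge carrier


def theoretical_by_ions(peptide):
    """Return (b_ions, y_ions) as singly charged m/z lists.

    For each cleavage site i, b_ions[i-1] is the prefix of length i and
    y_ions[i-1] is the complementary suffix.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


def shared_peak_count(observed_mz, theoretical_mz, tolerance=0.5):
    """Count theoretical peaks that have an observed peak within tolerance (Da)."""
    return sum(
        any(abs(obs - theo) <= tolerance for obs in observed_mz)
        for theo in theoretical_mz
    )


# The peptide from the COMAS example slide, used here only as a test string
b_ions, y_ions = theoretical_by_ions("KTGQAPGFSYTDANK")
print(len(b_ions), len(y_ions))
```

A database search would compute such theoretical peaks for every candidate peptide whose precursor mass fits and rank the candidates by the resulting score.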

29 Protein Assignment
Data can stem from different databases (sequence files): EST Sequences, Gene Models, Genomic database, Protein Sequences, Chloroplast Sequences, Mitochondrion Sequences, Possible Contaminants
Database search: X!Tandem, OMSSA, Sequest
Modifications: R6 Mod, C57 Mod, Other Mods
Quality Control
Figure labels: Peptide A1, Peptide A2, Peptide Z4

30 RAy (Database Search; Şule)
Supported by TÜBITAK – 111E139

31 De Novo Sequencing Algorithms under development
Genetic Algorithm (MSNovoGen): Me
Ant Colony Optimization (COMAS): Canan, Savaş, Şule
Ion Naming (DeNovoN): Canan, Belgin
Algorithm exploiting new fragmentation model (ANovoStar)

32 DNML (Savaş)

33 DNMSO (Savaş)
Diagram labels: MS/MS, PepNovo, Lutefisk, result, DNML Api, PTMConverter, PTM Data Formats
New De Novo Software: File Handling, Spectra, PTMs
100% concentration on algorithm

34 MSNovoGen
Generate «Random» Sequences
Score Sequences With Fitness Function
Delete Sequences With low scores
Mate (Crossover)
Mutate
Repeat
Display Results
Sequence Pool: WPAWDDHWSPYGW, EMFDCNTPMDKDRT, FEGYCDHSDCDFME, HTIYTAISWCDYET, FEGYCDHCVCDFME, MMFDGCLSDWYDSM, DMYVCWSLMGYFW, DMYVCWSLMGPHMM, DMYVCWSIMGPHMM, NMYRCESLMGYDAF, FMMVCWSLMGPHMM, FMMVCWSIGMPHMM, FEMRCCIIGMFWY, WFYACETVMGFWY, WQPRCETVMGFWY, WQPRCMVVMGFWY, WFYACMVVMGYFW, WQPRCMVVMGWYF, FTYRCMVVMGWYF
Scores shown: 0.3, 0.7
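The loop on this slide (generate, score, delete, mate, mutate, repeat) can be sketched as a small genetic algorithm. This is a toy illustration under stated assumptions, not the MSNovoGen implementation: the fitness function here simply compares a candidate with a known target string, whereas a de novo sequencing tool would score candidates against the measured MS/MS spectrum. All names, the pool size, and the mutation rate are hypothetical.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(candidate, target):
    """Hypothetical fitness: fraction of positions agreeing with a known target."""
    return sum(a == b for a, b in zip(candidate, target)) / len(target)


def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length sequences."""
    cut = rng.randrange(1, len(parent_a))
    return parent_a[:cut] + parent_b[cut:]


def mutate(sequence, rng, rate=0.05):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in sequence
    )


def evolve(target, pool_size=100, generations=200, seed=0):
    rng = random.Random(seed)
    # Generate «random» sequences
    pool = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(len(target)))
        for _ in range(pool_size)
    ]
    for _ in range(generations):
        # Score sequences and delete the lower-scoring half
        pool.sort(key=lambda s: toy_fitness(s, target), reverse=True)
        survivors = pool[: pool_size // 2]
        if toy_fitness(survivors[0], target) == 1.0:
            break
        # Mate (crossover) and mutate to refill the pool, then repeat
        children = [
            mutate(crossover(rng.choice(survivors), rng.choice(survivors), rng), rng)
            for _ in range(pool_size - len(survivors))
        ]
        pool = survivors + children
    pool.sort(key=lambda s: toy_fitness(s, target), reverse=True)
    return pool[0], toy_fitness(pool[0], target)


# Display results; the target is the COMAS example peptide, used as a test string
best, score = evolve("KTGQAPGFSYTDANK")
print(best, score)
```

Keeping the survivors unchanged between generations means the best score never decreases; the fixed seed only makes the run repeatable.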

35 COMAS (De Novo, Canan, Şule)

36 COMAS Example Optimum solution: KTGQAPGFSYTDANK

37 Abandoned for a new graph-based approach

38 Workflow for Gene Annotation

39 Annotation and Peptide Mapping
Validation of current gene models
Exon extensions upstream/downstream
Alternative start site selection
Discovery of new exons or new gene models
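Mapping an identified peptide back to the genome can be sketched with a six-frame translation. This is a minimal illustration under stated assumptions: the genome is an uppercase string containing only A, C, G, and T, the standard genetic code applies, and peptides that span a splice junction are not handled. The function names are hypothetical and do not describe the workflow used in the study.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; stop codons become '*'."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna) - 2, 3))


def map_peptide(peptide, genome):
    """Return (strand, frame, start) hits.

    `start` is the 0-based forward-strand coordinate of the leftmost base
    of the region encoding the peptide.
    """
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq[frame:])
            pos = protein.find(peptide)
            while pos != -1:
                start = frame + 3 * pos
                if strand == "-":
                    # Convert reverse-strand offset to forward coordinates
                    start = len(genome) - start - 3 * len(peptide)
                hits.append((strand, frame, start))
                pos = protein.find(peptide, pos + 1)
    return hits


# Hypothetical toy genome: ATG AAA ACC (M K T) starts at position 2
genome = "CCATGAAAACCTT"
print(map_peptide("MKT", genome))
```

Hits that fall outside annotated exons are the candidates for exon extension or new gene models; hits inside them support the current model.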

40 Mapping of human blood plasma proteomic data to human genome
24 human blood proteome collections
19 collections were analyzed on LCQ mass spectrometry
5 collections were analyzed on QTOF mass spectrometry

41 Human Blood Plasma Proteogenomics
Mert, Umutcan (Çakabey Lisesi):
Qualified from the DOESEF competition for the International Conference of Young Scientists competition
2nd place at the International Conference of Young Scientists competition

42 Exon Addition
The GenScan track represents the currently used gene model. The CCDS track shows the measured cDNA for the model. P1-P5 are peptides identified in our study.

43 Exon Extension

44 Exon Joining

45 Puzzled
Figure labels: Genomics, Proteomics, Genome, Transcriptomics, Proteogenomics; Identification, Annotation, Confirmation, Assembly, Genome Annotation, Validation, Discovery

46 Acknowledgements