Bioinformatics Day

Sunday, September 5th, 2010

Abstracts


Integrated Analysis of Diverse Functional Genomics Data or From a Sea of Data to Understanding Human Disease

Olga Troyanskaya

Princeton University

This talk will focus on the challenges of integrative modeling of human functional genomics data, ranging from sequencing to protein-protein interaction studies, with the goal of mechanistic study of disease at the systems level. The complexity and scale of human molecular biology make it difficult to integrate this body of data, model the underlying biological mechanisms on the whole-genome scale, and apply these models to the study of specific pathways or genetic disorders. These challenges are further exacerbated by the biological complexity of metazoans, including diverse biological processes, individual tissue types and cell lineages, and by the increasingly large scale of data in higher organisms. We address these challenges through Bayesian integration of functional genomics data to construct context-specific functional relationship networks, which we have experimentally validated and used to study disease. To enable genome-wide mapping of pathways based on functional genomic data, we have developed a novel methodology for simultaneous genome-wide inference of physical, genetic, regulatory, and functional pathway components. I will also describe our initial work on modeling these systems-level processes in a cell-type- and tissue-specific context, beginning with accurate predictions (and experimental confirmation) of tissue-specific expression.
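For readers unfamiliar with Bayesian data integration, the following is a minimal sketch of one common form of it: a naive Bayes combination of evidence for a single gene pair. It is not the speaker's implementation. The function name, the prior, the likelihood ratios, and the assumption that datasets are conditionally independent are all hypothetical and chosen only for illustration.

```python
def posterior_functional_link(prior, likelihood_ratios):
    """Posterior probability that two genes are functionally related.

    prior: assumed prior probability of a functional relationship (0 < prior < 1).
    likelihood_ratios: one value per dataset, each equal to
        P(evidence | related) / P(evidence | unrelated).
    Datasets are assumed conditionally independent (the naive Bayes assumption).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    # Work in odds space so that independent evidence multiplies.
    odds = prior / (1.0 - prior)
    for ratio in likelihood_ratios:
        odds *= ratio
    # Convert the posterior odds back to a probability.
    return odds / (1.0 + odds)


# Hypothetical example: co-expression supports the link (ratio 4.0),
# a physical-interaction screen supports it weakly (ratio 1.5),
# and a localization dataset argues against it (ratio 0.5).
score = posterior_functional_link(0.05, [4.0, 1.5, 0.5])
```

Scoring every gene pair this way yields a weighted network. A context-specific network could, under the same assumptions, be obtained by estimating the likelihood ratios from known gene pairs in the biological process or tissue of interest.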


Seeking Overlooked Functions: Hidden Connections Among Short Proteins

Michal Linial

Dept. of Biological Chemistry, The Sudarsky Center for Computational Biology,

The Hebrew University of Jerusalem, Israel

Most animal toxins are short proteins that appear in venom and vary in sequence, structure and function. Considering the appearance of homologous venom proteins in evolutionarily remote species, it is plausible that homologues of such proteins may be found in non-venomous species, as long as they fulfill some biological function. Indeed, sporadic instances of endogenous toxin-like proteins that function in a non-venom context have been reported. Herein we show that many families of toxin-like proteins remain undiscovered. To discover overlooked short functional proteins, we developed a computational method that can characterize and thus detect such proteins. We have successfully used machine learning methodology, based on sequence-derived features and guided by the notion of structural stability, a common characteristic of toxins, to create a robust characterization of toxin and toxin-like proteins. We applied a large-scale search for these proteins in insect and mammalian genomes, as well as in less-studied genomes. Our method detected dozens of putative novel toxin-like proteins. When this search was applied to viral proteomes, we identified about 500 putative toxin-like proteins. These findings allow us to propose some surprising cross-talk between viruses and their hosts. We will demonstrate a biological validation for some of these proteins. Furthermore, we show that constructing a tree-based family scaffold of all proteins exposes hidden connections among many proteins, many of which belong to viruses. We suggest that systematic detection of viral protein families, as well as toxin-like proteins, may lead to novel pharmaceutical targets and to a deeper understanding of the evolutionary link between toxins, viral proteins and cell modulators.

This research is supported by the ISF grant and the Sudarsky Center for Computational Biology.

· Naamati G, Linial M. (2010) A predictor for toxin-like proteins exposes cell modulator candidates within viral genomes. Bioinformatics (in press).

· Naamati G, Askenazi M, Linial M. (2009) ClanTox: a classifier of short animal toxins. Nucl. Acids Res. 37:W363-W368.

· Loewenstein Y, Linial M. (2008) Connect the dots: Exposing hidden protein family connections from the entire sequence tree. Bioinformatics 24:i193-199.


Unraveling the Transcription-Splicing Co-regulatory Network in Human

Yael Mandel-Gutfreund

Faculty of Biology, Technion - Israel Institute of Technology, Haifa 32000, Israel

To date, there is increasing evidence for coupling and co-regulation in the gene expression pathway, specifically between mRNA transcription and splicing. In this study we derived a transcription-splicing co-regulatory network with three types of nodes, representing splicing factors, transcription factors and kinases, and two types of edges, representing alternative splicing regulation and transcription regulation. Splicing regulation was predicted using SFmap, a method we recently developed to map splicing factor binding sites, while transcription regulation was predicted using the human-mouse conserved transcription factor database.

Analysis of the co-regulatory network revealed that it shares characteristics with other regulatory networks, namely a significantly high clustering coefficient and a power-law out-degree distribution. Moreover, we identified two network motifs that are significantly more frequent than in 1000 random networks. Among the most significant motifs we found a pure splicing feed-forward loop and a mixed transcription-splicing feed-forward loop. Interestingly, we found that in the co-regulatory network, splicing factors were significantly more often regulated at the splicing level, while transcription factors were significantly more often regulated at the transcription level. Further, we searched for significant preferences of pairs of splicing factors and transcription factors to co-regulate the different targets. Notably, only in the case of the splicing factors did we find a highly significant preference of specific pairs of transcription factors and splicing factors to regulate the splicing factor genes, suggesting combinatorial regulation at the transcription and post-transcription levels. Overall, our systematic analysis of the splicing-transcription co-regulatory network suggests extensive crosstalk between the two major processes in the gene expression pathway in human cells.
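The motif analysis described above can be illustrated with a small sketch that counts feed-forward loops (A regulates B, B regulates C, and A also regulates C) in a directed network and compares the count with random networks. This is not the authors' pipeline. The node names, the toy edge list, and the null model (random directed graphs with the same numbers of nodes and edges) are assumptions made for illustration; the abstract does not state how its 1000 random networks were generated.

```python
import random


def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    successors = {}
    for source, target in edges:
        if source != target:  # ignore self-regulation
            successors.setdefault(source, set()).add(target)
    count = 0
    for a, targets_of_a in successors.items():
        for b in targets_of_a:
            for c in successors.get(b, ()):
                if c != a and c in targets_of_a:
                    count += 1
    return count


def random_directed_edges(nodes, n_edges, rng):
    """Sample n_edges distinct directed edges (no self-loops) uniformly."""
    possible = [(u, v) for u in nodes for v in nodes if u != v]
    return rng.sample(possible, n_edges)


def motif_p_value(edges, n_random=1000, seed=0):
    """Empirical p-value: share of random networks with at least as many loops."""
    rng = random.Random(seed)
    unique_edges = {(u, v) for u, v in edges if u != v}
    nodes = sorted({node for edge in unique_edges for node in edge})
    observed = count_feed_forward_loops(unique_edges)
    at_least_as_many = sum(
        count_feed_forward_loops(
            random_directed_edges(nodes, len(unique_edges), rng)
        ) >= observed
        for _ in range(n_random)
    )
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (at_least_as_many + 1) / (n_random + 1)


# Hypothetical toy network: one transcription factor (TF1), two splicing
# factors (SF1, SF2) and one kinase (K1).
toy_edges = [("TF1", "SF1"), ("SF1", "SF2"), ("TF1", "SF2"), ("K1", "TF1")]
observed_loops, p_value = motif_p_value(toy_edges, n_random=200)
```

In the toy network the only feed-forward loop is TF1 -> SF1 -> SF2 together with TF1 -> SF2, a mixed transcription-splicing loop in the abstract's terminology. A degree-preserving null model, such as edge swapping, is a common stricter alternative to the uniform random graphs used here.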