Former supervisor: Nicholas Provart, Department of Cell and Systems Biology

Ryan successfully graduated from the GBB Program on 1 May 2009, as the fourth GBB Program Graduate. Ryan is now a compbio post-doc in the Centre for the Analysis of Genome Evolution & Function.

The de novo prediction of functionally significant sequence motifs in Arabidopsis thaliana

Thesis abstract

This thesis performs de novo predictions of functionally significant sequence motifs in the Arabidopsis genome in two separate contexts. Each study uses genomic positional information, statistical over-representation and several biologically contextual filters to maximize the visibility of biological signal in the prediction results. Numerous literature-supported motifs are prevalent in the results of both studies, and a number of novel motif patterns show strong potential for in planta significance.

The first study examines the statistical over-representation of C-terminal tripeptides as a means of identifying conserved eukaryotic protein targeting signatures. Comparative genomics is applied to the analysis of tripeptide frequencies at the C-termini of proteins in 7 eukaryotic proteomes. Biological signal is maximized by filtering out both simple sequences and homologous sequences present across protein families.
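As an illustration of what tripeptide over-representation can look like in practice, here is a minimal Python sketch. It is not the thesis method: the background model (all overlapping tripeptides in the same proteins), the binomial test, the function name and the toy sequences are all assumptions made for the example, and the filtering of simple and homologous sequences is omitted.

```python
from collections import Counter
from math import comb


def cterm_tripeptide_enrichment(proteins):
    """Score each C-terminal tripeptide against a whole-sequence background.

    Returns {tripeptide: (c_terminal_count, background_frequency, p_value)}.
    """
    proteins = [p for p in proteins if len(p) >= 3]
    n = len(proteins)
    # Observed counts: the last three residues of every protein.
    cterm = Counter(p[-3:] for p in proteins)
    # Assumed background: every overlapping tripeptide anywhere in the proteins.
    background = Counter(p[i:i + 3] for p in proteins for i in range(len(p) - 2))
    total_bg = sum(background.values())
    results = {}
    for tri, k in cterm.items():
        p_bg = background[tri] / total_bg
        # Upper-tail binomial probability P(X >= k) for X ~ Binomial(n, p_bg).
        p_value = sum(
            comb(n, i) * p_bg ** i * (1 - p_bg) ** (n - i) for i in range(k, n + 1)
        )
        results[tri] = (k, p_bg, p_value)
    return results


# Made-up toy sequences; three of them end in SKL.
toy_proteins = ["MAAAGGGSKL", "MTTTPPPSKL", "MCCCDDDSKL", "MEEEFFFAAA"]
scores = cterm_tripeptide_enrichment(toy_proteins)
for tri, (count, freq, p_value) in sorted(scores.items(), key=lambda kv: kv[1][2]):
    print(tri, count, round(freq, 4), p_value)
```

In a real analysis the p-values would need correction for multiple testing, and the comparison would be repeated across proteomes to find tripeptides that are consistently enriched.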

The second study introduces a methodology for the effective prediction of transcription factor binding sites (TFBSs) in Arabidopsis. A collection of motif prediction algorithms and a novel enumerative strategy are applied to the prediction of cis-acting regulatory elements within the promoters of genes found to be coexpressed within distinct tissues and under specific abiotic stress treatments. Significance levels for all predictions are standardized using a novel discriminative approach to statistical over-representation, and results are interpreted using a utility created for the statistical filtering and graphical analysis of TFBS motifs. Both the novel enumerative method and the overall integrative methodology are shown to outperform existing approaches to TFBS prediction.

Former supervisors: Corey Nislow and Gary Bader, Department of Molecular Genetics

Using Genomics To Understand Infantile Hemangiomas

Infantile hemangiomas (IHs) are benign vascular tumors that occur in 4‐10% of children under a year of age, with a greater occurrence in females. A tenth of all IHs require treatment due to cosmetic risks or because they occur in life‐threatening locations. Corticosteroids have traditionally been the first‐line treatment for IHs, but they are only partially effective and carry significant side effects. A recent serendipitous discovery found that propranolol, a β adrenergic receptor antagonist, exhibited fast and consistent therapeutic effects, shortening the natural course of IHs. This novel therapy is poised to become a first‐line treatment for IHs. The mechanism of action of propranolol on IHs is independent of its known antagonist activity. We have applied two analyses to understand both the cause of IHs and the mechanism whereby propranolol exerts its therapeutic action. We used a novel chemical genomic approach, human Multi‐copy Suppression Profiling, an assay in which human ORFs were expressed in yeast, to screen 12,000 human genes and identified dual specificity phosphatases 10 and 16 as novel targets of propranolol. Follow-up molecular analyses, including a DUSP10 in vitro activity assay, have confirmed these results, suggesting that therapeutic effects may be due to propranolol’s action on these phosphatases. In a complementary effort, we have begun to sequence the exomes of cell lines derived from IH patients. Exonic sequences were enriched from genomic DNA samples via capture by RNA oligonucleotide probes. After target enrichment, we performed high‐throughput sequencing and identified genomic coding variations that may be responsible for the development of the IH phenotype. Combining these studies, we hope to learn more about the formation and treatment of IHs.

Former supervisor: John Parkinson, Collaborative internship proposed with Andrew Emili

Understanding how protein assemblies, metabolic pathways, and genetic transcription are all incorporated into a living organism is of great significance. My project will focus on determining the large-scale organization of bacterial genetic networks; the main challenge of this research lies in incorporating existing genome-scale datasets (e.g., PPI, functional networks, phylogenetics, microarray, eSGA) to yield biologically interpretable results.

Ultimately, this project will be valuable in understanding how bacterial genetic networks evolve and how they interact with their hosts in health and disease, which will enable us to develop better therapeutics.

Former supervisor: William Stanford, Institute of Medical Science

The Role of the RING Finger Protein Makorin-1 in Embryonic Stem Cell Self-Renewal

The viability of pluripotent cells as a therapeutic source relies on improved knowledge of the molecular events controlling their derivation and fate (self-renewal versus commitment). Accordingly, to identify novel regulators of ESC fate, we combined temporal expression microarray analysis on early committed ESCs with promoter occupancy studies. In this study, Makorin-1 (Mkrn1) was found to be transcriptionally co-regulated with known regulators of ESC pluripotency. The function of Mkrn1 in ESC self-renewal is currently unknown; however, its expression is dependent on the undifferentiated state of the ESC. To further investigate the role of Mkrn1 in ESC self-renewal, we induced Mkrn1 knockdown with shRNA in stable ESC clones. The knockdown of Mkrn1 hastened differentiation and led to a concomitant decrease in Oct4 mRNA and protein levels when cells were cultured in self-renewal conditions. Conversely, the enforced expression of Mkrn1 in ESCs hindered differentiation when cells were cultured in differentiation conditions, as evidenced by higher Oct4 mRNA and protein levels. The data indicate that Mkrn1 functions as a novel regulator of ESC self-renewal; however, its mechanism of action remains unknown. Mass spectrometry analysis of the Mkrn1 protein interaction network in undifferentiated ESCs revealed that Mkrn1 interacts with a number of known mRNA-binding proteins, suggesting that Mkrn1 regulates ESC fate through a previously uncharacterized post-transcriptional complex.

Current work is focused on identifying the subset of Mkrn1-bound transcripts in ESCs through the use of ribonucleoprotein immunoprecipitation-sequencing (RIP-Seq) analysis. Our goal is to construct a Mkrn1 post-transcriptional network to further our insight into the regulatory networks that control ESC fate decisions.

Former supervisor: John Parkinson, Research Institute of the Hospital for Sick Children, enrolled in Department of Molecular and Medical Genetics

iGEM 2009 – Building a standard platform to investigate the potential of Enzyme Channeling

Enzyme channeling has the potential to increase the efficiency of some otherwise thermodynamically unfavorable reactions through the co-localization of enzymes catalyzing adjacent steps in a biochemical reaction. In addition, channeling may be involved in pathway switching, preventing escape of small molecules, preventing accumulation of toxic intermediates and preventing the breakdown of unstable intermediates. In nature, this is accomplished by a number of mechanisms, including protein fusion together with further surface charge adaptations which optimize delivery of reaction intermediates from one active site to another, trafficking of enzymes to the same cellular compartment, and the use of complexes and membrane scaffolds. Recently, Chris Sanford in our laboratory has explored the effect of channeling using a three-dimensional lattice simulation environment (Cell++), which has been uniquely designed to consider the spatial relationships of component objects when simulating cell processes. This is particularly important because the role of spatial organization in biological pathways has typically been neglected in simulations. The results suggest that the simple colocalization of enzyme pairs with certain properties can affect the rates of accumulation of reaction intermediates in a pathway. Under the auspices of the International Genetically Engineered Machine (iGEM) competition, a team of undergraduates will test this prediction using a synthetic biology approach to construct a standard, re-usable platform capable of co-localizing chosen enzyme pairs. We will use this platform to investigate channeling in predicted enzyme pairs, with an emphasis on pathways that may be of biological or commercial interest.

Former supervisor: Rae Yeung, Department of Immunology

Collaborative internship with: Quaid Morris

The Biological Basis of Clinical Heterogeneity in Childhood Arthritis

Former supervisor: Dinesh Christendat, Department of Cell and Systems Biology

Functional genomics of the plant SK superfamily

Former supervisor: John Parkinson, Department of Biochemistry

Collaborative internship proposed with Fred W. Keely, Department of Biochemistry

Elastin polymorphisms associated with increased risk of cardiovascular disease

Elastin, a polymeric protein and a component of the extracellular matrix, plays a major role in the elasticity of many tissues, including skin, lung parenchyma and large arteries. It is a major structural protein in the walls of large blood vessels such as the aorta and is responsible for the elasticity of vascular tissues. Elastin fibers are remarkably stable, with little or no normal turnover over the life-span of an individual; therefore, they must be able to withstand millions of cycles of extension and recoil in tissues such as arteries without mechanical failure. We hypothesize that any subtle variation in elastin sequence can impact elastin durability in arteries and consequently increase susceptibility to cardiovascular diseases. Applying the Solexa next-generation sequencing platform, we have sequenced the elastin gene (ELN) from 800 subjects diagnosed with thoracic aortic aneurysm and dissection (TAAD) in addition to 400 control samples from Ontario residents. Our goal is to identify and characterize those SNPs in the elastin gene that are enriched in the TAAD cohort.
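One conventional way to test whether a variant is enriched in a case cohort is a one-sided Fisher's exact test on allele counts. The Python sketch below illustrates that test under this assumption; it is not necessarily the analysis used in this project, and the function name and allele counts are hypothetical.

```python
from math import comb


def fisher_exact_greater(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """One-sided Fisher's exact test for enrichment of the alternate allele in cases.

    Arguments are allele counts from a 2x2 table (cases/controls by alt/ref).
    Returns P(observing >= case_alt alternate alleles in cases) under the
    hypergeometric null with all table margins fixed.
    """
    n_case = case_alt + case_ref
    n_alt = case_alt + ctrl_alt
    n_total = n_case + ctrl_alt + ctrl_ref
    # Number of ways to choose which chromosomes belong to the case group.
    denom = comb(n_total, n_case)
    upper = min(n_case, n_alt)
    # Sum over all tables at least as extreme as the observed one.
    numer = sum(
        comb(n_alt, k) * comb(n_total - n_alt, n_case - k)
        for k in range(case_alt, upper + 1)
    )
    return numer / denom


# Hypothetical counts for one SNP: 800 cases and 400 controls contribute
# 1600 and 800 chromosomes respectively.
p_value = fisher_exact_greater(case_alt=40, case_ref=1560, ctrl_alt=8, ctrl_ref=792)
print(p_value)
```

Because every SNP in the gene would be tested, the resulting p-values would need a multiple-testing correction before any variant is called enriched.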

Sequence variants in elastin and their association with late-onset of cardiovascular disease

Elastin, a polymeric protein and a component of the extracellular matrix, plays a major role in the elasticity of many tissues, including skin, lung parenchyma and large arteries. It is a major structural protein in the walls of large blood vessels such as the aorta and is responsible for the elasticity of vascular tissues. The elastic properties of vascular tissues are very important for their physiological function; therefore, abnormalities in elastin production or assembly can result in cardiovascular conditions such as aneurysms, hypertension and atherosclerosis. A better understanding of elastin sequence variability between patients diagnosed with heart disease and healthy individuals, and of the impact of discovered sequence variants on elastin biomechanical properties and function, will have applications in the design of novel diagnostics and biomarkers for late-onset cardiovascular disease.

In my research I plan to (1) characterize elastin sequence variants in patients diagnosed with late-onset cardiovascular diseases, (2) explore the impact of sequence variants on elastin biomechanical properties, and (3) experimentally validate sequence variants by generating recombinant polypeptides. For the first stage of my research, samples from 800 Thoracic Aortic Aneurysm and Dissection (TAAD) patients were collected by Dr. Dianna Milewicz from University of Texas Medical School. These samples were sequenced along with 400 samples from OPGP (Ontario Population Genomic Repository) by The Centre for Applied Genomics (TCAG) at SickKids using Solexa next-generation sequencing. For sequence alignment and SNP calling, I used MAQ (Mapping and Assembly with Quality). I also used Perl and R for parsing files created by MAQ and analyzing data. Currently I am in the process of selecting SNPs for genotyping and further studying their impact on elastin integrity and function, which will help us increase our understanding of pathologies of late-onset of cardiovascular

Former supervisor: John Parkinson, Department of Biochemistry (enrolled with Department of Molecular Genetics)

Elastin: from sequence to structure and function

Elastin is an essential vertebrate protein which provides elasticity to various tissues. Elastin has the ability to self-assemble into elastic fibres capable of conferring elasticity while maintaining consistently high performance over billions of stretch cycles. These notable properties render elastin a perfect candidate to study for clues to design similar materials for use in a wide variety of fields. Furthermore, abnormalities in the elastin protein which weaken its elasticity or durability often lead to diseases with grave consequences for human health. Yet despite the many recent advances in the field of elastin research, there is no clear understanding of how elastin sequence directly impacts its ability to self-assemble and confer elasticity. I have adopted a novel graph-based pattern searching algorithm to identify over-represented repetitive elements in elastin from different species. These elements have been hypothesized to play a direct role in the assembly and elastic properties of elastin fibres. Data-mining existing human SNP databases has shown that several SNPs fall within these elements, which may disrupt the durability or elasticity of the resulting elastin fibres. These mutations and their effects on the physical properties of elastin will be examined in vitro.

Former supervisor: John Parkinson, Department of Molecular Genetics

Description of Research Project

The Apicomplexa is a large phylum of unicellular eukaryotes that includes a number of medically relevant parasites. More specifically, species from the genera Plasmodium, Toxoplasma, and Cryptosporidium are the etiological agents of the most common diseases caused by apicomplexans, which are malaria, toxoplasmosis, and cryptosporidiosis, respectively. Currently, there is a lack of effective vaccines or treatments against many apicomplexans, and the increasing prevalence of drug-resistant strains has stressed the urgency to develop novel drug therapies.

To meet these global health care challenges, several international consortia have generated vast amounts of sequence data, offering opportunities to gain insight into apicomplexans through in silico analyses. Information from the genome sequences, combined with proteome, transcriptome, and other high-throughput datasets, is being exploited to better understand apicomplexan biology. Our current knowledge of apicomplexan metabolism has revealed a number of pathways specific to the phylum for which many enzymes have not yet been elucidated. Due to selection pressures associated with surviving in an obligate host, these enzymes are either absent from the parasite, or present in the parasite and have evolved to be highly divergent from the host organism. To answer key questions concerning the evolution and conservation of apicomplexan parasites, I aim to apply comparative network analyses to identify parasite-specific enzymes that are critical for survival and up-regulated during stages of parasite growth and infection. These represent adaptations of the parasite to persist in the host and are important targets for therapeutic intervention.

Former supervisors: Krishna Mahadevan and Elizabeth Edwards, Department of Chemical Engineering and Applied Chemistry

Genome-scale modeling of the metabolism of Dehalococcoides bacteria: from genome to pan-genome

Dehalococcoides are important for the bioremediation of sites contaminated with chlorinated solvents. However, how this dehalogenating capability has been acquired and employed by these microbes is not well understood, specifically at the level of metabolism. In addition, the low growth yield of these bacteria is a major impediment to a faster bioremediation process. Hence, genome-scale reconstruction of the entire metabolic network and subsequent modeling of Dehalococcoides will be beneficial to understanding and overcoming these issues. Moreover, such a model is an excellent platform for exploring the metabolic capability of a microbe as well as for generating experimentally testable hypotheses regarding the microbe’s physiology. In this presentation, I’ll talk about a genome-scale reconstruction of the entire metabolic network and subsequent modeling of Dehalococcoides species strain CBDB1, a dechlorinating bacterium unique for its metabolic niche of degrading toxic and persistent ground water pollutants. In addition, I’ll talk about the Dehalococcoides pan-metabolic model that has been developed using the strain CBDB1 metabolic model and the published genome sequences of 4 Dehalococcoides isolates: Dehalococcoides sp. strain CBDB1, Dehalococcoides ethenogenes strain 195, Dehalococcoides sp. strain BAV1, and Dehalococcoides sp. strain VS. The pan-model reveals that remarkable similarities exist among the isolates in the context of core metabolic processes.

Former supervisor: Jeremy Squire, Department of Medical Biophysics

Causes and Consequences of Genomic Instability in Prostatic Carcinogenesis

The evolution of prostate cancer from normal epithelium via the preneoplastic lesion of high-grade prostatic intraepithelial neoplasia (HPIN) to invasive carcinoma is characterised by a number of particular genomic abnormalities that are predominantly generated in the preneoplastic phase. Whilst there are numerous candidates for the cause of these alterations, telomere dysfunction is thought to be a major contributor. Telomeres are the terminal ends of human chromosomes, and when dysfunctional can lead to break-fusion-bridge cycles and multi-polar mitoses that generate numerical and structural chromosomal instability.

The results presented reinforce the association of telomere dysfunction with the generation of certain markers of genomic instability such as abnormalities of the arms of chromosome 8. Furthermore, this work clarifies that the TMPRSS2-ERG aberrations are not telomere-related phenomena and are associated with a genomic deletion in a proportion of cases. Similarly, the PTEN microdeletions did not appear to have an association with telomere attrition. A previously unrecognised association between the telomere length in various types of prostatic epithelia and adjacent stroma is defined, suggesting evidence of a micro-environmental field effect in the generation of prostatic neoplasia. Finally, when examined retrospectively, it appears that telomere attrition, both in the HPIN epithelium and the stroma, has independent prognostic value in the diagnosis of prostate cancer after a previous diagnosis of HPIN.

Taken together, the research presented suggests important avenues for further research to determine the nature of barriers to the evolution of prostatic carcinogenesis such as oncogene- and telomere-induced senescence that may be exploited for therapeutic gain. These understandings may also help tailor management for prostate cancer such as risk stratification for men with HPIN and the use of targeted agents such as AKT inhibitors and telomerase inhibitors. In more advanced disease, translational application of this work has enabled a clinical trial of cytarabine in the treatment of metastatic hormone refractory prostate cancer.

Former supervisor: Liliana Attisano, collaborative internship proposed with Jeff Wrana.

The role of signaling crosstalk between TGFβ/Smad and Hippo/TAZ pathways in determining cell fate

In my research project, I plan to use a multidisciplinary approach to model the crosstalk between the Hippo and TGFβ/Smad signaling pathways and identify the unknown parameters of the nonlinear model using advanced control theory. The proposed research plan involves the completion of two steps: 1) identifying and modeling the crosstalk between the TGFβ/Smad and Hippo signaling pathways, and 2) validating the model in cultured cells. To achieve these goals, I will consider different scenarios for the unknown mechanisms (unmodeled dynamics in the control theory sense), and by using experimental measurements I will try to identify the unknown parameters of the models. In modeling biological networks, the unknown parameters are often fitted based on experimental measurements. However, the measurements are very noisy and limited. In this research, I will use a dynamic recursive estimator, known as the Unscented Kalman Filter (UKF), to estimate the model parameters and identify the unmodeled dynamics in different scenarios. Finally, after obtaining the models, I will validate them by performing structural perturbation experiments, gene silencing and inducible overexpression. For this I will use siRNA libraries and several small molecule screens, designed and implemented in our lab to identify positive and negative chemical inhibitors of the pathways.
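To make the idea of recursive parameter estimation with a UKF concrete, here is a minimal Python sketch for a single unknown parameter. It is a toy under stated assumptions, not the proposed pathway model: the measurement model y(t) = exp(-theta * t) (first-order decay from an initial value of 1), the random-walk parameter dynamics, the function name and all numerical settings are hypothetical.

```python
import math


def ukf_estimate_rate(times, measurements, theta0, p0, q, r, kappa=2.0):
    """Scalar unscented Kalman filter for the rate theta in y(t) = exp(-theta * t).

    theta0, p0: initial estimate and its variance; q: random-walk process
    variance; r: measurement noise variance. Returns (estimate, variance).
    """
    n = 1  # dimension of the parameter vector
    w_center = kappa / (n + kappa)       # weight of the central sigma point
    w_side = 1.0 / (2.0 * (n + kappa))   # weight of each outer sigma point
    weights = [w_center, w_side, w_side]
    theta, p = theta0, p0
    for t, y in zip(times, measurements):
        # Predict: the parameter is modeled as a random walk with variance q.
        p_pred = p + q
        spread = math.sqrt((n + kappa) * p_pred)
        points = [theta, theta + spread, theta - spread]
        # Push each sigma point through the nonlinear measurement model.
        z = [math.exp(-pt * t) for pt in points]
        z_mean = sum(w * zi for w, zi in zip(weights, z))
        p_zz = sum(w * (zi - z_mean) ** 2 for w, zi in zip(weights, z)) + r
        p_tz = sum(
            w * (pt - theta) * (zi - z_mean)
            for w, pt, zi in zip(weights, points, z)
        )
        # Update: correct the estimate using the innovation y - z_mean.
        gain = p_tz / p_zz
        theta = theta + gain * (y - z_mean)
        p = p_pred - gain * p_zz * gain
    return theta, p


# Noise-free toy data generated with a true rate of 0.5.
times = [0.5 * k for k in range(1, 11)]
data = [math.exp(-0.5 * t) for t in times]
theta_hat, variance = ukf_estimate_rate(times, data, theta0=1.5, p0=1.0, q=1e-5, r=1e-4)
print(theta_hat, variance)
```

A model of pathway crosstalk would have several states and parameters, so the sigma points would be built from a matrix square root of the covariance rather than a scalar square root, but the predict and update steps keep the same shape.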

Former supervisors: James Dennis, Department of Biochemistry & Christopher Hogue, Department of Biochemistry

Ken Lau successfully completed the GBB program in April 2008 – and in fact was the first GBB graduate! He has gone on to a post-doctoral position at Harvard.

A systems biology approach to decoding the function of complex N-glycans

Thesis Abstract

Embryogenesis, tissue repair and adaptive immunity involve developmental sequences of cell proliferation followed by differentiation and cell cycle arrest. Growth factors and other cytokines bind glycoprotein receptors to stimulate growth or arrest signaling, with the net response dependent on the availability of both ligands and receptors. The number of N-glycans (n), a distinct feature of each glycoprotein sequence, cooperates with the physical properties of the Golgi pathway to regulate surface levels of receptors. The Golgi pathway is ultrasensitive to hexosamine flux for the production of tri- and tetraantennary N-glycans, which bind galectins to form a molecular lattice that opposes glycoprotein endocytosis. Glycoproteins with few N-glycans (low n – e.g. TβR, CTLA-4, GLUT4) exhibit enhanced cell surface expression with switch-like responses to increasing hexosamine concentration, whereas glycoproteins with high numbers of N-glycans (high n – e.g. EGFR, IGFR, FGFR, PDGFR) exhibit hyperbolic responses. A bioinformatics survey shows that receptor kinases with high n play roles in metabolism and growth, while those with low n have functions in arrest and differentiation pathways. Computational modeling and experimental data reveal that these features impose a sequence of growth-to-arrest/differentiation, where growth-promoting high n receptors stimulate nutrient flux first, which then drives arrest/differentiation programs by increasing surface levels of low n glycoproteins. Interaction of the N-glycan branching pathway and N-glycan number plays important roles in diseases, as evidenced by the identification of synergistic polymorphisms in Mgat1 and CTLA-4 in human multiple sclerosis patients. In order to fully understand the interaction of N-glycan functions with cellular phenotypes, we applied an unbiased screening approach to examine perturbation to N-glycan branching on a global scale. By utilizing microarray analysis and siRNA knockdown, we have identified GlcNAc-sensitive pathways that validate and extend our model, and simultaneously affect growth regulation, complex N-glycan processing, and constitutive endocytosis. Non-intuitive genes of interest identified by our screen include those associated with ionic regulation and proteoglycan biosynthesis. Our results reveal a mechanism for the metabolic regulation of the cellular transition between growth and arrest in mammals that arises from the apparent co-evolution of N-glycan number and branching.

Former supervisor: David Guttman, Department of Ecology and Evolutionary Biology and Department of Cell and Systems Biology

Identification of novel innate immunity elicitors using molecular signatures of natural selection

The innate immune system protects eukaryotes against invading microbes upon perception of microbial motifs called pathogen-associated molecular patterns (PAMPs). Despite their central role in innate immunity, and their potential as antimicrobial agents, few PAMPs have been identified, and no systematic method has been developed for their discovery.

We show that competing signatures of negative selection to preserve core functions, and positive selection to avoid host recognition can be used to identify novel PAMPs. A selection analysis confirmed that all 1322 core genes identified from six phytopathogenic bacteria exhibited strong negative selection at the whole-gene level, while 56 genes also showed localized regions of positive selection. We show that these candidate PAMPs differ from the core genome with respect to the number and clustering of positively-selected sites. Finally, we functionally confirmed the candidates’ ability to induce innate immunity in Arabidopsis thaliana via callose deposition and virulence suppression assays.

Former supervisor: Janet Rossant, Department of Molecular Genetics

Sox17-mediated conversion of mouse embryonic stem cells (ESCs) into functional extraembryonic endoderm stem (XEN) cells identifies dynamic networks controlling cell fate decisions

The extraembryonic endoderm (ExEn) of the mammalian conceptus is important for patterning of the embryo proper, gives rise to support tissues such as the primary yolk sac, and can be maintained in vitro as self-renewing XEN cells. Little is known about the regulatory networks distinguishing XEN cell lines from the extensively characterized ESC. An intriguing regulatory network candidate is the transcription factor Sox17, which is essential for XEN cell derivation and self-renewal. To test the ability of Sox17 to drive XEN cell fate, we overexpressed Sox17 in ESCs, generating cells with cell morphology indistinguishable from embryo-derived XEN cells. Upon injection into host blastocysts, Sox17-XEN cells integrate and proliferate in the parietal endoderm of E8.5 mouse embryos. To identify dynamic regulatory networks involved in Sox17-mediated XEN conversion, time series RNA-sequencing was performed, revealing distinct stages of gene expression during conversion. Using the Dynamic Regulatory Events Miner algorithm, we generated a dynamic regulatory map of gene expression throughout Sox17-mediated XEN conversion. Mapping of gene expression bifurcation points revealed 39 dynamic gene expression paths throughout the conversion process. By overlaying transcription factor binding data on top of our dynamic regulatory map, we have identified novel putative regulators of ExEn cell fate. Based on this analysis, we have drafted three classes of ExEn cell fate regulators including ExEn cell fate repressors in ESCs, activators of ExEn cell fate in XEN cells and transcription factors active in both ESCs and XEN cells, acting to either repress or activate ExEn genes respectively. Taken together, our findings suggest that Sox17-mediated XEN conversion is a robust system to study cell fate decisions and can be used to identify novel transcriptional network modules regulating these changes. 
To confirm predicted ExEn regulators, we are currently perturbing the expression of transcription factors in ESCs and XEN cells to induce cell fate changes.

Supervisor: Peter Liu, Institute of Medical Science

Collaborative internship with: Gary Bader

Heart failure (HF) is the cardiovascular epidemic of the 21st century. The goal of our research program is to explore novel pathways leading to heart failure, so that it can be effectively prevented. Bioinformatic tools are used to identify novel targets to study, possible players in the progression of disease, interactions and networks among players, and screening tools such as biomarkers. In my study, my novel target, mindin, has been found to change markedly in heart failure through microarray analysis. Mindin is a new member of the Mindin-F-spondin family of proteins that is highly conserved in evolution. It is expressed in the heart and seems to mediate innate immunity. The aims of this research are first to identify the functions of mindin in cardiac remodeling and cardiac failure due to MI, second to determine the pathways in which mindin is involved (including mindin’s interactions with other players), third to identify the potential for therapeutic intervention targeting mindin, and fourth to identify the potential use of mindin as a biomarker. Mindin’s targets will be identified first with in silico analysis of mindin’s gene and protein sequences, then mindin’s interaction with potential targets will be analyzed by high-throughput proteomic analysis. Microarrays and mass spectrometry can be used to assess mindin’s potential use as a biomarker.

Former supervisor: Quaid Morris, Department of Computer Science

Predicting protein function by integrating multiple large-scale biological datasets

Most successful computational approaches for protein function prediction integrate multiple genomics and proteomics data sources to make inferences about the function of unknown proteins. The most accurate of these algorithms have long running times, making them unsuitable for real-time protein function prediction in large genomes. In the first part of this talk, we discuss an algorithm, GeneMANIA, that is as accurate as the leading methods while being capable of predicting protein function in real time. In particular, GeneMANIA represents each data source as a functional association network and combines these networks into a function-specific composite network. The composite network is then used to predict protein function, using a label propagation algorithm. We show that GeneMANIA is one of the leading methods on yeast and mouse benchmark data.
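The two steps described above, combining networks and then propagating labels, can be sketched in a few lines of Python. This is a generic illustration and not GeneMANIA's implementation: the network weights are fixed by hand rather than chosen per function, the damped iterative update is one common label propagation scheme, and the function names, toy networks and parameter values are all hypothetical.

```python
def combine_networks(networks, weights):
    """Weighted sum of adjacency matrices (lists of lists of equal size)."""
    n = len(networks[0])
    return [
        [sum(w * net[i][j] for net, w in zip(networks, weights)) for j in range(n)]
        for i in range(n)
    ]


def propagate_labels(adj, labels, alpha=0.8, iterations=100):
    """Iterative label propagation on a symmetric weighted network.

    labels: +1 for known positives, -1 for known negatives, 0 for unknown.
    Each round mixes a node's initial label with its neighbours' scores.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    # Symmetric normalization: S[i][j] = W[i][j] / sqrt(d_i * d_j).
    norm = [
        [adj[i][j] / (degree[i] * degree[j]) ** 0.5 if adj[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    scores = list(labels)
    for _ in range(iterations):
        scores = [
            (1 - alpha) * labels[i] + alpha * sum(norm[i][j] * scores[j] for j in range(n))
            for i in range(n)
        ]
    return scores


# Two toy networks over four genes (hypothetical data sources).
coexpression = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]
interactions = [[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
composite = combine_networks([coexpression, interactions], [0.7, 0.3])
# Gene 0 is annotated with the function, gene 3 is a negative, genes 1 and 2 are unknown.
scores = propagate_labels(composite, [1, 0, 0, -1])
print(scores)
```

Unannotated genes with higher scores are the stronger candidates for the function; here gene 1 sits next to the positive example and gene 2 next to the negative one.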

In the second part of this talk, we discuss a modification of GeneMANIA that allows us to accurately predict protein function when there are only a few positive examples (annotated proteins). We show that this modification improves the prediction accuracy in yeast, mouse, and human benchmark data. In addition, we show that our method improves on two previously proposed modes of network integrations. Finally, we verify some of our most likely predictions by conducting a literature survey.

Former supervisors: Andrew Emili & Zhaolei Zhang

Collaborative traineeship with Charlie Bloom and Corey Nislow/Guri Giaever

The majority of my research has focused on the study of protein interaction networks, specifically how duplicated genes fit within them. Protein interactions are crucial to all eukaryotic cellular pathways and dictate virtually every aspect of cell function. Two large-scale interaction surveys were recently published in yeast, each identifying over 300 distinct protein complexes with over 10,000 combined protein interactions. My initial focus was to compare the protein interactions of 450 duplicated yeast genes in an effort to determine global patterns of functional divergence following gene duplication. By contrasting the nature of complex membership of duplicated genes on a large scale, I was able to determine what may have driven duplicated proteins to functionally diverge. Importantly, I found a marked difference in the expression and conservation of duplicate genes that remain functionally overlapping. This suggested a redundancy for non-diverged genes that may be crucial in cell stress and perturbation, and led me to investigate the genetic interactions of this same group of paralogs. These experiments will be completed midway through the year and should be very useful both in the study of functional divergence and in determining the inter-relationship between gene and protein interactions.

Former supervisor: Alan Moses, Department of Cell and Systems Biology

Collaborative internship proposed with: Brenda Andrews

Proteins contain intrinsically disordered regions: stretches of protein sequence with no apparent structure under native conditions. Although rare in bacteria and archaea, these regions are prevalent in eukaryotes. For example, they populate roughly 15-20% of the proteins in our genome. Their role is not fully understood; however, they are thought to be enriched in protein regulatory elements, or short linear motifs. These short linear motifs are critical for the function of proteins, as they act as functional switches that modulate the location, activity and degradation of the protein. The goal of my project is to perform a systematic genome-wide prediction of short linear motifs using evolutionary conservation.

I will begin by collecting several characterized instances of short linear motifs, including phosphorylation sites, localization signals, interaction motifs and degradation signals. These short linear motifs will be examined for their evolutionary properties and used to design a phylogenetic hidden Markov model. This algorithm can assess evolutionary conservation by taking advantage of the phylogenetic tree relating the species. I will then perform a genome-wide search for short conserved sequences. This search will uncover novel short linear motifs, and interesting conserved sequences will be functionally assayed in the lab by performing single-site mutagenesis on the predicted sequences. Another aspect of this project is to use the predicted sequences to uncover amino acid patterns that define short linear motifs. While several patterns are known to the scientific community, it is estimated that many more exist. I have already shown that evolutionary conservation alone is sufficient to uncover the pattern specificity of two kinases. By grouping the predicted short linear motifs using sequence similarity, novel patterns will be discovered and function can be deduced. I will test whether the patterns are directly responsible for the inferred function.

Former supervisor: Andrew Emili, Department of Molecular Genetics

Collaborative Internship with: Shoshana Wodak

My research focuses on interactions among chromatin-associated proteins, with particular emphasis on core methylation systems. Histone methyltransferases are important regulators of chromatin structure and gene expression programs and are key determinants of cell fate and function. The human genome encodes several families of histone methyltransferases with distinct catalytic and functional properties. To identify interactors for these enzymes, I am applying affinity capture LC-MS/MS using two complementary proteomics approaches: lentiviral-based TAP-tagging and phage-derived synthetic antibodies. Systematic analysis of interacting partners for these enzymes will improve our understanding of the molecular mechanisms underlying histone methylation and transcriptional regulation.

Former supervisor: Peter Zandstra, Institute of Biomaterials and Biomedical Engineering | Former collaborative traineeship supervisor: Quaid Morris

Hematopoietic stem cells (HSCs) from umbilical cord blood are valuable resources for blood transplantation. Unfortunately, the small number of HSCs per cord blood collection restricts the wide application of cord blood transplantation. The capability of qualitative and quantitative in vitro HSC expansion is therefore important for enhancing the clinical benefits of HSCs. Our previous studies showed that proper manipulation of the intercellular signaling between HSCs and other cell populations can enhance in vitro HSC expansion. However, an HSC expansion culture is a dynamic and heterogeneous system, and it remains challenging to identify the key factors that we may manipulate. The goal of my project is to use computational tools to gain a better understanding of the intercellular signaling between hematopoietic cells and of the regulation of HSC fates by the dynamic environment.

So far, we have employed microarray data and bioinformatics tools to study the intercellular signaling patterns between HSCs, progenitors and mature cells. As a next step, we seek to implement informatics or mathematical models of the dynamic regulation of HSC fates by environmental cues.

Former supervisor: Krishna Mahadevan, Chemical Engineering and Applied Chemistry

Collaborative internship with Elizabeth Edwards, Chemical Engineering and Applied Chemistry

The ultimate goal of this study is to develop methods for modeling and engineering the metabolism of a clostridial co-culture, and to improve the biobutanol production rate using a consolidated bioprocessing approach. Genome-scale metabolic models of microorganisms from different domains of life have been developed and applied to analyses of metabolism in pure cultures; however, systems biology of microbial co-cultures will extend our knowledge of pure culture physiology to co-cultures, where metabolic interactions and inter-species transport of metabolites are present. A system-level understanding of the metabolism of the Clostridium cellulolyticum and Clostridium acetobutylicum co-culture, which can be applied to biobutanol production from cellulosic biomass, facilitates the analysis and design of strategies for process and metabolic optimization, thus improving the biobutanol production rate. The development of computational methods to investigate the interactions between microorganisms in microbial co-cultures, based on the community genome sequences and physiology, is therefore beneficial for the ultimate engineering of these co-cultures; the development of such methods is the focus of this proposal.

Former supervisor: Michael Taylor, Department of Laboratory Medicine and Pathobiology

Proposed collaborative internship with: Gary Bader

My overall objective is to analyze various types of high-throughput data on medulloblastoma samples for the purposes of understanding different mechanisms of tumourigenesis, classifying tumour samples into biologically relevant subtypes, and identifying common and divergent disruptions of signalling pathways.

To this end, I am devising and implementing strategies to integrate various sources of data including SNP array, expression array, and in the future, RNA-seq data. I will help refine existing molecular subtypes of medulloblastoma, and determine how the molecular subtypes differ in terms of disrupted genes and pathways, with the hopes of developing therapeutic strategies specific for each subtype. I will further determine how medulloblastoma changes in response to treatment, by comparing the genetic profiles of primary and recurrent tumours, and subsequently inferring aberrations in signalling networks.

In later stages of the project, the bioinformatic predictions will be validated using medulloblastoma cell lines and mouse models.

Supervisor: Anthony Pawson, Samuel Lunenfeld Research Institute, enrolled in Institute of Medical Science

Investigating Cellular Decision-Making in Apoptosis

Networks of kinases play a role in the transmission and integration of signals from the membrane to the nucleus. We aim to elucidate kinase phosphorylation and interaction partners in these networks through the immunoprecipitation and mass spectrometric analysis of a representative set of 100 FLAG-tagged kinases stably expressed in human colorectal cancer cells. The goal is to generate a comprehensive set of interactions and dynamic phosphorylation sites which correlate with cell phenotypes such as apoptosis and proliferation. Mass spectrometry techniques have allowed for the identification of proteins and their phosphorylation sites in complex samples. However, kinases usually work in the context of particular signaling stimuli. We aim to characterize the role of these over-expressed kinases in the context of TRAIL-induced apoptosis. This is particularly relevant to tumorigenesis in that many cancers are resistant to apoptosis and recombinant TRAIL therapies are currently undergoing clinical trials. We present assays to correlate the proliferative ability and sensitivity to apoptosis of various stable cell lines with kinase expression levels through flow cytometry. We also present efforts to trace downstream signaling through the monitoring of MAP kinase phosphorylation using a high-throughput bead array.

Former supervisor: Gil Privé, Department of Medical Biophysics

Sequence and Structural Analysis of the BTB Domain

The BTB domain is a eukaryotic protein-protein interaction motif found in a variety of proteins. This thesis describes an investigation into the general and specific sequence, structure and self-association properties of this domain.

The work is divided into two complementary approaches.

Chapter 2 describes computational work in assembling a collection of BTB domain sequences from completely sequenced eukaryotic genomes. This chapter describes analyses on this collection including the genomic distribution, domain architectures, identification of putative novel domains and predictions of interactions.

Chapters 3, 4 and 5 are founded on experimental analyses on BTB domains from human BTB-ZF proteins.

Chapter 3 describes the structure of the BTB domain from Leukemia/Lymphoma Related Factor (LRF). The structure closely resembles the previously determined structures of BTB domains. The structure showed a large number of sequence substitutions on the surface of the LRF BTB domain that is equivalent to the surface involved in an interaction between the BTB domain from B-Cell Lymphoma 6 (BCL6) and a peptide derived from the SMRT co-repressor (the SMRT-BBD). We show the LRF BTB domain does not interact with this peptide.

Chapter 4 describes the structures of the BTB domains from FAZF and Miz-1. These proteins conserve most of the BTB fold but show some unexpected changes. The BTB domain from FAZF lacks domain swapping, which is a novel feature. The BTB domain from Miz-1 contains a naturally truncated N-terminus and a novel movement of 10 residues away from a conserved three-stranded β-sheet. We show these BTB domains are dimeric within a specific concentration range and that they do not interact with the SMRT-BBD.

Chapter 5 describes the structure of the BTB domain from Kaiso. This structure showed interactions between Kaiso BTB domain dimers that extend through the crystal. We identified similar interactions between dimers in a number of other structures of other BTB domains which suggested a common mode of oligomerization.

Former supervisor: Jeff Wrana, Department of Molecular and Medical Genetics

In the news: Breast cancer survival predicted by new Canadian tool – 1 Feb 2009. See story describing Ian Taylor and Jeff Wrana’s predictive tool, published in Nature Biotechnology, in the Financial Post. Quaid Morris and Tony Pawson of the GBB program were also co-authors on this paper.

Supervisor: Boris Steipe, Department of Biochemistry

The Discovery and Analysis of Structural Motifs, from Specifying the General, to Generalizing the Specifics

Recurring structural patterns observed across non-homologous protein families can be hypothesized to be products of convergent evolution and may be associated with low conformational energy. These recurring patterns, which we call motifs, will give valuable insights in areas such as stability engineering and protein structure prediction. To look specifically at packing motifs in proteins, I have represented patterns in a way that can probe for general packing patterns that include both local and non-local interactions. After tens of thousands of structural motifs were discovered, statistical analyses were performed to uncover the biological meaning of these motifs. In working with structural motifs I have encountered several key challenges, which I highlight with selected motif examples from the results.

Former supervisor: Shoshana Wodak, Department of Biochemistry

Interaction Landscape of Membrane Protein Complexes in Saccharomyces cerevisiae

Most cellular processes are mediated by macromolecular assemblies. Whereas extensive affinity purification studies of soluble protein complexes have been published for yeast and other models, no large-scale characterization of eukaryotic membrane protein complexes has ever been reported. To this end, we performed an exhaustive proteomic survey of 2,075 integral and lipid-anchored membrane proteins affinity purified in the presence of non-denaturing detergents from Saccharomyces cerevisiae. We derived a high-confidence physical interaction network encompassing hundreds of putative heteromeric complexes associated with diverse processes, including vesicle transport, secretion, endocytosis, signalling, lipid metabolism and formation of membrane compartments like the vacuole and peroxisome. The availability of a global map of membrane protein complexes addresses an essential prerequisite for understanding eukaryotic membrane systems.

Former supervisor: William Stanford, Institute of Biomaterials and Biomedical Engineering

The polycomb group protein PCL2 in embryonic stem cell commitment

PCL2 (polycomb-like 2) is a highly conserved polycomb group protein identified in a genome-wide screen for novel regulators of self-renewal and pluripotency (Walker et al., Cell Stem Cell 1(1), 2007). Highly expressed in undifferentiated embryonic stem cells (ESCs), it is immediately down-regulated upon both the removal of LIF and the addition of retinoic acid. PCL2 co-immunoprecipitates with the PRC2 complex, which is responsible for tri-methylation of lysine 27 on histone 3. To study the function of Pcl2 in ESCs, we generated stable shRNA knockdown ESC lines which expressed 15-30% of Pcl2 compared to the shRNA mismatch-control ESC line. Microarray analysis of knockdowns resulted in a list of genes regulated by Pcl2, while chromatin immunoprecipitation coupled with high throughput sequencing identified which were direct targets. Differentiation markers were reduced by as much as 7-fold. Markers of undifferentiated ESCs as well as genes involved in chromatin remodeling, DNA damage response and cell cycle were up-regulated. Single-cell immunofluorescence analysis revealed that OCT4 protein levels were heightened in undifferentiated knockdowns and remained heightened even after 72 hours in -LIF and -BMP4 conditions. In colony forming assays, knockdown cells formed undifferentiated, alkaline phosphatase positive colonies (80-95%) at a much greater efficiency than mismatch controls (40%). Knockdown cells are unable to differentiate into neural precursor cells, or form mature embryoid bodies (EBs) and qPCR revealed a delay in onset of early differentiation markers. Thus, it appears that Pcl2 is responsible for maintaining the balance between self-renewal and the initiation of commitment to differentiation.

Former supervisor: Michael Brudno, Department of Computer Science

Active Pathways: Visualization and Analysis of Pathways and Expression Data

Gene expression profiling is used to identify genes modulated in biological processes. To gain a better understanding at the system level, existing analysis methods can be used to identify (a) functional sets (e.g. DAVID, GSEA) or (b) network modules (e.g. Active Modules) enriched for modulated genes. Functional sets are often derived from curated annotations such as the Gene Ontology (GO), whereas networks capture pairwise gene relations such as protein physical interactions. Tools are also available to color static pathway maps (diagrams) according to gene expression (e.g. GenMAPP). The rapid and accelerating growth of pathway databases available in a common interchange format (BioPAX) enables new, more fully integrated analysis tools for exploring and explaining expression profiles using high-quality annotated pathway models. The Active Pathways project aims at integrating these analysis strategies for pathways within the Cytoscape network visualization environment, and enabling a more dynamic, exploratory approach to the search for functional mechanisms. Methods such as GSEA can be applied to gene sets derived from pathways and their induced networks; the resulting enrichment scores for each pathway or sub-pathway can then be displayed on a dynamic, interactive network map in Cytoscape. These maps can be generated for any combination of pathways using a browser which filters and organizes pathways and their constituents according to chosen statistical criteria. The main goal of our toolset is to enable biologists to move seamlessly from a bird’s-eye view of high-throughput expression data across all pathways in a database. To this end we have developed several interrelated displays of pathway networks which move from coarse- to fine-grained representations of enriched pathway components (subpathways, steps, interactions, molecules), along with search and navigation tools to assist in bringing up the most relevant parts of this “active map” quickly. A secondary goal is to use pathway databases as an alternative and high-quality source of functional (annotation/set) and network data as input to existing and novel analysis methods. To that end we provide tools for translating pathway data into these simpler representations. Using both components it is possible to generate an “active cell map” showing expression characteristics at the molecular level across all annotated pathways, starting only with raw expression data.
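
As a rough illustration of scoring pathway-derived gene sets against a list of modulated genes, the sketch below ranks pathways by a one-sided hypergeometric over-representation test. This is a simpler stand-in, not GSEA and not the Active Pathways code; the function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total_genes, pathway_size, modulated_size, overlap):
    """P(X >= overlap) when modulated_size genes are drawn without
    replacement from total_genes, of which pathway_size are in the pathway."""
    upper = min(pathway_size, modulated_size)
    tail = sum(
        comb(pathway_size, i) * comb(total_genes - pathway_size, modulated_size - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total_genes, modulated_size)


def rank_pathways(pathways, modulated, background):
    """Return (pathway name, overlap, p-value) tuples, most enriched first.

    pathways: dict mapping a pathway name to a set of gene identifiers.
    modulated, background: sets of gene identifiers; genes outside the
    background are ignored.
    """
    modulated = modulated & background
    ranked = []
    for name, genes in pathways.items():
        genes = genes & background
        overlap = len(genes & modulated)
        p = hypergeom_enrichment_p(len(background), len(genes),
                                   len(modulated), overlap)
        ranked.append((name, overlap, p))
    return sorted(ranked, key=lambda row: row[2])


if __name__ == "__main__":
    background = {f"g{i}" for i in range(10)}
    pathways = {"pathway_A": {"g0", "g1", "g2"}, "pathway_B": {"g7", "g8", "g9"}}
    print(rank_pathways(pathways, {"g0", "g1", "g2"}, background))
```

The resulting per-pathway scores are the kind of value that could be mapped onto a network display; correction for multiple testing is omitted here for brevity.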

Former supervisor: Lincoln Stein

Proposed collaborative internship with: Benjamin Blencowe

Registered in: Department of Molecular Genetics

My doctoral research project aims to better understand the role of pre-mRNA splicing in cancer development. There is precedent in the literature demonstrating that perturbations of the splicing patterns within a cell can promote cellular transformation and cancer development. This leads to the primary hypothesis of my project: “Perturbations in the splicing patterns within cancer cells contribute to the promotion of cellular transformation”. I will be utilizing paired-end RNA-seq data with a minimum read size of 100 bp that is being produced at the Ontario Institute for Cancer Research (OICR). This sequencing data will be generated using Illumina sequencing technology. The samples sequenced will primarily be derived from pancreatic ductal adenocarcinoma primary tumours, mouse xenografts, or cell lines that are being sequenced for the International Cancer Genome Consortium by OICR.

To analyze this data, I first surveyed existing RNA-seq alignment tools such as TopHat. However, I found that these tools produced alignment artifacts that had deleterious effects on my downstream analysis. I attempted to develop post-processing tools to fix these artifacts, but there were always residual problems with the alignments. To remedy this, I have begun to develop a tiered RNA-seq alignment pipeline with two major steps. The first step attempts to align each read-pair individually to known splice junctions and the reference genome. I will be using a junction sequence database tailored specifically to the read size produced by the sequencing reaction, and Novoalign for the alignment. After the alignment, a post-processing step will be performed to remove redundant and ambiguous alignments and to resolve the read-pairs. The second major step attempts to find novel junctions within the reads that did not align in the first step. I will be using a splicing-aware aligner such as BLAT or a de novo assembler for this step. Initial testing of my pipeline compared to TopHat showed a significant increase in sensitivity for known splice-site alignments. Furthermore, my pipeline typically mapped more reads to known splice sites. Finally, after the alignment steps, I will develop a pipeline that uses custom and/or published tools to analyze the samples' splicing patterns. This step will require transcript assembly and abundance calculations, and tools to normalize these values for comparison to other samples.
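
The tiered idea (try known junctions and the reference first, then pass only the leftover reads to a novel-junction search) can be sketched in a few lines. This is a toy under stated assumptions, not the pipeline described above: reads are single strings rather than read-pairs, the "aligners" are exact string matches standing in for tools such as Novoalign or BLAT, and every function name here is hypothetical.

```python
def align_exact(read, reference, known_junctions):
    """Tier 1 stand-in: exact match to the reference or a known junction."""
    pos = reference.find(read)
    if pos != -1:
        return ("reference", pos)
    for name, junction_seq in known_junctions.items():
        if read in junction_seq:
            return ("known_junction", name)
    return None


def align_split(read, reference, min_anchor=4):
    """Tier 2 stand-in: split the read into two anchors that both match the
    reference with a gap between them, suggesting a novel junction."""
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left = reference.find(read[:cut])
        if left == -1:
            continue
        right = reference.find(read[cut:], left + cut + 1)
        if right != -1:
            # (end of the left anchor, start of the right anchor)
            return ("novel_junction", (left + cut, right))
    return None


def tiered_align(reads, tiers):
    """Run each tier in order; only unaligned reads reach the next tier."""
    results = {}
    remaining = list(reads)
    for tier_name, aligner in tiers:
        still_unaligned = []
        for read in remaining:
            hit = aligner(read)
            if hit is None:
                still_unaligned.append(read)
            else:
                results[read] = (tier_name, hit)
        remaining = still_unaligned
    return results, remaining


if __name__ == "__main__":
    reference = "AAAACCCCGGGGTTTTACGTACGT"
    known = {"j1": "CCCCTTTT"}  # toy junction skipping the GGGG block
    tiers = [
        ("tier1", lambda r: align_exact(r, reference, known)),
        ("tier2", lambda r: align_split(r, reference)),
    ]
    reads = ["CCCCGGGG", "CCCTTT", "AAAAGGGG", "TTTTTTTT"]
    print(tiered_align(reads, tiers))
```

Keeping the tiers as a plain ordered list makes it easy to swap in a different aligner for either step, or to add a post-processing stage between them.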

Former supervisor: Jane McGlade, Department of Biophysics

Current research: It seems that much more time and effort is spent developing and refining high-throughput experimental procedures than developing and refining the procedures for analysing the high-throughput results. It is clear that the method of analysing high-throughput results can greatly impact the conclusions drawn and the subsequent experiments performed. My research focuses on developing an analysis method for results from protein array screens that provides an overview from which new hypotheses about the molecule under study can be generated. This involves first testing the method on published protein array results to demonstrate that new knowledge can be uncovered, and then applying the method to novel protein array results.

Former supervisor: Alan Moses, Department of Cell & Systems Biology

Collaborative traineeship with Brenda Andrews, Department of Molecular Genetics

My research explores evolutionary constraint in intrinsically disordered regions (IDRs) of proteins. IDRs are characterized by their lack of a stable secondary or tertiary structure, and comprise close to 40% of eukaryotic proteomes. Many IDRs have been shown to play important roles in the cell, particularly in signaling and regulation. However, in comparison to ordered regions of proteins, most IDRs appear highly diverged at the level of the primary amino acid sequence. We propose that these IDRs could have quantitative, sequence-encoded functions that are under stabilizing selection, wherein individual amino acids are under weak evolutionary constraint, but collectively contribute to a quantitative function that is under selection. So far, we have shown evidence for stabilizing selection in vivo and in silico for one IDR in budding yeast. I hope to apply our in silico method to detect stabilizing selection on quantitative features in IDRs proteome-wide, and further test these predictions in vivo.