Former supervisor: Nicholas Provart, Department of Cell and Systems Biology
Ryan successfully graduated from the GBB Program on 1 May 2009, as the fourth GBB Program Graduate. Ryan is now a compbio post-doc in the Centre for the Analysis of Genome Evolution & Function.
The de novo prediction of functionally significant sequence motifs in Arabidopsis thaliana
This thesis performs de novo predictions of functionally significant sequence motifs in the Arabidopsis genome in two separate contexts. Each study applies genomic positional information, statistical over-representation and several biologically contextual filters to maximize the visibility of biological signal in the prediction results. Numerous literature-supported motifs are prevalent in the results of both studies, and a number of novel motif patterns show strong potential for in planta significance.
The first study examines the statistical over-representation of C-terminal tripeptides as a means of identifying conserved eukaryotic protein targeting signatures. Comparative genomics is applied to the analysis of tripeptide frequencies in the C-termini of 7 eukaryotic proteomes, while biological signal is maximized by filtering out both simple sequences and homologous sequences present across protein families.
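The tripeptide over-representation idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's method: the toy proteome is invented, the background model (product of whole-proteome residue frequencies) and the binomial tail test are stand-ins chosen for simplicity, and the homology filter mentioned above is omitted. Only the simple-sequence filter is shown, here reduced to dropping single-residue tails.

```python
from collections import Counter
from math import comb


def c_terminal_tripeptides(proteins):
    """Count the last three residues of each protein, skipping simple tails."""
    counts = Counter()
    for seq in proteins:
        tri = seq[-3:].upper()
        # Assumed filter: drop single-residue tails such as "KKK".
        if len(tri) == 3 and len(set(tri)) > 1:
            counts[tri] += 1
    return counts


def background_probability(proteins, tripeptide):
    """Expected tripeptide probability from overall residue frequencies."""
    residues = Counter("".join(proteins).upper())
    total = sum(residues.values())
    prob = 1.0
    for aa in tripeptide:
        prob *= residues[aa] / total
    return prob


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical toy proteome; "SKL" is the classic peroxisomal targeting tail.
proteome = ["MAAGTSKL", "MKVLAASKL", "MGHHTRSKL", "MPLKDEL", "MAAAKKK", "MTTGLV"]

counts = c_terminal_tripeptides(proteome)
n_tails = sum(counts.values())
p_skl = background_probability(proteome, "SKL")
p_value = binomial_tail(counts["SKL"], n_tails, p_skl)
print(counts.most_common(3), p_value)
```

On real proteomes the same counting would be repeated per species, and the per-species results compared, before any tripeptide is called conserved.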
The second study introduces a methodology for the effective prediction of transcription factor binding sites (TFBSs) in Arabidopsis. A collection of motif prediction algorithms and a novel enumerative strategy are applied to the prediction of cis-acting regulatory elements within the promoters of genes found to be coexpressed within distinct tissues and under specific abiotic stress treatments. Significance levels for all predictions are standardized using a novel discriminative approach to statistical over-representation, and results are interpreted using a utility created for the statistical filtering and graphical analysis of TFBS motifs. Both the novel enumerative method and the overall integrative methodology are shown to outperform existing approaches to TFBS prediction.
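As a rough illustration of enumerative motif discovery with an over-representation test, the sketch below enumerates every k-mer in a set of promoters and scores it with a hypergeometric tail probability. The hypergeometric test is a generic stand-in, not the discriminative approach developed in the thesis, and the gene identifiers and promoter sequences are invented for the example.

```python
from math import comb


def hypergeom_tail(k, N, K, n):
    """P(X >= k) when n promoters are drawn from N, of which K contain the motif."""
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def enumerate_kmers(promoters, k):
    """Map each k-mer to the set of genes whose promoter contains it."""
    hits = {}
    for gene, seq in promoters.items():
        for i in range(len(seq) - k + 1):
            hits.setdefault(seq[i:i + k], set()).add(gene)
    return hits


def overrepresented(promoters, coexpressed, k):
    """Return (p-value, k-mer) pairs, most significant first."""
    hits = enumerate_kmers(promoters, k)
    N, n = len(promoters), len(coexpressed)
    scored = []
    for kmer, genes in hits.items():
        in_set = len(genes & coexpressed)
        scored.append((hypergeom_tail(in_set, N, len(genes), n), kmer))
    return sorted(scored)


# Hypothetical promoters; g1-g3 share the G-box core CACGTG.
promoters = {
    "g1": "TTACACGTGAAT", "g2": "GGCCACGTGTCA", "g3": "ATGCACGTGCCG",
    "g4": "TTTTAAAATTTT", "g5": "GGGGCCCCGGGG", "g6": "ATATATATATAT",
    "g7": "CAGTCAGTCAGT", "g8": "TGCATGCATGCA",
}
coexpressed = {"g1", "g2", "g3"}

results = overrepresented(promoters, coexpressed, k=6)
print(results[0])
```

In practice the raw p-values would need correction for the number of k-mers tested, which is one reason a standardized significance measure matters when combining several predictors.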
WiP Seminar, 16 December 2008
The Role of the RING Finger Protein Makorin-1 in Embryonic Stem Cell Self-Renewal
The viability of pluripotent cells as a therapeutic source relies on improved knowledge of the molecular events controlling their derivation and fate (self-renewal versus commitment). Accordingly, to identify novel regulators of ESC fate, we combined temporal expression microarray analysis of early committed ESCs with promoter occupancy studies. In this study, Makorin-1 (Mkrn1) was found to be transcriptionally co-regulated with known regulators of ESC pluripotency. The function of Mkrn1 in ESC self-renewal is currently unknown; however, its expression is dependent on the undifferentiated state of the ESC. To further investigate the role of Mkrn1 in ESC self-renewal, we induced Mkrn1 knockdown with shRNA in stable ESC clones. The knockdown of Mkrn1 hastened differentiation and led to a concomitant decrease in Oct4 mRNA and protein levels when cells were cultured in self-renewal conditions. Conversely, the enforced expression of Mkrn1 in ESCs hindered differentiation when cells were cultured in differentiation conditions, as evidenced by higher Oct4 mRNA and protein levels. The data indicate that Mkrn1 functions as a novel regulator of ESC self-renewal; however, its mechanism of action remains unknown. Mass spectrometry analysis of the Mkrn1 protein interaction network in undifferentiated ESCs revealed that Mkrn1 interacts with a number of known mRNA-binding proteins, suggesting that Mkrn1 regulates ESC fate through a previously uncharacterized post-transcriptional complex.
Current work is focused on identifying the subset of Mkrn1-bound transcripts in ESCs using ribonucleoprotein immunoprecipitation-sequencing (RIP-Seq) analysis. Our goal is to construct a Mkrn1 post-transcriptional network to further our insight into the regulatory networks that control ESC fate decisions.
Supervisor: John Parkinson, Research Institute of the Hospital for Sick Children, enrolled in Department of Molecular and Medical Genetics
WiP Seminar, 12 May 2009
iGEM 2009 – Building a standard platform to investigate the potential of Enzyme Channeling
Enzyme channeling has the potential to increase the efficiency of some otherwise thermodynamically unfavorable reactions through the co-localization of enzymes catalyzing adjacent steps in a biochemical reaction. In addition, channeling may also be involved in pathway switching, preventing the escape of small molecules, preventing the accumulation of toxic intermediates and preventing the breakdown of unstable intermediates. In nature, this is accomplished by a number of mechanisms, including protein fusion together with further surface charge adaptations that optimize delivery of reaction intermediates from one active site to another, trafficking of enzymes to the same cellular compartment, and the use of complexes and membrane scaffolds. Recently, Chris Sanford in our laboratory has explored the effect of channeling using a three-dimensional lattice simulation environment (Cell++), which has been uniquely designed to consider the spatial relationships of component objects when simulating cell processes. This is particularly important because the role of spatial organization in biological pathways has typically been neglected in simulations. The results suggest that the simple colocalization of enzyme pairs with certain properties can affect the rates of accumulation of reaction intermediates in a pathway. Under the auspices of the International Genetically Engineered Machine (iGEM) competition, a team of undergraduates will test this prediction using a synthetic biology approach to construct a standard, re-usable platform capable of co-localizing chosen enzyme pairs. We will use this platform to investigate channeling in predicted enzyme pairs, with an emphasis on pathways that may be of biological or commercial interest. See www.igemtoronto.org.
Supervisor: John Parkinson, collaborative internship proposed with Fred W. Keely, Department of Biochemistry, University of Toronto.
WiP Seminar, 22 Nov 2011
Elastin polymorphisms associated with increased risk of cardiovascular disease
Elastin, a polymeric protein and a component of the extracellular matrix, plays a major role in the elasticity of many tissues, including skin, lung parenchyma and large arteries. It is a major structural protein in the walls of large blood vessels such as the aorta and is responsible for the elasticity of vascular tissues. Elastin fibers are remarkably stable, with little or no normal turnover over the life-span of an individual; they must therefore withstand millions of cycles of extension and recoil in tissues such as arteries without mechanical failure. We hypothesize that even subtle variation in the elastin sequence can impact elastin durability in arteries and consequently increase susceptibility to cardiovascular diseases. Using the Solexa next-generation sequencing platform, we have sequenced the elastin gene (ELN) from 800 subjects diagnosed with thoracic aortic aneurysm and dissection (TAAD), in addition to 400 control samples from Ontario residents. Our goal is to identify and characterize the SNPs in the elastin gene that are enriched in the TAAD cohort.
Sequence variants in elastin and their association with late-onset of cardiovascular disease
Elastin, a polymeric protein and a component of the extracellular matrix, plays a major role in the elasticity of many tissues, including skin, lung parenchyma and large arteries. It is a major structural protein in the walls of large blood vessels such as the aorta and is responsible for the elasticity of vascular tissues. The elastic properties of vascular tissues are very important for their physiological function; abnormalities in elastin production or assembly can therefore result in cardiovascular conditions such as aneurysms, hypertension and atherosclerosis. A better understanding of elastin sequence variability between patients diagnosed with heart disease and healthy individuals, and of the impact of the discovered sequence variants on elastin biomechanical properties and function, will have applications in the design of novel diagnostics and biomarkers for late-onset of cardiovascular disease.
In my research I plan to (1) characterize elastin sequence variants in patients diagnosed with late-onset cardiovascular diseases, (2) explore the impact of sequence variants on elastin biomechanical properties, and (3) experimentally validate sequence variants by generating recombinant polypeptides. For the first stage of my research, samples from 800 Thoracic Aortic Aneurysm and Dissection (TAAD) patients were collected by Dr. Dianna Milewicz of the University of Texas Medical School. These samples were sequenced, along with 400 samples from OPGP (Ontario Population Genomic Repository), by The Centre for Applied Genomics (TCAG) at SickKids using Solexa next-generation sequencing. For sequence alignment and SNP calling, I used MAQ (Mapping and Assembly with Quality). I also used Perl and R to parse the files created by MAQ and to analyze the data. Currently I am selecting SNPs for genotyping and for further study of their impact on elastin integrity and function, which will help us increase our understanding of the pathologies of late-onset of cardiovascular
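One simple way to ask whether a variant is enriched in the TAAD cohort is a one-sided Fisher's exact test on carrier counts. The sketch below is an illustration only: the cohort sizes (800 cases, 400 controls) come from the text, but the carrier counts are invented, and the project's actual analysis was done with MAQ, Perl and R rather than with this code.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    Rows are cases and controls; columns are carriers and non-carriers.
    Returns P(X >= a) under the hypergeometric null (enrichment in cases).
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favourable = sum(
        comb(col1, x) * comb(total - col1, row1 - x) for x in range(a, upper + 1)
    )
    return favourable / comb(total, row1)


# Hypothetical SNP carried by 40 of 800 cases and 5 of 400 controls.
p_enriched = fisher_exact_greater(40, 760, 5, 395)

# Hypothetical SNP with the same carrier frequency in both cohorts.
p_neutral = fisher_exact_greater(30, 770, 15, 385)

print(p_enriched, p_neutral)
```

Because many SNPs are tested across the gene, p-values from a test like this would still need multiple-testing correction before any variant is selected for genotyping.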
Supervisor: John Parkinson
Registered in: Department of Molecular Genetics
Description of Research Project
The Apicomplexa is a large phylum of unicellular eukaryotes that includes a number of medically relevant parasites. More specifically, species from the genera Plasmodium, Toxoplasma, and Cryptosporidium are the etiological agents of the most common diseases caused by apicomplexans: malaria, toxoplasmosis, and cryptosporidiosis, respectively. Currently, there is a lack of effective vaccines or treatments against many apicomplexans, and the increasing prevalence of drug-resistant strains has stressed the urgency of developing novel drug therapies.
To meet these global health care challenges, several international consortia have generated vast amounts of sequence data, offering opportunities to gain insight into apicomplexans through in silico analyses. Information from the genome sequences, combined with proteome, transcriptome, and other high-throughput datasets, is being exploited to better understand apicomplexan biology. Our current knowledge of apicomplexan metabolism has revealed a number of pathways specific to the phylum, but for which many enzymes have not yet been elucidated. Due to selection pressures associated with surviving in an obligate host, these enzymes are either absent from the parasite, or present in the parasite but highly divergent from those of the host organism. To answer key questions concerning the evolution and conservation of apicomplexan parasites, I aim to apply comparative network analyses to identify parasite-specific enzymes that are critical for survival and up-regulated during stages of parasite growth and infection. These represent adaptations of the parasite to persist in the host and are important targets for therapeutic intervention.
Genome-scale modeling of the metabolism of Dehalococcoides bacteria: from genome to pan-genome
Dehalococcoides are important for the bioremediation of sites contaminated with chlorinated solvents. However, how this dehalogenating capability has been acquired and employed by these microbes is not well understood, specifically at the level of metabolism. In addition, the low growth yield of these bacteria is a major impediment to a faster bioremediation process. Hence, genome-scale reconstruction of the entire metabolic network and subsequent modeling of Dehalococcoides will be beneficial to understanding and overcoming these issues. Moreover, such a model is an excellent platform for exploring the metabolic capability of a microbe as well as for generating experimentally testable hypotheses regarding the microbe’s physiology. In this presentation, I’ll talk about a genome-scale reconstruction of the entire metabolic network and subsequent modeling of Dehalococcoides sp. strain CBDB1, a dechlorinating bacterium unique for its metabolic niche of degrading toxic and persistent groundwater pollutants. I’ll also talk about the Dehalococcoides pan-metabolic model that has been developed using the strain CBDB1 metabolic model and the published genome sequences of 4 Dehalococcoides isolates: Dehalococcoides sp. strain CBDB1, Dehalococcoides ethenogenes strain 195, Dehalococcoides sp. strain BAV1, and Dehalococcoides sp. strain VS. The pan-model reveals remarkable similarities among the isolates in their core metabolic processes.
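The distinction between a pan-model and the core metabolism shared by all isolates can be illustrated with simple set operations. The sketch below is only a conceptual aid: the strain names come from the text, but the reaction identifiers are placeholders with no biological meaning, and a real pan-metabolic model also carries stoichiometry and gene-reaction associations that are not represented here.

```python
def pan_and_core(strain_reactions):
    """Return (pan, core): the union and intersection of all reaction sets."""
    reaction_sets = [set(reactions) for reactions in strain_reactions.values()]
    pan = set().union(*reaction_sets)
    core = set.intersection(*reaction_sets)
    return pan, core


# Placeholder reaction identifiers (hypothetical) for the four isolates.
strain_reactions = {
    "CBDB1": {"rxn_A", "rxn_B", "rxn_C", "rxn_D"},
    "195": {"rxn_A", "rxn_B", "rxn_C", "rxn_E"},
    "BAV1": {"rxn_A", "rxn_B", "rxn_C"},
    "VS": {"rxn_A", "rxn_B", "rxn_C", "rxn_D", "rxn_F"},
}

pan, core = pan_and_core(strain_reactions)
core_fraction = len(core) / len(pan)
print(sorted(core), len(pan), core_fraction)
```

A high core fraction is the kind of signal that would correspond to the similarity among isolates described above; the strain-specific remainder is where differences in dehalogenating capability would be expected to appear.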
The role of signaling crosstalk between TGFβ/Smad and Hippo/TAZ pathways in determining cell fate: In my research project, I plan to use a multidisciplinary approach to model the crosstalk between the Hippo and TGFβ/Smad signaling pathways and to identify the unknown parameters of the nonlinear model using advanced control theory. The proposed research plan involves two steps: 1) identifying and modeling the crosstalk between the TGFβ/Smad and Hippo signaling pathways, and 2) validating the model in cultured cells. To achieve these goals, I will consider different scenarios for the unknown mechanisms (unmodeled dynamics in the control theory sense) and use experimental measurements to identify the unknown parameters of the models. In modeling biological networks, the unknown parameters are often fitted to experimental measurements. However, the measurements are very noisy and limited. In this research, I will use a dynamic recursive estimator, the Unscented Kalman Filter (UKF), to estimate the model parameters and identify the unmodeled dynamics in different scenarios. Finally, to validate the resulting models, I will perform structural perturbation experiments, gene silencing and inducible overexpression. For this I will use siRNA libraries and several small-molecule screens, designed and implemented in our lab, to identify positive and negative chemical inhibitors of the pathways.
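To make the estimator concrete, here is a minimal scalar Unscented Kalman Filter that treats a single unknown rate constant as the state and updates it recursively from measurements. Everything specific in it is an assumption for illustration: the exponential-decay measurement model, the noise settings, and the sigma-point weighting (kappa = 2 for a one-dimensional state) are not taken from the project, whose pathway models are nonlinear and multi-dimensional.

```python
from math import exp, sqrt


def ukf_parameter_estimate(times, measurements, h, m0, p0, q=1e-6, r=1e-4, kappa=2.0):
    """Recursively estimate one parameter with a scalar UKF.

    h(param, t) is the measurement model; m0 and p0 are the prior mean and
    variance; q and r are process and measurement noise variances.
    """
    lam = 1.0 + kappa                      # state dimension n = 1
    weights = [kappa / lam, 0.5 / lam, 0.5 / lam]
    m, p = m0, p0
    for t, y in zip(times, measurements):
        p += q                             # predict: parameter modelled as a slow random walk
        spread = sqrt(lam * p)
        sigmas = [m, m + spread, m - spread]
        zs = [h(s, t) for s in sigmas]     # push sigma points through the model
        z_mean = sum(w * z for w, z in zip(weights, zs))
        p_zz = sum(w * (z - z_mean) ** 2 for w, z in zip(weights, zs)) + r
        p_xz = sum(w * (s - m) * (z - z_mean) for w, s, z in zip(weights, sigmas, zs))
        gain = p_xz / p_zz
        m += gain * (y - z_mean)           # update mean with the innovation
        p -= gain * gain * p_zz            # update variance
    return m, p


def decay(rate, t):
    """Toy measurement model (assumed): x(t) = exp(-rate * t)."""
    return exp(-rate * t)


true_rate = 0.5
times = [0.2 * i for i in range(1, 21)]
measurements = [decay(true_rate, t) for t in times]   # noise-free toy data

rate_hat, rate_var = ukf_parameter_estimate(times, measurements, decay, m0=1.0, p0=1.0)
print(rate_hat, rate_var)
```

The same structure extends to several parameters by augmenting the state vector, at which point the scalar square root becomes a matrix square root of the covariance.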
Former supervisors: James Dennis, Department of Biochemistry & Christopher Hogue, Department of Biochemistry
Ken Lau successfully completed the GBB program in April 2008 – and in fact was the first GBB graduate! He has gone on to a post-doctoral position at Harvard.
A systems biology approach to decoding the function of complex N-glycans
Embryogenesis, tissue repair and adaptive immunity involve developmental sequences of cell proliferation followed by differentiation and cell cycle arrest. Growth factors and other cytokines bind glycoprotein receptors to stimulate growth or arrest signaling, with the net response dependent on the availability of both ligands and receptors. The number of N-glycans (n), a distinct feature of each glycoprotein sequence, cooperates with the physical properties of the Golgi pathway to regulate surface levels of receptors. The Golgi pathway is ultrasensitive to hexosamine flux for the production of tri- and tetraantennary N-glycans, which bind galectins to form a molecular lattice that opposes glycoprotein endocytosis. Glycoproteins with few N-glycans (low n – e.g. TβR, CTLA-4, GLUT4) exhibit enhanced cell surface expression with switch-like responses to increasing hexosamine concentration, whereas glycoproteins with high numbers of N-glycans (high n – e.g. EGFR, IGFR, FGFR, PDGFR) exhibit hyperbolic responses. A bioinformatics survey shows that receptor kinases with high n play roles in metabolism and growth, while those with low n have functions in arrest and differentiation pathways. Computational modeling and experimental data reveal that these features impose a sequence of growth-to-arrest/differentiation, where growth-promoting high n receptors stimulate nutrient flux first, which then drives arrest/differentiation programs by increasing surface levels of low n glycoproteins. The interaction of the N-glycan branching pathway and N-glycan number plays an important role in disease, as evidenced by the identification of synergistic polymorphisms in Mgat1 and CTLA-4 in human multiple sclerosis patients. In order to fully understand the interaction of N-glycan functions with cellular phenotypes, we applied an unbiased screening approach to examine perturbation to N-glycan branching on a global scale.
By utilizing microarray analysis and siRNA knockdown, we have identified GlcNAc-sensitive pathways that validate and extend our model, and simultaneously affect growth regulation, complex N-glycan processing, and constitutive endocytosis. Non-intuitive genes of interest identified by our screen include those associated with ionic regulation and proteoglycan biosynthesis. Our results reveal a mechanism for the metabolic regulation of the cellular transition between growth and arrest in mammals that arises from the apparent co-evolution of N-glycan number and branching.
WiP Seminar, 20 January 2009
Identification of novel innate immunity elicitors using molecular signatures of natural selection
The innate immune system protects eukaryotes against invading microbes upon perception of microbial motifs called pathogen-associated molecular patterns (PAMPs). Despite their central role in innate immunity, and their potential as antimicrobial agents, few PAMPs have been identified, and no systematic method has been developed for their discovery.
We show that competing signatures of negative selection to preserve core functions, and positive selection to avoid host recognition can be used to identify novel PAMPs. A selection analysis confirmed that all 1322 core genes identified from six phytopathogenic bacteria exhibited strong negative selection at the whole-gene level, while 56 genes also showed localized regions of positive selection. We show that these candidate PAMPs differ from the core genome with respect to the number and clustering of positively-selected sites. Finally, we functionally confirmed the candidates’ ability to induce innate immunity in Arabidopsis thaliana via callose deposition and virulence suppression assays.
Proposed collaborative internship with: Gary Bader
Heart failure (HF) is the cardiovascular epidemic of the 21st century. The goal of our research program is to explore novel pathways leading to heart failure, so that it can be effectively prevented. Bioinformatic tools are used to identify novel targets for study, possible players in disease progression, the interactions and networks among those players, and screening tools such as biomarkers. In my study, a novel target, mindin, was found through microarray analysis to change markedly in heart failure. Mindin is a new member of the Mindin-F-spondin family of proteins, which is highly conserved in evolution. It is expressed in the heart and seems to mediate innate immunity. The aims of this research are, first, to identify the functions of mindin in cardiac remodeling and cardiac failure due to MI; second, to determine the pathways in which mindin is involved (including mindin’s interactions with other players); third, to identify potential therapeutic interventions involving mindin; and fourth, to evaluate the potential use of mindin as a biomarker. Mindin’s targets will first be identified by in silico analysis of mindin’s gene and protein sequences; mindin’s interactions with potential targets will then be analyzed by high-throughput proteomic analysis. Microarrays and mass spectrometry can be used to assess mindin’s potential use as a biomarker.
Former supervisor: Quaid Morris, Department of Computer Science
WiP Seminar, 6 Jan 2009
Predicting protein function by integrating multiple large-scale biological datasets
Most successful computational approaches for protein function prediction integrate multiple genomics and proteomics data sources to make inferences about the function of unknown proteins. The most accurate of these algorithms have long running times, making them unsuitable for real-time protein function prediction in large genomes. In the first part of this talk, we discuss an algorithm, GeneMANIA, that is as accurate as the leading methods while being capable of predicting protein function in real time. In particular, GeneMANIA represents each data source as a functional association network and combines these networks into a function-specific composite network. The composite network is then used to predict protein function using a label propagation algorithm. We show that GeneMANIA is one of the leading methods on yeast and mouse benchmark data.
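The label propagation step can be illustrated on a toy network. The sketch below is a generic Gaussian-field style propagation, solved by Jacobi iteration of (I + D - W) f = y, where W holds edge weights, D the weighted degrees and y the initial labels. It is an assumption-laden stand-in rather than GeneMANIA's implementation: the network and labels are invented, unlabeled proteins are simply given a bias of 0, and the construction of the composite network from multiple data sources is not shown.

```python
def propagate_labels(weights, labels, iterations=200):
    """Score every node by iterating f_i = (y_i + sum_j w_ij * f_j) / (1 + d_i).

    weights: symmetric adjacency as {node: {neighbour: weight}}.
    labels:  {node: +1.0 or -1.0} for annotated proteins; others default to 0.
    """
    nodes = list(weights)
    scores = {node: labels.get(node, 0.0) for node in nodes}
    for _ in range(iterations):
        scores = {
            node: (
                labels.get(node, 0.0)
                + sum(w * scores[nb] for nb, w in weights[node].items())
            ) / (1.0 + sum(weights[node].values()))
            for node in nodes
        }
    return scores


# Hypothetical 4-protein path network: A - B - C - D, all edge weights 1.
network = {
    "A": {"B": 1.0},
    "B": {"A": 1.0, "C": 1.0},
    "C": {"B": 1.0, "D": 1.0},
    "D": {"C": 1.0},
}
labels = {"A": 1.0, "D": -1.0}   # A is a positive example, D a negative one

scores = propagate_labels(network, labels)
print(scores)
```

Unlabeled protein B ends up with a positive score because it sits next to the positive example, and C with a negative one; ranking unlabeled proteins by this score is what turns the network into a function prediction.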
In the second part of this talk, we discuss a modification of GeneMANIA that allows us to accurately predict protein function when there are only a few positive examples (annotated proteins). We show that this modification improves prediction accuracy on yeast, mouse, and human benchmark data. In addition, we show that our method improves on two previously proposed modes of network integration. Finally, we verify some of our most likely predictions by conducting a literature survey.
Former supervisor: Alan Moses, Department of Cell and Systems Biology
Collaborative internship proposed with: Brenda Andrews
Proteins contain intrinsically disordered regions: protein sequences with no apparent structure under native conditions. Although rare in bacteria and archaea, these regions are prevalent in eukaryotes. For example, they populate roughly 15-20% of the proteins in our genome. Their role is not fully understood; however, they are thought to be enriched in protein regulatory elements, or short linear motifs. These short linear motifs are critical for the function of proteins, as they act as functional switches that modulate the location, activity and degradation of the protein. The goal of my project is to perform a systematic genome-wide prediction of short linear motifs using evolutionary conservation.
I will begin by collecting several characterized instances of short linear motifs, including phosphorylation sites, localization signals, interaction motifs and degradation signals. These short linear motifs will be examined for their evolutionary properties and used to design a phylogenetic hidden Markov Model. This algorithm can assess evolutionary conservation by taking advantage of the phylogenetic tree between species. I will then perform a genome-wide search for short conserved sequences. This search will uncover novel short linear motifs and interesting conserved sequences will be functionally assayed in the lab by performing single-site mutagenesis on the predicted sequences. Another aspect of this project is to use the predicted sequences to uncover amino acid patterns which define short linear motifs. While several patterns are known to the scientific community, it is estimated that many more exist. I have already shown that evolutionary conservation alone is sufficient in uncovering the pattern specificity of two kinases. By grouping the predicted short linear motifs using sequence similarity, novel patterns will be discovered and function can be deduced. I will test if the patterns are directly responsible for the inferred function.
Supervisors: Andrew Emili
Collaborative Internship with: Shoshana Wodak
Registered in: Department of Molecular Genetics
My research focuses on interactions among chromatin-associated proteins with particular emphasis on core methylation systems. Histone methyltransferases are important regulators of chromatin structure and gene expression programs and are key determinants of cell fate and function. Several families of histone methyltransferases are encoded by the human genome with distinct catalytic and functional properties. To identify interactors for these enzymes, I am applying an affinity capture LC-MS/MS approach using two complementary proteomics approaches: lentiviral-based TAP-tagging and phage-derived synthetic antibodies. Systematic analysis of interacting partners for these enzymes will improve our understanding of the molecular mechanisms underlying histone methylation and transcriptional regulation.
Former supervisor: Peter Zandstra, Institute of Biomaterials and Biomedical Engineering | Former collaborative traineeship supervisor: Quaid Morris
Hematopoietic stem cells (HSCs) from umbilical cord blood are valuable resources for blood transplantation. Unfortunately, the small number of HSCs per cord blood collection restricts the wide application of cord blood transplantation. The capability of qualitative and quantitative in vitro HSC expansion is therefore important for enhancing the clinical benefits of HSCs. Our previous studies showed that proper manipulation of the intercellular signaling between HSCs and other cell populations can enhance in vitro HSC expansion. However, HSC expansion culture is a dynamic and heterogeneous system, and to date it has been challenging to identify the key factors that we may manipulate. The goal of my project is to use computational tools to gain a better understanding of the intercellular signaling between hematopoietic cells and of the regulation of HSC fates by the dynamic environment.
So far, we have employed microarray data and bioinformatics tools to study the intercellular signaling patterns between HSCs, progenitors and mature cells. As a next step, we seek to implement informatics or mathematical models of the dynamic regulation of HSC fates by environmental cues.
Supervisor: Krishna Mahadevan
Collaborative internship with: Elizabeth Edwards
The ultimate goal of this study is to develop methods for modeling and engineering the metabolism of a clostridial co-culture, and to improve the biobutanol production rate using a consolidated bioprocessing approach. Genome-scale metabolic models of microorganisms from different domains of life have been developed and applied to analyses of metabolism in pure cultures; however, systems biology of microbial co-cultures will extend our knowledge of pure culture physiology to co-cultures, where metabolic interactions and inter-species transport of metabolites are present. A system-level understanding of the metabolism of the Clostridium cellulolyticum and Clostridium acetobutylicum co-culture, which can be applied to biobutanol production from cellulosic biomass, facilitates the analysis and design of strategies for process and metabolic optimization, thus improving the biobutanol production rate. The development of computational methods to investigate the interactions between microorganisms in microbial co-cultures, based on the community genome sequences and physiology, is therefore beneficial for the ultimate engineering of these co-cultures; consequently, the focus of this proposal is the development of such methods.
Proposed collaborative internship with: Gary Bader
Entered into home PhD program: January, 2011
My overall objective is to analyze various types of high-throughput data on medulloblastoma samples for the purposes of understanding different mechanisms of tumourigenesis, classifying tumour samples into biologically relevant subtypes, and identifying common and divergent disruptions of signalling pathways.
To this end, I am devising and implementing strategies to integrate various sources of data including SNP array, expression array, and in the future, RNA-seq data. I will help refine existing molecular subtypes of medulloblastoma, and determine how the molecular subtypes differ in terms of disrupted genes and pathways, with the hopes of developing therapeutic strategies specific for each subtype. I will further determine how medulloblastoma changes in response to treatment, by comparing the genetic profiles of primary and recurrent tumours, and subsequently inferring aberrations in signalling networks.
In later stages of the project, the bioinformatic predictions will be validated using medulloblastoma cell lines and mouse models.
WiP Seminar, 21 October 2008
Investigating Cellular Decision-Making in Apoptosis
Networks of kinases play a role in the transmission and integration of signals from the membrane to the nucleus. We aim to elucidate kinase phosphorylation and interaction partners in these networks through the immunoprecipitation and mass spectrometric analysis of a representative set of 100 Flag-tagged kinases stably expressed in human colorectal cancer cells. The goal is to generate a comprehensive set of interactions and dynamic phosphorylation sites that correlate with cell phenotypes such as apoptosis and proliferation. Mass spectrometry techniques have allowed for the identification of proteins and their phosphorylation sites in complex samples. However, kinases usually work in the context of particular signaling stimuli. We aim to characterize the role of these over-expressed kinases in the context of TRAIL-induced apoptosis. This is particularly relevant to tumorigenesis in that many cancers are resistant to apoptosis and recombinant TRAIL therapies are currently undergoing clinical trials. We present assays that use flow cytometry to correlate the proliferative ability and sensitivity to apoptosis of various stable cell lines with kinase expression levels. We also present efforts to trace downstream signaling through the monitoring of MAP kinase phosphorylation using a high-throughput bead array.
In the news: Breast cancer survival predicted by new Canadian tool – 1 Feb 2009. See story describing Ian Taylor and Jeff Wrana’s predictive tool, published in Nature Biotechnology, in the Financial Post. Quaid Morris and Tony Pawson of the GBB program were also co-authors on this paper.
WiP Seminar, 4 November 2008
The Discovery and Analysis of Structural Motifs, from Specifying the General, to Generalizing the Specifics
Recurring structural patterns observed across non-homologous protein families can be hypothesized to be products of convergent evolution and may be associated with low conformational energy. These recurring patterns, which we call motifs, will give valuable insights in areas such as stability engineering and protein structure prediction. To look specifically at packing motifs in proteins, I have represented patterns in a way that can probe for general packing patterns that include both local and non-local interactions. After tens of thousands of structural motifs were discovered, statistical analyses were performed to uncover the biological meaning of these motifs. In working with structural motifs I have encountered several key challenges, which I highlight with selected motif examples from the results.
Supervisor: Shoshana Wodak, Department of Biochemistry
Supervisor: Lincoln Stein
Proposed collaborative internship with: Benjamin Blencowe
Registered in: Department of Molecular Genetics
My doctoral research project aims to better understand the role of pre-mRNA splicing in cancer development. There is precedent in the literature demonstrating that perturbations of the splicing patterns within a cell can promote cellular transformation and cancer development. This leads to the primary hypothesis of my project: “Perturbations in the splicing patterns within cancer cells contribute to the promotion of cellular transformation”. I will be utilizing paired-end RNA-seq data with a minimum read size of 100bp that is being produced at the Ontario Institute for Cancer Research (OICR). This sequencing data will be generated using Illumina sequencing technology. The samples sequenced will primarily be derived from pancreatic ductal adenocarcinoma primary tumours, mouse xenografts, or cell lines that are being sequenced for the International Cancer Genome Consortium by OICR. To analyze this data, I first surveyed existing RNA-seq alignment tools such as TopHat. However, I found that these tools produced alignment artifacts that had deleterious effects on my downstream analysis. I attempted to develop post-processing tools to fix these artifacts, but there were always residual problems with the alignments. To remedy this, I have begun to develop a tiered RNA-seq alignment pipeline with two major steps. The first step attempts to align each read-pair individually to known splice junctions and the reference genome. I will be using a junction sequence database tailored specifically to the read size produced by the sequencing reaction, with Novoalign for the alignment. After the alignment, a post-processing step will be performed to remove redundant and ambiguous alignments and to resolve the read-pairs. The second major step attempts to find novel junctions within the reads that did not align in the first step.
I will be using a splicing-aware aligner such as BLAT or a de novo assembler for this step. Initial testing of my pipeline against TopHat showed a significant increase in sensitivity for known splice-site alignments. Furthermore, my pipeline typically mapped more reads to known splice sites. Finally, after the alignment steps, I will develop a pipeline that uses custom and/or published tools to analyze the samples’ splicing patterns. This step will require transcript assembly and abundance calculations, as well as tools to normalize these values for comparison with other samples.
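One common way to tailor a junction sequence database to the read size is to join the last L-1 bases of the upstream exon to the first L-1 bases of the downstream exon, so that any read of length L crossing the junction with at least one base on each side fits entirely within the junction sequence. The sketch below illustrates that construction; the flank length of L-1, the exon names and the sequences are assumptions for the example and are not taken from the pipeline described above.

```python
def junction_sequence(donor_exon, acceptor_exon, read_length):
    """Join the donor exon's 3' flank to the acceptor exon's 5' flank.

    Each flank is read_length - 1 bases (or the whole exon if it is shorter).
    """
    if read_length < 2:
        raise ValueError("read_length must be at least 2 to span a junction")
    flank = read_length - 1
    return donor_exon[-flank:] + acceptor_exon[:flank]


def build_junction_db(exons, junctions, read_length):
    """Return {"donor|acceptor": sequence} for every known junction."""
    return {
        f"{donor}|{acceptor}": junction_sequence(exons[donor], exons[acceptor], read_length)
        for donor, acceptor in junctions
    }


# Hypothetical exons and known junctions; a toy read length of 5 keeps it readable.
exons = {"e1": "AAAACCCC", "e2": "GGGGTTTT", "e3": "TTGACA"}
known_junctions = [("e1", "e2"), ("e1", "e3")]

junction_db = build_junction_db(exons, known_junctions, read_length=5)
print(junction_db)
```

Reads that align to such a database then need their coordinates translated back to the genome, which is part of what a post-processing step like the one described above has to handle.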
My research explores evolutionary constraint in intrinsically disordered regions (IDRs) of proteins. IDRs are characterized by their lack of a stable secondary or tertiary structure, and comprise close to 40% of eukaryotic proteomes. Many IDRs have been shown to play important roles in the cell, particularly in signaling and regulation. However, in comparison to ordered regions of proteins, most IDRs appear highly diverged at the level of the primary amino acid sequence. We propose that these IDRs could have quantitative, sequence-encoded functions that are under stabilizing selection, wherein individual amino acids are under weak evolutionary constraint, but collectively contribute to a quantitative function that is under selection. So far, we have shown evidence for stabilizing selection in vivo and in silico for one IDR in budding yeast. I hope to apply our in silico method to detect stabilizing selection on quantitative features in IDRs proteome-wide, and further test these predictions in vivo.