Alternative splicing is a tightly regulated process that forms a crucial layer of gene expression and exerts its effects in a tissue-specific manner. My project is centred on identifying the elements that regulate alternative splicing. Cis-elements are sequence determinants of alternative splicing; they are recognized by trans-factors, and these interactions give rise to diverse splicing patterns. To identify these elements and study their effects on splicing, we use a random library approach: minigene reporters carrying diverse random decamers as potential cis-elements are introduced into C. elegans, and splicing of all reporters is measured in parallel in vivo by RNA-seq. This identifies activators, repressors and cryptic splice site inducers. Candidates are then validated in the wet lab by reverse transcription PCR (RT-PCR), followed by bioinformatic analyses to identify interacting trans-factors and to locate these elements genome-wide. My main emphasis will be on developing sophisticated computational models of the regulation of alternative splicing.
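As a rough illustration of how RNA-seq reads from a decamer library might be turned into activator, repressor and cryptic-site calls, the sketch below computes percent spliced-in (PSI) for each reporter and compares it with a baseline reporter. The read categories, the `JunctionCounts` type, the `classify` function and all thresholds are hypothetical choices for this example, not the project's actual pipeline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JunctionCounts:
    """Hypothetical per-reporter RNA-seq read counts."""
    inclusion: int    # reads spanning the exon-inclusion junctions
    exclusion: int    # reads spanning the exon-skipping junction
    cryptic: int = 0  # reads using an unannotated (cryptic) splice site


def psi(c: JunctionCounts) -> float:
    """Percent spliced-in, computed over canonical (non-cryptic) reads only."""
    canonical = c.inclusion + c.exclusion
    if canonical == 0:
        raise ValueError("no canonical junction reads")
    return c.inclusion / canonical


def classify(counts: dict[str, JunctionCounts], baseline_psi: float,
             delta: float = 0.2, cryptic_min: float = 0.1,
             min_reads: int = 20) -> dict[str, str]:
    """Label each decamer by its effect relative to a baseline reporter."""
    calls: dict[str, str] = {}
    for decamer, c in counts.items():
        total = c.inclusion + c.exclusion + c.cryptic
        if total < min_reads:
            calls[decamer] = "low coverage"
        elif c.cryptic / total >= cryptic_min:
            calls[decamer] = "cryptic splice site inducer"
        else:
            # Shift in exon inclusion caused by this decamer.
            shift = psi(c) - baseline_psi
            if shift >= delta:
                calls[decamer] = "activator"
            elif shift <= -delta:
                calls[decamer] = "repressor"
            else:
                calls[decamer] = "neutral"
    return calls
```

For example, with `baseline_psi=0.5`, a decamer supported by 90 inclusion and 10 exclusion reads would be called an activator. A real analysis would replace the fixed `delta` cut-off with a statistical test across biological replicates.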
My project aims to develop a tool that detects structural variants (SVs) from long-read sequencing data, exploiting the benefits of SV calling based on genome assembly graphs. I aim to use structures in assembly graphs, such as bubbles, branches and loops, to detect SVs including indels, translocations, inversions and duplications. Building on the work done for short-read, assembly-based SV callers, I will develop methods to detect each type of variation directly from an assembly graph, without the need for a reference. These methods will be integrated into a single tool for long-read, graph-based SV detection and subsequently validated by comparing their SV calls with those of current reference-based approaches on mixed lineage leukemia samples.
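To make the idea of a "bubble" concrete, the sketch below finds the simplest kind: two single-node branches that leave the same source node and rejoin at the same sink, as two alleles of an indel or substitution might appear in a compacted graph. The edge-list input and the `find_simple_bubbles` name are assumptions for illustration; real assembly graphs (for example in GFA format) are bidirected, and nested or complex bubbles call for superbubble-style algorithms.

```python
from collections import defaultdict
from itertools import combinations


def find_simple_bubbles(edges):
    """Return (source, branch_a, branch_b, sink) tuples for simple bubbles.

    edges: iterable of (u, v) directed edges between sortable node IDs.
    Only bubbles whose two branches are single nodes are reported; a branch
    that is a direct source->sink edge (e.g. a deletion) is not detected.
    """
    succ = defaultdict(set)
    pred = defaultdict(set)
    for u, v in edges:
        succ[u].add(v)
        pred[v].add(u)

    bubbles = []
    for source in sorted(succ):
        # Candidate branches: entered only from `source`, with exactly one exit.
        branches = [b for b in sorted(succ[source])
                    if pred[b] == {source} and len(succ.get(b, ())) == 1]
        for a, b in combinations(branches, 2):
            (sink_a,) = succ[a]
            (sink_b,) = succ[b]
            if sink_a == sink_b and sink_a != source:
                bubbles.append((source, a, b, sink_a))
    return bubbles
```

For the graph `s -> a -> t`, `s -> b -> t`, `t -> u`, this returns `[("s", "a", "b", "t")]`. Loops and branches would need their own detectors, which is consistent with the plan above of one method per variant type.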
I am interested in the molecular basis of adaptations to metabolic stress, and I address this question using South American weakly electric fishes as a study system. Although they are not a model species, electric fishes are a rich system for investigating metabolic stress, as they harbor a novel skeletal-muscle-derived organ, the electric organ, which generates electric pulses. Electric organs rely on voltage-gated sodium channels coupled with Na+/K+-ATPases, and this coupling consumes large amounts of ATP. In addition, a number of electric fishes live in habitats that are seasonally deprived of dissolved oxygen, which adds a further metabolic burden to adapt to. Using high-throughput genomic and transcriptomic sequencing data, generated both in our lab and by other labs in the field, I aim to compare the rates of nonsynonymous mutations in electric and non-electric fishes in genes implicated in mitochondrial respiration and metabolic adaptation. I am also using RNA-seq data to gain insight into the expression of those genes. Finally, I can test the importance of many of the nonsynonymous mutations identified in these genomic and transcriptomic analyses using site-directed mutagenesis in cell cultures.
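Comparing nonsynonymous mutation rates starts from counting synonymous and nonsynonymous sites and differences between aligned coding sequences. The sketch below is a simplified, Nei-Gojobori-style count under stated assumptions: changes to stop codons are treated as nonsynonymous, and codons with gaps, stops or more than one differing position are skipped. It is not the project's method, and the function names are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (first base slowest).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def synonymous_sites(codon):
    """Synonymous sites in one codon: each of the 9 single-base changes
    contributes 1/3 of a site if it keeps the same amino acid."""
    aa = CODE[codon]
    sites = 0.0
    for i in range(3):
        for b in BASES:
            if b != codon[i] and CODE[codon[:i] + b + codon[i + 1:]] == aa:
                sites += 1 / 3
    return sites


def count_substitutions(seq1, seq2):
    """Return synonymous/nonsynonymous sites (S, N) and differences (Sd, Nd)
    for two aligned, in-frame coding sequences (simplified, see lead-in)."""
    if len(seq1) != len(seq2) or len(seq1) % 3:
        raise ValueError("expected two aligned, in-frame coding sequences")
    S = N = Sd = Nd = 0.0
    for k in range(0, len(seq1), 3):
        c1, c2 = seq1[k:k + 3].upper(), seq2[k:k + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # gap, ambiguity code or stop codon
        if sum(x != y for x, y in zip(c1, c2)) > 1:
            continue  # multi-hit codons need pathway averaging; skipped here
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += s
        N += 3 - s
        if c1 != c2:
            if CODE[c1] == CODE[c2]:
                Sd += 1
            else:
                Nd += 1
    return {"S": S, "N": N, "Sd": Sd, "Nd": Nd}
```

The proportions pN = Nd/N and pS = Sd/S give an uncorrected pN/pS ratio. For publication-grade estimates, codon-substitution models such as those in PAML's codeml can estimate separate dN/dS ratios for electric and non-electric lineages on a phylogeny.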
Supervisor: Melissa Holmes, Department of Cell and Systems Biology
Puberty is an essential developmental process in mammals. Previous studies have identified genes and regulators critical to puberty onset, suggesting that this process is regulated epigenetically.1 However, no gene regulatory network for pubertal onset has yet been produced. The naked mole-rat (Heterocephalus glaber, NMR) is a unique mammal that exhibits socially mediated reproductive suppression, and its potential for studying puberty remains untapped. NMRs live in large colonies in which adult subordinates remain in a prepubertal state due to the presence of a dominant breeding female.2 Most NMRs will never go through puberty unless they are removed from the suppressive cues of their colony. Only then do they exhibit the morphological, endocrine, and behavioural hallmarks of mammalian puberty,3,4 providing an exceptional opportunity for experimental control of pubertal timing. The proposed studies will use NMRs to identify the genes (and their pathways) involved in reproductive suppression and subsequent activation. By identifying a gene regulatory network associated with pubertal delay, we aim to understand the biological mechanisms that control pubertal timing in mammals.
Supervisor: Jennifer Mitchell, Department of Cell and Systems Biology
Collaborative internship with: Alan Moses
My research projects focus on identifying tissue-specific transcriptional regulatory elements in mammalian genomes using next-generation sequencing, bioinformatics and comparative genomics. Enhancers, one of the major components of the complex non-coding genome, regulate gene expression in a tissue-specific manner with the help of tissue-specific transcription factors (TFs). The overall goal of my PhD is to develop a better bioinformatic enhancer prediction model for mouse embryonic stem (mES) cells.
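Sequence-based enhancer predictors need a numeric representation of each candidate region. One common choice, shown here only as an illustrative feature-extraction step and not as the model this project develops, is a k-mer frequency vector; such vectors could then be fed to a classifier trained on known mES enhancers versus background regions. The `kmer_spectrum` name and the default `k=4` are assumptions.

```python
from itertools import product

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_spectrum(seq, k=4):
    """Strand-symmetric k-mer frequency vector for one DNA sequence.

    Returns 4**k frequencies in lexicographic k-mer order (AA..A first).
    k-mers containing non-ACGT characters (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {km: i for i, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    seq = seq.upper()
    # Count both strands, since enhancer function is classically
    # independent of orientation.
    for strand in (seq, seq.translate(_COMPLEMENT)[::-1]):
        for i in range(len(strand) - k + 1):
            j = index.get(strand[i:i + k])
            if j is not None:
                counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(counts)
```

Normalizing to frequencies keeps regions of different lengths comparable; TF motif scores or chromatin features could be appended to the same vector.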
Supervisor: Michael Hoffman, Department of Computer Science
Many transcription factors initiate transcription only in particular sequence contexts, providing the means for sequence specificity of transcriptional control. The position weight matrix (PWM) model allows for the computational identification of transcription factor binding sites (TFBSs), by characterizing a transcription factor’s position-specific preference over the DNA alphabet. This four-letter alphabet, however, only partially describes the possible diversity of nucleobases a transcription factor might encounter. For instance, cytosine is often present in a covalently modified form: 5-methylcytosine (5mC). 5mC can be successively oxidized to 5-hydroxymethylcytosine (5hmC), 5-formylcytosine (5fC), and 5-carboxylcytosine (5caC). Just as transcription factors distinguish one unmodified nucleobase from another, some have been shown to distinguish unmodified bases from these covalently modified bases. Modification-sensitive transcription factors provide a mechanism by which widespread changes in methylation and hydroxymethylation can dramatically shift active gene expression programs.
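The PWM model can be sketched in a few lines: per-position symbol frequencies from aligned binding sites are converted into log-odds scores against a background, and a sequence is scanned by summing the scores of each window. The function names, the pseudocount of 0.5 and the uniform default background are assumptions for this example. Because the alphabet is a parameter, the same code accepts a larger alphabet that includes modified-base symbols.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=0.5, background=None):
    """Log-odds PWM (in bits) from equal-length aligned binding sites.

    background: optional {symbol: probability}; defaults to uniform.
    """
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    bg = background or {b: 1 / len(alphabet) for b in alphabet}
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        total = len(column) + pseudocount * len(alphabet)
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / bg[b])
                    for b in alphabet})
    return pwm


def scan(seq, pwm):
    """Score every full-width window of seq; returns (start, score) pairs."""
    width = len(pwm)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if all(sym in pwm[0] for sym in window):  # skip windows with unknown symbols
            hits.append((start, sum(pwm[j][sym] for j, sym in enumerate(window))))
    return hits
```

For instance, a PWM built from `["TGAC", "TGAC", "TGAG"]` gives its highest score in `"AATGACAA"` to the window starting at position 2. Putative TFBSs are then the windows whose scores exceed a chosen threshold.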
To understand the effect of modified nucleobases on gene regulation, I developed methods to discover motifs and identify TFBSs in DNA with covalent modifications. My models expand the standard A/C/G/T alphabet, adding m (5mC), h (5hmC), f (5fC), and c (5caC). I created an expanded-alphabet sequence using whole-genome maps of 5mC and 5hmC in naive mouse T cells. Using this modified sequence, I discovered TFBS motifs both de novo and through a hypothesis-testing approach, in regions implicated by existing chromatin immunoprecipitation-sequencing (ChIP-seq) data. I recapitulated several known methylation binding preferences, including the preference of ZFP57 and C/EBPβ for methylated motifs. I demonstrated that my method is robust to parameter perturbations: transcription factors' sensitivities to 5mC and 5hmC are broadly conserved across a range of modified-base calling thresholds. I am now beginning to discover novel transcription factor binding preferences, and I am mining all Mouse ENCODE ChIP-seq data for these modification-sensitive binding preferences. We plan to follow up with collaborators, who will perform in vivo validation via ChIP-seq experiments for the transcription factors that I predict to have altered 5mC/5hmC binding affinities.
A pre-print of our preliminary work is available on bioRxiv: http://dx.doi.org/10.1101/043794.
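The expanded-alphabet construction described above can be sketched as a thresholded substitution: wherever a cytosine's modification call is strong enough, its `C` is replaced by `m` or `h`. The input format (a hypothetical mapping from position to 5mC and 5hmC fractions), the tie-breaking rule and the function name are assumptions; this sketch also marks forward-strand cytosines only, so modifications on the opposite strand would need their own handling.

```python
def modify_sequence(seq, mod_calls, threshold=0.5):
    """Write 5mC/5hmC calls into a sequence using the m/h symbols.

    mod_calls: hypothetical {0-based position: (fraction_5mC, fraction_5hmC)}.
    A cytosine becomes 'm' or 'h' when the larger of its two fractions reaches
    `threshold` (ties go to 'm'); all other positions keep their base.
    """
    out = list(seq.upper())
    for pos, (frac_mc, frac_hmc) in mod_calls.items():
        if out[pos] != "C":
            continue  # only cytosines carry these modifications
        if max(frac_mc, frac_hmc) >= threshold:
            out[pos] = "m" if frac_mc >= frac_hmc else "h"
    return "".join(out)
```

For example, `modify_sequence("ACGTCG", {1: (0.9, 0.05), 4: (0.1, 0.7)})` yields `"AmGThG"`. Regenerating the sequence over a range of `threshold` values is one way to probe how sensitive downstream motif results are to the modified-base calling threshold.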