PhD Thesis Defenses
2022
January 4th
Kritika Karri
Title: Computational Characterization of Long Non-Coding RNAs (LncRNAs) and Study Their Role in Rodent Liver Disease, Xenobiotic Exposure, and Sex-Specific Responses Using Bulk and Single Cell RNA-Sequencing
Major Professor: David Waxman
June 10
Aaron Chevalier
Title: Tools for Mutational Signature Discovery and Methods for Prediction of Drug Response
Major Professor: Joshua Campbell
ABSTRACT:
Mutational signatures are patterns of somatic alterations in the genome caused by carcinogenic exposures or aberrant cellular processes. Specifically, this dissertation focuses on the analysis of mutational signatures in human cancer and its application to stratification of patients for drug response.
To provide a comprehensive workflow for preprocessing, analysis, and visualization of mutational signatures, I created the Mutational Signature Comprehensive Analysis Toolkit (musicatk) package. musicatk enables users to select different schemas for counting mutation types and easily combine count tables from different schemas. Multiple distinct methods are available to deconvolute signatures and exposures or to predict exposures in individual samples given a pre-existing set of signatures. Additional exploratory features include the ability to compare signatures to the COSMIC database, embed tumors in two dimensions with UMAP, cluster tumors into subgroups based on exposure frequencies, identify differentially active exposures between tumor subgroups, and plot exposure distributions across user-defined annotations such as tumor type.
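As an illustration of what a "schema for counting mutation types" can look like, the sketch below tallies single-base substitutions under the widely used 96-channel trinucleotide convention (the format of COSMIC SBS signatures). It is not musicatk code, which is an R package with its own interface; the function names `sbs96_label` and `count_sbs96` and the input format (a 3-base reference context plus the mutant base) are assumptions made for this example.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def sbs96_label(context, alt):
    """Label one substitution in the 96-channel convention, e.g. 'A[C>T]G'.

    `context` is the 3-base reference sequence centered on the mutated base
    and `alt` is the mutant base (hypothetical input format for this sketch).
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or alt == context[1]:
        raise ValueError("expected a 3-base context and a differing alt base")
    # By convention the mutated reference base is reported as a pyrimidine
    # (C or T), so purine-centered contexts are flipped to the other strand.
    if context[1] in "AG":
        context = reverse_complement(context)
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def count_sbs96(mutations):
    """Count mutation types for an iterable of (context, alt) pairs."""
    return Counter(sbs96_label(context, alt) for context, alt in mutations)


if __name__ == "__main__":
    sample = [("ACG", "T"), ("CGT", "A"), ("TTA", "C")]
    for label, n in sorted(count_sbs96(sample).items()):
        print(label, n)
```

The first two mutations in the sample describe the same event seen from opposite strands, so they fall into the same channel. A per-sample count table of this kind is the usual input to signature deconvolution.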
I then used musicatk to analyze the largest tumor sequencing dataset from a Chinese population to date and identified differences in the levels of signature exposures compared to similar data from a Western cohort. Specifically, levels of COSMIC signature SBS25 were higher in the Chinese dataset for melanoma and renal cell carcinoma patients, and melanoma patients had lower levels of SBS7a/b (ultraviolet light). My analysis also revealed a putative novel signature enriched in pancreatic cancers.
Lastly, I assess the ability of mutational signatures to identify patients who may respond to irofulven, a drug for late-stage cancer patients who have defects in the Transcription-Coupled Nucleotide Excision Repair (TC-NER) pathway. As the functional understanding of which mutations successfully disrupt this pathway is incomplete, I develop an approach that classifies patients by using levels of mutational signatures as evidence that the pathway is disrupted. I build a model that successfully predicts patients who will respond to treatment without a known relevant mutation in the TC-NER pathway.
The work from this study furthers our understanding of mutational signatures in different populations and demonstrates the feasibility of using mutational signatures to identify patients eligible for drug trials.
August 19
Lucas Schiffer
Title: Multimodal, Longitudinal, and Mega-Analysis of Biomedical Data
Major Professor: W. Evan Johnson
ABSTRACT:
Challenges related to multimodal analysis of biomedical data will be explored through the development of MultimodalExperiment, a data structure that appropriately and efficiently represents multiomics data that is hierarchical, multimodal, and/or longitudinal in nature. A schematic of and methods for the data structure will be presented along with example usage to demonstrate how current challenges of alternative data structures are overcome, ease of data management is improved, and computational/storage efficiency is optimized.
Challenges related to longitudinal analysis of biomedical data will be explored in the context of a cohort study of cancer patients being treated with anti-programmed cell death protein 1/programmed cell death ligand 1 immunotherapies at Boston Medical Center. The progression-free survival status of study participants will be analyzed using linear mixed effects models which incorporate longitudinal high-dimensional metabolomics data. Maps of metabolic pathways and a hypothesis will be presented to explain serum metabolites that are associated with progression-free survival status and possibly therapeutic efficacy.
Challenges related to mega-analysis of biomedical data will be explored through the creation of a pipeline to preprocess transcriptomics data from human hosts infected with tuberculosis to support machine learning and other tasks. The details of original software developed to provide more than 10,000 samples of clean, high-quality, machine-learning-ready data from all related and eligible studies in the Gene Expression Omnibus repository will be illustrated. The importance of improving diagnostic testing and therapeutic interventions for tuberculosis disease will be highlighted in the context of these data, and the specifics of why they represent a key ingredient for machine learning that helps overcome current challenges in the field will be explained.
August 24
Boting Ning
Title: Leveraging Transcriptomic Regulation to Understand, Diagnose and Intercept Early Lung Cancer Pathogenesis
Major Professor: Marc Lenburg
ABSTRACT:
Lung cancer is the leading cause of cancer death in the US, largely due to the lack of treatment options to intercept the progression of early lung cancers and of methods to diagnose lung cancer at early stages. Prior studies indicated that the lack of immune surveillance is associated with the progression of bronchial premalignant lesions (PMLs) and that gene expression alterations in the nasal epithelium can be leveraged for the early detection of lung cancer. Yet the regulatory mechanisms behind these gene expression alterations remain poorly understood. Thus, there is an unmet need to study gene expression regulation for better disease management of early lung cancer, including further understanding the biology of early lung cancer development, identifying potential interception strategies, and improving lung cancer diagnosis.
Collectively, my thesis investigated the gene expression regulation mechanisms to facilitate the understanding, interception, and diagnosis of early lung cancer pathogenesis.
November 17th
Rebecca Panitch
Title: Understanding the Mechanisms and Pathways of Alzheimer’s Disease in APOE Genotype Sub-Populations
Major Professor: Lindsay Farrer
November 21st
Dileep Kishore
Title: Computational Study of Microbe-Microbe Interactions and Their Interplay with Their Environment
Major Professor: Daniel Segrè
ABSTRACT:
The first part of my thesis is focused on improving the workflow for the inference of microbial co-occurrence relationships from abundance data. Toward this goal, we developed Microbial Co-occurrence Network Explorer or MiCoNE, a pipeline that infers microbial co-occurrences from 16S ribosomal RNA (16S rRNA) amplicon data. The second part of my thesis focuses on microbe-host interactions rather than microbe-microbe associations. In particular, we sought to predict the effects of microbial metabolites on human receptors and their associated regulatory pathways. In the final part of my thesis, we turn to the question of whether computational algorithms can help control microbial community growth to achieve specific objectives. We describe the development of a reinforcement learning algorithm to learn optimal environmental control strategies to steer a microbial community towards a particular goal, such as reaching a specific taxonomic distribution or producing desired metabolites.
Overall, the work presented in this thesis demonstrates how microbe-microbe and microbe-environment (including microbe-host) interactions represent plastic system-level properties whose understanding can help unravel the role of microbial communities in specific diseases. Correspondingly, manipulating these interactions, e.g., by appropriately modifying environmental conditions, can serve as a promising strategy for steering communities towards desired states, including producing valuable molecular products.
December 9th
Rui Hong
Title: Building an Analytical Framework for Quality Control and Meta-Analysis of Single-Cell Data to Understand Heterogeneity in Lung Cancer Cells
Major Professor: Joshua Campbell
ABSTRACT:
Single-cell RNA sequencing (scRNA-seq) has been a powerful technique for characterizing transcriptional heterogeneity related to tumor development and disease pathogenesis. Despite the advances of the technology, there is still a lack of software to systematically and easily assess the quality and different types of artifacts present in scRNA-seq data and lack of statistical frameworks for understanding heterogeneity in the gene programs of cancer cells.
In this dissertation, I first introduced novel computational software, called SCTK-QC, to enhance and streamline the process of quality control for scRNA-seq data. SCTK-QC is a pipeline that performs comprehensive quality control (QC) of scRNA-seq data and runs a multitude of tools to assess various types of noise present in scRNA-seq data as well as to quantify general QC metrics. These metrics are displayed in a user-friendly HTML report, and the pipeline has been implemented in two cloud-based platforms.
Most scRNA-seq studies only profiled a small number of tumors and provided a narrow view of the transcriptome in tumor tissue. Next, I developed a novel framework to perform a large-scale meta-analysis of cancer cells from 12 studies with scRNA-seq data from patients with non-small-cell lung cancer (NSCLC). I discovered interpretable gene co-expression modules with celda and demonstrated that the activity of gene modules accounted for both inter- and intra-tumor heterogeneity of NSCLC samples. Furthermore, I used CaDRa to determine that the levels of some gene modules were significantly associated with combinations of underlying genetic alterations. I also showed that other gene modules are associated with immune cell signatures and may be important for communication between the cancer cells and the immune microenvironment.
Finally, I presented a novel computational method to study the association between copy number variation (CNV) and gene expression at the single-cell level. Diverse CNV profiles were identified among tumor subclones within each sample, and I discovered cis and trans gene signatures whose expression values are associated with specific somatic CNV status. This study helped us prioritize the potential cancer driver genes within each CNV region.
Collectively, this work addressed limitations in the quality control of scRNA-seq data and provided insights for understanding the heterogeneity of NSCLC samples.
2021
December 2
Emma Briars
Title: Development Of Methods To Diagnose And Predict Antibiotic Resistance Using Synthetic Biology And Computational Approaches
Major Professor: Ahmad (Mo) Khalil
November 18
Anthony Federico
Title: Development of Methods for Omics Network Inference and Analysis and Their Application to Disease Modeling
Major Professor: Stefano Monti
ABSTRACT
My dissertation aims to address these challenges by presenting new approaches for high-dimensional network inference with limited samples as well as methods and tools for integrated network analysis applied to multiple research domains in cancer genomics. First, I introduce a novel method for reconstructing gene regulatory networks called SHINE (Structure Learning for Hierarchical Networks) and present an evaluation on simulated and real datasets, including a Pan-Cancer analysis using The Cancer Genome Atlas (TCGA) data. Next, I summarize the challenges of executing and managing data processing workflows for large omics datasets in high-performance computing environments and present multiple strategies for using Nextflow for reproducible scientific workflows, including shine-nf, a collection of Nextflow modules for structure learning. Lastly, I introduce the methods, objects, and tools developed for the analysis of biological networks used throughout my dissertation work. Together, these contributions were used in focused analyses aimed at understanding the molecular mechanisms of tumor maintenance and progression in subtype networks of Breast Cancer and Head and Neck Squamous Cell Carcinoma.
August 4
Brian Haas
Title: Bioinformatic Tool Developments with Applications to RNA-Seq Data Analysis and Clinical Cancer Research
Major Professors: Simon Kasif & Aviv Regev
July 29
Tanya Karagiannis
Title: Single Cell Analysis and Methods To Characterize Peripheral Blood Immune Cell Types in Disease and Aging
Major Professors: Stefano Monti & Paola Sebastiani
ABSTRACT
In the past decade, RNA-sequencing (RNA-seq)-based genome-wide expression studies have contributed to major advances in understanding human biology and disease. However, for heterogeneous tissues such as peripheral blood, RNA-sequencing masks the expression of different populations of cells that may be important in understanding different conditions and disease progression. With the advent of single cell RNA-sequencing (scRNA-seq), it has become possible to study the gene expression of each single cell and to explore cellular heterogeneity in the context of disease and under the influence of medications or other substances. In this dissertation, I will present three projects that demonstrate how single cell sequencing methods can be used to characterize novel changes in the peripheral immune system in human disease and aging. I will also describe novel methodological approaches I created to analyze cell type composition and gene expression level changes.
First, I investigated the cell type specific changes due to opioid use in human peripheral blood. Utilizing single cell transcriptomic methods, I identified a genome-wide suppression of antiviral gene expression across immune cell types of chronic opioid users, and similarly under acute exposure to morphine.
Second, I investigated the immune cell type specific changes of gene expression and composition in the context of human aging and longevity. I developed novel approaches to measure and compare overall cell type composition between samples, and identified significant overall differences in immune cell type composition, including pro-inflammatory cell populations, between extreme longevity and younger ages. In addition, I generated cell type-specific signatures associated with longevity after accounting for age-related changes that demonstrate an upregulation in immune response and metabolic processes important in the activation of immune cells in extreme long-lived individuals compared to normally aging individuals.
Finally, I investigated whether aging of the immune system is accelerated in opioid-dependent individuals. I utilized the unique aging signatures generated in the aging project and discovered higher expression of aging signatures in specific cell types of opioid-dependent individuals, suggesting chronic opioid use causes premature aging of the immune system that may contribute to the increased susceptibility to infections in these individuals.
March 24th
Marzie Rasekh
Title: Characterizing VNTRs in Human Populations
Major Professor: Gary Benson
ABSTRACT
I identified over 35,000 minisatellite VNTRs and over 4,000 macrosatellite VNTRs, most previously unknown. A small subset in each VNTR class was validated experimentally and in silico. The detected VNTRs were further studied for their effects on gene expression, ability to distinguish human populations, and functional enrichment. Unlike STRs, mini- and macrosatellite VNTRs are enriched in regions with functional importance, e.g., introns, promoters, and transcription factor binding sites. A study of VNTRs across 26 populations shows that minisatellite VNTR genotypes can be used to predict super-populations with >90% accuracy. In addition, genotypes for 195 minisatellite VNTRs and 24 macrosatellite VNTRs were shown to be associated with differential expression in nearby genes (eQTLs).
Finally, I developed a computational tool, mlZ, to infer undetected VNTR alleles and to detect false positive predictions. mlZ is applicable to other tools that use read support for predicting short variants.
Overall, these studies provide the most comprehensive analysis of mini- and macrosatellites in human populations and will facilitate the application of VNTRs for clinical purposes.
April 8th
Zhe Wang
Title: Enhancing Preprocessing and Clustering of Single-Cell RNA Sequencing Data
Major Professor: Joshua Campbell
ABSTRACT
Single-cell RNA sequencing (scRNA-seq) is the leading technique for characterizing cellular heterogeneity in biological samples. Various scRNA-seq protocols have been developed that can measure the transcriptome from thousands of cells in a single experiment. With these methods readily available, the ability to transform raw data into biological understanding of complex systems is now a rate-limiting step. In this dissertation, I introduce novel computational software and tools which enhance preprocessing and clustering of scRNA-seq data and evaluate their performance compared to existing methods.
First, I present scruff, an R/Bioconductor package that preprocesses data generated from scRNA-seq protocols including CEL-Seq or CEL-Seq2 and reports comprehensive data quality metrics and visualizations. scruff rapidly demultiplexes, aligns, and counts the reads mapped to genomic features with deduplication of unique molecular identifier (UMI) tags and provides novel and extensive functions to visualize both pre- and post-alignment data quality metrics for cells from multiple experiments.
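The idea behind counting with UMI deduplication can be sketched in a few lines. This is only an illustration of the general technique, not scruff's implementation (scruff is an R/Bioconductor package); the function name `count_umis` and the `(cell, gene, umi)` tuple format are assumptions, and real pipelines may additionally merge UMIs that differ by sequencing errors.

```python
from collections import defaultdict


def count_umis(reads):
    """Count distinct UMIs per (cell, gene) pair.

    `reads` is an iterable of (cell_barcode, gene, umi) tuples, one per
    aligned read (hypothetical format for this sketch). Reads sharing the
    same cell, gene, and UMI are treated as PCR duplicates of one molecule.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


if __name__ == "__main__":
    reads = [
        ("cell1", "GAPDH", "AACGT"),
        ("cell1", "GAPDH", "AACGT"),  # duplicate of the read above
        ("cell1", "GAPDH", "TTGCA"),
        ("cell2", "GAPDH", "AACGT"),  # same UMI, different cell
    ]
    print(count_umis(reads))
```

Counting molecules rather than reads in this way reduces amplification bias in the resulting expression matrix.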
Second, I present Celda, a novel Bayesian hierarchical model that can perform simultaneous co-clustering of genes into transcriptional modules and cells into subpopulations for scRNA-seq data. Celda identified novel cell subpopulations in a publicly available peripheral blood mononuclear cell (PBMC) dataset and outperformed a PCA-based approach for gene clustering on simulated data.
Third, I extend the application of Celda by developing a multimodal clustering method that utilizes both mRNA and protein expression information generated from single-cell sequencing datasets with multiple modalities, and demonstrate that Celda multimodal clustering captured meaningful biological patterns which are missed by transcriptome- or protein-only clustering methods.
Collectively, this work addresses limitations present in the computational analyses of scRNA-seq data by providing novel methods and solutions that enhance scRNA-seq data preprocessing and clustering.
April 8th
Ke Xu
Title: Airway Gene Expression Alterations in Association with Radiographic Abnormalities of the Lung
Major Professor: Marc Lenburg
ABSTRACT
High-resolution computed tomography (HRCT) of the chest is commonly used in the diagnosis of a variety of lung diseases. Structural changes associated with clinical characteristics of disease may also define specific disease-associated physiologic states that may provide insights into disease pathophysiology. Gene expression profiling is potentially a useful adjunct to HRCT to identify molecular correlates of the observed structural changes. However, it is difficult to directly access diseased distal airway or lung parenchyma routinely for profiling studies.
Previously, we have profiled normal-appearing bronchial airway epithelial cells at the mainstem bronchus, detecting distinct gene expression alterations related to the clinical diagnosis of chronic obstructive pulmonary disease (COPD) and lung cancer. These gene expression alterations offer insights into the molecular events related to diseased tissue at more distal airways and in the parenchyma, which we hypothesize are due to a field-of-injury effect. Here, we expand this prior work by correlating airway gene expression to COPD and bronchiectasis phenotypes defined by HRCT to better understand the pathophysiology of these diseases. Additionally, we classified pulmonary nodules as malignant or benign by combining HRCT nodule imaging characteristics with gene expression profiling of the nasal airway.
First, we collected brushing samples from the mainstem bronchus and assessed gene expression alterations associated with COPD phenotypes defined by K-means clustering of HRCT-based imaging features. We found three imaging clusters, which correlated with incremental severity of COPD: normal, interstitial predominant, and emphysema predominant. Forty-one genes were differentially expressed between the normal and the emphysema predominant clusters. Functional analysis of the differentially expressed genes suggests a possible induction of inflammatory processes and repression of T-cell related biologic pathways in the emphysema predominant cluster.
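For readers unfamiliar with K-means, the following is a minimal sketch of the standard Lloyd's algorithm on toy two-dimensional data. It does not reproduce the study's features, preprocessing, or software; the function name `kmeans`, the caller-supplied starting centroids, and the toy points are all assumptions for illustration.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Cluster `points` with Lloyd's algorithm.

    `points` is a list of equal-length numeric tuples and `centroids` is the
    list of starting centers supplied by the caller (an assumption of this
    sketch; practical implementations usually randomize the start).
    Returns (labels, centroids).
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(dim) / len(members) for dim in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Toy stand-ins for per-subject imaging feature vectors.
    points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0),
              (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
    labels, centers = kmeans(points, [(0, 0), (10, 10), (20, 0)])
    print(labels, centers)
```

With three starting centers the toy data separate into three groups, mirroring in miniature how imaging features could be partitioned into a small number of phenotype clusters.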
Finally, we identified gene expression alterations within the nasal epithelium associated with the presence of malignant pulmonary nodules. A computational model was constructed for determining whether a nodule is malignant or benign that combines gene expression and imaging features extracted from HRCT. Leveraging data from single-cell RNA sequencing, we found genes increased in patients with lung cancer are expressed at higher levels within a novel cluster of nasal epithelial cells, termed keratinizing epithelial cells.
In summary, we leveraged gene expression profiling of the proximal airway and discovered novel biological pathways that potentially drive the structural changes representative of physiologic states defined by chest HRCT in COPD and bronchiectasis. This approach may also be combined with chest HRCT to detect weak signals related to malignant pulmonary nodules.
2020
December 3rd
Tyler Faits
Title: The Evaluation, Application, and Expansion of 16S Amplicon Metagenomics
Major Professor: W. Evan Johnson
ABSTRACT
I used QIIME 2 to analyze 16S data from human subjects in a controlled dietary intervention study with a focus on dietary carbohydrate quality. I correlated alterations in the gut microbiome with various cardiometabolic risk factors and identified increases in some butyrate-producing bacteria in response to complex carbohydrates. I also constructed a metatranscriptomics pipeline to analyze paired rRNA-depleted RNA-seq data.
October 14th
Alan Pacheco
Title: Environmental Modulation of Microbial Ecosystems
Major Professor: Daniel Segrè
ABSTRACT
Future work could further quantify how microbial community phenotypes depend on each of the individual factors explored in this thesis, while also leveraging emerging knowledge on interaction mechanisms to design synthetic consortia.
August 24th
Devanshi Patel
Title: Tissue-Dependent Analysis of Common and Rare Genetic Variants for Alzheimer’s Disease Using Multi-Omics Data
Major Professor: Lindsay Farrer