Computing and Data Science PhD Student Seminar Series

The Boston University PhD program is home to a wide range of students all studying various facets of data science. To help give students a friendly opportunity to practice and develop their research skills, we are launching the Computing and Data Science PhD Student Seminar Series. This series is focused on allowing doctoral students to present their research within a supportive and collaborative environment. Each seminar offers students a chance to share their findings, practice presentation skills, and receive constructive feedback from peers and faculty in a friendly, non-judgmental setting. This format not only helps students refine their work but also fosters essential communication skills that are crucial for their academic and professional careers.

Beyond the academic benefits, the seminar series is a community-building endeavor that seeks to strengthen connections among CDS students. By creating a space for students to share their work with the public, students from various backgrounds can learn from each other's experiences and methodologies.

The seminar series, organized by students Freddy Reiber and Lingyi Xu, meets bi-weekly throughout the year on Fridays from 11 AM to noon, with lunch after the talk. Students interested in giving a talk should reach out to the organizers through email.


CANCELED: Attention-Based Deep Learning for Analysis of Pathology Images and Gene Expression Data in Lung Squamous Premalignant Lesions with Lingyi Xu

April 18, 12 PM - CDS 1646

Abstract: Molecular and cellular alterations to the normal pseudostratified columnar bronchial epithelium result in the development of bronchial premalignant lesions through a histologic progression from normal to hyperplasia, metaplasia, dysplasia, carcinoma in situ, and invasive carcinoma. Endobronchial biopsies obtained via various bronchoscopy techniques are formalin-fixed, paraffin-embedded, and hematoxylin and eosin (H&E) stained to assess the pathologic features and histologic grade of the tissue. The broad and continuous spectrum of histologic and molecular changes makes reproducible stratification of lesions across multiple studies challenging.

Here we propose a transformer-based framework that flexibly utilizes transcriptomic and histologic patterns to distinguish lesions with bronchial dysplasia or worse from normal, hyperplasia, and metaplasia. We leveraged H&E whole slide images of endobronchial biopsies and bulk gene expression data from previously published studies as well as new data obtained from high-risk patients. Our framework maximizes the use of training data by allowing sample inputs with one or both data modalities. The flexibility of our framework to make predictions when a data modality is missing and its ability to integrate data from different modalities and studies are important for advancing our stratification of bronchial premalignant lesions.

Bio: Lingyi is a PhD student in Computing & Data Sciences at Boston University. Her research investigates the potential of multimodal medical data in enhancing disease diagnosis and assessment. Her current work involves applying graph models and machine learning algorithms in digital pathology and cancer genomics to advance the understanding of lung precancerous conditions.


Past Talks

Gender Inclusivity Fairness Index (GIFI): A Multilevel Framework for Evaluating Gender Diversity in Large Language Models with Zhengyang Shan

April 4, 12 PM - CDS 1646

Abstract: We introduce a comprehensive framework for assessing gender fairness in large language models (LLMs), particularly in their treatment of both binary and non-binary genders. Existing research has largely focused on binary gender distinctions, neglecting the inclusivity of non-binary identities. To address this, we propose a novel metric that evaluates LLMs across seven dimensions. We conduct extensive evaluations on 15 popular LLMs, revealing significant discrepancies in their ability to fairly represent diverse gender identities.

Bio: Zhengyang is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Her research interests lie in the evaluation, interpretability, and fairness of Large Language Models (LLMs).


Computing and Collective Action with Freddy Reiber

March 21, 12 PM - CDS 1646

Abstract: Labor unions are a critical component of ensuring dignified working conditions for laborers. However, as a byproduct of neoliberalization, American labor unions have been in free fall in terms of membership numbers. Drawing from sociological work on organized labor, we seek to analyze how labor organizers try to bring workers together to act collectively through digital communication technologies. Towards that end, we interviewed ~19 labor union members who engaged in digital worker-to-worker organizing through tools like Slack or Discord, focusing on how workers utilized and interacted with each other on these digital platforms. In this talk, we provide preliminary results from this study along with some early discussions around developing useful technical tools for supporting worker-to-worker organizing.

Bio: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology, with a focus on applying human-driven methods to complex issues. Currently, his projects are on second-order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.


Audited Auctions: Addressing Externalities in One-sided Mechanisms with Tejovan Parker & Gabe Maayan

February 21, 12 PM - CDS 1646

Abstract: We consider the form of externalities where some agent(s) have preferences over the outcome of a mechanism, and their preferences cannot be known before the mechanism is run. Often, this problem is solved by an auctioneer inserting dummy bids to represent the externalities. However, there may be ethical, trust, or power issues with delegating the determination of one's values to a central entity. And, it is extremely unreasonable to have all agents constantly estimate and report their values for the actions of all other agents in a system. Even if it were acceptable for a central entity to estimate externalities, it is more efficient to only audit what is more likely to be harmful, rather than auditing everything.

To address this, we consider auctions where the auctioneer has the power to (randomly) audit bidders to learn their externality and impose penalties accordingly. In this setting, the power to audit results in bidder behavior equivalent to letting the auctioneer set individualized entry fees for bidders as a function of their non-manipulable externality type, and this results in thresholds of participation as functions of externality.

This setting is motivated by a variety of practical scenarios. For example, an auctioneer might run a social or traditional media platform where bidders compete to post news or ads on user feeds. In this setting, end users can experience bidders' posts as nuisance costs, incurring negative externalities.

Our objective is to maximize total welfare, i.e., the sum of individual value and externalities. In this paper, we show how penalty functions induce thresholds of participation, and prove analytically that welfare-optimal participation thresholds in the i.i.d. setting with no competition are linear. Additionally, in the setting with competition for a single item and where i.i.d. bidders may only take two discrete types, the optimal threshold is linear and behaves analogously to Myersonian revenue-maximizing reserve prices.

To illustrate results in more complicated settings, we use simulation with computational optimization to characterize welfare increases over participation threshold functions. We collect a dataset from X (formerly Twitter) to create an empirical joint-distribution of sender and receiver value, and simulate auctions from this empirical data. We find that optimal thresholds shift welfare from producers to users and increase overall welfare in all settings. We also observe that optimal thresholds are linear even with the empirical type distributions. However, the penalty functions will not be linear in general, which makes an interesting comparison to linear contracts. Our results suggest that auditing and penalizing externalities in real-world sponsored-search and advertising auctions have the potential to create substantial increases in social welfare.

Bio: Tejovan Parker is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he studied Mechanical and Global Engineering at the University of Colorado Boulder. He is interested in better management of social, political, and economic systems through mathematical and algorithmic methods. Tejovan began his PhD studies at BU in Fall 2022. In his first two years at CDS, he is building his expertise and looking to assist in existing research within misinformation markets.

Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.


The Suicidal Mind with Gabe McDonnell-Maayan

January 31, 12 PM - CDS 1646

Abstract: I present the "Suicidal Mind", a theory synthesis modeling exercise aiming to simulate a distressed mind. I will briefly cover the topics of the first talk: the challenges of, and different approaches to suicide research, relevant theories of suicide, and an overview of our model. I will then focus on various forms of model validation, including parameter sweeps, scenario recreation, GAM surface fitting, and multivariate time-series clustering.

Bio: Gabe is a third-year PhD student at Boston University’s Faculty of Computing and Data Sciences. Previously, he worked on a variety of projects at the MITRE Corporation and received his Bachelor of Science in Computer Science at Rensselaer Polytechnic Institute. His research interests are in Complexity Science, Complex Systems Analysis and Modeling, and Agent-Based Modeling.


Labor Unions and Digital Democracy with Freddy Reiber

November 22, 11 AM - CDS 1646

Abstract: Labor unions have served an important role in giving workers a voice within the economy; however, this does not mean they are without critique. Central to many union critiques is the lack of meaningful democracy within unions, or what Robert Michels calls the “iron law of oligarchy”. In the 2000s, researchers thought that information and communication technologies (ICTs) might serve as a solution to these problems; however, as empirical literature developed, it became clear that ICTs were not the silver bullet theorists had originally hoped for. This talk reviews literature on both the theorizing and empirical work of labor scholars and HCI researchers as to why ICTs didn’t provide a meaningful shift in union democracy, as well as some ideas for future work.

Bio: Freddy is a third-year PhD student in the Computing and Data Science department at Boston University, advised by the fantastic Allison McDonald. His work explores how power dynamics are shifted by technology, with a focus on applying human-driven methods to complex issues. Currently, his projects are on second-order dynamics in digital spaces within labor unions and the motivations used by cryptographers for their research.