Professor, Penn State University
Cis-regulatory elements (CREs) are noncoding DNA segments that regulate transcription of genes residing on the same chromosome. Connecting millions of candidate CREs (cCREs) in the human genome with their target genes is critical for decoding the cis-regulatory mechanisms of gene expression and disease risk, but this remains a major open challenge. We present Linkreg, a Bayesian variable selection framework that identifies cCRE-gene linkages from genome-wide transcriptomic and epigenomic data across diverse biosamples. Extensive benchmarking analyses show that Linkreg consistently outperforms state-of-the-art methods in simulations based on real epigenomic data from 304 human biosamples, as well as in both CRISPR perturbation and chromatin conformation experiments on human cell lines. Applying Linkreg to matched transcriptomic and epigenomic data of 31 blood and immune-related biosamples yields a high-quality genome-wide atlas of cCRE-gene linkages. Integrating this atlas with expression quantitative trait loci in whole blood and genome-wide association studies of 22 blood and immune-related traits not only demonstrates significantly stronger enrichments of biosample-relevant genetic signals than those obtained by existing methods from the same data, but also highlights putative mechanisms at GWAS loci of white blood cell traits and autoimmune diseases. Overall, Linkreg provides an interpretable and efficient solution for the genome-wide identification of biosample-specific cCRE-gene linkages.
University of Massachusetts, Amherst
Hidden confounding biases hinder identifying causal protein biomarkers for Alzheimer’s disease in non-randomized studies. While Mendelian randomization (MR) can mitigate these biases using protein quantitative trait loci (pQTLs) as instrumental variables, some pQTLs violate core assumptions, leading to biased conclusions. To address this, we propose MR-SPI, a novel MR method that selects valid pQTL instruments using Leo Tolstoy’s Anna Karenina principle and performs robust post-selection inference. Integrating MR-SPI with AlphaFold3, we developed a computational pipeline to identify causal protein biomarkers and predict 3D structural changes. Applied to genome-wide proteomics data from 54,306 UK Biobank participants and a genome-wide association study of Alzheimer’s disease comprising 455,258 subjects (71,880 cases and 383,378 controls), we identified seven proteins (TREM2, PILRB, PILRA, EPHA1, CD33, RET, and CD55) with structural alterations due to missense mutations. These findings offer insights into the etiology and potential drug targets for Alzheimer’s disease.
Executive Director at AstraZeneca
Circulating tumor DNA (ctDNA) is becoming central to oncology R&D—enabling patient selection, minimal residual disease (MRD) detection, on-treatment response monitoring, and resistance profiling. A key principle is fit-for-purpose assay selection: different clinical questions demand different ctDNA characteristics (sensitivity, breadth, turnaround time, logistics, and cost). Broadly, assays fall into tumor-naïve and tumor-informed categories. Tumor-naïve panels offer simpler workflows and faster turnaround but lower sensitivity, whereas tumor-informed assays achieve higher sensitivity at the expense of more complex tissue requirements, longer timelines, and higher costs. For mutation-based assays, clonal hematopoiesis of indeterminate potential (CHIP) can confound results unless white-blood-cell sequencing or robust bioinformatic filters are employed. Clinical signals are emerging across settings. In neoadjuvant immunotherapy for NSCLC, early ctDNA clearance correlates with pathologic complete response (pCR) and improved event-free survival (EFS), supporting MRD as a pharmacodynamic and prognostic readout. Post-surgery MRD status can guide escalation/de-escalation in resectable disease, and in metastatic NSCLC, a ≥50% ctDNA drop (“molecular response”) associates with better outcomes. To translate these insights into routine decision-making, the field needs harmonization: standardized pre-analytics, CHIP handling, common definitions for MRD and molecular response, cross-platform QC, and transparent statistical plans aligned with regulatory expectations.
Assistant Professor, Department of Biostatistics at Columbia University
Hidden confounding biases hinder identifying causal protein biomarkers for Alzheimer’s disease in non-randomized studies. While Mendelian randomization (MR) can mitigate these biases using protein quantitative trait loci (pQTLs) as instrumental variables, some pQTLs violate core assumptions, leading to biased conclusions. To address this, we propose MR-SPI, a novel MR method that selects valid pQTL instruments using Leo Tolstoy’s Anna Karenina principle and performs robust post-selection inference. Integrating MR-SPI with AlphaFold3, we developed a computational pipeline to identify causal protein biomarkers and predict 3D structural changes. Applied to genome-wide proteomics data from 54,306 UK Biobank participants and a genome-wide association study of Alzheimer’s disease comprising 455,258 subjects (71,880 cases and 383,378 controls), we identified seven proteins (TREM2, PILRB, PILRA, EPHA1, CD33, RET, and CD55) with structural alterations due to missense mutations. These findings offer insights into the etiology and potential drug targets for Alzheimer’s disease.
University of Connecticut
As tensor-valued data become increasingly common in time series analysis, there is a growing need for flexible and interpretable models that can handle high-dimensional predictors and responses across multiple modes. We propose a unified framework for high-dimensional tensor stochastic regression based on CANDECOMP/PARAFAC (CP) decomposition, which encompasses vector, matrix, and tensor responses and predictors as special cases. Tensor autoregression naturally arises as a special case within this framework. By leveraging CP decomposition, the proposed models interpret the interactive roles of any two distinct tensor modes, enabling dynamic modeling of input-output mechanisms. We develop both CP low-rank and sparse CP low-rank estimators, establish their non-asymptotic error bounds, and propose an efficient alternating minimization algorithm for estimation. Simulation studies confirm the theoretical properties and demonstrate the computational advantage. Applications to mixed-frequency macroeconomic data and spatio-temporal air pollution data reveal interpretable low-dimensional structures and meaningful dynamic dependencies. We will also discuss possible applications of the method to medical and neuroimaging data at the end of the talk.
Henry Ledyard Goddard University Professor of Biostatistics, Brown University
Abstract:
Associate Professor of Psychiatry, Biostatistics and Intelligent Systems, University of Pittsburgh
Multisite studies offer several advantages, including increased statistical power and the generalization of research outcomes; however, data harmonization and standardization across different clinical domains, including Positron Emission Tomography (PET) imaging and blood biomarkers, continue to hinder our ability to accurately estimate differences across clinical groups. In this study, we present different harmonization methods for PET imaging and blood biomarker outcomes in Alzheimer’s Disease studies and show the variability in effect sizes across different groups before and after harmonization.
Senior Statistical Reviewer in the Office of Biostatistics (OB) at the Center for Drug Evaluation and Research, Food and Drug Administration
This talk will provide a brief overview of the evolving regulatory landscape for artificial intelligence and machine learning (AI/ML) in the development of drugs and biologics, with a focus on initiatives led by the U.S. Food and Drug Administration (FDA). It will discuss the use of AI/ML for estimating and inferring causal effects, including key regulatory challenges and considerations as well as emerging opportunities. This talk aims to contribute to the ongoing dialogue on AI/ML for medical product development by providing insights into regulatory thinking and methodological strategies for robust causal inference, while emphasizing the need for continued collaboration between stakeholders to ensure responsible and transparent AI/ML use in this field.
Professor, Stanford University
Hybrid clinical trials, which integrate real-world data (RWD) from sources such as patient registries, claims databases, and electronic health records (EHRs) to enhance randomized clinical trials, are gaining significant attention. In their forthcoming study, Xu et al. (BRS, 2025) propose an advancement to the two-step design introduced by Yuan et al. (2019), focusing on effective type-I error control. This talk will provide an overview of the newly developed two-step hybrid design, highlighting its enhancements for tighter control over type-I error rates. Additionally, I will discuss methods and algorithms for the optimal selection of design parameters aimed at minimizing sample size, maximizing statistical power, or achieving a favorable treatment-to-control ratio. This research is a collaborative effort by Jiapeng Xu (SU), Arlina Shen (SU), Ruben van Eijk (UMUC), and Lu Tian (SU).
Medical Affairs and HTA Statistics at CSL
TBD
Executive Director, Statistics Group Head of Non-malignant Hematology, Pfizer
The rapid advancement of computational technologies has ushered in a transformative era for drug development, where in silico methods are increasingly pivotal. This presentation delves into the relevance and importance of computer-simulated approaches in pharmaceutical research, particularly focusing on their application in clinical trials. In silico methods offer a powerful alternative to traditional experimental approaches, enabling researchers to simulate complex biological processes and predict drug behavior in virtual environments. Such methods are invaluable in streamlining the drug development pipeline, reducing time and costs, and enhancing safety protocols. A prime example of in silico applications in clinical trials is the simulation of pharmacokinetic and pharmacodynamic profiles, which helps in optimizing dosing regimens and predicting adverse reactions. Moreover, these techniques facilitate the identification of potential biomarkers and the assessment of drug efficacy and toxicity across diverse patient cohorts. This presentation will introduce a compelling case study that employs in silico methodologies to address organ impairment issues. By generating virtual cohorts of healthy participants based on existing data from prior studies, we can circumvent the ethical concerns associated with exposing real participants to experimental treatments. This approach not only minimizes potential harm to healthy individuals but also presents significant cost-saving opportunities for sponsors, thereby enhancing the overall efficiency of the trial process. The case study will elucidate the practical implementation and benefits of in silico methods in drug development, underscoring their role in fostering more ethical, economical, and scientifically robust clinical trials. As the pharmaceutical industry continues to evolve, embracing these innovative techniques will be crucial for addressing complex challenges and driving the future of medicine.
Professor, School of Medicine, University of Alabama at Birmingham
Professor, Yale University
In this lecture, Dr. Deng will highlight several exciting applications of AI and digital twins for precision medicine and discuss some of the major challenges in implementing AI and digital twins for healthcare and medicine in the coming years.
Professor, University of Massachusetts Lowell
Large language models (LLMs) have shown promising capabilities across diverse domains, yet their application to complex clinical prediction tasks remains limited. In this study, we present CARE-AD (Collaborative Analysis & Risk Evaluation for Alzheimer’s Disease), a multi-agent LLM-based framework for forecasting Alzheimer’s disease (AD) onset by analyzing longitudinal electronic health record (EHR) notes. CARE-AD assigns specialized LLM agents to extract signs and symptoms relevant to AD and conduct domain-specific evaluations—emulating a collaborative diagnostic process. In a retrospective evaluation, CARE-AD achieved higher accuracy (0.53 vs. 0.26–0.45) than baseline single-model approaches in predicting AD risk 10 years prior to the first recorded diagnosis code. These findings highlight the feasibility of using multi-agent LLM systems to support early risk assessment for AD and motivate further research into their integration in clinical decision support workflows.
Executive Director, Head Statistical and Quantitative Sciences, Neuroscience & Chief Statistical Office at Takeda Pharmaceuticals
Traditional clinical trial monitoring, reliant on labor-intensive site visits and manual data review via Electronic Data Capture systems, is both time-consuming and resource-intensive. The emergence of risk-based monitoring (RBM) and quality tolerance limit (QTL) frameworks offers a more efficient alternative by proactively identifying systematic risks to patient safety and data integrity. In this paper, we propose an advanced machine learning (ML) approach to enable real-time, automated QTL risk assessment. The QTL-ML framework integrates multi-domain clinical data to predict diverse QTLs at the program, study, site, and patient levels, bypassing limitations of static thresholds and single-source data dependency. Our assumption-free approach leverages dynamically accumulating trial data to detect and mitigate risks in an automated manner. Embedded within ICH-E6(R2) RBM principles, this innovative solution enhances patient safety, reduces trial durations, and curbs costs. Furthermore, we introduce an extension leveraging a deep learning framework, incorporating hierarchical anomaly detection and temporal analysis to enhance accuracy and scalability across clinical trial settings. Together, these methodologies hold transformative potential for the efficacy and sustainability of clinical trial monitoring.
Senior Director, Statistical Scientist at Novartis
Drug development for rare diseases is often stalled by data scarcity and prohibitive costs, leaving millions of patients with few therapeutic options. We propose a systematic framework where Large Language Models and other AI/ML techniques analyze heterogeneous real-world data to de-risk development by identifying high-potential drug repurposing candidates. We argue that success requires an iterative, collaborative ecosystem—integrating explainable AI and novel clinical trial designs—to effectively bridge the gap from data-driven insights to tangible patient benefit.
Associate Director in Biostatistics, Daiichi Sankyo
Writing statistical analysis plans (SAPs) is often time-consuming, particularly for junior statisticians new to the pharmaceutical industry. As a standard internal practice, a well-developed SAP draft must be finalized before first subject in. Delays in this process can hinder the timely integration of statistical strategy into study design and execution. Furthermore, despite having a standard SAP template, inconsistencies in practice still occur—potentially compromising document quality and creating misinterpretations during collaboration with programming teams. With increasing demands to accelerate development while reducing cost and timelines, a smarter, faster, and more consistent approach is urgently needed. We present a retrieval-augmented generation (RAG) powered large language model (LLM) system that automatically generates a protocol-specific SAP. To address the challenge of LLM hallucinations, we explored multiple optimization strategies within the RAG architecture to ensure stable and reliable output. Our approach leverages semantic search, vector embeddings, and domain-tuned retrieval to extract protocol-specific details, which are then integrated into a pre-aligned SAP template through the LLM. The resulting system delivers measurable improvements in efficiency, cost savings, and compliance—while maintaining scientific rigor through human-in-the-loop interactive review.
DahShu 2025 Contact: For all general questions about the symposium, including program details, registration, and logistics, email dahshu2025@gmail.com.