Short Courses

Home
DSDSS2025
Agenda & Program
Scientific Sessions

Plenary Talks

Scientific Sessions

Scientific Session 1
AI and Genetics, Computational Biology, Biomedical Studies

Scientific Session 2
Medical Imaging, Neuroimaging & Dependent Data

Scientific Session 3
Innovative Clinical Design & RWE

Scientific Session 4
Machine learning & AI in Healthcare

Scientific Session 5
AI for Drug Development

Scientific Session 1

Title: Linkreg: A Statistical Framework for Linking Genome-wide Candidate Cis-Regulatory Elements to Target Genes

Speaker: Dr. Qunhua Li

Professor, Penn State University

Abstract: Cis-regulatory elements (CREs) are noncoding DNA segments that regulate transcription of genes residing on the same chromosome. Connecting millions of candidate CREs (cCREs) in the human genome with their target genes is critical for decoding the cis-regulatory mechanisms of gene expression and disease risk, but this remains a major and open challenge. We present Linkreg, a Bayesian variable selection framework that identifies cCRE-gene linkages from genome-wide transcriptomic and epigenomic data across diverse biosamples. Extensive benchmarking analyses show that Linkreg consistently outperforms state-of-the-art methods in simulations based on real epigenomic data from 304 human biosamples, as well as in both CRISPR perturbation and chromatin conformation experiments on human cell lines. Applying Linkreg to matched transcriptomic and epigenomic data of 31 blood and immune-related biosamples yields a high-quality genome-wide atlas of cCRE-gene linkages. Integrating this atlas with expression quantitative trait loci in whole blood and genome-wide association studies of 22 blood and immune-related traits not only demonstrates significantly stronger enrichments of biosample-relevant genetic signals than those obtained by existing methods from the same data, but also highlights putative mechanisms at GWAS loci of white blood cell trait and autoimmune disease. Overall, Linkreg provides an interpretable and efficient solution for the genome-wide identification of biosample-specific cCRE-gene linkages.

Title: Chromosome translocation and breakpoint detection in cancer cells using Hi-C

Speaker: Dr. Zhengqing Ouyang

Associate Professor, University of Massachusetts, Amherst

Abstract: Recent Hi-C technology enables more comprehensive chromosomal conformation research, including the detection of structural variations, especially translocation. In this paper, we formulate the interchromosomal translocation detection as a problem of scan clustering in a spatial point process. We then develop TranScan, a new translocation detection method through scan statistics with the control of false discovery. Evaluation of TranScan against current translocation detection methods on realistic breakpoint simulations generated from real data suggests better discriminative power. Both the simulation and real data analysis indicate that TranScan has great potentials in interchromosomal translocation detection using Hi-C data.

Title: Application of ctDNA in Cancer Drug Development: Opportunities and Challenges

Speaker: Dr. Zhongwu Lai

Executive Director, AstraZeneca

Abstract: Circulating tumor DNA (ctDNA) is becoming central to oncology R&D—enabling patient selection, minimal residual disease (MRD) detection, on-treatment response monitoring, and resistance profiling. A key principle is fit-for-purpose assay selection: different clinical questions demand different ctDNA characteristics (sensitivity, breadth, turnaround time, logistics, and cost). Broadly, assays fall into tumor-naïve and tumor-informed categories. Tumor-naïve panels offer simpler workflows and faster turnaround but lower sensitivity, whereas tumor-informed assays achieve higher sensitivity at the expense of more complex tissue requirements, longer timelines, and higher costs. For mutation-based assays, clonal hematopoiesis (CHIP) can confound results unless white-blood-cell sequencing or robust bioinformatic filters are employed. Clinical signals are emerging across settings. In neoadjuvant immunotherapy for NSCLC, early ctDNA clearance correlates with pathologic complete response (pCR) and improved event-free survival (EFS), supporting MRD as a pharmacodynamic and prognostic readout. Post-surgery MRD status can guide escalation/de-escalation in resectable disease, and in metastatic NSCLC, a ≥50% ctDNA drop (“molecular response”) associates with better outcomes. To translate these insights into routine decision-making, the field needs harmonization: standardized pre-analytics, CHIP handling, common definitions for MRD and molecular response, cross-platform QC, and transparent statistical plans aligned with regulatory expectations.

Title: Deciphering Proteins in Alzheimer’s Disease: A New Mendelian Randomization Method Integrated with AlphaFold3 for 3D Structure Prediction

Speaker: Dr. Zhonghua Liu

Assistant Professor, Columbia University

Abstract: Hidden confounding biases hinder identifying causal protein biomarkers for Alzheimer’s disease in non-randomized studies. While Mendelian randomization (MR) can mitigate these biases using protein quantitative trait loci (pQTLs) as instrumental variables, some pQTLs violate core assumptions, leading to biased conclusions. To address this, we propose MR-SPI, a novel MR method that selects valid pQTL instruments using Leo Tolstoy’s Anna Karenina principle and performs robust post-selection inference. Integrating MR-SPI with AlphaFold3, we developed a computational pipeline to identify causal protein biomarkers and predict 3D structural changes. Applied to genome-wide proteomics data from 54,306 UK Biobank participants and 455,258 subjects (71,880 cases and 383,378 controls) for a genome-wide association study of Alzheimer’s disease, we identified seven proteins (TREM2, PILRB, PILRA, EPHA1, CD33, RET, and CD55) with structural alterations due to missense mutations. These findings offer insights into the etiology and potential drug targets for Alzheimer’s disease.

Scientific Session 2

Title: Tensor Stochastic Regression for High-dimensional Time Series via CP Decomposition

Speaker: Dr. Yao Zheng

Associate Professor, University of Connecticut

Abstract: As tensor-valued data become increasingly common in time series analysis, there is a growing need for flexible and interpretable models that can handle high-dimensional predictors and responses across multiple modes. We propose a unified framework for high-dimensional tensor stochastic regression based on CANDECOMP/PARAFAC (CP) decomposition, which encompasses vector, matrix, and tensor responses and predictors as special cases. Tensor autoregression naturally arises as a special case within this framework. By leveraging CP decomposition, the proposed models interpret the interactive roles of any two distinct tensor modes, enabling dynamic modeling of input-output mechanisms. We develop both CP low-rank and sparse CP low-rank estimators, establish their non-asymptotic error bounds, and propose an efficient alternating minimization algorithm for estimation. Simulation studies confirm the theoretical properties and demonstrate the computational advantage. Applications to mixed-frequency macroeconomic data and spatio-temporal air pollution data reveal interpretable low-dimensional structures and meaningful dynamic dependencies. We will also discuss possible applications of the method to medical/neuroimaging at the end of the talk.

Title: Real World Evidence in evaluation of diagnostic and therapeutic interventions for Alzheimer’s

Speaker: Dr. Constantine Gatsonis

Professor, Brown University

In this presentation we will elucidate the potential of RWE using the experience from IDEAS and New IDEAS, our prospective observational studies of the impact of amyloid PET imaging on the management and outcomes of patients with mild cognitive impairment of dementia. We will also discuss RWE in the evaluation of therapeutic interventions using the experience from ALZ-NET, a recently launched network collecting real-world clinical and imaging data from persons evaluated for or treated with FDA approved Alzheimer’s therapies.

Title: Multisite Studies Harmonization and Effect Sizes Variability in Alzheimer’s Imaging and Blood Biomarkers Outcomes

Speaker: Dr. Dana Tudorascu

Associate Professor, University of Pittsburgh

Abstract: Multisite studies offer several advantages including increased statistical power and enabling the generalization of research outcomes; however, data harmonization and standardization across different clinical domains including Positron Emission Tomography (PET) imaging and blood biomarkers continue to hinder our ability to accurately estimate differences across different clinical groups. In this study we present different harmonizations methods for PET imaging and blood biomarkers outcomes in Alzheimer’s Disease studies and show variability in effect sizes across different groups before and after harmonization methods.

Title: Longitudinal Manifold Learning for Modeling Shapes in Alzheimer's Disease

Speaker: Ani Eloyan

Associate Professor and Vice Chair of Department of Biostatistics, Brown University

Abstract: Estimation of biomarkers related to disease classification and modeling of its progression is essential for treatment development for Alzheimer’s Disease (AD). The task is more daunting for characterizing relatively rare AD subtypes such as the early-onset AD. In this talk, I will describe the Longitudinal Alzheimer’s Disease Study (LEADS) intending to collect and publicly distribute clinical, imaging, genetic, and other types of data from people with EOAD, as well as cognitively normal (CN) controls and people with early-onset non-amyloid positive (EOnonAD) dementias. I will discuss manifold estimation methods for estimation of surfaces of shapes in the brain using data clouds, factor-analytic methods for estimation of clinical biomarkers of AD and their use for modeling differences in longitudinal trajectories of clinical deterioration between CN, EOAD, and EOnonAD groups. Finally, I will discuss our work in leveraging magnetic resonance imaging and positron emission tomography data to characterize distributions of white matter hyperintensities in people with EOAD and to obtain imaging-based biomarkers of disease trajectories of AD subtypes.

Scientific Session 3

Title: The Use of AI/ML for Drugs and Biologics Development: Evolving Regulatory Landscape and Causal Inference

Speaker: Dr. Hana Lee

Senior Statistical Reviewer in the Office of Biostatistics (OB), Center for Drug Evaluation and Research, Food and Drug Administration

Abstract: This talk will provide a brief overview of the evolving regulatory landscape for artificial intelligence and machine learning (AI/ML) in the development of drugs and biologics, with a focus on initiatives led by the U.S. Food and Drug Administration (FDA). It will discuss the use of AI/ML for estimating and inferring causal effects, including key regulatory challenges and considerations as well as emerging opportunities. This talk aims to contribute to the ongoing dialogue on AI/ML for medical product development by providing insights into regulatory thinking and methodological strategies for robust causal inference, while emphasizing the need for continued collaboration between stakeholders to ensure responsible and transparent AI/ML use in this field.

Title: On the Optimal Two-step Hybrid Design for Augmenting Randomized Trials Using Real-World Data

Speaker: Dr. Ying Lu

Professor, Stanford University

Abstract: Hybrid clinical trials, which integrate real-world data (RWD) from sources such as patient registries, claims databases, and electronic health records (EHRs) to enhance randomized clinical trials, are gaining significant attention. In their forthcoming study, Xu et al. (BRS, 2025) propose an advancement to the two-step design introduced by Yuan et al. (2019), focusing on effective type-I error control. This talk will provide an overview of the newly developed two-step hybrid design, highlighting its enhancements for tighter control over type-I error rates. Additionally, I will discuss methods and algorithms for the optimal selection of design parameters aimed at minimizing sample size, maximizing statistical power, or achieving a favorable treatment-to-control ratio. This research is a collaborative effort by Jiapeng Xu (SU), Arlina Shen (SU), Ruben van Eijk (UMUC), and Lu Tian (SU).

Title: Statistics and Beyond: Use RWE and Statistical Simulation to inform Hybrid Design of a phase 3 Clinical Program

Speaker: Dr. Bharat Ramakrishna

One innovative design in clinical development is the use of externally controlled trial design. Evidence from such trials has recently gained acceptance from regulators as pivotal in their decision-making processes, and this trend is accelerating. However, this approach is not without risks. Recent FDA draft guidance, “Considerations for the Design and Conduct of Externally Controlled Trials for Drug and Biological Products,” and the EMA reflection paper, “Establishing Efficacy Based on Single-Arm Trials Submitted as Pivotal Evidence in a Marketing Authorisation,” discuss several methodological limitations including confounding and bias. These documents also emphasize the critical importance of assessing and controlling for confounding and bias in externally controlled clinical trials. In this presentation, we will discuss important considerations in designing externally controlled clinical trials and use a case study to demonstrate how to incorporate those considerations within and beyond statistics territory and combine RWE and statistical simulation approaches to inform externally controlled design for a phase 3 clinical development program.

Title: On the Use of in Silico Methods in Drug Development with A Case Study on Organ Impairment

Speaker: Dr. Bo Huang

Executive Director, Statistics Group Head of Non-malignant Hematology, Pfizer

Abstract: The rapid advancement of computational technologies has ushered in a transformative era for drug development, where in silico methods are increasingly pivotal. This presentation delves into the relevance and importance of computer-simulated approaches in pharmaceutical research, particularly focusing on their application in clinical trials. In silico methods offer a powerful alternative to traditional experimental approaches, enabling researchers to simulate complex biological processes and predict drug behavior in virtual environments. Such methods are invaluable in streamlining the drug development pipeline, reducing time and costs, and enhancing safety protocols. A prime example of in silico applications in clinical trials is the simulation of pharmacokinetic and pharmacodynamic profiles, which helps in optimizing dosing regimens and predicting adverse reactions. Moreover, these techniques facilitate the identification of potential biomarkers and the assessment of drug efficacy and toxicity across diverse patient cohorts. This presentation will introduce a compelling case study that employs in silico methodologies to address organ impairment issues. By generating virtual cohorts of healthy participants based on existing data from prior studies, we can circumvent the ethical concerns associated with exposing real participants to experimental treatments. This approach not only minimizes potential harm to healthy individuals but also presents significant cost-saving opportunities for sponsors, thereby enhancing the overall efficiency of the trial process. The case study will elucidate the practical implementation and benefits of in silico methods in drug development, underscoring their role in fostering more ethical, economical, and scientifically robust clinical trials. As the pharmaceutical industry continues to evolve, embracing these innovative techniques will be crucial for addressing complex challenges and driving the future of medicine.

Scientific Session 4

Title: PEACE in Drug Discovery: Building Next-Gen AI Capabilities Together

Speaker: Dr. Jake Y. Chen

Professor, University of Alabama at Birmingham

Abstract: Artificial intelligence is poised to revolutionize drug discovery, yet its full potential has not been realized. The Systems Pharmacology AI Research Center (SPARC) at UAB is advancing PEACE—a vision to make drug discovery Personalized, Accelerated, and Economically accessible. This talk will introduce SPARC’s integrated approach to building next-generation AI drug discovery capabilities. We will showcase how our partner network and cloud-based infrastructure connect diverse datasets and models to uncover new therapeutic insights. Highlighted tools include GeneTerrain Knowledge Maps for harmonizing multi-omics data, PAGER and Mondrian Maps for pathway and gene set interpretation, BioRSP for revealing dynamic changes in single-cell data, and a neuro-symbolic AI platform for hERG cardiotoxicity prediction. Through case studies in oncology, neurology, and rare diseases, we will demonstrate how multi-scale digital twins and advanced analytics can accelerate translational research. The session will close with an invitation to collaborate on joint projects and competitive grants to shape the future of smart, AI-driven drug discovery.

Title: AI and Digital Twins for Precision Medicine: Opportunities and Challenges

Speaker: Dr. Jun Deng

Professor, Yale University

Abstract: In this lecture, Dr. Deng will highlight several exciting applications of AI and digital twins for precision medicine and discuss some of the major challenges in implementing AI and digital twins for healthcare and medicine in the coming years.

Title: Empowering Clinical Trials with Agentic Intelligence

Speaker: Dr. Weishen Pan

Postdoctoral Associate, Cornell Univeristy

Abstract: Randomized controlled trials (RCTs) remain the gold standard for evaluating the efficacy and safety of medical interventions. However, they are time-consuming and expensive to conduct. We propose a multi-agent framework where a team of AI agents, each focused on a specific task, collaborate to streamline workflows in clinical trials. The talk will highlight our recent work on leveraging this framework to infer and accelerate trial design by deriving real-world evidence from real-world data in the pre-trial stage. The future opportunities and challenges associated with this framework will also be discussed.

Title: Recent Progress on Generative Models for Molecular Design

Speaker: Dr. Jinbo Bi

Univeristy of Connecticut

Abstract: The discovery of new drug candidates is challenged by the vastness of chemical space and the inefficiency of traditional approaches. Generative machine learning models provide new opportunities by enabling the automated design of novel molecules. This talk will review genAI methods across three main directions: 1D SMILES-based language models, 2D graph-based generation, and diffusion models for 3D molecular structures. While SMILES and graph methods support efficient and valid molecular design, diffusion approaches improve the accuracy and stability of 3D generation. We will highlight recent progress from our group, including knowledge-infused hierarchical modeling, real-time feedback in SMILES generation, and an annealing-based diffusion framework for 3D structures. Together, these developments illustrate the potential of generative AI to accelerate drug discovery and deliver novel, synthesizable candidates.

Scientific Session 5

Title: Transforming Quality Tolerance Limits (QLTs) and Clinical Trial Monitoring with Machine Learning and Deep Learning Approaches

Speaker: Dr. Jianchang Lin

Executive Director, Head Statistical and Quantitative Sciences, Neuroscience & Chief Statistical Office, Takeda Pharmaceuticals

Abstract: Traditional clinical trial monitoring, reliant on labor-intensive site visits and manual data review via Electronic Data Capture systems, is both time-consuming and resource-intensive. The emergence of risk-based monitoring (RBM) and quality tolerance limit (QTL) frameworks offers a more efficient alternative by proactively identifying systematic risks to patient safety and data integrity. In this paper, we propose an advanced machine learning (ML) approach to enable real-time, automated QTL risk assessment. The QTL-ML framework integrates multi-domain clinical data to predict diverse QTLs at the program, study, site, and patient levels, bypassing limitations of static thresholds and single-source data dependency. Our assumption-free approach leverages dynamically accumulating trial data to detect and mitigate risks in an automated manner. Embedded within ICH-E6(R2) RBM principles, this innovative solution enhances patient safety, reduces trial durations, and curbs costs. Furthermore, we introduce an extension leveraging a deep learning framework, incorporating hierarchical anomaly detection and temporal analysis to enhance accuracy and scalability across clinical trial settings. Together, these methodologies hold transformative potential for the efficacy and sustainability of clinical trial monitoring.

Title: An LLM-Powered Framework for Drug Repurposing in Rare Diseases

Speaker: Dr. Alex Sverdlov

Senior Director, Statistical Scientist, Novartis

Abstract: Drug development for rare diseases is often stalled by data scarcity and prohibitive costs, leaving millions of patients with few therapeutic options. We propose a systematic framework where Large Language Models and other AI/ML techniques analyze heterogeneous real-world data to de-risk development by identifying high-potential drug repurposing candidates. We argue that success requires an iterative, collaborative ecosystem—integrating explainable AI and novel clinical trial designs—to effectively bridge the gap from data-driven insights to tangible patient benefit.

Title: Improve Efficiency and Consistency of Statistical Analysis Plan Preparation through RAG-based LLMs

Speaker: Dr. Zhaohua Lu

Associate Director in Biostatistics, Daiichi Sankyo

Abstract: Writing statistical analysis plans (SAPs) is often time-consuming, particularly for junior statisticians new to the pharmaceutical industry. As a standard internal practice, a well-developed SAP draft must be finalized before first subject in. Delays in this process can hinder the timely integration of statistical strategy into study design and execution. Furthermore, despite having a standard SAP template, inconsistencies in practice still occur—potentially compromising document quality and creating misinterpretations during collaboration with programming teams. With increasing demands to accelerate development while reducing cost and timelines, a smarter, faster, and more consistent approach is urgently needed. We present a retrieval-augmented generation (RAG) powered large language model (LLM) system that automatically generates a protocol-specific SAP. To address the challenge of LLM hallucinations, we explored multiple optimization strategies within the RAG architecture to ensure stable and reliable output. Our approach leverages semantic search, vector embeddings, and domain-tuned retrieval to extract protocol-specific details, which are then integrated into a pre-aligned SAP template through the LLM. The resulting system delivers measurable improvements in efficiency, cost savings, and compliance—while maintaining scientific rigor through human-in-the-loop interactive review.

Title: AI Landscape of the Pharmaceutical Industry

Speaker: Dr. Mark Chang

Founder, AGInception

Abstract: Artificial intelligence (AI) is transforming the pharmaceutical industry across the entire drug development lifecycle—from target identification and preclinical research to clinical trials, pharmacovigilance, and advanced manufacturing. This presentation provides an overview of the evolving AI landscape, highlighting investment trends, regulatory perspectives, and real-world applications. Key advances include deep learning for drug discovery, natural language processing for pharmacovigilance, and machine learning methods such as similarity-based approaches that address classical statistical paradoxes in clinical trial design. Regulatory initiatives by the FDA and EMA are shaping the adoption of AI tools, particularly in digital twins and precision medicine. Challenges remain, including data quality, privacy, interpretability, and resistance to paradigm shifts away from traditional statistics. Through case studies such as similarity-based machine learning, Procova models, and AI-driven digital twins, this talk illustrates how AI can reduce errors, improve predictive accuracy, and enable personalized medicine. The discussion concludes with broader implications of AI in healthcare and education, emphasizing critical thinking and creative analogies as essential skills for navigating the AI era.

Title: Beyond Large Language Models: Building Trustworthy AI for Drug Development with a Large-scale Biomedical Knowledge Graph

Speaker: Dr. Jinfeng Zhang

Professor, Florida State University

Abstract: Large language models (LLMs) have shown impressive fluency in biomedical text, but their limitations in factual accuracy and scientific reasoning remain major barriers for drug discovery. We present IKraph, a large-scale, high-quality biomedical knowledge graph built with our methods that won the LitCoin NLP and BioCreative Challenges. Equipped with a highly explainable method, Probabilistic Semantic Reasoning (PSR), IKraph enables robust reasoning for drug development applications such as drug repurposing. On a benchmark evaluation, IKraph significantly outperformed leading LLMs and LLM-based systems including GPT-4o, Claude, Gemini and FutureHouse underscoring the advantages of knowledge graphs in delivering reliable and explainable insights. Building on IKraph, we are developing two new applications. IKnow (Integrated Knowledge Intelligence) is a fact-checking system that validates information from PubMed articles and newly submitted manuscripts against the entire body of knowledge captured from PubMed. IDEAL (Insilicom Data Exploration And Learning) integrates IKraph with harmonized genomics data to deliver accurate, data-driven answers to biomedical questions, powered by our BioASQ 13B award-winning biomedical QA method. These efforts highlight how knowledge graphs address key bottlenecks of current LLMs, offering a scalable and trustworthy foundation for scientific discovery. Our work demonstrates the critical role of knowledge graph–driven AI in accelerating drug development and advancing biomedical research.

DahShu 2025 Contact

For all general questions about the symposium, including program details, registration, and logistics:

Email: dahshu2025@gmail.com