Assistant Professor, Department of Biostatistics, Yale University
Dr. Qiao Liu is an Assistant Professor in the Department of Biostatistics at Yale University. His research lies at the intersection of statistics, artificial intelligence, and computational biology, where he develops practical statistical and AI-driven tools with both theoretical and applied significance. His work leverages advances in generative AI to address high-dimensional statistical challenges, including density estimation and causal inference, with broad applications in single-cell genomics, multi-omics data integration, and genomic large language models. Dr. Liu has authored over 40 publications in leading journals and conferences, including Nature Machine Intelligence, PNAS, Genome Biology, Nucleic Acids Research, ISMB, ECCB, NeurIPS and MICCAI. His contributions have been recognized with prestigious honors, including the NIH Pathway to Independence Award.
Kenan Distinguished Professor, University of North Carolina at Chapel Hill
Dr. Hongtu Zhu is the Kenan Distinguished Professor of Biostatistics, Statistics, Radiology, Computer Science, and Genetics at the University of North Carolina at Chapel Hill. He was a DiDi Fellow and Chief Scientist of Statistics at DiDi Chuxing between 2018 and 2020 and held the Endowed Bao-Shan Jing Professorship in Diagnostic Imaging at MD Anderson Cancer Center between 2016 and 2018. He is an internationally recognized expert in statistical learning, medical image analysis, precision medicine, biostatistics, artificial intelligence, and big data analytics. He received an established investigator award from the Cancer Prevention Research Institute of Texas in 2016, the INFORMS Daniel H. Wagner Prize for Excellence in Operations Research Practice in 2019, and the COPSS 2025 Snedecor Award. He has published more than 340 papers in top journals, including Nature, Science, Cell, Nature Genetics, Nature Communications, PNAS, AOS, JASA, Biometrika, and JRSSB, and has presented at least 58 conference papers at top venues, including NeurIPS, ICLR, ICML, AAAI, and KDD. He is the coordinating editor of JASA and the editor of JASA ACS.
The rapid evolution of flexible and reusable artificial intelligence (AI) models is transforming medical science. This short course introduces Causal Generalist Medical AI (Causal GMAI)—a paradigm that integrates causal inference with generalist AI models to enhance interpretability, robustness, and generalizability in medical decision-making. Causal GMAI employs self-supervised, semi-supervised, and supervised learning on diverse multimodal datasets—including imaging, electronic health records, clinical trials, laboratory results, genomics, knowledge graphs, and medical text—to perform a wide range of tasks with minimal task-specific supervision. By embedding causal reasoning, these models go beyond prediction to infer underlying causal relationships, improving diagnostic accuracy, treatment recommendations, and personalized medicine. The course covers key technical components such as causal discovery, counterfactual reasoning, and domain adaptation, alongside real-world applications. We will also explore challenges in regulation, validation, and dataset curation to ensure clinical reliability and ethical deployment. Designed for researchers, clinicians, data scientists, and AI practitioners, this course provides a foundation for advancing the next generation of trustworthy and interpretable medical AI.
Director, AstraZeneca Oncology Statistical Innovation
Dr. D’Angelo has more than 20 years of academic and industry experience in biomarker-related statistical methods, biomarker discovery, and biomarker-driven clinical trials. She received a PhD in Biostatistics from the University of Pittsburgh, focusing on missing data methods, and then completed a postdoc in Biostatistics at the University of Pittsburgh, focusing on statistical approaches for genetics and neuroimaging. From 2007 to 2013, she was on the faculty of the Washington University Division of Biostatistics, where she collaborated on Alzheimer’s and depression studies and grants incorporating biomarkers. She was a PI on a K award focused on statistical approaches for neuroimaging in dementia. Teaching has been a constant in Dr. D’Angelo’s career: as faculty she taught many courses in SAS, survival analysis, and clinical trials, and she has mentored many junior faculty and junior colleagues.
Currently, Gina is a Director at AstraZeneca Oncology Statistical Innovation, where she offers guidance and expertise on many early- to late-phase trials, including a program devoted to a novel AI biomarker in oncology in which she participates in strategic decisions on patient enrichment and the development of relevant biomarkers. She has designed numerous trials and consulted on statistical analyses, approaches, and designs for many others. Her involvement in a respiratory biomarker-driven design led to a successful noninferiority trial, and she contributed a novel design to a successful lung cancer trial with a novel combined biomarker, leading to two pending patents. As a result of these collaborations, she has published multiple statistical and clinical papers on biomarkers. She is currently co-editing a book on statistical and design considerations for biomarkers in clinical trials. Her experience spans discovery to late-phase trials, small to high-dimensional data, and various therapeutic areas. Attendees will benefit from her extensive knowledge and hands-on experience.
Senior Director of Clinical Statistics, Bayer
Dr. Shen obtained his Ph.D. from the University of Pittsburgh, specializing in statistical genetics and genomics. He is currently a Senior Director of Clinical Statistics at Bayer, where he plays a pivotal role in early-phase clinical development and precision medicine. Among his notable accomplishments, Dr. Shen led Bayer’s first global companion diagnostic (CDx) submissions, achieving regulatory approvals from health agencies in the United States, Japan, and Europe, as well as completing the submission process in China. He also led the statistical development of biomarker sections in multiple global drug submission packages, ensuring that the statistical strategies for patient selection and efficacy assessment met the rigorous standards of diverse global health agencies.
In addition to his leadership and innovation in clinical statistics at Bayer—where he also serves as a science fellow focusing on novel methods such as Bayesian approaches and deep learning—Dr. Shen brings a wealth of teaching experience. From 2013 to 2016, he taught a series of courses through NIH Continuing Medical Education (CME) programs, introducing participants to statistical software (R), fundamental and advanced statistical techniques, machine learning methodologies, and the principles of experimental design. Offered both on-site and online, these courses attracted over 500 attendees, including healthcare professionals, clinical researchers, and biostatisticians, and were designed to enhance practical skill sets, promote best practices in data analysis, and foster evidence-based decision-making in clinical research.
Dr. Shen’s unique combination of hands-on clinical studies, methodological innovation, and extensive teaching background positions him ideally to guide course participants through the complexities of companion diagnostic (CDx) studies. His real-world insights and expertise ensure that attendees will gain not only a solid theoretical understanding of statistical methods but also the practical knowledge needed to apply these principles effectively in their own clinical studies.
Biomarkers play a vital role in healthcare and drug discovery, enhancing disease understanding, identifying effective compounds, targeting populations, and accelerating clinical trials. This supports the advance of precision (personalized) medicine, which customizes treatments for individuals. Traditional medicine often used a “one size fits all” approach, overlooking genetic and biological differences—biomarkers—that result in variable treatment responses. Precision medicine has grown due to advances in omics, bioinformatics, and data analytics, enabling deeper exploration of individual biological identities through biomarkers.
The era of precision medicine was marked by the FDA’s initiative in 2016 and expanded with a 2018 policy pushing the field forward. Early examples include approvals of Vizimpro (dacomitinib) for NSCLC patients with EGFR biomarker-positive tumors and Mektovi (binimetinib) for BRAF-positive metastatic melanoma. Since then, approvals for personalized medicines have risen yearly, highlighting the need for new trial designs and statistical methods tailored for biomarker-driven studies.
In clinical trials, biomarkers are critical because not all patients benefit equally from the same therapies. Precision medicine relies on biomarkers for patient enrichment—guiding drug development by identifying subpopulations most likely to respond. By categorizing patients by biomarker status, researchers improve trial outcomes. Selecting the right trial population requires robust biomarkers and optimal threshold cut-offs, especially for continuous markers. Numerous statistical challenges arise, including handling missing data, combining markers, and managing high-dimensional data. Biomarker discovery progresses rapidly, pushing statistical methods and study designs to evolve alongside scientific, technological, and regulatory progress.
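To make the cut-off question for a continuous marker concrete, here is a minimal sketch that scans candidate thresholds and keeps the one maximizing the Youden index (sensitivity + specificity - 1). The course description does not name a specific method, so the choice of the Youden index, the function name `youden_cutoff`, and the toy data are all assumptions made for illustration. In practice a data-driven cut-off chosen this way is optimistic and would need validation in independent data.

```python
def youden_cutoff(values, responder):
    """Scan observed marker values as candidate cut-offs.

    A patient is called biomarker-positive when value >= cut-off.
    Returns the cut-off with the largest Youden index and that index.
    """
    n_pos = sum(responder)                 # number of responders
    n_neg = len(responder) - n_pos         # number of non-responders
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one responder and one non-responder")

    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(values)):
        tp = sum(1 for v, r in zip(values, responder) if v >= cut and r == 1)
        fp = sum(1 for v, r in zip(values, responder) if v >= cut and r == 0)
        sensitivity = tp / n_pos
        specificity = 1 - fp / n_neg
        j = sensitivity + specificity - 1
        if j > best_j:                     # first maximum wins on ties
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical toy data: marker level and observed response (1 = responder).
marker = [0.2, 0.5, 0.9, 1.1, 1.4, 1.8, 2.3, 2.9]
response = [0, 0, 1, 0, 1, 1, 1, 1]

cutoff, j_index = youden_cutoff(marker, response)
print(cutoff, round(j_index, 3))
```

The Youden index treats the marker as a classifier of response. For a predictive biomarker, a comparison of treatment effect between marker-positive and marker-negative patients would be the more relevant criterion.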
Biomarkers are now central not only for trial optimization and patient stratification, but also play a key role in the regulatory landscape. To use biomarkers in clinical decision-making, agencies often require a diagnostic test to identify candidates for therapy. This demand led to companion diagnostics (CDx), which align diagnostic tools with drug development and are transforming precision therapy pathways.
CDx was introduced in 1998 when the FDA approved HercepTest for detecting HER2 in patients receiving trastuzumab (Herceptin). CDx and precision medicine have since rapidly evolved, prompting regulatory bodies—including the FDA, EMA, Japan’s PMDA, and China’s NMPA—to develop guidelines encouraging co-development and co-approval of drugs and their diagnostics. As biomarkers and patient populations become more complex, statistical challenges mount. The design, validation, and interpretation of CDx require rigorous statistical methods to demonstrate clinical utility, demanding careful study design by statisticians.
Often, limited data must guide critical decisions for advancing therapies to Phase 3 trials. More efficient, well-designed studies are essential for better-informed transitions from early to late phase research. In the pharmaceutical industry, identifying the patients most likely to benefit from a treatment is crucial. This process relies on appropriate statistical analysis and study design, supporting portfolio advancement and regulatory engagement.
This short course will focus on precision medicine and companion diagnostics. The first half will provide the audience with a concise understanding of precision medicine and the statistical and design considerations to find a predictive biomarker for patient stratification and enrichment. Key issues, evolution of the statistical field, and new statistical and design approaches around precision medicine and biomarkers will be covered so the participants will be more knowledgeable and successful in their study team contributions. The second half of the short course aims to equip clinical statisticians with a comprehensive understanding of the statistical principles and methodologies that underpin successful CDx development. Participants will gain insights into fundamental design elements, key analytical strategies, and emerging advanced methods, thereby strengthening their capacity to navigate the evolving regulatory landscape and successfully contribute to CDx study teams.
Target Audience: This course is designed for clinical statisticians and data scientists, particularly those new to precision medicine and CDx studies, and for those who want a refresher. It introduces basic to advanced statistical and design topics for precision medicine and patient enrichment. Building on the strong interest and follow-up inquiries from Dr. Shen’s session on tissue-agnostic CDx at the 2022 ASA Biopharmaceutical Section Regulatory-Industry Statistics Workshop (RISW), this course expands upon foundational elements, best practices, and cutting-edge methodologies in CDx. We anticipate that the course will be of great interest at the 2025 DahShu Data Science Symposium and beyond, helping statisticians and data scientists confidently lead and participate in biomarker-driven studies and CDx study designs that advance precision medicine.
Professor, University of Texas MD Anderson Cancer Center
Ying Yuan is the Bettyann Asche Murray Distinguished Professor and Chair of the Department of Biostatistics at the University of Texas MD Anderson Cancer Center. Dr. Yuan is an internationally renowned researcher in innovative Bayesian adaptive designs, including early phase trials, seamless trials, biomarker-guided trials, and basket and platform trials. The designs and software developed by Dr. Yuan’s lab (www.trialdesign.org) have been widely used in medical research institutes and pharmaceutical companies. The BOIN design, developed by Dr. Yuan’s team, is a groundbreaking oncology dose-finding design that has been recognized by the FDA as a fit-for-purpose drug development tool. Dr. Yuan is an elected Fellow of the American Statistical Association and the lead author of two books, “Bayesian Designs for Phase I-II Clinical Trials” and “Model-Assisted Bayesian Designs for Dose Finding and Optimization,” both published by Chapman & Hall/CRC.
Associate Professor, Indiana University
Dr. Yong Zang is the Indiana University School of Medicine Showalter Scholar Associate Professor in the Department of Biostatistics and Health Data Science, Indiana University. He also serves as the Co-Director of Clinical Research for the Biostatistics and Data Management Core, IU Simon Comprehensive Cancer Center. He received his Ph.D. in Statistics from the University of Hong Kong and completed his postdoctoral training at The University of Texas MD Anderson Cancer Center. His research interests are clinical trial design and health data informatics. He has published over eighty papers in peer-reviewed statistical, informatics, and medical journals. His research is supported by the National Institutes of Health, the Showalter Trust, and the Indiana CTSI.
In this short course, we will delve into Bayesian clinical trial designs and their implementation, with a focus on early phase trials. We will begin by providing a brief review of Bayesian inference to introduce relevant notation and concepts.
Next, we will examine phase I dose finding and optimization trial designs. Our focus will be on model-assisted designs, which offer simplicity, flexibility, and excellent operating characteristics. Given the increasing emphasis on dose optimization, we will also highlight state-of-the-art designs that support this effort. We will use real-world trial examples to illustrate these novel designs using freely available software at www.trialdesign.org.
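As an illustration of why model-assisted designs are considered simple to run, the sketch below computes the escalation and de-escalation boundaries of a BOIN-style design and applies the resulting rule to the observed toxicity rate at the current dose. The formulas and the default ratios 0.6 and 1.4 follow the commonly published BOIN defaults as recalled here. They are not stated in this course description, so treat them, along with the function names, as assumptions. The software at www.trialdesign.org remains the reference implementation.

```python
import math


def boin_boundaries(target, low_ratio=0.6, high_ratio=1.4):
    """Return (escalation, de-escalation) boundaries for a target DLT rate.

    low_ratio and high_ratio define the toxicity rates regarded as clearly
    under-dosing and clearly over-dosing (assumed defaults).
    """
    phi1 = low_ratio * target    # rate regarded as under-dosing
    phi2 = high_ratio * target   # rate regarded as over-dosing
    lam_e = math.log((1 - phi1) / (1 - target)) / math.log(
        target * (1 - phi1) / (phi1 * (1 - target))
    )
    lam_d = math.log((1 - target) / (1 - phi2)) / math.log(
        phi2 * (1 - target) / (target * (1 - phi2))
    )
    return lam_e, lam_d


def next_dose_decision(n_dlt, n_treated, target):
    """Compare the observed DLT rate at the current dose with the boundaries."""
    if n_treated <= 0:
        raise ValueError("n_treated must be positive")
    lam_e, lam_d = boin_boundaries(target)
    rate = n_dlt / n_treated
    if rate <= lam_e:
        return "escalate"
    if rate >= lam_d:
        return "de-escalate"
    return "stay"


lam_e, lam_d = boin_boundaries(0.30)
print(f"escalate if rate <= {lam_e:.4f}, de-escalate if rate >= {lam_d:.4f}")
for dlt, n in [(0, 3), (1, 3), (2, 3), (2, 6)]:
    print(dlt, n, next_dose_decision(dlt, n, 0.30))
```

The sketch omits the safety components of a full design, such as dose elimination for excessive toxicity and the final dose selection by isotonic regression.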
Moving on to phase II trial design, we will introduce the Bayesian optimal phase II design and demonstrate its practical application. Additionally, we will cover biomarker-based designs, such as enrichment and biomarker-stratified designs. Finally, we will explore basket and platform trial designs.
The focus of this short course is to bridge the gap between theoretical understanding and practical application, drawing in part on the book “Model-Assisted Bayesian Designs for Dose Finding and Optimization” by Ying Yuan, Ruitao Lin, and J. Jack Lee (2022, Chapman & Hall/CRC). Together, we will work through real-world examples and case studies, allowing participants to gain hands-on experience with designing adaptive trials. By the end of the course, attendees will have a solid understanding of how to implement Bayesian clinical trial designs in their own research.
Professor, University of California, Berkeley
Mark van der Laan is the Jiann-Ping Hsu/Karl E. Peace Professor of Biostatistics and Statistics at the University of California, Berkeley. He graduated in 1993 under the supervision of Richard Gill at Utrecht University in the Netherlands. He started a position in Biostatistics in 1994 and has been at UC Berkeley since. He has made contributions to survival analysis, semiparametric statistics, multiple testing, censored data, and causal inference. He also developed the targeted maximum likelihood methodology and general theory for super-learning. He is a founding editor of the Journal of Causal Inference and the International Journal of Biostatistics. He has authored 4 books on targeted learning, censored data, and multiple testing, authored over 400 publications, and graduated 60 Ph.D. students. He received the COPSS Presidents' Award in 2005, the Mortimer Spiegelman Award in 2004, and the van Dantzig Award in 2005, among various others.
PhD student, University of California, Berkeley
Sky Qiu is a Biostatistics PhD student at UC Berkeley advised by Professor Mark van der Laan. Sky’s research lies at the intersection of machine learning and causal inference, with a particular emphasis on developing estimators within the targeted learning framework. Recently, he has been working on methods for integrating randomized controlled trials with external real-world observational data to improve efficiency.
Targeted Learning (TL) is an important field in statistics and causal inference that provides a general framework for constructing asymptotically linear, efficient plug-in estimators of estimands under realistic knowledge about the data generating process. At the core of recent advancements in this field is the Highly Adaptive Lasso (HAL), a machine learning algorithm with exceptional theoretical properties and practical performance. HAL achieves essentially dimension-free L2 convergence rates for estimating a broad class of target functions, including conditional means, densities, and treatment effects. HAL also admits pointwise asymptotic normality for the estimated function itself, enabling construction of valid pointwise confidence intervals and simultaneous confidence bands, even for non-pathwise differentiable parameters such as dose-response curves. This short course will introduce attendees to HAL and its key theoretical results, demonstrate its use for causal inference problems in the R programming language, and present new developments on the horizon, including adaptive and profile targeted maximum likelihood estimation (A-TMLE and P-TMLE). We present practical applications of A-TMLE for augmenting randomized controlled trials with external real-world data and for improving power in subgroup analyses, two pressing challenges that are increasingly central to modern drug development and regulatory decisions. The course also highlights emerging directions in Targeted Learning, such as DeepLTMLE and scalable algorithmic implementations leveraging advancements in deep learning architectures, including transformers, for analyzing complex high-dimensional longitudinal data.
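For readers new to HAL, the following sketch shows only the first ingredient as it is usually described: a design matrix of zero-order indicator basis functions, one per observed point and per non-empty subset of covariates. HAL then fits an L1-penalized regression on these columns, and that step is omitted here. The course itself works in R. This Python version, the function name `hal_basis_row`, and the toy data are illustrative assumptions rather than the instructors' implementation.

```python
from itertools import combinations


def hal_basis_row(knots, x):
    """Evaluate zero-order HAL-style basis functions at a point x.

    For each non-empty subset s of coordinates and each knot k, the basis
    function equals 1 when x[j] >= k[j] for every j in s, and 0 otherwise.
    Columns are ordered by subset size, then subset, then knot.
    """
    d = len(x)
    row = []
    for size in range(1, d + 1):
        for subset in combinations(range(d), size):
            for knot in knots:
                row.append(int(all(x[j] >= knot[j] for j in subset)))
    return row


# Hypothetical toy data: three observations of two covariates.
X = [(0.1, 0.5), (0.4, 0.2), (0.7, 0.9)]

# Knots are placed at the observed points, giving n * (2**d - 1) columns.
design = [hal_basis_row(X, x) for x in X]
for row in design:
    print(row)
```

With n observations and d covariates the full basis has n(2^d - 1) columns, which is why practical implementations restrict the interaction order or the number of knots.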
DahShu 2025 Contact
For all general questions about the symposium, including program details, registration, and logistics: Email: dahshu2025@gmail.com