Fri, Jan 27, 2023, 9:00 – 10:00 AM (PST), 12:00 – 1:00 PM (EST)
"AI is like nuclear energy–both promising and dangerous."
Bill Gates, 2019
Data science is a pillar of AI and has driven most of the recent cutting-edge discoveries in biomedical research and beyond. Human judgment calls are ubiquitous at every step of the data science life cycle, e.g., in problem formulation and in the choice of data cleaning methods, predictive algorithms, and data perturbations. Such judgment calls are often responsible for the "dangers" of AI.
To mitigate these dangers, we introduce in this talk a framework based on three core principles: Predictability, Computability, and Stability (PCS). The PCS framework unifies and expands on the ideas and best practices of statistics and machine learning. It emphasizes reality checks through predictability and takes full account of the sources of uncertainty across the whole data science life cycle, including those arising from human judgment calls such as choices in data curation and cleaning. PCS consists of a workflow and documentation and is supported by our software package veridical or v-flow. We illustrate the usefulness of PCS in the development of iterative random forests (iRF) for predictable and stable non-linear interaction discovery (in collaboration with the Brown Lab at LBNL and Berkeley Statistics). Finally, in pursuit of the genetic drivers of a heart disease called hypertrophic cardiomyopathy (HCM), a CZ Biohub project in collaboration with the Ashley Lab at Stanford Medical School and others, we use iRF and UK Biobank data to recommend gene-gene interaction targets for knock-down experiments. We then analyze the experimental data and show promising findings about genetic drivers of HCM.
Bin Yu is Chancellor's Distinguished Professor and Class of 1936 Second Chair in the Departments of Statistics and EECS, and the Center for Computational Biology, at UC Berkeley. She obtained her BS degree in Mathematics from Peking University, and her MS and PhD degrees in Statistics from UC Berkeley. She was Assistant Professor at UW-Madison, Visiting Assistant Professor at Yale University, Member of Technical Staff at Lucent Bell-Labs, and Miller Research Professor at Berkeley. She has been a visiting faculty member at MIT, ETH, Poincare Institute, Peking University, INRIA-Paris, Fields Institute at University of Toronto, Newton Institute at Cambridge University, and the Flatiron Institute in NYC. She was Chair of the Department of Statistics at UC Berkeley.
She has authored more than 170 publications in premier venues. These papers investigate a wide range of research topics, from practice to algorithms to theory, and seek deep insights. The breadth and depth of her research experience have enabled unique and novel solutions to interdisciplinary data problems in audio and image compression, network tomography, remote sensing, neuroscience, genomics, and precision medicine. She has championed collaborative research with subject-matter experts and led research in statistical machine learning (e.g., boosting, sparse modeling, kernel methods, and spectral clustering) and causal inference (e.g., X-learner) through theoretical analysis and fast practical algorithms.
She is a Member of the U.S. National Academy of Sciences and of the American Academy of Arts and Sciences. She is Past President of the Institute of Mathematical Statistics (IMS), Guggenheim Fellow, Tukey Memorial Lecturer of the Bernoulli Society, Rietz Lecturer of IMS, and a COPSS E. L. Scott Prize winner. She has been selected to deliver the Wald Memorial Lectures of IMS at JSM in 2023.
She recently served on the inaugural Scientific Advisory Committee of the UK Turing Institute for Data Science and AI, and is serving on the editorial board of the Proceedings of the National Academy of Sciences (PNAS).