Fri, Nov 11, 2022, 9:00 – 10:00 AM (PST), 12:00 – 1:00 PM (EST)
Many human conditions, including cognition, are complex and depend on both genetic and environmental factors. Since the completion of the Human Genome Project, genome-wide association studies have associated genetic markers such as single-nucleotide polymorphisms with many human conditions and diseases. Despite this progress, it remains difficult to identify the genes and environmental factors behind complex diseases, the so-called geneticist's nightmare. Furthermore, although the impact of these discoveries on human health has not been one of shock and awe, “drugs with support from human genetic studies for related effects succeed from phase I trials to final approval twice as often as those without such evidence.” Therefore, it is important and promising, while challenging, to identify genetic variants for complex human health-related conditions.
This talk is not intended to provide a comprehensive review of the massive progress in related methods and discoveries. Instead, I will focus on some of the work that many of my students have assisted me with over the past several years.
The first area is the identification of super-variants. A super-variant is a set of alleles at multiple loci of the human genome; unlike the loci in a gene, however, the loci contributing to a super-variant can be anywhere in the genome. The concept of a super-variant follows a common practice in genetic studies of collapsing a set of variants, specifically single-nucleotide polymorphisms. The novelty and challenge lie in how to find, replicate, interpret, and eventually make use of the super-variants. Our work has been mainly based on the use of tree- and forest-based methods, and a data analytic flow that we proposed in 2007, which in retrospect resembles the spirit of “deep learning” that Hinton coined in 2006.
The second area is our progress in conducting statistical inference for high-dimensional and structured data objects. Such data objects appear increasingly often not only in imaging genetic studies but also in other areas of data science, including artificial intelligence. They do not belong to a Euclidean space, for which most statistical theory and methods, such as the distribution function, were developed. How do we analyze data objects in non-Euclidean spaces?
Heping Zhang, Susan Dwight Bliss Professor of Biostatistics, Professor of Child Study, and Professor of Statistics and Data Science, Yale University
Professor Zhang has published over 360 research articles and monographs in theory, methodology, and applications of statistics. He is particularly interested in biomedical research including epidemiology, genetics, child and women's health, mental health, and substance use. He directs the Collaborative Center for Statistics in Science, which coordinates major national research networks to understand the etiology of pregnancy outcomes and to evaluate treatment effectiveness for infertility. He is a fellow of the American Statistical Association and a fellow of the Institute of Mathematical Statistics. He was named the 2008 Myrto Lefkopoulou Distinguished Lecturer by the Harvard School of Public Health, and he received a 2011 Medallion Award and Lecture and the 2022 Neyman Award and Lecture from the Institute of Mathematical Statistics. Dr. Zhang was the founding Editor-in-Chief of Statistics and Its Interface and is a past Coordinating Editor of the Journal of the American Statistical Association.