Targeting the multivariate tails in AI-driven molecular optimization
Invited talk and panel, The Exploration in AI Today Workshop at ICML 2025, Vancouver, Canada
Invited talk, From Models to Molecules: AI’s Expanding Roles in Therapeutics, hosted by Novoprotein, South San Francisco, CA
Drug discovery is expensive largely because we must make decisions under uncertainty – about biological mechanisms, assay noise, and the ultimate clinical success of a molecule. I will present three complementary tools for measuring and managing these unknowns: (1) diversity-steered sampling, which identifies when large language models are guessing, making literature triage and hypothesis generation more reliable; (2) semiparametric conformal prediction, which wraps around any predictive model to deliver calibrated predictions for many correlated assay read-outs; and (3) multi-objective Bayesian optimization with multivariate ranks, which translates these calibrated predictions into action by balancing potency, selectivity, and manufacturability along a principled Pareto frontier. Together, these methods can accelerate and de-risk hit finding, safety assessment, and lead optimization. I will close by discussing how uncertainty-aware methods can move from benchmark to bench through close collaboration between experimentalists and AI practitioners. [Event Page]
Invited talk, Molecule Maker Lab Institute Symposium 2025, Urbana, IL
LLMs are optimized for average-case behavior, whereas drug design requires us to consider rare, extreme combinations of molecular properties. I present two recent projects: a novel multi-objective acquisition function for Bayesian optimization and multi-target conformal calibration. Both projects use nonparametric vine copulas to model flexible tail dependence, which gives us the structure we need to explore where it matters most. [Event Page] [Recording] [Slides]
Poster, Molecular Machine Learning (MOML) Conference, Cambridge, MA
MOML@MIT poster presentation. [Event Page]
Invited panel, 2024 AI Summit, South San Francisco, CA
A panel organized by Genentech CMG (Commercial, Medical, and Government Affairs) and moderated by Amit Akhelikar (COO & Managing Partner, Lynx Analytics) on what it takes for a pharma company to gain a competitive advantage in the age of AI.
Invited talk, KOLIS Conference 2024, Stanford, CA
Invited talk, Cradle, Zurich, Switzerland
An invited talk at Cradle.
Invited talk, Neural Concept, EPFL, Lausanne, Switzerland
An invited talk at EPFL and Neural Concept.
Invited talk, 2024 SIAM Conference on Uncertainty Quantification, Trieste, Italy
Many scientific and industrial applications require the joint optimization of multiple, potentially competing objectives. Multi-objective Bayesian optimization (MOBO) is a sample-efficient framework for identifying Pareto-optimal solutions. At the heart of MOBO is the acquisition function, which determines the next candidate to evaluate by navigating the best compromises among the objectives. In this paper, we show a natural connection between non-dominated solutions and the extreme quantile of the joint cumulative distribution function (CDF). Motivated by this link, we propose the Pareto-compliant CDF indicator and the associated acquisition function, BOtied. BOtied inherits desirable invariance properties of the CDF, and an efficient implementation with copulas allows it to scale to many objectives. Our experiments on a variety of synthetic and real-world problems demonstrate that BOtied outperforms state-of-the-art MOBO acquisition functions while being computationally efficient for many objectives. [Event Page] [Slides]
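As a rough illustration of the link between non-dominated solutions and high values of the joint CDF, the sketch below scores a handful of toy objective vectors with an empirical joint CDF and compares the scores with a direct dominance check. It is not the BOtied implementation, which per the abstract relies on copulas; the function names `empirical_joint_cdf` and `is_non_dominated`, the maximization convention, and the toy data are assumptions made for this example.

```python
import numpy as np


def empirical_joint_cdf(Y):
    """Fraction of rows of Y that are <= each row in every objective.

    Y has shape (n, m); larger values are assumed to be better.
    """
    Y = np.asarray(Y, dtype=float)
    # leq[i, j] is True when row j is componentwise <= row i.
    leq = np.all(Y[None, :, :] <= Y[:, None, :], axis=-1)
    return leq.mean(axis=1)


def is_non_dominated(Y):
    """Boolean mask marking rows that no other row dominates."""
    Y = np.asarray(Y, dtype=float)
    # geq[i, j]: row j is at least as good as row i in every objective.
    geq = np.all(Y[None, :, :] >= Y[:, None, :], axis=-1)
    # gt[i, j]: row j is strictly better than row i in some objective.
    gt = np.any(Y[None, :, :] > Y[:, None, :], axis=-1)
    return ~np.any(geq & gt, axis=1)


# Toy objective vectors (two objectives, both maximized).
Y = np.array([[1, 1], [2, 2], [3, 1], [1, 3], [0, 0]])
scores = empirical_joint_cdf(Y)
mask = is_non_dominated(Y)

for y, s, nd in zip(Y, scores, mask):
    print(y, f"cdf={s:.1f}", "non-dominated" if nd else "dominated")
```

In this toy set, any point that dominates another receives a strictly higher empirical CDF score, so the highest-scoring point is always non-dominated. That monotonicity is the sense in which a CDF-based indicator can be called Pareto-compliant.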
Invited panel, 2024 APS CUWiP, Stanford, CA
Invited plenary panelist at the American Physical Society’s 2024 Conference for Undergraduate Women in Physics (CUWiP) at Stanford University. The goal of CUWiP is to encourage undergraduate women and underrepresented minorities to continue in physics. [Event Page]
Invited talk, KASBP-SF Symposium 2024, South San Francisco, CA
Active design of therapeutic molecules requires the joint optimization of multiple, potentially competing properties. Multi-objective Bayesian optimization (MOBO) offers a sample-efficient framework for identifying Pareto-optimal drug candidates. MOBO proceeds in cycles, a single iteration of which involves (1) sampling molecules from a combinatorially vast design space, (2) inferring multiple properties of interest, and (3) selecting the most promising subset for wet-lab evaluation. In this talk, I highlight the importance of modeling the tails – extreme, low-probability events – in biological applications and propose algorithms designed to accommodate complex tail behavior in each of these steps. Together, the algorithms enable modeling flexibility beyond that afforded by the common log-concave (e.g., Gaussian) assumption. [Event Page]