Targeting the multivariate tails in molecular optimization

Date:

Abstract: Multi-objective molecular optimization requires special attention to the multivariate tails of molecular property distributions, where desirable candidates lie and decision-making is most difficult, yet these regions remain underexplored in practice. This talk highlights two tasks in model-based optimization where tails play a pronounced role: acquisition and uncertainty quantification, with the latter being a prerequisite for the former. First, I introduce a multi-objective acquisition function for Bayesian optimization that models the population distribution of molecular properties with a vine copula and scores candidate molecules by their expected multivariate rank in property space. This method leverages a theoretical connection between the extreme level sets of the population distribution and the Pareto front. Second, I extend conformal prediction to multivariate targets by estimating the joint $1-\alpha$ quantile of prediction errors with a vine copula, where $\alpha \in (0, 1)$ is a user-specified miscoverage rate (e.g., $\alpha = 0.05$). Here I show how the one-step estimator from semiparametric statistics improves the efficiency of the resulting prediction sets. Together, these results illustrate the importance of modeling the tails for reliable uncertainty quantification and downstream acquisition in multi-objective molecular design.
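
For readers unfamiliar with the second task, the following is a minimal sketch of split conformal prediction for a multivariate target. It is not the method presented in the talk: it uses neither a vine copula nor the one-step estimator. As a simpler stand-in, it collapses each error vector into a single score (the largest standardized absolute error) and takes the finite-sample $1-\alpha$ empirical quantile of that score, which yields a hyperrectangular prediction set. The function name `conformal_box_halfwidths`, the `scale` argument, and the synthetic Gaussian residuals are all assumptions made for illustration.

```python
import numpy as np


def conformal_box_halfwidths(residuals, scale, alpha=0.05):
    """Half-widths of a hyperrectangular split-conformal prediction set.

    residuals: (n, d) array of y_true - y_pred on a held-out calibration set.
    scale:     (d,) positive per-property scales, estimated on data that is
               separate from the calibration set (assumption of this sketch).
    alpha:     miscoverage rate in (0, 1).
    """
    residuals = np.asarray(residuals, dtype=float)
    scale = np.asarray(scale, dtype=float)
    n, d = residuals.shape
    # Scalar nonconformity score: largest standardized absolute error.
    scores = np.max(np.abs(residuals) / scale, axis=1)
    # Finite-sample corrected rank of the calibration quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration points for this alpha: the set is unbounded.
        return np.full(d, np.inf)
    q = np.sort(scores)[k - 1]
    return q * scale


# Synthetic, correlated prediction errors for three properties (illustrative only).
rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6, 0.3], [0.6, 1.0, 0.5], [0.3, 0.5, 1.0]])
train_res = rng.multivariate_normal(np.zeros(3), cov, size=500)
calib_res = rng.multivariate_normal(np.zeros(3), cov, size=500)
test_res = rng.multivariate_normal(np.zeros(3), cov, size=2000)

scale = np.abs(train_res).mean(axis=0)
hw = conformal_box_halfwidths(calib_res, scale, alpha=0.05)
# Fraction of test points whose errors fall inside the box in every dimension.
coverage = np.mean(np.all(np.abs(test_res) <= hw, axis=1))
print(hw, coverage)
```

A box of this kind ignores the dependence between the errors of different properties, so it can be larger than necessary when those errors are correlated. Modeling the joint distribution of the errors, as the abstract describes, is one way to obtain more efficient sets.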