Leveraging AI for Sustainable Chemical Manufacturing
For chemical process optimization, we're using Bayesian optimization to learn a model of the process output (for example, the conversion to the product we want to make) as a function of all the parameters. We then use that model to select the next set of experiments to run, always updating it with new data so that it suggests experiments likely to maximize conversion. This technique dramatically reduces both the number of experiments and the cost. We applied Bayesian optimization to one of our current products, and the process cost went down by 60 percent in roughly 40 small-scale reactors. We've since scaled the best experimental condition we found to the much larger pilot plant, and the results held up. So the final step is taking the optimized process to the Bioforge to see the full impact of our ML-driven optimization for this key reaction.
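As a concrete illustration, here is a minimal sketch of that model-update-and-select loop in Python, assuming a Gaussian-process surrogate and an expected-improvement rule for picking the next experiment. The run_experiment function, the two normalized parameters, and the experiment budget are hypothetical stand-ins for a real reactor setup, not the process described above.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)

# Hypothetical stand-in for running a small-scale reactor experiment:
# inputs are two normalized parameters, output is conversion.
def run_experiment(x):
    t, tau = x
    return float(np.exp(-((t - 0.6) ** 2 + (tau - 0.3) ** 2) / 0.05))

bounds = np.array([[0.0, 1.0], [0.0, 1.0]])  # normalized parameter ranges

# Seed the surrogate model with a few initial experiments.
X = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))
y = np.array([run_experiment(x) for x in X])

for _ in range(20):  # illustrative budget of sequential experiments
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)

    # Score a random pool of candidate settings by expected improvement.
    cand = rng.uniform(bounds[:, 0], bounds[:, 1], size=(2000, 2))
    mu, sigma = gp.predict(cand, return_std=True)
    best = y.max()
    z = (mu - best) / np.maximum(sigma, 1e-9)
    ei = (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)

    # Run the most promising experiment and update the data set.
    x_next = cand[np.argmax(ei)]
    X = np.vstack([X, x_next])
    y = np.append(y, run_experiment(x_next))

print("best conversion:", y.max(), "at parameters:", X[np.argmax(y)])
```

Each pass through the loop refits the surrogate to everything observed so far, which is what lets the method home in on high-conversion conditions with far fewer runs than a grid or one-factor-at-a-time search.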
For enzymes, there are public data sets with hundreds of millions of protein sequences. That has enabled an explosion of machine learning models that understand protein sequences, but there are very few, and very limited, data sets that pair sequences with protein function. Because of that, it's still very hard to build good models that predict function from sequence. To maximize impact, I think we need a consortium with universities and governments to produce the public data sets the field needs to develop the best models for predicting protein function from sequence.
SnAKe: Bayesian Optimization with Pathwise Exploration
Bayesian Optimization is a very effective tool for optimizing expensive black-box functions. Inspired by applications in developing and characterizing reaction chemistry using droplet microfluidic reactors, we consider a novel setting where the cost of evaluating the function can increase significantly when making large input changes between iterations. We further assume we are working asynchronously, meaning we must select new queries before we have evaluated previous experiments. This paper investigates the problem and introduces 'Sequential Bayesian Optimization via Adaptive Connecting Samples' (SnAKe), which addresses it by considering large batches of queries and preemptively building optimization paths that minimize input costs. We investigate some convergence properties and empirically show that the algorithm achieves regret similar to classical Bayesian Optimization algorithms in both synchronous and asynchronous settings, while reducing input costs significantly. We show the method is robust to the choice of its single hyperparameter and provide a parameter-free alternative.
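To make the path-building idea concrete, here is a rough sketch of one step, assuming Thompson samples drawn from a Gaussian-process posterior over a candidate grid, with a simple nearest-neighbour ordering standing in for the Travelling Salesman path construction used in the paper. The function name propose_path and every detail below are illustrative simplifications, not the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def propose_path(X_obs, y_obs, grid, n_samples=30):
    """Sketch of SnAKe's core idea: draw a batch of Thompson samples
    from the surrogate posterior, deduplicate their maximizers, then
    order the resulting queries into a path with small input changes."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)

    # Thompson sampling: each posterior draw votes for its maximizer.
    draws = gp.sample_y(grid, n_samples=n_samples, random_state=1)
    batch = np.unique(grid[np.argmax(draws, axis=0)], axis=0)

    # Nearest-neighbour ordering as a cheap stand-in for the paper's
    # path construction over the batch.
    path = [X_obs[-1]]  # start from the current input setting
    remaining = list(range(len(batch)))
    while remaining:
        d = [np.linalg.norm(batch[i] - path[-1]) for i in remaining]
        path.append(batch[remaining.pop(int(np.argmin(d)))])
    return np.array(path[1:])  # ordered queries with small moves between steps

# Example usage on a hypothetical 1-D problem.
X_obs = np.array([[0.1], [0.5], [0.9]])
y_obs = np.array([0.2, 0.8, 0.1])
grid = np.linspace(0, 1, 101).reshape(-1, 1)
print(propose_path(X_obs, y_obs, grid))
```

Ordering the batch this way keeps consecutive input changes small, which is precisely the cost the abstract identifies as expensive in the droplet-microfluidics setting.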