We show that evolutionary algorithms enhanced with LLM-based molecule generation scale along two axes on simple molecular optimization tasks: LLM size and number of optimization steps.
We trained language models that directly generate the 3D structures of drug-like molecules, and we show that results improve with model scale.
Language models of up to 2B parameters (Chemlactica and Chemma), combined with a genetic algorithm, produce state-of-the-art results on most molecular optimization benchmarks.
A BART-like encoder-decoder model trained on 1.7 billion SMILES strings demonstrated competitive performance after fine-tuning on property prediction, chemical reaction prediction, and retrosynthesis tasks.
We show that property predictors operating on the latent space of VAEs can improve the downstream performance on related property prediction tasks.