A recurring challenge in molecular design, whether for pharmaceutical, chemical, or material applications, is creating synthesizable molecules. Synthesizability assessment often requires mapping the synthesis pathway for a molecule: the sequence of chemical reactions needed to transform precursor molecules into the target product molecule. This post introduces ReaSyn, a generative model from NVIDIA designed for predicting molecular synthesis pathways that also addresses limitations in current approaches.

Why chain-of-thought reasoning matters for AI in chemistry

Large language models (LLMs) have become integral to daily life, powering applications from virtual assistants to complex problem-solving. Modern LLMs solve complex problems by generating a chain of thought (CoT), which is a series of intermediate reasoning steps that lead to a final answer. Combining CoT with test-time search methods, such as generating multiple CoT paths, has been critical to the improved accuracy of recent LLMs.

Chemistry faces a similar challenge in molecular synthesis pathway prediction, where a pathway contains a series of intermediate synthesis steps. Pathway prediction is a critical step in drug, chemical, and materials development because a molecule, however promising, is only valuable if it can be synthesized. ReaSyn is a novel generative framework that efficiently predicts molecular synthesis pathways. It uses a unique chain of reaction (CoR) notation, inspired by the CoT approach in LLMs, combined with a test-time search algorithm.

ReaSyn: treating synthetic pathways as CoR

A synthetic pathway follows a bottom-up tree structure: simple molecules, or building blocks (BB), are combined through chemical reactions (RXN) to produce intermediate products (INT), which in turn undergo further reactions to form increasingly complex molecules (Figure 1). This process is multi-step, with each reaction applied to reactants that may be either building blocks or intermediates. In practice, chemists deduce such pathways step-by-step, reasoning through each transformation to reach the final target molecule.

*Figure 1. CoR notation views synthetic pathways as CoT reasoning paths*

ReaSyn captures this step-by-step reasoning through its CoR notation, inspired by the CoT approach in LLMs. In CoR, an entire synthetic pathway is represented as a linear sequence where each step explicitly includes the reactants, the reaction rule, and the resulting product. Reactants and products are encoded as SMILES strings, wrapped in special tokens that mark their boundaries, while each reaction is denoted by a single reaction-class token. This representation not only mirrors how chemists think about synthesis but also enables the model to receive intermediate supervision at every step, for richer learning of chemical reaction rules and more reliable multi-step pathway generation.
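To make the idea of a linear pathway sequence concrete, here is a minimal sketch of how a pathway tree could be flattened into a CoR-style sequence. The token names (`[MOL_START]`, `[MOL_END]`, `[RXN:...]`), the class names, and the reaction label are all assumptions made for illustration; ReaSyn's actual vocabulary and tokenization may differ, and a real model would further split each SMILES string into smaller tokens.

```python
from dataclasses import dataclass
from typing import List, Union

# Hypothetical boundary tokens; not taken from the ReaSyn codebase.
MOL_START, MOL_END = "[MOL_START]", "[MOL_END]"


@dataclass
class BuildingBlock:
    smiles: str


@dataclass
class Reaction:
    rxn_class: str              # illustrative reaction-class label
    reactants: List["Node"]     # building blocks or earlier reactions
    product_smiles: str         # intermediate or final product


Node = Union[BuildingBlock, Reaction]


def wrap(smiles: str) -> List[str]:
    """Wrap a SMILES string in boundary tokens."""
    return [MOL_START, smiles, MOL_END]


def to_cor(node: Node) -> List[str]:
    """Flatten a pathway bottom-up: reactants, then reaction token, then product."""
    if isinstance(node, BuildingBlock):
        return wrap(node.smiles)
    tokens: List[str] = []
    for reactant in node.reactants:
        tokens.extend(to_cor(reactant))
    tokens.append(f"[RXN:{node.rxn_class}]")
    tokens.extend(wrap(node.product_smiles))
    return tokens


# One-step example: acetic acid + methylamine -> N-methylacetamide.
pathway = Reaction(
    rxn_class="amide_coupling",
    reactants=[BuildingBlock("CC(=O)O"), BuildingBlock("CN")],
    product_smiles="CC(=O)NC",
)
cor_tokens = to_cor(pathway)
print(" ".join(cor_tokens))
```

Because the product of every step appears explicitly in the sequence, each intermediate can serve as a supervision target during training, which is the property the CoR notation is designed to provide.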

*Figure 3. Reinforcement learning (RL) finetuning of ReaSyn using GRPO*

Goal-directed search: guiding pathways

During generation, ReaSyn uses beam search, which maintains a pool of sequences being generated and expands them block-by-block (BB or RXN). The search enables ReaSyn to generate diverse pathways for a single input molecule, and guides the generation in a preferred direction by scoring the sequences through a reward function. In retrosynthesis planning, the reward function can be the similarity to the input molecule. In goal-directed optimization tasks, the reward function can be the desired chemical property.
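The search loop described above can be sketched as a generic reward-guided beam search. This is not ReaSyn's implementation: `expand` stands in for the model proposing the next block, `reward` stands in for a similarity or property score, and the toy vocabulary and target below are invented purely so the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

Seq = Tuple[str, ...]


def beam_search(
    expand: Callable[[Seq], Sequence[str]],   # proposes next blocks (model stand-in)
    reward: Callable[[Seq], float],           # scores a partial or complete sequence
    is_complete: Callable[[Seq], bool],       # True once a sequence is finished
    beam_width: int = 3,
    max_blocks: int = 8,
) -> List[Seq]:
    """Expand sequences block by block, keeping the top-scoring ones each round."""
    beam: List[Seq] = [()]
    finished: List[Seq] = []
    for _ in range(max_blocks):
        candidates: List[Seq] = []
        for seq in beam:
            for block in expand(seq):
                new_seq = seq + (block,)
                if is_complete(new_seq):
                    finished.append(new_seq)
                else:
                    candidates.append(new_seq)
        if not candidates:
            break
        # Keep only the highest-reward partial sequences for the next round.
        candidates.sort(key=reward, reverse=True)
        beam = candidates[:beam_width]
    return sorted(finished, key=reward, reverse=True)


# Toy setup (hypothetical): reward counts positions that match a target pathway.
TARGET: Seq = ("BB:a", "BB:b", "RXN:1")


def toy_expand(seq: Seq) -> Sequence[str]:
    return ["BB:a", "BB:b", "RXN:1", "END"]


def toy_reward(seq: Seq) -> float:
    return float(sum(1 for got, want in zip(seq, TARGET) if got == want))


def toy_is_complete(seq: Seq) -> bool:
    return seq[-1] == "END"


results = beam_search(toy_expand, toy_reward, toy_is_complete,
                      beam_width=2, max_blocks=4)
best = results[0]
print(best)
```

Swapping `toy_reward` for a similarity score against the input molecule would correspond to the retrosynthesis setting, while a property predictor would correspond to goal-directed optimization. The returned list contains several finished sequences, which reflects how the search yields multiple candidate pathways for one input.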

*Figure 4. Goal-directed test-time search of ReaSyn*

Generating synthetic pathways with ReaSyn

ReaSyn’s synthesizable projection is highly versatile: it enables retrosynthesis planning, suggests analogs for unsynthesizable molecules, supports goal-directed molecular optimization, and facilitates synthesizable hit expansion. Below, we examine its performance on these tasks.

Retrosynthesis planning

Table 1. Retrosynthesis planning success rates (%)

Although the synthesizable chemical space is vast, previous synthesizable molecule generation methods have struggled to cover it extensively. ReaSyn achieves a high success rate in generating synthetic pathways for synthesizable molecules, demonstrating its ability to explore the synthesizable chemical space.

Synthesizable goal-directed molecular optimization

Table 2. Average synthesizable optimization scores of 15 PMO molecular optimization tasks

ReaSyn can project molecules generated by an off-the-shelf molecular optimization method into the synthesizable space, which turns that method into a synthesizable goal-directed optimizer. When ReaSyn is combined with Graph GA, the resulting Graph GA-ReaSyn shows higher optimization performance than previous synthesis-based methods.

Synthesizable hit expansion: exploring molecule neighborhoods

The search scheme enables ReaSyn to suggest multiple synthesizable analogs for a given target molecule by projecting it in different ways. ReaSyn explores the neighborhood of given molecules in synthesizable space, and can be applied to hit expansion to find diverse synthesizable analogs of hit molecules (Figure 5).

*Figure 5. Synthesizable hit expansion with ReaSyn*

Empowering drug discovery with advanced reasoning

Most generative models create molecules that aren’t synthesizable in practice. ReaSyn builds on recent reasoning advances in LLMs, equipping scientists with an effective generative tool to project small molecules into the synthesizable chemical space. With its enhanced reasoning capabilities, diversity, and versatility, ReaSyn shows promise as a means for navigating combinatorially large synthesizable chemical space in real-world drug discovery.

To find out more about ReaSyn, read our paper on arXiv. The code is available on GitHub.
