Learning Flexible Forward Trajectories for Masked Molecular Diffusion

Hyunjin Seo, Taewon Kim, Sihyun Yu, SungSoo Ahn
KAIST AI
*Equal contribution

We propose MELD, a masked diffusion model tailored to molecular generation. MELD resolves the state-clashing problem through learnable forward trajectories and shows robust performance in both unconditional and conditional generation.
Figure 1: Overview of MELD (Masked Element-wise Learnable Diffusion). MELD addresses the state-clashing problem in masked diffusion models by learning flexible forward trajectories with element-specific corruption rates for atoms and bonds in molecular graphs.

Abstract


Masked diffusion models (MDMs) have achieved notable progress in modeling discrete data, yet their potential for molecular generation remains underexplored. In this work, we explore this potential and report the surprising result that naively applying standard MDMs severely degrades performance. We identify the critical cause as a state-clashing problem, where the forward diffusion of distinct molecules collapses into a common state, producing a mixture of reconstruction targets that cannot be learned by the typical reverse diffusion process with unimodal predictions. To mitigate this, we propose Masked Element-wise Learnable Diffusion (MELD), which orchestrates per-element corruption trajectories to avoid collisions between distinct molecular graphs. This is achieved through a parameterized noise-scheduling network that assigns distinct corruption rates to individual graph elements, i.e., atoms and bonds. Extensive experiments on diverse molecular benchmarks show that MELD markedly improves overall generation quality compared to element-agnostic noise scheduling, raising the chemical validity of vanilla MDMs on ZINC250K from 15% to 93%. Furthermore, it achieves state-of-the-art property alignment in conditional generation tasks.
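
The core mechanism above is a noise-scheduling network that gives each atom and bond its own corruption rate. The sketch below is a minimal PyTorch illustration of that idea, not the released MELD implementation; the names ElementwiseSchedule and corrupt, the mask-token index, and the power-law parameterization t**a_i are our own illustrative assumptions.

# Minimal sketch (assumed, not the authors' code) of element-wise corruption:
# a small network maps each element's features to its own corruption curve,
# and the forward process masks each atom/bond independently with that rate.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElementwiseSchedule(nn.Module):
    """Assigns each graph element its own corruption curve over t in [0, 1]."""

    def __init__(self, feat_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.SiLU(), nn.Linear(hidden, 1)
        )

    def forward(self, feats: torch.Tensor, t: float) -> torch.Tensor:
        # feats: (num_elements, feat_dim) features of atoms or bonds.
        exponent = F.softplus(self.net(feats)).squeeze(-1) + 1e-3
        # Per-element schedule t ** a_i: 0 at t=0, 1 at t=1; the learned exponent
        # a_i controls how early each element gets masked along the trajectory.
        return torch.as_tensor(float(t)) ** exponent

def corrupt(tokens: torch.Tensor, mask_prob: torch.Tensor, mask_id: int) -> torch.Tensor:
    """Mask each token (atom type or bond type) with its own probability."""
    masked = torch.rand_like(mask_prob) < mask_prob
    return torch.where(masked, torch.full_like(tokens, mask_id), tokens)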

Examination of the State-Clashing Problem


Formally, the training loss of the diffusion model can be expressed as the expected KL divergence Et[KL(p(g | gt) || pθ(g | gt))], where p(g | gt) ∝ p(gt | g) p(g). Since many graphs g can map to the same intermediate state gt, the true posterior p(g | gt) is often highly multimodal. However, the parameterized reverse distribution pθ(g | gt) is unimodal by construction, as it predicts each node and edge independently: pθ(g | gt) = ∏i pθ(xi | gt) ∏i,j pθ(eij | gt). This mismatch forces the model to spread its probability mass across multiple incompatible modes, often resulting in chemically invalid generations.
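
The consequence of this mismatch can be seen on a two-edge toy example (our own illustration, not taken from the paper): two graphs that differ only in two masked edges share the same intermediate state, and the best factorized reverse model necessarily assigns mass to edge combinations that never occur in the data.

# Toy illustration of state clashing under a factorized reverse model.
# The true posterior over the two masked edges (e1, e2) is bimodal, but the
# per-edge independent model can only match its marginals.
import itertools

# True posterior: molecule A -> (e1, e2) = (0, 1), molecule B -> (1, 0).
posterior = {(0, 1): 0.5, (1, 0): 0.5}

# The optimal factorized model matches the marginals, which are uniform here.
p_e1 = sum(p for (e1, _), p in posterior.items() if e1 == 1)  # P(e1 = 1) = 0.5
p_e2 = sum(p for (_, e2), p in posterior.items() if e2 == 1)  # P(e2 = 1) = 0.5

for e1, e2 in itertools.product([0, 1], repeat=2):
    q = (p_e1 if e1 else 1 - p_e1) * (p_e2 if e2 else 1 - p_e2)
    print((e1, e2), q)  # (0,0) and (1,1) each get 0.25 despite never occurring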
We visualize this effect in Figure 2 when generating o-phenylenediamine (row 1) and m-phenylenediamine (row 2) from the same intermediate state, a benzene ring. Conversely, when reconstructing 4-amino-2-chloro-5-fluoropyrimidine with only the chlorine bond masked, the asymmetric structure suppresses the state-clashing problem, leading to more confident predictions.

Figure 2: Prediction entropy of edges during the reverse diffusion process.
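
The quantity in Figure 2 is, in essence, the Shannon entropy of each edge's predicted categorical distribution at a given reverse step. The snippet below is a minimal sketch of how such a quantity can be computed from edge logits; the function name and exact measurement protocol are our assumptions, not the authors' code.

# Shannon entropy of per-edge categorical predictions (in nats); high entropy
# means the model hedges between several reconstruction targets that share
# the same intermediate state.
import torch

def edge_prediction_entropy(edge_logits: torch.Tensor) -> torch.Tensor:
    """edge_logits: (num_edges, num_bond_types) -> (num_edges,) entropy."""
    log_probs = torch.log_softmax(edge_logits, dim=-1)
    return -(log_probs.exp() * log_probs).sum(dim=-1)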

Examples Generated by MELD


The following animations showcase the molecular generation process of MELD on different molecular structures. MELD's learnable forward trajectories enable effective generation of diverse molecular graphs by avoiding the state-clashing problem that affects standard masked diffusion models. Interestingly, we observe that nodes are generated first, followed by bonds, revealing a clear hierarchy in the generation process. For further details, please refer to the paper.
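
As a rough picture of what the animations depict, the sketch below shows one way a reverse unmasking loop could look. The denoiser and schedule.reveal_masks interfaces are hypothetical placeholders rather than the released API, and greedy argmax decoding is used only to keep the example short.

# Hedged sketch of a reverse unmasking loop: starting from a fully masked
# graph, masked atoms and bonds are gradually revealed; with element-wise
# schedules, atoms can become unmasked earlier than bonds on average.
import torch

@torch.no_grad()
def sample(denoiser, schedule, num_atoms: int, num_steps: int, mask_id: int = 0):
    atoms = torch.full((num_atoms,), mask_id, dtype=torch.long)
    bonds = torch.full((num_atoms, num_atoms), mask_id, dtype=torch.long)
    for step in reversed(range(1, num_steps + 1)):
        t = step / num_steps
        # Hypothetical interfaces: the denoiser returns per-element logits and
        # the schedule decides which still-masked elements to reveal at time t.
        atom_logits, bond_logits = denoiser(atoms, bonds, t)
        reveal_atoms, reveal_bonds = schedule.reveal_masks(atoms, bonds, t)
        atoms = torch.where(reveal_atoms & (atoms == mask_id),
                            atom_logits.argmax(dim=-1), atoms)
        bonds = torch.where(reveal_bonds & (bonds == mask_id),
                            bond_logits.argmax(dim=-1), bonds)
    return atoms, bonds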

Figure 3: Examples of molecular generation process using MELD on different molecular structures.

Citation


@misc{seo2025learningflexibleforwardtrajectories,
    title={Learning Flexible Forward Trajectories for Masked Molecular Diffusion}, 
    author={Hyunjin Seo and Taewon Kim and Sihyun Yu and SungSoo Ahn},
    year={2025},
    eprint={2505.16790},
    archivePrefix={arXiv},
    primaryClass={cs.LG},
    url={https://arxiv.org/abs/2505.16790}
}