
Architecture Overview

Chemeleon2 generates crystal structures with a three-stage pipeline that combines a variational autoencoder (VAE), a latent diffusion model (LDM), and reinforcement learning (RL).
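At sampling time the stages can be pictured as one loop: draw Gaussian noise in the VAE's latent space, denoise it with the LDM, then decode the result. The sketch below is a minimal, framework-free illustration of that flow under stated assumptions: `generate`, `denoise_step` and `decode` are hypothetical names, not Chemeleon2's API; the denoiser and decoder are passed in as stand-in callables; and the RL stage is not shown because it is assumed here to act by updating the denoiser's weights during fine-tuning rather than adding a step at sampling time.

```python
import random
from typing import Callable, Dict, List

Latent = List[float]


def generate(
    denoise_step: Callable[[Latent, int], Latent],  # hypothetical stand-in for one LDM reverse step
    decode: Callable[[Latent], Dict],                # hypothetical stand-in for the VAE decoder
    latent_dim: int,
    num_steps: int,
    seed: int = 0,
) -> Dict:
    """Sample a latent from N(0, I), denoise it step by step, then decode it."""
    rng = random.Random(seed)
    # Start from pure Gaussian noise z_T in the latent space.
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    # Reverse diffusion: walk t = T-1, ..., 0, refining the latent each step.
    for t in reversed(range(num_steps)):
        z = denoise_step(z, t)
    # Map the final latent z_0 back to a structure representation.
    return decode(z)


# Toy usage: a "denoiser" that shrinks the latent and a "decoder" that reports its size.
toy = generate(
    denoise_step=lambda z, t: [0.5 * x for x in z],
    decode=lambda z: {"num_latent_dims": len(z)},
    latent_dim=8,
    num_steps=10,
)
print(toy)  # {'num_latent_dims': 8}
```

Passing the stages in as callables keeps the sketch independent of any particular network; in the real modules these would be trained Lightning components operating on tensors.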

Pipeline Overview

Module Responsibilities

| Module | Purpose | Key Components |
|---|---|---|
| vae_module | Encode/decode crystal structures | Transformer encoder, Transformer decoder |
| ldm_module | Diffusion-based generation | DiT denoiser, Gaussian diffusion |
| rl_module | Reward-guided fine-tuning | GRPO algorithm, Reward components |
| data | Data loading and batching | CrystalBatch, MPDataset |
| utils | Metrics and utilities | Metrics, Featurizer, Visualize |
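The vae_module row relies on the usual VAE mechanics. As a hedged sketch, assuming a diagonal-Gaussian posterior (a standard choice, not confirmed here for Chemeleon2), the encoder's outputs `mu` and `logvar` would be turned into a latent sample with the reparameterization trick and regularized with a closed-form KL term toward N(0, I). The function names are hypothetical, and a real implementation would operate on tensors rather than Python lists.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], logvar: List[float], rng: random.Random) -> List[float]:
    """Draw z = mu + sigma * eps with eps ~ N(0, 1), where sigma = exp(logvar / 2)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: List[float], logvar: List[float]) -> float:
    """Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, logvar))


rng = random.Random(0)
mu, logvar = [0.2, -0.1, 0.0], [-1.0, 0.0, 0.5]   # hypothetical encoder outputs
z = reparameterize(mu, logvar, rng)               # latent that would be passed to the decoder
kl = kl_to_standard_normal(mu, logvar)            # regularizer added to the reconstruction loss
```

Writing the sample as `mu + sigma * eps` moves the randomness into `eps`, so in an autograd framework gradients can flow back through `mu` and `logvar` to the encoder.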

Data Flow

  1. Training Data: Crystal structures from the MP-20 and Alex-MP-20 datasets (drawn from the Materials Project; Alex-MP-20 also includes structures from the Alexandria database)

  2. Encoding: VAE converts structures to continuous latent vectors

  3. Diffusion: LDM learns to denoise in latent space

  4. RL Optimization: GRPO maximizes reward signals from generated structures
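The core idea of GRPO (Group Relative Policy Optimization) is to score each generated sample relative to the other samples in its group instead of against a learned value function. The sketch below shows only that advantage computation. How Chemeleon2 forms groups and normalizes rewards is not stated here, so treat `group_relative_advantages` as a hypothetical helper; it uses the population standard deviation, while some implementations use the sample standard deviation instead.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-8) -> List[float]:
    """Normalize each sample's reward against its group's mean and standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards against a zero-variance group
    # Above-average samples get positive advantages, below-average ones negative.
    return [(r - mean) / (std + eps) for r in rewards]


# Rewards for a hypothetical group of four structures generated under the same conditions.
advantages = group_relative_advantages([0.1, 0.4, 0.4, 0.9])
```

In GRPO these advantages then weight the log-probability ratio of each sample inside a clipped, PPO-style objective, so the model is pushed toward the structures that scored better than their group average.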

Directory Structure

src/
├── vae_module/          # Variational Autoencoder
│   ├── vae_module.py    # Main VAE Lightning module
│   ├── encoders/        # Encoder architectures
│   └── decoders/        # Decoder architectures
├── ldm_module/          # Latent Diffusion Model
│   ├── ldm_module.py    # Main LDM Lightning module
│   ├── denoisers/       # DiT denoiser
│   └── diffusion/       # Diffusion utilities
├── rl_module/           # Reinforcement Learning
│   ├── rl_module.py     # Main RL Lightning module
│   ├── reward.py        # Reward aggregation
│   └── components.py    # Reward components
├── data/                # Data loading
│   ├── datamodule.py    # Lightning DataModule
│   └── schema.py        # CrystalBatch definition
└── utils/               # Utilities
    ├── metrics.py       # Evaluation metrics
    └── featurizer.py    # Structure featurization
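The split between reward.py and components.py suggests that individual reward terms are combined into one scalar reward per structure. A weighted sum is one common way to do that; the sketch below assumes that scheme, and `RewardComponent`, `aggregate_reward` and the toy components are hypothetical illustrations rather than the module's actual classes or reward terms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence


@dataclass
class RewardComponent:
    """One hypothetical reward term, e.g. a validity or property score."""
    name: str
    fn: Callable[[Dict], float]  # scores a single generated structure
    weight: float = 1.0


def aggregate_reward(structure: Dict, components: Sequence[RewardComponent]) -> float:
    """Combine component scores into one scalar as a weighted sum."""
    return sum(c.weight * c.fn(structure) for c in components)


# Toy components operating on a dict that stands in for a generated structure.
components = [
    RewardComponent("is_valid", lambda s: 1.0 if s.get("valid") else 0.0, weight=2.0),
    RewardComponent("size_penalty", lambda s: -0.01 * s.get("num_atoms", 0)),
]
print(aggregate_reward({"valid": True, "num_atoms": 10}, components))
```

Keeping each term as a separate object makes it easy to add, remove or reweight objectives without touching the RL training loop.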

Learn More