Chemeleon2 implements a three-stage generative pipeline for crystal structures that combines a variational autoencoder (VAE), a latent diffusion model (LDM), and reinforcement learning (RL).
Pipeline Overview
Module Responsibilities
| Module | Purpose | Key Components |
|---|---|---|
| vae_module | Encode/decode crystal structures | Transformer encoder, Transformer decoder |
| ldm_module | Diffusion-based generation | DiT denoiser, Gaussian diffusion |
| rl_module | Reward-guided fine-tuning | GRPO algorithm, Reward components |
| data | Data loading and batching | CrystalBatch, MPDataset |
| utils | Metrics and utilities | Metrics, Featurizer, Visualize |
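To make the rl_module row more concrete, here is a minimal sketch of two ideas it names: combining reward components into one score, and the group-relative normalization that GRPO is generally built on. The function names, the weighted-sum aggregation, and the component names `stability` and `novelty` are illustrative assumptions, not the actual Chemeleon2 API.

```python
from __future__ import annotations

from statistics import mean, pstdev


def aggregate_reward(components: dict[str, float], weights: dict[str, float]) -> float:
    """Combine named reward components into one scalar.

    Assumption: a plain weighted sum; the real aggregation may differ.
    """
    return sum(weights[name] * value for name, value in components.items())


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize rewards within one group of samples: (r - mean) / (std + eps).

    Uses the population standard deviation; implementations vary on this detail.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical component scores for three generated structures.
weights = {"stability": 0.75, "novelty": 0.25}
group = [
    {"stability": 0.9, "novelty": 0.1},
    {"stability": 0.4, "novelty": 0.8},
    {"stability": 0.2, "novelty": 0.3},
]
rewards = [aggregate_reward(c, weights) for c in group]
advantages = group_relative_advantages(rewards)
```

Samples that score above the group mean receive a positive advantage and are reinforced; samples below it are discouraged. No separate value network is needed for this baseline.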
Data Flow
1. Training Data: Crystal structures from the Materials Project (MP-20, Alex-MP-20)
2. Encoding: The VAE converts structures to continuous latent vectors
3. Diffusion: The LDM learns to denoise in latent space
4. RL Optimization: GRPO maximizes reward signals from generated structures
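The diffusion step above can be illustrated with the standard Gaussian forward process applied to a latent vector. This is a sketch under assumptions: the linear beta schedule, the function names, and the plain-list latent are chosen for illustration and are not taken from the Chemeleon2 code. In the common noise-prediction setup, a denoiser would be trained to recover `noise` from `z_t` and `t`.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (an assumed schedule)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alpha_bar(betas):
    """Running product of (1 - beta_s) for s = 0..t."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(z0, t, alpha_bars, rng):
    """Sample z_t = sqrt(a_t) * z0 + sqrt(1 - a_t) * noise, with a_t = alpha_bars[t]."""
    a_t = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    z_t = [math.sqrt(a_t) * z + math.sqrt(1.0 - a_t) * e for z, e in zip(z0, noise)]
    return z_t, noise


# A stand-in for a VAE latent; real latents come from the encoder.
alpha_bars = cumulative_alpha_bar(linear_beta_schedule(1000))
z_t, noise = q_sample([0.5, -1.0, 2.0], t=500, alpha_bars=alpha_bars, rng=random.Random(0))
```

Because noise is added to the latent vector rather than to atomic coordinates directly, the diffusion model only has to operate in the continuous space the VAE defines; the decoder maps denoised latents back to structures.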
Directory Structure
src/
├── vae_module/           # Variational Autoencoder
│   ├── vae_module.py     # Main VAE Lightning module
│   ├── encoders/         # Encoder architectures
│   └── decoders/         # Decoder architectures
├── ldm_module/           # Latent Diffusion Model
│   ├── ldm_module.py     # Main LDM Lightning module
│   ├── denoisers/        # DiT denoiser
│   └── diffusion/        # Diffusion utilities
├── rl_module/            # Reinforcement Learning
│   ├── rl_module.py      # Main RL Lightning module
│   ├── reward.py         # Reward aggregation
│   └── components.py     # Reward components
├── data/                 # Data loading
│   ├── datamodule.py     # Lightning DataModule
│   └── schema.py         # CrystalBatch definition
└── utils/                # Utilities
    ├── metrics.py        # Evaluation metrics
    └── featurizer.py     # Structure featurization
Learn More
VAE Module - Crystal structure encoding
LDM Module - Diffusion-based generation
RL Module - Reward-guided fine-tuning
Data Pipeline - Data loading and utilities