The Latent Diffusion Model (LDM) is the second stage of the Chemeleon2 pipeline. It learns to generate crystal structures by denoising in the VAE’s latent space.
What LDM Does¶
Rather than operating on crystal structures directly, the LDM works in the latent space produced by the trained VAE: starting from random noise, it iteratively denoises toward a valid latent, which the VAE then decodes into a crystal structure. For architectural details, see LDM Module.
Key components (see src/ldm_module/ldm_module.py):
- Diffusion Transformer (DiT): Predicts noise at each timestep
- DDPM/DDIM Sampling: Iteratively denoises random noise (see the sketch below)
- Conditioning: Optional guidance from composition or properties
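To make the sampling component concrete, here is a minimal DDPM-style ancestral sampling loop in PyTorch. It is only a sketch: the names dit, alphas_cumprod, and latent_shape are placeholders, not the Chemeleon2 API, and the real module adds details (DDIM sampling, conditioning, noise schedules) that are omitted here.

```python
import torch

@torch.no_grad()
def ddpm_sample(dit, alphas_cumprod, latent_shape, device="cpu"):
    """Minimal DDPM ancestral sampling sketch (placeholder names, not the Chemeleon2 API).

    dit(z_t, t) is assumed to predict the noise added at timestep t;
    alphas_cumprod is a 1-D tensor of cumulative products of (1 - beta_t).
    """
    T = alphas_cumprod.shape[0]
    prev = torch.cat([torch.ones(1, device=device), alphas_cumprod[:-1]])
    alphas = alphas_cumprod / prev                        # per-step alpha_t
    z = torch.randn(latent_shape, device=device)          # start from pure Gaussian noise
    for t in reversed(range(T)):
        t_batch = torch.full((latent_shape[0],), t, device=device, dtype=torch.long)
        eps = dit(z, t_batch)                              # DiT predicts the noise at this step
        a_t, abar_t = alphas[t], alphas_cumprod[t]
        # Posterior mean: subtract the predicted noise contribution and rescale
        z = (z - (1 - a_t) / torch.sqrt(1 - abar_t) * eps) / torch.sqrt(a_t)
        if t > 0:
            z = z + torch.sqrt(1 - a_t) * torch.randn_like(z)  # inject sampling noise
    return z  # denoised latent, ready to be decoded by the VAE
```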
Prerequisites¶
LDM training requires a trained VAE checkpoint. The VAE encodes crystal structures into the latent space where the LDM operates.
```yaml
# In config files
ldm_module:
  vae_ckpt_path: ${hub:mp_20_vae}  # Or use local path
```

```bash
# In CLI
python src/train_ldm.py ldm_module.vae_ckpt_path='${hub:mp_20_vae}'
```

See Checkpoint Management for available checkpoints.
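During LDM training, the VAE referenced by vae_ckpt_path supplies the latent space: structures are encoded into latents for the diffusion loss, and generated latents are decoded back into structures. In standard latent-diffusion setups the VAE is kept frozen at this stage. The snippet below illustrates that idea with a toy stand-in; TinyVAE and its methods are placeholders, not Chemeleon2 classes.

```python
# Sketch of the VAE's role during LDM training (TinyVAE is a placeholder, not Chemeleon2 code).
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Stand-in for the trained VAE: maps structures to latents and back."""
    def __init__(self, in_dim=32, latent_dim=8):
        super().__init__()
        self.encoder = nn.Linear(in_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, in_dim)

    def encode(self, x):
        return self.encoder(x)

    def decode(self, z):
        return self.decoder(z)

vae = TinyVAE()
# In the real pipeline the weights come from vae_ckpt_path; here we only show the freezing.
vae.eval()
for p in vae.parameters():
    p.requires_grad_(False)     # the VAE is typically kept frozen while the LDM trains

x = torch.randn(4, 32)          # stand-in for a batch of crystal-structure features
with torch.no_grad():
    z = vae.encode(x)           # latents the LDM learns to denoise
print(z.shape)                  # torch.Size([4, 8])
```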
Quick Start¶
```bash
# Train unconditional LDM (src/train_ldm.py)
python src/train_ldm.py experiment=mp_20/ldm_null
```
- Training script: src/train_ldm.py
- Example config: configs/experiment/mp_20/ldm_null.yaml
Training Modes¶
Unconditional Generation¶
Generate diverse structures without any guidance:
```bash
python src/train_ldm.py experiment=mp_20/ldm_null
```
Composition-Conditioned Generation¶
Guide generation with target chemical composition:
```bash
python src/train_ldm.py experiment=mp_20/ldm_composition
```
Property-Conditioned Generation¶
Guide generation with target property values (e.g., band gap):
```bash
python src/train_ldm.py experiment=alex_mp_20_bandgap/ldm_bandgap
```
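A common way diffusion models turn such conditioning into guidance at sampling time is classifier-free guidance: the condition is randomly dropped during training, and at inference the conditional and unconditional noise predictions are blended. Whether and how Chemeleon2 applies this is defined by the experiment configs; the function below is only a generic sketch, and every name in it is a placeholder.

```python
import torch

def guided_noise(dit, z_t, t, cond, guidance_scale=2.0):
    """Classifier-free guidance sketch (generic; dit and cond are placeholders).

    Blends the unconditional and conditional noise predictions; a scale of 1.0
    recovers the purely conditional prediction, larger values push harder
    toward the target composition or property.
    """
    eps_uncond = dit(z_t, t, cond=None)   # unconditional (null-condition) prediction
    eps_cond = dit(z_t, t, cond=cond)     # composition- or property-conditioned prediction
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```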
Training Commands¶
Basic Training¶
```bash
# Use experiment config
python src/train_ldm.py experiment=mp_20/ldm_null

# Override checkpoint path
python src/train_ldm.py experiment=mp_20/ldm_null \
    ldm_module.vae_ckpt_path=ckpts/my_vae.ckpt

# Override training parameters
python src/train_ldm.py experiment=mp_20/ldm_null \
    trainer.max_epochs=500 \
    data.batch_size=64
```
Advanced: LoRA Fine-tuning¶
Fine-tune a pre-trained LDM with Low-Rank Adaptation (LoRA):
```bash
python src/train_ldm.py experiment=alex_mp_20_bandgap/ldm_bandgap_lora
```
LoRA enables efficient fine-tuning by only updating low-rank adapter weights instead of all model parameters. This approach:
- Reduces memory usage: only the adapter weights require gradients
- Speeds up training: fewer parameters to update
- Prevents catastrophic forgetting: base model weights remain frozen
Use LoRA when fine-tuning a pre-trained LDM on new datasets or conditions.
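The snippet below shows the core of the technique on a single linear layer: the base weight is frozen and only a low-rank update B·A (scaled by alpha/rank) is trained. It is a generic LoRA sketch, not Chemeleon2's adapter code; which layers get adapters and at what rank is set by the LoRA experiment config.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA adapter around a frozen linear layer (illustrative, not Chemeleon2 code)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                                   # base weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection A
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection B
        nn.init.zeros_(self.lora_b.weight)                            # start as a zero update
        self.scaling = alpha / rank

    def forward(self, x):
        # y = W x + (alpha / rank) * B A x ; only A and B receive gradients
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the adapter parameters (2 * 768 * 8 = 12288) are trainable
```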
Configuration¶
Key Hyperparameters¶
| Parameter | Default | Description |
|---|---|---|
| num_diffusion_steps | 1000 | Number of diffusion timesteps |
| hidden_dim | 768 | DiT hidden dimension (dit_b config) |
| num_layers | 12 | Number of DiT layers (depth) |
| num_heads | 12 | Number of attention heads |
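In YAML form, these values sit under the ldm_module config group, roughly as below. The first two keys match the CLI overrides shown in the next example; the exact nesting of num_layers and num_heads in the shipped configs may differ, so treat this as a sketch.

```yaml
# Sketch of the LDM hyperparameters from the table above (nesting is approximate).
ldm_module:
  num_diffusion_steps: 1000
  hidden_dim: 768   # dit_b configuration
  num_layers: 12    # DiT depth
  num_heads: 12
```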
Example Config Override¶
```bash
python src/train_ldm.py experiment=mp_20/ldm_null \
    ldm_module.num_diffusion_steps=500 \
    ldm_module.hidden_dim=768
```
Available Experiments¶
| Experiment | Dataset | Condition | Description |
|---|---|---|---|
| mp_20/ldm_null | MP-20 | None | Unconditional generation |
| mp_20/ldm_composition | MP-20 | Composition | Composition-guided |
| alex_mp_20_bandgap/ldm_bandgap | Alex MP-20 | Band gap | Property-guided |
| alex_mp_20_bandgap/ldm_bandgap_lora | Alex MP-20 | Band gap | LoRA fine-tuning |
Training Tips¶
Monitoring¶
Key metrics to watch in WandB:
- train/loss: Diffusion loss (should decrease)
- val/loss: Validation loss (check for overfitting)
Typical Training¶
- Duration: Up to 5000 epochs (default), with early stopping after 200 epochs without improvement
- Batch size: 256 (default); can be reduced to 32-128 for limited GPU memory
- Learning rate: 1e-4 (default)
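If GPU memory is tight, the batch size and epoch budget can be adjusted from the command line in the same way as the overrides shown earlier. The early-stopping patience key below is an assumption about the callback config name and should be verified against the project's configs.

```bash
# Batch size and epoch overrides follow the pattern used earlier on this page.
# callbacks.early_stopping.patience is an assumed key name; check the project configs.
python src/train_ldm.py experiment=mp_20/ldm_null \
    data.batch_size=64 \
    trainer.max_epochs=5000 \
    callbacks.early_stopping.patience=200
```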
Next Steps¶
After training the LDM:
- Note the checkpoint path
- Option A: Proceed to RL Training to fine-tune with rewards
- Option B: Use it directly for generation (see Evaluation)