The Latent Diffusion Model (LDM) is the second stage of the Chemeleon2 pipeline. It learns to generate crystal structures by denoising in the VAE’s latent space.
What LDM Does¶
Rather than operating on crystal structures directly, the LDM works in the latent space produced by the trained VAE: starting from random noise, it iteratively denoises toward a valid latent, which the VAE then decodes into a crystal structure. For architectural details, see LDM Module.
Key components (see src/ldm_module/ldm_module.py):
- Diffusion Transformer (DiT): Predicts noise at each timestep
- DDPM/DDIM Sampling: Iteratively denoises random noise (see the sketch below)
- Conditioning: Optional guidance from composition or properties
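To make the sampling component concrete, here is a minimal DDPM-style ancestral sampling loop in PyTorch. It is only a sketch: the names dit, alphas_cumprod, and latent_shape are placeholders, not the Chemeleon2 API, and the real module adds details (DDIM sampling, conditioning, noise schedules) that are omitted here.

```python
import torch

@torch.no_grad()
def ddpm_sample(dit, alphas_cumprod, latent_shape, device="cpu"):
    """Minimal DDPM ancestral sampling sketch (placeholder names, not the Chemeleon2 API).

    dit(z_t, t) is assumed to predict the noise added at timestep t;
    alphas_cumprod is a 1-D tensor of cumulative products of (1 - beta_t).
    """
    T = alphas_cumprod.shape[0]
    prev = torch.cat([torch.ones(1, device=device), alphas_cumprod[:-1]])
    alphas = alphas_cumprod / prev                        # per-step alpha_t
    z = torch.randn(latent_shape, device=device)          # start from pure Gaussian noise
    for t in reversed(range(T)):
        t_batch = torch.full((latent_shape[0],), t, device=device, dtype=torch.long)
        eps = dit(z, t_batch)                              # DiT predicts the noise at this step
        a_t, abar_t = alphas[t], alphas_cumprod[t]
        # Posterior mean: subtract the predicted noise contribution and rescale
        z = (z - (1 - a_t) / torch.sqrt(1 - abar_t) * eps) / torch.sqrt(a_t)
        if t > 0:
            z = z + torch.sqrt(1 - a_t) * torch.randn_like(z)  # inject sampling noise
    return z  # denoised latent, ready to be decoded by the VAE
```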
Prerequisites¶
LDM training requires a trained VAE checkpoint. The VAE encodes crystal structures into the latent space where the LDM operates.
```yaml
# In config files
ldm_module:
  vae_ckpt_path: ${hub:mp_20_vae}  # Or use local path
```

```bash
# In CLI
python src/train_ldm.py ldm_module.vae_ckpt_path='${hub:mp_20_vae}'
```

See Checkpoint Management for available checkpoints.
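During LDM training, the VAE referenced by vae_ckpt_path supplies the latent space: structures are encoded into latents for the diffusion loss, and generated latents are decoded back into structures. In standard latent-diffusion setups the VAE is kept frozen at this stage. The snippet below illustrates that idea with a toy stand-in; TinyVAE and its methods are placeholders, not Chemeleon2 classes.

```python
# Sketch of the VAE's role during LDM training (TinyVAE is a placeholder, not Chemeleon2 code).
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Stand-in for the trained VAE: maps structures to latents and back."""
    def __init__(self, in_dim=32, latent_dim=8):
        super().__init__()
        self.encoder = nn.Linear(in_dim, latent_dim)
        self.decoder = nn.Linear(latent_dim, in_dim)

    def encode(self, x):
        return self.encoder(x)

    def decode(self, z):
        return self.decoder(z)

vae = TinyVAE()
# In the real pipeline the weights come from vae_ckpt_path; here we only show the freezing.
vae.eval()
for p in vae.parameters():
    p.requires_grad_(False)     # the VAE is typically kept frozen while the LDM trains

x = torch.randn(4, 32)          # stand-in for a batch of crystal-structure features
with torch.no_grad():
    z = vae.encode(x)           # latents the LDM learns to denoise
print(z.shape)                  # torch.Size([4, 8])
```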
Quick Start¶
```bash
# Train unconditional LDM (src/train_ldm.py)
python src/train_ldm.py experiment=mp_20/ldm_null
```
- Training script: src/train_ldm.py
- Example config: configs/experiment/mp_20/ldm_null.yaml
Training Modes¶
Unconditional Generation¶
Generate diverse structures without any guidance:
```bash
python src/train_ldm.py experiment=mp_20/ldm_null
```
Composition-Conditioned Generation¶
Guide generation with target chemical composition:
```bash
python src/train_ldm.py experiment=mp_20/ldm_composition
```
Property-Conditioned Generation¶
Guide generation with target property values (e.g., band gap):
```bash
python src/train_ldm.py experiment=alex_mp_20_bandgap/ldm_bandgap
```
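A common way diffusion models turn such conditioning into guidance at sampling time is classifier-free guidance: the condition is randomly dropped during training, and at inference the conditional and unconditional noise predictions are blended. Whether and how Chemeleon2 applies this is defined by the experiment configs; the function below is only a generic sketch, and every name in it is a placeholder.

```python
import torch

def guided_noise(dit, z_t, t, cond, guidance_scale=2.0):
    """Classifier-free guidance sketch (generic; dit and cond are placeholders).

    Blends the unconditional and conditional noise predictions; a scale of 1.0
    recovers the purely conditional prediction, larger values push harder
    toward the target composition or property.
    """
    eps_uncond = dit(z_t, t, cond=None)   # unconditional (null-condition) prediction
    eps_cond = dit(z_t, t, cond=cond)     # composition- or property-conditioned prediction
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)
```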
Training Commands¶
Basic Training¶
```bash
# Use experiment config
python src/train_ldm.py experiment=mp_20/ldm_null

# Override checkpoint path
python src/train_ldm.py experiment=mp_20/ldm_null \
    ldm_module.vae_ckpt_path=ckpts/my_vae.ckpt

# Override training parameters
python src/train_ldm.py experiment=mp_20/ldm_null \
    trainer.max_epochs=500 \
    data.batch_size=64
```
Advanced: LoRA Fine-tuning¶
Fine-tune a pre-trained LDM with Low-Rank Adaptation (LoRA):
```bash
python src/train_ldm.py experiment=alex_mp_20_bandgap/ldm_bandgap_lora
```
LoRA enables efficient fine-tuning by only updating low-rank adapter weights instead of all model parameters. This approach:
- Reduces memory usage: only the adapter weights require gradients
- Speeds up training: fewer parameters to update
- Prevents catastrophic forgetting: base model weights remain frozen
Use LoRA when fine-tuning a pre-trained LDM on new datasets or conditions.
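The snippet below shows the core of the technique on a single linear layer: the base weight is frozen and only a low-rank update B·A (scaled by alpha/rank) is trained. It is a generic LoRA sketch, not Chemeleon2's adapter code; which layers get adapters and at what rank is set by the LoRA experiment config.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Generic LoRA adapter around a frozen linear layer (illustrative, not Chemeleon2 code)."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                                   # base weights stay frozen
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)   # down-projection A
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)  # up-projection B
        nn.init.zeros_(self.lora_b.weight)                            # start as a zero update
        self.scaling = alpha / rank

    def forward(self, x):
        # y = W x + (alpha / rank) * B A x ; only A and B receive gradients
        return self.base(x) + self.scaling * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # only the adapter parameters (2 * 768 * 8 = 12288) are trainable
```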
Configuration¶
Key Hyperparameters¶
| Parameter | Default | Description |
|---|---|---|
| num_diffusion_steps | 1000 | Number of diffusion timesteps |
| hidden_dim | 768 | DiT hidden dimension (dit_b config) |
| num_layers | 12 | Number of DiT layers (depth) |
| num_heads | 12 | Number of attention heads |
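In YAML form, these values sit under the ldm_module config group, roughly as below. The first two keys match the CLI overrides shown in the next example; the exact nesting of num_layers and num_heads in the shipped configs may differ, so treat this as a sketch.

```yaml
# Sketch of the LDM hyperparameters from the table above (nesting is approximate).
ldm_module:
  num_diffusion_steps: 1000
  hidden_dim: 768   # dit_b configuration
  num_layers: 12    # DiT depth
  num_heads: 12
```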
Example Config Override¶
```bash
python src/train_ldm.py experiment=mp_20/ldm_null \
    ldm_module.num_diffusion_steps=500 \
    ldm_module.hidden_dim=768
```
Available Experiments¶
| Experiment | Dataset | Condition | Description |
|---|---|---|---|
| mp_20/ldm_null | MP-20 | None | Unconditional generation |
| mp_20/ldm_composition | MP-20 | Composition | Composition-guided |
| alex_mp_20_bandgap/ldm_bandgap | Alex MP-20 | Band gap | Property-guided |
| alex_mp_20_bandgap/ldm_bandgap_lora | Alex MP-20 | Band gap | LoRA fine-tuning |
Training Tips¶
Monitoring¶
Key metrics to watch in WandB:
- train/loss: Diffusion loss (should decrease)
- val/loss: Validation loss (check for overfitting)
Typical Training¶
- Duration: Up to 5000 epochs (default), with early stopping after 200 epochs without improvement
- Batch size: 256 (default); can be reduced to 32-128 for limited GPU memory
- Learning rate: 1e-4 (default)
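If GPU memory is tight, the batch size and epoch budget can be adjusted from the command line in the same way as the overrides shown earlier. The early-stopping patience key below is an assumption about the callback config name and should be verified against the project's configs.

```bash
# Batch size and epoch overrides follow the pattern used earlier on this page.
# callbacks.early_stopping.patience is an assumed key name; check the project configs.
python src/train_ldm.py experiment=mp_20/ldm_null \
    data.batch_size=64 \
    trainer.max_epochs=5000 \
    callbacks.early_stopping.patience=200
```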
Next Steps¶
After training the LDM:
- Note the checkpoint path
- Option A: Proceed to RL Training to fine-tune with rewards
- Option B: Use it directly for generation (see Evaluation)