Learn to configure the multi-objective reward used in the Chemeleon2 paper for generating novel, stable, and diverse crystal structures.
Overview¶
The DNG reward combines four complementary objectives:
| Component | Purpose |
|---|---|
| CreativityReward | Generate unique and novel structures |
| EnergyReward | Ensure thermodynamic stability |
| StructureDiversityReward | Explore varied structure prototypes |
| CompositionDiversityReward | Explore chemical composition space |
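Conceptually, the total reward is a weighted sum of the (optionally normalized) component rewards. A minimal sketch of this combination with hypothetical names; the actual `ReinforceReward` implementation may differ, and `normalize_fn: norm` is assumed here to mean min-max scaling within the batch:

```python
import torch

def combine_rewards(components, eps=1e-4):
    """Hypothetical sketch: weighted sum of per-component reward tensors.

    components: list of (rewards, weight, normalize) triples mirroring the
    `weight` / `normalize_fn` fields in the configuration shown below.
    """
    total = torch.zeros_like(components[0][0])
    for rewards, weight, normalize in components:
        if normalize:  # assumption: `norm` min-max scales within the batch
            rewards = (rewards - rewards.min()) / (rewards.max() - rewards.min() + eps)
        total = total + weight * rewards
    return total
```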
Prerequisites¶
- MP-20 reference dataset - Required for evaluation metrics (novelty, diversity, etc.)
The DNG Configuration¶
Reference file: `configs/custom_reward/rl_dng.yaml`

```yaml
# @package _global_
# GRPO for DNG on MP-20

data:
  batch_size: 5

rl_module:
  ldm_ckpt_path: ${hub:mp_20_ldm_base}
  vae_ckpt_path: ${hub:mp_20_vae}
  rl_configs:
    num_group_samples: 64
    group_reward_norm: true
  reward_fn:
    _target_: src.rl_module.reward.ReinforceReward
    normalize_fn: std
    eps: 1e-4
    reference_dataset: mp-20
    components:
      - _target_: src.rl_module.components.CreativityReward
        weight: 1.0
        normalize_fn: null
      - _target_: src.rl_module.components.EnergyReward
        weight: 1.0
        normalize_fn: norm
      - _target_: src.rl_module.components.StructureDiversityReward
        weight: 0.1
        normalize_fn: norm
      - _target_: src.rl_module.components.CompositionDiversityReward
        weight: 1.0
        normalize_fn: norm

logger:
  wandb:
    name: rl_dng_grpo
```

Component Deep Dive¶
The DNG reward uses built-in reward components provided by Chemeleon2. For a complete list of available components, see RL Module - Built-in Reward Components. These components are ready to use without additional implementation.
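Each entry carries a Hydra `_target_` key, so a component resolves directly from its config block. A minimal sketch of that mechanism (standard Hydra behavior; the surrounding plumbing in Chemeleon2 may differ):

```python
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "_target_": "src.rl_module.components.CreativityReward",
    "weight": 1.0,
    "normalize_fn": None,
})
# instantiate() imports the `_target_` class and passes the remaining
# keys as keyword arguments: CreativityReward(weight=1.0, normalize_fn=None)
component = instantiate(cfg)
```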
CreativityReward¶
Purpose: Reward structures that are both unique (not duplicated in batch) and novel (not in training set).
How it works:
```python
for i, gen_structure in enumerate(gen_structures):
    u, v = metrics_results["unique"][i], metrics_results["novel"][i]
    if u and v:
        r = 1.0  # Fully creative: unique AND novel
    elif not u and not v:
        r = 0.0  # Not creative: duplicate of existing
    else:
        # Edge case: use AMD distance as continuous measure
        amds = structures_to_amd([gen_structure] + matching_refs, 100)
        dists = amd.AMD_cdist(amds, amds)[0]
        r = dists[dists > 0].min()
```

Configuration:

- `weight: 1.0` - Equal importance with other objectives
- `normalize_fn: null` - Already in [0, 1] range
EnergyReward¶
Purpose: Penalize structures with high energy above the convex hull.
How it works:
- Computes formation energy using MACE-torch
- Compares to the Materials Project convex hull
- Returns negative energy (minimization → maximization)

```python
r_energy = torch.as_tensor(metrics_results["e_above_hull"]).float()
r_energy = r_energy.nan_to_num(nan=1.0)      # Handle failed calculations
r_energy = r_energy.clamp(min=0.0, max=1.0)  # Clip to reasonable range
r_energy = r_energy * -1.0                   # Negative for minimization
```

Configuration:

- `weight: 1.0` - Strong emphasis on stability
- `normalize_fn: norm` - Scale within batch
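The exact energy pipeline is internal to Chemeleon2, but the idea can be sketched with mace-torch and pymatgen. This rough sketch skips structure relaxation and ignores energy-correction compatibility between MACE and Materials Project reference energies; `reference_entries` is a hypothetical list of phase-diagram entries for the relevant chemical system:

```python
from mace.calculators import mace_mp
from pymatgen.analysis.phase_diagram import PDEntry, PhaseDiagram
from pymatgen.io.ase import AseAtomsAdaptor

def sketch_e_above_hull(structure, reference_entries):
    """Rough sketch: energy above the convex hull for a pymatgen Structure."""
    atoms = AseAtomsAdaptor.get_atoms(structure)   # pymatgen -> ASE
    atoms.calc = mace_mp()                         # MACE foundation model
    energy = atoms.get_potential_energy()          # total energy in eV
    entry = PDEntry(structure.composition, energy)
    diagram = PhaseDiagram(reference_entries)      # hull from reference entries
    return diagram.get_e_above_hull(entry)         # eV/atom above the hull
```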
StructureDiversityReward¶
Purpose: Encourage diverse crystal geometries using Maximum Mean Discrepancy (MMD).
How it works:
- Featurizes structures (lattice parameters, atomic positions)
- Computes MMD between the generated batch and a reference distribution
- Rewards structures that differ from existing patterns

Configuration:

- `weight: 0.1` - Lower weight prevents over-diversification
- `normalize_fn: norm` - Scale within batch

Why lower weight? Too much structure diversity can lead to:

- Physically unrealistic geometries
- Sacrificing stability for novelty
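For reference, here is a minimal sketch of a squared MMD with an RBF kernel between two feature matrices. The featurization and kernel used inside `StructureDiversityReward` are internal to Chemeleon2, so treat this only as an illustration of the statistic:

```python
import torch

def mmd_rbf(x, y, sigma=1.0):
    """Biased estimate of squared MMD between feature sets x (n, d) and y (m, d)."""
    def kernel(a, b):
        sq_dists = torch.cdist(a, b).pow(2)  # pairwise squared distances
        return torch.exp(-sq_dists / (2 * sigma**2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()
```

A higher MMD means the generated features sit farther from the reference distribution, which this component converts into a reward.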
CompositionDiversityReward¶
Purpose: Encourage exploration of chemical composition space.
How it works:
- Extracts element-wise composition features
- Computes MMD between generated and reference compositions
- Rewards deviating from common compositions

Configuration:

- `weight: 1.0` - Strong emphasis on chemical diversity
- `normalize_fn: norm` - Scale within batch
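For intuition, a composition feature can be as simple as a fixed-length vector of atomic fractions. A hypothetical featurization using pymatgen (the descriptors actually used by `CompositionDiversityReward` may be richer):

```python
import numpy as np
from pymatgen.core import Composition

def composition_vector(formula: str, n_elements: int = 103) -> np.ndarray:
    """Hypothetical sketch: atomic fractions indexed by atomic number."""
    vec = np.zeros(n_elements)
    for element, fraction in Composition(formula).fractional_composition.items():
        vec[element.Z - 1] = fraction
    return vec

# e.g. composition_vector("LiFePO4") is nonzero only at Li, Fe, P, and O
```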
Running DNG Training¶
```bash
# Standard DNG training (src/train_rl.py)
python src/train_rl.py custom_reward=rl_dng

# With custom hyperparameters
python src/train_rl.py custom_reward=rl_dng \
  rl_module.rl_configs.num_group_samples=128 \
  trainer.max_steps=2000

# Override checkpoint paths (e.g., use alex_mp_20 model)
python src/train_rl.py custom_reward=rl_dng \
  rl_module.ldm_ckpt_path='${hub:alex_mp_20_ldm_base}' \
  rl_module.vae_ckpt_path='${hub:alex_mp_20_vae}'
```

Monitoring Training¶
In WandB, watch these metrics:
| Metric | Description |
|---|---|
| train/reward | Mean reward from reward function (should increase) |
| val/reward | Validation reward |
| train/advantages | Normalized rewards used for policy gradient |
| train/kl_div | KL divergence from reference policy |
| train/entropy | Policy entropy |
| train/loss | Total policy loss |
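`train/advantages` reflects the group settings in the config (`num_group_samples: 64`, `group_reward_norm: true`): rewards are normalized within each group of samples generated from the same condition. A minimal sketch, assuming `normalize_fn: std` denotes standardization with the configured `eps`:

```python
import torch

def group_advantages(rewards: torch.Tensor,
                     num_group_samples: int = 64,
                     eps: float = 1e-4) -> torch.Tensor:
    """Sketch of GRPO-style advantages: standardize rewards within each group."""
    # assumes len(rewards) is a multiple of num_group_samples
    groups = rewards.view(-1, num_group_samples)
    mean = groups.mean(dim=1, keepdim=True)
    std = groups.std(dim=1, keepdim=True)
    return ((groups - mean) / (std + eps)).view(-1)
```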
Weight Tuning Guide¶
Adjust weights based on your priorities:
| Priority | CreativityReward | EnergyReward | StructureDiversity | CompositionDiversity |
|---|---|---|---|---|
| More novelty | ↑ 1.5 | ↓ 0.5 | 0.1 | 1.0 |
| More stability | 0.5 | ↑ 2.0 | 0.1 | 0.5 |
| More diversity | 1.0 | 0.5 | ↑ 0.5 | ↑ 1.5 |
| Balanced (default) | 1.0 | 1.0 | 0.1 | 1.0 |
```yaml
# Example: Prioritize stability
components:
  - _target_: src.rl_module.components.CreativityReward
    weight: 0.5
  - _target_: src.rl_module.components.EnergyReward
    weight: 2.0
    normalize_fn: norm
  - _target_: src.rl_module.components.StructureDiversityReward
    weight: 0.1
    normalize_fn: norm
  - _target_: src.rl_module.components.CompositionDiversityReward
    weight: 0.5
    normalize_fn: norm
```

Generating and Evaluating Samples¶
After training your DNG model, you can generate 10,000 structures and evaluate them against reference datasets to assess quality.
Generate Samples¶
Generate crystal structures using the trained RL model:
```bash
# Generate 10000 samples with batch size 2000
python src/sample.py \
  --ldm_ckpt_path=logs/train_rl/runs/<your-run>/checkpoints/last.ckpt \
  --num_samples=10000 \
  --batch_size=2000 \
  --output_dir=outputs/dng_samples
```

Evaluate Samples¶
The evaluation computes several quality metrics:
| Metric | Description |
|---|---|
| Unique | Structures not duplicated within generated set |
| Novel | Structures not found in reference dataset |
| E Above Hull | Energy above convex hull (stability measure) |
| Metastable/Stable | Thermodynamically viable structures |
| Composition Validity | Chemically valid compositions (via SMACT) |
| Structure Diversity | Inverse Fréchet distance for structure embeddings |
| Composition Diversity | Inverse Fréchet distance for composition embeddings |
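For intuition, the Unique and Novel checks can be sketched with pymatgen's `StructureMatcher` (the actual evaluator's matching tolerances and implementation may differ):

```python
from pymatgen.analysis.structure_matcher import StructureMatcher

matcher = StructureMatcher()

def is_novel(candidate, reference_structures) -> bool:
    """Sketch: a structure is novel if it matches nothing in the reference set."""
    return not any(matcher.fit(candidate, ref) for ref in reference_structures)
```

Run the full evaluation, which computes every metric in the table above, with: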
```bash
python src/evaluate.py \
  --model_path=logs/train_rl/runs/<your-run>/checkpoints/last.ckpt \
  --structure_path=outputs/dng_samples \
  --num_samples=10000 \
  --batch_size=2000 \
  --output_file=outputs/dng_samples/results.csv
```

Summary¶
The DNG reward configuration:
- Balances multiple objectives for well-rounded generation
- Prevents mode collapse with diversity rewards
- Ensures physical validity with an energy penalty
- Encourages exploration with a creativity bonus
Next Steps¶
- Predictor Reward - Property-targeted optimization
- Atomic Density - Simple custom reward example