
Tutorial: DNG (De Novo Generation) Reward

Learn to configure the multi-objective reward used in the Chemeleon2 paper for generating novel, stable, and diverse crystal structures.

Overview

The DNG reward combines four complementary objectives:

| Component | Purpose |
|---|---|
| CreativityReward | Generate unique and novel structures |
| EnergyReward | Ensure thermodynamic stability |
| StructureDiversityReward | Explore varied structure prototypes |
| CompositionDiversityReward | Explore chemical composition space |

Prerequisites

The DNG Configuration

Reference file: configs/custom_reward/rl_dng.yaml

# @package _global_
# GRPO for DNG on MP-20

data:
  batch_size: 5

rl_module:
  ldm_ckpt_path: ${hub:mp_20_ldm_base}
  vae_ckpt_path: ${hub:mp_20_vae}

  rl_configs:
    num_group_samples: 64
    group_reward_norm: true

  reward_fn:
    _target_: src.rl_module.reward.ReinforceReward
    normalize_fn: std
    eps: 1e-4
    reference_dataset: mp-20
    components:
      - _target_: src.rl_module.components.CreativityReward
        weight: 1.0
        normalize_fn: null
      - _target_: src.rl_module.components.EnergyReward
        weight: 1.0
        normalize_fn: norm
      - _target_: src.rl_module.components.StructureDiversityReward
        weight: 0.1
        normalize_fn: norm
      - _target_: src.rl_module.components.CompositionDiversityReward
        weight: 1.0
        normalize_fn: norm

logger:
  wandb:
    name: rl_dng_grpo
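
The `weight` and `normalize_fn` fields suggest that each component's per-sample scores are normalized and then combined as a weighted sum, with the top-level `normalize_fn: std` and `eps: 1e-4` applied to the total. The sketch below illustrates that reading only. It assumes `norm` means min-max scaling within the batch, `std` means z-scoring, and `null` means raw scores; these semantics and all function names are hypothetical, not taken from the Chemeleon2 source.

```python
import math
from typing import Callable, Optional, Sequence


# Hypothetical normalizers: the real meaning of "norm" and "std" in Chemeleon2 may differ.
def min_max(xs: Sequence[float], eps: float = 1e-4) -> list[float]:
    """Assumed "norm": scale a batch of scores to roughly [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo + eps) for x in xs]


def z_score(xs: Sequence[float], eps: float = 1e-4) -> list[float]:
    """Assumed "std": zero mean, unit (population) standard deviation."""
    mean = sum(xs) / len(xs)
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))
    return [(x - mean) / (std + eps) for x in xs]


NORMALIZERS: dict[Optional[str], Callable[[Sequence[float]], list[float]]] = {
    None: list,  # normalize_fn: null -> use raw scores
    "norm": min_max,
    "std": z_score,
}


def combine_rewards(components, final_norm: Optional[str] = "std") -> list[float]:
    """Weighted sum of per-component normalized scores, then a final normalization.

    `components` is a list of (scores, weight, normalize_fn) tuples.
    """
    total = [0.0] * len(components[0][0])
    for scores, weight, norm_name in components:
        for i, s in enumerate(NORMALIZERS[norm_name](scores)):
            total[i] += weight * s
    return NORMALIZERS[final_norm](total)


# Made-up per-sample scores for a batch of four structures.
rewards = combine_rewards([
    ([1.0, 0.0, 0.3, 1.0], 1.0, None),         # CreativityReward
    ([-0.02, -0.5, -1.0, -0.1], 1.0, "norm"),  # EnergyReward (already negated)
    ([0.2, 0.1, 0.4, 0.3], 0.1, "norm"),       # StructureDiversityReward
    ([0.5, 0.2, 0.9, 0.1], 1.0, "norm"),       # CompositionDiversityReward
])
```

Under these assumptions the first structure (creative, near the hull, compositionally distinct) gets the highest combined reward, and the final standardization centers the batch around zero.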

Component Deep Dive

The DNG reward uses built-in reward components provided by Chemeleon2. For a complete list of available components, see RL Module - Built-in Reward Components. These components are ready to use without additional implementation.

CreativityReward

Purpose: Reward structures that are both unique (not duplicated within the batch) and novel (not present in the training set).

How it works:

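# Excerpt: `gen_structures`, `metrics_results`, `matching_refs`, and
# `structures_to_amd` are assumed to be defined by the surrounding code.
import amd  # average-minimum-distance package

for i, gen_structure in enumerate(gen_structures):
    u, v = metrics_results["unique"][i], metrics_results["novel"][i]
    if u and v:
        r = 1.0  # Fully creative: unique AND novel
    elif not u and not v:
        r = 0.0  # Not creative: neither unique nor novel
    else:
        # Partially creative (exactly one of unique/novel): use the AMD distance
        # to the nearest non-identical matching structure as a continuous score
        amds = structures_to_amd([gen_structure] + matching_refs, 100)
        dists = amd.AMD_cdist(amds, amds)[0]
        r = dists[dists > 0].min()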
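
To make the branching concrete without the `amd` dependency, the sketch below applies the same piecewise rule to precomputed AMD vectors given as plain lists of floats. It uses the Chebyshev (L∞) distance, which is assumed here to match the default metric of `amd.AMD_cdist`; `creativity_score` and `chebyshev` are hypothetical names. Unlike the excerpt above, which would fail on an empty array if every distance were zero, this version returns 0.0 in that case.

```python
from typing import Sequence


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """L-infinity distance between two equal-length AMD vectors."""
    return max(abs(x - y) for x, y in zip(a, b))


def creativity_score(unique: bool, novel: bool,
                     gen_amd: Sequence[float],
                     ref_amds: Sequence[Sequence[float]]) -> float:
    if unique and novel:
        return 1.0  # fully creative
    if not unique and not novel:
        return 0.0  # neither unique nor novel
    # Exactly one of unique/novel holds: score by the distance to the nearest
    # matching structure, ignoring exact (zero-distance) matches.
    positive = [d for d in (chebyshev(gen_amd, ref) for ref in ref_amds) if d > 0]
    return min(positive) if positive else 0.0


print(creativity_score(True, False, [1.0, 2.0], [[1.0, 2.0], [1.1, 2.3], [1.5, 2.0]]))
# about 0.3: the exact match is skipped and the nearest remaining vector differs by 0.3
```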

Configuration:

EnergyReward

Purpose: Penalize structures with high energy above the convex hull.

How it works:

import torch

r_energy = torch.as_tensor(metrics_results["e_above_hull"]).float()
r_energy = r_energy.nan_to_num(nan=1.0)  # Failed calculations get the worst value (1.0)
r_energy = r_energy.clamp(min=0.0, max=1.0)  # Clip to [0, 1]
r_energy = r_energy * -1.0  # Negate: lower energy above hull gives a higher reward
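
A dependency-free mirror of the same transform shows how individual values are mapped. Failed calculations (NaN) receive the worst reward of -1.0, the same as any structure 1 eV/atom or more above the hull, and structures below the hull get no extra bonus beyond 0. `energy_reward` is a hypothetical helper, not part of Chemeleon2.

```python
import math


def energy_reward(e_above_hull: list[float]) -> list[float]:
    """Mirror of the torch snippet: NaN -> 1.0, clamp to [0, 1], then negate."""
    out = []
    for e in e_above_hull:
        if math.isnan(e):
            e = 1.0  # failed calculation treated as the worst case
        e = min(max(e, 0.0), 1.0)
        out.append(-e)
    return out


print(energy_reward([0.0, 0.05, 2.3, float("nan"), -0.01]))
# [-0.0, -0.05, -1.0, -1.0, -0.0]
```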

Configuration:

StructureDiversityReward

Purpose: Encourage diverse crystal geometries using Maximum Mean Discrepancy (MMD).
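
As a generic illustration of MMD, the sketch below computes the biased estimate of squared MMD with a Gaussian (RBF) kernel between two sets of embedding vectors. A larger value means the two sets are distributed more differently. The kernel, bandwidth, embeddings, and how `StructureDiversityReward` turns MMD into per-sample rewards are not specified here and may differ; `rbf_kernel` and `mmd2` are hypothetical names.

```python
import math
from typing import Sequence

Vec = Sequence[float]


def rbf_kernel(a: Vec, b: Vec, bandwidth: float = 1.0) -> float:
    """Gaussian kernel exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * bandwidth ** 2))


def mmd2(xs: Sequence[Vec], ys: Sequence[Vec], bandwidth: float = 1.0) -> float:
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    def mean_k(a_set: Sequence[Vec], b_set: Sequence[Vec]) -> float:
        total = sum(rbf_kernel(a, b, bandwidth) for a in a_set for b in b_set)
        return total / (len(a_set) * len(b_set))
    return mean_k(xs, xs) + mean_k(ys, ys) - 2 * mean_k(xs, ys)


reference = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
collapsed = [[0.0, 0.0]] * 3  # every "generated" sample is the same point
print(mmd2(reference, reference))  # 0.0 for identical sets
print(mmd2(collapsed, reference))  # about 0.209: the collapsed set is penalized
```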

How it works:

Configuration:

Why lower weight? Too much structure diversity can lead to:

CompositionDiversityReward

Purpose: Encourage exploration of chemical composition space.
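
One simple way to quantify composition diversity is to map each formula to element fractions and average the pairwise L1 distances. This is shown only as an illustration, not necessarily what `CompositionDiversityReward` computes. The parser handles only flat formulas with integer subscripts such as `Fe2O3` (no parentheses or hydrates), and `element_fractions`, `l1_distance`, and `mean_pairwise_l1` are hypothetical helpers.

```python
import re
from itertools import combinations


def element_fractions(formula: str) -> dict[str, float]:
    """Parse a flat formula such as 'Fe2O3' into element fractions summing to 1."""
    counts: dict[str, float] = {}
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0.0) + (float(count) if count else 1.0)
    total = sum(counts.values())
    return {el: c / total for el, c in counts.items()}


def l1_distance(a: dict[str, float], b: dict[str, float]) -> float:
    return sum(abs(a.get(el, 0.0) - b.get(el, 0.0)) for el in set(a) | set(b))


def mean_pairwise_l1(formulas: list[str]) -> float:
    """Average L1 distance over all pairs; 0.0 means every composition is identical."""
    fracs = [element_fractions(f) for f in formulas]
    pairs = list(combinations(fracs, 2))
    if not pairs:
        return 0.0
    return sum(l1_distance(a, b) for a, b in pairs) / len(pairs)
```

A batch of identical compositions scores 0.0, while two compositions with no shared elements (for example `NaCl` and `Fe2O3`) reach the maximum pairwise distance of 2.0.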

How it works:

Configuration:

Running DNG Training

# Standard DNG training (src/train_rl.py)
python src/train_rl.py custom_reward=rl_dng

# With custom hyperparameters
python src/train_rl.py custom_reward=rl_dng \
    rl_module.rl_configs.num_group_samples=128 \
    trainer.max_steps=2000

# Override checkpoint paths (e.g., use alex_mp_20 model)
python src/train_rl.py custom_reward=rl_dng \
    rl_module.ldm_ckpt_path='${hub:alex_mp_20_ldm_base}' \
    rl_module.vae_ckpt_path='${hub:alex_mp_20_vae}'

Monitoring Training

In WandB, watch these metrics:

| Metric | Description |
|---|---|
| train/reward | Mean reward from the reward function (should increase) |
| val/reward | Validation reward |
| train/advantages | Normalized rewards used for the policy gradient |
| train/kl_div | KL divergence from the reference policy |
| train/entropy | Policy entropy |
| train/loss | Total policy loss |
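
With `group_reward_norm: true`, a common GRPO-style choice, and the one assumed in this sketch, is to standardize each reward against the other samples in its group of `num_group_samples`, with `eps` guarding against division by zero. The exact formula in Chemeleon2 may differ (for example, population versus sample standard deviation), and `group_advantages` is a hypothetical name.

```python
import statistics


def group_advantages(rewards: list[float], group_size: int, eps: float = 1e-4) -> list[float]:
    """Standardize rewards within consecutive groups of `group_size` samples."""
    if len(rewards) % group_size != 0:
        raise ValueError("number of rewards must be a multiple of group_size")
    advantages: list[float] = []
    for start in range(0, len(rewards), group_size):
        group = rewards[start:start + group_size]
        mean = statistics.fmean(group)
        std = statistics.pstdev(group)
        advantages.extend((r - mean) / (std + eps) for r in group)
    return advantages


# Two groups of four samples; the second group has identical rewards.
adv = group_advantages([1.0, 2.0, 3.0, 4.0, 10.0, 10.0, 10.0, 10.0], group_size=4)
```

A group in which every sample receives the same reward contributes zero advantage, so `train/advantages` near zero across many groups can indicate that the policy is producing near-identical samples.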

Weight Tuning Guide

Adjust weights based on your priorities:

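| Priority | CreativityReward | EnergyReward | StructureDiversity | CompositionDiversity |
|---|---|---|---|---|
| More novelty | ↑ 1.5 | ↓ 0.5 | 0.1 | 1.0 |
| More stability | 0.5 | ↑ 2.0 | 0.1 | 0.5 |
| More diversity | 1.0 | 0.5 | ↑ 0.5 | ↑ 1.5 |
| Balanced (default) | 1.0 | 1.0 | 0.1 | 1.0 |
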
# Example: Prioritize stability
components:
  - _target_: src.rl_module.components.CreativityReward
    weight: 0.5
    normalize_fn: null
  - _target_: src.rl_module.components.EnergyReward
    weight: 2.0
    normalize_fn: norm
  - _target_: src.rl_module.components.StructureDiversityReward
    weight: 0.1
    normalize_fn: norm
  - _target_: src.rl_module.components.CompositionDiversityReward
    weight: 0.5
    normalize_fn: norm
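
The effect of re-weighting can be checked with simple arithmetic. Assuming the component scores are already normalized to a comparable scale (the numbers below are invented for illustration), switching from the balanced weights to the stability weights in the table can change which of two candidates ranks higher.

```python
# Invented, already-normalized scores for two candidates, in the order
# (creativity, energy, structure diversity, composition diversity).
candidates = {
    "novel_but_unstable": (1.0, 0.4, 0.5, 0.6),
    "stable_but_familiar": (0.3, 0.95, 0.5, 0.6),
}

# Weight vectors in the same order, taken from the tuning table.
weightings = {
    "balanced": (1.0, 1.0, 0.1, 1.0),
    "stability": (0.5, 2.0, 0.1, 0.5),
}


def weighted_total(scores: tuple[float, ...], weights: tuple[float, ...]) -> float:
    return sum(s * w for s, w in zip(scores, weights))


best_by_weighting = {
    name: max(candidates, key=lambda c: weighted_total(candidates[c], w))
    for name, w in weightings.items()
}
print(best_by_weighting)
# {'balanced': 'novel_but_unstable', 'stability': 'stable_but_familiar'}
```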

Generating and Evaluating Samples

After training your DNG model, you can generate 10,000 structures and evaluate them against reference datasets to assess quality.

Generate Samples

Generate crystal structures using the trained RL model:

# Generate 10000 samples with batch size 2000
python src/sample.py \
    --ldm_ckpt_path=logs/train_rl/runs/<your-run>/checkpoints/last.ckpt \
    --num_samples=10000 \
    --batch_size=2000 \
    --output_dir=outputs/dng_samples

Evaluate Samples

The evaluation computes several quality metrics:

| Metric | Description |
|---|---|
| Unique | Structures not duplicated within the generated set |
| Novel | Structures not found in the reference dataset |
| E Above Hull | Energy above the convex hull (stability measure) |
| Metastable/Stable | Thermodynamically viable structures |
| Composition Validity | Chemically valid compositions (via SMACT) |
| Structure Diversity | Inverse Fréchet distance for structure embeddings |
| Composition Diversity | Inverse Fréchet distance for composition embeddings |

Run the evaluation on the generated samples:

python src/evaluate.py \
    --model_path=logs/train_rl/runs/<your-run>/checkpoints/last.ckpt \
    --structure_path=outputs/dng_samples \
    --num_samples=10000 \
    --batch_size=2000 \
    --output_file=outputs/dng_samples/results.csv
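
For intuition about what the summary numbers mean, the sketch below computes rate-style metrics from per-structure values and a Fréchet distance between two embedding sets. Several choices are assumptions, not documented behavior of `src/evaluate.py`: the 0.0 and 0.1 eV/atom thresholds for stable and metastable, counting failed (NaN) calculations in the denominator, and the diagonal-covariance simplification of the Fréchet distance. All function names are hypothetical.

```python
import math
from statistics import fmean, pstdev


def rate(flags: list[bool]) -> float:
    """Fraction of True values, e.g. the unique or novel rate."""
    return sum(flags) / len(flags)


def stability_rates(e_above_hull: list[float], metastable_cutoff: float = 0.1) -> dict[str, float]:
    """Fractions at/below the hull and within an assumed metastability cutoff (eV/atom).

    Failed calculations (NaN) count in the denominator but never as stable.
    """
    n = len(e_above_hull)
    valid = [e for e in e_above_hull if not math.isnan(e)]
    return {
        "stable": sum(e <= 0.0 for e in valid) / n,
        "metastable": sum(e <= metastable_cutoff for e in valid) / n,
    }


def frechet_distance_diag(xs: list[list[float]], ys: list[list[float]]) -> float:
    """Fréchet distance between Gaussians fitted to xs and ys, with diagonal covariances.

    The full metric is ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 (S1 S2)^(1/2)); with diagonal
    covariances the trace term reduces to the summed squared differences of std devs.
    """
    total = 0.0
    for d in range(len(xs[0])):
        a = [x[d] for x in xs]
        b = [y[d] for y in ys]
        total += (fmean(a) - fmean(b)) ** 2 + (pstdev(a) - pstdev(b)) ** 2
    return total


print(stability_rates([-0.01, 0.05, 0.3, float("nan")]))
# {'stable': 0.25, 'metastable': 0.5}
```

A smaller Fréchet distance means the two embedding distributions are closer; the evaluation table reports an inverse of this quantity, so higher diversity scores there correspond to smaller distances.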

Summary

The DNG reward configuration:

  1. Balances multiple objectives for well-rounded generation

  2. Discourages mode collapse through the structure and composition diversity rewards

  3. Promotes thermodynamic stability through the energy-above-hull penalty

  4. Encourages unique, novel structures through the creativity reward

Next Steps