Learn to create a custom reward that maximizes atomic density in generated crystals.
Objective¶
Create a reward function that encourages denser crystal structures:
Higher density = more mass packed per unit volume.
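pymatgen's Structure.density already gives the mass density in g/cm³, so it can be used directly as the reward signal. For example (bcc Fe chosen purely as an illustration, not part of the repo):

```python
from pymatgen.core import Lattice, Structure

# bcc Fe conventional cell, used only to show what Structure.density returns
fe = Structure(Lattice.cubic(2.87), ["Fe", "Fe"], [[0, 0, 0], [0.5, 0.5, 0.5]])
print(f"{fe.density:.2f} g/cm³")  # ≈ 7.9 g/cm³, close to the experimental value for iron
```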
Step 1: Understand the CustomReward Class¶
The CustomReward class in src/rl_module/components.py is a placeholder for user-defined logic:
```python
class CustomReward(RewardComponent):
    """Wrapper for user-defined custom reward functions."""

    def compute(self, gen_structures: list[Structure], **kwargs) -> torch.Tensor:
        """Placeholder for custom reward function."""
        return torch.zeros(len(gen_structures))
```
The compute() method receives:
- `gen_structures`: list of pymatgen `Structure` objects
- Additional kwargs like `batch_gen`, `device`, `metrics_obj`
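As a toy illustration of this interface (the class name AtomCountReward is hypothetical, and whether compute() must honor the device kwarg is an assumption), a component that simply rewards atom count could look like:

```python
import torch
from pymatgen.core import Structure

from src.rl_module.components import RewardComponent


class AtomCountReward(RewardComponent):
    """Toy example: reward = number of atoms per structure."""

    def compute(self, gen_structures: list[Structure], **kwargs) -> torch.Tensor:
        device = kwargs.get("device", "cpu")  # assumption: placing rewards on this device is optional
        return torch.tensor([len(s) for s in gen_structures], dtype=torch.float32, device=device)
```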
Step 2: Implement Atomic Density Reward¶
Edit src/rl_module/components.py and modify the CustomReward class:
```python
class CustomReward(RewardComponent):
    """Atomic density reward - maximize atoms per unit volume."""

    def compute(self, gen_structures: list[Structure], **kwargs) -> torch.Tensor:
        """
        Compute atomic density for each structure.
        Returns higher rewards for denser structures.
        """
        rewards = []
        for structure in gen_structures:
            density = structure.density  # atomic mass / volume [g/cm³]
            rewards.append(density)
        return torch.as_tensor(rewards)
```
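A quick way to sanity-check the implementation outside of training (assuming RewardComponent can be instantiated with no arguments; the 2-atom cell below is a toy structure, not a real material):

```python
from pymatgen.core import Lattice, Structure

from src.rl_module.components import CustomReward

toy = Structure(Lattice.cubic(5.43), ["Si", "Si"], [[0, 0, 0], [0.25, 0.25, 0.25]])
print(CustomReward().compute([toy, toy]))  # one density value (g/cm³) per structure
```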
Step 3: Create Configuration File¶
See configs/custom_reward/atomic_density.yaml:
```yaml
# @package _global_
# RL Custom Reward Experiment Configuration

data:
  data_dir: ${paths.data_dir}/mp-20
  batch_size: 5

trainer:
  max_steps: 200

rl_module:
  ldm_ckpt_path: ${hub:alex_mp_20_ldm_base}
  vae_ckpt_path: ${hub:alex_mp_20_vae}
  rl_configs:
    num_group_samples: 64
    group_reward_norm: true
  reward_fn:
    normalize_fn: std
    components:
      - _target_: custom_reward.atomic_density.AtomicDensityReward

logger:
  wandb:
    name: rl_custom_reward
```
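Note that `_target_` is a dotted import path, so it must point at a class that is actually importable. The config above references custom_reward.atomic_density.AtomicDensityReward rather than the CustomReward class edited in Step 2; if you prefer the standalone layout, a minimal custom_reward/atomic_density.py mirroring the Step 2 logic (following the same pattern as the TargetDensityReward example later on) could look like the sketch below. Alternatively, point `_target_` at src.rl_module.components.CustomReward.

```python
"""Atomic density reward (hypothetical file: custom_reward/atomic_density.py)."""
import torch
from pymatgen.core import Structure

from src.rl_module.components import RewardComponent


class AtomicDensityReward(RewardComponent):
    """Reward each generated structure by its mass density in g/cm³."""

    def compute(self, gen_structures: list[Structure], **kwargs) -> torch.Tensor:
        return torch.as_tensor([s.density for s in gen_structures], dtype=torch.float32)
```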
Step 4: Run Training¶
```bash
python src/train_rl.py custom_reward=atomic_density
```
Training script: `src/train_rl.py`
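Because the configs are Hydra-based (note the `# @package _global_` header), individual values can normally be overridden on the command line as well; for example (keys taken from the config in Step 3):

```bash
# Standard Hydra override syntax; adjust keys to your config
python src/train_rl.py custom_reward=atomic_density trainer.max_steps=500 data.batch_size=8
```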
Step 5: Monitor Training¶
In WandB, watch these metrics:
| Metric | Description |
|---|---|
| train/reward | Mean reward from reward function (should increase) |
| val/reward | Validation reward |
| train/advantages | Normalized rewards used for policy gradient |
| train/kl_div | KL divergence from reference policy |
| train/entropy | Policy entropy |
| train/loss | Total policy loss |
As training progresses, the model should generate increasingly dense structures.
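For intuition on train/advantages: with num_group_samples: 64 and group_reward_norm: true, rewards are presumably standardized within each group of generated samples (GRPO-style). A rough sketch of that normalization, offered as an assumption rather than the repo's actual code:

```python
import torch

rewards = torch.randn(64)  # rewards for one group of num_group_samples generations
advantages = (rewards - rewards.mean()) / (rewards.std() + 1e-8)
```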
Step 6: Evaluate Results¶
Generate Samples¶
```bash
python src/sample.py \
  --ldm_ckpt_path=logs/train_rl/runs/<your-run>/checkpoints/last.ckpt \
  --num_samples=10 \
  --output_dir=outputs/rl_samples
```
Analyze Density¶
```python
from monty.serialization import loadfn
import numpy as np

structures = loadfn("outputs/rl_samples/generated_structures.json.gz")
densities = [s.density for s in structures]
print(f"Mean density: {np.mean(densities):.3f} g/cm³")
print(f"Max density: {np.max(densities):.3f} g/cm³")
```
Extensions¶
Target Density¶
Instead of maximizing density, optimize toward a specific target. Create custom_reward/target_density.py:
"""Target density reward for RL training."""
import torch
from pymatgen.core import Structure
from src.rl_module.components import RewardComponent
class TargetDensityReward(RewardComponent):
"""Reward based on distance from target density."""
def __init__(self, target_density: float = 0.05, **kwargs):
super().__init__(**kwargs)
self.target_density = target_density
def compute(self, gen_structures: list[Structure], **kwargs) -> torch.Tensor:
rewards = []
for structure in gen_structures:
density = len(structure) / structure.lattice.volume
# Negative distance from target (higher = closer to target)
reward = -abs(density - self.target_density)
rewards.append(reward)
return torch.tensor(rewards, dtype=torch.float32)Create a config file (configs/custom_reward/rl_target_density.yaml):
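Note the units: unlike the g/cm³ mass density used earlier, this component scores number density in atoms/ų, consistent with its 0.05 default. For orientation (illustrative values, not from the repo), typical inorganic solids fall roughly between 0.03 and 0.1 atoms/ų:

```python
from pymatgen.core import Lattice, Structure

# Rock-salt NaCl conventional cell (8 atoms) as a reference point for number density
nacl = Structure.from_spacegroup(
    "Fm-3m", Lattice.cubic(5.64), ["Na", "Cl"], [[0, 0, 0], [0.5, 0.5, 0.5]]
)
print(len(nacl) / nacl.lattice.volume)  # ≈ 0.045 atoms/ų
```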
Create a config file (configs/custom_reward/rl_target_density.yaml):
```yaml
# @package _global_
rl_module:
  reward_fn:
    components:
      - _target_: custom_reward.target_density.TargetDensityReward
        target_density: 0.05  # atoms/ų, matching the number density computed in compute()
```
Combining with Built-in Reward Components¶
Encourage structures that are not only dense but also stable and varied by combining the custom reward with the built-in EnergyReward and StructureDiversityReward:
```yaml
# @package _global_
rl_module:
  reward_fn:
    components:
      - _target_: custom_reward.atomic_density.AtomicDensityReward
        weight: 1.0
        normalize_fn: norm
      - _target_: src.rl_module.components.EnergyReward
        weight: 0.5
        normalize_fn: norm
      - _target_: src.rl_module.components.StructureDiversityReward
        weight: 0.5
        normalize_fn: norm
```
This encourages the model to generate structures that are dense, low-energy, and diverse.
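How weight and normalize_fn interact is not spelled out here; a plausible reading, sketched purely as an assumption (not repo-verified behavior), is a weighted sum of per-component normalized rewards:

```python
import torch

# Assumed combination rule: normalize each component's rewards, scale by its
# weight, and sum the results into a single reward per structure.
def combine(component_rewards: dict[str, torch.Tensor], weights: dict[str, float]) -> torch.Tensor:
    def norm(r: torch.Tensor) -> torch.Tensor:
        return (r - r.mean()) / (r.std() + 1e-8)

    total = torch.zeros_like(next(iter(component_rewards.values())))
    for name, r in component_rewards.items():
        total = total + weights[name] * norm(r)
    return total
```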
Summary¶
- Create your reward class in the `custom_reward/` folder
- Create a config in `configs/custom_reward/` referencing your reward
- Run training: `python src/train_rl.py custom_reward=your_config`
- Combine with other components for multi-objective optimization
Next Steps¶
- DNG Reward - Multi-objective optimization from the paper
- Predictor Reward - Use ML models as reward