API Reference

This section provides API documentation for all Chemeleon2 modules.

Module Index

Core Modules

| Module | Description | Source |
| --- | --- | --- |
| src.vae_module | Variational Autoencoder for crystal structure encoding | src/vae_module/ |
| src.ldm_module | Latent Diffusion Model for generation | src/ldm_module/ |
| src.rl_module | Reinforcement Learning fine-tuning | src/rl_module/ |

Data & Utilities

| Module | Description | Source |
| --- | --- | --- |
| src.data | Data loading and processing | src/data/ |
| src.utils | Metrics, featurization, and visualization | src/utils/ |

Quick Reference

Main Classes

| Class | Module | Key Methods | Description |
| --- | --- | --- | --- |
| VAEModule | src.vae_module.vae_module | encode(), decode(), sample(), reconstruct() | Lightning module for VAE |
| LDMModule | src.ldm_module.ldm_module | calculate_loss(), sample() | Lightning module for LDM |
| RLModule | src.rl_module.rl_module | rollout(), compute_rewards(), calculate_loss() | Lightning module for RL |
| DataModule | src.data.datamodule | setup(), train_dataloader() | Lightning DataModule |
| CrystalBatch | src.data.schema | to_atoms(), to_structure(), collate() | Batch container for crystals |
| Metrics | src.utils.metrics | compute(), to_dataframe(), to_csv() | Evaluation metrics |
| ReinforceReward | src.rl_module.reward | forward(), normalize() | Aggregates multiple reward components |
| PredictorModule | src.vae_module.predictor_module | predict() | Property predictor in latent space |

Training Scripts

| Script | Description | Usage |
| --- | --- | --- |
| src/train_vae.py | Train VAE model | python src/train_vae.py experiment=mp_20/vae_dng |
| src/train_ldm.py | Train LDM model | python src/train_ldm.py experiment=mp_20/ldm_null |
| src/train_rl.py | Train RL model | python src/train_rl.py experiment=mp_20/rl_dng |
| src/train_predictor.py | Train property predictor | python src/train_predictor.py experiment=alex_mp_20_bandgap/predictor |
| src/sample.py | Generate structures | python src/sample.py --num_samples=1000 |
| src/evaluate.py | Evaluate generated structures | python src/evaluate.py --structure_path=outputs/structures.json.gz |

Reward Components

| Component | Module | Description |
| --- | --- | --- |
| CustomReward | src.rl_module.components | User-defined reward function (override compute()) |
| PredictorReward | src.rl_module.components | Surrogate model predictions |
| CreativityReward | src.rl_module.components | Uniqueness + Novelty with AMD fallback |
| EnergyReward | src.rl_module.components | Energy above hull minimization (requires mace-torch) |
| StructureDiversityReward | src.rl_module.components | MMD-based structure diversity |
| CompositionDiversityReward | src.rl_module.components | MMD-based composition diversity |

Usage Examples

Load Pre-trained Models

from src.vae_module import VAEModule
from src.ldm_module import LDMModule

# Load from checkpoint
vae = VAEModule.load_from_checkpoint(
    "path/to/vae.ckpt",
    weights_only=False
)

# Load LDM with VAE
ldm = LDMModule.load_from_checkpoint(
    "path/to/ldm.ckpt",
    vae_ckpt_path="path/to/vae.ckpt",
    weights_only=False
)

Generate Structures (CLI)

# Generate 1000 structures using DDIM sampler
python src/sample.py \
    --num_samples=1000 \
    --batch_size=500 \
    --sampler=ddim \
    --sampling_steps=50 \
    --output_dir=outputs

# Generate for specific compositions (CSP task)
python src/sample.py \
    --num_samples=10 \
    --compositions="LiFePO4,Li2Co2O4,LiMn2O4,LiNiO2"
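
The `--compositions` flag takes a comma-separated list of chemical formulas. As a rough illustration of what each formula implies, the sketch below counts atoms per formula unit using only the standard library. The helpers `parse_formula` and `atoms_per_formula` are hypothetical names and not part of Chemeleon2. The parser assumes simple formulas without parentheses or fractional counts, and how `src/sample.py` itself interprets the string (for example, whether it samples multiples of the formula unit) is not specified here.

```python
import re
from collections import Counter

# An element symbol followed by an optional integer count, e.g. "O4" or "Li".
_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Count atoms per element in a simple formula (no parentheses or fractions)."""
    counts = Counter()
    for symbol, digits in _TOKEN.findall(formula):
        counts[symbol] += int(digits) if digits else 1
    return counts


def atoms_per_formula(compositions: str) -> dict[str, int]:
    """Map each formula in a comma-separated string to its total atom count."""
    formulas = [f.strip() for f in compositions.split(",") if f.strip()]
    return {f: sum(parse_formula(f).values()) for f in formulas}


counts = atoms_per_formula("LiFePO4,Li2Co2O4,LiMn2O4,LiNiO2")
print(counts)  # {'LiFePO4': 7, 'Li2Co2O4': 8, 'LiMn2O4': 7, 'LiNiO2': 4}
```

For anything beyond simple formulas, a dedicated parser such as pymatgen's `Composition` class is the safer choice.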

Generate Structures (Programmatic)

from src.ldm_module import LDMModule
from src.data.schema import create_empty_batch
import torch

# Load model
ldm = LDMModule.load_from_checkpoint(
    "path/to/ldm.ckpt",
    vae_ckpt_path="path/to/vae.ckpt",
    weights_only=False
)
ldm.eval()

# Create batch with desired number of atoms
num_atoms = torch.tensor([10, 12, 15])  # 3 structures
batch = create_empty_batch(num_atoms, device="cuda")

# Sample structures
batch_gen = ldm.sample(batch, sampling_steps=50)
structures = batch_gen.to_structure()  # Convert to pymatgen Structure

Evaluate Structures

from src.utils.metrics import Metrics
from monty.serialization import loadfn

# Load generated structures
gen_structures = loadfn("outputs/structures.json.gz")

# Create metrics object
metrics = Metrics(
    reference_dataset="mp-20",  # or "mp-all", "alex-mp-20"
    phase_diagram="mp-all",
    metastable_threshold=0.1
)

# Compute metrics
results = metrics.compute(gen_structures=gen_structures)

# Print results
print(f"Uniqueness: {results['avg_unique']:.2%}")
print(f"Novelty: {results['avg_novel']:.2%}")
print(f"Avg Energy Above Hull: {results['avg_e_above_hull']:.3f} eV/atom")

# Save to CSV
metrics.to_csv("results.csv")
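
To make the two headline numbers concrete, here is a toy sketch of uniqueness and novelty computed as fractions over string fingerprints. It assumes the common definitions (uniqueness as distinct structures over all generated structures, novelty as the share of generated structures absent from the reference set). The exact definitions and the structure-matching procedure used by `Metrics` are not described in this section and may differ. The function name and the fingerprint labels are made up for illustration.

```python
def uniqueness_and_novelty(
    generated: list[str], reference: set[str]
) -> tuple[float, float]:
    """Return (unique fraction, novel fraction) for a list of fingerprints.

    A fingerprint is any hashable label standing in for a structure. Real
    pipelines compare structures with a matcher rather than string equality.
    """
    if not generated:
        raise ValueError("need at least one generated structure")
    unique = len(set(generated)) / len(generated)
    novel = sum(fp not in reference for fp in generated) / len(generated)
    return unique, novel


# Hypothetical fingerprints: one duplicate, one structure absent from the reference.
generated = ["NaCl-rocksalt", "NaCl-rocksalt", "LiFePO4-olivine", "MgO-new"]
reference = {"NaCl-rocksalt", "LiFePO4-olivine"}

unique, novel = uniqueness_and_novelty(generated, reference)
print(f"Uniqueness: {unique:.2%}")  # Uniqueness: 75.00%
print(f"Novelty: {novel:.2%}")  # Novelty: 25.00%
```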

Evaluate Structures (CLI)

# Evaluate from JSON file
python src/evaluate.py \
    --structure_path=outputs/structures.json.gz \
    --reference_dataset=mp-20 \
    --output_file=results.csv

# Generate and evaluate in one command
python src/evaluate.py \
    --model_path=path/to/ldm.ckpt \
    --structure_path=outputs \
    --num_samples=1000

Custom Reward Component

from src.rl_module.components import RewardComponent
import torch

class MyCustomReward(RewardComponent):
    def __init__(self, weight=1.0, **kwargs):
        super().__init__(weight=weight, **kwargs)
        # Initialize any parameters

    def compute(self, gen_structures, **kwargs):
        """Compute rewards for generated structures.

        Args:
            gen_structures: list[Structure] - Generated structures
            **kwargs: Additional arguments (batch_gen, metrics_obj, device, etc.)

        Returns:
            torch.Tensor of shape (num_structures,)
        """
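        rewards = []
        for structure in gen_structures:
            # Your reward logic here. `calculate_reward` is a placeholder:
            # define it on this class so it returns one float per structure.
            reward = self.calculate_reward(structure)
            rewards.append(reward)

        # One float reward per structure, shape (num_structures,)
        return torch.tensor(rewards, dtype=torch.float32)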
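
The template above relies on a `calculate_reward` helper that the subclass is expected to supply. A minimal sketch of one possible per-structure reward follows, assuming a hypothetical objective of steering generation toward a target density. The target value, the function name `density_reward`, and the `FakeStructure` stand-in are all illustrative. The only attribute used is `density`, which pymatgen's `Structure` exposes in g/cm^3.

```python
from dataclasses import dataclass

TARGET_DENSITY = 3.5  # g/cm^3; hypothetical target chosen for illustration


def density_reward(structure, target: float = TARGET_DENSITY) -> float:
    """Score a structure by how close its density is to a target value.

    The reward is zero for an exact match and grows more negative as the
    density moves away from the target in either direction.
    """
    return -abs(float(structure.density) - target)


@dataclass
class FakeStructure:
    """Stand-in exposing only the `density` attribute used above."""

    density: float


# Exercise the function without needing pymatgen installed.
samples = [FakeStructure(3.5), FakeStructure(2.0), FakeStructure(5.0)]
rewards = [density_reward(s) for s in samples]
print(rewards)
```

Inside `MyCustomReward`, `calculate_reward` could simply delegate to a function like this one, which keeps the scoring logic testable without loading any model.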

Featurize Structures

from src.utils.featurizer import featurize

# Featurize structures using pre-trained VAE
features = featurize(
    structures=[structure1, structure2, structure3],
    model_path=None,  # Uses default VAE from HF Hub
    batch_size=2000,
    device="cuda"
)

# Access features
structure_features = features["structure_features"]  # (N, latent_dim)
composition_features = features["composition_features"]  # (N, embed_dim)
atom_features = features["atom_features"]  # list of (num_atoms, latent_dim)

Configuration

All modules use Hydra for configuration management.

CLI Configuration

# Override config via CLI
python src/train_vae.py experiment=mp_20/vae_dng trainer.max_epochs=100

# Multiple overrides
python src/train_vae.py \
    experiment=mp_20/vae_dng \
    data.batch_size=256 \
    trainer.max_epochs=100

# Show full config
python src/train_vae.py experiment=mp_20/vae_dng --cfg job

# Show available experiments
ls configs/experiment/
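
As a mental model for the dotted overrides above, the sketch below shows how a string such as `trainer.max_epochs=100` can map onto a nested configuration. This is a simplified illustration and not Hydra's implementation: Hydra's override grammar is richer (lists, quoting, `+` and `~` prefixes), config group selections such as `experiment=mp_20/vae_dng` are resolved differently, and Hydra by default rejects keys that do not already exist. The function names and the starting values in `cfg` are hypothetical.

```python
def parse_value(raw: str):
    """Interpret an override value as bool, int, or float, else keep it as str."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            continue
    return raw


def apply_override(cfg: dict, override: str) -> None:
    """Set a nested key in place from a 'dotted.key=value' string."""
    key, sep, raw = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    *parents, leaf = key.split(".")
    node = cfg
    for name in parents:
        # Unlike Hydra's default behavior, missing parents are created here.
        node = node.setdefault(name, {})
    node[leaf] = parse_value(raw)


# Hypothetical defaults, overridden the same way as on the command line.
cfg = {"data": {"batch_size": 128}, "trainer": {"max_epochs": 10}}
for item in ["data.batch_size=256", "trainer.max_epochs=100"]:
    apply_override(cfg, item)

print(cfg)  # {'data': {'batch_size': 256}, 'trainer': {'max_epochs': 100}}
```

To see the configuration Hydra actually resolves, the `--cfg job` flag shown above remains the authoritative check.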

Learn More