Evaluation Guide

This guide covers evaluating generated crystal structures against reference datasets using the available evaluation metrics (see Evaluation Metrics below).

Prerequisites

Before running evaluation metrics, download and extract the reference dataset files containing structure embeddings, composition features, and phase diagram data.

Download from Figshare

You can download directly from the web:

Download benchmarks_mp_20.tar.gz from Figshare

Or use the command line (from project root):

# Download the reference dataset
curl -L -A "Mozilla/5.0" -o benchmarks_mp_20.tar.gz https://figshare.com/ndownloader/files/59462369

# Extract the dataset
tar -zxvf benchmarks_mp_20.tar.gz

This will create the following directory structure:

benchmarks/
└── assets/
    ├── mp_20_all_composition_features.pt              # VAE composition embeddings for diversity metrics
    ├── mp_20_all_structure_features.pt                # VAE structure embeddings for diversity metrics
    ├── mp_20_all_structure.json.gz                    # MP-20 reference structures for novelty checking
    ├── mp_all_unique_structure_250416.json.gz         # All MP unique structures for novelty checking
    └── ppd-mp_all_entries_uncorrected_250409.pkl.gz   # Phase diagram data for energy above hull

These files contain the reference data required for computing evaluation metrics against the MP-20 dataset.
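
To confirm the extraction succeeded, you can check that all five reference files are present. This is a minimal sketch using only the file names listed above:

from pathlib import Path

# Expected assets after extracting benchmarks_mp_20.tar.gz
assets = Path("benchmarks/assets")
expected = [
    "mp_20_all_composition_features.pt",
    "mp_20_all_structure_features.pt",
    "mp_20_all_structure.json.gz",
    "mp_all_unique_structure_250416.json.gz",
    "ppd-mp_all_entries_uncorrected_250409.pkl.gz",
]
missing = [name for name in expected if not (assets / name).exists()]
print("All reference files present." if not missing else f"Missing files: {missing}")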

Generate Samples

Generate crystal structures using a pre-trained LDM model. (The default model is trained on the alex-mp-20 dataset.)

# Generate 10000 samples with 2000 batch size using DDIM sampler
python src/sample.py --num_samples=10000 --batch_size=2000 --output_dir=outputs/samples
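
Once sampling finishes, the generated structures can be loaded for inspection. This is a hedged sketch: it assumes sample.py writes a structures.json.gz file into the output directory, matching the layout used in the evaluation examples below; adjust the path to whatever sample.py actually writes.

from monty.serialization import loadfn

# Assumed output file name inside --output_dir
gen_structures = loadfn("outputs/samples/structures.json.gz")
print(f"Loaded {len(gen_structures)} generated structures")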

Evaluate Models

Evaluate generated structures against reference datasets (e.g., MP-20) to assess quality and diversity.

Generate and Evaluate Together

Generate new structures and evaluate them in one command:

python src/evaluate.py \
    --model_path=ckpts/mp_20/ldm/ldm_null.ckpt \
    --structure_path=outputs/eval_samples \
    --reference_dataset=mp-20 \
    --num_samples=10000 \
    --batch_size=2000

Evaluate Pre-generated Structures

If you already have generated structures:

python src/evaluate.py \
    --structure_path=outputs/dng_samples \
    --reference_dataset=mp-20 \
    --output_file=benchmark/results/my_results.csv

Evaluation Metrics

The evaluation script computes several metrics to assess generation quality, including uniqueness, novelty, energy above hull, composition validity, and diversity (the metric names match those used in the Python API below).

For detailed implementation, see src/utils/metrics.py.

Python API Usage

You can also compute metrics using the Python API directly:

from monty.serialization import loadfn
from src.utils.metrics import Metrics

# Load generated structures
gen_structures = loadfn("outputs/eval_samples/structures.json.gz")

# Create metrics object
metrics = Metrics(
    metrics=["unique", "novel", "e_above_hull", "composition_validity"],
    reference_dataset="mp-20",   # reference structures for uniqueness/novelty checks
    phase_diagram="mp-all",      # phase diagram entries for energy above hull
    metastable_threshold=0.1,    # energy-above-hull cutoff for metastability
    progress_bar=True,
)

# Compute metrics
results = metrics.compute(gen_structures=gen_structures)

# Save results
metrics.to_csv("outputs/results.csv")

# Or get as DataFrame
df = metrics.to_dataframe()
print(df.head())
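
For quick filtering and aggregation, the DataFrame can be sliced with pandas. The column name below is hypothetical; check df.columns for the actual schema produced by Metrics:

# "e_above_hull" is a hypothetical column name; inspect df.columns first
if "e_above_hull" in df.columns:
    metastable = df[df["e_above_hull"] <= 0.1]  # same cutoff as metastable_threshold above
    print(f"Metastable fraction: {len(metastable) / len(df):.2%}")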

Reference Datasets

Available reference datasets include mp-20 (the MP-20 reference structures) and mp-all (all unique Materials Project structures, used for the phase diagram and broader novelty checks).

Results are saved to the specified output file in CSV format for further analysis.
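
A saved results file can be loaded back with pandas for further analysis, for example (using the output path from the command above):

import pandas as pd

# Load the CSV written via evaluate.py's --output_file option
results = pd.read_csv("benchmark/results/my_results.csv")
print(results.describe())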

Benchmarks for Chemeleon2 DNG

Pre-computed benchmark results for de novo generation (DNG) are available in the benchmarks/dng/ directory (e.g., chemeleon2_rl_dng_mp_20.json.gz).

Loading Benchmark Data

These files contain generated crystal structures in compressed JSON format:

from monty.serialization import loadfn

# Load benchmark structures
structures = loadfn("benchmarks/dng/chemeleon2_rl_dng_mp_20.json.gz")
print(f"Loaded {len(structures)} structures")