Evaluation Guide

This guide covers evaluating generated crystal structures against reference datasets using the available evaluation metrics (see Evaluation Metrics below).

Prerequisites

Before running evaluation metrics, download and extract the reference dataset files containing structure embeddings, composition features, and phase diagram data.

Download from Figshare

You can download directly from the web:

Download benchmarks_mp_20.tar.gz from Figshare

Or use the command line (from project root):

# Download the reference dataset
curl -L -A "Mozilla/5.0" -o benchmarks_mp_20.tar.gz https://figshare.com/ndownloader/files/59462369

# Extract the dataset
tar -zxvf benchmarks_mp_20.tar.gz

This will create the following directory structure:

benchmarks/
└── assets/
    ├── mp_20_all_composition_features.pt              # VAE composition embeddings for diversity metrics
    ├── mp_20_all_structure_features.pt                # VAE structure embeddings for diversity metrics
    ├── mp_20_all_structure.json.gz                    # MP-20 reference structures for novelty checking
    ├── mp_all_unique_structure_250416.json.gz         # All MP unique structures for novelty checking
    └── ppd-mp_all_entries_uncorrected_250409.pkl.gz   # Phase diagram data for energy above hull

These files contain the reference data required for computing evaluation metrics against the MP-20 dataset.
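
To confirm the extraction succeeded, you can check that all five reference files are present. This is a minimal sketch using only the file names listed above:

from pathlib import Path

# Expected assets after extracting benchmarks_mp_20.tar.gz
assets = Path("benchmarks/assets")
expected = [
    "mp_20_all_composition_features.pt",
    "mp_20_all_structure_features.pt",
    "mp_20_all_structure.json.gz",
    "mp_all_unique_structure_250416.json.gz",
    "ppd-mp_all_entries_uncorrected_250409.pkl.gz",
]
missing = [name for name in expected if not (assets / name).exists()]
print("All reference files present." if not missing else f"Missing files: {missing}")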

Generate Samples

Generate crystal structures using a pre-trained LDM model. (The default model is trained on the alex-mp-20 dataset.)

# Generate 10000 samples with 2000 batch size using DDIM sampler
python src/sample.py --num_samples=10000 --batch_size=2000 --output_dir=outputs/samples
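
Once sampling finishes, the generated structures can be loaded for inspection. This is a hedged sketch: it assumes sample.py writes a structures.json.gz file into the output directory, matching the layout used in the evaluation examples below; adjust the path to whatever sample.py actually writes.

from monty.serialization import loadfn

# Assumed output file name inside --output_dir
gen_structures = loadfn("outputs/samples/structures.json.gz")
print(f"Loaded {len(gen_structures)} generated structures")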

Evaluate Models

Evaluate generated structures against reference datasets (e.g., MP-20) to assess quality and diversity.

Generate and Evaluate Together

Generate new structures and evaluate them in one command:

python src/evaluate.py \
    --model_path=ckpts/mp_20/ldm/ldm_null.ckpt \
    --structure_path=outputs/eval_samples \
    --reference_dataset=mp-20 \
    --num_samples=10000 \
    --batch_size=2000

Evaluate Pre-generated Structures

If you already have generated structures:

python src/evaluate.py \
    --structure_path=outputs/dng_samples \
    --reference_dataset=mp-20 \
    --output_file=benchmark/results/my_results.csv

Evaluation Metrics

The evaluation script computes several metrics to assess generation quality, including uniqueness, novelty, energy above hull, composition validity, and diversity (the metric names match those used in the Python API below).

For detailed implementation, see src/utils/metrics.py.

Python API Usage

You can also compute metrics using the Python API directly:

from monty.serialization import loadfn
from src.utils.metrics import Metrics

# Load generated structures
gen_structures = loadfn("outputs/eval_samples/structures.json.gz")

# Create metrics object
metrics = Metrics(
    metrics=["unique", "novel", "e_above_hull", "composition_validity"],
    reference_dataset="mp-20",   # reference structures for uniqueness/novelty checks
    phase_diagram="mp-all",      # phase diagram entries for energy above hull
    metastable_threshold=0.1,    # energy-above-hull cutoff for metastability
    progress_bar=True,
)

# Compute metrics
results = metrics.compute(gen_structures=gen_structures)

# Save results
metrics.to_csv("outputs/results.csv")

# Or get as DataFrame
df = metrics.to_dataframe()
print(df.head())
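
For quick filtering and aggregation, the DataFrame can be sliced with pandas. The column name below is hypothetical; check df.columns for the actual schema produced by Metrics:

# "e_above_hull" is a hypothetical column name; inspect df.columns first
if "e_above_hull" in df.columns:
    metastable = df[df["e_above_hull"] <= 0.1]  # same cutoff as metastable_threshold above
    print(f"Metastable fraction: {len(metastable) / len(df):.2%}")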

Reference Datasets

Available reference datasets include mp-20 (the MP-20 reference structures) and mp-all (all unique Materials Project structures, used for the phase diagram and broader novelty checks).

Results are saved to the specified output file in CSV format for further analysis.
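
A saved results file can be loaded back with pandas for further analysis, for example (using the output path from the command above):

import pandas as pd

# Load the CSV written via evaluate.py's --output_file option
results = pd.read_csv("benchmark/results/my_results.csv")
print(results.describe())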

Benchmarks for Chemeleon2 DNG

Pre-computed benchmark results for de novo generation (DNG) are available in the benchmarks/dng/ directory (e.g., chemeleon2_rl_dng_mp_20.json.gz).

Loading Benchmark Data

These files contain generated crystal structures in compressed JSON format:

from monty.serialization import loadfn

# Load benchmark structures
structures = loadfn("benchmarks/dng/chemeleon2_rl_dng_mp_20.json.gz")
print(f"Loaded {len(structures)} structures")