This repository provides datasets and scripts for benchmarking physics-inspired machine learning models for property prediction of transition metal complexes with diverse charge and spin states.
Prerequisites
- Python 3.12 or later
- conda or pip
Dependencies
The Python scripts require NumPy, pandas, and ASE.
pip install -r requirements.txt
The data/ directory contains all datasets, including XYZ files, property CSV files, extended XYZ files (for MACE), and dataset splits.
It includes three datasets: TM-GSspinPlus, tmPHOTO, and OctaKulik.
See the included README.md for details.
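As a rough illustration of the plain XYZ layout used for the structure files (atom count, comment line, then one `symbol x y z` line per atom), the sketch below parses a single frame with the standard library only. The two-atom structure is a made-up placeholder, not an entry from the datasets, and `read_xyz` is a hypothetical helper name. It does not interpret the key=value metadata that extended XYZ files carry in the comment line. With ASE installed, `ase.io.read` is the more usual way to load these files.

```python
import io


def read_xyz(handle):
    """Parse one frame of a plain XYZ file into (symbols, coords, comment)."""
    n_atoms = int(handle.readline())      # line 1: number of atoms
    comment = handle.readline().strip()   # line 2: free-form comment
    symbols, coords = [], []
    for _ in range(n_atoms):
        fields = handle.readline().split()
        symbols.append(fields[0])                           # element symbol
        coords.append(tuple(float(v) for v in fields[1:4]))  # x, y, z
    return symbols, coords, comment


# Hypothetical placeholder structure; not taken from the datasets.
example = io.StringIO("2\nexample comment\nFe 0.0 0.0 0.0\nO 0.0 0.0 1.6\n")
symbols, coords, comment = read_xyz(example)
print(symbols, coords, comment)
```

For a file on disk, the same function can be called inside `with open(path) as handle:`, where `path` points at one of the XYZ files under data/.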
Packages used to generate molecular representations can be built from source by following the official documentation:
- ε-SPAHM, SPAHM(a), SPAHM(b): Q-stack
git clone https://github.com/lcmd-epfl/Q-stack
cd Q-stack
git checkout versioning_issue
pip install -e .[spahm,regression]
- SLATM, FCHL: Install the modified qml2 version below, forked from qml2. It includes minor updates to atomic SLATM generation without affecting the resulting representations.
git clone https://github.com/lcmd-epfl/tmc_qml2
cd tmc_qml2
pip install -e .
- SOAP: featomic (see its official documentation)
pip install featomic
Bash scripts for generating job files, along with the required Python scripts, are provided under:
- Molecular representation generation and kernel computations (including timing measurements): representations/
  See the included README.md for details.
- Kernel ridge regression using Q-stack: krr/
  See the included README.md for details.
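For readers unfamiliar with kernel ridge regression, the sketch below shows the general technique with plain NumPy. It is not the Q-stack implementation and does not reproduce the scripts in krr/. The Gaussian kernel, the hyperparameters `sigma` and `eta`, and the random feature vectors are all assumptions chosen for illustration; in practice the feature matrix would hold precomputed molecular representations.

```python
import numpy as np


def gaussian_kernel(X, Y, sigma):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    # Squared Euclidean distances via ||x||^2 + ||y||^2 - 2 x.y
    d2 = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def krr_fit(K_train, y_train, eta):
    """Solve (K + eta * I) alpha = y for the regression weights alpha."""
    return np.linalg.solve(K_train + eta * np.eye(len(y_train)), y_train)


# Synthetic stand-ins for representation vectors and target properties.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
y = np.sin(X[:, 0])
X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]

sigma, eta = 2.0, 1e-6  # hypothetical hyperparameters, normally tuned by cross-validation
alpha = krr_fit(gaussian_kernel(X_train, X_train, sigma), y_train, eta)
y_pred = gaussian_kernel(X_test, X_train, sigma) @ alpha
mae = np.mean(np.abs(y_pred - y_test))
print(f"test MAE on synthetic data: {mae:.4f}")
```

The regularization term `eta` keeps the linear system well conditioned when two training points have nearly identical representations.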
Precomputed NumPy arrays of the molecular representations used in this work are available on Materials Cloud.
Installation instructions and example input files for 3DMol are available at:
https://github.com/lcmd-epfl/3DMol/tree/TMC-benchmark-v0
Trained models and log files are available on Materials Cloud.
Install the modified MACE version for intensive property prediction:
git clone https://github.com/lcmd-epfl/tmc_mace
cd tmc_mace
git checkout intensive
pip install -e .
Example job scripts and train/test extended XYZ files are provided in mace/, in a subdirectory for each dataset.
See the included README.md for details.
Trained models, job scripts, and log files are available on Materials Cloud.
Two Conda environment files are provided as references.
Both environments were built on Red Hat Enterprise Linux 9.4 (Plow), x86_64.
The environment file environment_x86_64-rhel_9.yml includes all dependencies required for molecular representation generation and kernel benchmarking, including the following packages:
- qstack
- qml2
- featomic
- mace
conda env create -f environment_x86_64-rhel_9.yml
conda activate benchmark_tmc
The MACE environment file mace_x86_64-rhel_9.yml includes only the dependencies required for training MACE models.
conda env create -f mace_x86_64-rhel_9.yml
conda activate mace
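After activating an environment, a quick sanity check can confirm that the interpreter and packages are in place. The sketch below uses only the standard library. It assumes the import names match the package names listed above (`qstack`, `qml2`, `featomic`, `mace`), which is not confirmed here, and `check_environment` is a hypothetical helper, not part of this repository.

```python
import importlib.util
import sys

REQUIRED = ("numpy", "pandas", "ase")              # listed under Dependencies
OPTIONAL = ("qstack", "qml2", "featomic", "mace")  # assumed import names


def missing_modules(names):
    """Return the names that cannot be located on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment():
    """Summarize the Python version check and any missing packages."""
    return {
        "python_ok": sys.version_info >= (3, 12),  # Python 3.12 or later
        "missing_required": missing_modules(REQUIRED),
        "missing_optional": missing_modules(OPTIONAL),
    }


report = check_environment()
print(report)
```

In the mace environment, the representation packages are expected to show up under `missing_optional`, since that environment only covers MACE training.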