Benchmark datasets and evaluation tools for reproducible AI research. We believe reproducibility is the foundation of trustworthy science: if a result cannot be verified, it should not be trusted.
Standardized evaluation datasets for reproducible comparison. Each dataset includes ground-truth labels, train/test splits, and baseline model performance.
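As a minimal sketch of the dataset structure described above, the following shows how a benchmark entry might bundle ground-truth labels, a fixed train/test split, and baseline scores. The `BenchmarkDataset` class, field names, and the toy data are all hypothetical, not part of this project's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkDataset:
    # Hypothetical container mirroring the structure described above:
    # ground-truth labels, fixed train/test splits, baseline performance.
    name: str
    train: list   # list of (features, ground_truth_label) pairs
    test: list    # held-out split, never used for fitting
    baselines: dict = field(default_factory=dict)  # model name -> metric

# Toy example: a tiny classification dataset with one baseline result.
ds = BenchmarkDataset(
    name="toy-classification",
    train=[([0.1, 0.2], 0), ([0.9, 0.8], 1)],
    test=[([0.2, 0.1], 0)],
    baselines={"majority-class": 0.50},
)

print(ds.name, len(ds.train), len(ds.test))
```

Keeping the split and baseline numbers inside the dataset object is one way to make comparisons reproducible: every consumer evaluates against the same held-out examples and the same reference scores.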
Five steps from first clone to merged pull request. We review all contributions within 48 hours and provide detailed feedback.
Join a growing community of researchers and engineers building the next generation of AI-driven scientific discovery tools.