🧪

ChemAI Lab

Machine Learning & Quantum Chemistry Driven Chemical Research Platform

Python · PyTorch · RDKit · v0.2.0 · Apache 2.0

Overview

ChemAI Lab is a unified chemistry + AI research platform designed to provide an end-to-end toolchain at the intersection of computational chemistry and machine learning. It integrates multiple open-source cheminformatics tools (RDKit, Chemprop, DeepChem, DScribe, MLatom, Molfeat) with a unified abstraction layer for data handling, feature engineering, model training, explainable AI, and quantum chemistry interfaces.

The project currently focuses on asymmetric organocatalysis with chiral phosphoric acids (CPAs), following a scientific roadmap of 5 stages across 100 development phases, from data infrastructure through to inverse molecular design.

Tech Stack

  • PyTorch: Deep Learning
  • Lightning: Training Pipeline
  • RDKit: Cheminformatics
  • scikit-learn: Classic ML
  • NumPy: Numerical
  • Pandas: Data Processing
  • SciPy: Scientific
  • loguru: Logging
  • Chemprop: Optional
  • DeepChem: Optional
  • DScribe: Optional
  • MLatom: Optional

Module Architecture

16 subpackages covering the full research workflow

📦

data

Molecular data abstraction: Molecule, Dataset, FormatConverter

🔬

features

Molecular featurization: MolecularFeaturizer, FeatureStore

🧠

models

Model registry: ModelHub, ModelRegistry

⚡

nn

Neural network modules & custom layers

🔗

pipeline

Workflow orchestration: Workflow, PipelineStep

⚛️

qm

Quantum chemistry interface: QMInterface

🤖

automl

Automated hyperparameter optimization

📊

evaluation

Model evaluation metrics

🔍

xai

Explainable AI

🎨

viz

Data visualization

hub / pretrained

Model zoo & pretrained weights

⚙️

cli / config / utils

CLI, config management, utilities

serialization

Serialization utilities

🖥️

hub

Model hub & version management
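
To make the module list above more concrete, here is a minimal, standard-library-only sketch of how a few of the named abstractions (Molecule, PipelineStep, Workflow, ModelRegistry) might fit together. Only the class names come from this page. Every field, method, and signature below is an assumption made for illustration, and `count_atoms`, `MeanBaseline`, and the `"mean_baseline"` key are hypothetical placeholders, not part of ChemAI Lab.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, Optional


@dataclass
class Molecule:
    """Hypothetical record: a SMILES string plus an optional label (e.g. ee in %)."""
    smiles: str
    target: Optional[float] = None


@dataclass
class PipelineStep:
    """Hypothetical step: a named function applied to the running state."""
    name: str
    func: Callable[[Any], Any]


class Workflow:
    """Runs steps in order, feeding each step's output into the next one."""

    def __init__(self, steps: list) -> None:
        self.steps = list(steps)

    def run(self, state: Any) -> Any:
        for step in self.steps:
            state = step.func(state)
        return state


class ModelRegistry:
    """Maps string names to model factories so a model can be chosen by name."""

    def __init__(self) -> None:
        self._factories: dict = {}

    def register(self, name: str) -> Callable:
        def decorator(factory):
            if name in self._factories:
                raise ValueError(f"model {name!r} is already registered")
            self._factories[name] = factory
            return factory
        return decorator

    def create(self, name: str, **kwargs: Any) -> Any:
        return self._factories[name](**kwargs)


registry = ModelRegistry()


@registry.register("mean_baseline")
class MeanBaseline:
    """Toy model: always predicts the mean of the training targets."""

    def fit(self, features, targets):
        self.mean_ = mean(targets)
        return self

    def predict(self, features):
        return [self.mean_ for _ in features]


def count_atoms(mol: Molecule) -> list:
    """Placeholder featurizer: raw character counts, not real chemistry."""
    return [mol.smiles.count(symbol) for symbol in ("C", "N", "O", "P")]


def featurize(mols):
    # Turn a list of Molecule records into a feature matrix and a target vector.
    return {"X": [count_atoms(m) for m in mols], "y": [m.target for m in mols]}


def fit_and_predict(state):
    # Look the model up by name, fit it, and predict on the training features.
    model = registry.create("mean_baseline").fit(state["X"], state["y"])
    return model.predict(state["X"])


workflow = Workflow([
    PipelineStep("featurize", featurize),
    PipelineStep("fit_predict", fit_and_predict),
])

dataset = [Molecule("CCO", 10.0), Molecule("CCN", 30.0)]
predictions = workflow.run(dataset)
```

In a real setup the placeholder featurizer would presumably be replaced by an RDKit-backed one and the baseline by a trained network. The point of the sketch is only the separation of concerns: data records, named steps, and a name-to-factory registry.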

Scientific Roadmap

Asymmetric organocatalysis with chiral phosphoric acids (CPAs): 100 phases across 5 stages

1

Data Infrastructure

Standardized formats, descriptor libraries, reaction encoding, visualization tools

Phases 1-20

2

Asymmetric Catalysis Models

CPA-specific descriptors, enantioselectivity prediction, SHAP analysis, multi-task yield/ee models

Phases 21-40

3

Few-Shot Learning

Meta-learning, data augmentation, active learning, transfer learning

Phases 41-60

4

Mechanism-Driven Explainable AI

DFT feature fusion, reaction surface modeling, causal inference, physics-constrained NNs

Phases 61-80

5

Inverse Design

Target-driven molecular generation, inverse condition optimization, automated lab loops, multi-objective optimization

Phases 81-100
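
As an illustration of the enantioselectivity prediction target mentioned in Stage 2, the sketch below converts between enantiomeric excess (ee) and the free-energy difference between the two competing transition states, using the standard relation ΔΔG‡ = RT ln(er) with er = (100 + ee) / (100 − ee). Whether ChemAI Lab trains on ee or on ΔΔG‡ is not stated on this page. Training on ΔΔG‡ is a common convention and is an assumption here, as are the function names `ee_to_ddg` and `ddg_to_ee` and the assumption of kinetic control at a single temperature.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def ee_to_ddg(ee_percent: float, temperature_k: float = 298.15) -> float:
    """Convert ee (in %) to ddG (kcal/mol), assuming kinetic control.

    The sign follows the sign of ee: a positive ee gives a positive value.
    """
    if not -100.0 < ee_percent < 100.0:
        raise ValueError("ee must lie strictly between -100 and 100 percent")
    enantiomeric_ratio = (100.0 + ee_percent) / (100.0 - ee_percent)
    return R_KCAL * temperature_k * math.log(enantiomeric_ratio)


def ddg_to_ee(ddg_kcal: float, temperature_k: float = 298.15) -> float:
    """Inverse conversion: ddG (kcal/mol) back to ee (in %).

    Uses (er - 1) / (er + 1) = tanh(ddG / (2 R T)) with er = exp(ddG / (R T)).
    """
    return 100.0 * math.tanh(ddg_kcal / (2.0 * R_KCAL * temperature_k))


ddg_90 = ee_to_ddg(90.0)        # energy gap for 90% ee at 298.15 K
ee_back = ddg_to_ee(ddg_90)     # round trip back to ee
```

At 298.15 K, 90% ee corresponds to roughly 1.7 kcal/mol. Because ee saturates near ±100% while ΔΔG‡ does not, the energy scale is often the more convenient regression target.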

Development Standards

  • Code Style: Ruff + Black (line length 100, target py311)
  • Type Checking: Mypy (non-strict)
  • Testing: Pytest + plugins (coverage, benchmarks)
  • Documentation: Sphinx + MkDocs
  • Git Hooks: pre-commit automated checks
  • CI: GitHub Actions
  • License: Apache 2.0