Projects

Things I've put into the world.

Production-grade RL systems shipped during industry roles, plus a few open-source utilities and toolkits. Research artefacts tied to specific papers live on the publications page.

Industry experience

RL agents I've shipped into production.

Jun 2022 — May 2023

Top-Performing Team

RL Algorithms Engineer · InspirAI

Hangzhou, China

Headline result

Landlord (Dou Dizhu) AI defeated top-ranked professional players in head-to-head matches.

  • Built a general-purpose card-game AI SDK deployed across four production titles: Sanguosha, Hearthstone, Landlord (Dou Dizhu), and GuanDan.
  • On Landlord (Dou Dizhu), the deployed agent reached superhuman level, defeating top-ranked professional players in head-to-head matches.
  • On GuanDan, drove a +6% win-rate improvement over the previous production baseline through targeted algorithm optimization.

Jun 2021 — Oct 2021

Super Special Offer

RL Research Intern · Baidu

Beijing, China

  • Proposed and implemented EDA-MAPPO (Expert-Data-Assisted Multi-Agent PPO).
  • Delivered the algorithm into a client's production environment.

Toolchain

What I build with.

  • framework · PyTorch
  • framework · JAX
  • framework · TensorFlow
  • compute · CUDA
  • distributed · Ray / RLlib
  • library · Stable-Baselines3
  • env · Gymnasium
  • data · NumPy / Pandas
  • experiment · Hydra / Wandb
  • infra · Linux + Slurm

Open source

Tools I've released.

All on GitHub →

Where I'm pushing next

Open directions I'd like to chase.