Projects

Things I've put into the world.

Production-grade RL systems shipped during industry roles, plus a few open-source utilities and toolkits. Research artefacts tied to specific papers live on the publications page.

Industry experience

RL agents I've shipped into production.

Jun 2022 — May 2023

Top-Performing Team

RL Algorithms Engineer · InspirAI

Hangzhou, China

Headline result

Landlord (Dou Dizhu) AI defeated top-ranked professional players in head-to-head matches.

— Built a general-purpose card-game AI SDK deployed across four production titles: Sanguosha, Hearthstone, Landlord (Dou Dizhu), and GuanDan.
— On Landlord (Dou Dizhu), the deployed agent reached superhuman level, defeating top-ranked professional players in head-to-head matches.
— On GuanDan, drove a +6% win-rate improvement over the previous production baseline through targeted algorithm optimization.

Jun 2021 — Oct 2021

Super Special Offer

RL Research Intern · Baidu

Beijing, China

— Proposed and implemented EDA-MAPPO (Expert-Data-Assisted Multi-Agent PPO).
— Successfully delivered the algorithm into a client production environment.

Toolchain

What I build with.

framework · PyTorch
framework · JAX
framework · TensorFlow
compute · CUDA
distributed · Ray / RLlib
library · Stable-Baselines3
env · Gymnasium
data · NumPy / Pandas
experiment · Hydra / Wandb
infra · Linux + Slurm

Open source

Tools I've released.

All on GitHub →

</> plasticity

Plasticity-Scan

Visualize neural-network plasticity under shifting regression tasks.

A toolkit for studying how networks adapt when the target distribution changes over time. Designed for quick experimentation around plasticity loss diagnostics.

plasticity · diagnostics · toolkit
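
To give a feel for the kind of experiment this toolkit is meant for, here is a minimal sketch in plain Python. It is not Plasticity-Scan's code or API: every name in it (`make_task`, `TinyMLP`, `run_shift_experiment`, `dormant_fraction`) is hypothetical, and the sinusoidal task family and the dormant-unit measure are assumptions chosen to keep the example short.

```python
import math
import random

# Illustrative sketch only: all names below are hypothetical and are not
# part of Plasticity-Scan. The task family (random sinusoids) is an assumption.


def make_task(rng, n_points=32):
    """Sample one regression task: y = sin(freq * x + phase), x in [-1, 1]."""
    freq = rng.uniform(1.0, 4.0)
    phase = rng.uniform(0.0, math.pi)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n_points)]
    ys = [math.sin(freq * x + phase) for x in xs]
    return xs, ys


class TinyMLP:
    """One-hidden-layer ReLU network for scalar inputs and outputs."""

    def __init__(self, rng, hidden=16):
        self.w1 = [rng.gauss(0.0, 1.0) for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [max(0.0, w * x + b) for w, b in zip(self.w1, self.b1)]
        y = sum(v * a for v, a in zip(self.w2, h)) + self.b2
        return h, y

    def mse(self, xs, ys):
        return sum((self.forward(x)[1] - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    def gd_step(self, xs, ys, lr):
        """One full-batch gradient-descent step on mean squared error."""
        n, hidden = len(xs), len(self.w1)
        g_w1, g_b1, g_w2, g_b2 = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        for x, y in zip(xs, ys):
            h, pred = self.forward(x)
            d = 2.0 * (pred - y) / n  # dLoss/dpred for this sample
            g_b2 += d
            for j, a in enumerate(h):
                g_w2[j] += d * a
                if a > 0.0:  # ReLU passes gradient only when the unit is active
                    g_w1[j] += d * self.w2[j] * x
                    g_b1[j] += d * self.w2[j]
        for j in range(hidden):
            self.w1[j] -= lr * g_w1[j]
            self.b1[j] -= lr * g_b1[j]
            self.w2[j] -= lr * g_w2[j]
        self.b2 -= lr * g_b2

    def dormant_fraction(self, xs):
        """Fraction of hidden units that output zero on every input in xs."""
        hidden = len(self.w1)
        active = [False] * hidden
        for x in xs:
            h, _ = self.forward(x)
            for j, a in enumerate(h):
                if a > 0.0:
                    active[j] = True
        return sum(1 for flag in active if not flag) / hidden


def run_shift_experiment(num_tasks=5, steps_per_task=100, lr=0.02, seed=0):
    """Train one network on a sequence of tasks without ever resetting it."""
    rng = random.Random(seed)
    net = TinyMLP(rng)
    results = []
    for _ in range(num_tasks):
        xs, ys = make_task(rng)  # the target function shifts here
        for _ in range(steps_per_task):
            net.gd_step(xs, ys, lr)
        # Record (final loss, dormant fraction) at the end of each task.
        results.append((net.mse(xs, ys), net.dormant_fraction(xs)))
    return results


if __name__ == "__main__":
    for i, (loss, dormant) in enumerate(run_shift_experiment()):
        print(f"task {i}: final_mse={loss:.4f} dormant_fraction={dormant:.2f}")
```

If the final loss or the dormant fraction drifts upward from task to task under the same training budget, that would be one possible sign of lost plasticity. A real study would use a proper framework, deeper networks, and many more tasks.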

</> data

nettrace

Unified data loader for network bandwidth traces and video chunks.

Common loader for research on adaptive streaming, QoE optimization, and RL-based bitrate adaptation. Cleans up the trace-format zoo so experiments can focus on policy.

data · video-streaming · rl

Where I'm pushing next

Open directions I'd like to chase.

◐ Reinforcement learning algorithms and theory under continual shift
◇ Large-scale and data-efficient RL with deployable cost
◧ Stable, robust control for embedded and real-time systems
◢ Continual / sustainable learning grounded in world models