Zhiqiang He (何志强)

I am a first-year Ph.D. student at the University of Electro-Communications in Japan. I hold a master's degree from Northeastern University, and my research focuses on reinforcement learning. My academic research journey began at the Jiangxi Province Advanced Control and Key Optimization Laboratory, where I worked under the guidance of Professor Pengzhan Chen from July 2017 to June 2019. Subsequently, from July 2019 to June 2022, I continued my research at the Deep Learning and Advanced Intelligent Decision-Making Research Institute, mentored by Professor Jiao Wang.

In my professional capacity, I interned as a Research Engineer at Baidu in Beijing from June to September 2021, and subsequently served as a Reinforcement Learning Algorithms Engineer at InspirAI from June 2022 to May 2023.

Email  /  CV  /  Scholar  /  GitHub  /  Zhihu


Experience

Between June 2022 and May 2023, I served as a Reinforcement Learning Algorithms Engineer at InspirAI, where I proposed and refined a general AI modeling paradigm for card games. It was successfully deployed in Hearthstone, Dou Dizhu (where it defeated professional players), and Guan Dan; notably, the Dou Dizhu AI has been launched on the TapTap platform.

In the summer of 2021, I interned as a Research Engineer at Baidu AI Cloud in Beijing, where I developed a multi-agent cooperative adversarial algorithm, Expert Data-Assisted Multi-Agent Proximal Policy Optimization (EDA-MAPPO). We released a video demonstrating the algorithm's performance and published the Source Code. At the same time, our team "superfly" competed in a machine learning for combinatorial optimization competition (placing 9/23).

Academic Activities

Served as a peer reviewer for IEEE Internet of Things Journal.

Publication / Preprint

Understanding World Models through Multi-Step Pruning Policy via Reinforcement Learning
Zhiqiang He, Wen Qiu, Wei Zhao, Xun Shao, Zhi Liu
Information Sciences, 2024, Source Code (IF=8.1)

Parallel multi-step pruning policies enhance sampling diversity. (Convergence analysis of MSPP and its policy gradient theorem.)

Erlang planning network: An iterative model-based reinforcement learning with multi-perspective
Jiao Wang, Lemin Zhang, Zhiqiang He, Can Zhu, Zihui Zhao
Pattern Recognition, 2022, Source Code (IF=8.5)

Bi-level reinforcement learning within a model-based reinforcement learning framework.

Control Strategy of Speed Servo Systems Based on Deep Reinforcement Learning
Pengzhan Chen, Zhiqiang He, Chuanxi Chen, Jiahong Xu
Algorithms, vol. 11, no. 5: 65, 2018, Source Code (cited 50 times)

The first paper to apply deep reinforcement learning to speed servo systems.
