doc(CHANGELOG.md): add v2.0.4
release(hok_env): change hok version to 2.0.4
doc(README.md): Added description for 3v3 mode
fix(hok3v3): fix test_env
hongyangqin committed Dec 28, 2023
1 parent 37f6548 commit 26d455f
Showing 4 changed files with 256 additions and 67 deletions.
25 changes: 25 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,28 @@
# 20231228 v2.0.4
rl_framework:
1. refactor(logger): utilize loguru as logger
`rl_framework.common.logging` should be replaced by `from rl_framework.common.logging import logger as LOG`.
2. feat(model_manager): support `save_model_seconds`.
3. feat(model_manager): send checkpoints without optimizer state to reduce disk usage cost.
4. feat(send_model): support `backup_ckpt_only`.

aiarena:
1. fix(1v1/agent_demo): typos
2. feat(1v1/agent_demo): return home if ego_hp_rate is less than 0.5.
3. refactor(1v1/3v3): improve code and remove redundant configurations.
4. feat(actor): support `auto_bind_cpu` to bind cpu_id for each actor process according to actor_id.
5. feat(learner): support `load_optimizer_state`.
6. fix(3v3/model): typos

hok_env:
1. feat(3v3): support reward configuration.

Others:
1. Introduce GitHub workflow to upload Python package hok to pypi for every release.
2. Archive network.py for the 3v3 paper (cppo, mappo, ppo).
3. Use a torch-only image; the TensorFlow training code is now deprecated.
4. Update README.md.

# 20230817

1. Refactor aiarena/hok_env/rl_framework
251 changes: 205 additions & 46 deletions README.md
@@ -1,4 +1,5 @@
# Honor of Kings AI Open Environment of Tencent(腾讯王者荣耀AI开放环境)

![avatar](./docs/hok_1v1.png)

## Update: 3v3 Mode Now Available
@@ -13,19 +14,23 @@

## Introduction

[![cpu](https://github.com/tencent-ailab/hok_env/actions/workflows/cpu.yaml/badge.svg)](https://github.com/tencent-ailab/hok_env/actions/workflows/cpu.yaml)
[![gpu](https://github.com/tencent-ailab/hok_env/actions/workflows/gpu.yml/badge.svg)](https://github.com/tencent-ailab/hok_env/actions/workflows/gpu.yml)
[![PyPI](https://github.com/tencent-ailab/hok_env/actions/workflows/python-publish.yml/badge.svg)](https://github.com/tencent-ailab/hok_env/actions/workflows/python-publish.yml)
[![Image](https://github.com/tencent-ailab/hok_env/actions/workflows/gpu.yml/badge.svg)](https://github.com/tencent-ailab/hok_env/actions/workflows/gpu.yml)

- [Hok_env](https://github.com/tencent-ailab/hok_env) is the open environment of the MOBA game [Honor of Kings](https://pvp.qq.com/).

- This repository mainly includes the Hok_env SDK, a reinforcement learning training framework, and an implementation of the PPO algorithm based on the training framework. The Hok_env SDK is used to interact with the gamecore of Honor of Kings.

- This repository also contains the implementation code for the paper:
> **Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning**.\
> Hua Wei*, Jingxiao Chen*, Xiyang Ji*, Hongyang Qin, Minwen Deng, Siqin Li, Liang Wang, Weinan Zhang, Yong Yu, Lin Liu, Lanxiao Huang, Deheng Ye, Qiang Fu, Wei Yang. (*Equal contribution) \
> **NeurIPS Datasets and Benchmarks 2022** \
> Project Page: https://github.com/tencent-ailab/hok_env \
> arXiv: https://arxiv.org/abs/2209.08483

> **Abstract**: *This paper introduces Honor of Kings Arena, a reinforcement learning (RL) environment based on Honor of Kings, one of the world’s most popular games at present. Compared to other environments studied in most previous work, ours presents new generalization challenges for competitive reinforcement learning. It is a multiagent problem with one agent competing against its opponent; and it requires the generalization ability as it has diverse targets to control and diverse opponents to compete with. We describe the observation, action, and reward specifications for the Honor of Kings domain and provide an open-source Python-based interface for communicating with the game engine. We provide twenty target heroes with a variety of tasks in Honor of Kings Arena and present initial baseline results for RL-based methods with feasible computing resources. Finally, we showcase the generalization challenges imposed by Honor of Kings Arena and possible remedies to the challenges. All of the software, including the environment-class, are publicly available at: https://github.com/tencent-ailab/hok_env. The documentation is available at: https://aiarena.tencent.com/hok/doc/.*

- Currently supported heroes in hok_env:
- lubanqihao
- miyue
@@ -46,39 +51,44 @@
- gongsunli
- peiqinhu
- shangguanwaner

## Running Requirement

- python >= 3.6, <= 3.9.

- Windows 10/11, or Wine on Linux (to deploy the Windows gamecore server)

- Docker (to deploy hok_env in Linux containers)

- For Windows, WSL 2 is required. (Windows Subsystem for Linux Version 2)
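As a quick sanity check (a sketch, not part of the official setup), you can verify that your interpreter falls in the supported range:

```python
import sys

# hok_env supports Python >= 3.6, <= 3.9 (see the requirements above)
supported = (3, 6) <= sys.version_info[:2] <= (3, 9)
print(f"python {sys.version_info.major}.{sys.version_info.minor} supported: {supported}")
```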

The gamecore of hok_env runs on the Windows platform, and the package **_hok_env_** needs to be deployed on Linux platforms to interact with the gamecore.
We also provide a docker image for training on your computer. In a future version, we will release a gamecore server compatible with Linux.

To enable cluster training, a workaround is to run the Windows gamecore on Linux: [run windows gamecore on linux](./docs/run_windows_gamecore_on_linux.md).

## Gamecore Installation

### Download the hok gamecore

You need to apply for the license and gamecore on this page: https://aiarena.tencent.com/aiarena/en/open-gamecore

Please put the `license.dat` under the folder `hok_env_gamecore/gamecore/core_assets` and add the path of the folder `ai_simulator_remote` to the system `PATH` environment variable.

![avatar](./docs/sgame_folder.png)
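A small Python sketch (folder names as assumed above — adjust to your install) can check that the license file is in place and that the `ai_simulator_remote` folder is on `PATH`:

```python
import os
from pathlib import Path

# Hypothetical locations, following the folder layout described above
core_assets = Path("hok_env_gamecore/gamecore/core_assets")
license_ok = (core_assets / "license.dat").is_file()
on_path = any("ai_simulator_remote" in p for p in os.environ.get("PATH", "").split(os.pathsep))
print(f"license present: {license_ok}, simulator on PATH: {on_path}")
```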
### Test the gamecore

Open CMD and run:
```shell
cd gamecore\bin
set PATH=%PATH%;..\lib\
.\sgame_simulator_remote_zmq.exe .\sgame_simulator.common.conf
```

`sgame_simulator_remote_zmq.exe` requires one parameter: the path of the config file (here `sgame_simulator.common.conf`).

You can see the following message:
```
...
SGame Simulator End [FrameNum:8612][TimeUsed:7580ms]
```
The gamecore has started successfully!

Here is the content of `sgame_simulator.common.conf`:
```json
{
"abs_file": "../scene/1V1.abs",
"core_assets": "../core_assets",
    ...
}
```

```
kaiwu-base-35401-35400-6842-1669963820108111766-217.stat
kaiwu-base-35401-35400-6842-1669963820108111766-217_detail.stat
```
```

## 1v1

### Observation and action spaces

Please refer to https://aiarena.tencent.com/hok/doc/quickstart/index.html

### Usage

Please refer to [hok1v1/unit_test](hok_env/hok/hok1v1/unit_test/test_env.py) for the basic usage of the hok1v1.

Please refer to [aiarena/1v1](aiarena/1v1) for the training code of hok1v1.

### Test the gamecore with the demo script in WSL

You can test the gamecore with a simple Python script in WSL.

#### Make sure your PC supports WSL 2

For the installation and upgrade of WSL 2, please refer to: https://docs.microsoft.com/zh-cn/windows/wsl/install-manual#step-4---download-the-linux-kernel-update-package

You need to install Python (3.6–3.9) and the required dependencies in WSL.

#### Run the test script in WSL 2

0. Start the gamecore server outside WSL 2

```shell
cd gamecore
gamecore-server.exe server --server-address :23432
```
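After step 0, you can check from WSL 2 that the server port is reachable (address and port as assumed above; this helper is a sketch, not part of hok_env):

```python
import socket

def gamecore_reachable(addr: str = "127.0.0.1", port: int = 23432, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the gamecore server succeeds."""
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            return True
    except OSError:
        return False

print(gamecore_reachable())
```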

1. Install hok_env in Python

```shell
## after git clone this repo
cd hok_env/hok_env
pip install -e .
```

2. Run the test script
```angular2html
cd /hok_env/hok/hok1v1/unit_test
python test_env.py
```

```shell
cd /hok_env/hok/hok1v1/unit_test
python test_env.py
```

If you see the following message, you have successfully established a connection with Hok_env and have completed a game. Congratulations!

```
# python test_env.py
127.0.0.1:23432 127.0.0.1
...
first frame: 0
}, ...]
```
## Modify Game Config
### Modify 1v1 Game Config
Before running the game env, you need to create a `config.json` file in the working directory.
An example:
```json
{
    ...
    "log_level": "4"
}
```

This config file includes the sub-reward factors and the log level of the protobuf processing part.
The file is only loaded when creating an instance of `HoK1v1`,
and any modifications will not be reloaded even if you call `HoK1v1.reset`.
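For example, here is a minimal sketch that writes `config.json` before the env is created (the two reward keys are hypothetical placeholders; `log_level` is the only field shown in the example above):

```python
import json

# All values are strings, as in the example config above;
# the two reward keys are hypothetical placeholders.
config = {
    "reward_money": "0.006",
    "reward_kill": "-0.6",
    "log_level": "4",
}
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)
# Later edits to the file take effect only for newly created HoK1v1 instances.
```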

In most cases, `log_level` should be set to `4` to suppress unneeded log output.
Only if you encounter an error when using our environment may `log_level` need a
lower value, to help us gather more information about the error.

## 3v3

### Observation and action spaces

Please refer to [hok3v3](https://doc.aiarena.tencent.com/paper/hok3v3/latest/hok3v3_env/honor-of-kings/) for further information.

### Usage

Assume you have started the gamecore server at `127.0.0.1:23432` and the machine running hok_env has the IP `127.0.0.1`.

Here is the basic usage of the hok3v3 environment:

- Get the environment instance:

```python
import os

# get_hok3v3 and RewardConfig are provided by the hok3v3 helper modules
# (see hok/hok3v3/unit_test/test_env.py referenced in this section)
GC_SERVER_ADDR = os.getenv("GAMECORE_SERVER_ADDR", "127.0.0.1:23432")
AI_SERVER_ADDR = os.getenv("AI_SERVER_ADDR", "127.0.0.1")
reward_config = RewardConfig.default_reward_config.copy()

env = get_hok3v3(GC_SERVER_ADDR, AI_SERVER_ADDR, reward_config)
```

- Reset env and start a new game

```python
use_common_ai = [True, False]
camp_config = {
    "mode": "3v3",
    "heroes": [
        [{"hero_id": 190}, {"hero_id": 173}, {"hero_id": 117}],
        [{"hero_id": 141}, {"hero_id": 111}, {"hero_id": 107}],
    ],
}
env.reset(use_common_ai, camp_config, eval_mode=True)
```

- Game loop and predictions

```python
gameover = False
while not gameover:
    for i, is_common_ai in enumerate(use_common_ai):
        if is_common_ai:
            continue

        continue_process, features, frame_state = env.step_feature(i)
        gameover = frame_state.gameover
        # only predict every 3 frames
        if not continue_process:
            continue

        probs = random_predict(features, frame_state)
        ok, results = env.step_action(i, probs, features, frame_state)
        if not ok:
            raise Exception("step action failed")

env.close_game(force=True)

You can get the default reward config by:
```python
reward_config = RewardConfig.default_reward_config.copy()
```

Here is the [reward config example](aiarena/3v3/actor/config/config.py):
```python
reward_config = {
    "whether_use_zero_sum_reward": 1,
    "team_spirit": 0,
    "time_scaling_discount": 1,
    "time_scaling_time": 4500,
    "reward_policy": {
        "hero_0": {
            "hp_rate_sqrt_sqrt": 1,
            "money": 0.001,
            "exp": 0.001,
            "tower": 1,
            "killCnt": 1,
            "deadCnt": -1,
            "assistCnt": 1,
            "total_hurt_to_hero": 0.1,
            "atk_monster": 0.1,
            "win_crystal": 1,
            "atk_crystal": 1,
        },
    },
    "policy_heroes": {
        "hero_0": [169, 112, 174],
    },
}
```
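Conceptually, each factor weights the matching sub-reward signal for that hero. As a rough illustration only (the real aggregation is internal to hok_env), a weighted sum over hypothetical per-frame sub-reward values would look like:

```python
# Hypothetical per-frame sub-reward values for one hero
sub_rewards = {"hp_rate_sqrt_sqrt": 0.0, "money": 50.0, "exp": 30.0, "killCnt": 1.0, "deadCnt": 0.0}

# Weights taken from the "hero_0" entry in the example above
weights = {"hp_rate_sqrt_sqrt": 1, "money": 0.001, "exp": 0.001, "killCnt": 1, "deadCnt": -1}

reward = sum(w * sub_rewards.get(name, 0.0) for name, w in weights.items())
print(round(reward, 3))  # 1.08
```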

You can get the `hero_id` by hero name via [HERO_DICT](hok_env/hok/common/camp.py):
```python
from hok.common.camp import HERO_DICT
print(HERO_DICT)
```

Please refer to [hok3v3/test_env](hok_env/hok/hok3v3/unit_test/test_env.py) for the full code introduced above.

You can run the test code with the following script:

```shell
python3.8 -c "from hok.hok3v3.unit_test.test_env import run_test; run_test()"
```

You will see the following messages if it works correctly:
```
2023-12-28 12:54:28.106 | INFO | hok.hok3v3.unit_test.test_env:get_hok3v3:14 - Init libprocessor: /usr/local/lib/python3.8/dist-packages/hok/hok3v3/config.dat
2023-12-28 12:54:28.106 | INFO | hok.hok3v3.unit_test.test_env:get_hok3v3:15 - Init reward: {'whether_use_zero_sum_reward': 1, 'team_spirit': 0.2, 'time_scaling_discount': 1, 'time_scaling_time': 4500, 'reward_policy': {'policy_name_0': {'hp_rate_sqrt': 1, 'money': 0.001, 'exp': 0.001, 'tower': 1, 'killCnt': 1, 'deadCnt': -1, 'assistCnt': 1, 'total_hurt_to_hero': 0.1, 'ep_rate': 0.1, 'win_crystal': 1}}, 'hero_policy': {1: 'policy_name_0'}, 'policy_heroes': {'policy_name_0': [1, 2]}}
2023-12-28 12:54:28.107 | INFO | hok.hok3v3.unit_test.test_env:get_hok3v3:16 - Init gamecore environment: 127.0.0.1:23432 127.0.0.1
2023-12-28 12:54:28.107 | INFO | hok.hok3v3.reward:update_reward_config:116 - Update reward config: time_scaling_time:4500, time_scaling_discount:1, team_spirit:0.2, whether_use_zero_sum_reward:1
2023-12-28 12:54:28.107 | INFO | hok.hok3v3.reward:update_reward_config:124 - Update hero reward config: 1 -> {'hp_rate_sqrt': 1, 'money': 0.001, 'exp': 0.001, 'tower': 1, 'killCnt': 1, 'deadCnt': -1, 'assistCnt': 1, 'total_hurt_to_hero': 0.1, 'ep_rate': 0.1, 'win_crystal': 1}
2023-12-28 12:54:28.107 | INFO | hok.hok3v3.reward:update_reward_config:124 - Update hero reward config: 2 -> {'hp_rate_sqrt': 1, 'money': 0.001, 'exp': 0.001, 'tower': 1, 'killCnt': 1, 'deadCnt': -1, 'assistCnt': 1, 'total_hurt_to_hero': 0.1, 'ep_rate': 0.1, 'win_crystal': 1}
2023-12-28 12:54:28.136 | INFO | hok.hok3v3.server:start:31 - Start server at tcp://0.0.0.0:35151
2023-12-28 12:54:28.139 | INFO | hok.hok3v3.env:reset:85 - Reset info: agent:0 is_common_ai:True
2023-12-28 12:54:28.139 | INFO | hok.hok3v3.env:reset:85 - Reset info: agent:1 is_common_ai:False
2023-12-28 12:54:30.212 | INFO | hok.hok3v3.unit_test.test_env:run_test:78 - ----------------------run step 0
2023-12-28 12:54:30.673 | INFO | hok.hok3v3.unit_test.test_env:run_test:78 - ----------------------run step 100
2023-12-28 12:54:30.945 | INFO | hok.hok3v3.unit_test.test_env:run_test:78 - ----------------------run step 200
```

## Cluster training

Please consult [cluster.md](docs/cluster.md) document for instructions on cluster training utilizing the `hok_env` environment and the integrated `rl_framework`.

## Replay software: ABS Parsing Tool (will be provided along with the gamecore)

Watching the game is a direct way to see the performance of your agent throughout a match. We provide a replay tool to visualize the matches.

This is the official replay software, which parses the ABS files generated by the gamecore and outputs videos in the game UI of Honor of Kings.
The ABS files generated by the gamecore can be found under the `ai_simulator_remote` folder (the gamecore path).
You can visualize the matches by putting ABS files under the `Replays` folder and running **ABSTOOL.exe**.
![avatar](./docs/replay-tool.gif)

Expand All @@ -248,7 +405,9 @@ pip install hok
```

## Citation

If you use the gamecore of hok_env or the code in this repository, please cite our paper as follows.

```
@inproceedings{wei2022hok_env,
title={Honor of Kings Arena: an Environment for Generalization in Competitive Reinforcement Learning},
  ...
}
```