[v2]Notes on Training Data

Cryolite edited this page May 24, 2024 · 8 revisions

Training Data Format

There are three formats of training data available for the training programs in this repository: one for supervised learning (SL), one for offline reinforcement learning (offline RL), and one for predicting specific outcomes based on the initial or final state of a given round. The first format can be obtained by converting Mahjong Soul game records using cryolite/kanachan.annotate. The second format can be derived by transforming the annotated data in the first format using bin/annotate4rl/annotate4rl.py. The last format can be obtained by converting the annotated data in the first format using (TODO).

Common Conventions

Before detailing the training data formats, the following subsections explain the conventions used in these formats.

Seat

The four players in a 4-player mahjong game are distinguished by the notion of "seat". The 0th seat is the dealer (zhuang jia, 荘家) at the start of a game, also known as the qi jia (起家). The 1st seat is the one immediately to the right of the 0th seat, in other words the xia jia of the qi jia (起家の下家). The 2nd seat is the one across from the 0th seat, in other words the dui mian of the qi jia (起家の対面). The 3rd seat is the one immediately to the left of the 0th seat, in other words the shang jia of the qi jia (起家の上家).

Seat Meaning
0 the dealer at the start of a game
1 the seat immediately to the right of the 0th seat
2 the seat across from the 0th seat
3 the seat immediately to the left of the 0th seat

Relative Seat (Relseat)

There are cases where it is necessary to represent the relative positions of two players. For example, complete information about a pon (peng, 碰, ポン) includes details about who melds the pon and who discards the melded tile. In such cases, one piece of information is represented by a seat index, and the other is represented by the position relative to the former.

Relseat Meaning
0 the player immediately to the right of the player of interest
1 the player across from the player of interest
2 the player immediately to the left of the player of interest
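
The relationship between seats and relseats can be sketched in a few lines. This is only an illustration derived from the two tables above, assuming that seats wrap around cyclically (so that seat 0 is the right-hand neighbour of seat 3); the function names are hypothetical and do not come from the repository.

```python
def relseat_to_seat(seat: int, relseat: int) -> int:
    """Absolute seat of the player located at `relseat` relative to `seat`."""
    if not 0 <= seat <= 3:
        raise ValueError(f"seat out of range: {seat}")
    if not 0 <= relseat <= 2:
        raise ValueError(f"relseat out of range: {relseat}")
    # relseat 0 is the right-hand neighbour, i.e. the next seat index.
    return (seat + relseat + 1) % 4


def seat_to_relseat(seat: int, other: int) -> int:
    """Position of `other` relative to `seat` (the two seats must differ)."""
    if seat == other:
        raise ValueError("a player has no relseat relative to themselves")
    # Python's % always yields a non-negative result for a positive modulus.
    return (other - seat - 1) % 4
```

For example, under this assumption the player across from seat 3 (relseat 1) is seat 1.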

Tile

The type of a tile is represented by an integer from 0 to 36, inclusive.

Tile Value
0m ~ 9m 0 ~ 9
0p ~ 9p 10 ~ 19
0s ~ 9s 20 ~ 29
1z ~ 7z 30 ~ 36
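
As a rough illustration of the table above, the mapping between tile names and integer values can be written as follows. The helper names are hypothetical, and the sketch assumes the common notation in which 0m, 0p, and 0s denote the red fives.

```python
_SUIT_BASE = {"m": 0, "p": 10, "s": 20, "z": 30}


def encode_tile(name: str) -> int:
    """Map a tile name such as "0m" or "7z" to its value in 0..36."""
    if len(name) != 2 or not name[0].isdigit() or name[1] not in _SUIT_BASE:
        raise ValueError(f"malformed tile name: {name!r}")
    number, suit = int(name[0]), name[1]
    if suit == "z":
        if not 1 <= number <= 7:
            raise ValueError(f"honor tiles range from 1z to 7z: {name!r}")
        return 30 + number - 1  # 1z..7z -> 30..36
    return _SUIT_BASE[suit] + number  # 0x..9x -> base..base+9


def decode_tile(value: int) -> str:
    """Inverse of encode_tile."""
    if not 0 <= value <= 36:
        raise ValueError(f"tile value out of range: {value}")
    if value >= 30:
        return f"{value - 29}z"
    return f"{value % 10}{'mps'[value // 10]}"
```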

Tile'

To indicate the type of a closed kong (an gang, 暗槓), there is no need to distinguish between black and red tiles. In such a case, the 34 types of tiles excluding red ones are represented by integers from 0 to 33, inclusive.

Tile Value
1m ~ 9m 0 ~ 8
1p ~ 9p 9 ~ 17
1s ~ 9s 18 ~ 26
1z ~ 7z 27 ~ 33
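
A conversion from the Tile encoding (0 to 36) to the Tile' encoding (0 to 33) can be sketched as below. This is an illustration inferred from the two tables, with a hypothetical function name; the only assumption is that a red five collapses to the ordinary five of the same suit.

```python
def tile_to_tile_prime(tile: int) -> int:
    """Drop the black/red distinction: Tile (0..36) -> Tile' (0..33)."""
    if not 0 <= tile <= 36:
        raise ValueError(f"tile value out of range: {tile}")
    if tile >= 30:
        return tile - 3  # 1z..7z: 30..36 -> 27..33
    suit, number = divmod(tile, 10)
    if number == 0:
        number = 5  # a red five is treated as an ordinary five
    return suit * 9 + number - 1
```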

Grade

The grade (段位) is represented by integers from 0 to 15, inclusive.

Grade Value
Novice (初心) 1~3 0 ~ 2
Adept (雀士) 1~3 3 ~ 5
Expert (雀傑) 1~3 6 ~ 8
Master (雀豪) 1~3 9 ~ 11
Saint (雀聖) 1~3 12 ~ 14
Celestial (魂天) 15

Chow (Chi, 吃, チー)

Chows are represented by integers from 0 to 89, inclusive.

Value Chow (The last element represents the discarded tile)
0 (2m, 3m, 1m)
1 (1m, 3m, 2m)
2 (3m, 4m, 2m)
3 (1m, 2m, 3m)
4 (2m, 4m, 3m)
5 (4m, 5m, 3m)
6 (4m, 0m, 3m)
7 (2m, 3m, 4m)
8 (3m, 5m, 4m)
9 (3m, 0m, 4m)
10 (5m, 6m, 4m)
11 (0m, 6m, 4m)
12 (3m, 4m, 5m)
13 (3m, 4m, 0m)
14 (4m, 6m, 5m)
15 (4m, 6m, 0m)
16 (6m, 7m, 5m)
17 (6m, 7m, 0m)
18 (4m, 5m, 6m)
19 (4m, 0m, 6m)
20 (5m, 7m, 6m)
21 (0m, 7m, 6m)
22 (7m, 8m, 6m)
23 (5m, 6m, 7m)
24 (0m, 6m, 7m)
25 (6m, 8m, 7m)
26 (8m, 9m, 7m)
27 (6m, 7m, 8m)
28 (7m, 9m, 8m)
29 (7m, 8m, 9m)
30 ~ 59 Likewise for Circle tiles (筒子)
60 ~ 89 Likewise for Bamboo tiles (索子)
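
The ordering in the table above appears to follow a simple rule: iterate over the number of the discarded tile, then over the three possible shapes (discarded tile highest, in the middle, lowest), and list the variant containing a red five right after the black one. The following sketch regenerates the table under that reading; the function name is hypothetical, and the rule itself is an inference from the listed values rather than something stated by the annotation tools.

```python
def build_chow_table() -> list:
    """Return a list whose index is the chow value and whose item is the
    triple (hand tile, hand tile, discarded tile) as tile names."""
    table = []
    for suit in "mps":
        for d in range(1, 10):  # number of the discarded tile
            # Hand tiles for the three shapes: d highest, d middle, d lowest.
            for a, b in ((d - 2, d - 1), (d - 1, d + 1), (d + 1, d + 2)):
                if a < 1 or b > 9:
                    continue  # shape would fall outside 1..9
                table.append((f"{a}{suit}", f"{b}{suit}", f"{d}{suit}"))
                if 5 in (a, b, d):
                    # Same chow with the five replaced by the red five.
                    table.append(tuple(
                        f"0{suit}" if n == 5 else f"{n}{suit}"
                        for n in (a, b, d)
                    ))
    return table
```

With such a table, `build_chow_table()[chi]` recovers the three tiles of a chow value, and `chi // 30` gives the suit.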

Pon (Peng, 碰, ポン)

Pons are represented by integers from 0 to 39, inclusive.

Value Pon (The last element represents the discarded tile)
0 (1m, 1m, 1m)
1 (2m, 2m, 2m)
2 (3m, 3m, 3m)
3 (4m, 4m, 4m)
4 (5m, 5m, 5m)
5 (0m, 5m, 5m)
6 (5m, 5m, 0m)
7 (6m, 6m, 6m)
8 (7m, 7m, 7m)
9 (8m, 8m, 8m)
10 (9m, 9m, 9m)
11 (1p, 1p, 1p)
12 (2p, 2p, 2p)
13 (3p, 3p, 3p)
14 (4p, 4p, 4p)
15 (5p, 5p, 5p)
16 (0p, 5p, 5p)
17 (5p, 5p, 0p)
18 (6p, 6p, 6p)
19 (7p, 7p, 7p)
20 (8p, 8p, 8p)
21 (9p, 9p, 9p)
22 (1s, 1s, 1s)
23 (2s, 2s, 2s)
24 (3s, 3s, 3s)
25 (4s, 4s, 4s)
26 (5s, 5s, 5s)
27 (0s, 5s, 5s)
28 (5s, 5s, 0s)
29 (6s, 6s, 6s)
30 (7s, 7s, 7s)
31 (8s, 8s, 8s)
32 (9s, 9s, 9s)
33 (1z, 1z, 1z)
34 (2z, 2z, 2z)
35 (3z, 3z, 3z)
36 (4z, 4z, 4z)
37 (5z, 5z, 5z)
38 (6z, 6z, 6z)
39 (7z, 7z, 7z)
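
The pon table can be regenerated in the same spirit. The sketch below assumes the pattern visible in the listing: eleven entries per suit (the two red-five variants follow the plain 5-5-5 entry), followed by the seven honor tiles. The function name is hypothetical.

```python
def build_pon_table() -> list:
    """Return a list whose index is the pon value and whose item is the
    triple (hand tile, hand tile, discarded tile) as tile names."""
    table = []
    for suit in "mps":
        for n in range(1, 10):
            table.append((f"{n}{suit}",) * 3)
            if n == 5:
                # Red five comes from the hand.
                table.append((f"0{suit}", f"5{suit}", f"5{suit}"))
                # Red five is the discarded tile.
                table.append((f"5{suit}", f"5{suit}", f"0{suit}"))
    for n in range(1, 8):
        table.append((f"{n}z",) * 3)
    return table
```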

State Features

In this document, a state refers to either the very beginning of a particular round of a game, the very end of a particular round of a game, or a decision-making point for a player. Features of a state refer to various pieces of information related to that state. Features of a state can be divided into four categories: "sparse features," "numeric features," "progression features," and "candidate features." Sparse features are categorical data related to the state, cannot be interpreted numerically, and do not pertain to the game progression up to that state. Numeric features are data related to the state that can be interpreted numerically, and do not pertain to the game progression up to that state. Progression features are data related to the game progression leading up to that state. Candidate features, or simply candidates, are defined only if the state is a decision-making point and consist of all possible actions that can be chosen at that point.

The following will describe the specifications of state features.

Sparse Features

All sparse features are non-negative integers. The meaning of each integer is as follows.

Title Value Note
Room 0: Bronze Room (銅の間)
1: Silver Room (銀の間)
2: Gold Room (金の間)
3: Jade Room (玉の間)
4: Throne Room (王座の間)
Game Style 5: quarter-length game (dong feng zhan, 東風戦)
6: half-length game (ban zhuang zhan, 半荘戦)
Grade of the player at the seat 0 7 ~ 22 7 + grade
Grade of the player at the seat 1 23 ~ 38 23 + grade
Grade of the player at the seat 2 39 ~ 54 39 + grade
Grade of the player at the seat 3 55 ~ 70 55 + grade
Seat 71 ~ 74 71 + seat
Game Wind (Chang, 場) 75: East (東場)
76: South (南場)
77: West (西場)
Round (Ju, 局) 78 ~ 81 78 + round
# of Left Tiles to Draw 82 ~ 151 82 + (# of left tiles)
Dora Indicator 152 ~ 188 152 + tile
2nd Dora Indicator 189 ~ 225 optional, 189 + tile
3rd Dora Indicator 226 ~ 262 optional, 226 + tile
4th Dora Indicator 263 ~ 299 optional, 263 + tile
5th Dora Indicator 300 ~ 336 optional, 300 + tile
Hand (shou pai, 手牌) 337 ~ 472 (combination, see below)
Drawn Tile (zimo pai, 自摸牌) 473 ~ 509 optional, 473 + tile
<PADDING> 510 (used for padding)

The following is how a tile in the hand is represented:

Tile Value
Red 5m 337
First 1m 338
Second 1m 339
Third 1m 340
Fourth 1m 341
First 2m 342
Second 2m 343
Third 2m 344
Fourth 2m 345
First 3m 346
Second 3m 347
Third 3m 348
Fourth 3m 349
First 4m 350
Second 4m 351
Third 4m 352
Fourth 4m 353
First black 5m 354
Second black 5m 355
Third black 5m 356
First 6m 357
Second 6m 358
Third 6m 359
Fourth 6m 360
First 7m 361
Second 7m 362
Third 7m 363
Fourth 7m 364
First 8m 365
Second 8m 366
Third 8m 367
Fourth 8m 368
First 9m 369
Second 9m 370
Third 9m 371
Fourth 9m 372
Red 5p 373
First 1p 374
..... (Likewise for Circle tiles (筒子)) ...
Red 5s 409
First 1s 410
..... (Likewise for Bamboo tiles (索子)) ...
Fourth 9s 444
First East 445
Second East 446
Third East 447
Fourth East 448
First South 449
..... ...
First White Dragon (白) 461
..... ...
Fourth Red Dragon (中) 472
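
The hand encoding can be sketched from the anchor values listed above (337 for red 5m, 373 for red 5p, 409 for red 5s, 445 for the first East). The following is an illustration only: the function names are hypothetical, tiles are given in the Tile encoding (0 to 36), and the sketch assumes one red five per suit, so that a black five has only three copies.

```python
from collections import Counter


def encode_hand_tile(tile: int, copy: int) -> int:
    """Sparse feature value of the `copy`-th (0-based) identical tile in hand."""
    if not 0 <= tile <= 36:
        raise ValueError(f"tile value out of range: {tile}")
    if tile >= 30:
        limit, value = 4, 445 + 4 * (tile - 30)
    else:
        suit, number = divmod(tile, 10)
        base = 337 + 36 * suit  # each suit occupies 36 consecutive values
        if number == 0:
            limit, value = 1, base  # the single red five
        elif number < 5:
            limit, value = 4, base + 1 + 4 * (number - 1)
        elif number == 5:
            limit, value = 3, base + 17  # three black fives
        else:
            limit, value = 4, base + 20 + 4 * (number - 6)
    if not 0 <= copy < limit:
        raise ValueError(f"copy index {copy} is invalid for tile {tile}")
    return value + copy


def encode_hand(tiles) -> list:
    """Encode a whole hand, numbering identical tiles in order of appearance."""
    seen = Counter()
    values = []
    for tile in sorted(tiles):
        values.append(encode_hand_tile(tile, seen[tile]))
        seen[tile] += 1
    return values
```

For instance, a hand fragment holding the red 5m, two 1m, and one black 5m would be encoded as 337, 338, 339, and 354 under these assumptions.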

Numeric Features

All numeric features are non-negative integers. The meaning of each integer is as follows.

Element Index Explanation
0 The number of counter sticks (ben chang, 本場)
1 The number of riichi deposits (供託本数)
2 The score of the player at the seat 0
3 The score of the player at the seat 1
4 The score of the player at the seat 2
5 The score of the player at the seat 3

Progression Features

All progression features are non-negative integers. These represent the sequence of discards and meldings made from the start of each round to the state in question, arranged in the order they occurred. The meaning of each integer is as follows.

Title Values Note
Beginning of Round 0 Always starts with this feature
Discard of Tile (打牌) 5 ~ 596 5 + seat * 148 + tile * 4 + a * 2 + b, where;
a = 0: not moqi (手出し)
a = 1: moqi (自摸切り)
b = 0: w/o riichi declaration
b = 1: w/ riichi declaration
Chow (Chi, チー, 吃) 597 ~ 956 597 + seat * 90 + chi
Pon (peng, ポン, 碰) 957 ~ 1436 957 + seat * 120 + relseat * 40 + peng
Da Ming Gang (大明槓) 1437 ~ 1880 1437 + seat * 111 + relseat * 37 + tile
An Gang (暗槓) 1881 ~ 2016 1881 + seat * 34 + tile'
Jia Gang (加槓) 2017 ~ 2164 2017 + seat * 37 + tile
<PADDING> 2165 (used for padding)

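The formulas in the table above translate directly into code. The sketch below is only a transcription of those formulas with hypothetical function names; `chi` and `peng` are the chow and pon values defined earlier, `tile` uses the Tile encoding, and `tile_prime` uses the Tile' encoding.

```python
def encode_discard(seat: int, tile: int, moqi: bool, riichi: bool) -> int:
    """Discard of a tile: 5 + seat * 148 + tile * 4 + a * 2 + b."""
    return 5 + seat * 148 + tile * 4 + int(moqi) * 2 + int(riichi)


def encode_chow(seat: int, chi: int) -> int:
    return 597 + seat * 90 + chi


def encode_pon(seat: int, relseat: int, peng: int) -> int:
    return 957 + seat * 120 + relseat * 40 + peng


def encode_da_ming_gang(seat: int, relseat: int, tile: int) -> int:
    return 1437 + seat * 111 + relseat * 37 + tile


def encode_an_gang(seat: int, tile_prime: int) -> int:
    return 1881 + seat * 34 + tile_prime


def encode_jia_gang(seat: int, tile: int) -> int:
    return 2017 + seat * 37 + tile
```

Plugging in the largest arguments reproduces the upper bound of each range in the table, which is a convenient sanity check when writing a decoder.
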
Candidate Features

All candidate features (or simply candidates) are non-negative integers. The meaning of each integer is as follows.

Type of Actions Value Note
Discarding tile 0 ~ 147 tile * 4 + a * 2 + b, where;
a = 0: not moqi (手出し)
a = 1: moqi (自摸切り)
b = 0: w/o riichi declaration
b = 1: w/ riichi declaration
An Gang (暗槓) 148 ~ 181 148 + tile'
Jia Gang (加槓) 182 ~ 218 Represented by the tile newly added to an existing peng.
182 + tile
Zimo Hu (自摸和) 219
Jiu Zhong Jiu Pai (九種九牌) 220
Skip 221
Chow (chi, チー, 吃) 222 ~ 311 222 + chi
Pon (peng, ポン, 碰) 312 ~ 431 312 + relseat * 40 + peng
Da Ming Gang (大明槓) 432 ~ 542 Represented by the discarded tile.
432 + relseat * 37 + tile
Rong (栄和) 543 ~ 545 543: from xia jia (下家から)
544: from dui mian (対面から)
545: from shang jia (上家から)
<PADDING> 546 (does not appear in annotation)

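Going the other way, a candidate value can be decoded by checking which range it falls into. The following sketch inverts the table above; the function name, the kind labels, and the dictionary keys are all hypothetical choices made for illustration.

```python
def decode_candidate(value: int):
    """Return (kind, details) for a candidate value in 0..545."""
    if not 0 <= value <= 545:
        raise ValueError(f"candidate value out of range: {value}")
    if value <= 147:
        tile, rest = divmod(value, 4)
        return "discard", {"tile": tile, "moqi": rest >= 2,
                           "riichi": rest % 2 == 1}
    if value <= 181:
        return "an_gang", {"tile_prime": value - 148}
    if value <= 218:
        return "jia_gang", {"tile": value - 182}
    if value == 219:
        return "zimo_hu", {}
    if value == 220:
        return "jiu_zhong_jiu_pai", {}
    if value == 221:
        return "skip", {}
    if value <= 311:
        return "chow", {"chi": value - 222}
    if value <= 431:
        relseat, peng = divmod(value - 312, 40)
        return "pon", {"relseat": relseat, "peng": peng}
    if value <= 542:
        relseat, tile = divmod(value - 432, 37)
        return "da_ming_gang", {"relseat": relseat, "tile": tile}
    # 543..545: rong from xia jia, dui mian, shang jia (relseat 0, 1, 2).
    return "rong", {"relseat": value - 543}
```
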
Training Data Format for Supervised Learning (SL)

Roughly speaking, the training data format for supervised learning represents a set of triplets, each consisting of the situation at a decision-making point (see Annotate for the definition of a decision-making point), the actual action taken by the player at that point, and the results of the round and game in which that point appears.

In this format, the annotation of a decision-making point is represented by one text line. Each line is tab-separated into 8 fields, and each field is in turn comma-separated into elements. In each line, the first field is for debugging purposes only, the next 4 fields represent the state features of a decision-making point, the next field represents the actual action taken by the player at that point, and the final two fields represent the round and game results.

0th Field: Game UUID

The 0th field consists of the game UUID, which uniquely identifies the game in which the decision-making point appears. This field is for debugging purposes only and is not used for training at all.

1st Field: Sparse Features

The 1st field consists of the sparse features of the decision-making point.

2nd Field: Numeric Features

The 2nd field consists of the numeric features of the decision-making point.

3rd Field: Progression Features

The 3rd field consists of the progression features of the decision-making point.

4th Field: Candidate Features

The 4th field consists of the candidate features of the decision-making point.

5th Field: Actual Action

The 5th field indicates the actual action chosen by the player (indicated by Seat) at the decision-making point. This field is the index to one of the possible actions enumerated in the 4th field.

6th Field: Round Summary

The 6th field indicates the summary of the round where the decision-making point appears. This field consists of at most 7 elements. It contains multiple elements only in the case of double or triple deal-ins (ダブロン, トリプルロン), or when the round ends in an exhaustive draw (荒牌平局).

Value Explanation
0 Win of the player at the seat 0 by drawing a tile (席0の自摸和)
1 Win of the player at the seat 1 by drawing a tile (席1の自摸和)
2 Win of the player at the seat 2 by drawing a tile (席2の自摸和)
3 Win of the player at the seat 3 by drawing a tile (席3の自摸和)
4 Win of the player at the seat 0 on a deal-in by the player at the seat 1 (席1から席0への放銃)
5 Win of the player at the seat 0 on a deal-in by the player at the seat 2 (席2から席0への放銃)
6 Win of the player at the seat 0 on a deal-in by the player at the seat 3 (席3から席0への放銃)
7 Win of the player at the seat 1 on a deal-in by the player at the seat 0 (席0から席1への放銃)
8 Win of the player at the seat 1 on a deal-in by the player at the seat 2 (席2から席1への放銃)
9 Win of the player at the seat 1 on a deal-in by the player at the seat 3 (席3から席1への放銃)
10 Win of the player at the seat 2 on a deal-in by the player at the seat 0 (席0から席2への放銃)
11 Win of the player at the seat 2 on a deal-in by the player at the seat 1 (席1から席2への放銃)
12 Win of the player at the seat 2 on a deal-in by the player at the seat 3 (席3から席2への放銃)
13 Win of the player at the seat 3 on a deal-in by the player at the seat 0 (席0から席3への放銃)
14 Win of the player at the seat 3 on a deal-in by the player at the seat 1 (席1から席3への放銃)
15 Win of the player at the seat 3 on a deal-in by the player at the seat 2 (席2から席3への放銃)
16 Exhaustive draw, with the player at the seat 0 not ready (席0の不聴を伴う荒牌平局)
17 Exhaustive draw, with the player at the seat 0 ready (席0の聴牌を伴う荒牌平局)
18 Exhaustive draw, with Liuju Manguan (流し満貫) by the player at the seat 0
19 Exhaustive draw, with the player at the seat 1 not ready (席1の不聴を伴う荒牌平局)
20 Exhaustive draw, with the player at the seat 1 ready (席1の聴牌を伴う荒牌平局)
21 Exhaustive draw, with Liuju Manguan (流し満貫) by the player at the seat 1
22 Exhaustive draw, with the player at the seat 2 not ready (席2の不聴を伴う荒牌平局)
23 Exhaustive draw, with the player at the seat 2 ready (席2の聴牌を伴う荒牌平局)
24 Exhaustive draw, with Liuju Manguan (流し満貫) by the player at the seat 2
25 Exhaustive draw, with the player at the seat 3 not ready (席3の不聴を伴う荒牌平局)
26 Exhaustive draw, with the player at the seat 3 ready (席3の聴牌を伴う荒牌平局)
27 Exhaustive draw, with Liuju Manguan (流し満貫) by the player at the seat 3
28 Interruption of the game
29 <PADDING> (does not appear in annotation)
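
The round summary values follow a regular layout, so they can be decoded arithmetically instead of through a lookup table. The sketch below is an illustration with hypothetical names and labels; it returns a triple (kind, seat, other seat), where the last item is the player who dealt in, if any.

```python
def decode_round_summary(value: int):
    """Decode one element of the round summary field (0..28)."""
    if 0 <= value <= 3:
        return "zimo", value, None
    if 4 <= value <= 15:
        winner, k = divmod(value - 4, 3)
        # The three possible deal-in players, in ascending seat order.
        others = [seat for seat in range(4) if seat != winner]
        return "rong", winner, others[k]
    if 16 <= value <= 27:
        seat, k = divmod(value - 16, 3)
        kind = ("draw_not_ready", "draw_ready", "draw_liuju_manguan")[k]
        return kind, seat, None
    if value == 28:
        return "interruption", None, None
    raise ValueError(f"round summary value out of range: {value}")
```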

7th Field: Results

The 7th field represents the result of the round where the decision-making point appears and the result of the game. This field consists of exactly 14 elements.

Element Index Explanation
0 Round delta of the score of the player at the seat 0
1 Round delta of the score of the player at the seat 1
2 Round delta of the score of the player at the seat 2
3 Round delta of the score of the player at the seat 3
4 End-of-round number of counter sticks
5 End-of-round number of riichi deposits
6 End-of-round score of the player at the seat 0
7 End-of-round score of the player at the seat 1
8 End-of-round score of the player at the seat 2
9 End-of-round score of the player at the seat 3
10 End-of-game score of the player at the seat 0
11 End-of-game score of the player at the seat 1
12 End-of-game score of the player at the seat 2
13 End-of-game score of the player at the seat 3
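
Putting the eight fields together, one line of SL training data could be parsed roughly as follows. This is a minimal sketch under the format described above, not the loader used by the training programs; the class name, the attribute names, and the sample line in the test are made up for illustration.

```python
from typing import List, NamedTuple


class SLExample(NamedTuple):
    uuid: str
    sparse: List[int]
    numeric: List[int]
    progression: List[int]
    candidates: List[int]
    action_index: int
    round_summary: List[int]
    results: List[int]


def parse_sl_line(line: str) -> SLExample:
    """Split a line on tabs, then each field (except the UUID) on commas."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    uuid = fields[0]
    sparse, numeric, progression, candidates, action, summary, results = (
        [int(element) for element in field.split(",")] for field in fields[1:]
    )
    if len(action) != 1 or not 0 <= action[0] < len(candidates):
        raise ValueError("5th field must be an index into the candidates")
    if len(results) != 14:
        raise ValueError("7th field must have exactly 14 elements")
    return SLExample(uuid, sparse, numeric, progression, candidates,
                     action[0], summary, results)
```

The chosen action is then `example.candidates[example.action_index]`, which can be interpreted with the candidate table.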

Training Data Format for Offline Reinforcement Learning (Offline RL)

Roughly speaking, the training data format for offline reinforcement learning consists of a set of triplets (s, a, s') or (s, a, o), which represent state transitions from a decision-making point to either the next consecutive decision-making point or the "terminal state" of the game.

In the former, (s, a, s'), s and s' represent the situations at two consecutive decision-making points as seen from one player's perspective. Consequently, s is never that player's last decision-making point in a game. a represents the action taken by the player at s. In other words, (s, a, s') represents a state transition from s to s', from the perspective of one player.

In the latter, (s, a, o), s represents the situation at the last decision-making point from the perspective of a player in each game. Note that (s, a, o) represents the last decision-making point "from the perspective of a player", so there exist four (s, a, o) in each game of a 4-player mahjong. a represents the action taken by the player at s. In other words, (s, a) represents a state transition from s to the "terminal state" of each game, where a is the last action taken by the player in that game. o is the result of the game.

Let me describe this format in more detail. The annotation of a state transition from a decision-making point to the next consecutive decision-making point or the terminal state of the game is represented by one text line. Each line is tab-separated into either 10, 12, or 8 fields, and each field is in turn comma-separated into elements. Lines with 10 tab-separated fields are annotations of state transitions from a decision-making point to the next consecutive decision-making point. In these cases, the former decision-making point is not the last one for a player in a round. Lines with 12 tab-separated fields are annotations of state transitions from the last decision-making point for a player in a round to the next consecutive (and thus beginning-of-round) decision-making point. Lines with 8 tab-separated fields are annotations of state transitions from the last decision-making point for a player in a game to the terminal state of the game.
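
Because the three kinds of lines differ in their number of tab-separated fields, a reader can dispatch on the field count. The sketch below illustrates this, using the field order described in the subsections that follow; the function name, the kind labels, and the field names are hypothetical.

```python
_FORMER = ["uuid", "sparse", "numeric", "progression", "candidates", "action"]
_LATTER = ["next_sparse", "next_numeric", "next_progression", "next_candidates"]

# Number of fields -> (kind of transition, names of the fields in order).
_LAYOUTS = {
    10: ("within_round", _FORMER + _LATTER),
    12: ("end_of_round", _FORMER + _LATTER + ["round_summary", "round_result"]),
    8: ("end_of_game", _FORMER + ["round_summary", "results"]),
}


def split_rl_line(line: str):
    """Return (kind, {field name: raw comma-separated string})."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in _LAYOUTS:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    kind, names = _LAYOUTS[len(fields)]
    return kind, dict(zip(names, fields))
```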

Format for Lines with 10 Tab-Separated Fields

This format represents the transition between two consecutive states from the perspective of a player, where the former state is not the player's final state in any round.

0th Field: Game UUID

The 0th field consists of the game UUID, which uniquely identifies the game in which the transition appears. This field is for debugging purposes only and is not used for training at all.

1st Field: Sparse Features of Former State

The 1st field consists of the sparse features of the former state.

2nd Field: Numeric Features of Former State

The 2nd field consists of the numeric features of the former state.

3rd Field: Progression Features of Former State

The 3rd field consists of the progression features of the former state.

4th Field: Candidate Features of Former State

The 4th field consists of the candidate features of the former state.

5th Field: Actual Action Taken at Former State

The 5th field indicates the actual action chosen by the player at the former state. This field is the index to one of the possible actions enumerated in the 4th field.

6th Field: Sparse Features of Latter State

The 6th field consists of the sparse features of the latter state.

7th Field: Numeric Features of Latter State

The 7th field consists of the numeric features of the latter state.

8th Field: Progression Features of Latter State

The 8th field consists of the progression features of the latter state.

9th Field: Candidate Features of Latter State

The 9th field consists of the candidate features of the latter state.

Format for Lines with 12 Tab-Separated Fields

This format represents the transition between two consecutive states from the perspective of a player, where the former state is the player's final state in a round but not at the end of a game. In addition to the fields explained in the Format for Lines with 10 Tab-Separated Fields section, the following two fields are appended.

10th Field: Round Summary

The 10th field indicates the summary of the round where the transition appears. The meaning of the elements in this field is identical to that in the 6th Field: Round Summary section.

11th Field: Round Result

The 11th field represents the result of the round where the transition appears. This field consists of exactly 10 elements.

Element Index Explanation
0 Round delta of the score of the player at the seat 0
1 Round delta of the score of the player at the seat 1
2 Round delta of the score of the player at the seat 2
3 Round delta of the score of the player at the seat 3
4 End-of-round number of counter sticks
5 End-of-round number of riichi deposits
6 End-of-round score of the player at the seat 0
7 End-of-round score of the player at the seat 1
8 End-of-round score of the player at the seat 2
9 End-of-round score of the player at the seat 3

Format for Lines with 8 Tab-Separated Fields

This format represents the transition from the player's end-of-game state to the "terminal" state. The first six fields (from the 0th field to the 5th field) are identical to the first six fields of Format for Lines with 10 Tab-Separated Fields. The remaining two fields are as follows.

6th Field: Round Summary

The 6th field indicates the summary of the round where the transition appears. The meaning of the elements in this field is identical to that in the 6th Field: Round Summary section.

7th Field: Results

The 7th field represents the results of the round and the game where the transition appears. The meaning of the elements in this field is identical to that in the 7th Field: Results section.

Note

In most cases, the elements from the 6th to the 9th (each player's score after settlement in the round where a player's final action in the game occurred) agree with those from the 10th to the 13th (each player's score after settlement at the end of a game). However, in very rare cases they do not. A typical example is when a game ends without a player having any turn in the final round. In this case, the player's last action in the game is not in the final round of the game, so the elements from the 6th to the 9th and those from the 10th to the 13th differ. The elements from the 10th to the 13th exist to cover this exceptional case.

Training Data Format for Round

0th Column: Game UUID

1st Column: Sparse Features

The 1st column consists of sparse features. All the elements in this column are non-negative integers. These integers are used as indices for embeddings, which are ultimately used as part of the inputs to models. The meaning of each integer is as follows.

Title Value Note
Room 0: Bronze Room (銅の間)
1: Silver Room (銀の間)
2: Gold Room (金の間)
3: Jade Room (玉の間)
4: Throne Room (王座の間)
Game Style 5: quarter-length game (dong feng zhan, 東風戦)
6: half-length game (ban zhuang zhan, 半荘戦)
Grade of the player at the seat 0 7 ~ 22 7 + grade
Grade of the player at the seat 1 23 ~ 38 23 + grade
Grade of the player at the seat 2 39 ~ 54 39 + grade
Grade of the player at the seat 3 55 ~ 70 55 + grade
Game Wind (Chang, 場) 71: East (東場)
72: South (南場)
73: West (西場)
Round (Ju, 局) 74 ~ 77 74 + round

2nd Column: Numeric Features

The 2nd column consists of numeric features. This column consists of exactly 6 elements. All the numbers in this column are taken at the very beginning of the round. These features are numerically meaningful and are directly used as part of the inputs to models. The meaning of each element is as follows.

Element Index Explanation
0 The beginning-of-round number of counter sticks (ben chang, 本場)
1 The number of riichi deposits (供託本数)
2 The beginning-of-round score of the player at the seat 0
3 The beginning-of-round score of the player at the seat 1
4 The beginning-of-round score of the player at the seat 2
5 The beginning-of-round score of the player at the seat 3

3rd Column: Result

Element Index Explanation
0 The round score delta of the player at the seat 0
1 The round score delta of the player at the seat 1
2 The round score delta of the player at the seat 2
3 The round score delta of the player at the seat 3
4 The end-of-game score of the player at the seat 0
5 The end-of-game score of the player at the seat 1
6 The end-of-game score of the player at the seat 2
7 The end-of-game score of the player at the seat 3

Notes on Training Data

All the learning programs in this project assume that the training data may be extremely large, possibly too large to fit in main memory (not GPU memory) or even on disk. Therefore, the learning programs do not load the whole training data into memory at startup, but read it sequentially from the beginning as needed. This way, the learning programs consume very little main memory, no matter how large the training data is. The learning programs also support training data compressed with gzip or bzip2. If the file name of the training data ends with ".gz" or ".bz2", the learning programs automatically decompress the data as they read it.
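
The sequential, extension-based reading described above can be approximated with the Python standard library. The following is only a sketch of the idea, not the code used by the learning programs; the function names are hypothetical.

```python
import bz2
import gzip


def open_training_data(path: str):
    """Open a training data file as text, decompressing by file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def iter_lines(path: str):
    """Yield annotation lines one at a time without loading the whole file."""
    with open_training_data(path) as stream:
        for line in stream:
            yield line.rstrip("\n")
```

Because `iter_lines` is a generator, memory use stays roughly constant regardless of the file size.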

On the other hand, always accessing training data sequentially from the beginning has a downside: users need to shuffle the training data before feeding it to a learning program. In particular, feeding annotated data created by annotate into learning programs without shuffling is strongly discouraged. This is because, in annotated data created by annotate, the annotations for each round are clustered together in one part of the training data, so very similar training samples are quite likely to appear in the same mini-batch. In general, training samples in machine learning are assumed to be independent and identically distributed (i.i.d.), and it is best to avoid such a bias in training samples.
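
For data small enough to fit in main memory, a line-level shuffle can be sketched as follows. This is an assumption-laden illustration with a hypothetical function name: it reads every line into memory, so it does not apply to the larger-than-memory case discussed above, where an external shuffling approach would be needed instead.

```python
import random


def shuffle_lines(src: str, dst: str, seed=None) -> None:
    """Write the lines of `src` to `dst` in random order (in-memory only)."""
    with open(src, "r", encoding="utf-8") as stream:
        lines = [line.rstrip("\n") for line in stream]
    # A dedicated Random instance keeps the shuffle reproducible per seed.
    random.Random(seed).shuffle(lines)
    with open(dst, "w", encoding="utf-8") as stream:
        for line in lines:
            stream.write(line + "\n")
```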