[v2]Notes on Training Data

Training Data Format

There are three formats of training data available for the training programs in this repository: one for supervised learning (SL), one for offline reinforcement learning (offline RL), and one for predicting specific outcomes based on the initial or final state of a given round. The first format can be obtained by converting Mahjong Soul game records using cryolite/kanachan.annotate. The second format can be derived by transforming the annotated data in the first format using bin/annotate4rl/annotate4rl.py. The last format can be obtained by converting the annotated data in the first format using (TODO).

Common Conventions

Before detailing the training data formats, the following subsections explain the conventions used in these formats.

Seat

A 4-player mahjong game has four players, each distinguished by the notion of "seat". The 0th seat is the dealer (zhuang jia, 荘家) at the start of a game, also known as the qi jia (起家). The 1st seat is immediately to the right of the 0th seat, that is, the xia jia of the qi jia (起家の下家). The 2nd seat is across from the 0th seat, that is, the dui mian of the qi jia (起家の対面). The 3rd seat is immediately to the left of the 0th seat, that is, the shang jia of the qi jia (起家の上家).

Seat	Meaning
`0`	the dealer at the start of a game
`1`	the player immediately to the right of the 0th seat
`2`	the player across from the 0th seat
`3`	the player immediately to the left of the 0th seat

Relative Seat (Relseat)

There are cases where it is necessary to represent the relative positions of two players. For example, complete information about a pon (peng, 碰, ポン) includes details about who melds the pon and who discards the melded tile. In such cases, one piece of information is represented by a seat index, and the other is represented by the position relative to the former.

Relseat	Meaning
`0`	the player immediately to the right of the player of interest
`1`	the player across from the player of interest
`2`	the player immediately to the left of the player of interest
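
The relation between a seat and a relseat amounts to modular arithmetic over the four seats. The following is a minimal sketch under that reading of the two tables above; the function names are illustrative and are not taken from the repository.

```python
def relseat_to_seat(seat: int, relseat: int) -> int:
    """Absolute seat of the player located at `relseat` as seen from `seat`."""
    if not 0 <= seat < 4 or not 0 <= relseat < 3:
        raise ValueError("seat must be in 0..3 and relseat in 0..2")
    # relseat 0 is the next seat (right), 1 is across, 2 is the previous seat (left).
    return (seat + relseat + 1) % 4


def seat_to_relseat(seat: int, other: int) -> int:
    """Relseat of the player at `other` as seen from the player at `seat`."""
    if not 0 <= seat < 4 or not 0 <= other < 4:
        raise ValueError("seats must be in 0..3")
    if seat == other:
        raise ValueError("a player has no relseat relative to themselves")
    return (other - seat - 1) % 4
```

For example, a pon melded by seat 3 on a tile discarded by seat 0 would carry seat `3` and relseat `0`.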

Tile

The type of a tile is represented by an integer from 0 to 36, inclusive.

Tile	Value
0m ~ 9m	`0` ~ `9`
0p ~ 9p	`10` ~ `19`
0s ~ 9s	`20` ~ `29`
1z ~ 7z	`30` ~ `36`
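
As an illustration of the table above, the sketch below converts the textual tile notation (rank digit followed by a suit letter, with rank `0` denoting a red five) into its integer value. The helper name `tile_to_index` is hypothetical and not part of the repository.

```python
SUIT_OFFSET = {"m": 0, "p": 10, "s": 20}


def tile_to_index(tile: str) -> int:
    """Convert notation such as '0m', '5p', or '7z' to the Tile value (0..36)."""
    if len(tile) != 2 or not tile[0].isdigit():
        raise ValueError(f"malformed tile: {tile!r}")
    rank, suit = int(tile[0]), tile[1]
    if suit == "z":
        # Honor tiles have ranks 1..7 and occupy 30..36.
        if not 1 <= rank <= 7:
            raise ValueError(f"invalid honor tile: {tile!r}")
        return 30 + rank - 1
    if suit not in SUIT_OFFSET:
        raise ValueError(f"unknown suit: {tile!r}")
    # Number tiles have ranks 0..9, where 0 is the red five.
    return SUIT_OFFSET[suit] + rank
```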

Tile'

When indicating the type of a closed kong (an gang, 暗槓), there is no need to distinguish between black and red tiles. In that case, the 34 tile types, with each red five folded into the corresponding black five, are represented by integers from 0 to 33, inclusive.

Tile	Value
1m ~ 9m	`0` ~ `8`
1p ~ 9p	`9` ~ `17`
1s ~ 9s	`18` ~ `26`
1z ~ 7z	`27` ~ `33`
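
The mapping from a Tile value to a Tile' value follows from the two tables. A minimal sketch, assuming red fives map to the same value as the black five of the same suit; the function name is illustrative only.

```python
def tile_to_tile_prime(tile: int) -> int:
    """Map a Tile value (0..36) to a Tile' value (0..33), ignoring red fives."""
    if not 0 <= tile <= 36:
        raise ValueError(f"tile out of range: {tile}")
    if tile >= 30:
        # Honor tiles: 30..36 -> 27..33.
        return tile - 3
    suit, rank = divmod(tile, 10)
    if rank == 0:
        rank = 5  # a red five counts as a five
    return suit * 9 + rank - 1
```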

Grade

The grade (段位) is represented by integers from 0 to 15, inclusive.

Grade	Value
Novice (初心) 1~3	`0` ~ `2`
Adept (雀士) 1~3	`3` ~ `5`
Expert (雀傑) 1~3	`6` ~ `8`
Master (雀豪) 1~3	`9` ~ `11`
Saint (雀聖) 1~3	`12` ~ `14`
Celestial (魂天)	`15`

Chow (Chi, 吃, チー)

Chows are represented by integers from 0 to 89, inclusive.

Value	Chow (The last element represents the discarded tile)
`0`	(2m, 3m, 1m)
`1`	(1m, 3m, 2m)
`2`	(3m, 4m, 2m)
`3`	(1m, 2m, 3m)
`4`	(2m, 4m, 3m)
`5`	(4m, 5m, 3m)
`6`	(4m, 0m, 3m)
`7`	(2m, 3m, 4m)
`8`	(3m, 5m, 4m)
`9`	(3m, 0m, 4m)
`10`	(5m, 6m, 4m)
`11`	(0m, 6m, 4m)
`12`	(3m, 4m, 5m)
`13`	(3m, 4m, 0m)
`14`	(4m, 6m, 5m)
`15`	(4m, 6m, 0m)
`16`	(6m, 7m, 5m)
`17`	(6m, 7m, 0m)
`18`	(4m, 5m, 6m)
`19`	(4m, 0m, 6m)
`20`	(5m, 7m, 6m)
`21`	(0m, 7m, 6m)
`22`	(7m, 8m, 6m)
`23`	(5m, 6m, 7m)
`24`	(0m, 6m, 7m)
`25`	(6m, 8m, 7m)
`26`	(8m, 9m, 7m)
`27`	(6m, 7m, 8m)
`28`	(7m, 9m, 8m)
`29`	(7m, 8m, 9m)
`30` ~ `59`	Likewise for Circle tiles (筒子)
`60` ~ `89`	Likewise for Bamboo tiles (索子)
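
The ordering of the table above can be reproduced programmatically. The sketch below regenerates the 30 chows of one suit in table order (by discarded rank, then with the discard as the highest, middle, and lowest tile of the run, each black variant followed by its red variant). It was derived from the table itself rather than from the repository code, and the function names are hypothetical.

```python
def enumerate_chows(suit: str = "m") -> list:
    """List the 30 chows of one suit as (hand tile, hand tile, discarded tile)."""
    chows = []
    for d in range(1, 10):  # rank of the discarded tile
        # Ranks of the two tiles from the hand when the discard is the
        # highest, the middle, and the lowest tile of the run, respectively.
        for a, b in ((d - 2, d - 1), (d - 1, d + 1), (d + 1, d + 2)):
            if a < 1 or b > 9:
                continue
            chows.append((a, b, d))
            if 5 in (a, b, d):
                # Same run with the five replaced by the red five (rank 0).
                chows.append(tuple(0 if r == 5 else r for r in (a, b, d)))
    return [tuple(f"{r}{suit}" for r in chow) for chow in chows]


def decode_chow(value: int) -> tuple:
    """Decode a chow value (0..89) into its three tiles."""
    if not 0 <= value <= 89:
        raise ValueError(f"chow out of range: {value}")
    suit_index, offset = divmod(value, 30)
    return enumerate_chows("mps"[suit_index])[offset]
```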

Pon (Peng, 碰, ポン)

Pons are represented by integers from 0 to 39, inclusive.

Value	Pon (The last element represents the discarded tile)
`0`	(1m, 1m, 1m)
`1`	(2m, 2m, 2m)
`2`	(3m, 3m, 3m)
`3`	(4m, 4m, 4m)
`4`	(5m, 5m, 5m)
`5`	(0m, 5m, 5m)
`6`	(5m, 5m, 0m)
`7`	(6m, 6m, 6m)
`8`	(7m, 7m, 7m)
`9`	(8m, 8m, 8m)
`10`	(9m, 9m, 9m)
`11`	(1p, 1p, 1p)
`12`	(2p, 2p, 2p)
`13`	(3p, 3p, 3p)
`14`	(4p, 4p, 4p)
`15`	(5p, 5p, 5p)
`16`	(0p, 5p, 5p)
`17`	(5p, 5p, 0p)
`18`	(6p, 6p, 6p)
`19`	(7p, 7p, 7p)
`20`	(8p, 8p, 8p)
`21`	(9p, 9p, 9p)
`22`	(1s, 1s, 1s)
`23`	(2s, 2s, 2s)
`24`	(3s, 3s, 3s)
`25`	(4s, 4s, 4s)
`26`	(5s, 5s, 5s)
`27`	(0s, 5s, 5s)
`28`	(5s, 5s, 0s)
`29`	(6s, 6s, 6s)
`30`	(7s, 7s, 7s)
`31`	(8s, 8s, 8s)
`32`	(9s, 9s, 9s)
`33`	(1z, 1z, 1z)
`34`	(2z, 2z, 2z)
`35`	(3z, 3z, 3z)
`36`	(4z, 4z, 4z)
`37`	(5z, 5z, 5z)
`38`	(6z, 6z, 6z)
`39`	(7z, 7z, 7z)

State Features

In this document, a state refers to either the very beginning of a particular round of a game, the very end of a particular round of a game, or a decision-making point for a player. Features of a state refer to various pieces of information related to that state. Features of a state can be divided into four categories: "sparse features," "numeric features," "progression features," and "candidate features." Sparse features are categorical data related to the state, cannot be interpreted numerically, and do not pertain to the game progression up to that state. Numeric features are data related to the state that can be interpreted numerically, and do not pertain to the game progression up to that state. Progression features are data related to the game progression leading up to that state. Candidate features, or simply candidates, are defined only if the state is a decision-making point and consist of all possible actions that can be chosen at that point.

The following subsections specify each category of state features.

Sparse Features

All sparse features are non-negative integers. The meaning of each integer is as follows.

Title	Value	Note
Room	`0`: Bronze Room (銅の間) `1`: Silver Room (銀の間) `2`: Gold Room (金の間) `3`: Jade Room (玉の間) `4`: Throne Room (王座の間)
Game Style	`5`: quarter-length game (dong feng zhan, 東風戦) `6`: half-length game (ban zhuang zhan, 半荘戦)
Grade of the player at the seat `0`	`7` ~ `22`	`7 + grade`
Grade of the player at the seat `1`	`23` ~ `38`	`23 + grade`
Grade of the player at the seat `2`	`39` ~ `54`	`39 + grade`
Grade of the player at the seat `3`	`55` ~ `70`	`55 + grade`
Seat	`71` ~ `74`	`71 + seat`
Game Wind (Chang, 場)	`75`: East (東場) `76`: South (南場) `77`: West (西場)
Round (Ju, 局)	`78` ~ `81`	`78 + round`
# of Left Tiles to Draw	`82` ~ `151`	`82 + (# of left tiles)`
Dora Indicator	`152` ~ `188`	`152 + tile`
2nd Dora Indicator	`189` ~ `225`	optional, `189 + tile`
3rd Dora Indicator	`226` ~ `262`	optional, `226 + tile`
4th Dora Indicator	`263` ~ `299`	optional, `263 + tile`
5th Dora Indicator	`300` ~ `336`	optional, `300 + tile`
Hand (shou pai, 手牌)	`337` ~ `472`	(combination, see below)
Drawn Tile (zimo pai, 自摸牌)	`473` ~ `509`	optional, `473 + tile`
<PADDING>	`510`	(used for padding)

The following is how a tile in the hand is represented:

Tile	Value
Red 5m	337
First 1m	338
Second 1m	339
Third 1m	340
Fourth 1m	341
First 2m	342
Second 2m	343
Third 2m	344
Fourth 2m	345
First 3m	346
Second 3m	347
Third 3m	348
Fourth 3m	349
First 4m	350
Second 4m	351
Third 4m	352
Fourth 4m	353
First black 5m	354
Second black 5m	355
Third black 5m	356
First 6m	357
Second 6m	358
Third 6m	359
Fourth 6m	360
First 7m	361
Second 7m	362
Third 7m	363
Fourth 7m	364
First 8m	365
Second 8m	366
Third 8m	367
Fourth 8m	368
First 9m	369
Second 9m	370
Third 9m	371
Fourth 9m	372
Red 5p	373
First 1p	374
..... (Likewise for Circle tiles (筒子))	...
Red 5s	409
First 1s	410
..... (Likewise for Bamboo tiles (索子))	...
Fourth 9s	444
First East	445
Second East	446
Third East	447
Fourth East	448
First South	449
.....	...
First White Dragon (白)	461
.....	...
Fourth Red Dragon (中)	472
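
The table above can be summarized by a small formula: each number suit occupies 36 consecutive values (one red five, four copies of each of ranks 1 to 4 and 6 to 9, and three black fives), followed by four copies of each honor tile. The sketch below is derived from the listed values only; `hand_feature` and its `copy` argument (the 0-based count of identical tiles already encoded) are illustrative names, not the repository's API.

```python
def hand_feature(tile: int, copy: int = 0) -> int:
    """Sparse-feature value for the `copy`-th (0-based) copy of `tile` in a hand."""
    if not 0 <= tile <= 36:
        raise ValueError(f"tile out of range: {tile}")
    if tile >= 30:
        limit, value = 4, 445 + (tile - 30) * 4 + copy
    else:
        suit, rank = divmod(tile, 10)
        base = 337 + suit * 36
        if rank == 0:
            limit, value = 1, base  # the single red five
        elif rank <= 4:
            limit, value = 4, base + 1 + (rank - 1) * 4 + copy
        elif rank == 5:
            limit, value = 3, base + 17 + copy  # only three black fives
        else:
            limit, value = 4, base + 20 + (rank - 6) * 4 + copy
    if not 0 <= copy < limit:
        raise ValueError(f"copy {copy} is invalid for tile {tile}")
    return value
```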

Numeric Features

All numeric features are non-negative integers. The meaning of each integer is as follows.

Element Index	Explanation
0	The number of counter sticks (ben chang, 本場)
1	The number of riichi deposits (供託本数)
2	The score of the player at the seat `0`
3	The score of the player at the seat `1`
4	The score of the player at the seat `2`
5	The score of the player at the seat `3`

Progression Features

All progression features are non-negative integers. These represent the sequence of discards and meldings made from the start of each round to the state in question, arranged in the order they occurred. The meaning of each integer is as follows.

Title	Values	Note
Beginning of Round	`0`	Always starts with this feature
Discard of Tile (打牌)	`5` ~ `596`	`5 + seat * 148 + tile * 4 + a * 2 + b`, where `a = 0`: not moqi (手出し) `a = 1`: moqi (自摸切り) `b = 0`: w/o riichi declaration `b = 1`: w/ riichi declaration
Chow (Chi, チー, 吃)	`597` ~ `956`	`597 + seat * 90 + chi`
Pon (peng, ポン, 碰)	`957` ~ `1436`	`957 + seat * 120 + relseat * 40 + peng`
Da Ming Gang (大明槓)	`1437` ~ `1880`	`1437 + seat * 111 + relseat * 37 + tile`
An Gang (暗槓)	`1881` ~ `2016`	`1881 + seat * 34 + tile'`
Jia Gang (加槓)	`2017` ~ `2164`	`2017 + seat * 37 + tile`
<PADDING>	`2165`	(used for padding)

Candidate Features

All candidate features (or simply candidates) are non-negative integers. The meaning of each integer is as follows.

Type of Actions	Value	Note
Discarding tile	`0` ~ `147`	`tile * 4 + a * 2 + b`, where `a = 0`: not moqi (手出し) `a = 1`: moqi (自摸切り) `b = 0`: w/o riichi declaration `b = 1`: w/ riichi declaration
An Gang (暗槓)	`148` ~ `181`	`148 + tile'`
Jia Gang (加槓)	`182` ~ `218`	Represented by the tile newly added to an existing peng. `182 + tile`
Zimo Hu (自摸和)	`219`
Jiu Zhong Jiu Pai (九種九牌)	`220`
Skip	`221`
Chow (chi, チー, 吃)	`222` ~ `311`	`222 + chi`
Pon (peng, ポン, 碰)	`312` ~ `431`	`312 + relseat * 40 + peng`
Da Ming Gang (大明槓)	`432` ~ `542`	Represented by the discarded tile. `432 + relseat * 37 + tile`
Rong (栄和)	`543` ~ `545`	`543`: from xia jia (下家から) `544`: from dui mian (対面から) `545`: from shang jia (上家から)
<PADDING>	`546`	(does not appear in annotation)

Training Data Format for Supervised Learning (SL)

Roughly speaking, the training data format for supervised learning represents a set of triplets, each consisting of the situation at a decision-making point (see Annotate for the definition of a decision-making point), the actual action taken by the player at that point, and the results of the round and the game in which that point appears.

In this format, the annotation of a decision-making point is represented by one text line. Each line is tab-separated into 8 fields, and each field is in turn comma-separated into elements. In each line, the first field is for debugging purposes only, the next 4 fields represent the state features of a decision-making point, the next field represents the actual action taken by the player at that point, and the final two fields represent the round and game results.

0th Field: Game UUID

The 0th field consists of the game UUID, which uniquely identifies the game in which the decision-making point appears. This field is for debugging purposes only and is not used for training at all.

1st Field: Sparse Features

The 1st field consists of the sparse features of the decision-making point.

2nd Field: Numeric Features

The 2nd field consists of the numeric features of the decision-making point.

3rd Field: Progression Features

The 3rd field consists of the progression features of the decision-making point.
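
Each element of this field can be decoded by inverting the formulas in the Progression Features table. The following is a minimal sketch of such a decoder, written from that table alone; the function name and the dictionary keys are assumptions made for illustration. Values `1` to `4` are not described in this document, so the sketch rejects them.

```python
def decode_progression(value: int) -> dict:
    """Decode one progression feature into its components."""
    if value == 0:
        return {"type": "beginning_of_round"}
    if 5 <= value <= 596:
        seat, rest = divmod(value - 5, 148)
        tile, rest = divmod(rest, 4)
        moqi, riichi = divmod(rest, 2)
        return {"type": "discard", "seat": seat, "tile": tile,
                "moqi": bool(moqi), "riichi": bool(riichi)}
    if 597 <= value <= 956:
        seat, chi = divmod(value - 597, 90)
        return {"type": "chow", "seat": seat, "chi": chi}
    if 957 <= value <= 1436:
        seat, rest = divmod(value - 957, 120)
        relseat, peng = divmod(rest, 40)
        return {"type": "pon", "seat": seat, "relseat": relseat, "peng": peng}
    if 1437 <= value <= 1880:
        seat, rest = divmod(value - 1437, 111)
        relseat, tile = divmod(rest, 37)
        return {"type": "da_ming_gang", "seat": seat,
                "relseat": relseat, "tile": tile}
    if 1881 <= value <= 2016:
        seat, tile_prime = divmod(value - 1881, 34)
        return {"type": "an_gang", "seat": seat, "tile_prime": tile_prime}
    if 2017 <= value <= 2164:
        seat, tile = divmod(value - 2017, 37)
        return {"type": "jia_gang", "seat": seat, "tile": tile}
    if value == 2165:
        return {"type": "padding"}
    # Values 1..4 and anything above 2165 are not defined in this document.
    raise ValueError(f"undefined progression feature: {value}")
```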

4th Field: Candidate Features

The 4th field consists of the candidate features of the decision-making point.
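
When inspecting this field, it can help to map each value back to its action type using the ranges in the Candidate Features table. The table-driven sketch below does only that; the names are hypothetical, and the returned offset still has to be interpreted with the formula given in the table's Note column.

```python
# (first value, last value, action type), copied from the Candidate Features table.
CANDIDATE_RANGES = [
    (0, 147, "discard"),
    (148, 181, "an_gang"),
    (182, 218, "jia_gang"),
    (219, 219, "zimo_hu"),
    (220, 220, "jiu_zhong_jiu_pai"),
    (221, 221, "skip"),
    (222, 311, "chow"),
    (312, 431, "pon"),
    (432, 542, "da_ming_gang"),
    (543, 545, "rong"),
    (546, 546, "padding"),
]


def classify_candidate(value: int) -> tuple:
    """Return (action type, offset from the first value of that type)."""
    for first, last, name in CANDIDATE_RANGES:
        if first <= value <= last:
            return name, value - first
    raise ValueError(f"undefined candidate feature: {value}")
```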

5th Field: Actual Action

The 5th field indicates the actual action chosen by the player (indicated by Seat) at the decision-making point. This field is the index to one of the possible actions enumerated in the 4th field.

6th Field: Round Summary

The 6th field indicates the summary of the round where the decision-making point appears. This field consists of at most 7 elements. It contains multiple elements only in the case of double or triple deal-ins (ダブロン, トリプルロン), or when the round ends in an exhaustive draw (荒牌平局).

Value	Explanation
`0`	Win of the player at the seat `0` by drawing a tile (席`0`の自摸和)
`1`	Win of the player at the seat `1` by drawing a tile (席`1`の自摸和)
`2`	Win of the player at the seat `2` by drawing a tile (席`2`の自摸和)
`3`	Win of the player at the seat `3` by drawing a tile (席`3`の自摸和)
`4`	Win of the player at the seat `0` by a deal-in from the player at the seat `1` (席`1`から席`0`への放銃)
`5`	Win of the player at the seat `0` by a deal-in from the player at the seat `2` (席`2`から席`0`への放銃)
`6`	Win of the player at the seat `0` by a deal-in from the player at the seat `3` (席`3`から席`0`への放銃)
`7`	Win of the player at the seat `1` by a deal-in from the player at the seat `0` (席`0`から席`1`への放銃)
`8`	Win of the player at the seat `1` by a deal-in from the player at the seat `2` (席`2`から席`1`への放銃)
`9`	Win of the player at the seat `1` by a deal-in from the player at the seat `3` (席`3`から席`1`への放銃)
`10`	Win of the player at the seat `2` by a deal-in from the player at the seat `0` (席`0`から席`2`への放銃)
`11`	Win of the player at the seat `2` by a deal-in from the player at the seat `1` (席`1`から席`2`への放銃)
`12`	Win of the player at the seat `2` by a deal-in from the player at the seat `3` (席`3`から席`2`への放銃)
`13`	Win of the player at the seat `3` by a deal-in from the player at the seat `0` (席`0`から席`3`への放銃)
`14`	Win of the player at the seat `3` by a deal-in from the player at the seat `1` (席`1`から席`3`への放銃)
`15`	Win of the player at the seat `3` by a deal-in from the player at the seat `2` (席`2`から席`3`への放銃)
`16`	No left tile without any ready hand of the player at the seat `0` (席`0`の不聴を伴う荒牌平局)
`17`	No left tile with a ready hand of the player at the seat `0` (席`0`の聴牌を伴う荒牌平局)
`18`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `0`
`19`	No left tile without any ready hand of the player at the seat `1` (席`1`の不聴を伴う荒牌平局)
`20`	No left tile with a ready hand of the player at the seat `1` (席`1`の聴牌を伴う荒牌平局)
`21`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `1`
`22`	No left tile without any ready hand of the player at the seat `2` (席`2`の不聴を伴う荒牌平局)
`23`	No left tile with a ready hand of the player at the seat `2` (席`2`の聴牌を伴う荒牌平局)
`24`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `2`
`25`	No left tile without any ready hand of the player at the seat `3` (席`3`の不聴を伴う荒牌平局)
`26`	No left tile with a ready hand of the player at the seat `3` (席`3`の聴牌を伴う荒牌平局)
`27`	No left tile with Liuju Manguan (流し満貫) by the player at the seat `3`
`28`	Interruption of the game
`29`	<PADDING> (does not appear in annotation)
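
The values in the table above follow a regular pattern, which the sketch below exploits to decode one element of the round summary. It is inferred from the table only; the function name, the dictionary keys, and the status labels are illustrative assumptions.

```python
def decode_round_summary(value: int) -> dict:
    """Decode one round-summary element into a small description."""
    if 0 <= value <= 3:
        return {"type": "zimo", "winner": value}
    if 4 <= value <= 15:
        winner, k = divmod(value - 4, 3)
        # The three possible losers are the other seats in ascending order.
        others = [seat for seat in range(4) if seat != winner]
        return {"type": "rong", "winner": winner, "loser": others[k]}
    if 16 <= value <= 27:
        seat, k = divmod(value - 16, 3)
        status = ("no_ready_hand", "ready_hand", "liuju_manguan")[k]
        return {"type": "exhaustive_draw", "seat": seat, "status": status}
    if value == 28:
        return {"type": "interruption"}
    if value == 29:
        return {"type": "padding"}
    raise ValueError(f"undefined round summary value: {value}")
```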

7th Field: Results

The 7th field represents the result of the round where the decision-making point appears and the result of the game. This field consists of exactly 14 elements.

Element Index	Explanation
0	Round delta of the score of the player at the seat `0`
1	Round delta of the score of the player at the seat `1`
2	Round delta of the score of the player at the seat `2`
3	Round delta of the score of the player at the seat `3`
4	End-of-round number of counter sticks
5	End-of-round number of riichi deposits
6	End-of-round score of the player at the seat `0`
7	End-of-round score of the player at the seat `1`
8	End-of-round score of the player at the seat `2`
9	End-of-round score of the player at the seat `3`
10	End-of-game score of the player at the seat `0`
11	End-of-game score of the player at the seat `1`
12	End-of-game score of the player at the seat `2`
13	End-of-game score of the player at the seat `3`
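
Putting the eight fields together, one line of this format could be parsed as in the sketch below. This is not the loader used by the training programs; the class and function names are hypothetical, and the checks only cover constraints stated in this section (8 fields, an action index within the candidates, and exactly 14 result elements).

```python
from dataclasses import dataclass


@dataclass
class SLSample:
    uuid: str
    sparse: list
    numeric: list
    progression: list
    candidates: list
    action_index: int
    round_summary: list
    results: list


def _ints(field: str) -> list:
    """Split a comma-separated field into integers."""
    return [int(element) for element in field.split(",")]


def parse_sl_line(line: str) -> SLSample:
    """Parse one tab-separated annotation line of the SL format."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    uuid, sparse, numeric, progression, candidates, action, summary, results = fields
    sample = SLSample(uuid, _ints(sparse), _ints(numeric), _ints(progression),
                      _ints(candidates), int(action), _ints(summary), _ints(results))
    if not 0 <= sample.action_index < len(sample.candidates):
        raise ValueError("action index does not point into the candidates")
    if len(sample.results) != 14:
        raise ValueError("the results field must have exactly 14 elements")
    return sample
```

The action actually taken is then `sample.candidates[sample.action_index]`.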

Training Data Format for Offline Reinforcement Learning (Offline RL)

Roughly speaking, the training data format for offline reinforcement learning consists of a set of triplets (s, a, s') or (s, a, o), which represent state transitions from a decision-making point to either the next consecutive decision-making point or the "terminal state" of the game.

In the former, (s, a, s'), s and s' represent the situations at two consecutive decision-making points as seen from one player's perspective. Consequently, s is never the last decision-making point of a game for that player. a represents the action taken by the player at s. In other words, (s, a, s') represents a state transition from s to s', from the perspective of one player.

In the latter, (s, a, o), s represents the situation at the last decision-making point from the perspective of a player in each game. Note that (s, a, o) represents the last decision-making point "from the perspective of a player", so there exist four (s, a, o) in each game of a 4-player mahjong. a represents the action taken by the player at s. In other words, (s, a) represents a state transition from s to the "terminal state" of each game, where a is the last action taken by the player in that game. o is the result of the game.

In more detail, the annotation of a state transition from a decision-making point to the next consecutive decision-making point or the terminal state of the game is represented by one text line. Each line is tab-separated into either 10, 12, or 8 fields, and each field is in turn comma-separated into elements. Lines with 10 tab-separated fields are annotations of state transitions from a decision-making point to the next consecutive decision-making point. In these cases, the former decision-making point is not the last one for a player in a round. Lines with 12 tab-separated fields are annotations of state transitions from the last decision-making point for a player in a round to the next consecutive (and thus beginning-of-round) decision-making point. Lines with 8 tab-separated fields are annotations of state transitions from the last decision-making point for a player in a game to the terminal state of the game.
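
Since the three kinds of lines differ in their number of tab-separated fields, a reader can tell them apart by counting fields. A minimal sketch, with illustrative labels that are not taken from the repository:

```python
# Number of tab-separated fields -> kind of transition.
RL_LINE_KINDS = {
    10: "within_round",   # the former state is not the player's last in a round
    12: "end_of_round",   # the former state is the player's last in a round
    8: "end_of_game",     # the former state is the player's last in the game
}


def classify_rl_line(line: str) -> str:
    """Classify an offline RL annotation line by its number of fields."""
    num_fields = len(line.rstrip("\n").split("\t"))
    try:
        return RL_LINE_KINDS[num_fields]
    except KeyError:
        raise ValueError(f"unexpected number of fields: {num_fields}") from None
```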

Format for Lines with 10 Tab-Separated Fields

This format represents the transition between two consecutive states from the perspective of a player, where the former state is not the player's final state in any round.

0th Field: Game UUID

The 0th field consists of the game UUID, which uniquely identifies the game in which the transition appears. This field is for debugging purposes only and is not used for training at all.

1st Field: Sparse Features of Former State

The 1st field consists of the sparse features of the former state.

2nd Field: Numeric Features of Former State

The 2nd field consists of the numeric features of the former state.

3rd Field: Progression Features of Former State

The 3rd field consists of the progression features of the former state.

4th Field: Candidate Features of Former State

The 4th field consists of the candidate features of the former state.

5th Field: Actual Action Taken at Former State

The 5th field indicates the actual action chosen by the player at the former state. This field is the index to one of the possible actions enumerated in the 4th field.

6th Field: Sparse Features of Latter State

The 6th field consists of the sparse features of the latter state.

7th Field: Numeric Features of Latter State

The 7th field consists of the numeric features of the latter state.

8th Field: Progression Features of Latter State

The 8th field consists of the progression features of the latter state.

9th Field: Candidate Features of Latter State

The 9th field consists of the candidate features of the latter state.

Format for Lines with 12 Tab-Separated Fields

This format represents the transition between two consecutive states from the perspective of a player, where the former state is the player's final state in a round but not at the end of a game. In addition to the fields explained in the Format for Lines with 10 Tab-Separated Fields section, the following two fields are appended.

10th Field: Round Summary

The 10th field indicates the summary of the round where the transition appears. The meaning of the elements in this field is identical to that in the 6th Field: Round Summary section.

11th Field: Round Result

The 11th field represents the result of the round where the transition appears. This field consists of exactly 10 elements.

Element Index	Explanation
0	Round delta of the score of the player at the seat `0`
1	Round delta of the score of the player at the seat `1`
2	Round delta of the score of the player at the seat `2`
3	Round delta of the score of the player at the seat `3`
4	End-of-round number of counter sticks
5	End-of-round number of riichi deposits
6	End-of-round score of the player at the seat `0`
7	End-of-round score of the player at the seat `1`
8	End-of-round score of the player at the seat `2`
9	End-of-round score of the player at the seat `3`

Format for Lines with 8 Tab-Separated Fields

This format represents the transition from the player's end-of-game state to the "terminal" state. The first six fields (from the 0th field to the 5th field) are identical to the first six fields of Format for Lines with 10 Tab-Separated Fields. The remaining two fields are as follows.

6th Field: Round Summary

The 6th field indicates the summary of the round where the transition appears. The meaning of the elements in this field is identical to that in the 6th Field: Round Summary section.

7th Field: Results

The 7th field represents the results of the round and the game where the transition appears. The meaning of the elements in this field is identical to that in the 7th Field: Results section.

Note

In most cases, elements 6 to 9 of this field (each player's score after settlement of the round in which the player's final action of the game occurred) agree with elements 10 to 13 (each player's score after settlement at the end of the game). In very rare cases, however, they differ. A typical example is a game that ends without the player having any turn in the final round. The player's last action then falls in an earlier round, so elements 6 to 9 and elements 10 to 13 differ. Elements 10 to 13 exist to cover this exceptional case.

Training Data Format for Round

0th Column: Game UUID

1st Column: Sparse Features

The 1st column consists of sparse features. All the elements in this column are non-negative integers. These integers are used as indices for embeddings, which are in turn used as a part of the inputs to models. The meaning of each integer is as follows.

Title	Value	Note
Room	`0`: Bronze Room (銅の間) `1`: Silver Room (銀の間) `2`: Gold Room (金の間) `3`: Jade Room (玉の間) `4`: Throne Room (王座の間)
Game Style	`5`: quarter-length game (dong feng zhan, 東風戦) `6`: half-length game (ban zhuang zhan, 半荘戦)
Grade of the player at the seat `0`	`7` ~ `22`	`7 + grade`
Grade of the player at the seat `1`	`23` ~ `38`	`23 + grade`
Grade of the player at the seat `2`	`39` ~ `54`	`39 + grade`
Grade of the player at the seat `3`	`55` ~ `70`	`55 + grade`
Game Wind (Chang, 場)	`71`: East (東場) `72`: South (南場) `73`: West (西場)
Round (Ju, 局)	`74` ~ `77`	`74 + round`

2nd Column: Numeric Features

The 2nd column consists of numeric features. This column consists of exactly 6 elements. All values in this column are those at the very beginning of the round. These features are numerically meaningful and are used directly as a part of the inputs to models. The meaning of each element is as follows.

Element Index	Explanation
0	The beginning-of-round number of counter sticks (ben chang, 本場)
1	The number of riichi deposits (供託本数)
2	The beginning-of-round score of the player at the seat `0`
3	The beginning-of-round score of the player at the seat `1`
4	The beginning-of-round score of the player at the seat `2`
5	The beginning-of-round score of the player at the seat `3`

3rd Column: Result

Element Index	Explanation
0	The round score delta of the player at the seat `0`
1	The round score delta of the player at the seat `1`
2	The round score delta of the player at the seat `2`
3	The round score delta of the player at the seat `3`
4	The end-of-game score of the player at the seat `0`
5	The end-of-game score of the player at the seat `1`
6	The end-of-game score of the player at the seat `2`
7	The end-of-game score of the player at the seat `3`

Notes on Training Data

All the learning programs in this project assume that the training data may be very large, possibly too large to fit in main memory (not GPU memory) or even on disk. Therefore, the learning programs do not load the whole training data into memory at startup, but read it sequentially from the beginning as needed. This way, they consume very little main memory no matter how large the training data is. The learning programs also support training data compressed with gzip or bzip2. If the file name of the training data ends with ".gz" or ".bz2", the learning programs automatically decompress the data as they read it.
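
The sequential, extension-based reading described above can be approximated with the Python standard library. The sketch below is not the loader used by the learning programs; `iter_lines` is a hypothetical helper that illustrates how a file can be streamed line by line, with transparent decompression, without holding it in memory.

```python
import bz2
import gzip
from pathlib import Path
from typing import Iterator


def iter_lines(path) -> Iterator[str]:
    """Yield lines of a plain, gzip-compressed, or bzip2-compressed text file."""
    path = Path(path)
    if path.suffix == ".gz":
        opener = gzip.open
    elif path.suffix == ".bz2":
        opener = bz2.open
    else:
        opener = open
    # Text mode streams the file, so only one line is held in memory at a time.
    with opener(path, "rt", encoding="utf-8") as stream:
        for line in stream:
            yield line.rstrip("\n")
```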

On the other hand, always reading the training data sequentially from the beginning has a downside: users need to shuffle the training data before feeding it to a learning program. In particular, feeding annotated data created by annotate into learning programs without shuffling is strongly discouraged. In such data, the annotations for each round are clustered together in one part of the file, so very similar training samples are quite likely to end up in the same mini-batch. In general, training samples in machine learning are assumed to be independent and identically distributed (i.i.d.), and such a bias in training samples is best avoided.
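
This document does not prescribe a shuffling tool. As one possible approach, the sketch below shuffles a text file that is too large for memory in two passes: lines are first scattered at random into temporary bucket files, then each bucket is shuffled in memory and appended to the output. It assumes an uncompressed input file on disk and buckets small enough to fit in memory; the function name and parameters are hypothetical.

```python
import random
import tempfile
from pathlib import Path


def shuffle_lines(src, dst, num_buckets: int = 16, seed=None) -> None:
    """Write the lines of `src` to `dst` in random order using bucket files."""
    rng = random.Random(seed)
    with tempfile.TemporaryDirectory() as tmp:
        paths = [Path(tmp) / f"bucket{i}.txt" for i in range(num_buckets)]
        buckets = [path.open("w", encoding="utf-8") for path in paths]
        try:
            # Pass 1: scatter each line into a randomly chosen bucket.
            with open(src, encoding="utf-8") as stream:
                for line in stream:
                    if not line.endswith("\n"):
                        line += "\n"
                    rng.choice(buckets).write(line)
        finally:
            for bucket in buckets:
                bucket.close()
        # Pass 2: shuffle each bucket in memory and append it to the output.
        with open(dst, "w", encoding="utf-8") as out:
            for path in paths:
                with path.open(encoding="utf-8") as stream:
                    lines = stream.readlines()
                rng.shuffle(lines)
                out.writelines(lines)
```

Increasing `num_buckets` lowers the peak memory use of the second pass at the cost of more temporary files.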
