Merge pull request #23 from rover12421/patch-1
fixer: tokenizer.tokenize result
jsksxs360 authored May 18, 2024
commit 176f602 (2 parents: 58a649f + d12a248)
Showing 1 changed file with 1 addition and 1 deletion.
_c2/2021-12-11-transformers-note-2.md
@@ -153,7 +153,7 @@ print(tokens)
```

```
- ['Using', 'a', 'Trans', '##former', 'network', 'is', 'simple']
+ ['using', 'a', 'transform', '##er', 'network', 'is', 'simple']
```

As you can see, the BERT tokenizer uses a subword tokenization strategy: it keeps splitting a word until it reaches tokens that are in its vocabulary. For example, "transformer" is split into "transform" and "##er".
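For reference, here is a minimal sketch of the kind of call whose printed output this diff corrects. The checkpoint name (`bert-base-uncased`) and the input sentence are assumptions made for illustration; only `print(tokens)` and the corrected output line appear in the diff hunk.

```python
# Minimal sketch; the checkpoint name and input sentence are assumptions,
# only `print(tokens)` and the output list appear in the diff above.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# The uncased WordPiece tokenizer lower-cases the text and splits words that
# are not in its vocabulary into subwords, e.g. "transformer" -> "transform", "##er".
tokens = tokenizer.tokenize("Using a Transformer network is simple")
print(tokens)
# ['using', 'a', 'transform', '##er', 'network', 'is', 'simple']
```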
