Update chapter01 Chinese translation #224
Conversation
jackbai233 commented on Dec 2, 2023
- Fix word spelling errors
- Update some sentences that don’t read smoothly
👋 Thanks for contributing @jackbai233! We will review the pull request and get back to you soon.
@microsoft-github-policy-service agree
This PR has not seen any action for a while! Closing for now, but it can be reopened at a later date.
Check Broken Paths: We have automatically detected the following broken relative paths in your lessons. Review and fix the paths to resolve this issue. Check the file paths and the broken paths inside them. For more details, check our Contributing Guide.

Check Missing Tracking from Paths: We have automatically detected missing tracking IDs in the following relative paths in your lessons. Review and add tracking to the paths to resolve this issue. Check the file paths and the associated paths inside them. For more details, check our Contributing Guide.

Check Missing Tracking from URLs: We have automatically detected missing tracking IDs in the following URLs in your lessons. Review and add tracking to the URLs to resolve this issue. Check the file paths and the associated URLs inside them. For more details, check our Contributing Guide.

Check Country Locale in URLs: We have automatically detected country locales added to URLs in your lessons. Review and remove the country-specific locale from the URLs to resolve this issue. Check the file paths and the associated URLs inside them. For more details, check our Contributing Guide.
## How do LLMs work?

In the next chapter we will explore different types of generative AI models, but for now let's look at how large language models work, with a focus on OpenAI GPT (Generative Pre-trained Transformer) models.

- **Tokenizer, text to numbers**: Large language models receive text as input and produce text as output. However, as statistical models, they work much better with numbers than with text sequences, which is why every input to the model is processed by a tokenizer before the core model uses it. A token is a chunk of text made up of a variable number of characters, so the tokenizer's main task is to split the input into an array of tokens. Each token is then mapped to a token index, an integer encoding of the original text chunk. (This is the bullet changed by the PR: the previous translation mixed 标记器 and 令牌 for "tokenizer" and "token", while the updated line uses 分词器 and 标记 consistently and adds "(token)" in parentheses on first use. A code sketch of this step follows the image below.)

![Example of tokenization](../../images/tokenizer-example.png?WT.mc_id=academic-105485-koreyst)
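As a rough illustration of the tokenization step described in that bullet, here is a minimal Python sketch (not part of the lesson text). It assumes the `tiktoken` package and its `cl100k_base` encoding, the byte-pair encoding used by recent OpenAI GPT models; the sample sentence is arbitrary.

```python
# Minimal tokenization sketch -- assumes `pip install tiktoken`.
import tiktoken

# Load the byte-pair-encoding tokenizer used by recent OpenAI GPT models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Large language models take text as input and produce text as output."

# The tokenizer splits the input into tokens and maps each token to an
# integer index -- the numeric form the core model actually consumes.
token_ids = enc.encode(text)
print(token_ids)                              # a list of integer token indices

# Each index maps back to the chunk of text it encodes.
print([enc.decode([t]) for t in token_ids])   # the individual token strings

# Decoding the full sequence recovers the original text.
assert enc.decode(token_ids) == text
```

Running it shows how a single sentence becomes a short array of integers, which is exactly the representation the GPT model receives.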