7.30: Analysis of the open-source project pycorrector (Part 2) #14

Open
li-aolong opened this issue Jul 30, 2019 · 0 comments
Labels
GEC (Grammatical Error Correction), NLP (Natural Language Processing), Open-source project


There is a lot of code, so reading through it takes a fair amount of time.

Deep-model-based method: rnn_lm

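  1. Data preprocessing
    • config.py provides the initial configuration
    • preprocess.py does the processing: it parses three years of CGED competition data, splits the text into characters, and saves the result
  2. Model design
    • In the rnn_lm_model file, the cell constructor can be chosen from: rnn, gru, lstm
    • Each RNN cell has 128 nodes by default, with 2 layers in total
  3. Training
    • data_reader.py converts the characters of the text into fixed indices and saves the character-to-index lookup table
    • train.py generates mini-batches of data, e.g. with a batch_size of 128
    • The model is trained on the data and then saved
  4. infer.py and lm.py use the trained model for testing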
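
The character splitting, character-to-index mapping, and mini-batch steps above can be sketched roughly as follows. This is only an illustration under my own assumptions, not pycorrector's actual code: the names `split_chars`, `build_vocab`, `encode`, `iter_batches`, and the `<unk>` placeholder are all hypothetical, and the two sample sentences are made up.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder id for characters missing from the table


def split_chars(sentence):
    """Split a sentence into single characters, dropping whitespace."""
    return [ch for ch in sentence if not ch.isspace()]


def build_vocab(sentences, min_count=1):
    """Assign every character a fixed integer index."""
    counts = Counter(ch for s in sentences for ch in split_chars(s))
    # Sort by descending frequency, then by character, so indices are reproducible.
    chars = sorted(
        (c for c, n in counts.items() if n >= min_count),
        key=lambda c: (-counts[c], c),
    )
    vocab = {UNK: 0}
    for ch in chars:
        vocab[ch] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence into a list of indices, using UNK for unseen characters."""
    return [vocab.get(ch, vocab[UNK]) for ch in split_chars(sentence)]


def iter_batches(encoded, batch_size=128):
    """Yield consecutive slices of at most batch_size encoded sentences."""
    for start in range(0, len(encoded), batch_size):
        yield encoded[start:start + batch_size]


# Made-up sample data, only to exercise the functions.
sentences = ["我 喜欢 学习", "我喜欢读书"]
vocab = build_vocab(sentences)
encoded = [encode(s, vocab) for s in sentences]
batches = list(iter_batches(encoded, batch_size=1))
```

Sorting by frequency and then by character keeps the lookup table stable between runs, which matters because the saved table has to match the indices the trained model saw.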