The data set was downloaded from the SIGHAN Bakeoff 2005 site: http://sighan.cs.uchicago.edu/bakeoff2005/
- cws_maxent.ipynb: Word segmentation using a Maximum Entropy (MaxEnt) model
- cws_rnn.ipynb: Word segmentation using an RNN with a bidirectional LSTM
- cws.ipynb: Cleaned-up notebook
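
Both a MaxEnt classifier and a bidirectional LSTM are commonly applied to word segmentation by treating it as per-character tagging. The sketch below illustrates that framing with a B/M/E/S tag scheme (begin, middle, end, single). It is an assumption that the notebooks use this scheme; the function names `words_to_tags` and `tags_to_words` and the sample sentence are hypothetical and only assume space-delimited training text.

```python
def words_to_tags(words):
    """Map a list of words to one B/M/E/S tag per character (assumed scheme)."""
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty tokens
        if len(word) == 1:
            tags.append("S")  # single-character word
        else:
            # first char is B, last is E, everything in between is M
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars, tags):
    """Rebuild words from characters and their predicted tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word closes on E or S
            words.append(current)
            current = ""
    if current:  # flush a trailing word left open by the tagger
        words.append(current)
    return words


# Hypothetical sample line in space-delimited form.
segmented = "我 喜欢 自然语言".split()
chars = "".join(segmented)
tags = words_to_tags(segmented)
restored = tags_to_words(chars, tags)
```

With this framing, a model only has to predict one of four labels per character, and the segmentation is recovered by cutting after every `E` or `S`.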

Model Accuracy Comparison

Training Set | Test Set | Accuracy - MaxEnt | Accuracy - RNN |
---|---|---|---|
PKU | PKU | 0.94 | 0.91 |
MSR | MSR | 0.92 | 0.96 |
PKU | MSR | 0.86 | 0.86 |
MSR | PKU | 0.88 | 0.84 |
PKU+MSR | PKU+MSR | 0.91 | 0.91 |
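
The table does not say how accuracy is measured. One plausible reading for a tagging-based segmenter is per-character tag accuracy, sketched below. This is an assumption, not a description of the notebooks' metric; `tag_accuracy` and the sample tag sequences are hypothetical.

```python
def tag_accuracy(gold_tags, pred_tags):
    """Fraction of characters whose predicted tag matches the gold tag."""
    if len(gold_tags) != len(pred_tags):
        raise ValueError("gold and predicted sequences must have equal length")
    if not gold_tags:
        return 0.0  # convention chosen here for empty input
    correct = sum(g == p for g, p in zip(gold_tags, pred_tags))
    return correct / len(gold_tags)


# Hypothetical gold and predicted B/M/E/S tags for a 7-character sentence.
gold = list("SBEBMME")
pred = list("SBEBEBE")
score = tag_accuracy(gold, pred)  # 5 of 7 tags agree
```

Per-character accuracy is more lenient than word-level precision and recall, since a single wrong boundary changes only one or two tags but can break two words.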