New-word-discovery

Chinese new word discovery

This is a simple project for discovering new Chinese words. Inspired by Matrix67 and ChineseWordSegmentation, written in Python.

Usage

  • Place the corpus in the root directory of the project.
  • All hyperparameters are configured in "./config.py".

Algorithms involved

  • PMI (pointwise mutual information)
  • Left and right entropy
  • Position-Word Probability (word-initial, suffix)
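
The first two measures can be illustrated with a minimal sketch. This is not the project's actual code: the function names (`ngram_counts`, `pmi`, `branch_entropy`) are hypothetical, probabilities are approximated as substring count divided by corpus length, and PMI is taken as the minimum over all two-way splits of a candidate, which is one common convention and may differ from what this repository does.

```python
import math
from collections import Counter


def ngram_counts(text, max_len):
    """Count every substring of length 1..max_len in the corpus."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def pmi(word, counts, total):
    """Internal cohesion: minimum PMI over all two-way splits of `word`.

    Assumes len(word) >= 2 and that `word` and its parts occur in `counts`.
    """
    p_word = counts[word] / total
    scores = []
    for k in range(1, len(word)):
        p_left = counts[word[:k]] / total
        p_right = counts[word[k:]] / total
        scores.append(math.log2(p_word / (p_left * p_right)))
    return min(scores)


def entropy(neighbor_counts):
    """Shannon entropy (in bits) of a neighbor-character distribution."""
    total = sum(neighbor_counts.values())
    return -sum(c / total * math.log2(c / total)
                for c in neighbor_counts.values())


def branch_entropy(text, word):
    """Left and right entropy: how varied the adjacent characters are."""
    left, right = Counter(), Counter()
    start = text.find(word)
    while start != -1:
        if start > 0:
            left[text[start - 1]] += 1
        end = start + len(word)
        if end < len(text):
            right[text[end]] += 1
        start = text.find(word, start + 1)  # also catches overlapping matches
    return entropy(left), entropy(right)


# Toy usage on an illustrative string (not from the project's corpus).
corpus = "看电影去电影院买电影票"
counts = ngram_counts(corpus, max_len=2)
print(pmi("电影", counts, len(corpus)))
print(branch_entropy(corpus, "电影"))
```

A candidate string is usually kept as a new word when its PMI and both entropies exceed chosen thresholds: high PMI means its characters co-occur more often than chance, and high left and right entropy means it appears in many different contexts.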