zhNewsCrawler

Chinese news (corpus) crawler

Wikipedia is currently the most complete Chinese corpus available online, but it lacks timely content such as news. This crawler was built to fill that gap: at a specified time each day, it automatically fetches the Chinese news headlines collected by Google News.
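
The "specified time each day" behaviour can be sketched with the standard library alone. This is a minimal illustration, not the code in auto_main.py: the names `seconds_until` and `run_daily` and the `job` callback are hypothetical, and the actual crawl step is left to the caller.

```python
import time
from datetime import datetime, timedelta


def seconds_until(hour, minute, now=None):
    """Return the number of seconds from `now` until the next HH:MM."""
    now = now or datetime.now()
    target = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if target <= now:
        # Today's slot has already passed, so aim for the same time tomorrow.
        target += timedelta(days=1)
    return (target - now).total_seconds()


def run_daily(job, hour, minute):
    """Call `job()` once a day at HH:MM, forever (hypothetical driver loop)."""
    while True:
        time.sleep(seconds_until(hour, minute))
        job()
```

For example, `run_daily(crawl, 8, 0)` would call a user-supplied `crawl` function every day at 08:00 local time. A cron entry would do the same job without keeping a Python process alive.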

Important Notice:

I do not own the file "dict.txt.big". It is the work of the jieba team ("fxsjy" et al.).

Environment:

Python 2.7 with "requests", "BeautifulSoup", "jieba" and "progressbar" installed.
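
A quick way to confirm the environment before running the crawler is to try importing each dependency. The sketch below is an illustration, not part of the repository: `missing_modules` is a hypothetical helper, and the import names in `REQUIRED` are assumptions. In particular, `bs4` is the import name for Beautiful Soup 4, while the older Beautiful Soup 3 is imported as `BeautifulSoup`; this README does not say which version is intended.

```python
import importlib

# Assumed import names for the packages listed above; adjust to your setup.
REQUIRED = ["requests", "bs4", "jieba", "progressbar"]


def missing_modules(names):
    """Return the subset of `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules(REQUIRED)
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All required modules are importable.")
```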

Files:

LICENSE.md
README.md
auto_main.py
crawler.py
cut.py
dict.txt.big
wiki_cut.py

AngusKung/zhNewsCrawler