Tatomecab

A wrapper around MeCab for the Tatoeba project.

Tatomecab is a set of tools for adding furigana to Japanese sentences.

tatomecab.py

A library that wraps MeCab and adds some features (such as parsing the markers inserted by Warifuri). It can also be run from the command line for quick testing, much like mecab itself:

$ echo 振り仮名をつけろう | ./tatomecab.py
振	ふ
り	None
仮	が
名	な
を	None
つけろ	None
う	None
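
Because the command-line output has one `surface<TAB>reading` line per segment, with `None` where no furigana is needed, it is easy to post-process. The sketch below assumes exactly that format (inferred from the example above) and turns it into HTML `<ruby>` markup. `parse_tatomecab_output`, `to_ruby_html` and `run_tatomecab` are hypothetical helpers, not part of tatomecab.

```python
import html
import subprocess
from typing import List, Optional, Tuple

Pair = Tuple[str, Optional[str]]


def parse_tatomecab_output(text: str) -> List[Pair]:
    """Parse 'surface<TAB>reading' lines; the literal 'None' means no furigana."""
    pairs: List[Pair] = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        surface, _, reading = line.partition("\t")
        pairs.append((surface, None if reading in ("", "None") else reading))
    return pairs


def to_ruby_html(pairs: List[Pair]) -> str:
    """Render (surface, reading) pairs as HTML, using <ruby> only where a reading exists."""
    parts = []
    for surface, reading in pairs:
        text = html.escape(surface)
        if reading is None:
            parts.append(text)
        else:
            parts.append(f"<ruby>{text}<rt>{html.escape(reading)}</rt></ruby>")
    return "".join(parts)


def run_tatomecab(sentence: str, script: str = "./tatomecab.py") -> str:
    """Pipe a sentence through the CLI as in the echo example (assumes it reads stdin)."""
    result = subprocess.run([script], input=sentence + "\n",
                            capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    sample = "振\tふ\nり\tNone\n仮\tが\n名\tな\nを\tNone\nつけろ\tNone\nう\tNone\n"
    print(to_ruby_html(parse_tatomecab_output(sample)))
```

Keeping parsing separate from the subprocess call means the formatting logic can be checked without MeCab installed; `run_tatomecab` is only needed when feeding it live output.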

webserver.py

Exposes the tatomecab library as a web service.

$ curl http://127.0.0.1:8842/furigana -G --data-urlencode str=振り仮名をつけろう
# Actual URL is http://127.0.0.1:8842/furigana?str=%E6%8C%AF%E3%82%8A%E4%BB%AE%E5%90%8D%E3%82%92%E3%81%A4%E3%81%91%E3%82%8D%E3%81%86
<?xml version="1.0" encoding="UTF-8"?>
<root>
<parse>
<token>
  <reading furigana=""><![CDATA[]]></reading>
  <![CDATA[]]>
  <reading furigana=""><![CDATA[]]></reading>
  <reading furigana=""><![CDATA[]]></reading>
</token>
<token><![CDATA[]]></token>
<token><![CDATA[つけろ]]></token>
<token><![CDATA[]]></token>
</parse>
</root>
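
A client for the web service only needs to URL-encode the sentence into the `str` parameter and walk the returned XML. The sketch below is hypothetical: it assumes the host and port from the curl example, assumes the response keeps the `<root>`/`<parse>`/`<token>` layout with kanji wrapped in `<reading furigana="…">`, and assumes the text nodes carry the surface strings (as the `つけろ` token does). Only the Python standard library is used.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple, Union

Pair = Tuple[str, Optional[str]]

# Host and port taken from the curl example; adjust for your deployment.
BASE_URL = "http://127.0.0.1:8842/furigana"

# Assumed response: same structure as the curl output, with surface text filled in.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<root>
<parse>
<token>
  <reading furigana="ふ"><![CDATA[振]]></reading>
  <![CDATA[り]]>
  <reading furigana="が"><![CDATA[仮]]></reading>
  <reading furigana="な"><![CDATA[名]]></reading>
</token>
<token><![CDATA[を]]></token>
<token><![CDATA[つけろ]]></token>
<token><![CDATA[う]]></token>
</parse>
</root>
"""


def build_url(sentence: str, base_url: str = BASE_URL) -> str:
    """Percent-encode the sentence (UTF-8) into the 'str' query parameter."""
    return base_url + "?" + urllib.parse.urlencode({"str": sentence})


def fetch_xml(sentence: str, base_url: str = BASE_URL) -> bytes:
    """GET the furigana XML from a running webserver.py instance."""
    with urllib.request.urlopen(build_url(sentence, base_url)) as response:
        return response.read()


def parse_furigana_xml(xml_data: Union[str, bytes]) -> List[Pair]:
    """Flatten the XML into (surface, furigana) pairs in document order."""
    root = ET.fromstring(xml_data)
    pairs: List[Pair] = []
    for token in root.iter("token"):
        # Text directly inside <token>, before any <reading> child.
        if token.text and token.text.strip():
            pairs.append((token.text.strip(), None))
        for reading in token.findall("reading"):
            surface = (reading.text or "").strip()
            if surface:
                pairs.append((surface, reading.get("furigana") or None))
            # Bare CDATA between <reading> elements ends up in the element's tail.
            if reading.tail and reading.tail.strip():
                pairs.append((reading.tail.strip(), None))
    return pairs


if __name__ == "__main__":
    print(build_url("振り仮名をつけろう"))
    print(parse_furigana_xml(SAMPLE_RESPONSE.encode("utf-8")))
```

`build_url` reproduces the percent-encoded URL given in the curl comment. `fetch_xml` is not called in the demo, so it runs without the server; empty CDATA sections are simply skipped by the parser.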

Warifuri

Warifuri is a script that edits the MeCab dictionary, inserting markers into the reading field so that each furigana is mapped to the character(s) it belongs to. This enables proper mono ruby and group ruby.
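
The mapping idea can be illustrated with a small sketch. Warifuri's actual marker character and dictionary layout are not documented here, so the `|` separator and the `split_reading` function below are hypothetical: a reading with one marker-separated segment per character yields mono ruby, and a reading that cannot be split that way covers the whole word as group ruby. Readings are shown in hiragana for simplicity.

```python
from typing import List, Optional, Tuple

# HYPOTHETICAL marker: Warifuri's real marker character is not documented here.
MARKER = "|"

Pair = Tuple[str, Optional[str]]


def split_reading(surface: str, reading: str, marker: str = MARKER) -> List[Pair]:
    """Map a marked-up reading onto the characters of its surface form."""
    segments = reading.split(marker)
    if len(segments) == len(surface):
        # Mono ruby: one reading segment per character. A segment identical
        # to its character (kana such as okurigana) needs no furigana.
        return [(ch, None if seg == ch else seg) for ch, seg in zip(surface, segments)]
    # Group ruby: the reading cannot be split per character, so it spans the word.
    return [(surface, reading.replace(marker, ""))]


if __name__ == "__main__":
    print(split_reading("振り仮名", "ふ|り|が|な"))  # mono ruby
    print(split_reading("今日", "きょう"))  # group ruby
```

Falling back to group ruby whenever the segment count does not match the character count is only one possible policy; it keeps words like 今日, whose reading cannot be divided between its characters, rendered correctly.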