A named entity recognition (NER) model created by fine-tuning sonoisa/t5-base-japanese.
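Input: a text file. The default text is "伊藤左千夫は1893年から知人から学んだ短歌を詠むようになったが、当初は古今和歌集の流れをくむ月並調の伝統的な短歌を詠んでいた。" (Ito Sachio began composing tanka in 1893 after learning from an acquaintance, but at first he wrote traditional tanka in the conventional tsukinami style descended from the Kokin Wakashū.)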
Output: a list of dictionaries, one per recognized named entity. 'span' gives the start and end character offsets of the entity in the original sentence (the end offset is exclusive), 'type' gives its category, and 'text' contains the surface text of the entity. The model was trained to classify entities into one of the following categories: {人名 (person), 法人名 (corporation), 政治的組織名 (political organization), その他の組織名 (other organization), 地名 (location), 施設名 (facility), 製品名 (product), イベント名 (event)}.
[{'span': [0, 5], 'type': '人名', 'text': '伊藤左千夫'}, {'span': [36, 41], 'type': '製品名', 'text': '古今和歌集'}]
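The example output suggests that 'span' is a half-open [start, end) range of character offsets, so slicing the input with text[start:end] reproduces the entity text. The following is a minimal sketch that checks this on the default sentence and the example output above, assuming the convention holds in general; the helper name check_spans is hypothetical and not part of the script.

```python
# Default input sentence and the example output shown above.
SENTENCE = "伊藤左千夫は1893年から知人から学んだ短歌を詠むようになったが、当初は古今和歌集の流れをくむ月並調の伝統的な短歌を詠んでいた。"
ENTITIES = [
    {'span': [0, 5], 'type': '人名', 'text': '伊藤左千夫'},
    {'span': [36, 41], 'type': '製品名', 'text': '古今和歌集'},
]


def check_spans(sentence, entities):
    """Return True if each entity's span slices `sentence` to its 'text'.

    Assumes (hypothetically) that spans are half-open [start, end) offsets
    counted in Python str characters (Unicode code points).
    """
    return all(
        sentence[ent['span'][0]:ent['span'][1]] == ent['text']
        for ent in entities
    )


for ent in ENTITIES:
    start, end = ent['span']
    print(ent['type'], SENTENCE[start:end])

print(check_spans(SENTENCE, ENTITIES))  # True
```

Because Python strings index by code point, each Japanese character counts as one offset, which matches [0, 5] covering the five characters of 伊藤左千夫.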
An Internet connection is required when running the script for the first time, as the model files will be downloaded automatically.
Run the script below to generate the predicted named entities for the input text file.
Running this script in an FP16 environment results in an error because of the limited range of FP16 floating-point values. If necessary, switch to the CPU by setting the -e argument to 0 (for example, by appending -e 0 to the command below).
$ python3 t5_base_japanese_ner.py -f input.txt
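The FP16 error is attributed above to the limited range of floating-point values. As a rough illustration that is not taken from the script, IEEE 754 half precision can represent finite values only up to 65504, so presumably some intermediate value in the model exceeds that bound. Python's standard struct module exposes half precision as the 'e' format and raises OverflowError for finite values that are too large; the helper fits_in_fp16 is hypothetical.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(x):
    """Return True if x can be stored as a finite FP16 value."""
    if not math.isfinite(x):
        # inf and NaN are not "in range" for our purposes.
        return False
    try:
        # '<e' packs x as a little-endian IEEE 754 half-precision float.
        struct.pack('<e', x)
    except OverflowError:
        # struct refuses finite values whose magnitude is too large for FP16.
        return False
    return True


print([fits_in_fp16(v) for v in (1.0, FP16_MAX, 70000.0)])  # [True, True, False]
```

FP32, by contrast, reaches roughly 3.4e38, which is why running in full precision on the CPU avoids the overflow.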
Here is how to use the -i (or --input) argument to pass the text directly on the command line instead of a file.
$ python3 t5_base_japanese_ner.py -i 2008年10月5日、アウェーでのレクレアティーボ・ウェルバ戦でプリメーラ・ディビシオンでの初得点を決めた。
With the --savepath (-s) option, the list of recognized entities is saved as a pickle file at the specified path.
$ python3 t5_base_japanese_ner.py -f input.txt -s result.pickle
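The saved file can then be read back with Python's standard pickle module. This is a minimal sketch, assuming result.pickle holds the list of entity dictionaries in the output format described above; the helper names load_entities and count_by_type are hypothetical.

```python
import os
import pickle
from collections import Counter


def load_entities(path):
    """Load the list of entity dicts written with the script's --savepath option."""
    # Only unpickle files you trust: pickle.load can execute arbitrary code.
    with open(path, 'rb') as f:
        return pickle.load(f)


def count_by_type(entities):
    """Count entities per category label (e.g. '人名', '地名')."""
    return Counter(ent['type'] for ent in entities)


if __name__ == "__main__":
    path = 'result.pickle'  # the path passed to -s in the example command
    if os.path.exists(path):
        entities = load_entities(path)
        for ent in entities:
            start, end = ent['span']
            print(f"{ent['type']}\t{start}-{end}\t{ent['text']}")
        print(count_by_type(entities))
    else:
        print(f"{path} not found; run the script with -s {path} first")
```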
Framework: PyTorch
Model format: ONNX opset=12