Simplified and Traditional Chinese conversion in Go
gocc is a Go port of OpenCC (Open Chinese Convert 開放中文轉換), a project by BYVoid for converting between Traditional and Simplified Chinese.
gocc stands for "Golang version OpenCC". It is a complete rewrite of OpenCC in Go that borrows only OpenCC's dictionary and config files, so its output may differ from that of the original OpenCC.
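This package was optimized with reference to two source repositories. It is managed as a Go module so that new projects can import it easily, and it continues to be refined and updated.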
go get github.com/teamlint/opencc
package main

import (
	"fmt"
	"log"

	"github.com/teamlint/opencc"
)

func main() {
	// Simplified to Traditional
	s2t, err := opencc.New("s2t")
	if err != nil {
		log.Fatal(err)
	}
	in := `自然语言处理是人工智能领域中的一个重要方向。`
	out, err := s2t.Convert(in)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n%s\n", in, out)
	//自然语言处理是人工智能领域中的一个重要方向。
	//自然語言處理是人工智能領域中的一個重要方向。

	// Traditional to Simplified
	t2s, err := opencc.New("t2s")
	if err != nil {
		log.Fatal(err)
	}
	// Reuse in, out and err: redeclaring them with := would not compile.
	in = "閱坊-閱讀的樂趣"
	out, err = t2s.Convert(in)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n%s\n", in, out)
	//閱坊-閱讀的樂趣
	//阅坊-阅读的乐趣
}
git clone https://github.com/teamlint/opencc
cd opencc/cmd/gocc
make install
gocc --help
echo "阅坊-阅读的乐趣" | gocc
#閱坊-閱讀的樂趣
14 conversions are currently supported:
s2t, t2s, s2tw, tw2s, s2hk, hk2s, s2twp, tw2sp, t2tw, hk2t, t2hk, t2jp, jp2t, tw2t
- s2t ==> Simplified Chinese to Traditional Chinese 簡體到繁體
- t2s ==> Traditional Chinese to Simplified Chinese 繁體到簡體
- s2tw ==> Simplified Chinese to Traditional Chinese (Taiwan Standard) 簡體到臺灣正體
- tw2s ==> Traditional Chinese (Taiwan Standard) to Simplified Chinese 臺灣正體到簡體
- s2hk ==> Simplified Chinese to Traditional Chinese (Hong Kong variant) 簡體到香港繁體
- hk2s ==> Traditional Chinese (Hong Kong variant) to Simplified Chinese 香港繁體到簡體
- s2twp ==> Simplified Chinese to Traditional Chinese (Taiwan Standard) with Taiwanese idiom 簡體到繁體(臺灣正體標準)並轉換爲臺灣常用詞彙
- tw2sp ==> Traditional Chinese (Taiwan Standard) to Simplified Chinese with Mainland Chinese idiom 繁體(臺灣正體標準)到簡體並轉換爲中國大陸常用詞彙
- t2tw ==> Traditional Chinese (OpenCC Standard) to Taiwan Standard 繁體(OpenCC 標準)到臺灣正體
- hk2t ==> Traditional Chinese (Hong Kong variant) to Traditional Chinese 香港繁體到繁體(OpenCC 標準)
- t2hk ==> Traditional Chinese (OpenCC Standard) to Hong Kong variant 繁體(OpenCC 標準)到香港繁體
- t2jp ==> Traditional Chinese Characters (Kyūjitai) to New Japanese Kanji (Shinjitai) 繁體(OpenCC 標準,舊字體)到日文新字體
- jp2t ==> New Japanese Kanji (Shinjitai) to Traditional Chinese Characters (Kyūjitai) 日文新字體到繁體(OpenCC 標準,舊字體)
- tw2t ==> Traditional Chinese (Taiwan Standard) to Traditional Chinese 臺灣正體到繁體(OpenCC 標準)
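Getting the latest OpenCC data
Replace this package's config/*.json and dictionary/*.txt files with OpenCC/data/config/*.json and OpenCC/data/dictionary/*.txt.
By default, the files in OpenCC/data/config/*.json reference .ocd2 dictionaries ("type": "ocd2", "file": "TSPhrases.ocd2"). Replace every occurrence of ocd2 with txt.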
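The ocd2-to-txt substitution described above can be sketched as a plain string replacement. This is a minimal illustration, not part of the package: the helper name useTextDicts is hypothetical, and it assumes the string "ocd2" appears in a config file only inside the "type" and "file" values.

```go
package main

import (
	"fmt"
	"strings"
)

// useTextDicts rewrites an OpenCC config so that every ocd2 dictionary
// reference points at the plain-text dictionary instead.
// Assumption: "ocd2" occurs only in "type" and "file" values.
func useTextDicts(config string) string {
	return strings.ReplaceAll(config, "ocd2", "txt")
}

func main() {
	in := `{"type": "ocd2", "file": "TSPhrases.ocd2"}`
	fmt.Println(useTextDicts(in))
	// prints: {"type": "txt", "file": "TSPhrases.txt"}
}
```

In practice the same function could be applied to the contents of each config/*.json file after copying it from OpenCC.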
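The following files are derived from other dictionary files and must be generated by hand or with the OpenCC scripts:
- HKVariantsRev.txt: generated by reversing the columns of HKVariants.txt
- JPVariantsRev.txt: generated by reversing the columns of JPVariants.txt
- TWPhrases.txt: generated by merging TWPhrasesIT.txt, TWPhrasesName.txt and TWPhrasesOther.txt
- TWPhrasesRev.txt: generated by reversing the columns of TWPhrases.txt
- TWVariantsRev.txt: generated by reversing the columns of TWVariants.txt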
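If the OpenCC scripts are not at hand, the two derivations (reversing columns and merging files) might look roughly like the sketch below. It is not the OpenCC script itself: reverseDict and mergeDicts are hypothetical helpers, and the code assumes each dictionary line has the form key, a tab, then one or more space-separated values. The sample entries are placeholders, not real dictionary data.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// reverseDict swaps the two columns of dictionary lines.
// Assumed line format: key<TAB>value1 value2 ...
// Every value becomes a key that maps back to the original key(s).
func reverseDict(lines []string) []string {
	rev := make(map[string][]string)
	for _, line := range lines {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		parts := strings.SplitN(line, "\t", 2)
		if len(parts) != 2 {
			continue // skip lines without a tab separator
		}
		for _, v := range strings.Fields(parts[1]) {
			rev[v] = append(rev[v], parts[0])
		}
	}
	keys := make([]string, 0, len(rev))
	for k := range rev {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	out := make([]string, 0, len(keys))
	for _, k := range keys {
		out = append(out, k+"\t"+strings.Join(rev[k], " "))
	}
	return out
}

// mergeDicts concatenates several dictionaries and sorts the result.
func mergeDicts(dicts ...[]string) []string {
	var merged []string
	for _, d := range dicts {
		merged = append(merged, d...)
	}
	sort.Strings(merged)
	return merged
}

func main() {
	// Placeholder entries for illustration only.
	variants := []string{"a\tx y", "b\tx"}
	for _, line := range reverseDict(variants) {
		fmt.Println(line)
	}
	fmt.Println(mergeDicts([]string{"c\tz"}, []string{"a\tx"}))
}
```

The reversed output is sorted by key so that repeated runs produce identical files.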
To add a new conversion, update the supportedConversions and conversions values in opencc.go and add the corresponding dictionary files:
supportedConversions = "s2t, t2s, s2tw, tw2s, s2hk, hk2s, s2twp, tw2sp, t2tw, hk2t, t2hk, t2jp, jp2t, tw2t"
......
conversions = map[string]struct{}{
"s2t": {},
"t2s": {},
"s2tw": {},
"tw2s": {},
"s2hk": {},
"hk2s": {},
"s2twp": {},
"tw2sp": {},
"t2tw": {},
"hk2t": {},
"t2hk": {},
"t2jp": {},
"jp2t": {},
"tw2t": {},
}
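How these two values are used inside opencc.go is not shown here. One plausible use is validating a conversion name before loading its config, sketched below under that assumption. The functions buildSet and checkConversion are hypothetical, and deriving the set from the string (rather than maintaining both by hand) is an alternative design, not the package's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

const supportedConversions = "s2t, t2s, s2tw, tw2s, s2hk, hk2s, s2twp, tw2sp, t2tw, hk2t, t2hk, t2jp, jp2t, tw2t"

// buildSet turns the comma-separated list into a lookup set.
func buildSet(list string) map[string]struct{} {
	set := make(map[string]struct{})
	for _, name := range strings.Split(list, ", ") {
		set[name] = struct{}{}
	}
	return set
}

// conversions is derived from supportedConversions, so a new
// conversion only needs to be added in one place.
var conversions = buildSet(supportedConversions)

// checkConversion reports an error for names that are not supported.
func checkConversion(name string) error {
	if _, ok := conversions[name]; !ok {
		return fmt.Errorf("%q is not a supported conversion (supported: %s)", name, supportedConversions)
	}
	return nil
}

func main() {
	fmt.Println(checkConversion("s2t") == nil) // prints: true
	fmt.Println(checkConversion("x2y") == nil) // prints: false
}
```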