Reorder 廣韻字頭 and correct some 字 & 釋義 (WIP) #10

Draft
wants to merge 20 commits into
base: main

syimyuzya
Member

@syimyuzya syimyuzya commented Jan 22, 2025

Note: this PR depends on #8.

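  • Fully correct the order of the 廣韻 字頭 (head characters): this resolves a long-standing stitching problem in the 《廣韻》 phonological data publicly available online (chiefly 韻典 and poem's 廣韻字音表). All known datasets of this kind derive from the data table of 有女同車's 「廣韻檢索 JS」, whose 字頭 come from 《廣韻全字表》 while its 釋義 (glosses) come from 「宋本廣韻データ」. However, 《廣韻全字表》 is based on the 巾箱本 (《宋本廣韻》, 江蘇教育出版社), whereas 「宋本廣韻データ」 is based on the 澤存堂本 (周祖謨's 《廣韻校本》). The two differ in which characters they include, in character order, and in character forms, so no simple correspondence between them can be derived.
    • src/字序表.csv is the main result of this work. It lists in full how the 字頭 order corresponds across four datasets: this project, poem's 字音表, 「宋本廣韻データ」, and 韻典網.
  • Add back the characters missing from poem's 字音表: these include characters that the 澤存堂本 fails to include, as well as characters that poem's table omitted because they are not encoded in Unicode.
  • Additionally correct some 字頭 and 釋義: these errors were found while the character order was being compiled, and more are still being collected.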

TODO:

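  • Also mark the characters of the four invalid 小韻 (597 𤜼, 646 𡰝, 2021 㶒, 3373 𣅝) as 「當刪」 (to be deleted)
  • A column dedicated to 「字頭當刪」 seems too narrow for what it could hold: a 字頭 can carry more annotations than this (for example, more reliable 反切 readings for individual characters, based on 《形聲考》)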

Issues to be resolved:
- Differences caused only by character variants
- 反切s appearing in 釋義 are not in their original form, but with
  poem's corrections applied.
- 反切 now includes annotations for representing original and corrected
  forms
- 音韻地位 updated accordingly
  - Every 小韻 now has a 音韻地位

TBD:
- Fix 反切 in 釋義 and 釋義補充
- Add missing checks back
@syimyuzya syimyuzya linked an issue Jan 22, 2025 that may be closed by this pull request
@syimyuzya syimyuzya self-assigned this Jan 22, 2025
滂三A支平 → 滂三A脂平
@untunt
Member

untunt commented Jan 23, 2025

Among the 小韻 for which nk2028 adjusted the 音韻地位 or the 反切, the following have their 字頭 corrected in 《广韵形声考》:

  • 141 𤿎<𢻹>
  • 646 𡰝<𡰖> (ignore the 切韵新韵图 note 「𡰢匣合四齊平之訛字」; there is no need to associate this character with 「𡰢」)
  • 2991a 𣢝<欦>
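
The `original<corrected>` notation in the list above can be read mechanically. The following is a minimal sketch, assuming each entry has the shape "小韻 number, space, original character, corrected character in angle brackets"; the name `parse_correction` and the returned tuple layout are hypothetical and not part of this project.

```python
import re

# Assumed entry shape: "<小韻 number with optional letter suffix> <original><<corrected>>",
# e.g. "141 𤿎<𢻹>" or "2991a 𣢝<欦>". Each "." matches exactly one code point.
_ENTRY = re.compile(r"^(\d+[a-z]?)\s+(.)<(.)>$")


def parse_correction(entry: str) -> tuple[str, str, str]:
    """Split an entry into (小韻 id, original 字頭, corrected 字頭)."""
    match = _ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised entry: {entry!r}")
    return match.group(1), match.group(2), match.group(3)
```

The 小韻 id is kept as a string so that suffixed ids such as `2991a` survive unchanged.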

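These should be included. The 「𤿎」 under 小韻 99 鈹 should be changed to 「𢻹」 at the same time.

The other 字頭 corrections in 《广韵形声考》 do not involve any 小韻 that nk2028 adjusted, so including them can be deferred.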

This is for ease of comparison with the new data
TBD:
- Apply patches to _poem_'s data
- Add rows missing in _poem_'s data
- All entries are reordered according to 澤存堂本
  - This solves a long-standing issue with both 韻典 and 廣韻字音表's
    data: Both data tables combine 字頭 from 廣韻全字表 and 釋義 from
    宋本廣韻データ. However 廣韻全字表 is based on 巾箱本 while
    宋本廣韻データ is based on 澤存堂本, which creates mismatches.

- Entries missing in poem's data are added back
  - This includes characters only representable with IDS, and several
    additions from 廣韻校本

- More errors in 字頭 & 釋義 are corrected
  - These were discovered when the new 字序表 was being made, and are
    still WIP.
- Marked 4 invalid 小韻s (597 𤜼, 646 𡰝, 2021 㶒, 3373 𣅝)
- Added more checks on patches.csv
@syimyuzya syimyuzya force-pushed the feat-order-and-corrections branch from fc24f64 to 45e7e2d Compare January 23, 2025 03:50
New corrections:
- 141 𤿎<𢻹>
- 646 𡰝<𡰖> (小韻 no longer invalid)
- 2991 (2991a) 𣢝<欦>
Was CRLF, should be LF.
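
A line-ending fix of this kind can be reproduced with a few lines of Python. This is only a sketch of one way to do it, not necessarily how the commit was made; the function name `normalize_to_lf` is hypothetical.

```python
from pathlib import Path


def normalize_to_lf(path: Path) -> bool:
    """Rewrite `path` with LF line endings; return True if the file changed."""
    raw = path.read_bytes()
    # Work on bytes so the file's text encoding is left untouched.
    fixed = raw.replace(b"\r\n", b"\n")
    if fixed == raw:
        return False
    path.write_bytes(fixed)
    return True
```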
ytenx_流水序 is the 序號 on the character's page (or in the page's URL
after `/kyonh/dzih/`).
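
As an illustration of the description above, the 序號 can be pulled out of a character page URL as follows. This is a sketch that assumes the 序號 is a decimal integer forming the last path segment; the function name `ytenx_serial_from_url` and the example host are assumptions.

```python
import re
from urllib.parse import urlparse

# Path shape taken from the description above: /kyonh/dzih/<序號>, optional trailing slash.
_DZIH_PATH = re.compile(r"^/kyonh/dzih/(\d+)/?$")


def ytenx_serial_from_url(url: str) -> int:
    """Return the 序號 embedded in a character page URL."""
    match = _DZIH_PATH.match(urlparse(url).path)
    if match is None:
        raise ValueError(f"not a /kyonh/dzih/ character page: {url!r}")
    return int(match.group(1))
```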
The columns of 字序表.csv have changed, so the load (and check) script in
build.py should be updated accordingly.
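
One way to make such a column change fail loudly is to validate the CSV header before loading any rows. The sketch below is not the actual build.py code: the function name `check_header` is hypothetical, and the caller supplies the expected column list (only `ytenx_流水序` is a column name known from this PR).

```python
import csv
import io


def check_header(csv_text: str, expected: list[str]) -> list[str]:
    """Return the header row, raising ValueError if it differs from `expected`."""
    header = next(csv.reader(io.StringIO(csv_text)), None)
    if header is None:
        raise ValueError("empty CSV: no header row")
    missing = [col for col in expected if col not in header]
    unexpected = [col for col in header if col not in expected]
    if missing or unexpected:
        raise ValueError(f"missing columns {missing}, unexpected columns {unexpected}")
    return header
```

Reporting both missing and unexpected columns makes a renamed column visible as one of each, rather than as an opaque key error later in the load.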
Successfully merging this pull request may close these issues.

Add the unencoded 字頭 in 《廣韻》 (補充《廣韻》中的未編碼字頭)