Reorder 廣韻字頭 and correct some 字 & 釋義 (WIP) #10

syimyuzya · 2025-01-22T08:20:28Z

Note: this PR depends on #8.

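- Fully corrects the order of 廣韻 字頭. This resolves the stitching problem that has long affected the 《廣韻》 phonological data publicly available online (mainly 韻典 and poem's 廣韻字音表). All of these known datasets derive from the data table of 有女同車's 「廣韻檢索 JS」, whose 字頭 come from 《廣韻全字表》 while its 釋義 come from 「宋本廣韻データ」. However, 《廣韻全字表》 is based on 巾箱本 (江蘇教育出版社《宋本廣韻》), whereas 「宋本廣韻データ」 is based on 澤存堂本 (周祖謨《廣韻校本》). The two differ in which characters they include, in character order, and in glyph forms, so no simple correspondence can be derived between them.
  - src/字序表.csv is the main result of this work. It lists in full how the 字頭 orders of four datasets correspond: this project, poem's 字音表, 「宋本廣韻データ」, and 韻典網.
- Adds back the characters missing from poem's 字音表, including characters that 澤存堂本 fails to include and characters that poem's table omitted because they are not encoded in Unicode.
- Additionally corrects some 字頭 and 釋義. These were discovered while sorting out the character order, and more are still being collected.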

TODO:

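- The characters in the four spurious 小韻 (597 𤜼, 646 𡰝, 2021 㶒, 3373 𣅝) should also be marked 「當刪」 (to be deleted).
- A dedicated 「字頭當刪」 column seems too narrow for its cost: a 字頭 could carry more annotations than this one flag (for example, more reliable 反切 readings for individual characters, following 《形聲考》).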


untunt · 2025-01-23T03:35:47Z

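Among the 小韻 whose 音韻地位 or 反切 nk2028 has adjusted, 《广韵形声考》 corrects the 字頭 in the following:

141 𤿎<𢻹>
646 𡰝<𡰖> (ignore the 切韵新韵图 note 「𡰢_{匣合四齊平}之訛字」, i.e. "erroneous form of 𡰢"; there is no need to link it to 「𡰢」)
2991a 𣢝<欦>

These should be included. The 「𤿎」 under 小韻 99 鈹 should also be changed to 「𢻹」 at the same time.

The other 字頭 corrections in 《广韵形声考》 do not involve nk2028's adjustments to 小韻, so including them can be postponed.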


- All entries are reordered according to 澤存堂本
  - This solves a long-standing issue with both 韻典's and 廣韻字音表's data: both data tables combine 字頭 from 廣韻全字表 and 釋義 from 宋本廣韻データ. However, 廣韻全字表 is based on 巾箱本 while 宋本廣韻データ is based on 澤存堂本, which creates mismatches.
- Entries missing in poem's data are added back
  - This includes characters only representable with IDS, and several additions from 廣韻校本
- More errors in 字頭 & 釋義 are corrected
  - These were discovered when the new 字序表 was being made, and are still WIP.


syimyuzya added 6 commits December 26, 2024 20:38

Add 反切原貌

326f8e1

Issues to be resolved:
- Differences caused only by character variants
- 反切s appearing in 釋義 are not in their original form, but with poem's corrections applied.

Fix check.py to reflect format changes

7295b8c

Fix 釋義補充 for 菱 & 䔖 (again)

858d35b

wip: use new 小韻 data for 反切 & 音韻地位

270bd76

- 反切 now includes annotations for representing original and corrected forms
- 音韻地位 updated accordingly
- Every 小韻 now has a 音韻地位

TBD:
- Fix 反切 in 釋義 and 釋義補充
- Add missing checks back

Restore original 反切 in 釋義

be4a0ea

TBD:
- Check 反切

Check 反切 & 釋義

6c573c2

syimyuzya linked an issue Jan 22, 2025 that may be closed by this pull request

補充《廣韻》中的未編碼字頭 (Add the unencoded 字頭 in 《廣韻》) #9

Open

syimyuzya self-assigned this Jan 22, 2025

Correct 小韻 141 𤿎: 匹支(之)切

3e58c1a

滂三A支平 → 滂三A脂平

syimyuzya added 7 commits January 23, 2025 11:46

wip: change 釋義補充 to 釋義參照

78e8a51

This is for ease of comparison with the new data

wip: reorder 廣韻 rows with the new 字序表

416810c

TBD:
- Apply patches to _poem_'s data
- Add rows missing in _poem_'s data

wip: add PUA fixes in patches.csv

78bcad1

wip: include patches to poem's data

bf6c1c5

Add some notes from 形聲考 in patches.csv

721255f

Improve patches

45e7e2d

- Marked 4 invalid 小韻s (597 𤜼, 646 𡰝, 2021 㶒, 3373 𣅝)
- Added more checks on patches.csv

syimyuzya force-pushed the feat-order-and-corrections branch from fc24f64 to 45e7e2d Compare January 23, 2025 03:50

syimyuzya added 6 commits January 23, 2025 14:37

Fix inconsistencies in 字序表 & add corrections

1d0df2f

New corrections:
- 141 𤿎<𢻹>
- 646 𡰝<𡰖> (小韻 no longer invalid)
- 2991 (2991a) 𣢝<欦>
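The corrections in this commit use an `original<corrected>` notation, optionally preceded by a 小韻 number. As a rough illustration only, the sketch below parses entries written that way. The `parse_correction` helper and its pattern are hypothetical; the actual format checked in patches.csv is not shown in this PR and may differ.

```python
import re

# Hypothetical pattern (an assumption, not the project's actual check):
# an optional 小韻 number such as "141" or "2991a", then the original 字頭,
# then the corrected 字頭 in angle brackets. Each 字頭 may span several
# code points so that IDS-described characters would also fit.
CORRECTION = re.compile(r"^(?:(\d+[a-z]?)\s+)?([^<>\s]+)<([^<>\s]+)>$")


def parse_correction(text):
    """Split an entry like '141 𤿎<𢻹>' into (number, original, corrected)."""
    match = CORRECTION.match(text.strip())
    if match is None:
        raise ValueError(f"not in original<corrected> form: {text!r}")
    return match.groups()
```

An entry without a leading number, such as `𤿎<𢻹>`, yields `None` in the first position.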

Fix: newline in 字序表.csv

956f499

Was CRLF, should be LF.
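A check like the following could guard against the line endings regressing. It is a minimal sketch, not part of this PR: `has_crlf` and `to_lf` are hypothetical helper names, and the repository may well enforce line endings some other way.

```python
from pathlib import Path


def has_crlf(path):
    """Return True if the file contains at least one CRLF line ending."""
    return b"\r\n" in Path(path).read_bytes()


def to_lf(path):
    """Rewrite the file in place, replacing every CRLF with LF."""
    data = Path(path).read_bytes()
    Path(path).write_bytes(data.replace(b"\r\n", b"\n"))
```

Working on raw bytes avoids any newline translation that text-mode file handling would apply.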

Check format of 校正字頭 in patches.csv

f318746

Add 韻典 to 字序表.csv

3a2166a

ytenx_流水序 is the 序號 on the character's page (or in the page's URL after `/kyonh/dzih/`).
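To make the URL form concrete, here is a small sketch that pulls the 序號 out of a page URL. The `ytenx_serial` helper is hypothetical, the host in the usage note is a placeholder, and the sketch assumes the 序號 is a plain decimal integer.

```python
import re
from urllib.parse import urlparse


def ytenx_serial(url):
    """Return the number following /kyonh/dzih/ in the URL path, or None."""
    path = urlparse(url).path
    match = re.search(r"/kyonh/dzih/(\d+)", path)
    return int(match.group(1)) if match else None
```

For a placeholder URL such as `https://example.org/kyonh/dzih/1234/`, the function returns `1234`.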

Fix build script

0f3b5fa

The columns of 字序表.csv have changed, so the load (and check) script in build.py should be updated accordingly.
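One way a loader can fail fast when columns change is to validate the header before reading any rows. The sketch below is illustrative only and is not the code in build.py: `load_rows` and `EXPECTED_COLUMNS` are hypothetical names, and `ytenx_流水序` is the only column name taken from this PR.

```python
import csv

# Hypothetical list of required columns. Only `ytenx_流水序` is named in
# this PR; the real 字序表.csv has more columns that are not shown here.
EXPECTED_COLUMNS = ["ytenx_流水序"]


def load_rows(path, expected=EXPECTED_COLUMNS):
    """Load a CSV as a list of dicts, failing early if a column is missing."""
    with open(path, encoding="utf-8", newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        missing = [name for name in expected if name not in header]
        if missing:
            raise ValueError(f"missing columns: {missing}")
        return list(reader)
```

Checking the header up front turns a silent mismatch into an immediate error that names the missing columns.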

Mention about 韻典's data version

0a19cce
