Japanese Mora-Based Preprocessing Parser

This project implements a preprocessing tool that converts Japanese sentences into space-separated mora (or pseudo-mora) units. It is designed for tasks such as phoneme alignment, lip-sync animation, and speech synthesis, where fine-grained control over syllable-like units is needed. Note: the conversion rules are customized for my own objectives, so the output may not match a strict phonological mora segmentation.

Example

Input: コンピュータゲームのメーカーや、業界団体などに関連する人物のカテゴリ。

Output (katakana reading, punctuation and long-vowel marks removed): コンピュタゲムノメカヤギョウカイダンタイナドニカンレンスルジンブツノカテゴリ

Output 2 (space-separated romanized units): k on p i u t a g e m u n o m e k a i a g y o u k a i d an t a i n a d o n i k an ren s u ru jin b u ts u n o k a t e g o ri

Features

- Parses Japanese text into units approximating morae (音拍)
- Strips punctuation and unwanted symbols
- Simplifies complex katakana sequences into their component morae, dropping the long-vowel mark ー (e.g., コンピュータ → コンピュタ)

Useful for:
- Lip-sync animation
- Phoneme-level TTS
- Language modeling
- Forced alignment preprocessing
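
The cleanup and splitting steps listed above can be illustrated with a minimal sketch. It is not the project's code: the function names are hypothetical, it handles kana only (kanji are assumed to have been converted to readings beforehand, for example with pykakasi), and it uses a conventional rule in which a small kana such as ュ attaches to the preceding character.

```python
from typing import List

# Unicode ranges for kana letters. The long-vowel mark (U+30FC) and
# punctuation fall outside the katakana letter range, so they are dropped.
HIRAGANA_START, HIRAGANA_END = 0x3041, 0x3096
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30FA
KANA_OFFSET = 0x60  # hiragana code point + 0x60 = matching katakana

# Small kana that combine with the preceding character into one unit.
SMALL_KANA = set("ァィゥェォャュョヮ")


def normalize_kana(text: str) -> str:
    """Convert hiragana to katakana and keep only katakana letters."""
    out = []
    for ch in text:
        code = ord(ch)
        if HIRAGANA_START <= code <= HIRAGANA_END:
            code += KANA_OFFSET
        if KATAKANA_START <= code <= KATAKANA_END:
            out.append(chr(code))
    return "".join(out)


def split_units(kana: str) -> List[str]:
    """Split a katakana string into mora-like units."""
    units: List[str] = []
    for ch in kana:
        if ch in SMALL_KANA and units:
            units[-1] += ch  # e.g. ピ + ュ becomes ピュ
        else:
            units.append(ch)
    return units


def to_units(text: str) -> str:
    """Return the space-separated units for a kana-only input."""
    return " ".join(split_units(normalize_kana(text)))


if __name__ == "__main__":
    print(to_units("コンピュータゲームのメーカーや、"))
```

Note that this sketch treats ン as its own unit, whereas the example output above groups it with the preceding sound (for instance "k on"), so the project's own rules evidently differ on that point.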

Implementation Details

Language: Python 3.8+
Dependencies: pykakasi
Conversion: based on a dictionary lookup table
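
A dictionary-table conversion of this kind could look like the sketch below. It is only an illustration: the table holds a few entries inferred from the example output in this README, not the contents of the project's actual table, and both the greedy longest-match strategy and the names `PHONE_TABLE` and `to_phones` are assumptions.

```python
from typing import Dict, List

# Illustrative entries inferred from the README example (assumed, incomplete).
PHONE_TABLE: Dict[str, str] = {
    "コン": "k on",
    "ピュ": "p i u",
    "タ": "t a",
    "カ": "k a",
    "テ": "t e",
    "ゴ": "g o",
    "リ": "ri",
    "ゲ": "g e",
    "ム": "m u",
}


def to_phones(kana: str, table: Dict[str, str] = PHONE_TABLE) -> str:
    """Map a katakana string to space-separated units by longest match."""
    max_len = max(len(key) for key in table)
    units: List[str] = []
    i = 0
    while i < len(kana):
        # Try the longest candidate first so that コン wins over コ.
        for size in range(min(max_len, len(kana) - i), 0, -1):
            chunk = kana[i:i + size]
            if chunk in table:
                units.append(table[chunk])
                i += size
                break
        else:
            raise ValueError(f"no table entry for {kana[i]!r} at position {i}")
    return " ".join(units)


if __name__ == "__main__":
    print(to_phones("コンピュタゲム"))
```

Raising an error on an unknown character, rather than skipping it silently, makes gaps in the table visible early, which matters when the output feeds a forced aligner.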

