Facilitating phonologization and other text-based analyses of imported transcriptions #471
alecristia started this conversation in Ideas
-
@LoannPeurey @alix-bourree @w9k2a4i (that's Kai Jia): this is the idea I put forward when we met. It would help Kai Jia easily analyze all of her remaining datasets, which we already have in ChildProject format.
-
Kai Jia has been working with a messy set of bash, then Python, then Perl scripts that take a person from an orthographic representation in CHAT style, like:
to a clean version with just the stuff that was said:
to a phonologized version of that (fake-phonologized in this case):
to a representation in terms of consonants and vowels:
to a "lexicon" with tokens & types of word shapes:
This is all done with regular expressions. Many of them are language-independent (CHAT -> clean), while others are language-specific (clean -> phonologized).
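To make that split concrete, here is a minimal Python sketch of the first three steps (CHAT -> clean -> phonologized -> CV). It is not Kai Jia's actual scripts: the cleaning rules cover only a handful of common CHAT conventions, and the `FAKE_PHONOLOGY` table is a made-up toy in the same spirit as the fake-phonologized version mentioned above. `CHAT_CLEANING_RULES`, `FAKE_PHONOLOGY`, `VOWELS` and the helper functions are hypothetical names.

```python
import re

# Language-independent rules (CHAT -> clean), applied in order.
# Illustrative subset of CHAT conventions only, not a complete cleaner.
CHAT_CLEANING_RULES = [
    (r"\x15\d+_\d+\x15", ""),      # media time bullets
    (r"^\*[A-Z0-9]+:\s*", ""),     # speaker tier prefix, e.g. "*CHI:"
    (r"\[[^\]]*\]", ""),           # bracketed codes: [: target], [/], [= comment]
    (r"[<>]", ""),                 # scope brackets around retraced material
    (r"&\S+", ""),                 # events, fillers, fragments: &=laughs, &-uh, &+fr
    (r"\b(?:xxx|yyy|www)\b", ""),  # unintelligible / untranscribed material
    (r"@\S*", ""),                 # special form markers: word@c, word@s:fra
    (r"\([^)]*\)", ""),            # pauses "(.)" and omitted sounds "(be)cause"
    (r"[^\w\s']", " "),            # terminators and leftover punctuation
    (r"\s+", " "),                 # collapse whitespace
]

# Language-specific rules (clean -> phonologized). A made-up toy "phonology",
# NOT a real grapheme-to-phoneme mapping for English.
FAKE_PHONOLOGY = {
    "eng": [
        (r"'", ""),
        (r"th", "T"), (r"sh", "S"), (r"ch", "C"), (r"ng", "N"),
        (r"ee", "i"), (r"oo", "u"), (r"ck", "k"),
        (r"(\w)\1", r"\1"),        # collapse doubled letters
    ],
}
VOWELS = {"eng": set("aeiou")}


def apply_rules(text: str, rules: list[tuple[str, str]]) -> str:
    """Apply (pattern, replacement) regex rules in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text


def clean_chat(line: str) -> str:
    """Keep only the words that were said on a CHAT main tier line."""
    return apply_rules(line, CHAT_CLEANING_RULES).strip().lower()


def phonologize(clean: str, lang: str) -> str:
    """Apply the language-specific rule list to cleaned text."""
    return apply_rules(clean, FAKE_PHONOLOGY[lang])


def to_cv(phon: str, lang: str) -> str:
    """Replace each phoneme symbol with C or V, keeping word boundaries."""
    vowels = VOWELS[lang]
    return "".join(
        ch if ch.isspace() else ("V" if ch in vowels else "C") for ch in phon
    )


if __name__ == "__main__":
    line = ("*CHI:\t<I wanna> [/] I wanna [: want to] go &-uh "
            "to the park@s:fra (.) xxx . \x1515320_16890\x15")
    clean = clean_chat(line)
    phon = phonologize(clean, "eng")
    print(clean)               # i wanna i wanna go to the park
    print(phon)                # i wana i wana go to Te park
    print(to_cv(phon, "eng"))  # V CVCV V CVCV CV CV CV CVCC
```

Two choices here are debatable: the sketch keeps the form actually produced ("wanna") and drops the `[: want to]` target, and it removes every `&`-prefixed token, fillers included. A real pipeline might prefer the target form for dictionary lookup, or keep fillers.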
I wonder to what extent we could incorporate some of these as tools that build on ChildProject standards, making it easier to process any .cha files that have already been imported.
@alix-bourree, Kai Jia aims to re-use the ACLEW transcripts to this end soon, so your feedback will be welcome!
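On the ChildProject side, one possible shape for such a tool is a function that walks over already-imported segments and builds the word-shape lexicon (token and type counts per CV shape). This sketch assumes each segment is a mapping with the utterance in a `transcription` field (check the column name your converted annotations actually use) and takes the cleaning + phonologization step as a callable, so language-specific rules can be plugged in per dataset. `shape_lexicon` and `word_shape` are hypothetical names, not part of ChildProject.

```python
from collections import Counter, defaultdict
from collections.abc import Callable, Iterable, Mapping
from typing import Optional


def word_shape(word: str, vowels: set[str]) -> str:
    """Map each symbol of a phonologized word to C or V."""
    return "".join("V" if ch in vowels else "C" for ch in word)


def shape_lexicon(
    segments: Iterable[Mapping[str, Optional[str]]],
    process: Callable[[str], str],
    vowels: set[str],
) -> dict[str, tuple[int, int]]:
    """Return {CV shape: (token count, type count)} over all segments.

    `process` turns a raw transcription into space-separated phonologized
    words (CHAT cleaning + language-specific rules). Segments without a
    transcription are skipped.
    """
    tokens: Counter[str] = Counter()
    types: defaultdict[str, set[str]] = defaultdict(set)
    for seg in segments:
        text = seg.get("transcription")  # assumed field name
        if not text:
            continue
        for word in process(text).split():
            shape = word_shape(word, vowels)
            tokens[shape] += 1
            types[shape].add(word)
    return {shape: (n, len(types[shape])) for shape, n in tokens.items()}


if __name__ == "__main__":
    demo = [
        {"transcription": "go to the park"},
        {"transcription": "go go"},
        {"transcription": None},  # e.g. a segment with no transcription
    ]
    # str.lower stands in for a real clean + phonologize step here.
    print(shape_lexicon(demo, str.lower, set("aeiou")))
    # {'CV': (4, 2), 'CCV': (1, 1), 'CVCC': (1, 1)}
```

Passing `process` in as a parameter keeps the shared CHAT cleaning in one place while each dataset supplies its own language's rules.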