Home
paupowpow edited this page Nov 30, 2018
assumptions:
- we know the language of the text at hand: English or German
- we have a set of common word endings for each language, e.g. German ["heit", "keit", "ung"] and English ["ery", "cation", "ed"]
steps:
- separate the text into words, i.e. take out spaces and special characters
- split the remaining characters at random into units:
  - 20% 1-character units
  - 50% 2-character units
  - 30% 3-character units
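
The splitting described above could be sketched roughly as follows. This is only a minimal sketch under some assumptions the notes do not settle: the percentages are read as the probability of drawing each unit length, units never cross word boundaries, and the last unit of a word may come out shorter than the drawn length. The names `split_into_words`, `random_units` and `split_text` are made up for illustration.

```python
import random
import re

# Assumption: the percentages are the probability of drawing each unit length.
UNIT_LENGTHS = [1, 2, 3]
UNIT_WEIGHTS = [0.2, 0.5, 0.3]


def split_into_words(text):
    """Return the words of the text, dropping spaces, digits and special characters."""
    # [^\W\d_]+ matches runs of letters, so umlauts and ß are kept.
    return re.findall(r"[^\W\d_]+", text)


def random_units(word, rng):
    """Cut one word into consecutive units of randomly drawn length 1, 2 or 3."""
    units = []
    pos = 0
    while pos < len(word):
        length = rng.choices(UNIT_LENGTHS, weights=UNIT_WEIGHTS)[0]
        # Slicing stops at the end of the word, so the last unit may be shorter.
        units.append(word[pos:pos + length])
        pos += length
    return units


def split_text(text, seed=None):
    """Split a text into words, then each word into random units."""
    rng = random.Random(seed)  # pass a seed to make the splits reproducible
    return [random_units(word, rng) for word in split_into_words(text)]
```

Calling `split_text("Die Schönheit, the bakery!", seed=1)` returns one list of units per word; joining the units of each list gives back the original word. Because the last unit of a word is cut off at the word end, the observed share of 1-, 2- and 3-character units will only approximate 20/50/30.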