Multi Language
Program-y now supports Chinese characters, formatting and layout, to the best of my knowledge of the language. Chinese is written as a sequence of pictorial words and, as I understand it, uses little spacing except around anglicized words and punctuation.
To handle this different view of words, the input text (entered by a user), the pattern text (between the <pattern> tags) and the template text (between the <template> tags) need to be split and merged differently:
- Input text needs to be split at each Chinese symbol word boundary, so that 你也好 becomes 你 也 好
- Pattern text needs to be split in the same way so that it can be matched against input text
- Template text needs to be merged so that 你 也 好 becomes 你也好
On top of all of the above, text that combines English and Chinese needs to be handled correctly.
As such, new features have been added to the platform:
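The split and merge rules described above can be sketched in a few lines of Python. This is only an illustration, not Program-Y's actual processor code: the function names `split_chinese` and `merge_chinese` are hypothetical, and the sketch assumes that "Chinese" means the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), which leaves out extension blocks and CJK punctuation.

```python
import re

# Assumption for this sketch: only the basic CJK Unified Ideographs block.
_CJK = "\u4e00-\u9fff"
_CJK_CHAR = re.compile(f"([{_CJK}])")
# Whitespace that sits directly between two Chinese characters.
_SPACE_BETWEEN_CJK = re.compile(f"(?<=[{_CJK}])\\s+(?=[{_CJK}])")


def split_chinese(text: str) -> str:
    """Put each Chinese character on its own word boundary.

    English words and other non-Chinese runs are left whole.
    """
    spaced = _CJK_CHAR.sub(r" \1 ", text)
    # Collapse the extra spaces introduced by the substitution.
    return " ".join(spaced.split())


def merge_chinese(text: str) -> str:
    """Remove whitespace between adjacent Chinese characters only.

    Spaces next to English words are kept.
    """
    return _SPACE_BETWEEN_CJK.sub("", text)


if __name__ == "__main__":
    print(split_chinese("你也好"))
    print(merge_chinese("你 也 好"))
    print(split_chinese("hello 你好 world"))
```

Because the merge step only removes spaces that have a Chinese character on both sides, mixed text such as "hello 你 好 world" keeps the spaces around the English words.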
- New internal language options controlled by the language config option
- New pre processor to split input words
- New post processor to merge back output words
Adding the following config to the brain section of your config.yaml ensures that template text is split into individual pictorials:
brain:
  language:
    chinese: true
Adding the SplitChinesePreProcessor to the preprocessor.conf file as follows ensures that input text is split correctly. This should be the last pre processor in the file:
programy.processors.pre.normalize.NormalizePreProcessor
programy.processors.pre.removepunctuation.RemovePunctuationPreProcessor
programy.processors.pre.splitchinese.SplitChinesePreProcessor
Adding the MergeChinesePostProcessor to the postprocessor.conf file ensures that output is formatted correctly. This should be the first line of the post processor file:
programy.processors.post.mergechinese.MergeChinesePostProcessor
programy.processors.post.denormalize.DenormalizePostProcessor
programy.processors.post.formatpunctuation.FormatPunctuationProcessor
programy.processors.post.formatnumbers.FormatNumbersPostProcessor
programy.processors.post.multispaces.RemoveMultiSpacePostProcessor
programy.processors.post.removehtml.RemoveHTMLPostProcessor
programy.processors.post.consoleformat.ConsoleFormatPostProcessor
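Both conf files are plain lists of dotted class paths, one per line, and the order of the lines is the order that matters. As a rough sketch of how such a file could be read and turned into classes, the snippet below uses only the standard library. It is not Program-Y's own loader: the helper names `parse_processor_conf` and `load_class` are hypothetical, and skipping blank lines is an assumption of this sketch.

```python
import importlib


def parse_processor_conf(text: str) -> list:
    """Return the dotted class paths in file order, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_class(dotted_path: str):
    """Import the module part of a dotted path and return the named class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


if __name__ == "__main__":
    conf = """
    programy.processors.pre.normalize.NormalizePreProcessor
    programy.processors.pre.removepunctuation.RemovePunctuationPreProcessor
    programy.processors.pre.splitchinese.SplitChinesePreProcessor
    """
    paths = parse_processor_conf(conf)
    # The split processor is expected to be the last pre processor.
    print(paths[-1])
```

With Program-Y installed, each path could then be passed to `load_class` to obtain the processor class, in the same order as the file.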
Unfortunately, I do not speak any dialect of Chinese, so this work is based on the awesome feedback and input from Program-Y users. Please feel free to keep sending feedback and help improve multi language support. If I have missed anything or got anything majorly wrong, firstly I apologise if it offends your native tongue, and secondly please let me know which rule has been broken and I'll fix it as soon as I can.
Email: [email protected] | Twitter: @keiffster | Facebook: keith.sterling | LinkedIn: keithsterling | My Blog