
Multi Language

Keith Sterling edited this page Jan 19, 2018 · 3 revisions

Chinese Support

Program-y now supports Chinese characters, formatting and layout, to the best of my knowledge of the language. Chinese text is made up of a sequence of pictorial words and, as I understand it, contains little spacing except around anglicized words and punctuation.

To handle this different view of words, the input text (entered by the user), the pattern text (between the <pattern> tags) and the template text (between the <template> tags) need to be split and merged differently:

  • Input text needs to be split at each Chinese symbol word boundary, so that 你也好 becomes 你 也 好
  • Pattern text needs to be split the same way so that it can be matched against the input text
  • Template text needs to be merged so that 你 也 好 becomes 你也好

On top of this, text that combines English and Chinese needs to be handled correctly.
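The splitting and merging rules above can be sketched in a few lines of Python. This is only an illustration, not Program-Y's own implementation: the function names `split_chinese` and `merge_chinese` are hypothetical, and the sketch assumes that "Chinese" means the CJK Unified Ideographs range U+4E00 to U+9FFF.

```python
import re

# Hypothetical helpers, not taken from Program-Y.
# Assumption: a Chinese character is anything in U+4E00..U+9FFF.
_CJK = r"\u4e00-\u9fff"


def split_chinese(text: str) -> str:
    """Put each Chinese character on its own, leaving other words intact."""
    spaced = re.sub(f"([{_CJK}])", r" \1 ", text)
    # Collapse the extra spaces introduced by the substitution.
    return " ".join(spaced.split())


def merge_chinese(text: str) -> str:
    """Remove whitespace that sits between two Chinese characters."""
    return re.sub(f"(?<=[{_CJK}])\\s+(?=[{_CJK}])", "", text)


print(split_chinese("你也好"))             # 你 也 好
print(merge_chinese("你 也 好"))           # 你也好
print(split_chinese("你好 hello 世界"))    # 你 好 hello 世 界
print(merge_chinese("你 好 hello 世 界"))  # 你好 hello 世界
```

In this sketch English words are never broken up, and the space between a Chinese character and an English word is kept when merging, which is one way to handle mixed text.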

As such, new features have been added to the platform:

  • New internal language options controlled by the language config option
  • New Pre processor to split input words
  • New Post processor to merge back output words

Config Setting

Adding the following config to the brain section of your config.yaml ensures that template text is split into individual pictorials:

brain:

    language:
      chinese: true

Pre Processor

Adding the SplitChinesePreProcessor to the preprocessor.conf file as follows ensures input text is split correctly. This should be the last pre processor in the file:

programy.processors.pre.normalize.NormalizePreProcessor
programy.processors.pre.removepunctuation.RemovePunctuationPreProcessor
programy.processors.pre.splitchinese.SplitChinesePreProcessor
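To illustrate why the position in the file matters, here is a minimal Python sketch of a processor chain. It assumes the processors run top to bottom in the order they are listed. The three functions are hypothetical stand-ins named after the classes above; they do not reproduce Program-Y's real behaviour.

```python
import re
from typing import Callable, List


# Hypothetical stand-ins, loosely imitating what the class names suggest.
def normalize(text: str) -> str:
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())


def remove_punctuation(text: str) -> str:
    # Drop some ASCII and full-width Chinese punctuation marks.
    return re.sub(r"[!?,.;:！？，。；：]", "", text)


def split_chinese(text: str) -> str:
    # Assumption: Chinese characters are those in U+4E00..U+9FFF.
    spaced = re.sub(r"([\u4e00-\u9fff])", r" \1 ", text)
    return " ".join(spaced.split())


def run_pipeline(text: str, processors: List[Callable[[str], str]]) -> str:
    # Assumption: each processor receives the output of the previous one.
    for process in processors:
        text = process(text)
    return text


# Same order as the file: the split step goes last.
PRE_PROCESSORS = [normalize, remove_punctuation, split_chinese]

print(run_pipeline("你好，world！", PRE_PROCESSORS))  # 你 好 world
```

With the split step last, the earlier steps see the text as the user typed it, and the matcher receives one token per Chinese character.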

Post Processor

Adding the MergeChinesePostProcessor to the postprocessor.conf file ensures output is formatted correctly. This should be the first line of the post processor file:

programy.processors.post.mergechinese.MergeChinesePostProcessor
programy.processors.post.denormalize.DenormalizePostProcessor
programy.processors.post.formatpunctuation.FormatPunctuationProcessor
programy.processors.post.formatnumbers.FormatNumbersPostProcessor
programy.processors.post.multispaces.RemoveMultiSpacePostProcessor
programy.processors.post.removehtml.RemoveHTMLPostProcessor
programy.processors.post.consoleformat.ConsoleFormatPostProcessor

WARNING

Unfortunately, I do not speak any dialect of Chinese, so this work is based on the awesome feedback and input from Program-Y users. Please feel free to keep sending feedback and help improve multi-language support. If I have missed anything or got anything majorly wrong, firstly I apologise if it offends your native tongue, and secondly please let me know which rule has been broken and I'll fix it as soon as I can.
