Dutch hyphenation #46

bertfrees · 2015-05-18T14:59:21Z

See snaekobbi/issues#2 for the various options for implementing a hyphenator.

bertfrees · 2015-05-18T14:59:43Z

Maybe a useful tip from CBB (Christelijke Bibliotheek voor Blinden en Slechtzienden): they use a version of hyph_nl_NL.dic from OpenTaal.

dkager · 2015-05-21T14:25:43Z

The OpenTaal data sounds promising. I will look at this next week. Maybe you can fill me in on the best way to implement this in mod-braille (from what I read, there is OpenOffice data available for this dictionary).

dkager · 2015-05-28T08:25:02Z

I'm guessing this is the hyphenation dictionary from OpenTaal.org that CBB is using. Maybe I can use the same approach as in snaekobbi/issues#2 for this?
I don't have test data yet, so integrating the dictionary into mod-braille could be done first.

bertfrees · 2015-05-28T08:48:19Z

The dictionary you linked is the one that is already included in Pipeline. I think CBB was maybe referring to an updated version. We'd have to ask them.

We need test data before we can do anything else. Then, if you need to modify the dictionary, it's best you copy the file to a new project (like Jukka did with pipeline-mod-celia) because the dictionary from LibreOffice is downloaded and packaged automatically.

dkager · 2015-05-28T08:55:36Z

I believe the OpenTaal data dates from 2011, but I'll see if I can confirm this with someone from CBB.
What sort of test data are you looking for?

bertfrees · 2015-05-28T09:51:10Z

Hyphenated words, I guess. I understand you may not have that kind of data just lying around. But if there's nothing to test, then our job is done and we just take what's currently available. I think at the minimum we should have a small test, if only so we can easily add more to it later. Jukka's test data is also very limited, but it's easy to add more. He did it in pipeline-mod-celia because that's where his dictionary lives, but we could have your tests in functional-testing.

dkager · 2015-06-02T06:58:05Z

So if I understand this correctly, we have:

The hyphenation dict in mod-braille.
Generic code to use this dict, also in mod-braille.

And we need:

Test data in functional-testing.

For Finnish, the test data is in the JUnit test case. I could clone this into another module, but I think it would be a bit nicer to have something similar to liblouis' harness tests for this. I.e. experts only worry about JSON or some other format, and the JUnit tests pull these in and run them.
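To make the harness idea concrete, here is a rough sketch of what I have in mind. Everything in it is an assumption for illustration: the class name `HyphenationHarness`, the convention of writing each test as the expected hyphenated form, and the use of a plain function as a stand-in for whichever hyphenator ends up being used. It does not call any real mod-braille or liblouis API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

/** Hypothetical data-driven harness; names are illustrative, not part of mod-braille. */
final class HyphenationHarness {

    /**
     * Runs every expected form (e.g. "le-zen") through the hyphenator and
     * returns one message per mismatch. An empty list means all cases passed.
     * Assumption: test words contain no hard hyphens of their own, so the
     * input can be recovered by stripping "-" from the expected form.
     */
    static List<String> run(UnaryOperator<String> hyphenator, List<String> expectedForms) {
        List<String> failures = new ArrayList<>();
        for (String expected : expectedForms) {
            String input = expected.replace("-", "");
            String actual = hyphenator.apply(input);
            if (!expected.equals(actual)) {
                failures.add(input + ": expected " + expected + " but got " + actual);
            }
        }
        return failures;
    }
}
```

A JUnit test would then only need to load the list of expected forms from a data file and assert that `run` returns an empty list, so whoever maintains the data never has to touch Java.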

Also, which of the three libs (Libhyphen, Hyphenator, TexHyphenator) should we use?

bertfrees · 2015-06-02T09:46:55Z

I suggest we use XML instead of JSON. Something like this. If everybody includes test data in that format in the functional-testing repo, then I can have one test (JUnit or XSpec) that runs them all. Of course, from the point of view of the developer it is nice to have the tests closer to the implementation, but since you don't intend to modify the dictionary for the time being, that's not a problem. Later we can still copy/move the test to its own module.
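For illustration only, reading such a file from Java could look roughly like the sketch below. The element and attribute names (`hyphenation-tests`, `test`, `input`, `expected`) are made up for this example and are not necessarily the format linked above; the class name `XmlHyphenationTests` is hypothetical as well. Only standard JDK XML classes are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical reader for an assumed XML test format; not the actual functional-testing schema. */
final class XmlHyphenationTests {

    /**
     * Parses XML such as
     *   <hyphenation-tests lang="nl">
     *     <test input="lezen" expected="le-zen"/>
     *   </hyphenation-tests>
     * into a map from input word to expected hyphenation, in document order.
     */
    static Map<String, String> parse(String xml) {
        try {
            NodeList nodes = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getElementsByTagName("test");
            Map<String, String> cases = new LinkedHashMap<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Element test = (Element) nodes.item(i);
                cases.put(test.getAttribute("input"), test.getAttribute("expected"));
            }
            return cases;
        } catch (Exception ex) {
            // Wrap checked parser/IO exceptions so callers in tests stay simple.
            throw new IllegalStateException("Could not read hyphenation test data", ex);
        }
    }
}
```

The resulting map can be fed to a single JUnit test that loops over all entries, which is the "one test that runs them all" idea.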

Which of the libraries we should use is not so important, I think. What I've done with Finnish is convert the patterns into several formats at build time, so that several implementations become available in DP2. As long as all implementations behave the same (which they should in theory, and we can easily test each of them with the same test data), we don't have to worry about which one is actually used.
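A check that the implementations really do behave the same could be sketched as follows. This is a minimal illustration under stated assumptions: each implementation is represented by a plain function from word to hyphenated word, and the class name `ImplementationAgreement` is hypothetical. It does not use the actual Libhyphen, Hyphenator or TexHyphenator bindings.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** Hypothetical cross-implementation check; the implementations are passed in as plain functions. */
final class ImplementationAgreement {

    /**
     * For every word on which the implementations do not all produce the same
     * result, records the output of each implementation keyed by its name.
     * An empty result means all implementations agree on all words.
     */
    static Map<String, Map<String, String>> disagreements(
            Map<String, UnaryOperator<String>> implementations, List<String> words) {
        Map<String, Map<String, String>> result = new LinkedHashMap<>();
        for (String word : words) {
            Map<String, String> outputs = new LinkedHashMap<>();
            implementations.forEach((name, impl) -> outputs.put(name, impl.apply(word)));
            long distinct = outputs.values().stream().distinct().count();
            if (distinct > 1) {
                result.put(word, outputs);
            }
        }
        return result;
    }
}
```

Running this over the shared test words would flag any word where the converted pattern files diverge, independent of whether the expected hyphenation itself is known.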

bertfrees assigned dkager May 18, 2015
