Liblouis table for Dutch #6

Open
6 of 7 tasks
Tracked by #39
bertfrees opened this issue Mar 16, 2015 · 29 comments

@bertfrees
Member

  • Include some official braille code specification document
  • Write tests based on the specification
  • Validate the existing braille table for Dutch. There is currently a table for the Netherlands and a table for Belgium which are slightly different (and should ideally be merged, see Common table for Dutch liblouis/liblouis#34)
  • Improve the existing table or write a new one from scratch

Related issues:

@bertfrees
Member Author

@dkager See branch "dkager_dutch"

@dkager

dkager commented May 19, 2015

Thanks. Should we update nl-BE or do a new set of tests for nl-NL? Would be nice to keep them in sync, or to have just one set of tests because these tables are almost identical.
Also, I think task 1 can be marked as completed: we have the 2005 specification of the Dutch braille code (is it online somewhere?).

@bertfrees
Member Author

Try extending nl_BE. If all the tests in there are valid for the Netherlands (Dedicon) rename it to nl. Otherwise make a new test file for nl_NL and we can get them in sync again later.

@dkager

dkager commented May 19, 2015

I've reorganized the tests somewhat and added examples from the official documentation. There still isn't a lot of data, but I'm not sure what else could be added; these tests cover most of the corner cases.

I still have to actually run these, but one omission I can spot right away is the special braille notation for minutes and seconds, for example:
The world record time is 3' 5''.
Here you don't use an apostrophe. Instead you write:
⠼⠉⠈⠔ ⠼⠑⠈⠔⠔
I don't think the nl-BE (or nl-NL) table defines this either. It may not be possible to figure this out from the context, because normal apostrophes are used.

@bertfrees
Member Author

These are things that liblouis has difficulty with, because it knows little about the context beyond the actual text you give it (and typeform info). You could make a rule that finds occurrences of the pattern "digit-digit-apos-digit-digit-apos-apos" and translates it in a special way. For such simple heuristics this approach works, but if you need to do advanced text analysis, or if you want to mark up minutes and seconds in XML, you need something besides liblouis.
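The heuristic can also be sketched outside liblouis as a Python preprocessing pass. This is an illustration under assumptions, not a liblouis table rule: the regular expression only catches a number with one apostrophe followed by a number with two (as in the world-record example), the minute and second signs ⠈⠔ and ⠈⠔⠔ are copied from that example, and digits are written as the letters a-j after the number sign ⠼, which matches the ⠉ (3) and ⠑ (5) shown there. The function name `mark_minutes_seconds` is hypothetical.

```python
import re

# Digits 1-9 and 0 written as the braille letters a-j (consistent with 3 -> ⠉, 5 -> ⠑).
DIGITS = dict(zip("1234567890", "⠁⠃⠉⠙⠑⠋⠛⠓⠊⠚"))
NUMBER_SIGN = "⠼"
MINUTE_SIGN = "⠈⠔"    # from the example: 3' -> ⠼⠉⠈⠔
SECOND_SIGN = "⠈⠔⠔"   # from the example: 5'' -> ⠼⠑⠈⠔⠔

# Heuristic: a number with one apostrophe, an optional space, then a number with two.
MIN_SEC = re.compile(r"(\d+)'(\s?)(\d+)''")


def number(digits: str) -> str:
    """Write a run of ASCII digits as number sign + letters a-j."""
    return NUMBER_SIGN + "".join(DIGITS[d] for d in digits)


def mark_minutes_seconds(text: str) -> str:
    """Replace only the minutes/seconds notation; leave the rest of the text alone."""
    def repl(m: re.Match) -> str:
        minutes, space, seconds = m.groups()
        return number(minutes) + MINUTE_SIGN + space + number(seconds) + SECOND_SIGN
    return MIN_SEC.sub(repl, text)


print(mark_minutes_seconds("The world record time is 3' 5''."))
# The world record time is ⠼⠉⠈⠔ ⠼⠑⠈⠔⠔.
```

Because it only fires on the combined minutes-and-seconds pattern, a lone `3'` or an apostrophe in running text is left for the normal table to handle.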

@dkager

dkager commented May 19, 2015

I don't think we get a lot of books where this shows up, so not very high priority just yet.
One more thing that is missing is tests for emphasized text, etc. Will add that and test the whole thing on Thursday.

@bertfrees
Member Author

The new data looks very good. It's not a problem that there isn't a lot of it: if it covers everything, it covers everything.

I noticed you deleted some tests. In almost all cases you replaced them with an equivalent test, but in one or two cases you didn't (e.g. "SLAAT DE VLAM IN DE PAN..."; I know it's similar to "MEER DAN DRIE WOORDEN..." but not identical, because the last uppercase word doesn't end with a period). I want to avoid touching other people's tests as much as possible, so if there's no good reason to delete a test, I think we should leave it.

I'm gonna add back the ascii-output because it's useful for readability. I can do that with a script.

@dkager

dkager commented May 19, 2015

Can you also re-add the all-capitals sentence?

I'm not a big fan of the ascii-output because it's redundant, and to me (as a braille reader) the symbols don't make much sense. Could you hold off on the ascii-output until I've finished the test data? That makes editing somewhat easier for me, as I don't know the ASCII code liblouis uses. Also, I noticed that other harness tests (fi_harness.txt) don't have ASCII?
The Unicode output specification is proving to be very nice to work with, because I can verify it straight away on my braille display! :)

@bertfrees
Member Author

I know, that's why I added the Unicode alongside the ASCII. The Unicode is used by the test harness; the ASCII makes the tests readable for sighted people (for documentation purposes). I'll wait until you've finished, though; then I only have to run the conversion script once.
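The conversion script itself isn't shown in this thread. As a minimal sketch, assuming the ASCII column uses North American Braille ASCII (liblouis's own display tables may use a different mapping), deriving it from the Unicode column only needs the fact that each Unicode braille character is U+2800 plus its dot bits, with dot 1 as bit 0 up to dot 6 as bit 5. `unicode_to_ascii` is a hypothetical name.

```python
# North American Braille ASCII, indexed by the 6-dot pattern value
# (dot 1 = bit 0 ... dot 6 = bit 5), i.e. by ord(ch) - 0x2800.
BRAILLE_ASCII = (
    " A1B'K2L@CIF/MSP"
    "\"E3H9O6R^DJG>NTQ"
    ",*5<-U8V.%[$+X!&"
    ";:4\\0Z7(_?W]#Y)="
)
assert len(BRAILLE_ASCII) == 64


def unicode_to_ascii(braille: str) -> str:
    """Convert 6-dot Unicode braille to Braille ASCII; other characters pass through."""
    out = []
    for ch in braille:
        value = ord(ch) - 0x2800
        if 0 <= value < 64:
            out.append(BRAILLE_ASCII[value])
        else:
            # Ordinary spaces, 8-dot patterns, Latin text, etc. are left unchanged.
            out.append(ch)
    return "".join(out)


print(unicode_to_ascii("⠼⠉⠈⠔ ⠼⠑⠈⠔⠔"))  # #C@9 #E@99
```

Generating the ASCII from the Unicode in one pass keeps the Unicode as the single source of truth, which is why running it once after the test data is final is enough.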

@bertfrees
Member Author

@dkager In commit 1a92520 I've added xfail flags to all failing tests.
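How the runner reports an xfail flag isn't spelled out here. One common convention, sketched below as an assumption and similar in spirit to Python's `unittest.expectedFailure`, is that a flagged test that fails counts as an expected failure, while a flagged test that suddenly passes is reported so the stale flag can be removed once liblouis is fixed. `Outcome`, `classify`, and `suite_ok` are hypothetical names.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    XFAIL = "expected failure"
    XPASS = "unexpected pass"


def classify(expected: str, received: str, xfail=False) -> Outcome:
    """xfail is False or a non-empty reason string explaining the known failure."""
    matched = expected == received
    if xfail:
        return Outcome.XPASS if matched else Outcome.XFAIL
    return Outcome.PASS if matched else Outcome.FAIL


def suite_ok(outcomes) -> bool:
    # Assumed policy: expected failures keep the run green, but an unexpected
    # pass fails it so the outdated xfail flag gets noticed and removed.
    return all(o in (Outcome.PASS, Outcome.XFAIL) for o in outcomes)


reason = "Fails because special handling of more than three capitalized words is not supported by liblouis"
print(classify("⠁", "⠃", xfail=reason))  # Outcome.XFAIL
```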

@dkager

dkager commented May 21, 2015

"xfail": "Fails because special handling of more than three capitalized words is not supported by liblouis",

So we need support for this that is similar to how emphasized text is handled?

I'll look at the Greek capital letters. They are used in maths so it is important that they work.

@bertfrees
Member Author

> So we need support for this that is similar to how emphasized text is handled?

Yes, that support is already implemented by Michael Gray, but his patch isn't merged into master yet. See liblouis#50.

> I'll look at the Greek capital letters. They are used in maths so it is important that they work.

I think the nl-NL table does Greek capitals differently than the nl-BE table.

@dkager

dkager commented May 21, 2015

Forgot to bring this up re: commit 1a92520. Ignoring unknown keys in the JSON file sounds like a good idea, but I think it has to be an option. Right now if you want to "test the tests" for syntax errors you can do so. If you ignore unknown keys, you can't. So maybe add an option to the harness runner so that we can have both output and ascii-output keys.

@bertfrees
Member Author

Okay sounds reasonable. Maybe call it --strict (i.e. make "ignore" the default, because we want to be able to automatically run all the tests as they are).
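A minimal sketch of the proposed option using Python's `argparse`, assuming a runner script that takes harness files as positional arguments (the argument layout is hypothetical). Ignoring unknown keys stays the default so all tests can run unattended, and `--strict` opts into rejecting them.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run liblouis harness tests (sketch).")
    parser.add_argument("files", nargs="*", help="harness files to run")
    parser.add_argument(
        "--strict",
        action="store_true",  # default False: unknown keys are ignored
        help="reject unknown keys such as ascii-output instead of ignoring them",
    )
    return parser.parse_args(argv)


args = parse_args(["--strict", "nl-NL-g1_harness.txt"])
print(args.strict, args.files)  # True ['nl-NL-g1_harness.txt']
```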

@dkager

dkager commented May 21, 2015

Not entirely sure why it was failing in the first place, since no explicit format validation is performed except for checking that the input is valid JSON. Something to check next week. :)

@bertfrees
Member Author

The validation happens implicitly when constructing a BrailleTest instance. The solution was to add **kwargs to the constructor (see 0ea22d6#diff-f1bb1dc1db8387a1553e444ef5b0da82R123)
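The mechanism behind that fix: when a JSON test entry is unpacked into a constructor with `**`, any key without a matching parameter raises `TypeError`, which is the implicit validation. Adding `**kwargs` absorbs extra keys such as `ascii-output`. The simplified class below is a hypothetical stand-in for the harness's `BrailleTest`, not its actual code, and the `strict` parameter is an assumption showing how a strict mode could hook in.

```python
class BrailleTest:
    """Hypothetical, simplified stand-in for the harness's BrailleTest class."""

    def __init__(self, input, output, xfail=False, strict=False, **kwargs):
        # Parameter names mirror the JSON keys so an entry can be passed as **entry.
        # Any other key (e.g. "ascii-output") is collected in kwargs instead of
        # raising TypeError; in strict mode it is rejected explicitly.
        if strict and kwargs:
            raise ValueError(f"unknown test keys: {sorted(kwargs)}")
        self.input = input
        self.output = output
        self.xfail = xfail


entry = {"input": "abc", "output": "⠁⠃⠉", "ascii-output": "ABC"}
parsed = BrailleTest(**entry)  # accepted: "ascii-output" is ignored
try:
    BrailleTest(**entry, strict=True)
except ValueError as err:
    print(err)  # unknown test keys: ['ascii-output']
```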

@dkager

dkager commented May 26, 2015

The nl-NL table has a few issues that should be easy to fix:

  • Greek letters (can probably take those from nl-BE).
  • For the percent sign some bogus whitespace is added.
  • The ² and ³ symbols need a / added to them.

Furthermore, the emphasis tests are failing. For one of them it looks like expected == received, yet the test fails with a braille difference error. I hope I'm not overlooking something!
Exp: ⠨⠓⠑⠃ ⠚⠑ ⠸⠸⠨⠧⠇⠥⠉⠓⠞ ⠇⠁⠝⠛⠎ ⠙⠑⠀⠸⠨⠁⠝⠁⠏⠕⠑⠗⠀⠁⠇ ⠛⠑⠇⠑⠵⠑⠝⠢
Rec: ⠨⠓⠑⠃ ⠚⠑ ⠸⠸⠨⠧⠇⠥⠉⠓⠞ ⠇⠁⠝⠛⠎ ⠙⠑ ⠸⠨⠁⠝⠁⠏⠕⠑⠗ ⠁⠇ ⠛⠑⠇⠑⠵⠑⠝⠢

There are also some accented letters that aren't in the standard. Need to figure out what to do with those. See eb4ef4d.

What's the best way to proceed after I fixed nl-NL? Rename the test harness to nl-NL-g1_harness.txt and associate it with the nl-NL table?

@bertfrees
Member Author

Don't attach too much importance to the emphasis tests. Emphasis in liblouis is currently broken. For that one test that seems to have the correct output, the expected output had some blank braille patterns instead of regular spaces. I've corrected it.
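Mismatches like this are hard to spot because U+2800 BRAILLE PATTERN BLANK and U+0020 SPACE can look identical on screen and on a braille display. A small diagnostic that reports the first differing position by Unicode character name makes the cause obvious; `first_difference` is a hypothetical helper, not part of the harness.

```python
import unicodedata


def char_name(ch: str) -> str:
    # Fall back to the code point for characters without a Unicode name.
    return unicodedata.name(ch, f"U+{ord(ch):04X}")


def first_difference(expected: str, received: str):
    """Return (index, expected_name, received_name) for the first mismatch, or None."""
    for i, (e, r) in enumerate(zip(expected, received)):
        if e != r:
            return i, char_name(e), char_name(r)
    if len(expected) != len(received):
        # One string is a prefix of the other; report where the shorter one ends.
        i = min(len(expected), len(received))
        end = "END OF STRING"
        return (i,
                char_name(expected[i]) if i < len(expected) else end,
                char_name(received[i]) if i < len(received) else end)
    return None


exp = "⠙⠑" + "\u2800" + "⠸⠨⠁⠝⠁⠏⠕⠑⠗"  # blank braille pattern between the words
rec = "⠙⠑" + " " + "⠸⠨⠁⠝⠁⠏⠕⠑⠗"       # ordinary space between the words
print(first_difference(exp, rec))  # (2, 'BRAILLE PATTERN BLANK', 'SPACE')
```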

For things that are not in the standard such as those accented letters maybe you could make a separate table? Are Greek letters in the standard by the way?

Copy the table you want to edit to nl and rename the test to nl. We should probably name the table g0 instead of g1 (see liblouis#16 (comment)).

@dkager

dkager commented May 26, 2015

> Don't attach too much importance to the emphasis tests. Emphasis in liblouis is currently broken.

Should we do xfails for those?

> Are Greek letters in the standard by the way?

The standard describes the first three letters in an example. This matches what nl-NL does, except for the letter beta, which was missing dot 1 (will push a fix in a moment). With that fix, the definitions in nl-NL are the same as in the example, but I'll check whether Dedicon uses something else.

> We should probably name the table g0 instead of g1

It's also not really contracted (.ctb versus .utb).

Also, is the patch for the MORE THAN THREE WORDS in capitals going to be included in the next release? We now have an xfail for this, but it's important to resolve the problem to get proper Dutch braille output.

@bertfrees
Member Author

Should we do xfails for emphasis tests: yes.

The extension should probably be changed to .utb, yes.

The patch which fixes emphasis and adds support for indicating phrases of capitalized words will not be in the June release. We've targeted September.

@dkager

dkager commented May 26, 2015

September is okay, as long as it’s in there before delivering the system. :)

@bertfrees
Member Author

It's in the branch https://github.com/MikeGray-APH/liblouis/commits/mrg_ueb_update, in case you already want to try it out.

@dkager

dkager commented May 28, 2015

This is pretty much done now. What remains:

@bertfrees
Member Author

About the FIXMEs in nl-NL-chardefs.cti: they refer to the standard "World Braille Usage (3rd edition)": https://cdn.rawgit.com/liblouis/braille-specs/master/world-braille-usage-third-edition.pdf

@bertfrees
Member Author

By the way I put those FIXMEs in there. They're mostly about differences between NL and BE that need to be sorted out.

@dkager

dkager commented Jun 4, 2015

There are now quite a few tests that are failing (see dkager_dutch_ueb). For next week: find out why, and fix it while not breaking the nocaps table.

@dkager

dkager commented Jul 7, 2015

The 2005 Dutch braille standard does not specify that the plus sign doesn't cancel a capitalized word, but the Dutch liblouis table does treat it that way. There is a test for it: P+R park and ride

Is there any additional documentation that specifies this, or is it an organization-specific choice? Maybe we should discuss this as part of unifying the Belgian and Netherlands tables.

@bertfrees
Member Author

Yes I probably based my implementation on the test that I got, and didn't check the documentation.

@dkager

dkager commented Jul 30, 2015
