Various improvements to emphasis handling #1006

bertfrees · 2020-10-16T20:09:25Z

This PR contains quite a lot; the commit logs show exactly what changed. In short, I added a new opcode called noemphchars, improved the code readability, and harmonized the behavior of caps vs. emph on the one hand and begemph/endemph mode vs. begendword etc. on the other hand.

Fixes the following issues:

Only thing missing is a little update to the documentation.

Travis status doesn't show up anymore apparently but tests pass on Travis.

bertfrees · 2020-10-16T20:17:58Z

@josteinaj I changed some tests for Norwegian (not the table). Would you mind having a look at https://github.com/liblouis/liblouis/pull/1006/files#diff-531a3730363a4ab7232b5933f0e0a94668637586bdf712cd740307062cdc704f?

bertfrees · 2020-10-16T20:22:24Z

@garconvacher I also did a small change to the French table: see commit b4604c1. Could you verify please?

bertfrees · 2020-10-17T09:54:09Z

@dkager Some Dutch tests were fixed but one was broken. To me the new result makes sense. What do you think?

liblouis/tests/braille-specs/nl-g0_harness.yaml

Lines 207 to 209 in ab4f0fe

    
    - - Zie refertes D-NE004W, D-NB007B en D-NB007BDK.
      - ⠨⠵⠊⠑ ⠗⠑⠋⠑⠗⠞⠑⠎ ⠨⠙⠤⠘⠝⠑⠼⠚⠚⠙⠨⠺⠂ ⠨⠙⠤⠘⠝⠃⠼⠚⠚⠛⠨⠃ ⠑⠝ ⠨⠙⠤⠘⠝⠃⠼⠚⠚⠛⠘⠃⠙⠅⠲
      - xfail: "'D-NE004W, D-NB007B' is now seen as 6 words (so that phrase indication is used): 'D', 'NE', 'W', 'D', 'NB' and 'B'"

bertfrees · 2020-10-18T18:04:27Z

@torchtrust I found a strange test that was added by you in 2015:

liblouis/tests/braille-specs/en-ueb-g1_harness.yaml

Line 36 in 1c7f6b6

- [Ææ, ⠠⠁⠠⠘⠖⠑⠁⠘⠖⠑]

The dots 6-1-6 at the beginning, is that correct?

josteinaj · 2020-10-23T11:20:34Z

@bertfrees By just skimming through the changes in no_typeform_harness.yaml I think it looks good, but I'll check with Kari to be sure.

garconvacher · 2020-10-26T07:51:32Z

> @garconvacher I also did a small change to the French table: see commit b4604c1. Could you verify please?

@bertfrees, I can't run tests at the moment, but it matches the French braille code.

bertfrees · 2020-10-30T09:32:53Z

@dkager What do you think about the "Zie refertes D-NE004W, D-NB007B en D-NB007BDK." test? "D-NE004W, D-NB007B" is now seen as 6 words (so that phrase indication is used): "D", "NE", "W", "D", "NB" and "B".
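The word count mentioned above can be reproduced with a minimal sketch. It assumes that every maximal run of capital letters counts as one word and that hyphens, digits, commas and spaces all act as separators. This is only an illustration of the counting described in the xfail message, not liblouis's actual implementation, and `capital_word_parts` is a hypothetical helper name.

```python
import re


def capital_word_parts(text: str) -> list[str]:
    """Return each maximal run of uppercase ASCII letters as one word part.

    Assumption for illustration only: anything that is not an uppercase
    letter (hyphen, digit, comma, space) separates two parts.
    """
    return re.findall(r"[A-Z]+", text)


parts = capital_word_parts("D-NE004W, D-NB007B")
print(parts, len(parts))
```

Under these assumptions the passage splits into the six parts listed in the comment, which is why it would be long enough for phrase indication to apply.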

bertfrees · 2020-10-30T09:35:43Z

@egli Please review. It's a really big improvement, if I may say so myself.

egli

Looks like a good change. AFAICT you are replacing the binary way of handling emphasis with a more flexible system.
I didn't really check whether all the index bounds are kept in all corner cases.
But all in all it looks very good.
I think we should have some doc though.

tables/nl-g0.uti

tests/braille-specs/no_typeform_harness.yaml

liblouis/lou_translateString.c

bertfrees · 2020-11-02T11:13:26Z

> AFAICT you are replacing the binary way of handling emphasis with a more flexible system.

Not sure what you mean by binary, but there is still a strict distinction between what we call "permanent" indication on the one hand, and word/phrase level indication on the other hand. Very little has changed for permanent indication.

> I think we should have some doc though

Yes, I have to add some. But note that apart from the new noemphchars there is not all that much that changed.

dkager · 2020-11-05T08:56:16Z

Sorry for the delays! Regarding this test: https://github.com/liblouis/liblouis/blob/ab4f0fee3329a1ce5c97738f83a90ab5ec436509/tests/braille-specs/nl-g0_harness.yaml#L207-L209 I understand the description in the xfail, but think it would be good to ask Dorine in ‘t Veld (Dedicon) or another 6-dot braille expert from the Braille Autoriteit working group.

josteinaj · 2020-11-16T08:42:14Z

@bertfrees: Kari approves: snaekobbi#10 (comment)

bertfrees · 2020-11-20T19:11:27Z

I sent Dorine a message on Monday. Whatever the outcome, I'm going to merge this PR because I think overall the Dutch table is definitely improved. Several issues were fixed; only this one border case is possibly (not certainly) a regression.

dkager · 2020-11-23T07:46:42Z

Sounds sane.

bertfrees · 2020-11-24T16:55:14Z

@egli I have updated the doc. Can you quickly check again?


Added a new resolveEmphasisBeginEnd function that is used by both capitalization and emphasis. This is mostly refactoring, but there is also a small behavioral change: an emphasized section indicated with begemph/endemph will not start or end with a space. The idea is to make capitalization and emphasis handling more and more alike in the future.
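The behavioral change described above can be sketched as follows. This is a hypothetical illustration, not the code of resolveEmphasisBeginEnd: it assumes emphasis is given as one boolean flag per character, and `trim_emphasis_spans` is an invented name. Each emphasized run is shrunk so that it neither starts nor ends on a space.

```python
def trim_emphasis_spans(text: str, emphasized: list[bool]) -> list[bool]:
    """Clear emphasis flags on spaces at the start or end of each emphasized run.

    Assumes len(text) == len(emphasized). Returns a new list of flags.
    """
    flags = list(emphasized)
    n = len(text)
    i = 0
    while i < n:
        if not flags[i]:
            i += 1
            continue
        # Find the end (exclusive) of this emphasized run.
        j = i
        while j < n and flags[j]:
            j += 1
        start, end = i, j
        # Drop leading spaces from the run.
        while start < end and text[start] == " ":
            flags[start] = False
            start += 1
        # Drop trailing spaces from the run.
        while end > start and text[end - 1] == " ":
            flags[end - 1] = False
            end -= 1
        i = j
    return flags


text = "a bold b"
mask = [False, True, True, True, True, True, True, False]  # covers " bold "
trimmed = trim_emphasis_spans(text, mask)
```

With this input, only the four characters of "bold" remain flagged, so an opening or closing indicator placed at the run boundaries would sit next to the word rather than next to a space.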


…sis handling

This changes the behavior very slightly for tables that do not declare capsmodechars or emphmodechars, and a bit more substantially for tables that do. Tables and tests have been updated accordingly. The code has also been restructured in a way that makes it easier to see how "caps" differs from "emph" and how the various character attributes and the capsmodechars, emphmodechars and noemphchars rules affect the behavior. Extracting this logic into a few simple functions made both that logic and the rest of the code easier to follow. See #905


There is a macro to generate the text "foo opcode" with a reference. Quite often though you want to have a reference to an opcode without the preceding "foo opcode" text. For that there are two new macros `opref` and `pxopref`. They are just like `ref` and `pxref` but they make sure the reference is correct (by making it "foo opcode") and they render the opcode inside a code

bertfrees added the enhancement, needs news, and needs doc labels Oct 16, 2020

bertfrees added this to the 3.16 milestone Oct 16, 2020

bertfrees requested a review from egli October 16, 2020 20:14

bertfrees force-pushed the various-emphasis branch from b4604c1 to e15019a October 19, 2020 12:40

egli approved these changes Nov 2, 2020

bertfrees self-assigned this Nov 16, 2020

egli mentioned this pull request Nov 16, 2020

Fix documentation of "caps" opcodes #953

Merged

bertfrees force-pushed the various-emphasis branch 2 times, most recently from 100f5b8 to 7d6860d November 20, 2020 19:07

bertfrees removed the needs doc label Nov 24, 2020

bertfrees requested a review from egli November 24, 2020 16:55

bertfrees force-pushed the various-emphasis branch from aaa4c7a to 6039326 November 24, 2020 16:57

This was referenced Nov 24, 2020

Allow setting emphmodechars per emphasis class #944

Closed

In Dutch, every word part in a capitalised compound word counts in the length of a passage #779

Closed

bertfrees and others added 23 commits November 28, 2020 20:25

Add two emphasis tests

dd230ca

- behavior of begemph/endemph w.r.t. spaces (related to #1002)
- order of opening and closing indicators (related to #922)

Fix no_typeform_harness.yaml: Opening/closing indicator should not come before/after space

eec17f2

Add a test for emphasis in German

e1850fd

Related to issue #998: Issues with emphasis in German when quotation mark or full-stop precedes/follows. see #998

Add a test for issue #779

7884219

In Dutch, every word part in a capitalised compound word counts in the length of a passage. see #779

Mention all authors of nl-g0_harness.yaml

a829a18

en-ueb-g1.ctb: begcaps is not used if begcapsphrase is declared

c39c825

so remove begcaps. Also, endcaps is automatically used as endcapsphrase (after), but rename it anyway.

Bugfix in markEmphases

c5f5556

fixes commit a9f7df1

Code simplifications in markEmphases

dcae1cb

begcapsphrase should be allowed to start in a word preceded by punctuation

31aa180

For begemphphrase it is only enabled when emphmodechars is declared.

Dutch: fix the "'t" issue

cddf839

Refactoring: put more info in EmphasisClass type (now a struct)

b8beab0

Optimization: Don't process emphasis typeforms that are not declared in the table

d43963a

Allow setting emphmodechars per emphasis class

52afefa

fixes #944

Code simplification: in resolveEmphasisWords mark whole words in second pass

04606da

Add a little hack to make the UEB tests pass again

94db523

Remove some of the special handling of begemph/endemph

4d92719

This further harmonizes the behavior and further simplifies the code. The new noemphchars can be used to achieve the old behavior. Related to #1002.

Fix issue in French table (Deux lettres isolées en gras, séparées par une virgule)

7e1e2d7

see #713 (comment)

UEB: improve translation of ligatures

4f9c7af

Update documentation

23048e3

Streamline the use of references in the doc

a372dd7

- use opcoderef only when it makes sense. Where they are used with a redundant "opcode" remove that.
- try to move the references out of sentences, so it is easier to read
- try to group references

egli approved these changes Nov 30, 2020

bertfrees force-pushed the various-emphasis branch from 0a89b34 to 8a08c08 November 30, 2020 15:02

Update NEWS

02a6194

egli removed the needs news label Nov 30, 2020

Change the wording to be subtly less subtle

2122eeb

egli merged commit 3bf2b1c into master Nov 30, 2020

egli deleted the various-emphasis branch November 30, 2020 15:49
