Proper speech pauses after a sentence finished #37

Open
claell opened this issue Oct 20, 2019 · 29 comments
Labels
bug Something isn't working


@claell

claell commented Oct 20, 2019

I noticed that sometimes there is no pause at the end of a sentence; instead, the speech flows directly into the next sentence. This sometimes also happens before the "Soll ich weiterlesen?" ("Shall I continue reading?") prompt.

Also, I noticed that the dot at the end of a sentence is sometimes read out as "Punkt" ("period").

@petergtz
Owner

@claell If you could give me some concrete pages where this happens, I'd be more than happy to look into it. It doesn't have to be right away; when you stumble upon one, just post it here.

FYI, I've already noticed this in certain cases too (though not all the variants you mentioned), specifically when a paragraph ends with a footnote. Unfortunately, in such cases the Wikipedia API mistakenly swallows the white space after the period (and the part of the API I'm using also swallows the footnote itself, which makes sense, because we don't need it in a spoken version). Anyway, coming up with a heuristic for when it is okay to fill in a pause after a dot that is not followed by white space is not trivial. That's why I haven't tackled the problem yet :-/.

@claell
Author

claell commented Oct 21, 2019

I did some searching yesterday and found that these mistakes occur on the German page for "Omelett".

Example sections:

An den Singular der femininen Form kann in Österreich noch ein 'n antreten.[1]

Das Wort stammt aus dem Französischen und wurde im 18. Jahrhundert

Here the dot is read out as "Punkt", and there is no pause between "antreten." and "Das Wort".
This might be caused by the Wikipedia API. It looks like a bug in their API to me. Is there already a bug report for it?

Siehe auch

Rührei

There is no pause between "Rührei" and "Soll ich noch weiterlesen?" ("Shall I keep reading?").

@petergtz
Owner

Okay, nice example. Have a look at:

https://de.wikipedia.org/w/api.php?format=jsonfm&action=query&prop=extracts&titles=Omelett&redirects=true&formatversion=2&explaintext=true&exlimit=1

That's the API I'm using, and as you can see in the JSON payload when you click the link, it says:

 ... ein 'n antreten.Das Wort stammt...

> Is there already a bug report for it?

No idea. Honestly, I just haven't had the time to look into it more closely yet. If you want to help report it, I'd very much appreciate your support.

> There is no pause between "Rührei" and "Soll ich noch weiterlesen?".

That's indeed a bug in my code. Right here:

https://github.com/petergtz/alexa-wikipedia/blob/master/skill/skill.go#L258-L261

I'll see if I can fix it some time soon!

@petergtz petergtz added the bug Something isn't working label Oct 21, 2019
@claell
Author

claell commented Oct 22, 2019

Thanks for checking the API and for the link. I think I found where to report bugs and created a report: https://phabricator.wikimedia.org/T236128

So let's hope it gets fixed soon :)

Regarding your code, I am not sure how to handle this. I guess the problem is that there is no dot after "Rührei", so some logic has to be introduced to detect that first and insert a dot when needed?

@petergtz
Owner

> Thanks for checking the API and for the link. I think I found where to report bugs and created a report: https://phabricator.wikimedia.org/T236128
>
> So let's hope it gets fixed soon :)

That's pretty cool! Thank you. I also saw that they already have a duplicate open. Unfortunately, they also say they don't plan to fix it. On the positive side, they would welcome a patch.

> Regarding your code, I am not sure how to handle this. I guess the problem is that there is no dot after "Rührei", so some logic has to be introduced to detect that first and insert a dot when needed?

Yes, exactly.
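A minimal sketch of that logic, assuming the last section and the follow-up prompt are plain strings that get joined just before being sent to Alexa. The function names and the set of characters treated as sentence-ending (`.`, `!`, `?`, `:`) are assumptions; the actual fix in skill.go may look different.

```go
package main

import (
	"strings"
	"unicode"
	"unicode/utf8"
)

// withSentenceEnd trims trailing white space and appends a period unless the
// text already ends in sentence punctuation, so the TTS engine pauses there.
// The set of sentence-ending characters is an assumption.
func withSentenceEnd(text string) string {
	trimmed := strings.TrimRightFunc(text, unicode.IsSpace)
	if trimmed == "" {
		return trimmed
	}
	last, _ := utf8.DecodeLastRuneInString(trimmed)
	if strings.ContainsRune(".!?:", last) {
		return trimmed
	}
	return trimmed + "."
}

// appendPrompt joins the last section and the follow-up prompt
// (e.g. "Soll ich noch weiterlesen?") with a guaranteed sentence boundary.
func appendPrompt(section, prompt string) string {
	return withSentenceEnd(section) + " " + prompt
}
```

With this, `appendPrompt("Siehe auch\n\nRührei", "Soll ich noch weiterlesen?")` yields `"Siehe auch\n\nRührei. Soll ich noch weiterlesen?"`.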

@claell
Author

claell commented Oct 22, 2019

Yes, I also noticed that they linked the duplicate, and I read that they don't plan to fix it. Let's see. I assume the fix itself won't be too complicated; the problem is finding the code that causes this. Also, the Wikipedia API seems to be pretty big feature-wise, and this appears to be just an extension that is apparently not that popular. So one thought was to use a better-supported part of their API, although at first glance I did not find any way to get the content of an article.

@petergtz
Owner

> So one thought was to use a better-supported part of their API, although at first glance I did not find any way to get the content of an article.

Yes, that would certainly help. One or two years ago, I spent quite some time finding an API that seemed to work and serve my purpose best. The problem is that all the other APIs I've found so far return wikitext or HTML, never plain text. And parsing wikitext or HTML to extract just the right text is completely out of scope. It's a bit of a dilemma :-).

@claell
Author

claell commented Oct 23, 2019

Hm. At least there is an API that returns wikitext or HTML. I didn't find that either at first glance.

If one were to parse something, I think wikitext is better than HTML. It would probably have the benefit that certain elements could be detected and passed on to the TTS engine; for example, the quotes suggested in #35 could probably be detected this way.

So in the long run, switching to wikitext might be useful anyway. However, I know that it would require a lot of additional work on this project (although there might be existing parsers to build on), which you do in your free time. So I understand that it might just be too much to ask.

@petergtz
Owner

> So I understand that it might just be too much to ask.

I'm more than happy to accept contributions though. So if you like, give it a shot.

@claell
Author

claell commented Oct 23, 2019

I am just not experienced with Go at all, and I'm also a bit short on time, probably just like you. I will keep it in mind, though. What I can definitely offer is help, if you decide to implement it.

@petergtz
Owner

Sounds good!

@petergtz
Owner

> There is no pause between "Rührei" and "Soll ich noch weiterlesen?".

There is now :-). Please check it out.

@claell
Author

claell commented Oct 27, 2019

Nice, thank you! Works for me.

> Anyway, coming up with a heuristic for when it is okay to fill in a pause after a dot that is not followed by white space is not trivial.

I thought about this again today, since it would be an easier fix. I think that detecting the pattern lowercase letter, dot, uppercase letter should work in most cases and should not produce many false positives. I considered abbreviations like "z.B." ("e.g."), although those should normally be written with a non-breaking space in between.
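The proposed heuristic translates into a single regular expression. A minimal sketch, assuming Go's standard `regexp` package and plain-text extracts; `insertSentenceSpaces` is a hypothetical name, and the character classes are deliberately ASCII-only to mirror the proposal as stated:

```go
package main

import "regexp"

// missingSpace implements the proposed pattern: lowercase letter, dot,
// uppercase letter. The classes are ASCII-only, so "ä" or "Ü" never match.
var missingSpace = regexp.MustCompile(`([a-z])\.([A-Z])`)

// insertSentenceSpaces puts back the space that the API swallowed after a
// sentence-ending dot, so the TTS engine sees a sentence boundary.
func insertSentenceSpaces(text string) string {
	return missingSpace.ReplaceAllString(text, "$1. $2")
}
```

`insertSentenceSpaces("ein 'n antreten.Das Wort")` returns `"ein 'n antreten. Das Wort"`, while "im 18. Jahrhundert" is left alone because a digit precedes the dot. An unspaced "z.B." still becomes "z. B.", which is the abbreviation case raised in the comment.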

@petergtz
Owner

> I thought about this again today, since it would be an easier fix. I think that detecting the pattern lowercase letter, dot, uppercase letter should work in most cases and should not produce many false positives. I considered abbreviations like "z.B.", although those should normally be written with a non-breaking space in between.

Agreed, good idea! One step I want to put in between, though, is gathering data about this. We could first report all the cases it would alter and let it run for a week or so. Afterwards, we could check whether there are any false positives, and if there aren't, add the mechanism to insert the space.
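A report-only phase could look like this sketch: it collects, without changing the text, each distinct token where the heuristic would insert a space. Capturing the surrounding non-whitespace characters (`\S*`) is an assumption chosen for illustration, since it gives some context without cutting the string at byte offsets. `reportCandidates` is a hypothetical name, and the `seen` map stands in for whatever duplicate tracking the real reporter uses.

```go
package main

import "regexp"

// candidate matches the whole "word.Word" token around a missing space.
var candidate = regexp.MustCompile(`\S*[a-z]\.[A-Z]\S*`)

// reportCandidates lists, without modifying text, every spot where the
// heuristic would insert a space. Tokens already in seen are skipped,
// so repeated requests do not produce duplicate reports.
func reportCandidates(text string, seen map[string]bool) []string {
	var fresh []string
	for _, m := range candidate.FindAllString(text, -1) {
		if !seen[m] {
			seen[m] = true
			fresh = append(fresh, m)
		}
	}
	return fresh
}
```

For "ein 'n antreten.Das Wort, einen Ph.D. in", the first call returns `antreten.Das` and `Ph.D.`; calling it again with the same `seen` map returns nothing.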

Maybe using a GitHub issue to list all the cases would provide the necessary transparency.

@claell
Author

claell commented Oct 28, 2019

Sounds good. That will avoid potentially annoying problems and also provide some statistics beforehand.

@petergtz
Owner

petergtz commented Nov 5, 2019

@claell In case you're curious, #40 now contains all cases seen so far where we'd be inserting a space. In a week or so we can revisit. But it already looks quite good. No false positives so far.

@claell
Author

claell commented Nov 6, 2019

Thanks for the hint! I didn't know this testing phase had already been implemented. The current results do indeed look pretty promising.

@petergtz
Owner

petergtz commented Nov 6, 2019

Well, it just went live last night. :-)

@claell
Author

claell commented Nov 6, 2019

Ah, I only looked at the three-day-old comment there, not at its edits. So you implemented automatic updates to that GitHub comment whenever the pattern is detected in the skill, probably after a session has ended?

@petergtz
Owner

petergtz commented Nov 6, 2019

Yes. And not just after a session, but on every request.

It's kind of awkward, because updates don't always appear right away: the GitHub update happens asynchronously, and AWS Lambda freezes the container after the response has been sent to Alexa. But I wanted to avoid adding latency to the skill's response.

So sometimes things only get written to the GitHub comment during the next request. But since it is not time-critical, this seemed good enough. And indeed it works.
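The fire-and-forget pattern described here can be sketched with a buffered channel and one worker goroutine. This is an illustration under assumptions, not the skill's code: `AsyncReporter` and the injected `post` function (standing in for the GitHub API call) are hypothetical. On AWS Lambda, a goroutine still running when the handler returns is frozen together with the execution environment and only continues on the next invocation, which matches the delay described above.

```go
package main

import "sync"

// AsyncReporter hands reports to a background goroutine so the request
// handler never waits for the slow upload.
type AsyncReporter struct {
	queue chan string
	wg    sync.WaitGroup
}

// NewAsyncReporter starts one worker that calls post for every queued report.
// post is a hypothetical stand-in for updating the GitHub comment.
func NewAsyncReporter(post func(string)) *AsyncReporter {
	r := &AsyncReporter{queue: make(chan string, 100)}
	r.wg.Add(1)
	go func() {
		defer r.wg.Done()
		for report := range r.queue {
			post(report)
		}
	}()
	return r
}

// Report enqueues without blocking; if the buffer is full, the report is dropped.
func (r *AsyncReporter) Report(s string) {
	select {
	case r.queue <- s:
	default:
	}
}

// Close stops accepting reports and waits until the queue has been drained.
func (r *AsyncReporter) Close() {
	close(r.queue)
	r.wg.Wait()
}
```

`Report` trades completeness for latency: under load it drops reports instead of slowing down the response. `Close` is mainly useful in tests or on shutdown.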

@petergtz
Owner

petergtz commented Nov 6, 2019

It looks like umlauts at the end/beginning of the snippets are messing up the duplication avoidance. I'll have to fix that to avoid further duplicates.

Also, the pattern currently only takes A-Z into account. I should probably change that to any letter.

petergtz added a commit that referenced this issue Nov 6, 2019
@claell
Author

claell commented Nov 6, 2019

> And indeed it works.

Pretty cool! And there also seems to be an automatic error reporter that creates GitHub issues? This looks great! Is this from a different project, or is it original work? It might be helpful for other skills as well, although I am not sure how many of them use Go and are interested in GitHub issue tracking.

> It looks like umlauts at the end/beginning of the snippets are messing up the duplication avoidance.

Nice, I didn't know there was duplicate avoidance. Is this an encoding issue with umlauts?

> I should probably change that to any letter.

Like umlauts? Or letters from other languages?

@petergtz
Owner

petergtz commented Nov 6, 2019

> And there also seems to be an automatic error reporter that creates GitHub issues?

Yes, it creates GitHub issues, but it also publishes messages to AWS SNS, which then get sent to me as emails. The emails contain the error message, a stack trace, and the request itself. That's even more convenient than taking the query from the GitHub issue and pasting it into AWS CloudWatch. I don't put all of this information into a GitHub issue, because I don't want to risk publishing data that's not supposed to be public. Sometimes, when I'm not lazy, I paste the stack trace back into the GitHub issue for reference, but not always :-).
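The split between public and private channels can be expressed as a small router. A sketch under assumptions: `ErrorReport`, `Sink`, `SinkFunc` and `Router` are hypothetical names, and the real reporter may be structured differently. The point illustrated is that public sinks (such as a GitHub issue) only ever receive the error message, while private ones (such as an SNS topic) also get the stack trace and the request.

```go
package main

// ErrorReport bundles what the emails are described as containing.
type ErrorReport struct {
	Message    string
	StackTrace string
	Request    string // may contain user data, so it must stay private
}

// Sink delivers a report somewhere (GitHub, SNS, ...). Hypothetical interface.
type Sink interface {
	Send(r ErrorReport) error
}

// SinkFunc adapts a plain function to the Sink interface.
type SinkFunc func(ErrorReport) error

func (f SinkFunc) Send(r ErrorReport) error { return f(r) }

// Router sends full reports to private sinks and redacted ones to public sinks.
type Router struct {
	Private []Sink
	Public  []Sink
}

// Report delivers r to every sink and collects delivery errors.
func (rt Router) Report(r ErrorReport) []error {
	var errs []error
	for _, s := range rt.Private {
		if err := s.Send(r); err != nil {
			errs = append(errs, err)
		}
	}
	public := ErrorReport{Message: r.Message} // drop stack trace and request
	for _, s := range rt.Public {
		if err := s.Send(public); err != nil {
			errs = append(errs, err)
		}
	}
	return errs
}
```

Redacting in one place keeps the decision about what may be published out of the individual sinks.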

> Is this from a different project, or is it original work?

It's original work. Actually, it originated in my alexa-journal skill, and so far I've simply copied it over. But my plan is to extract it into a separate repo so it can be reused just as you described. That said, I'm not sure anyone else will use it.

> Is this an encoding issue with umlauts?

It's not an encoding issue. It's because I'm chopping things off after exactly 10 bytes instead of 10 runes.

> Like umlauts? Or letters from other languages?

Yes, and accents and all that kind of stuff. I just realized that it's not that easy, though, because even \w doesn't cover them. Maybe it's good enough the way it is.

@claell
Author

claell commented Nov 7, 2019

> Maybe it's good enough the way it is.

I think it is. It works in most cases at least, so if covering the edge cases requires much more work, it is probably not worth it right now, at least not unless somebody complains about it.

@petergtz
Owner

petergtz commented Nov 7, 2019

It's getting interesting: I found two false positives, German "e.V." and English "Ph.D.". Alexa reads both incorrectly when a space is inserted (she pauses in between). Let's wait a few more days; maybe we'll find more. (Let's still implement the algorithm as you suggested. I think it's a great heuristic; we just need to special-case our findings.)
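Special-casing the findings could look like the following sketch: known abbreviations are located first, and the heuristic skips any dot that belongs to one of them. The exception list simply collects the false positives reported in this thread. The helper name and the approach (protecting dot positions rather than using a lookbehind, which Go's RE2-based `regexp` does not support) are assumptions for illustration.

```go
package main

import (
	"regexp"
	"strings"
)

// boundary is the lowercase-dot-uppercase heuristic, Unicode-aware.
var boundary = regexp.MustCompile(`\p{Ll}\.\p{Lu}`)

// exceptions lists abbreviations the heuristic would otherwise split.
var exceptions = []string{"e.V.", "Ph.D.", "G.m.b.H.", "S.p.A."}

// insertSpacesExcept inserts a space after each matched dot unless the dot
// is part of a known abbreviation.
func insertSpacesExcept(text string) string {
	// Collect byte offsets of every dot inside a known abbreviation.
	protected := map[int]bool{}
	for _, abbr := range exceptions {
		for start := 0; ; {
			i := strings.Index(text[start:], abbr)
			if i < 0 {
				break
			}
			i += start
			for j := 0; j < len(abbr); j++ {
				if abbr[j] == '.' {
					protected[i+j] = true
				}
			}
			start = i + len(abbr)
		}
	}

	var b strings.Builder
	last := 0
	for _, m := range boundary.FindAllStringIndex(text, -1) {
		// The dot is the only '.' in the match; the letters may be multi-byte.
		dot := m[0] + strings.IndexByte(text[m[0]:m[1]], '.')
		if protected[dot] {
			continue
		}
		b.WriteString(text[last : dot+1])
		b.WriteByte(' ')
		last = dot + 1
	}
	b.WriteString(text[last:])
	return b.String()
}
```

"ein 'n antreten.Das Wort" still gets its space, while "Ph.D.", "e.V.", "G.m.b.H." and "S.p.A." are left untouched.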

@claell
Author

claell commented Nov 8, 2019

I also saw "e.V.". I thought it would normally be written with a non-breaking space in between, so I assumed that would be no problem for Alexa. On Wikipedia's "Verein" page, it is written with a space in between: https://de.wikipedia.org/wiki/Verein#Eingetragener_Verein

So I assume that, for this example, the Alexa TTS readout is simply wrong. It might be interesting to investigate whether the Wikipedia API also swallows non-breaking spaces or whether this occurrence was simply written without a space in the Wikipedia article.
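One way to start that investigation is to check the raw extract for U+00A0 (NO-BREAK SPACE). A small sketch with hypothetical helper names; it also illustrates a Go pitfall relevant to any whitespace-based check: in Go's regexp syntax `\s` means `[\t\n\f\r ]` and does not match U+00A0, whereas `unicode.IsSpace` does treat U+00A0 as white space.

```go
package main

import (
	"regexp"
	"strings"
)

// nbsp is U+00A0 NO-BREAK SPACE (German: "geschütztes Leerzeichen").
const nbsp = "\u00a0"

// asciiSpace shows what \s means in Go's RE2 syntax: [\t\n\f\r ] only.
// It does not match nbsp, so a "dot not followed by \s" check would
// wrongly flag "e.<NBSP>V." as a missing space.
var asciiSpace = regexp.MustCompile(`\s`)

// hasNBSP reports whether an extract still contains a non-breaking space,
// i.e. whether the API passed it through rather than swallowing it.
func hasNBSP(extract string) bool {
	return strings.Contains(extract, nbsp)
}

// normalizeNBSP replaces non-breaking spaces with ordinary spaces so that
// ASCII-based checks such as asciiSpace see them as white space.
func normalizeNBSP(s string) string {
	return strings.ReplaceAll(s, nbsp, " ")
}
```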

However, such non-breaking spaces don't seem to be used for English abbreviations: https://en.wikipedia.org/wiki/Non-breaking_space#Width_variation

So "Ph.D." should really not contain such a space.

@claell
Author

claell commented Nov 8, 2019

Two more: "G.m.b.H." and "Holding S.p.A. übernom"

Interestingly, I also saw "für 1 Mrd.US-Dollar A". There should normally be a space there in the Wikipedia article; maybe it's a swallowed non-breaking space.

Also, the duplicate detection and the umlaut handling seem to be gone again? At least when I scroll down, there are duplicates again, including some in English.

@petergtz
Owner

> the duplicate detection and the umlaut handling seem to be gone again

No, I never had the time to fix it.

@claell
Author

claell commented Nov 12, 2019

Sorry, I probably misinterpreted something. I thought it had already been fixed, probably with the commit you linked. I know that one rarely has as much time as one would like for side projects like this, so no pressure; I don't expect anything. Your response rate to issues etc. is already way higher than that of most GitHub projects I have experienced so far.

3 participants
@petergtz @claell and others