Proper speech pauses after a sentence finished #37
@claell If you could give me some concrete pages where this happens, I'd be more than happy to look into it. It doesn't have to be immediately; when you stumble upon one, just post it here. FYI, I've already noticed this in certain cases too (but not in all the variants you mentioned), specifically when a paragraph ends with a footnote. Unfortunately, in such cases the Wikipedia API makes the mistake of swallowing the white space after the period (and the part of the API I'm using also swallows the footnote itself, which makes sense, because we don't need it in a spoken version). Anyway, coming up with a heuristic for when it is okay to fill in a pause when a dot is not followed by white space is not trivial. So that's why I haven't tackled the problem yet :-/. |
I did some searching yesterday and found that those mistakes should occur on the German page for "Omelett". Example sections:
Here the dot is read out as "Punkt" and there is no pause between "antreten." and "Das Wort".
There is no pause between "Rührei" and "Soll ich noch weiterlesen?". |
Okay, nice example. Have a look at: That's the API that I'm using, and as you can see in the JSON payload when you click on the link, it says:
No idea. Honestly, I just didn't have the time to look into it closer yet. If you want to help report it, I'd very much appreciate your support.
That's indeed a bug in my code. Right here: https://github.com/petergtz/alexa-wikipedia/blob/master/skill/skill.go#L258-L261 I'll see if I can fix it some time soon! |
Thanks for testing with the API and the link. I think I found where to report bugs and created one: https://phabricator.wikimedia.org/T236128 So let's hope it gets fixed soon :) Regarding your code, I am not sure how to handle this. I guess the problem is that there is no dot after |
That's pretty cool! Thank you. I also saw that they already have a duplicate open. Unfortunately, they also say they don't plan to fix it. On the positive side, they would welcome a patch.
Yes, exactly. |
Yes, I also noticed that they linked the duplicate, and I read that they don't plan to fix it. Let's see. I assume the fix won't be too complicated, but the problem is knowing where to look for the code lines causing this. Also, the Wikipedia API seems to be pretty big feature-wise, and this seems to be only an extension which is apparently not that popular. So one thought was to maybe use a better-supported part of their API, although at a first short look I did not find any that returns the content of an article. |
Yes, that would certainly help. I spent quite some time one or two years ago finding an API that seemed to work and serve my purpose best. The problem is that all other APIs I have found so far always return wikitext or HTML, but no plain text. And parsing wikitext or HTML and extracting just the right text is completely out of scope. It's a bit of a dilemma :-). |
Hm. At least there is an API that returns wikitext or HTML; I did not find that either at my first short glimpse. If one were to parse something, I think wikitext is better than HTML. It would probably have the benefit that certain things can be detected and passed to the TTS engine; for example, the quotes suggested in #35 can probably be detected this way. So in the long run, changing to wikitext might be useful anyway. However, I know that it would require a lot of additional work to put into this project (although there might be parsers out there that can be built on), which you do in your free time. So I understand that it might just be too much to ask for. |
I'm more than happy to accept contributions though. So if you like, give it a shot. |
I am just not experienced with Go at all and also a bit time-restricted, same as you, probably. I will keep it in mind, though. What I definitely will offer is help, if you decide to implement it. |
Sounds good! |
There is now :-). Please check it out. |
Nice, thank you! Works for me.
I have thought about this again today, since that would be an easier fix. I think that detecting the pattern |
Agree. Good idea! One step that I want to put in between, though, is gathering data about this. We could first report all cases it would alter and let it run for a week or so. Afterwards, we could check if there are any false positives, and if there aren't, add the mechanism to insert the space. Maybe using a GitHub issue to list all the cases would provide the necessary transparency. |
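A minimal sketch (in Go, the skill's language) of the heuristic discussed above; the function name and regex are illustrative assumptions, not the skill's actual code:

```go
package main

import (
	"fmt"
	"regexp"
)

// missingSpace matches a lowercase letter, a period, and an uppercase letter
// with no space in between -- the pattern left behind when the Wikipedia API
// swallows the white space after a sentence-ending period.
var missingSpace = regexp.MustCompile(`([a-z])\.([A-Z])`)

// insertMissingSpaces re-inserts the swallowed space so the TTS engine pauses.
func insertMissingSpaces(text string) string {
	return missingSpace.ReplaceAllString(text, "$1. $2")
}

func main() {
	fmt.Println(insertMissingSpaces("antreten.Das Wort Omelett"))
	// antreten. Das Wort Omelett
}
```

Logging every match before enabling the replacement, as suggested above, would only require reporting the spans `missingSpace.FindAllStringIndex` returns instead of rewriting them.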
Sounds good, that will avoid possibly annoying problems and also deliver some stats about it beforehand. |
Thanks for the hint! Did not know this testing phase has already been implemented. The current results look indeed pretty promising. |
Well, it just went live last night. :-) |
Ah, I only looked at the three-day-old comment there, but not at the edits. So you managed to implement automatic updates to this GitHub comment whenever the pattern is detected in the skill, probably after a session has ended? |
Yes. And not just after a session, but on every request. It's kind of awkward, because it doesn't always appear right away: AWS Lambda freezes the container after a response is sent to Alexa, and the update in GitHub happens asynchronously. But I wanted to avoid latency in the skill response due to this. So sometimes things get written out to the GitHub comment only on the next request. But since it is not time-critical, this seemed good enough. And indeed it works. |
It looks like umlauts at the end/beginning of the snippets are messing up the duplication avoidance. I will have to fix that to avoid further duplicates. Also, the pattern currently only takes A-Z into account. I should probably change that to any letter. |
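Widening the pattern from A-Z to any letter could use Go's Unicode character classes in `regexp`; a hypothetical sketch:

```go
package main

import (
	"fmt"
	"regexp"
)

// \p{Ll} matches any Unicode lowercase letter and \p{Lu} any uppercase one,
// so umlauts and accented letters are covered as well as plain A-Z/a-z.
var missingSpace = regexp.MustCompile(`(\p{Ll})\.(\p{Lu})`)

func main() {
	fmt.Println(missingSpace.ReplaceAllString("Rührei.Öfter als", "$1. $2"))
	// Rührei. Öfter als
}
```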
Pretty cool! And there also seems to be an automatic error reporter creating GitHub issues? This looks just great! Is this from a different project that offers this, or original work? It might be helpful for other skills as well, although I am not sure how many use Go and are interested in GitHub issue tracking.
Nice, there is duplicate avoidance; I did not know that. Is this an encoding issue with umlauts?
Like Umlauts? Or other languages? |
Yes, it creates GitHub issues, but it also publishes messages on AWS SNS, which then get sent to me as emails. The emails contain the error message, a stack trace, and the request itself. That's even more convenient than taking the query from the GitHub issue and pasting it into AWS CloudWatch. I don't put all this information into a GitHub issue, because I don't want to risk publishing data that's not supposed to be public. Sometimes, when I'm not lazy, I paste the stack trace back into the GitHub issue for reference, but not always :-).
It's original work. Actually, the original work is in my alexa-journal skill and so far, I've simply copied it over. But my plan is to extract it into a separate repo, so it can be re-used just like you already described. Indeed, though, I'm not sure if anyone else will use it.
Not an encoding issue, but rather because I'm chopping things off exactly after 10 bytes instead of 10 runes.
Yes. And also accents and all that kind of stuff. I just realized that it's not that easy, though. Because even |
I think it is. It works for most cases at least, so if much more work is required to cover edge cases, it is probably not worth it currently, at least unless somebody complains about it. |
It's getting interesting: I found two false positives, German "e.V." and English "Ph.D.". Both get read incorrectly by Alexa when inserting a space (she pauses in between). Let's wait a few more days; maybe we'll find more. (Let's still implement the algorithm as you suggested. I think it's a great heuristic. We just need to special-case our findings.) |
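One way to special-case such findings is sketched below; the abbreviation list, helper names, and masking approach are assumptions for illustration, not the skill's actual implementation:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

var (
	missingSpace = regexp.MustCompile(`(\p{Ll})\.(\p{Lu})`)
	// Known false positives where a period directly followed by a letter
	// is legitimate and must not be split.
	abbreviations = []string{"e.V.", "Ph.D.", "G.m.b.H.", "S.p.A."}
)

// insertMissingSpaces fills in swallowed spaces after sentence-ending periods,
// but leaves known abbreviations untouched by temporarily masking them with
// placeholders (assuming the input text never contains NUL bytes).
func insertMissingSpaces(text string) string {
	for i, abbr := range abbreviations {
		text = strings.ReplaceAll(text, abbr, fmt.Sprintf("\x00%d\x00", i))
	}
	text = missingSpace.ReplaceAllString(text, "$1. $2")
	for i, abbr := range abbreviations {
		text = strings.ReplaceAll(text, fmt.Sprintf("\x00%d\x00", i), abbr)
	}
	return text
}

func main() {
	fmt.Println(insertMissingSpaces("antreten.Das Wort"))             // antreten. Das Wort
	fmt.Println(insertMissingSpaces("die G.m.b.H. wurde übernommen")) // unchanged
}
```

Without the masking step, the regex would match the "b.H" inside "G.m.b.H." and wrongly split it; masking first sidesteps that without complicating the regex itself.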
I also saw the "e.V.". I thought it would normally be formatted with a protected space in between, so I assumed that would be no problem for Alexa. On Wikipedia it is written with a space in between on the "Verein" page: https://de.wikipedia.org/wiki/Verein#Eingetragener_Verein So I assume that for this example, the Alexa TTS readout is simply wrong. It might be interesting to investigate whether the Wikipedia API also swallows protected spaces, or whether the occurrence was just written without a space in the Wikipedia article. However, such non-breaking spaces don't seem to be used for English abbreviations: https://en.wikipedia.org/wiki/Non-breaking_space#Width_variation So "Ph.D." really should not contain such a space. |
Another one: "G.m.b.H." and "Holding S.p.A. übernom". Interestingly, I also saw "für 1 Mrd.US-Dollar A"; there should normally be a space there in the Wikipedia article, so maybe a swallowed non-breaking space. Also, the duplicate detection and the handling of umlauts seem to be gone again? At least when I scroll down, there are duplicates again, but also some in English. |
No, never had the time to fix it. |
Sorry, I probably misinterpreted something; I thought it was fixed already, probably with the commit you linked. I know that one mostly doesn't have the time one would want for such side projects, so no pressure, and I don't expect anything. Your current rate of response to issues etc. is already way higher than in most projects on GitHub I have experienced so far. |
I noticed that sometimes there is no pause after a sentence has ended; instead, it flows directly into the next sentence. That sometimes also happened with the "Soll ich weiterlesen?" prompt.
Also, I noticed that dots at the end of a sentence are sometimes read out as "Punkt".