[*] Did you make sure a similar issue didn't exist?
[*] Did you update gTTS to the latest? (pip install --upgrade gTTS)
Current Behaviour (steps to reproduce)
The presence of 0xA0 in the input text is mostly ignored by gtts-cli. But in certain situations (such as the provided example) it produces "Error: 200 (OK) from TTS API. Probable cause: No audio stream in response. Unsupported language 'en'" along with EOF. The message seems to be written to stderr, and there is no actual Python error.
$ gtts-cli -f test -o test.mp3
working_test.txt non_working_test.txt
The files contain 0xA0, which I assumed would make them binary files, but the file command says otherwise.
$ file non_working_test.txt
non_working_test.txt: Unicode text, UTF-8 text
gtts-cli didn't complain about non-UTF-8 characters, and using iconv to remove non-UTF-8 characters doesn't change anything. Running
$ iconv -f utf-8 -t utf-8 -c test
does nothing to the file.
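One possible explanation for why iconv -c changes nothing, assuming the file stores the character as U+00A0 (NO-BREAK SPACE) encoded in UTF-8 rather than as a lone 0xA0 byte: the character is then the valid two-byte sequence 0xC2 0xA0, so there is nothing invalid to drop. That would also fit the file output above. A minimal Python sketch of that assumption (the sample string is made up for illustration and is not taken from the attached files):

```python
# Hypothetical sample text; "\u00a0" is U+00A0 NO-BREAK SPACE.
sample = "hello\u00a0world"

# In UTF-8 the character is stored as the two bytes C2 A0, not a lone A0.
encoded = sample.encode("utf-8")
print(encoded)  # b'hello\xc2\xa0world'

# A strict UTF-8 decode succeeds, so the text is valid UTF-8 and a
# validity-based filter has nothing to remove.
assert encoded.decode("utf-8") == sample

# A lone 0xA0 byte, by contrast, is not valid UTF-8.
try:
    b"hello\xa0world".decode("utf-8")
    lone_byte_valid = True
except UnicodeDecodeError:
    lone_byte_valid = False
print(lone_byte_valid)  # False
```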
Some web pages use that character in between the text, and most text editors show it as a space. This is a bit frustrating for the user (you have almost no clue what to do or what causes the error).
And I cannot blame the creator of the page, since it seems (after searching online) that 0xA0 is part of the Windows-1252 encoding (so if they wrote their blog in Microsoft Word, there's a big chance it got introduced there).
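Since the character looks like an ordinary space in most editors, a small helper that reports where it occurs can make the cause easier to spot. The following is only a sketch: find_nbsp and the sample text are hypothetical names and data, and it assumes the input has already been decoded to a Python string.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def find_nbsp(text):
    """Return (line, column) pairs, both 1-based, for every NBSP in text."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == NBSP:
                hits.append((line_no, col))
    return hits

# Hypothetical two-line sample with one NBSP on the second line.
sample = "first line\nsecond\u00a0line\n"
print(find_nbsp(sample))  # [(2, 7)]
```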
Expected Behaviour
gtts-cli should ignore that character and continue reading regardless of how and where it is present.
Context
I am writing a simple bash script that reads aloud the user's clipboard or a webpage associated with the URL in the user's clipboard.
I personally have been using this command w3m "$(xclip -o)" | gtts-cli -f - | mpv - for over a year to boost productivity when reading, with some variations such as less $pdf_file_or_epub_file | gtts-cli -f - | mpv - and so on.
The script basically does the same (it is still very basic and under development).
I came across some webpages that caused this error to occur. After some investigation I found out that the character 0xA0 is what causes the problem.
So I created an issue and made a small workaround that uses bbe to replace the bad character with nothing (and then iconv for cleanup, since the replacement messes up a couple of things).
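As an alternative to the bbe and iconv workaround, the same idea can be sketched as a text-level filter in Python. This is not the workaround described above: replace_nbsp and filter_stream are hypothetical helpers, the choice of a plain space as the replacement (so that neighbouring words are not glued together) is an assumption, and whether it avoids the TTS error in every case has not been verified here.

```python
import io

NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE

def replace_nbsp(text, replacement=" "):
    """Swap every NBSP for `replacement` (a plain space by default)."""
    return text.replace(NBSP, replacement)

def filter_stream(src, dst):
    """Copy text from src to dst line by line, replacing NBSP on the way."""
    for line in src:
        dst.write(replace_nbsp(line))

# Demo with in-memory streams; in a pipeline, sys.stdin and sys.stdout
# could be passed instead.
src = io.StringIO("some\u00a0text\nmore text\n")
dst = io.StringIO()
filter_stream(src, dst)
print(repr(dst.getvalue()))  # 'some text\nmore text\n'
```

Working on decoded text rather than raw bytes means the two-byte UTF-8 sequence is replaced as a whole, which may avoid the need for a separate cleanup pass.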
Environment
$ gtts-cli --version
gtts-cli, version 2.2.4
$ python --version
Python 3.9.12
$ uname -a
Linux Laptop 5.17.3-tkg-pds #1 TKG SMP PREEMPT Sat Apr 16 06:53:55 CET 2022 x86_64 Intel(R) Celeron(R) N4000 CPU @ 1.10GHz GenuineIntel GNU/Linux
OS: Gentoo/Linux x86_64
I assume this isn't gtts-cli's fault, since there's no actual Python error, so the problem is probably in the Google text-to-speech engine itself. Still, the behavior is confusing, so I hope a fix will be applied.
@medanisjbara Thanks a lot for this well documented behaviour!
Hmm, so it's a Windows-1252 character. I wonder if there's anything gTTS should (or shouldn't) do about this, like applying some filtering. I'll have to take a look with debugging on.
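One shape such filtering could take, purely as an illustration and not something gTTS is known to do, is Unicode compatibility normalization. NFKC maps U+00A0 to an ordinary space, but it also rewrites other compatibility characters (ligatures, full-width forms), so it is broader than a targeted replacement. normalize_spaces is a hypothetical helper name:

```python
import unicodedata

def normalize_spaces(text):
    """Apply NFKC, which maps U+00A0 to an ordinary space (U+0020)."""
    return unicodedata.normalize("NFKC", text)

sample = "non\u00a0breaking"
print(repr(normalize_spaces(sample)))  # 'non breaking'
```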