0xA0 is causing gtts-cli to send EOF. #353

medanisjbara · 2022-05-27T12:14:13Z

Prerequisites

[*] Did you make sure a similar issue didn't exist?
[*] Did you update gTTS to the latest? (pip install --upgrade gTTS)

Current Behaviour (steps to reproduce)

gtts-cli mostly ignores 0xA0 in the input text. In certain situations, however (see the attached example), it produces Error: 200 (OK) from TTS API. Probable cause: No audio stream in response. Unsupported language 'en' along with EOF. The message appears to go to stderr, and no Python error is raised.

$ gtts-cli -f test -o test.mp3

working_test.txt
non_working_test.txt
The files contain 0xA0, which I assumed would make them binary files, but the file command says otherwise.

$ file non_working_test.txt
non_working_test.txt: Unicode text, UTF-8 text

gtts-cli didn't complain about non-UTF-8 characters, and using iconv to remove non-UTF-8 characters doesn't change anything. The following command leaves the file unchanged:
$ iconv -f utf-8 -t utf-8 -c test
Some web pages use that character between words, and most text editors show it as a space, which is frustrating for the user (you have almost no clue what causes the error or what to do about it).
I can't blame the page's author, since (after searching online) 0xA0 seems to be part of the windows-1252 encoding, so if they wrote the blog post in Microsoft Word, there's a good chance it was introduced there.
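A possible explanation for the file and iconv results, sketched in Python: the code point U+00A0 (NO-BREAK SPACE) is the single byte 0xA0 in windows-1252, but in UTF-8 it is the two-byte sequence 0xC2 0xA0, which is perfectly valid UTF-8. The assumption here (not confirmed in this report) is that the attached files contain the two-byte form; a lone 0xA0 byte would not be valid UTF-8. The helper names below are illustrative only.

```python
# U+00A0 is NO-BREAK SPACE. Assumption: this is the character the
# report refers to as "0xA0".
NBSP = "\u00a0"


def nbsp_encodings() -> dict:
    """Return the byte encoding of U+00A0 in UTF-8 and windows-1252."""
    return {
        "utf-8": NBSP.encode("utf-8"),          # two bytes: C2 A0
        "windows-1252": NBSP.encode("cp1252"),  # one byte: A0
    }


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(nbsp_encodings())
    # The two-byte form is valid UTF-8, so a UTF-8 to UTF-8 cleanup
    # has nothing to strip.
    print(is_valid_utf8(b"a\xc2\xa0b"))
    # A bare 0xA0 byte is a continuation byte with no lead byte.
    print(is_valid_utf8(b"a\xa0b"))
```

If that assumption holds, the character is a legitimate Unicode character rather than an encoding error, so it has to be replaced explicitly instead of being stripped as invalid input.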

Expected Behaviour

gtts-cli should ignore that character and continue reading regardless of how and where it is present.

Context

I am writing a simple bash script that reads aloud the user's clipboard, or the web page at the URL in the user's clipboard.
I have personally been using the command w3m "$(xclip -o)" | gtts-cli -f - | mpv - for over a year to boost productivity when reading, with some variations such as less $pdf_file_or_epub_file | gtts-cli -f - | mpv - and so on.
The script basically does the same thing (it is still very basic and under development).
I came across some web pages that caused this error to occur. After some investigation, I found that the character 0xA0 is what causes the problem.
So I created this issue and made a small workaround that uses bbe to replace the bad character with nothing (followed by iconv for cleanup, since the replacement messes up a couple of things).
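As an alternative to the bbe and iconv workaround, here is a minimal sketch of a stdin-to-stdout filter in Python. It assumes the problematic character is U+00A0 (NO-BREAK SPACE) encoded as UTF-8, and it substitutes a regular space rather than deleting the character, so adjacent words stay separated. The function name and the script name are hypothetical, and whether this avoids the TTS API error has not been verified here.

```python
import sys

# Assumption: the "0xA0" character in the input is U+00A0.
NBSP = "\u00a0"


def replace_nbsp(text: str) -> str:
    """Replace every no-break space with a regular space."""
    return text.replace(NBSP, " ")


def main() -> None:
    # Read raw bytes and drop anything that is not valid UTF-8,
    # similar in spirit to `iconv -c`.
    raw = sys.stdin.buffer.read()
    text = raw.decode("utf-8", errors="ignore")
    # Write bytes so the output is UTF-8 regardless of the locale.
    sys.stdout.buffer.write(replace_nbsp(text).encode("utf-8"))


if __name__ == "__main__":
    main()
```

Saved under a name such as nbsp_filter.py, it could sit in the existing pipeline: w3m "$(xclip -o)" | python nbsp_filter.py | gtts-cli -f - | mpv -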

Environment

$ gtts-cli --version
gtts-cli, version 2.2.4

$ python --version
Python 3.9.12

$ uname -a
Linux Laptop 5.17.3-tkg-pds #1 TKG SMP PREEMPT Sat Apr 16 06:53:55 CET 2022 x86_64 Intel(R) Celeron(R) N4000 CPU @ 1.10GHz GenuineIntel GNU/Linux

OS: Gentoo/Linux x86_64

medanisjbara · 2022-05-27T12:23:10Z

I assume this isn't gtts-cli's fault, since there's no actual Python error, so the problem is probably with the Google text-to-speech engine. Still, the behaviour itself is confusing, so I hope a fix will be applied.

pndurette · 2022-05-31T20:30:18Z

@medanisjbara Thanks a lot for this well documented behaviour!

Hmm, so it's a windows-1252 character. I wonder if there's anything gTTS should (or shouldn't) do about this, like applying some filtering. I'll have to take a look with debugging on.

pndurette added the investigation label May 31, 2022
