Transcribing pre-recorded Japanese file and my transcript output is '\u3053\u308c\u3042\u308a\u304c\u3068\u3046\u3054' instead of Japanese. #165

Closed Answered by jjmaldonis
JackBarker21 asked this question in General help
Hey @JackBarker21, I think this is an encoding issue with how Python saves JSON data.

When I print the transcript in your code, I see Japanese characters, but the file it is saved to contains Unicode escape sequences (which start with \u). This happens because json.dump escapes every non-ASCII character by default (ensure_ascii=True). To fix how the data is saved, make two small changes when saving the transcript: open the file with encoding="utf8" and pass ensure_ascii=False to json.dump:

    with open(f"path/{filename}.json", "w", encoding="utf8") as save_file:
        json.dump(response, save_file, indent=6, ensure_ascii=False)
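To see the difference without calling the API, here is a minimal sketch that uses only the standard library. It takes the escaped string from the question title as its input; the variable names are made up for illustration and are not part of the original script.

```python
import json

# The escaped text from the question title, written the way it appears
# inside the saved JSON file (quotes included).
escaped = '"\\u3053\\u308c\\u3042\\u308a\\u304c\\u3068\\u3046\\u3054"'

# Parsing the JSON turns the \u escapes back into real Japanese characters,
# so no information was lost when the file was written.
text = json.loads(escaped)

# Default behaviour: ensure_ascii=True, every non-ASCII character is escaped.
default_dump = json.dumps(text)

# With ensure_ascii=False the characters are written as-is.
readable_dump = json.dumps(text, ensure_ascii=False)

print(default_dump)
print(readable_dump)
```

Both strings are valid JSON and decode to the same text, so files that were already saved with escapes are not corrupted. Loading them with json.load gives back the Japanese characters.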

Does this solve the issue for you?

Below is the exact code I used (99% yours) and attached is an example audio file in Japanese:

from deepgram import Deepgram  # pip install d…

Answer selected by JackBarker21