OpenAI voices. Voice as default command.

Add support for AI-generated voices from the OpenAI API, which are very human-like, and set them as the default. Change the default command to `voice`. Migrate to v1.2.4 of the `openai` package.
paulovcmedeiros · Nov 16, 2023 · 34d3c5d
2 parents 293b874 + c291b8e
Showing 18 changed files with 510 additions and 340 deletions.
diff --git a/.github/workflows/tests.yaml b/.github/workflows/tests.yaml
@@ -41,7 +41,7 @@ jobs:
       - name: Install PortAudio and PulseAudio
         run: |
           apt-get update
-          apt-get --assume-yes install portaudio19-dev python-all-dev pulseaudio
+          apt-get --assume-yes install portaudio19-dev python-all-dev pulseaudio ffmpeg
 
       #----------------------------------------------
       #  --- configure poetry & install project  ----

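The workflow now installs `ffmpeg` alongside PortAudio. `pydub` (added to the dependencies in `pyproject.toml` below) relies on an external `ffmpeg` (or `libav`) binary to decode formats other than WAV, and OpenAI's speech endpoint returns MP3 by default, which is presumably why it is needed here. A minimal preflight sketch, assuming you only want to confirm that the system dependencies can be found before starting voice chat; `missing_voice_dependencies` is a hypothetical helper, not part of pyRobBot:

```python
import shutil
from ctypes.util import find_library
from typing import List


def missing_voice_dependencies() -> List[str]:
    """Return the names of voice-chat system dependencies that were not found.

    Hypothetical helper: looks for the ffmpeg executable on PATH and for the
    PortAudio shared library through the platform's library search.
    """
    missing = []
    if shutil.which("ffmpeg") is None:
        missing.append("ffmpeg")
    # find_library can return None even when PortAudio is installed in a
    # non-standard location, so treat a miss as a warning, not a hard error.
    if find_library("portaudio") is None:
        missing.append("portaudio")
    return missing


if __name__ == "__main__":
    missing = missing_voice_dependencies()
    if missing:
        print("Missing (or not found):", ", ".join(missing))
    else:
        print("ffmpeg and PortAudio found")
```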
diff --git a/README.md b/README.md
@@ -9,21 +9,25 @@
 
 # pyRobBot: Talk and Chat with GPT LLMs
 
-An interface to OpenAI's [GPT large language models (LLMs)](https://platform.openai.com/docs/models) that implements:
-* A conventional chatbot that can be used either via web UI or terminal
-* A personal assistant that can actually interact with you by voice
+A Python package that uses OpenAI's [GPT large language models (LLMs)](https://platform.openai.com/docs/models) to implement:
+* A fully configurable personal assistant that can speak and listen to you
+* An equally configurable text-based chatbot that can be used either via a web UI or the terminal
 
-The package is written in Python. The web chatbot UI is made with [Streamlit](https://streamlit.io).
-
-**See and try the [demo web app on Streamlit](https://pyrobbot.streamlit.app)!**
 
 ## Features
-- [x] Text to speech and speech to text (`rob voice`)
-  - Talk to the GPT assistant!
-  - You can choose your preferred language (e.g., `rob voice --lang pt-br`)
-- [x] Web UI
+- [x] Text to speech and speech to text
+  - Talk to the GPT assistant and the assistant will talk back to you!
+  - Choose your preferred language (e.g., `rob --lang pt-br`)
+  - Choose your preferred Text-to-Speech (TTS) engine
+    - [OpenAI Text-to-Speech](https://platform.openai.com/docs/guides/text-to-speech) (default): AI-generated *human-like* voice
+    - [Google TTS](https://cloud.google.com/text-to-speech) (`rob --tts google`): free for the time being, with decent quality
+
+
+- [x] Browser UI (made with [Streamlit](https://pyrobbot.streamlit.app))
   - Add/remove conversations dynamically
   - Automatic/editable conversation summary title
+- [x] Terminal UI
+  - For a more "Wake up, Neo" experience
 - [x] Fully configurable
   - Support for multiple GPT LLMs
   - Control over the parameters passed to the OpenAI API, with (hopefully) sensible defaults
@@ -32,6 +36,7 @@ The package is written in Python. The web chatbot UI is made with [Streamlit](ht
   - Dynamically modifiable AI parameters in each chat separately
     - No need to restart the chat
 - [x] Autosave & retrieve chat history
+  - In the browser UI, you can even read the transcripts of your voice conversations with the AI
 - [x] Chat context handling using [embeddings](https://platform.openai.com/docs/guides/embeddings)
 - [x] Estimated API token usage and associated costs
 - [x] OpenAI API key is **never** stored on disk
@@ -42,13 +47,16 @@ The package is written in Python. The web chatbot UI is made with [Streamlit](ht
 - Python >= 3.9
 - A valid [OpenAI API key](https://platform.openai.com/account/api-keys)
   - Set in the Web UI or through the environment variable `OPENAI_API_KEY`
-- Optionally, to enable voice chat, you also need:
+- To enable voice chat, you also need:
   - [PortAudio](https://www.portaudio.com/docs/v19-doxydocs/index.html)
     - Install on Ubuntu with `sudo apt-get --assume-yes install portaudio19-dev python-all-dev`
     - Install on CentOS/RHEL with `sudo yum install portaudio portaudio-devel`
+  - [ffmpeg](https://ffmpeg.org/download.html)
+    - Install on Ubuntu with `sudo apt-get --assume-yes install ffmpeg`
+    - Install on CentOS/RHEL with `sudo yum install ffmpeg`
 
 ## Installation
-This, naturally, assumes your systems fulfills all [requirements](#system-requirements).
+This, naturally, assumes your system fulfills all [requirements](#system-requirements).
 ### Using pip
 ```shell
 pip install pyrobbot
@@ -73,25 +81,27 @@ and general `rob` options. For info about specific subcommands and the
 options that apply to them only, **please run `rob SUBCOMMAND -h`** (note
 that the `-h` goes after the subcommand in this case).
 
-
-### Using the Web UI
+### Chatting by Voice (default)
 ```shell
 rob
 ```
 
-### Chatting by Voice
+### Using the Web UI
 ```shell
-rob voice
+rob ui
 ```
 
+
 ### Running on the Terminal
 ```shell
 rob .
 ```
 
 ## Disclaimers
-This project's main purpose is to serve as a learning exercise for me, as well as tool for experimenting with OpenAI API, GPT LLMs and text-to-voice/voice-to-text. It does not claim to be the best or more robust OpenAI-powered chatbot out there.
+This project's main purpose has been to serve as a learning exercise for me, as well as a tool for experimenting with the OpenAI API, GPT LLMs and text-to-speech/speech-to-text.
+
+While it does not claim to be the best or most robust OpenAI-powered chatbot out there, it *does* aim to provide a friendly user interface that is easy to install, use and configure.
 
-Having said this, this project *does* aim to provide a friendly user interface that is easy to use and configure. Feel free to open an issue or submit a pull request if you find a bug or have a suggestion.
+Feel free to open an [issue](https://github.com/paulovcmedeiros/pyRobBot/issues) or, even better, [submit a pull request](https://github.com/paulovcmedeiros/pyRobBot/pulls) if you find a bug or have a suggestion.
 
 Last but not least: this project is **not** affiliated with OpenAI in any way.
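The README now documents `rob` on its own (and with options, as in `rob --lang pt-br`) as starting voice chat. In `argparse_wrapper.py` this is done by prepending the default command when `argv` is empty or begins with an option other than the informational flags. The standalone sketch below mirrors that rule; `with_default_command` is a hypothetical name, not a function in the package:

```python
from typing import List, Sequence

# Flags that must reach the main parser itself rather than a subcommand.
INFO_FLAGS = ("--version", "-v", "-h", "--help")


def with_default_command(argv: Sequence[str], default_command: str = "voice") -> List[str]:
    """Prepend default_command when argv is empty or starts with a non-info option."""
    first = argv[0] if argv else ""
    if not argv or (first.startswith("-") and first not in INFO_FLAGS):
        return [default_command, *argv]
    return list(argv)


if __name__ == "__main__":
    print(with_default_command([]))                   # ['voice']
    print(with_default_command(["--lang", "pt-br"]))  # ['voice', '--lang', 'pt-br']
    print(with_default_command(["-h"]))               # ['-h']  (top-level help)
    print(with_default_command(["ui"]))               # ['ui']
```

One consequence of this rule: because `-v` is claimed by `--version` on the main parser, `rob -v` prints the version instead of starting voice chat.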
diff --git a/pyproject.toml b/pyproject.toml
@@ -4,7 +4,7 @@
   license = "MIT"
   name = "pyrobbot"
   readme = "README.md"
-  version = "0.2.4"
+  version = "0.3.0"
 
 [build-system]
   build-backend = "poetry.core.masonry.api"
@@ -24,15 +24,17 @@
   # Other dependencies
   loguru = "^0.7.2"
   numpy = "^1.26.1"
-  openai = "^0.28.1"
+  openai = "^1.2.4"
   pandas = "^2.1.2"
   pillow = "^10.1.0"
   pydantic = "^2.4.2"
   streamlit = "^1.28.0"
   tiktoken = "^0.5.1"
   # Text to speech
   gtts = "^2.4.0"
+  pydub = "^0.25.1"
   pygame = "^2.5.2"
+  setuptools = "^68.2.2"             # Needed by webrtcvad-wheels
   sounddevice = "^0.4.6"
   soundfile = "^0.12.1"
   speechrecognition = "^3.10.0"
@@ -128,7 +130,7 @@
   ##################
 
 [tool.pytest.ini_options]
-  addopts = "-v --failed-first --cov-report=term-missing --cov-report=term:skip-covered --cov-report=xml:.coverage.xml --cov=./"
+  addopts = "-v --cache-clear --failed-first --cov-report=term-missing --cov-report=term:skip-covered --cov-report=xml:.coverage.xml --cov=./"
   log_cli_level = "INFO"
   testpaths = ["tests/smoke", "tests/unit"]
 

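Poetry's caret constraints allow any update that does not modify the left-most non-zero component of the version, which is why the `openai` migration needs an explicit constraint change: `^0.28.1` means `>=0.28.1,<0.29.0`, so no 1.x release could ever satisfy it, whereas `^1.2.4` means `>=1.2.4,<2.0.0`. A small sketch of that rule for plain `MAJOR.MINOR.PATCH` versions only (pre-releases and short forms such as `^1.2` are deliberately not handled); `caret_range` and `satisfies` are hypothetical helpers, not Poetry APIs:

```python
from typing import Tuple

Version = Tuple[int, int, int]


def caret_range(constraint: str) -> Tuple[Version, Version]:
    """Return the (inclusive lower, exclusive upper) bounds of a caret constraint."""
    if not constraint.startswith("^"):
        raise ValueError(f"not a caret constraint: {constraint!r}")
    parts = [int(p) for p in constraint[1:].split(".")]
    if len(parts) != 3:
        raise ValueError("this sketch only handles ^MAJOR.MINOR.PATCH")
    # The left-most non-zero component is the one that may not change
    # (fall back to the patch component for ^0.0.0).
    idx = next((i for i, p in enumerate(parts) if p != 0), 2)
    upper = parts[:idx] + [parts[idx] + 1] + [0] * (2 - idx)
    return tuple(parts), tuple(upper)


def satisfies(version: str, constraint: str) -> bool:
    """Check a MAJOR.MINOR.PATCH version string against a caret constraint."""
    lower, upper = caret_range(constraint)
    v = tuple(int(p) for p in version.split("."))
    return lower <= v < upper


if __name__ == "__main__":
    print(caret_range("^0.28.1"))         # ((0, 28, 1), (0, 29, 0))
    print(caret_range("^1.2.4"))          # ((1, 2, 4), (2, 0, 0))
    print(satisfies("1.2.4", "^0.28.1"))  # False
```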
diff --git a/pyrobbot/__init__.py b/pyrobbot/__init__.py
@@ -9,8 +9,8 @@
 from importlib.metadata import metadata, version
 from pathlib import Path
 
-import openai
 from loguru import logger
+from openai import OpenAI, OpenAIError
 
 logger.remove()
 logger.add(
@@ -46,9 +46,15 @@ class GeneralDefinitions:
     @staticmethod
     def openai_key_hash():
         """Return a hash of the OpenAI API key."""
-        if openai.api_key is None:
+        try:
+            client = OpenAI()
+        except OpenAIError:
+            api_key = None
+        else:
+            api_key = client.api_key
+        if api_key is None:
             return "demo"
-        return hashlib.sha256(openai.api_key.encode("utf-8")).hexdigest()
+        return hashlib.sha256(api_key.encode("utf-8")).hexdigest()
 
     @property
     def package_cache_directory(self):
@@ -71,4 +77,3 @@ def chats_storage_dir(self):
 )
 
 # Initialize the OpenAI API client
-openai.api_key = GeneralConstants.SYSTEM_ENV_OPENAI_API_KEY
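With `openai>=1.0` the module-level `openai.api_key` assignment is gone, and `openai_key_hash` instead builds an `OpenAI()` client, which reads `OPENAI_API_KEY` from the environment and raises `OpenAIError` when no key is available (hence the `try`/`except` in the hunk above). A dependency-free sketch of the same logic, assuming the environment variable is the only key source; the name `key_hash` is hypothetical:

```python
import hashlib
import os
from typing import Optional


def key_hash(api_key: Optional[str] = None) -> str:
    """Return "demo" when no key is available, else the key's SHA-256 hex digest.

    Sketch of the commit's openai_key_hash(), but reading OPENAI_API_KEY
    directly instead of via OpenAI(). Unlike the original, an empty string
    is also treated as "no key".
    """
    if api_key is None:
        api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        return "demo"
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()
```

Hashing presumably lets chats be grouped per key (the UI notes that chats created with one key are not visible to users of other keys) without the raw key itself having to be written anywhere.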
diff --git a/pyrobbot/app/multipage.py b/pyrobbot/app/multipage.py
@@ -3,8 +3,8 @@
 import datetime
 from abc import ABC, abstractmethod, abstractproperty
 
-import openai
 import streamlit as st
+from openai import OpenAI
 from pydantic import ValidationError
 
 from pyrobbot import GeneralConstants
@@ -137,12 +137,9 @@ def init_chat_credentials(self):
             help="[OpenAI API auth key](https://platform.openai.com/account/api-keys). "
             + "Chats created with this key won't be visible to people using other keys.",
         )
-        openai.api_key = (
-            self.openai_api_key
-            if self.openai_api_key
-            else GeneralConstants.SYSTEM_ENV_OPENAI_API_KEY
-        )
-        if not openai.api_key:
+
+        client = OpenAI()
+        if not client.api_key:
             st.write(":red[You need to provide a key to use the chat]")
 
     def add_page(

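Two details of the new `init_chat_credentials` code are worth noting. In `openai>=1.0`, `OpenAI()` without an `api_key` argument falls back to the `OPENAI_API_KEY` environment variable and raises `OpenAIError` if that is unset, so the `if not client.api_key` branch is only reached with an empty key; and, in the hunk shown, the key typed into the UI (`self.openai_api_key`) is no longer passed to the client. A hedged sketch of one way to keep both behaviours, preferring the UI key and returning `None` instead of raising; `resolve_api_key` and `make_client` are hypothetical names:

```python
import os
from typing import Optional


def resolve_api_key(ui_key: Optional[str]) -> Optional[str]:
    """Prefer a key typed into the UI; otherwise fall back to OPENAI_API_KEY."""
    key = ui_key or os.environ.get("OPENAI_API_KEY")
    return key or None  # normalise "" to None


def make_client(ui_key: Optional[str]):
    """Return an OpenAI client, or None when no key is available."""
    api_key = resolve_api_key(ui_key)
    if api_key is None:
        return None
    # Imported lazily so this module can be loaded without openai installed.
    from openai import OpenAI  # requires openai>=1.0

    return OpenAI(api_key=api_key)


# Possible use inside the Streamlit page (assumption, not the package's code):
#     client = make_client(self.openai_api_key)
#     if client is None:
#         st.write(":red[You need to provide a key to use the chat]")
```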
diff --git a/pyrobbot/argparse_wrapper.py b/pyrobbot/argparse_wrapper.py
@@ -4,11 +4,41 @@
 import sys
 
 from . import GeneralConstants
-from .chat_configs import ChatOptions
-from .command_definitions import accounting, run_on_terminal, run_on_ui, run_over_voice
+from .chat_configs import ChatOptions, VoiceChatConfigs
+from .command_definitions import (
+    accounting_report,
+    browser_chat,
+    terminal_chat,
+    voice_chat,
+)
+
+
+def _populate_parser_from_pydantic_model(parser, model):
+    _argarse2pydantic = {
+        "type": model.get_type,
+        "default": model.get_default,
+        "choices": model.get_allowed_values,
+        "help": model.get_description,
+    }
+    for field_name, field in model.model_fields.items():
+        args_opts = {
+            key: _argarse2pydantic[key](field_name)
+            for key in _argarse2pydantic
+            if _argarse2pydantic[key](field_name) is not None
+        }
+        args_opts["required"] = field.is_required()
+        if "help" in args_opts:
+            args_opts["help"] = f"{args_opts['help']} (default: %(default)s)"
+        if "default" in args_opts and isinstance(args_opts["default"], (list, tuple)):
+            args_opts.pop("type", None)
+            args_opts["nargs"] = "*"
+
+        parser.add_argument(f"--{field_name.replace('_', '-')}", **args_opts)
+
+    return parser
 
 
-def get_parsed_args(argv=None, default_command="ui"):
+def get_parsed_args(argv=None, default_command="voice"):
     """Get parsed command line arguments.
 
     Args:
@@ -21,45 +51,21 @@ def get_parsed_args(argv=None, default_command="ui"):
     """
     if argv is None:
         argv = sys.argv[1:]
-    if not argv:
-        argv = [default_command]
-
-    chat_options_parser = argparse.ArgumentParser(
-        formatter_class=argparse.ArgumentDefaultsHelpFormatter, add_help=False
-    )
-    argarse2pydantic = {
-        "type": ChatOptions.get_type,
-        "default": ChatOptions.get_default,
-        "choices": ChatOptions.get_allowed_values,
-        "help": ChatOptions.get_description,
-    }
-    for field_name, field in ChatOptions.model_fields.items():
-        args_opts = {
-            key: argarse2pydantic[key](field_name)
-            for key in argarse2pydantic
-            if argarse2pydantic[key](field_name) is not None
-        }
-        args_opts["required"] = field.is_required()
-        if "help" in args_opts:
-            args_opts["help"] = f"{args_opts['help']} (default: %(default)s)"
-        if "default" in args_opts and isinstance(args_opts["default"], (list, tuple)):
-            args_opts.pop("type", None)
-            args_opts["nargs"] = "*"
-
-        chat_options_parser.add_argument(f"--{field_name.replace('_', '-')}", **args_opts)
+    first_argv = next(iter(argv), "'")
+    info_flags = ["--version", "-v", "-h", "--help"]
+    if not argv or (first_argv.startswith("-") and first_argv not in info_flags):
+        argv = [default_command, *argv]
 
+    # Main parser that will handle the script's commands
     main_parser = argparse.ArgumentParser(
         formatter_class=argparse.ArgumentDefaultsHelpFormatter
     )
-
     main_parser.add_argument(
         "--version",
         "-v",
         action="version",
         version=f"{GeneralConstants.PACKAGE_NAME} v" + GeneralConstants.VERSION,
     )
-
-    # Configure the main parser to handle the commands
     subparsers = main_parser.add_subparsers(
         title="commands",
         dest="command",
@@ -71,45 +77,58 @@ def get_parsed_args(argv=None, default_command="ui"):
         help="command description",
     )
 
+    # Common options to most commands
+    chat_options_parser = _populate_parser_from_pydantic_model(
+        parser=argparse.ArgumentParser(
+            formatter_class=argparse.ArgumentDefaultsHelpFormatter, add_help=False
+        ),
+        model=ChatOptions,
+    )
+    chat_options_parser.add_argument(
+        "--report-accounting-when-done",
+        action="store_true",
+        help="Report estimated costs when done with the chat.",
+    )
+
+    # Voice chat
+    voice_options_parser = _populate_parser_from_pydantic_model(
+        parser=argparse.ArgumentParser(
+            formatter_class=argparse.ArgumentDefaultsHelpFormatter, add_help=False
+        ),
+        model=VoiceChatConfigs,
+    )
+    parser_voice_chat = subparsers.add_parser(
+        "voice",
+        aliases=["v"],
+        parents=[voice_options_parser],
+        help="Run the chat over voice.",
+    )
+    parser_voice_chat.set_defaults(run_command=voice_chat)
+
+    # Web app chat
     parser_ui = subparsers.add_parser(
         "ui",
         aliases=["app"],
         parents=[chat_options_parser],
         help="Run the chat UI on the browser.",
     )
-    parser_ui.set_defaults(run_command=run_on_ui)
+    parser_ui.set_defaults(run_command=browser_chat)
 
+    # Terminal chat
     parser_terminal = subparsers.add_parser(
         "terminal",
         aliases=["."],
         parents=[chat_options_parser],
         help="Run the chat on the terminal.",
     )
-    parser_terminal.add_argument(
-        "--report-accounting-when-done",
-        action="store_true",
-        help="Report estimated costs when done with the chat.",
-    )
-    parser_terminal.set_defaults(run_command=run_on_terminal)
-
-    parser_over_voice = subparsers.add_parser(
-        "voice",
-        aliases=["v"],
-        parents=[chat_options_parser],
-        help="Run the chat over voice.",
-    )
-    parser_over_voice.add_argument(
-        "--report-accounting-when-done",
-        action="store_true",
-        help="Report estimated costs when done with the chat.",
-    )
-    parser_over_voice.set_defaults(run_command=run_over_voice)
+    parser_terminal.set_defaults(run_command=terminal_chat)
 
+    # Accounting report
     parser_accounting = subparsers.add_parser(
         "accounting",
         aliases=["acc"],
         help="Show the estimated number of used tokens and associated costs, and exit.",
     )
-    parser_accounting.set_defaults(run_command=accounting)
+    parser_accounting.set_defaults(run_command=accounting_report)
 
     return main_parser.parse_args(argv)
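The new `_populate_parser_from_pydantic_model` helper turns what used to be inline code into a reusable loop, so option parsers can be built from both `ChatOptions` and `VoiceChatConfigs` and attached to subcommands through `parents=[...]`. The sketch below shows the same pattern with a standard-library dataclass standing in for the pydantic model, so it runs without pydantic; `DemoVoiceConfigs`, its fields and `populate_parser_from_dataclass` are illustrative assumptions, not the package's real configuration:

```python
import argparse
import dataclasses
from dataclasses import dataclass, field
from typing import List


@dataclass
class DemoVoiceConfigs:
    """Hypothetical stand-in for a pydantic model such as VoiceChatConfigs."""

    language: str = field(default="en", metadata={"help": "Language to speak and listen in"})
    tts: str = field(
        default="openai",
        metadata={"help": "Text-to-speech engine", "choices": ["openai", "google"]},
    )
    stop_words: List[str] = field(
        default_factory=lambda: ["bye"], metadata={"help": "Words that end the chat"}
    )


def populate_parser_from_dataclass(parser, model):
    """Add one --kebab-case option per dataclass field."""
    for f in dataclasses.fields(model):
        if f.default is not dataclasses.MISSING:
            default = f.default
        elif f.default_factory is not dataclasses.MISSING:
            default = f.default_factory()
        else:
            default = None
        opts = {"default": default, "required": default is None}
        if "choices" in f.metadata:
            opts["choices"] = f.metadata["choices"]
        if "help" in f.metadata:
            # Subparsers do not inherit the main parser's formatter_class,
            # so the default is spelled out in the help text itself.
            opts["help"] = f"{f.metadata['help']} (default: %(default)s)"
        if isinstance(default, (list, tuple)):
            opts["nargs"] = "*"  # list-valued fields accept zero or more values
        else:
            opts["type"] = f.type
        parser.add_argument(f"--{f.name.replace('_', '-')}", **opts)
    return parser


voice_options = populate_parser_from_dataclass(
    argparse.ArgumentParser(add_help=False), DemoVoiceConfigs
)
main = argparse.ArgumentParser(prog="demo")
subparsers = main.add_subparsers(dest="command")
subparsers.add_parser("voice", aliases=["v"], parents=[voice_options])

if __name__ == "__main__":
    print(main.parse_args(["voice", "--tts", "google", "--stop-words", "bye", "quit"]))
```

Keeping the option definitions on an `add_help=False` parser and reusing it via `parents` is what lets several subcommands share one set of generated options without each defining its own `-h`.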