Merge pull request #76 from etalab-ia/dev

v0.2.0
etalab-ia · Jul 29, 2024 · 45ce9b3 · 45ce9b3
2 parents 1f8c1c1 + 62bf4a9
commit 45ce9b3
Show file tree

Hide file tree

Showing 25 changed files with 1,069 additions and 505 deletions.
diff --git a/.gitignore b/.gitignore
@@ -51,6 +51,8 @@ venv/
 ENV/
 env.bak/
 venv.bak/
+.direnv/
+.envrc
 
 # mypy
 .mypy_cache/

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,34 @@
+# Changelog
+
+All notable changes to this project will be documented in this file.
+
+## 0.2.0
+
+### 🚀 Features
+
+- Use the chat history to build messages compatible with the openai api [{role, content}].
+- Support for reply in thread.
+- Improve conversation history management.
+- Manage reply in conversation + improved albert messaging.
+- Add a minimal system prompt in norag mode
+- Improve albert command and response format.
+- Add command aliases.
+- Add a grist table for user management (implement an minimalistic async grist client)
+
+### 🐛 Bug Fixes
+
+- Pyalbert version for gemma-2 support
+- Better error management
+
+### Refacto
+
+- Github actions (#72)
+- Bot commands refactorization
+- Add all bot custom messages in a dedicated AlbertMsg classe.
+
+### Scripts
+
+- Dump users state list
+- Update users table from list
+- Send error message demo
+
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,11 @@
+
+Le code est linté et les imports trié avec [Ruff](https://docs.astral.sh/ruff/) :
+```bash
+ruff check --fix --select I .
+```
+
+
+Ruff s'intégre dans la plupart des éditeurs de code. Vous pouvez automatiser le linter avec les _hooks_ de _pre-commit_ de git si vous préférez :
+```bash
+pre-commit install
+``
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 Etalab
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -23,7 +23,7 @@ Le projet est un fork de [tchap_bot](https://code.peren.fr/open-source/tchapbot)
 
 Contient :
 - `app/.` : la codebase pour le Tchap bot Albert
-- `app/matrix_bot` : une bibliothèque pour pouvoir faire des bots Matrix
+- `app/matrix_bot` : une bibliothèque qui encapsule [matrix-nio](https://github.com/matrix-nio/matrix-nio) faire des bots Matrix
 
 
 ### Installation locale
@@ -52,41 +52,14 @@ Créez le fichier d'environnement `app/.env` avec les informations de connexion
 cp app/.env.example app/.env
 ```
 
-Les variables d'environnement à renseigner sont les suivantes :
+L'ensemble des variables d'environements disponibles est documenté dans le fichier suivant : [app/config.py](./app/config.py)
 
-- `JOIN_ON_INVITE` : booléen facultatif pour activer ou non l'acceptation automatique des invitations dans les salons (exemple : `JOIN_ON_INVITE=True`. Par défaut, `False`)
-- `SALT` : il est conseillé de changer la valeur du salt pour ne pas avoir celle par défaut. Il faudra en revanche qu'elle de change pas entre deux sessions.
-- `MATRIX_HOME_SERVER` : l'URL du serveur Matrix à utiliser (exemple : `MATRIX_HOME_SERVER="https://matrix.agent.ministere_example.tchap.gouv.fr"`)
-- `MATRIX_BOT_USERNAME` : le nom d'utilisateur du bot Matrix (exemple : `MATRIX_BOT_USERNAME="tchapbot@ministere_example.gouv.fr"`)
-- `MATRIX_BOT_PASSWORD` : le mot de passe du bot Matrix
-- `ERRORS_ROOM_ID` : l'identifiant du salon Tchap où les erreurs seront envoyées (exemple : `ERRORS_ROOM_ID="!roomid:matrix.agent.ministere_example.tchap.gouv.fr"`). **Attention** : le bot doit être invité dans ce salon pour pouvoir y envoyer ses messages d'erreur !
 
-Pour que le bot se connecte à l'API d'Albert, il faut également renseigner les variables suivantes :
-- `USER_ALLOWED_DOMAINS` : liste des domaines d'email autorisés pour les utilisateurs Tchap pour qu'ils puissent interagir avec le bot (exemple : `USER_ALLOWED_DOMAINS='["ministere1.gouv.fr", "ministere2.gouv.fr"]'`. Par défaut : `["*"]` (tous les domaines sont autorisés))
-- `GROUPS_USED=['albert']` : permet, dans cet exemple, d'activer toutes les commandes qui font partie du groupe "albert"
-- `ALBERT_API_URL` : l'url de l'API Albert à consommer
-- `ALBERT_API_TOKEN` : le token API utilisé pour authoriser le bot a consommer l'API Albert. Pour plus d'informations, consultez la documentation de l'API Albert
-- `ALBERT_MODEL_NAME` : le nom du modèle Albert à utiliser pour le bot (exemple : `ALBERT_MODEL_NAME='AgentPublic/albertlight-7b'`). Pour plus d'informations, consultez la documentation de l'API Albert et le [hub des modèles Albert de HuggingFace](https://huggingface.co/collections/AgentPublic/albert-662a1d95c93a47aca5cecc82)
-- `ALBERT_MODE` : le mode d'Albert à utiliser pour le bot (exemple : `ALBERT_MODE='rag'`). Pour plus d'informations, consultez la documentation de l'API Albert
-- `CONVERSATION_OBSOLESCENCE` : le temps en secondes après lequel une conversation se remet automatiquement à zéro (exemple : `CONVERSATION_OBSOLESCENCE=3600` pour une heure). Par défaut : `3600` (une heure)
+### Lancer le bot
 
-
-### Utilisation en dehors de Docker
-
-Pour lancer le bot en dehors de Docker :
-```bash
-cd app
-./.venv/bin/python3 .
-```
-
-
-### Utilisation avec Docker
-
-1. Créez un fichier `.env` à la racine du projet avec les variables d'environnement mentionnées dans [app/.env.example](./app/.env.example) y compris celles mentionnées dans la section *"For docker-compose deployment"*
-
-2. Lancer le container du bot à la racine du projet :
+Pour lancer le bot executez :
 ```bash
-docker compose up --detach
+python app
 ```
 
 
@@ -99,15 +72,7 @@ Le premier sync est assez long, et a priori non bloquant. Si vous avez une inter
 
 Le projet est en open source, sous [licence MIT](LICENSES/MIT.txt). Toutes les contributions sont bienvenues, sous forme de pull requests ou d'ouvertures d'issues sur le [repo officiel GitHub](https://github.com/etalab-ia/albert-tchapbot).
 
-Avant de contribuer au dépôt, il est nécessaire d'initialiser les _hooks_ de _pre-commit_ :
-```bash
-pre-commit install
-```
-
-Si vous ne pouvez pas utiliser de pre-commit, il est nécessaire de formatter, linter et trier les imports avec [Ruff](https://docs.astral.sh/ruff/) :
-```bash
-ruff check --fix --select I .
-```
+Pour commencer, consultez [CONTRIBUTING.md](CONTRIBUTING.md).
 
 
 ### Licence
@@ -137,7 +102,7 @@ The project is a fork of [tchap_bot](https://code.peren.fr/open-source/tchapbot)
 
 Contains:
 - `app/.`: the codebase for the Albert Tchap bot
-- `app/matrix_bot`: a library to be able to make Matrix bots
+- `app/matrix_bot`: a library that wraps [matrix-nio](https://github.com/matrix-nio/matrix-nio) to make Matrix bots
 
 
 ### Local Installation
@@ -165,40 +130,15 @@ Create the environment file `app/.env` with the connection information (or provi
 cp app/.env.example app/.env
 ```
 
-The following environment variables must be entered:
-
-- `JOIN_ON_INVITE`: optional boolean to enable or disable automatic acceptance of invitations to Tchap rooms (example: `JOIN_ON_INVITE=True`. Default: `False`).
-- `SALT`: it is advisable to change the salt value to avoid having the default one. However, it must not change between sessions.
-- `MATRIX_HOME_SERVER`: the URL of the Matrix server to be used (example: `MATRIX_HOME_SERVER=“https://matrix.agent.ministere_example.tchap.gouv.fr”`).
-- `MATRIX_BOT_USERNAME`: the Matrix bot username (example: `MATRIX_BOT_USERNAME=“tchapbot@ministere_example.gouv.fr”`)
-- `MATRIX_BOT_PASSWORD`: the Matrix bot user password
-- `ERRORS_ROOM_ID`: the Tchap room ID where errors will be sent (example: `ERRORS_ROOM_ID=“!roomid:matrix.agent.ministere_example.tchap.gouv.fr”`). **Warning**: the bot must be invited to this room to be able to send error messages!
-
-For the bot to connect to Albert API, you also need to provide the following variables:
-- `USER_ALLOWED_DOMAINS`: list of allowed email domains for Tchap users to interact with the bot (example: `USER_ALLOWED_DOMAINS='["ministere.gouv.fr"]'`. Default: `["*"]` (all domains are allowed))
-- `GROUPS_USED=['albert']`: allows, in this example, to activate all commands that are part of the albert group
-- `ALBERT_API_URL`: the URL of the Albert API to consume
-- `ALBERT_API_TOKEN`: the API token used to authorize the bot to consume the Albert API. For more info, check the Albert API documentation
-- `ALBERT_MODEL_NAME`: the name of the model to use for the bot (example: `ALBERT_MODEL_NAME='AgentPublic/albertlight-7b'`). For more info, check the Albert API documentation and the [Albert models hub on HuggingFace](https://huggingface.co/collections/AgentPublic/albert-662a1d95c93a47aca5cecc82).
-- `ALBERT_MODE`: the mode of Albert to use for the bot (example: `ALBERT_MODE='rag'`). For more info, check the Albert API documentation
-- `CONVERSATION_OBSOLESCENCE` : the time in seconds after which a conversation automatically resets (example: `CONVERSATION_OBSOLESCENCE=3600` for one hour). Default: `3600` (one hour)
+The set of available environment variables is documented in the following file: [app/config.py](./app/config.py)
 
-### Usage outside of Docker
+### Run the bot
 
-To launch the bot outside of Docker:
+To launch the bot:
 ```bash
-cd app
-./.venv/bin/python3 .
+python app
 ```
 
-### Usage with Docker
-
-1. Create a `.env` file at the root of the project with the environment variables mentioned in [app/.env.example](./app/.env.example), including those mentionned in the *"For docker-compose deployment"* section
-
-2. Launch the bot container at the root of the project:
-```bash
-docker compose up --detach
-```
 
 ### Troubleshooting
 
@@ -208,15 +148,7 @@ The first sync is quite long, and apparently non-blocking. If you interact with
 
 This project is open source, under the [MIT license](LICENSES/MIT.txt). All contributions are welcome, in the form of pull requests or issue openings on the [repo officiel GitHub](https://github.com/etalab-ia/albert-tchapbot).
 
-Before contributing to the repository, it is necessary to initialize the pre-commit hooks:
-```bash
-pre-commit install
-```
-
-If you cannot use pre-commit, it is necessary to format, lint, and sort imports with [Ruff](https://docs.astral.sh/ruff/) before committing:
-```bash
-ruff check --fix --select I .
-```
+To get started, take a look at [CONTRIBUTING.md](CONTRIBUTING.md).
 
 ### License
 

diff --git a/app/.env.example b/app/.env.example
@@ -3,8 +3,7 @@
 #
 # SPDX-License-Identifier: CC0-1.0
 
-VERBOSE=False
-SYSTEMD_LOGGING=True
+LOG_LEVEL=10
 JOIN_ON_INVITE=True
 SALT=b"\xce,\xa1\xc6lY\x80\xe3X}\x91\xa60m\xa8N"
 MATRIX_HOME_SERVER="https://matrix.agent.ministere_example.tchap.gouv.fr"
@@ -15,5 +14,5 @@ USER_ALLOWED_DOMAINS='["ministere_example.gouv.fr", "ministere_example2.gouv.fr"
 GROUPS_USED='["basic", "albert"]'
 ALBERT_API_URL="https://albert-server-url.example.com"
 ALBERT_API_TOKEN="INSERT_YOUR_TOKEN"
-ALBERT_MODEL_NAME="AgentPublic/guillaumetell-7b"
+ALBERT_MODEL="AgentPublic/llama3-instruct-8b"
 ALBERT_MODE="rag"
diff --git a/app/_version.py b/app/_version.py
@@ -0,0 +1 @@
+__version__ = "0.2.0"
diff --git a/app/bot.py b/app/bot.py
@@ -3,11 +3,14 @@
 #
 # SPDX-License-Identifier: MIT
 
-from commands import command_registry
-from config import env_config
+import time
+
 from matrix_bot.bot import MatrixBot
 from matrix_bot.config import logger
 
+from commands import command_registry
+from config import env_config
+
 # TODO/IMPROVE:
 # - if albert-bot is invited in a salon, make it answer only when if it is tagged.
 # - !models: show available models.
@@ -37,4 +40,14 @@ def main():
     #    await tchap_bot.matrix_client.send_markdown_message(room_id, command_registry.get_help())
     # tchap_bot.callbacks.register_on_startup(startup_action)
 
-    tchap_bot.run()
+    n_tries = 4
+    err = None
+    for i in range(n_tries):
+        try:
+            tchap_bot.run()
+        except Exception as err:
+            logger.error(f"Bot startup failed with error: {err}")
+            time.sleep(3)
+
+    if err:
+        raise err
diff --git a/app/bot_msg.py b/app/bot_msg.py
@@ -0,0 +1,74 @@
+from config import APP_VERSION, COMMAND_PREFIX, Config
+
+
+class AlbertMsg:
+    common_msg_prefixes = [
+        "👋 Bonjour, je suis **Albert**",
+        "🤖 Configuration actuelle",
+        "\u26a0\ufe0f **Erreur**",
+        "\u26a0\ufe0f **Commande inconnue**",
+        "**La conversation a été remise à zéro**",
+        "🤖 Albert a échoué",
+    ]
+    shorts = {
+        "help": f"Pour retrouver ce message informatif, tapez `{COMMAND_PREFIX}aide`. Pour les geek tapez `{COMMAND_PREFIX}aide -v`.",
+        "reset": f"Pour ré-initialiser notre conversation, tapez `{COMMAND_PREFIX}reset`",
+        "conversation": f"Pour activer/désactiver le mode conversation, tapez `{COMMAND_PREFIX}conversation`",
+        "debug": f"Pour afficher des informations sur la configuration actuelle, `{COMMAND_PREFIX}debug`",
+        "model": f"Pour modifier le modèle, tapez `{COMMAND_PREFIX}model MODEL_NAME`",
+        "mode": f"Pour modifier le mode du modèle (c'est-à-dire le modèle de prompt utilisé), tapez `{COMMAND_PREFIX}mode MODE`",
+        "sources": f"Pour obtenir les sources utilisées pour générer ma dernière réponse, tapez `{COMMAND_PREFIX}sources`",
+    }
+
+    failed = "🤖 Albert a échoué à répondre. Veuillez réessayez dans un moment."
+
+    reset = "**La conversation a été remise à zéro**. Vous pouvez néanmoins toujours répondre dans un fil de discussion."
+
+    user_not_allowed = "Albert est en phase de test et n'est pas encore disponible pour votre utilisateur. Contactez [email protected] pour demander un accès."
+
+    domain_not_allowed = "Albert n'est pas encore disponible pour votre domaine. Merci de rester en contact, il sera disponible après une phase beta test."
+
+    def error_debug(reason, config):
+        msg = f"\u26a0\ufe0f **Albert API error**\n\n{reason}\n\n- Albert API URL: {config.albert_api_url}\n- Matrix server: {config.matrix_home_server}"
+        return msg
+
+    def help(model_url, model_short_name, cmds):
+        msg = "👋 Bonjour, je suis **Albert**, votre **assistant automatique dédié aux questions légales et administratives** mis à disposition par la **DINUM**. Je suis actuellement en phase de **test**.\n\n"
+        msg += f"J'utilise le modèle de langage _[{model_short_name}]({model_url})_ et j'ai été alimenté par des bases de connaissances gouvernementales, comme les fiches pratiques de service-public.fr éditées par la Direction de l'information légale et administrative (DILA).\n\n"
+        msg += "Maintenant que nous avons fait plus connaissance, quelques **règles pour m'utiliser** :\n\n"
+        msg += "🔮 Ne m'utilisez pas pour élaborer une décision administrative individuelle.\n\n"
+        msg += "❌ **Ne me transmettez pas** :\n"
+        msg += "- des **fichiers** (pdf, images, etc.) ;\n"
+        msg += "- des données permettant de **vous** identifier ou **d'autres personnes** ;\n"
+        msg += "- des données **confidentielles** ;\n\n"
+        msg += "Enfin, quelques informations pratiques :\n\n"
+        msg += "🛠️ **Pour gérer notre conversation** :\n"
+        msg += "- " + "\n- ".join(cmds)
+        msg += "\n\n"
+        msg += "📁 **Sur l'usage des données**\nLes conversations sont stockées de manière anonyme. Elles me permettent de contextualiser les conversations et l'équipe qui me développe les utilise pour m'évaluer et analyser mes performances.\n\n"
+        msg += "📯 Nous contacter : [email protected]"
+
+        return msg
+
+    def commands(cmds):
+        msg = "Les commandes spéciales suivantes sont disponibles :\n\n"
+        msg += "- " + "\n- ".join(cmds)
+        return msg
+
+    def unknown_command(cmds_msg):
+        msg = f"\u26a0\ufe0f **Commande inconnue**\n\n{cmds_msg}"
+        return msg
+
+    def reset_notif(delay_min):
+        msg = f"Comme vous n'avez pas continué votre conversation avec Albert depuis plus de {delay_min} minutes, **la conversation a été automatiquement remise à zéro**. Vous pouvez néanmoins toujours répondre dans un fil de discussion.\n\n"
+        msg += "Entrez **!aide** pour obtenir plus d'informatin sur ma paramétrisatiion."
+        return msg
+
+    def debug(config: Config):
+        msg = "🤖 Configuration actuelle :\n\n"
+        msg += f"- Version: {APP_VERSION}\n"
+        msg += f"- API: {config.albert_api_url}\n"
+        msg += f"- Model: {config.albert_model}\n"
+        msg += f"- Mode: {config.albert_mode}\n"
+        msg += f"- With history: {config.albert_with_history}\n"
+        return msg