Addon should auto restart #224

Open
MaisonGF opened this issue Nov 17, 2024 · 1 comment

Comments

@MaisonGF

I got these errors last night after a brief power outage:

```
websockets.exceptions.ConnectionClosedOK: sent 1000 (OK); then received 1000 (OK)
2024-11-16 23:54:17,127 - asyncio              - ERROR   - Future exception was never retrieved
future: <Future finished exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=<CloseCode.NORMAL_CLOSURE: 1000>, reason=''), False)>
websockets.exceptions.ConnectionClosedOK: sent 1000 (OK); then received 1000 (OK)
2024-11-16 23:54:17,127 - asyncio              - ERROR   - Future exception was never retrieved
future: <Future finished exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=<CloseCode.NORMAL_CLOSURE: 1000>, reason=''), False)>
websockets.exceptions.ConnectionClosedOK: sent 1000 (OK); then received 1000 (OK)
2024-11-16 23:54:17,145 - asyncio              - ERROR   - Future exception was never retrieved
future: <Future finished exception=ConnectionClosedOK(Close(code=1000, reason=''), Close(code=<CloseCode.NORMAL_CLOSURE: 1000>, reason=''), False)>
websockets.exceptions.ConnectionClosedOK: sent 1000 (OK); then received 1000 (OK)
2024-11-16 23:54:18,280 - Starting tydom2mqtt
2024-11-16 23:54:18,281 - Hassio environment detected: loading configuration from /data/options.json
2024-11-16 23:54:18,281 - Validating configuration ({
```

`2024-11-16 23:54:18,280 - Starting tydom2mqtt`: this is where I restarted the add-on manually, and there were no issues after that.

The add-on should either crash on error, so that Home Assistant can restart it, or restart itself; otherwise it is not reliable. In the original mrwiwi version, the add-on restarted itself (forever.py relaunched the main script after a crash). IMHO it was a lot more resilient (but that version no longer works).
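For reference, the forever.py approach boils down to a small supervisor process that relaunches the main script whenever it exits with an error. A minimal sketch, assuming the add-on's entry point can be started as a separate command; `run_forever` and its parameters are hypothetical, and this is not the original forever.py code:

```python
import subprocess
import sys
import time


def run_forever(cmd, delay=5.0, max_restarts=None):
    """Run cmd and relaunch it each time it exits with a non-zero code.

    A clean exit (code 0) ends supervision. max_restarts=None means
    relaunch indefinitely. Returns the list of exit codes observed.
    """
    exit_codes = []
    restarts = 0
    while True:
        # Block until the child process exits, then inspect its exit code
        code = subprocess.run(cmd).returncode
        exit_codes.append(code)
        if code == 0:
            return exit_codes
        if max_restarts is not None and restarts >= max_restarts:
            return exit_codes
        restarts += 1
        print(f"{cmd} exited with code {code}; restarting in {delay}s",
              file=sys.stderr, flush=True)
        # Pause so a script that fails instantly does not spin the CPU
        time.sleep(delay)
```

It would be started with something like `run_forever([sys.executable, "main.py"])`, where `main.py` stands in for the add-on's real main script.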

Could you make the add-on restart itself after, for example, two errors?
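The "Future exception was never retrieved" messages in the log are reported through the asyncio event loop's exception handler, so a handler that counts them could provide the "restart after two errors" behaviour: once the budget is spent, the process exits with a non-zero code and whatever supervises it (assuming the add-on's Watchdog option is enabled in Home Assistant, or a forever.py-style wrapper) can start it again. A minimal sketch, assuming the add-on's main coroutine can be wrapped this way; `ErrorBudget` and `supervise` are hypothetical names, not part of tydom2mqtt:

```python
import asyncio


class ErrorBudget:
    """asyncio exception handler that signals once max_errors have occurred."""

    def __init__(self, max_errors):
        self.max_errors = max_errors
        self.count = 0
        self.exhausted = asyncio.Event()

    def __call__(self, loop, context):
        # Keep asyncio's usual log output, then count the error
        loop.default_exception_handler(context)
        self.count += 1
        if self.count >= self.max_errors:
            self.exhausted.set()


async def supervise(work, max_errors=2):
    """Run work(); return 1 if the error budget runs out first, else 0."""
    loop = asyncio.get_running_loop()
    budget = ErrorBudget(max_errors)  # created inside the running loop
    loop.set_exception_handler(budget)
    work_task = asyncio.create_task(work())
    gave_up = asyncio.create_task(budget.exhausted.wait())
    done, pending = await asyncio.wait(
        {work_task, gave_up}, return_when=asyncio.FIRST_COMPLETED
    )
    # Stop whichever task is still running and wait for it to finish
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    if gave_up in done:
        return 1
    work_task.result()  # re-raise if the work itself crashed
    return 0
```

In the add-on this would be invoked as `sys.exit(asyncio.run(supervise(main)))`, with `main` standing in for the real top-level coroutine.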

Thanks in advance, and thanks for the good work!

@MaisonGF
Author

MaisonGF commented Nov 17, 2024

For the time being, on a Supervised installation, I created a systemd service running a Python script that monitors the add-on's Docker logs and restarts the container when an error appears:

You can set it up with this bash script (thanks, ChatGPT):

```bash
#!/bin/bash

# Variables
SERVICE_NAME="monitor_tydom"
SCRIPT_PATH="/usr/local/bin/monitor_tydom.py"
SERVICE_FILE="/etc/systemd/system/${SERVICE_NAME}.service"

# Step 1: Create the Python script
cat << 'EOF' > "$SCRIPT_PATH"
#!/usr/bin/env python3

import subprocess
import time

# Configuration
SEARCH_TERM = "tydom2mqtt"
ERROR_KEYWORD = "ERROR"

def get_container_name(search_term):
    try:
        # Find the running container whose name contains the search term
        result = subprocess.run(
            ["docker", "ps", "--format", "{{.Names}}"],
            stdout=subprocess.PIPE,
            text=True,
            check=True
        )
        containers = result.stdout.splitlines()
        for container in containers:
            if search_term in container:
                return container
    except subprocess.CalledProcessError as e:
        print(f"Error retrieving container list: {e}")
    return None

def monitor_logs(container_name):
    try:
        # Follow only new log lines (--tail 0), so errors logged before a
        # restart are not replayed and do not trigger yet another restart.
        # stderr is merged into stdout so errors written to either stream
        # are seen (Python's logging module writes to stderr by default).
        process = subprocess.Popen(
            ["docker", "logs", "-f", "--tail", "0", container_name],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True
        )

        for line in process.stdout:
            # Check whether "ERROR" appears in the log line
            if ERROR_KEYWORD in line:
                print(f"Error detected: {line.strip()}")
                # Stop following first; otherwise every remaining line of
                # an error burst would trigger its own restart
                process.terminate()
                restart_container(container_name)
                break
        # Reap the docker logs process
        process.wait()

    except Exception as e:
        print(f"Exception occurred: {e}")

def restart_container(container_name):
    try:
        print(f"Restarting container {container_name}...")
        subprocess.run(["docker", "restart", container_name], check=True)
        print(f"Container {container_name} restarted successfully.")
    except subprocess.CalledProcessError as e:
        print(f"Failed to restart container: {e}")

if __name__ == "__main__":
    container_name = get_container_name(SEARCH_TERM)
    if container_name:
        print(f"Monitoring logs for container: {container_name}")
        while True:
            monitor_logs(container_name)
            # Pause to avoid a tight loop if the log stream keeps ending
            time.sleep(5)
    else:
        print(f"No container found with search term: {SEARCH_TERM}")
EOF

chmod +x "$SCRIPT_PATH"

# Step 2: Create the systemd unit file
cat << EOF > "$SERVICE_FILE"
[Unit]
Description=Monitor Tydom2MQTT Docker logs and restart on errors
After=docker.service
Requires=docker.service

[Service]
ExecStart=$SCRIPT_PATH
# Unbuffered Python output, so print() messages reach the journal immediately
Environment=PYTHONUNBUFFERED=1
Restart=always
RestartSec=10
StandardOutput=journal
StandardError=journal
User=root

[Install]
WantedBy=multi-user.target
EOF

# Step 3: Enable and start the service
systemctl daemon-reload
systemctl enable ${SERVICE_NAME}.service
systemctl start ${SERVICE_NAME}.service

echo "Service ${SERVICE_NAME} created and started successfully."
```
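The script above restarts the container on the very first line containing "ERROR". To get closer to the "restart after two errors" idea, the keyword check could go through a small sliding-window counter. A minimal sketch; `ErrorWindow` and its parameters are hypothetical and not part of the script:

```python
import time
from collections import deque


class ErrorWindow:
    """Signal a restart once max_errors errors occur within window seconds."""

    def __init__(self, max_errors=2, window=60.0, clock=time.monotonic):
        self.max_errors = max_errors
        self.window = window
        self.clock = clock  # injectable so the logic can be tested
        self.times = deque()

    def record(self):
        """Record one error; return True when the threshold is reached."""
        now = self.clock()
        self.times.append(now)
        # Forget errors that are older than the window
        while self.times and now - self.times[0] > self.window:
            self.times.popleft()
        if len(self.times) >= self.max_errors:
            self.times.clear()  # start counting afresh after a restart
            return True
        return False
```

Inside `monitor_logs`, the check would then read `if ERROR_KEYWORD in line and errors.record():`, with `errors = ErrorWindow()` created once before the loop. A burst like the one in the log above (several errors within a second) still triggers a restart; the window only filters out isolated errors.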
