Skip to content

tested locally on my mac #35

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 7 commits into
base: master
Choose a base branch
from
Draft

tested locally on my mac #35

wants to merge 7 commits into from

Conversation

mortenjc
Copy link
Collaborator

@mortenjc mortenjc commented May 8, 2025

I added a status server to the filewriter

Since this is the first non EFU that reports a status a few changed were necessary:

type_fw = 5 was added
check_fw_pipeline() was added to parse the response from the server
check_service() was updated to handle fw
some formatting

Comment on lines 142 to 161
def check_fw_pipeline(self, ipaddr, port):
if self.test:
return 5
try:
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect((ipaddr, port))
s.send(b"getstatus")
data = s.recv(256)
data2 = int(data.decode("utf-8").split()[1][0])
s.close()
return data2
except:
self.dprint("connection reset (by peer?)")
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest switching to JSON for status messages from the services we poll in the dashboard and be more verbose with the status codes.

pros:

  • easier parsing on the Python side: we can access human readable keys directly as a dictionary, rather than splitting strings and working with indexes[0][1].

  • more specific error handling - for example, if the JSON is malformed, we get a JSONDecodeError. If a key is missing, we can catch a KeyError and handle it accordingly.

Right now, any issue in the try block is treated as a connection error, which can be misleading if the actual problem is with the message format.

class ServiceStatus(IntEnum):
    WORKING = 5
    IDLE = 1
    ERROR = 2
    STOPPED = 3
    INVALID = 4
def request_status(host, port):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.connect((host, port))
        s.send(b"getstatus")

        # Read until newline delimiter
        data = b""
        while not data.endswith(b"\n"):
            chunk = s.recv(256)
            if not chunk:
                break
            data += chunk

        message = data.decode('utf-8').strip()
        try:
            return json.loads(message)
        except JSONDecodeError:
            logger.error(f"Error decoding JSON: {message}")
            return { "status": ServiceStatus.INVALID.value, "message": f"Didn't receive valid message from the service: {host}:{port}" }

then we can define check_status like that:

def check_status(self):
    response = request_status(host, port)
    try:
        status = ServiceStatus(int(response["status"]))
    except KeyError:
        logger.error(f"Status message must contain 'status' field. Ignoring message: {response}")
    except ValueError:
        logger.error(f"Status message contains invalid 'status' value. Valid status values: {list(ServiceStatus)}. Ignoring message: {response}")
    else:
        logging.debug("Setting new status: {self.idx} {status.name}")
        self.lab.setstatus(self.idx, status.value)

the debug message could look like this:

Setting new status: 127.0.0.1 WORKING

example message json:

{"status": 1, "message": "optional debug message"} # for filewriters

{"status": 5, "stage1": 5, "stage2": 5, "stage2": 5} # for efus

{"status": 2, "message": "Unhealthy", "uptime": "2d4h"} # anything else

If we agree on a common status message format for all services, we can reuse request_status and only implement a separate check_status function that extracts the necessary information from the JSON, depending on the service being polled.

I think this will make the code easier to read and debug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those ServiceStatus enumerations are what I was thinking of on the call @milnowak

I would suggest:
WRITING (green)
IDLE (green with lines)
STARTING (light green)
FINISHING (green)
FAILED (red)

and a status for OFFLINE (grey - response timed out; would need to be checked separately with ping like for EFUs)

Hopefully makes sense as a first pass from the meeting

Comment on lines 157 to 165
for line in lines:
try:
obj = json.loads(line)
# Look for any key ending with '.worker_state'
for key, value in obj.items():
if key.endswith(".worker_state"):
return int(value)
except Exception:
continue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for line in lines:
try:
obj = json.loads(line)
# Look for any key ending with '.worker_state'
for key, value in obj.items():
if key.endswith(".worker_state"):
return int(value)
except Exception:
continue
for line in lines:
obj = json.loads(line)
# Look for any key ending with '.worker_state'
for key, value in obj.items():
if key.endswith(".worker_state"):
return int(value)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this try/except block isn't necessary. When the line is {}, json.loads works fine and the inner for loop is simply skipped (no items, empty list). The only potential issue is a ValueError when converting to int but I’d rather let that fail. The error message will clearly show that the server sent something unexpected.

Suggested change
for line in lines:
try:
obj = json.loads(line)
# Look for any key ending with '.worker_state'
for key, value in obj.items():
if key.endswith(".worker_state"):
return int(value)
except Exception:
continue
for line in lines:
obj = json.loads(line)
# Look for any key ending with '.worker_state'
for key, value in obj.items():
if key.endswith(".worker_state"):
return int(value)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the input! 🙌
I'll have it in mind going forward

I actually didn't want to have this to be reviewed since I'm yet to have it working (even with these changes). I'll have the PR as DRAFT until it's ready to be reviewed. Sorry not making it clear 😞

Comment on lines 167 to 169
except Exception:
self.dprint("connection reset (by peer?)")
return 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in this case the exception handling might be hiding the real error and just assuming the connection was reset. It could make it harder to understand what actually went wrong.

We should at least handle OSError and ConnectionError

except ConnectionError:
    print("Connection reset by peer")
except OSError as e:
    print(f"Socket error: {e}")

about workerState=0. does that mean the worker is idling? If the data is missing or couldn’t be parsed, maybe using something like -1 would make it clearer that the value is unknown, rather than a valid state.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ConnectionError is a subclas of OSError so catching only OSError is just fine.

@Andre-Lx-Costa Andre-Lx-Costa marked this pull request as draft June 20, 2025 12:30
@Andre-Lx-Costa
Copy link

WIP - Will be back from DRAFT as soon as it is ready 👍

@Andre-Lx-Costa Andre-Lx-Costa self-assigned this Jun 20, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants