-
Notifications
You must be signed in to change notification settings - Fork 1
tested locally on my mac #35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
dashboard/generate.py
Outdated
def check_fw_pipeline(self, ipaddr, port): | ||
if self.test: | ||
return 5 | ||
try: | ||
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) | ||
s.connect((ipaddr, port)) | ||
s.send(b"getstatus") | ||
data = s.recv(256) | ||
data2 = int(data.decode("utf-8").split()[1][0]) | ||
s.close() | ||
return data2 | ||
except: | ||
self.dprint("connection reset (by peer?)") | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest switching to JSON for status messages from the services we poll in the dashboard and be more verbose with the status codes.
pros:
-
easier parsing on the Python side: we can access human readable keys directly as a dictionary, rather than splitting strings and working with indexes[0][1].
-
more specific error handling - for example, if the JSON is malformed, we get a JSONDecodeError. If a key is missing, we can catch a KeyError and handle it accordingly.
Right now, any issue in the try
block is treated as a connection error, which can be misleading if the actual problem is with the message format.
class ServiceStatus(IntEnum):
WORKING = 5
IDLE = 1
ERROR = 2
STOPPED = 3
INVALID = 4
def request_status(host, port):
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
s.connect((host, port))
s.send(b"getstatus")
# Read until newline delimiter
data = b""
while not data.endswith(b"\n"):
chunk = s.recv(256)
if not chunk:
break
data += chunk
message = data.decode('utf-8').strip()
try:
return json.loads(message)
except JSONDecodeError:
logger.error(f"Error decoding JSON: {message}")
return { "status": ServiceStatus.INVALID.value, "message": f"Didn't receive valid message from the service: {host}:{port}" }
then we can define check_status like that:
def check_status(self):
response = request_status(host, port)
try:
status = ServiceStatus(int(response["status"]))
except KeyError:
logger.error(f"Status message must contain 'status' field. Ignoring message: {response}")
except ValueError:
logger.error(f"Status message contains invalid 'status' value. Valid status values: {list(ServiceStatus)}. Ignoring message: {response}")
else:
logging.debug("Setting new status: {self.idx} {status.name}")
self.lab.setstatus(self.idx, status.value)
the debug message could look like this:
Setting new status: 127.0.0.1 WORKING
example message json:
{"status": 1, "message": "optional debug message"} # for filewriters
{"status": 5, "stage1": 5, "stage2": 5, "stage2": 5} # for efus
{"status": 2, "message": "Unhealthy", "uptime": "2d4h"} # anything else
If we agree on a common status message format for all services, we can reuse request_status
and only implement a separate check_status
function that extracts the necessary information from the JSON, depending on the service being polled.
I think this will make the code easier to read and debug.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those ServiceStatus enumerations are what I was thinking of on the call @milnowak
I would suggest:
WRITING (green)
IDLE (green with lines)
STARTING (light green)
FINISHING (green)
FAILED (red)
and a status for OFFLINE (grey - response timed out; would need to be checked separately with ping like for EFUs)
Hopefully makes sense as a first pass from the meeting
c0ed4bc
to
bb9ff22
Compare
dashboard/generate.py
Outdated
for line in lines: | ||
try: | ||
obj = json.loads(line) | ||
# Look for any key ending with '.worker_state' | ||
for key, value in obj.items(): | ||
if key.endswith(".worker_state"): | ||
return int(value) | ||
except Exception: | ||
continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for line in lines: | |
try: | |
obj = json.loads(line) | |
# Look for any key ending with '.worker_state' | |
for key, value in obj.items(): | |
if key.endswith(".worker_state"): | |
return int(value) | |
except Exception: | |
continue | |
for line in lines: | |
obj = json.loads(line) | |
# Look for any key ending with '.worker_state' | |
for key, value in obj.items(): | |
if key.endswith(".worker_state"): | |
return int(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this try/except
block isn't necessary. When the line
is {}
, json.loads works fine and the inner for loop is simply skipped (no items, empty list). The only potential issue is a ValueError when converting to int but I’d rather let that fail. The error message will clearly show that the server sent something unexpected.
for line in lines: | |
try: | |
obj = json.loads(line) | |
# Look for any key ending with '.worker_state' | |
for key, value in obj.items(): | |
if key.endswith(".worker_state"): | |
return int(value) | |
except Exception: | |
continue | |
for line in lines: | |
obj = json.loads(line) | |
# Look for any key ending with '.worker_state' | |
for key, value in obj.items(): | |
if key.endswith(".worker_state"): | |
return int(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the input! 🙌
I'll have it in mind going forward
I actually didn't want to have this to be reviewed since I'm yet to have it working (even with these changes). I'll have the PR as DRAFT until it's ready to be reviewed. Sorry not making it clear 😞
dashboard/generate.py
Outdated
except Exception: | ||
self.dprint("connection reset (by peer?)") | ||
return 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in this case the exception handling might be hiding the real error and just assuming the connection was reset. It could make it harder to understand what actually went wrong.
We should at least handle OSError and ConnectionError
except ConnectionError:
print("Connection reset by peer")
except OSError as e:
print(f"Socket error: {e}")
about workerState=0. does that mean the worker is idling? If the data is missing or couldn’t be parsed, maybe using something like -1 would make it clearer that the value is unknown, rather than a valid state.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ConnectionError
is a subclas of OSError
so catching only OSError
is just fine.
1ac5926
to
50c53f2
Compare
WIP - Will be back from DRAFT as soon as it is ready 👍 |
1d0838b
to
f3e5769
Compare
I added a status server to the filewriter
Since this is the first non EFU that reports a status a few changed were necessary:
type_fw = 5 was added
check_fw_pipeline() was added to parse the response from the server
check_service() was updated to handle fw
some formatting