Attendance and notes for 8/13 meeting #145

Merged 1 commit on Aug 19, 2024
37 changes: 32 additions & 5 deletions docs/meetings/2024/TechArea20240813.md
@@ -2,15 +2,19 @@

- **Coordinates:** Conference: +1-415-655-0002, PIN: 146 266 9392,
<https://morgridge-org.zoom.us/j/91987518094> (password sent separately)
- **Attending:**
- **Attending:** BrianL, Derek, Marco, Mat, Matt

## Announcements

- Doc focus this Friday
- PATh PI meeting for the next three days; Brian will be on call
- Tim T out part of this week and all of next week

### Triage Duty

Triage duty shifts Tue-Mon

- This week: TimT
- This week: Matt (replacing TimT)
- Next week: ?
- 7 (+4) open FreshDesk tickets
- 0 (+0) open GGUS tickets
@@ -30,21 +34,44 @@ Triage duty shifts Tue-Mon

Doc focus this Friday

- AI (Matt): Kuantifier status
- AI (Mat): ARM
- AI (Matt): Kuantifier status:

    If Kubernetes Jobs don't have a resource request, then the processor count shows up as 0;
    failing loudly on such a misconfiguration is not really possible, but a warning can be added,
    along with notes in the Helm chart and an install guide.

- AI (Mat): ARM:

    Koji builders are ready; no mass rebuild because that would require bumping the package release numbers.
Mat has been creating tickets to rebuild individual software as needed.

Next step for ARM is to add integration testing -- we will need to make VM images for the VM Universe jobs.
Nebraska has an ARM machine in a Kubernetes cluster; we may be able to make use of that.

- AI (Matt): EL9 repo
- AI (Mat): Contribute VOMS patches upstream
- AI (BrianL): Prepare tickets for this Friday's doc focus
- BrianL working on purchasing USB hubs for Yubikeys
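
The Kuantifier item above mentions warning when a Kubernetes Job has no resource request, since its processor count would otherwise be reported as 0. A minimal sketch of such a check follows. It is not taken from Kuantifier's code: the function names `parse_cpu` and `requested_cpus` are hypothetical, and the input is assumed to be a pod spec already loaded as a plain dictionary using the standard `containers[].resources.requests.cpu` fields.

```python
import logging

logger = logging.getLogger("cpu_request_check")  # hypothetical logger name


def parse_cpu(quantity):
    """Convert a Kubernetes CPU quantity such as "500m" or "2" to cores."""
    quantity = str(quantity)
    if quantity.endswith("m"):
        # Millicore notation: "500m" means half a core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def requested_cpus(pod_spec):
    """Sum CPU requests over a pod's containers.

    Containers without a CPU request contribute 0 and trigger a warning
    instead of an error, so accounting continues for the remaining pods.
    """
    total = 0.0
    for container in pod_spec.get("containers", []):
        requests = (container.get("resources") or {}).get("requests") or {}
        cpu = requests.get("cpu")
        if cpu is None:
            logger.warning(
                "container %r has no CPU request; counting it as 0 processors",
                container.get("name"),
            )
            continue
        total += parse_cpu(cpu)
    return total
```

Logging a warning rather than raising matches the note that failing loudly is not really possible here; whether the real tool would check at the container, pod, or Job level is an open assumption.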

### Discussion

None this week
- Marco continuing work on GlideinWMS development release; release candidates are available.
Fermilab is shutting down for the last week of August and first week of September.

- Derek reporting low GPU utilization for NRAO Glideins on NRP;
Brian will show him how to log in to the PATh Facility AP so he can see the status of NRAO's jobs.

- Matt fixed the repo-rsync server, which was down due to a mismatch of VLANs between the Service and Pods.

### Support Update

- JLab (Mat): Support setting up their Pelican Origin
- JLab (Matt): Troubleshoot crashing OAuth credmon
- PATh Facility (Mat): missing GPUs
- PATH-UNL had nodes shut down due to overheating.
- PATH-Expanse had GPU pods (and CPU pods) failing to start -- at first we thought it was the
return of a volume mounting issue we've seen before (see https://opensciencegrid.atlassian.net/issues/INF-1672);
later we discovered that it was due to an outage at SDSC.

## DevOps
