service-wide data collection: `collect` #64

jadudm · 2025-01-03T14:19:44Z

Problem

We need multiple kinds of data from our services.

Service performance. The service itself has operating parameters we'd like to track. For example, we can use Golang's memory monitoring tools to capture and analyze heap usage, allocations, GC cycles, and similar runtime statistics. This will help us spot memory leaks, if we have them.
Application performance. At the application (as opposed to system) level, we may want to track the time taken by some activities. For example, how much time do we spend reading/writing objects to S3? To Postgres? Unlike memory/heap usage, we have to explicitly capture this data as part of our application logic. (For example, we may want to periodically query user table statistics.)
Service use. What queries do we receive, from where, and at what rate? What is the clickthrough rate on queries?

These and other needs will arise. What we want is a common architecture that is in keeping with the rest of the stack for capturing this data.

This is fundamental to our understanding of our system's operation, and will be critical for reporting out to leadership as well as our stakeholders.

When I am asked by leadership about how people are using Search.gov, I want that data to be ready to hand, so I can tell a story about how we are serving our agency partners and the public.
When I need to debug a service, I want to have application performance data ready to go so I can understand the long-term behavior of our services, as well as any immediate logs around the performance issue in question.

We need a new service. We'll call it collect for the moment.

This is described in jadudm/collect-design/docs/architecture/collect.md. That document should be updated to reflect design and discussion changes.

The collect service is integrated into the stack.
A schema exists for the top-level data object, and is applied to all data.
A schema is added to collect for every data object we collect from other services.
Data is stored to S3 as JSON objects.

jadudm added the story label Jan 3, 2025

jadudm added this to jemison Jan 3, 2025

github-project-automation bot moved this to triage in jemison Jan 3, 2025

jadudm moved this from triage to backlog in jemison Jan 3, 2025

jadudm moved this from backlog to triage in jemison Jan 14, 2025

jadudm moved this from triage to backlog in jemison Jan 21, 2025

jadudm moved this from backlog to underway in jemison Jan 21, 2025

jadudm assigned jadudm and luisgmetzger and unassigned jadudm Jan 21, 2025

jadudm mentioned this issue Jan 22, 2025

Design of a collect service #92

Open

13 tasks