You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Service performance. The service itself has operating parameters we'd like to track. For example, we can use Golang's memory monitoring tools to capture and analyze things like heap usage, allocations, GCs, and so forth. This will help us spot leaks (if we have them) and so on.
Service performance. At the application (as opposed to system) level, we may want to track the time of some activities. For example, how much time do we spend reading/writing objects to S3? Postgres? Unlike memory/heap usage, we have to explicitly capture this data as part of our application logic. (For example, we may want to periodically query user table statistics.
Service use. What queries do we receive, from where, and at what rate? What is the clickthrough rate on queries?
These and other needs will arise. What we want is a common architecture that is in keeping with the rest of the stack for capturing this data.
How did we discover this problem?
This is fundamental to our understanding of our system's operation, and will be critical for reporting out to leadership as well as our stakeholders.
Job Story(s)
When I am asked by leadership about how people are using Search.gov, I want that data to be readily to hand, so I can tell a story about how we are serving our agency partners and the public.
When I need to debug a service, I want to have application performance data ready to go so I can understand the long-term behavior of our services, as well as any immediate logs around the malperformance issue in question.
What are we planning to do about it?
We need a new service. We'll call it collect for the moment.
Problem
We need multiple kinds of data from our services.
These and other needs will arise. What we want is a common architecture that is in keeping with the rest of the stack for capturing this data.
How did we discover this problem?
This is fundamental to our understanding of our system's operation, and will be critical for reporting out to leadership as well as our stakeholders.
Job Story(s)
What are we planning to do about it?
We need a new service. We'll call it
collect
for the moment.This is described in jadudm/collect-design/docs/architecture/collect.md. That document should be updated to reflect design and discussion changes.
How will we measure success?
Tasks
The text was updated successfully, but these errors were encountered: