service-wide data collection: collect #64

Open
4 tasks
jadudm opened this issue Jan 3, 2025 · 0 comments

jadudm commented Jan 3, 2025

Problem

We need multiple kinds of data from our services.

  1. Service performance (system level). The service itself has operating parameters we'd like to track. For example, we can use Go's memory monitoring tools to capture and analyze heap usage, allocations, GC cycles, and so forth. This will help us spot memory leaks, if we have them.
  2. Service performance (application level). At the application (as opposed to system) level, we may want to track how long certain activities take. For example, how much time do we spend reading and writing objects to S3? To Postgres? Unlike memory and heap usage, we have to capture this data explicitly as part of our application logic. (For example, we may want to periodically query user table statistics.)
  3. Service use. What queries do we receive, from where, and at what rate? What is the clickthrough rate on queries?
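
As a rough illustration of the first item, the Go standard library exposes heap and GC counters through `runtime.ReadMemStats`. The sketch below takes one snapshot of a few of those counters. The `MemSnapshot` type and `TakeMemSnapshot` function are hypothetical names chosen for this example, not part of any existing service code, and the choice of fields is an assumption about what would be worth tracking.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// MemSnapshot is a hypothetical record holding a few runtime memory counters.
// The field selection is an assumption, not a settled design.
type MemSnapshot struct {
	TakenAt        time.Time
	HeapAllocBytes uint64 // bytes of allocated heap objects
	HeapObjects    uint64 // number of allocated heap objects
	TotalMallocs   uint64 // cumulative count of heap allocations
	NumGC          uint32 // completed GC cycles
}

// TakeMemSnapshot reads the current runtime memory statistics.
// Note that runtime.ReadMemStats briefly stops the world, so it
// should be called periodically rather than on every request.
func TakeMemSnapshot() MemSnapshot {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return MemSnapshot{
		TakenAt:        time.Now(),
		HeapAllocBytes: m.HeapAlloc,
		HeapObjects:    m.HeapObjects,
		TotalMallocs:   m.Mallocs,
		NumGC:          m.NumGC,
	}
}

func main() {
	s := TakeMemSnapshot()
	fmt.Printf("heap=%d bytes objects=%d mallocs=%d gcs=%d\n",
		s.HeapAllocBytes, s.HeapObjects, s.TotalMallocs, s.NumGC)
}
```

Comparing snapshots taken at intervals (for example, heap bytes that keep growing across GC cycles) is one way such data could help surface a leak.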

These and other needs will arise. What we want is a common architecture that is in keeping with the rest of the stack for capturing this data.

How did we discover this problem?

This is fundamental to our understanding of our system's operation, and will be critical for reporting out to leadership as well as our stakeholders.

Job Story(s)

  • When leadership asks me how people are using Search.gov, I want that data readily at hand, so I can tell a story about how we are serving our agency partners and the public.
  • When I need to debug a service, I want application performance data ready to go, so I can understand the long-term behavior of our services as well as any immediate logs around the performance issue in question.

What are we planning to do about it?

We need a new service. We'll call it collect for the moment.

This is described in jadudm/collect-design/docs/architecture/collect.md. That document should be updated to reflect design and discussion changes.

How will we measure success?

Tasks

@jadudm jadudm added the story label Jan 3, 2025
@jadudm jadudm added this to jemison Jan 3, 2025
@github-project-automation github-project-automation bot moved this to triage in jemison Jan 3, 2025
@jadudm jadudm moved this from triage to backlog in jemison Jan 3, 2025
@jadudm jadudm moved this from backlog to triage in jemison Jan 14, 2025
@jadudm jadudm moved this from triage to backlog in jemison Jan 21, 2025
@jadudm jadudm moved this from backlog to underway in jemison Jan 21, 2025
@jadudm jadudm assigned jadudm and luisgmetzger and unassigned jadudm Jan 21, 2025
Status: underway