For more info on our data stack tools, see the [tools page](tools.md).

### Transcript Events from DotCom Users

For DotCom users, we are permitted to store transcript data. To handle this sensitive data safely and restrict access to it, the following event pipeline has been built on top of the telemetry-v2 architecture; it routes flagged transcript events separately.

#### Considerations:

1. Transcript data can only be collected through v2 telemetry and must be stored in the `privateMetadata` field of the event.
2. Transcript data can only be collected for DotCom (Free) users.
3. Events carrying transcript data must include `recordsPrivateMetadataTranscript:1` in the `metadata` field of the event.
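The three considerations can be read as a single eligibility check. The sketch below is illustrative only: it assumes a decoded v2 event whose `metadata` is a mapping of keys to numbers and whose transcript payload lives in `privateMetadata`, and it takes a hypothetical `is_dotcom` flag because how the real pipeline determines DotCom origin is not described here.

```python
from typing import Any, Mapping

# Flag key/value taken from consideration 3.
TRANSCRIPT_FLAG_KEY = "recordsPrivateMetadataTranscript"
TRANSCRIPT_FLAG_VALUE = 1


def is_transcript_event(event: Mapping[str, Any], is_dotcom: bool) -> bool:
    """Return True if a (v2) event satisfies all three considerations.

    `is_dotcom` is a hypothetical input; the event shape is assumed.
    """
    if not is_dotcom:
        # Consideration 2: transcripts are only collected for DotCom (Free) users.
        return False
    metadata = event.get("metadata") or {}
    if metadata.get(TRANSCRIPT_FLAG_KEY) != TRANSCRIPT_FLAG_VALUE:
        # Consideration 3: the event must be explicitly flagged.
        return False
    # Consideration 1: the transcript itself must sit in privateMetadata.
    return bool(event.get("privateMetadata"))
```

Keeping the check as a pure function makes each consideration easy to test in isolation.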

#### Internal-only links to where the backend GCP changes live:

##### Pub/Sub Topic Subscriptions

- [event-telemetry-transcript-to-gcs](https://console.cloud.google.com/cloudpubsub/subscription/detail/event-telemetry-transcript-to-gcs?project=telligentsourcegraph)
- [event-telemetry-sub-v2 for non-transcript events](https://console.cloud.google.com/cloudpubsub/subscription/detail/event-telemetry-sub-v2?project=telligentsourcegraph)
- [event-telemetry-transcript-to-bq](https://console.cloud.google.com/cloudpubsub/subscription/detail/event-telemetry-transcript-to-bq?project=telligentsourcegraph)
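A minimal sketch of how events are assumed to fan out across these subscriptions. It is inferred from the subscription names alone; the actual Pub/Sub filter configuration is not documented here and may differ (for example, flagged events might also reach `event-telemetry-sub-v2`).

```python
from typing import Any, List, Mapping

# Subscription names from the links above.
TRANSCRIPT_TO_GCS = "event-telemetry-transcript-to-gcs"
TRANSCRIPT_TO_BQ = "event-telemetry-transcript-to-bq"
NON_TRANSCRIPT = "event-telemetry-sub-v2"


def subscriptions_for(event: Mapping[str, Any]) -> List[str]:
    """Return the subscriptions an event is assumed to be delivered to."""
    metadata = event.get("metadata") or {}
    if metadata.get("recordsPrivateMetadataTranscript") == 1:
        # Assumed: flagged events take the separate transcript path,
        # one copy toward GCS and one toward BigQuery (via Dataflow).
        return [TRANSCRIPT_TO_GCS, TRANSCRIPT_TO_BQ]
    # Everything else follows the regular v2 subscription.
    return [NON_TRANSCRIPT]
```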

##### Dataflow

- [Dataflow job](<https://console.cloud.google.com/dataflow/jobs/us-central1/2024-01-18_11_35_42-11241333749608313305;graphView=0?project=telligentsourcegraph&pageState=(%22dfTime%22:(%22l%22:%22dfJobMaxTime%22))>) that runs on the `event-telemetry-transcript-to-bq` subscription to redact transcript fields (`responseText`, `PromptText`)
- [Dataflow UDF](https://console.cloud.google.com/storage/browser/_details/sg-telemetry-v2-udf/udf/transcriptUDF.js;tab=live_object?project=telligentsourcegraph) referenced by the Dataflow job (a custom JavaScript function that runs on each event)
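The redaction step of the UDF can be sketched as follows. This is not the contents of `transcriptUDF.js`; it is a Python illustration that mirrors the JSON-string-in, JSON-string-out shape of a Dataflow template UDF. The name `process`, the `[REDACTED]` placeholder, and the assumption that the fields sit somewhere under `privateMetadata` are all hypothetical; field names are matched case-insensitively because the exact casing in real events is not confirmed here.

```python
import json
from typing import Any

# Transcript fields named in the job description, compared in lowercase.
REDACTED_FIELDS = {"responsetext", "prompttext"}
PLACEHOLDER = "[REDACTED]"  # hypothetical replacement value


def _redact(value: Any) -> Any:
    """Recursively replace transcript fields in nested dicts and lists."""
    if isinstance(value, dict):
        return {
            key: PLACEHOLDER if key.lower() in REDACTED_FIELDS else _redact(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [_redact(item) for item in value]
    return value


def process(in_json: str) -> str:
    """Redact transcript text from one event (JSON string in, JSON string out)."""
    event = json.loads(in_json)
    if "privateMetadata" in event:
        event["privateMetadata"] = _redact(event["privateMetadata"])
    return json.dumps(event)
```

Replacing values rather than dropping keys keeps the row shape stable for downstream BigQuery consumers; whether the real UDF does the same is an assumption.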

##### GCS

- [GCS Bucket](<https://console.cloud.google.com/storage/browser/sourcegraph-cody/transcript?project=telligentsourcegraph&pageState=(%22StorageObjectListTable%22:(%22f%22:%22%255B%255D%22))&prefix=&forceOnObjectsSortingFiltering=false>) where the ML team accesses transcripts

##### BQ

- [event_telemetry table location](https://console.cloud.google.com/bigquery?project=telligentsourcegraph&pli=1&ws=!1m5!1m4!4m3!1stelligentsourcegraph!2stelemetry!3sevent_telemetry)

The system diagram below illustrates the flow of transcript data:

![Transcript event telemetry pipeline](https://storage.googleapis.com/sourcegraph-assets/handbook/BizOps/transcript-event-telemetry-pipeline.png)
