IT-3448: procedure to simplify listing file downloaders #27

Open
wants to merge 4 commits into
base: dev
from

Conversation

xschildw
Contributor

This PR creates a procedure that simplifies listing who downloaded files within one or more subtrees.

Example: call list_downloaders('2021-01-01', '25982471')

Returns a table:

| USER_ID | USER_NAME | EMAIL | SYNAPSE_PROFILE | NUM_DOWNLOADS | EARLIEST_DOWNLOAD_TIME | LATEST_DOWNLOAD_TIME |
| --- | --- | --- | --- | --- | --- | --- |
| 1223456 | myUser | [email protected] | https://www.synapse.org/#!Profile:123456 | 10 | 2023-01-14 12:01:00 | 2023-02-01 01:00:04 |


Quality Gate passed

Issues
6 New issues
0 Accepted issues

Measures
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

@xschildw xschildw requested a review from thomasyu888 February 28, 2024 16:45
@@ -0,0 +1,41 @@
CREATE OR REPLACE PROCEDURE list_downloaders(start_record_date string, entity_list string)
Member

Could we use the DATE type here? string is also fine, but you can do something like

record_date > DATE('2024-02-16')
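For illustration, here is a minimal sketch of what the signature could look like with a DATE parameter. It is not the PR's actual body: the `RETURNS TABLE ()` clause, the simplified query, and the assumption that `record_date` is a column of `filedownload` are all hypothetical, and `entity_list` is deliberately left unused.

```sql
-- Hypothetical sketch: first parameter typed as DATE instead of STRING.
CREATE OR REPLACE PROCEDURE list_downloaders(start_record_date DATE, entity_list STRING)
RETURNS TABLE ()
LANGUAGE SQL
AS
$$
DECLARE
    -- :start_record_date binds the DATE argument directly, so no string parsing is needed.
    -- Assumption: filedownload has a record_date column.
    rs RESULTSET DEFAULT (
        SELECT fd.user_id, COUNT(*) AS num_downloads
        FROM filedownload fd
        WHERE fd.record_date > :start_record_date
        GROUP BY fd.user_id
    );
BEGIN
    RETURN TABLE(rs);
END;
$$;

-- The caller passes a DATE value rather than a free-form string.
CALL list_downloaders(DATE('2021-01-01'), '25982471');
```

With a typed parameter, a malformed date is rejected when the procedure is called instead of surfacing later inside the query.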

Contributor

+1

query_str2 varchar default
'download (user_id, timestamp, entity_id) as (
select fd.user_id, fd.timestamp, fd.association_object_id as entity_id
from filedownload fd
Member

So to use this function, I assume you have to do:

USE SCHEMA synapse_data_warehouse.synapse -- or ...dev.synapse

Is that right?
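As a sketch of the two usual ways to invoke it, assuming the procedure is deployed to `synapse_data_warehouse.synapse` (the location is an assumption here, not something this PR confirms):

```sql
-- Option 1: set the session schema, then call by the unqualified name.
USE SCHEMA synapse_data_warehouse.synapse;
CALL list_downloaders('2021-01-01', '25982471');

-- Option 2: skip USE SCHEMA and fully qualify the procedure name.
CALL synapse_data_warehouse.synapse.list_downloaders('2021-01-01', '25982471');
```

The fully qualified form avoids depending on whatever schema the session happens to be using.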

@xschildw xschildw requested a review from a team as a code owner January 28, 2025 16:53
AS
declare
rs resultset;
query_str1 varchar default (
Contributor

Could query_str1, query_str2, query_str3, and query_str4 be renamed to something that describes what each query is collecting?
For example, query_for_all_nodes.
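A rough sketch of how descriptive names might read in a Snowflake Scripting block. The names `query_for_downloads`, `query_for_download_counts`, and `full_query` are hypothetical, and the fragments are shortened stand-ins for the PR's real queries:

```sql
DECLARE
    rs RESULTSET;
    -- Hypothetical name: the CTE that collects raw download events.
    query_for_downloads VARCHAR DEFAULT
        'download (user_id, timestamp, entity_id) as (
            select fd.user_id, fd.timestamp, fd.association_object_id as entity_id
            from filedownload fd
        )';
    -- Hypothetical name: the final select that counts downloads per user.
    query_for_download_counts VARCHAR DEFAULT
        'select user_id, count(*) as num_downloads from download group by user_id';
    full_query VARCHAR;
BEGIN
    -- Assemble the fragments into one statement and run it dynamically.
    full_query := 'with ' || query_for_downloads || ' ' || query_for_download_counts;
    rs := (EXECUTE IMMEDIATE :full_query);
    RETURN TABLE(rs);
END;
```

Naming each fragment after what it collects makes the final concatenation readable without opening every string. Depending on the client, an anonymous block like this may need to be wrapped in `EXECUTE IMMEDIATE $$ ... $$`.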

'download (user_id, timestamp, entity_id) as (
select fd.user_id, fd.timestamp, fd.association_object_id as entity_id
from filedownload fd
join filetree ft on ft.id = fd.association_object_id
Contributor

node_latest only contains the entities that have a Synapse ID. However, files stored within a table that has a column of type File don't have Synapse IDs. They only have the ID of the file handle, and that file handle ID is attached to the table through a filehandleassociation table (see this related Jira ticket).

Suppose I wanted to get download statistics for one or more of these kinds of files, should this be supported in this stored procedure?

Phrased another way:
Given a table in Synapse that contains a column of type File, suppose I want to retrieve the download stats for one or more rows in that table. Is there (or should there be) a stored procedure for this functionality?

Contributor

@BryanFauble left a comment

Take a look over the comments when you get a chance. Thanks for the changes, Xa.

@philerooski
Collaborator

I see a couple of stored procedures which already exist in SYNAPSE_DATA_WAREHOUSE.SYNAPSE that look similar to what we're trying to merge here. Were those procedures created "off the books" (there's no record of their creation in schemachange) and this PR is meant to put one of those procedures back on the books?

image

@thomasyu888
Member

> Were those procedures created "off the books" (there's no record of their creation in schemachange) and this PR is meant to put one of those procedures back on the books?

I think that is what happened. @xschildw created these with his SYSADMIN account directly in the warehouse and then backfilled this PR. This is the first stored procedure to be added through schemachange. @philerooski, I wonder if we can create a Jira ticket to finalize this.
