IT-3448: procedure to simplify listing file downloaders #27
base: dev
@@ -0,0 +1,41 @@
CREATE OR REPLACE PROCEDURE list_downloaders(start_record_date string, entity_list string)
Could we use the DATE type here? string is also fine, but you can do something like:
record_date > DATE('2024-02-16')
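A minimal sketch of what a DATE-typed signature might look like in Snowflake Scripting. The body is a placeholder and not the PR's actual query, and the `record_date` column on `filedownload` is an assumption taken from the comparison above:

```sql
-- Sketch only: placeholder body, not the query from this PR.
-- Assumes filedownload has a record_date column of type DATE.
CREATE OR REPLACE PROCEDURE list_downloaders(start_record_date DATE, entity_list STRING)
RETURNS TABLE ()
LANGUAGE SQL
AS
DECLARE
    rs RESULTSET;
BEGIN
    -- The DATE argument is bound directly, so no string-to-date cast is needed.
    rs := (SELECT fd.user_id, fd.timestamp, fd.association_object_id AS entity_id
           FROM filedownload fd
           WHERE fd.record_date > :start_record_date);
    RETURN TABLE(rs);
END;
```

With this signature a caller would pass a date value, for example `CALL list_downloaders(DATE('2024-02-16'), '25982471');`.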
+1
query_str2 varchar default
'download (user_id, timestamp, entity_id) as (
select fd.user_id, fd.timestamp, fd.association_object_id as entity_id
from filedownload fd
So to use this function, I assume you have to do:
USE SCHEMA synapse_data_warehouse.synapse # or ...dev.synapse
Is that right?
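For illustration, two ways the call might look, assuming the procedure is deployed to `synapse_data_warehouse.synapse` (the schema named in the question above; the actual deployment target is not confirmed in this thread):

```sql
-- Option 1: set the session schema first, then call by bare name.
USE SCHEMA synapse_data_warehouse.synapse;
CALL list_downloaders('2021-01-01', '25982471');

-- Option 2: skip USE SCHEMA and fully qualify the procedure name.
CALL synapse_data_warehouse.synapse.list_downloaders('2021-01-01', '25982471');
```

Whether the unqualified table names inside the procedure resolve correctly under Option 2 may depend on whether it runs with owner's or caller's rights, so that is worth checking before relying on it.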
AS
declare
rs resultset;
query_str1 varchar default (
Could query_str1, 2, 3, and 4 be renamed to something that describes what each query is collecting? Like query_for_all_nodes.
'download (user_id, timestamp, entity_id) as (
select fd.user_id, fd.timestamp, fd.association_object_id as entity_id
from filedownload fd
join filetree ft on ft.id = fd.association_object_id
node_latest only contains the entities that have a Synapse ID. However, files stored within a table that has a column of type File don't have Synapse IDs. These only have the ID of the file handle, and that file handle ID is attached to the table through a filehandleassociation table (see this related Jira ticket).
Suppose I wanted to get download statistics for one or more of these kinds of files. Should this be supported in this stored procedure?
Phrased another way:
Given a table in Synapse that contains a column of type File, suppose I want to retrieve the stats on the downloads for one or more rows in that table. Is there (or should there be) a stored procedure for this functionality?
Take a look over the comments when you get a chance. Thanks for the changes Xa
I think that is what happened. @xschildw created these with his SYSADMIN account directly in the warehouse and then backfilled this PR. This is the first stored procedure to be added through schemachange. @philerooski I wonder if we can create a Jira ticket to finalize this.
a6d2502 to 948e550
This PR creates a procedure that simplifies listing who downloaded files within one or more subtrees.
Example: call list_downloaders('2021-01-01', '25982471')
Returns a table: