Enable basic stats for non-delta tables #2

Open
GeekSheikh opened this issue Jul 21, 2021 · 2 comments
Labels
enhancement New feature or request

Comments

@GeekSheikh
Contributor

Curious to get your thoughts on collecting and storing stats the old-fashioned way for non-delta tables. Obviously this would be less performant, but I think some customers would be willing to pay for it. There are a lot of extended options for Analyze Table that could be used to collect proper metrics outside of delta. :) Thoughts?
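For context, one way the "old-fashioned" approach could look: generate Spark SQL `ANALYZE TABLE` statements (including the per-column extended option) for a list of non-Delta tables and run them via `spark.sql(...)` in a notebook. The helper below is a hypothetical sketch, not anything from this repo; only the SQL syntax itself is standard Spark SQL.

```python
# Hypothetical helper: builds Spark SQL ANALYZE TABLE statements for a list
# of non-Delta tables. In a Databricks notebook, each returned statement
# would be executed with spark.sql(stmt).
def build_analyze_statements(tables, columns=None, noscan=False):
    """Return statements to collect and then read back table stats."""
    stmts = []
    for t in tables:
        if noscan:
            # NOSCAN gathers size metadata without scanning the data
            stmts.append(f"ANALYZE TABLE {t} COMPUTE STATISTICS NOSCAN")
        elif columns:
            # Extended option: per-column stats (min/max/ndv/null counts)
            cols = ", ".join(columns)
            stmts.append(f"ANALYZE TABLE {t} COMPUTE STATISTICS FOR COLUMNS {cols}")
        else:
            stmts.append(f"ANALYZE TABLE {t} COMPUTE STATISTICS")
        # Stats land in the catalog; read them back with DESCRIBE EXTENDED
        stmts.append(f"DESCRIBE EXTENDED {t}")
    return stmts
```

The stats would then be persisted to a side table on whatever schedule the customer accepts the scan cost for.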

@GeekSheikh GeekSheikh added the enhancement New feature or request label Jul 21, 2021
@ronanstokes-db

I created a table analyzer notebook some time ago to give to customers in advance of tuning exercises, in situations where we don't have hands-on access to their workspace. It's very useful in helping drive conversations around the potential benefit they would get by moving to delta (i.e. showing them small-file issues, data size disparity, etc.).

This notebook performs analysis to break down tables / datasets by:

  • average / min / max size of files
  • numbers of files and partitions

I'll add a link to it.

@ronanstokes-db

Another thought is to create views that instrument access to tables, to help with optimization decisions.

How would you do this?

You could have a dataframe pipeline that creates a temporary view, but as part of the pipeline, write stats on the logical primary keys accessed etc. to a side table. We'd have to experiment with this to see how to do it efficiently.
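A minimal sketch of the side-table idea, with plain Python standing in for the dataframe pipeline: a filter step records which logical primary keys were requested into an in-memory "side table" counter. All names here (`AccessStats`, `instrumented_filter`) are hypothetical; a real version would write to an actual Delta side table.

```python
from collections import Counter

class AccessStats:
    """Hypothetical side-table stand-in: accumulates counts of which
    logical primary keys a pipeline touches, so access patterns can
    later inform optimization decisions (e.g. what to ZORDER/partition by)."""

    def __init__(self):
        self.key_counts = Counter()

    def record(self, keys):
        self.key_counts.update(keys)

def instrumented_filter(rows, key_col, wanted, stats):
    """Filter rows by key, logging the requested keys as a side effect.

    In a real pipeline this would wrap the step that builds the
    temporary view, so consumers see normal results while stats
    accumulate in the side table.
    """
    stats.record(wanted)
    wanted = set(wanted)
    return [r for r in rows if r[key_col] in wanted]
```

Example use: after a few pipeline runs, `stats.key_counts.most_common()` shows the hot keys, analogous to a coverage report for data.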

My general thinking behind this comes from experimenting with code coverage: the question is, how would you do the same thing for data use? I haven't thought this through fully.
