Visualizing Big Datasets: Tools, Pitfalls, Experimental Example

A talk about data visualization in science. Based on my experience with the ratCAVE project and the approaches suggested for Python, I created this talk for my fellow MSNE students. It covers the main problems with using scatter plots for big, convolved data and explains how to address them.

Summary:

What should we keep in mind when working with big datasets? In the case of scatter plots, 3 hyperparameters:

overplotting - avoid obscuring the data
saturation - check how many overlapping points it takes to saturate the intensity
undersampling - taking a subset might not be an answer
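
The saturation point can be estimated before plotting. The sketch below is not taken from the talk; it assumes the plotting backend blends semi-transparent markers with standard "over" alpha compositing, so that n overlapping points with per-point alpha a give a combined opacity of 1 - (1 - a)^n. The function names and the 0.99 threshold are illustrative choices.

```python
import math


def composite_opacity(alpha: float, n: int) -> float:
    """Combined opacity of n stacked points, each drawn with the given alpha.

    Assumes standard "over" alpha compositing (an assumption, not from the talk).
    """
    return 1.0 - (1.0 - alpha) ** n


def points_to_saturate(alpha: float, threshold: float = 0.99) -> int:
    """Smallest number of overlapping points whose combined opacity reaches threshold."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be in (0, 1)")
    if alpha == 1.0:
        # A fully opaque marker saturates with a single point.
        return 1
    return math.ceil(math.log(1.0 - threshold) / math.log(1.0 - alpha))


if __name__ == "__main__":
    for a in (0.5, 0.1, 0.01):
        print(f"alpha={a}: saturates after about {points_to_saturate(a)} points")
```

Under this assumption, any region where more points overlap than the returned number looks the same as a region with exactly that many, which is why alpha has to be tuned against the densest part of the data.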

Or instead you can work with heatmaps and remember to address the following problems (1 hyperparameter):

undersaturation
pick the color map in accordance to the
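
As a rough illustration of the heatmap approach, the sketch below bins points into a 2D grid with NumPy and compares linear scaling of the counts with a logarithmic one. It is not the code from the talk: the synthetic data, the bin count, and the choice of `log1p` as the rescaling are assumptions made for the example. With linear scaling a few dense bins dominate and sparse bins end up close to the background color (undersaturation); the log scaling lifts them.

```python
import numpy as np


def binned_counts(x, y, bins=100):
    """Count the points falling into each cell of a bins x bins grid."""
    counts, x_edges, y_edges = np.histogram2d(x, y, bins=bins)
    return counts, x_edges, y_edges


def log_scale(counts):
    """Map counts to [0, 1] with log1p so that sparse bins stay visible."""
    scaled = np.log1p(counts)
    peak = scaled.max()
    return scaled / peak if peak > 0 else scaled


# Synthetic example data (assumption): 100,000 points from a 2D normal distribution.
rng = np.random.default_rng(0)
x = rng.normal(size=100_000)
y = rng.normal(size=100_000)

counts, _, _ = binned_counts(x, y, bins=50)
linear = counts / counts.max()   # linear intensity in [0, 1]
logged = log_scale(counts)       # log-scaled intensity in [0, 1]

occupied = counts > 0
print("occupied bins below 5% intensity, linear:", np.mean(linear[occupied] < 0.05))
print("occupied bins below 5% intensity, log:   ", np.mean(logged[occupied] < 0.05))
```

Either array can be passed to a heatmap function such as `seaborn.heatmap`; the bin count remains the one parameter that has to be chosen to match the data.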

The talk explains how to get from left to right: an interpretable visualization of datasets.

Presented on 01.06.2018 at the retreat for Master of Science in Neuroengineering students.

Installing

To run the Jupyter notebook as slides I used:

RISE

The talk was based on the use of:

pandas
seaborn
datashader

Acknowledgments

Nicholas A. Del Grosso - for supervision and inspiration for this talk
Mohammad Bashiri - for feedback
