Data Analyzer is a web application, built with Dash and Flask in Python, that analyzes tabular data spreadsheets using Pandas Profiling. It generates a report that describes each column of the data with commonly used statistical measures, and users can download the generated report as an HTML file. Use the deployed application here, or follow these steps to deploy the app locally.
For each column, the following statistical measures are generated:
- Type inference: detect the types of columns in a dataframe.
- Essentials: type, unique values, missing values
- Quantile statistics like minimum value, Q1, median, Q3, maximum, range, interquartile range
- Descriptive statistics like mean, mode, standard deviation, sum, median absolute deviation, coefficient of variation, kurtosis, skewness
- Most frequent values
- Histogram
- Correlations: highlighting of highly correlated variables; Spearman, Pearson and Kendall matrices
- Missing values: matrix, count, heatmap and dendrogram of missing values
- Text analysis: learn about the categories (Uppercase, Space), scripts (Latin, Cyrillic) and blocks (ASCII) of text data
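The measures above are computed by Pandas Profiling itself. As a rough, standard-library-only illustration of what several of them mean, the sketch below summarizes one numeric column, with `None` standing in for a missing value. The function name `describe_column` is hypothetical, the quartiles use linear interpolation (`method="inclusive"`, which matches the default of pandas' `quantile`), and kurtosis, skewness and histograms are left out. It is not the library's implementation.

```python
import math
import statistics


def describe_column(values):
    """Summarize a numeric column; None marks a missing value (hypothetical helper)."""
    present = [v for v in values if v is not None]  # needs at least two values
    # Quartiles by linear interpolation, like pandas' default quantile().
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    low, high = min(present), max(present)
    mean = statistics.fmean(present)
    std = statistics.stdev(present)  # sample standard deviation (n - 1)
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
        "min": low,
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": high,
        "range": high - low,
        "iqr": q3 - q1,
        "mean": mean,
        "mode": statistics.mode(present),  # first mode encountered on ties (Python 3.8+)
        "std": std,
        "sum": math.fsum(present),
        # Median absolute deviation: the median distance from the median.
        "mad": statistics.median(abs(v - median) for v in present),
        # Coefficient of variation; undefined when the mean is zero.
        "cv": std / mean if mean else math.nan,
    }
```

For example, `describe_column([1, 2, 2, 3, 4, None, 10])` reports one missing value, quartiles of 2.0, 2.5 and 3.75, and an interquartile range of 1.75.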
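Of the correlation matrices listed above, Spearman's rho is simply Pearson's r computed on the ranks of the values, with tied values sharing the average of their ranks. A minimal pure-Python sketch of that relationship for a pair of columns follows; the function names are hypothetical, Kendall's tau is omitted, and Pandas Profiling computes these matrices through its own dependencies rather than code like this.

```python
import math


def pearson(xs, ys):
    """Pearson's r: covariance divided by the product of the spreads.

    Raises ZeroDivisionError for a constant column, where r is undefined.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: Pearson's r applied to the average ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))
```

Because only the ordering matters, `spearman([1, 2, 3, 4], [1, 4, 9, 100])` is 1 (up to floating-point rounding) even though the relationship is not linear.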
git clone https://github.com/dkedar7/Data-Analyzer
python -m venv DataAnalyzer
Windows:
DataAnalyzer\Scripts\activate
macOS or Linux:
source DataAnalyzer/bin/activate
cd Analyzer/
pip install -r requirements.txt
python run.py
Open localhost:8080 in a web browser to interact with the application.
The demo deployment uses Google Cloud Build to containerize the application, Google Container Registry to store and manage the container image, and Google Cloud Run to deploy it as a web endpoint.
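Cloud Run tells a container which port to listen on through the `PORT` environment variable (8080 unless configured otherwise), and the container must listen on `0.0.0.0` rather than only on localhost. The README does not show how `run.py` chooses its port, so the following is only a sketch of one common approach: `resolve_port` is a hypothetical helper whose result would be passed, together with `host="0.0.0.0"`, to the server's run call.

```python
import os


def resolve_port(env=os.environ, default=8080):
    """Return the port to bind: Cloud Run's PORT variable if set, else the default."""
    # Hypothetical helper; the app's actual run.py may choose its port differently.
    return int(env.get("PORT", default))
```

Falling back to 8080 keeps the local address used above unchanged when `PORT` is not set.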
- Data Analyzer currently supports only tabular data in .csv, .xlsx, or .xls format
- The upload fails if the input file contains any inconsistencies
- The app's ability to handle large datasets depends on the memory available on the host machine. The demo deployment may crash if the data exceeds its allocated memory
- Pandas Profiling can sometimes fail to automatically infer the datetime data type
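The README does not show the upload handler, so the sketch below rests on an assumption: it supposes a Dash `dcc.Upload` component, whose `contents` property arrives as a base64 data URL (`data:<mime type>;base64,<payload>`). The hypothetical `upload_problems` helper rejects unsupported extensions and, for CSV files, flags rows whose field count differs from the header's, which is one kind of inconsistency that can make an upload fail. Excel files would need a spreadsheet reader, so they are checked only by extension here.

```python
import base64
import csv
import io
from pathlib import Path

SUPPORTED_EXTENSIONS = {".csv", ".xlsx", ".xls"}


def upload_problems(contents, filename, encoding="utf-8"):
    """Return a list of problems found in an upload; an empty list means none were found.

    `contents` is assumed to be a data URL like the one dcc.Upload produces:
    "data:<mime type>;base64,<payload>".
    """
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_EXTENSIONS:
        return [f"unsupported file type: {suffix or '(none)'}"]
    header, _, payload = contents.partition(",")
    if not header.endswith(";base64"):
        return ["expected a base64-encoded data URL"]
    try:
        raw = base64.b64decode(payload)
    except ValueError:  # binascii.Error (e.g. bad padding) subclasses ValueError
        return ["payload is not valid base64"]
    if suffix != ".csv":
        return []  # .xlsx/.xls need a spreadsheet reader; only the extension is checked
    try:
        text = raw.decode(encoding)
    except UnicodeDecodeError:
        return [f"file is not valid {encoding} text"]
    rows = [row for row in csv.reader(io.StringIO(text)) if row]  # skip blank lines
    if not rows:
        return ["file is empty"]
    width = len(rows[0])  # the header row sets the expected field count
    return [
        f"data row {i} has {len(row)} fields, expected {width}"
        for i, row in enumerate(rows[1:], start=1)
        if len(row) != width
    ]
```

In a Dash callback, a non-empty list could be shown to the user instead of attempting to build the report.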
Data Analyzer is released under the MIT license.
You need Python 3 to run this application. Other dependencies can be found in the requirements.txt file.