From 201bd7183f3824f2687d072fa46c755741d0fe7c Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Wed, 16 Aug 2023 14:22:31 +0200 Subject: [PATCH 01/13] docs: overhaul An overhaul of the documentation -- almost a complete re-write. Main differences: * Added quick-start examples and other examples * Added many internal and external links * Changed structure (notably *getting started* and *advanced usage* sections transformed into *usage*) --- docs/advanced.rst | 193 ------------------------------ docs/api_reference.rst | 2 +- docs/getting_started.rst | 37 ------ docs/index.rst | 22 ++-- docs/installation.rst | 68 ++++++----- docs/sections.rst | 41 +++++++ docs/usage.rst | 249 +++++++++++++++++++++++++++++++++++++++ 7 files changed, 337 insertions(+), 275 deletions(-) delete mode 100644 docs/advanced.rst delete mode 100644 docs/getting_started.rst create mode 100644 docs/sections.rst create mode 100644 docs/usage.rst diff --git a/docs/advanced.rst b/docs/advanced.rst deleted file mode 100644 index 3d51b70..0000000 --- a/docs/advanced.rst +++ /dev/null @@ -1,193 +0,0 @@ -.. _advanced_usage: - -Advanced usage of EDVART -=========================================== - -This section describes several concepts behind edvart -and how you can modify your report before exporting it. - -Report class ------------- - -The most important class of the package :py:class:`~edvart.report.Report`. -The report consists of sections, which can be added via methods of the `Report` class. -The report is empty by default. -The class :py:class:`~edvart.report.DefaultReport` is a subclass of `Report`, -which contains a default set of sections. - -With created instance of `Report` you can: - -1. Show the report directly in your jupyter notebook using :py:meth:`~edvart.report.Report.show` method. -2. Export a new notebook using :py:meth:`~edvart.report.Report.export_notebook` method and edit it by yourself. -3. Export the output to a HTML report. 
You can also use a `.tpl` template to style the report. - -Exporting to HTML ------------------ -Apart from directly exporting a `Report`, you may also wish to export a generated notebook to HTML. -To export a notebook, you may use a tool called `jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). -For example, to export a notebook called `notebook.ipynb` using the `lab` template, you may use the following command: - -.. code-block:: bash - - poetry run jupyter nbconvert --to html notebook.ipynb --template lab - - - -TimeseriesReport class ----------------------- - -This class is a special version of the :py:class:`~edvart.report.Report` class which is specifically meant to be used for analysis of time series. - -The main differences are a different set of default sections including :py:class:`~edvart.report_sections.TimeseriesAnalysis`, -which cannot be added to the normal `Report` and the assumption that analyzed data is time-indexed. - -Helper functions :py:func:`~edvart.utils.reindex_to_period` or :py:func:`~edvart.utils.reindex_to_datetime` -can be used to index a DataFrame by a `pd.PeriodIndex` or a `pd.DatetimeIndex` respectively. - -Each column is treated as a separate timeseries. - -.. code-block:: python - - df = pd.DataFrame( - data=[ - ['2018Q1', 120000, 11000], - ['2018Q2', 150000, 13000], - ['2018Q3', 100000, 12000], - ['2018Q4', 110000, 11000], - ['2019Q1', 120000, 13000], - ['2019Q2', 110000, 12000], - ['2019Q3', 120000, 14000], - ['2019Q4', 90000, 12000], - ['2020Q1', 130000, 12000], - ], - columns=['Quarter', 'Revenue', 'Profit'], - ) - - # Reindex using helper function to have 'Quarter' as index - df = edvart.utils.reindex_to_datetime(df, datetime_column='Quarter') - report_ts = edvart.TimeseriesReport(df) - report_ts.show() - - -Modifying sections ------------------- - -The report consists of sections. 
- -In current version of edvart you can find following sections: - -* TableOfContents - - - Provides table of contents with links to all other sections. - - :py:meth:`~edvart.report.ReportBase.add_table_of_contents` - -* DatasetOverview - - - Provides essential information about whole dataset - - :py:meth:`~edvart.report.ReportBase.add_overview` - -* UnivariateAnalysis - - - Provides analysis of individual columns - - :py:meth:`~edvart.report.ReportBase.add_univariate_analysis` - -* BivariateAnalysis - - - Provides analysis of pairs of columns - - :py:meth:`~edvart.report.ReportBase.add_bivariate_analysis` - -* MultivariateAnalysis - - - Provides analysis of all columns together. Currently features PCA, parallel coordinates and parallel categories subsections. - - :py:meth:`~edvart.report.ReportBase.add_multivariate_analysis` - -* GroupAnalysis - - - Provides analysis of each column when grouped a column or a set of columns. Includes basic information similar to dataset overview and univariate analysis, but on a per-group basis. - - :py:meth:`~edvart.report.ReportBase.add_group_analysis` - -* TimeseriesAnalysis - - - Provides analysis specific for time series. - - :py:meth:`~edvart.report.TimeseriesReport.add_timeseries_analysis` - - -The edvart API allows you to choose which sections you want in the final report -or modifying sections settings. - -Selection of sections -~~~~~~~~~~~~~~~~~~~~~ -You can add sections using methods `add_*` of the `Report` class. - -.. code-block:: python - - # Shows only univariate and bivariate analysis - import edvart - df = edvart.example_datasets.dataset_titanic() - report = ( - edvart.Report(df) - .add_univariate_analysis() - .add_bivariate_analysis() - ) - - -Sections configuration -~~~~~~~~~~~~~~~~~~~~~~ - -Each section can be also configured. -For example you can define which columns should be used or omitted. - -Or you can set section verbosity (described later). - -.. 
code-block:: python - - # Configures sections to omit or use specific columns - import edvart - - df = edvart.example_datasets.dataset_titanic() - report = edvart.Report(df) - - report.add_overview(omit_columns=["PassengerId"]).add_univariate_analysis( - use_columns=["Name", "Sex", "Age"] - ) - - - -.. _verbosity: - -Verbosity ---------- - -EDVART provides a concept of a verbosity that is used during *export* into jupyter notebook. -The verbosity helps us to generate a code with a specific level of detail. - -edvart supports three levels of verbosity: - -- LOW - - High level functions for whole sections are generated. User can modify the markdown description. -- MEDIUM - - edvart functions are generated. User can modify parameters of these functions. -- HIGH - - Raw code is generated. User can do very advanced modification such as changing visualisations style. - -The verbosity can be set to whole report or to each section separately. - -Examples: - -.. code-block:: python - - # Set default verbosity for all sections to Verbosity.MEDIUM - import edvart - from edvart import Verbosity - - df = edvart.example_datasets.dataset_titanic() - edvart.DefaultReport(df, verbosity=Verbosity.MEDIUM).export_notebook("test-export.ipynb") - - -.. code-block:: python - - # Set default verbosity to Verbosity.MEDIUM but use verbosity Verbosity.HIGH for univariate analysis - import edvart - - df = edvart.example_datasets.dataset_titanic() - edvart.DefaultReport(df, verbosity=Verbosity.MEDIUM, verbosity_univariate_analysis=Verbosity.HIGH).export_notebook("test-export.ipynb") diff --git a/docs/api_reference.rst b/docs/api_reference.rst index 624bc18..48de4f9 100644 --- a/docs/api_reference.rst +++ b/docs/api_reference.rst @@ -1,4 +1,4 @@ -API reference +API Reference ============= .. 
toctree:: diff --git a/docs/getting_started.rst b/docs/getting_started.rst deleted file mode 100644 index 352fa81..0000000 --- a/docs/getting_started.rst +++ /dev/null @@ -1,37 +0,0 @@ -Getting started -=============== - -1. Start with default exploratory analysis in jupyter notebook. - -.. code-block:: python - - import edvart - df = edvart.example_datasets.dataset_titanic() - edvart.DefaultReport(df).show() - -2. Generate report notebook - -.. code-block:: python - - import edvart - df = edvart.example_datasets.dataset_titanic() - report = edvart.DefaultReport(df) - report.export_notebook("titanic_report.ipynb") - -You can modify the generated notebook if you want to modify some settings. -For more advanced usage of edvart, please read the documentation section -:ref:`Advanced usage `. - -3. Generate HTML report - -.. code-block:: python - - import edvart - df = edvart.example_datasets.dataset_titanic() - report = edvart.DefaultReport(df) - report.export_html( - html_filepath="titanic_report.html", - dataset_name="Titanic", - dataset_description="Dataset that contains data for 891 of the real Titanic passengers.", - ) - diff --git a/docs/index.rst b/docs/index.rst index b235fd4..56e83aa 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -1,15 +1,12 @@ EDVART ================================ -Exploratory Data Analysis (EDA) is a very initial task a data scientist -or data analyst does when he reaches new data. -EDA refers to the critical process of performing -initial investigations on data to discover patterns, to spot -anomalies, to test hypothesis and to check assumptions with the help -of summary statistics and graphical representations. +Edvart is an open-source Python library designed to simplify and streamline +your exploratory data analysis (EDA) process. +Edvart supports different levels of customization: +from a default report generated in one line of code to a fully-customized +report down to the level of code generating the visualizations. 
-EDVART serves for speeding up EDA and for -creating Data analysis reports. Table of Contents ----------------- @@ -18,15 +15,16 @@ Table of Contents :maxdepth: 2 installation.rst - getting_started.rst - advanced.rst + usage.rst + sections.rst api_reference.rst .. include:: installation.rst -.. include:: getting_started.rst +.. include:: usage.rst +.. include:: sections.rst Links ------------ +----- * `GitHub repository `_ * :ref:`modindex` diff --git a/docs/installation.rst b/docs/installation.rst index 7b50e55..8e76a93 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -1,14 +1,15 @@ Installation ============ -edvart is distributed via PyPI. -Example installation with pip: +``edvart`` is distributed as a Python package via `PyPI `_. +It can be installed using ``pip``: .. code-block:: console $ pip install edvart -or you can add edvart into your environment file defined by `pyproject.toml`: +We recommend using `Poetry ` for dependency management. +You can add ``edvart`` into your Poetry environment file defined by ``pyproject.toml``: .. parsed-literal:: @@ -17,13 +18,22 @@ or you can add edvart into your environment file defined by `pyproject.toml`: edvart = "|VERSION|" +.. _extras: + Extras ------ -edvart also has an optional dependency "umap", which adds a plot called UMAP -(Universal Manifold Approximation) to Multivariate Analysis. To install edvart with the optional -extra, replace the above snippet of the `pyproject.toml` environment file with the following -snippet: +The ``edvart`` package has an optional dependency ``umap``, which adds a plot called `UMAP `_ +to :ref:`Multivariate Analysis `. + +To install edvart with the optional ``umap`` dependency via pip, run the following command: + +.. code-block:: console + + $ pip install "edvart[umap]" + +To install edvart with the optional extra using Poetry, replace the snippet +of the ``pyproject.toml`` environment file above with the following snippet: .. 
parsed-literal:: @@ -31,40 +41,34 @@ snippet: python = ">=3.8, <3.12" edvart = { version = "|VERSION|", extras = ["umap"] } -To install edvart with the optional "umap" dependency via pip, run the following command: +Rendering Plotly Interactive Plots +---------------------------------- -.. code-block:: console - - $ pip install "edvart[umap]" - - -Plotly -====== +Edvart uses JupyterLab ----------- +~~~~~~~~~~ To display interactive plots which use Plotly in JupyterLab, you need to install some JupyterLab extensions. -To install the required extensions, you can follow the full guide at -https://plot.ly/python/getting-started/ or simply run the following commands -(inside the JupyterLab container if running in a container): +You need to install the ``jupyter-dash`` extension to render Plotly plots in +JupyterLab. You can simply install it as a Python package to your environment, +e.g. via ``pip``: .. code-block:: console - jupyter labextension install @jupyter-widgets/jupyterlab-manager@1.1 --no-build - jupyter labextension install jupyterlab-plotly@1.5.2 --no-build - jupyter labextension install plotlywidget@1.5.2 --no-build - jupyter lab build + pip install jupyter-dash -Visual Studio Code ------------------- -To display interactive plots which use Plotly in Visual Studio Code notebooks, -you need to install the following extensions: - -* `Jupyter `_ is required to - run Jupyter notebooks in Visual Studio Code. -* `Jupyter Notebook Renderers `_ is required - to render Plotly plots in Visual Studio Code notebooks. +See https://plot.ly/python/getting-started/ for more information. +Visual Studio Code +~~~~~~~~~~~~~~~~~~ +The following extensions need to be installed to display Plotly +interactive plots in Visual Studio Code notebooks: + +* `Jupyter `_ + is required to + run Jupyter notebooks in Visual Studio Code. +* `Jupyter Notebook Renderers `_ + is required to render Plotly plots in Visual Studio Code notebooks. 
diff --git a/docs/sections.rst b/docs/sections.rst new file mode 100644 index 0000000..bf339f2 --- /dev/null +++ b/docs/sections.rst @@ -0,0 +1,41 @@ +Report Sections +--------------- + +Dataset Overview +~~~~~~~~~~~~~~~~ + - Provides essential information about the whole dataset, such as inferred + data types, number of rows and columns, number of missing values, duplicates, etc. + - See :py:meth:`edvart.report.ReportBase.add_overview` + +Univariate Analysis +~~~~~~~~~~~~~~~~~~~ + - Provides analysis of individual columns. The analysis differs based on the data type of the column. + - See :py:meth:`edvart.report.ReportBase.add_univariate_analysis` + +Bivariate Analysis +~~~~~~~~~~~~~~~~~~ + - Provides analysis of pairs of columns, such as correlations, scatter plots, contingency tables, etc. + - See :py:meth:`edvart.report.ReportBase.add_bivariate_analysis` + + +.. _multivariate_analysis: + +Multivariate Analysis +~~~~~~~~~~~~~~~~~~~~~ + - Provides analysis of all columns together. + - Currently features PCA, parallel coordinates and parallel categories subsections. + Additionally, a UMAP section is included if the :ref:`extra` dependency ``umap`` is installed. + - See :py:meth:`edvart.report.ReportBase.add_multivariate_analysis` + +Group Analysis +~~~~~~~~~~~~~~ + - Provides analysis of each column when grouped a column or a set of columns. + Includes basic information similar to dataset overview and univariate analysis, + but on a per-group basis. + - See :py:meth:`edvart.report.ReportBase.add_group_analysis` + +Timeseries Analysis +~~~~~~~~~~~~~~~~~~~ + - Provides analysis specific to time series.
+ - Used with :py:class:`edvart.report.TimeseriesReport` + - See :py:meth:`edvart.report.TimeseriesReport.add_timeseries_analysis` diff --git a/docs/usage.rst b/docs/usage.rst new file mode 100644 index 0000000..fe0d973 --- /dev/null +++ b/docs/usage.rst @@ -0,0 +1,249 @@ +Usage +===== + +Quick Start +----------- + +Show a Default Report in a Jupyter Notebook +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: python + + import edvart + df = edvart.example_datasets.dataset_titanic() + edvart.DefaultReport(df).show() + +Export the Report Code to a Jupyter Notebook +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: python + + import edvart + df = edvart.example_datasets.dataset_titanic() + report = edvart.DefaultReport(df) + report.export_notebook( + "titanic_report.ipynb", + dataset_name="Titanic", + dataset_description="Dataset of 891 of the real Titanic passengers.", + ) + +The exported notebook contains the code which generates the report. +It can be modified to fine-tune the report. +The code can be exported with different levels of detail (see :ref:`verbosity`). + +Export a Report to HTML +~~~~~~~~~~~~~~~~~~~~~~~ + +.. code-block:: python + + import edvart + df = edvart.example_datasets.dataset_titanic() + report = edvart.DefaultReport(df) + report.export_html( + html_filepath="titanic_report.html", + dataset_name="Titanic", + dataset_description="Dataset of 891 of the real Titanic passengers.", + ) + + +Customizing the Report +---------------------- + +This section describes several concepts behind edvart and how a report +can be customized. + +Report Class +~~~~~~~~~~~~ + +The :py:class:`~edvart.report.Report` class is central to the edvart API. +A *Report* consists of sections, which can be added via methods of the :py:class:`~edvart.report.Report` class. +The class :py:class:`~edvart.report.DefaultReport` is a subclass of :py:class:`~edvart.report.Report`, +which includes a default set of sections. 
+ +With an instance of :py:class:`~edvart.report.Report` you can: + +1. Show the report directly in a Jupyter notebook using the :py:meth:`~edvart.report.Report.show` method. +2. Export the code which generates the report to a new Jupyter notebook using the + :py:meth:`~edvart.report.ReportBase.export_notebook` method. + The code can be exported with different levels of :ref:`verbosity `. + The notebook containing the exported code can be modified to fine-tune the report. +3. Export the output to an HTML file. You can specify an + `nbconvert template + `_ + to style the report. + + +Selection of Sections +~~~~~~~~~~~~~~~~~~~~~ +You can add sections using methods ``add_*`` (e.g. :py:meth:`edvart.report.ReportBase.add_overview`) of the :py:class:`~edvart.report.Report` class. + +.. code-block:: python + + # Include univariate and bivariate analysis + import edvart + df = edvart.example_datasets.dataset_titanic() + report = ( + edvart.Report(df) + .add_univariate_analysis() + .add_bivariate_analysis() + ) + +.. _sections-config: + +Configuration of Sections +~~~~~~~~~~~~~~~~~~~~~~~~~ + +Each section can also be configured. +For example you can define which columns should be used or omitted. + +.. code-block:: python + + import edvart + + df = edvart.example_datasets.dataset_titanic() + report = edvart.Report(df) + + report.add_overview(omit_columns=["PassengerId"]).add_univariate_analysis( + use_columns=["Name", "Sex", "Age"] + ) + +Subsections +*********** + +Some sections are made of subsections. For those, you can configure which subsections are included. + +.. code-block:: python + + import edvart + from edvart.report_sections.dataset_overview import Overview + + df = edvart.example_datasets.dataset_titanic() + report = edvart.Report(df) + + report.add_overview(subsections=[ + Overview.OverviewSubsection.QuickInfo, + Overview.OverviewSubsection.DataPreview, + ]) + + +.. 
_verbosity: + +Verbosity +~~~~~~~~~ + +A :py:class:`~edvart.report.Report` can be exported to a Jupyter notebook containing +the code which generates the report. The code can be exported with different levels of detail, +referred to as *verbosity*. + +It can be set on the level of the whole report or on the level of each +section or subsection separately (see :ref:`sections-config`). + +The verbosity set on a lower level overrides the verbosity set on a higher level, i.e. +the verbosity set on a subsection overrides the verbosity set on a section, which overrides +the verbosity set on the report. + +EDVART supports three levels of verbosity: + +LOW + High level functions for whole sections are exported, i.e. each the output + of each section is generated by a single function call. + Suitable for small modifications such as changing parameters of the functions, + adding commentary to the report, adding visualizations which are not in EDVART, etc. + +MEDIUM + Same as low for report sections which do not consist of subsections. + For report sections which consists of subsections, each subsection is + exported to a separate function call. + +HIGH + The definitions of (almost) all functions are exported. + The functions can be modified and used as a starting point for custom analysis. + + +Examples +******** + +.. code-block:: python + + # Set default verbosity for all sections to Verbosity.MEDIUM + import edvart + from edvart import Verbosity + + df = edvart.example_datasets.dataset_titanic() + edvart.DefaultReport(df, verbosity=Verbosity.MEDIUM).export_notebook("test-export.ipynb") + + +.. 
code-block:: python + + import edvart + from edvart import Verbosity + + + # Set report verbosity to Verbosity.MEDIUM but use verbosity Verbosity.HIGH for univariate analysis + df = edvart.example_datasets.dataset_titanic() + edvart.DefaultReport( + df, + verbosity=Verbosity.MEDIUM, + verbosity_univariate_analysis=Verbosity.HIGH, + ).export_notebook("exported-report.ipynb") + + +Exporting Notebooks to HTML +~~~~~~~~~~~~~~~~~~~~~~~~~~~ +An EDVART report :py:class:`~edvart.report.Report` can be directly exported +to HTML via the :py:meth:`~edvart.report.ReportBase.export_html` method. + +To export a notebook to other formats including HTML, you may use a tool called +`jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). + +For example, to export a notebook called `notebook.ipynb` using the `lab` +template, you may use the following command: + +.. code-block:: bash + + poetry run jupyter nbconvert --to html notebook.ipynb --template lab + + +Reports for Time Series Datasets +-------------------------------- + +The class :py:class:`~edvart.report.TimeseriesReport` is a version +of the :py:class:`~edvart.report.Report` class which is specific for creating +reports on time series datasets. +There is also a :py:class:`~edvart.report.DefaultTimeseriesReport`, which contains +a default set of sections, similar to :py:class:`~edvart.report.DefaultReport`. + + +The main differences compared to the report for tabular data are: + +* a different set of default sections for :py:class:`~edvart.report.DefaultTimeseriesReport` +* :py:class:`~edvart.report_sections.TimeseriesAnalysis` section, which contains visualizations + for analyzing time series data +* the assumption that the input data is time-indexed and sorted by time. + +Helper functions :py:func:`edvart.utils.reindex_to_period` or :py:func:`edvart.utils.reindex_to_datetime` +can be used to index a DataFrame by a ``pd.PeriodIndex`` or a ``pd.DatetimeIndex`` respectively. 
+ +Each column in the input data is treated as a separate time series. + +.. code-block:: python + + df = pd.DataFrame( + data=[ + ["2018Q1", 120000, 11000], + ["2018Q2", 150000, 13000], + ["2018Q3", 100000, 12000], + ["2018Q4", 110000, 11000], + ["2019Q1", 120000, 13000], + ["2019Q2", 110000, 12000], + ["2019Q3", 120000, 14000], + ["2019Q4", 160000, 12000], + ["2020Q1", 130000, 12000], + ], + columns=["Quarter", "Revenue", "Profit"], + ) + + # Reindex using helper function to have 'Quarter' as index + df = edvart.utils.reindex_to_datetime(df, datetime_column="Quarter") + report_ts = edvart.DefaultTimeseriesReport(df) + report_ts.show() From 354de63be900247859b1992081caa42dee0eff63 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 14:27:07 +0200 Subject: [PATCH 02/13] review: fix link to poetry --- docs/installation.rst | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/installation.rst b/docs/installation.rst index 8e76a93..c13d664 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -8,8 +8,7 @@ It can be installed using ``pip``: $ pip install edvart -We recommend using `Poetry ` for dependency management. -You can add ``edvart`` into your Poetry environment file defined by ``pyproject.toml``: +We recommend using `Poetry `_ for dependency management. .. parsed-literal:: From b536cb14d4742f5bf816770616e330676144b7bd Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 14:35:19 +0200 Subject: [PATCH 03/13] docs: reword installation --- docs/installation.rst | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/docs/installation.rst b/docs/installation.rst index c13d664..3197014 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -9,6 +9,8 @@ It can be installed using ``pip``: $ pip install edvart We recommend using `Poetry `_ for dependency management. 
+To add ``edvart`` into a Poetry environment, add the following snippet +to the ``pyproject.toml`` environment definition file: .. parsed-literal:: @@ -22,16 +24,16 @@ We recommend using `Poetry `_ for dependency managem Extras ------ -The ``edvart`` package has an optional dependency ``umap``, which adds a plot called `UMAP `_ +Edvart has an optional dependency ``umap``, which adds a plot called `UMAP `_ to :ref:`Multivariate Analysis `. -To install edvart with the optional ``umap`` dependency via pip, run the following command: +To install Edvart with the optional ``umap`` dependency via pip, run the following command: .. code-block:: console $ pip install "edvart[umap]" -To install edvart with the optional extra using Poetry, replace the snippet +To install Edvart with the optional extra using Poetry, replace the snippet of the ``pyproject.toml`` environment file above with the following snippet: .. parsed-literal:: @@ -43,7 +45,7 @@ of the ``pyproject.toml`` environment file above with the following snippet: Rendering Plotly Interactive Plots ---------------------------------- -Edvart uses +Edvart uses `Plotly `_ to render interactive plots. JupyterLab ~~~~~~~~~~ @@ -51,9 +53,9 @@ JupyterLab To display interactive plots which use Plotly in JupyterLab, you need to install some JupyterLab extensions. -You need to install the ``jupyter-dash`` extension to render Plotly plots in -JupyterLab. You can simply install it as a Python package to your environment, -e.g. via ``pip``: +The extension ``jupyter-dash`` needs to be installed in order for Plotly plots +to be rendered correctly in JupyterLab. +It can be simply installed as a Python package, e.g. via ``pip``: .. 
code-block:: console From f693b46934e3b6f1eb44110e45174b9454ff0c15 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 14:35:35 +0200 Subject: [PATCH 04/13] review: add missing word --- docs/sections.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/sections.rst b/docs/sections.rst index bf339f2..fb7efe9 100644 --- a/docs/sections.rst +++ b/docs/sections.rst @@ -29,7 +29,7 @@ Multivariate Analysis Group Analysis ~~~~~~~~~~~~~~ - - Provides analysis of each column when grouped a column or a set of columns. + - Provides analysis of each column when grouped by a column or a set of columns. Includes basic information similar to dataset overview and univariate analysis, but on a per-group basis. - See :py:meth:`edvart.report.ReportBase.add_group_analysis` From 169258f14ed4f9be113f5b25a93f120bc0caf9ad Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 14:37:12 +0200 Subject: [PATCH 05/13] review: reformat code snippets --- docs/usage.rst | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/docs/usage.rst b/docs/usage.rst index fe0d973..695b0e7 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -81,6 +81,8 @@ You can add sections using methods ``add_*`` (e.g. :py:meth:`edvart.report.Repor # Include univariate and bivariate analysis import edvart + + df = edvart.example_datasets.dataset_titanic() report = ( edvart.Report(df) @@ -100,13 +102,17 @@ For example you can define which columns should be used or omitted. 
import edvart - df = edvart.example_datasets.dataset_titanic() - report = edvart.Report(df) - report.add_overview(omit_columns=["PassengerId"]).add_univariate_analysis( - use_columns=["Name", "Sex", "Age"] + df = edvart.example_datasets.dataset_titanic() + report = ( + edvart.Report(df) + .add_overview(omit_columns=["PassengerId"]) + .add_univariate_analysis( + use_columns=["Name", "Sex", "Age"] + ) ) + Subsections *********** @@ -117,6 +123,7 @@ Some sections are made of subsections. For those, you can can configure which su import edvart from edvart.report_sections.dataset_overview import Overview + df = edvart.example_datasets.dataset_titanic() report = edvart.Report(df) @@ -169,6 +176,7 @@ Examples import edvart from edvart import Verbosity + df = edvart.example_datasets.dataset_titanic() edvart.DefaultReport(df, verbosity=Verbosity.MEDIUM).export_notebook("test-export.ipynb") From 90cecb0585037ef300770793196fc22be0672969 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 14:37:37 +0200 Subject: [PATCH 06/13] review: reformulate verbosity descriptions --- docs/usage.rst | 12 ++++++------ 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/usage.rst b/docs/usage.rst index 695b0e7..c538b28 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -145,26 +145,26 @@ referred to as *verbosity*. It can be set on the level of the whole report or on the level of each section or subsection separately (see :ref:`sections-config`). -The verbosity set on a lower level overrides the verbosity set on a higher level, i.e. -the verbosity set on a subsection overrides the verbosity set on a section, which overrides +Specific verbosity overrides general verbosity, i.e. the verbosity set on a +subsection overrides the verbosity set on a section, which overrides the verbosity set on the report. EDVART supports three levels of verbosity: LOW - High level functions for whole sections are exported, i.e. 
each the output + High level functions for whole sections are exported, i.e. the output of each section is generated by a single function call. Suitable for small modifications such as changing parameters of the functions, adding commentary to the report, adding visualizations which are not in EDVART, etc. MEDIUM - Same as low for report sections which do not consist of subsections. - For report sections which consists of subsections, each subsection is + For report sections which consist of subsections, each subsection is exported to a separate function call. + Same as LOW for report sections which do not consist of subsections. HIGH The definitions of (almost) all functions are exported. - The functions can be modified and used as a starting point for custom analysis. + The functions can be modified or used as a starting point for custom analysis. Examples From 22e869606cc73c586d0cd658ee855f84edb18c42 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 14:38:27 +0200 Subject: [PATCH 07/13] docs: reword html export section --- docs/usage.rst | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/usage.rst b/docs/usage.rst index c538b28..ea48435 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -198,11 +198,11 @@ Examples Exporting Notebooks to HTML ~~~~~~~~~~~~~~~~~~~~~~~~~~~ -An EDVART report :py:class:`~edvart.report.Report` can be directly exported +A :py:class:`~edvart.report.Report` can be directly exported to HTML via the :py:meth:`~edvart.report.ReportBase.export_html` method. -To export a notebook to other formats including HTML, you may use a tool called -`jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). +Jupyter notebooks can be exported to other formats including HTML, using a tool +called `jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). 
For example, to export a notebook called `notebook.ipynb` using the `lab` template, you may use the following command: From 6235ddf215f52bf5ce192f945c2b4d8c5f603545 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 15:25:27 +0200 Subject: [PATCH 08/13] review: add jupyter-dash install instruction for poetry --- docs/installation.rst | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/docs/installation.rst b/docs/installation.rst index 3197014..421fbd8 100644 --- a/docs/installation.rst +++ b/docs/installation.rst @@ -61,6 +61,15 @@ It can be simply installed as a Python package, e.g. via ``pip``: pip install jupyter-dash +To install ``jupyter-dash`` into a Poetry environment, add the following line +under ``tool.poetry.dependencies`` in the ``pyproject.toml`` environment definition file: + + +.. code-block:: toml + + jupyter-dash = "^0.4.2" + + See https://plot.ly/python/getting-started/ for more information. Visual Studio Code From 63464da181871e73f65033c563d1bc10c0f1681b Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 15:34:54 +0200 Subject: [PATCH 09/13] docs: reformat code examples --- docs/usage.rst | 32 +++++++++++++++++++------------- 1 file changed, 19 insertions(+), 13 deletions(-) diff --git a/docs/usage.rst b/docs/usage.rst index ea48435..8b02711 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -10,6 +10,8 @@ Show a Default Report in a Jupyter Notebook .. code-block:: python import edvart + + df = edvart.example_datasets.dataset_titanic() edvart.DefaultReport(df).show() @@ -19,6 +21,8 @@ Export the Report Code to a Jupyter Notebook .. code-block:: python import edvart + + df = edvart.example_datasets.dataset_titanic() report = edvart.DefaultReport(df) report.export_notebook( @@ -37,6 +41,8 @@ Export a Report to HTML .. 
code-block:: python import edvart + + df = edvart.example_datasets.dataset_titanic() report = edvart.DefaultReport(df) report.export_html( @@ -85,7 +91,7 @@ You can add sections using methods ``add_*`` (e.g. :py:meth:`edvart.report.Repor df = edvart.example_datasets.dataset_titanic() report = ( - edvart.Report(df) + edvart.Report(df) .add_univariate_analysis() .add_bivariate_analysis() ) @@ -100,17 +106,15 @@ For example you can define which columns should be used or omitted. .. code-block:: python - import edvart + import edvart - df = edvart.example_datasets.dataset_titanic() - report = ( - edvart.Report(df) - .add_overview(omit_columns=["PassengerId"]) - .add_univariate_analysis( - use_columns=["Name", "Sex", "Age"] + df = edvart.example_datasets.dataset_titanic() + report = ( + edvart.Report(df) + .add_overview(omit_columns=["PassengerId"]) + .add_univariate_analysis(use_columns=["Name", "Sex", "Age"]) ) - ) Subsections @@ -127,10 +131,12 @@ Some sections are made of subsections. For those, you can can configure which su df = edvart.example_datasets.dataset_titanic() report = edvart.Report(df) - report.add_overview(subsections=[ - Overview.OverviewSubsection.QuickInfo, - Overview.OverviewSubsection.DataPreview, - ]) + report.add_overview( + subsections=[ + Overview.OverviewSubsection.QuickInfo, + Overview.OverviewSubsection.DataPreview, + ] + ) .. _verbosity: From ef20f57a5c0f19e16777a24e91f403b1badb77a8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Mon, 4 Sep 2023 16:13:48 +0200 Subject: [PATCH 10/13] review: remove nbconvert command example --- docs/usage.rst | 7 ------- 1 file changed, 7 deletions(-) diff --git a/docs/usage.rst b/docs/usage.rst index 8b02711..0e58ffc 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -210,13 +210,6 @@ to HTML via the :py:meth:`~edvart.report.ReportBase.export_html` method. 
Jupyter notebooks can be exported to other formats including HTML, using a tool called `jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). -For example, to export a notebook called `notebook.ipynb` using the `lab` -template, you may use the following command: - -.. code-block:: bash - - poetry run jupyter nbconvert --to html notebook.ipynb --template lab - Reports for Time Series Datasets -------------------------------- From 34aa256b3447a209fc8f5b0e29e593ef0443fb20 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Tue, 5 Sep 2023 10:53:51 +0200 Subject: [PATCH 11/13] docs: add key features --- docs/index.rst | 15 +++++++++++++++ 1 file changed, 15 insertions(+) diff --git a/docs/index.rst b/docs/index.rst index 56e83aa..b3f37c0 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -7,6 +7,21 @@ Edvart supports different levels of customization: from a default report generated in one line of code to a fully-customized report down to the level of code generating the visualizations. +Key Features +------------ +* **One-line Reports**: Generate a comprehensive set of pandas DataFrame visualizations using a single Python statement. + Edvart supports: + * Data overview, + * Univariate analysis, + * Bivariate analysis, + * Multivariate analysis, + * Grouped analysis, + * Time series analysis. + +* **Customizable Reports**: Produce, iterate, and style detailed reports in Jupyter notebooks and HTML formats. +* **Flexible API**: From high-level simplicity in a single line of code to detailed control, choose the API level that fits your needs. +* **Interactive Visualizations**: Many of the visualizations are interactive and can be used to explore the data in detail. 
+ Table of Contents ----------------- From 0c04d0d3602e92dcfce60c69daff5e4aafb654e4 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Wed, 6 Sep 2023 12:45:01 +0200 Subject: [PATCH 12/13] review: move HTML exporting section under export example --- docs/usage.rst | 15 ++++++--------- 1 file changed, 6 insertions(+), 9 deletions(-) diff --git a/docs/usage.rst b/docs/usage.rst index 0e58ffc..1e47241 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -52,6 +52,12 @@ Export a Report to HTML ) +A :py:class:`~edvart.report.Report` can be directly exported +to HTML via the :py:meth:`~edvart.report.ReportBase.export_html` method. + +Jupyter notebooks can be exported to other formats including HTML, using a tool +called `jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). + Customizing the Report ---------------------- @@ -202,15 +208,6 @@ Examples ).export_notebook("exported-report.ipynb") -Exporting Notebooks to HTML -~~~~~~~~~~~~~~~~~~~~~~~~~~~ -A :py:class:`~edvart.report.Report` can be directly exported -to HTML via the :py:meth:`~edvart.report.ReportBase.export_html` method. - -Jupyter notebooks can be exported to other formats including HTML, using a tool -called `jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). - - Reports for Time Series Datasets -------------------------------- From e62704fa9b01e53019e66fe9a78dad517f04a4c9 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Michal=20Bel=C3=A1k?= Date: Wed, 6 Sep 2023 12:45:20 +0200 Subject: [PATCH 13/13] docs: add note on exporting notebooks to html --- docs/usage.rst | 2 ++ 1 file changed, 2 insertions(+) diff --git a/docs/usage.rst b/docs/usage.rst index 1e47241..49da871 100644 --- a/docs/usage.rst +++ b/docs/usage.rst @@ -57,6 +57,8 @@ to HTML via the :py:meth:`~edvart.report.ReportBase.export_html` method. Jupyter notebooks can be exported to other formats including HTML, using a tool called `jupyter nbconvert` (https://nbconvert.readthedocs.io/en/latest/). 
+This can be useful to create an HTML report from a notebook which was exported +using the :py:meth:`~edvart.report.ReportBase.export_notebook` method. Customizing the Report ----------------------