Automated File Exploration System

Automated exploration of structured data files (CSV, TXT, Excel) in a folder structure, to extract metadata and suggest potential uses of the information.

If you have a bunch of structured data in plain files, this library is for you.

Installation

pip install -q git+https://github.com/darenasc/auto-fes.git
pip install -q ydata_profiling sweetviz # to make profiling tools work

How to use it

Command line

afes --help

afes explore --help
afes explore <PATH_TO_FILES_TO_EXPLORE>

afes generate --help
afes generate <PATH_TO_FILES_TO_EXPLORE> # or
afes generate <PATH_TO_FILES_TO_EXPLORE> <OUTPUT_FILE_WITH_CODE>

afes profile --help
afes profile <PATH_TO_FILES_TO_EXPLORE> # or
afes profile <PATH_TO_FILES_TO_EXPLORE> <OUTPUTS_PATH_FOR_REPORTS> # or
afes profile <PATH_TO_FILES_TO_EXPLORE> <OUTPUTS_PATH_FOR_REPORTS> <PROFILE_TOOL> # 'ydata-profiling' or 'sweetviz'

Python scripts and notebooks

from afes import afe

# Path to folder with files to be explored
TARGET_FOLDER = "<PATH_TO_FILES_TO_EXPLORE>"
OUTPUT_FOLDER = "<PATH_TO_OUTPUTS>"

# Run exploration on the files
df_files = afe.explore_files(TARGET_FOLDER)

# Generate pandas code to load the files
afe.generate_code(df_files)

# Run profiling on each file
afe.profile_files(df_files, profile_tool="ydata-profiling", output_path=OUTPUT_FOLDER)
afe.profile_files(df_files, profile_tool="sweetviz", output_path=OUTPUT_FOLDER)

What can you do with AFES

Explore
Generate code
Profile

flowchart LR
    Explore --> Generate
    Explore --> Profile
    Generate --> PandasCode
    Profile --> ydata-profile@{ shape: doc }
    Profile --> sweetviz@{ shape: doc }

Explore

from afes import afe

# Path to folder with files to be explored
TARGET_FOLDER = "<PATH_TO_FILES_TO_EXPLORE>"

# Run exploration on the files
df_files = afe.explore_files(TARGET_FOLDER)
df_files

The df_files dataframe will look like the following table, depending on the files you plan to explore.

|      | path                                              | name                     | extension |    size | human_readable |  rows | separator |
| ---: | :------------------------------------------------ | :----------------------- | :-------- | ------: | :------------- | ----: | :-------- |
|    1 | /content/sample_data/auto_mpg.csv                 | auto_mpg                 | .csv      |   20854 | 20.4 KiB       |   399 | comma     |
|    2 | /content/sample_data/car_evaluation.csv           | car_evaluation           | .csv      |   51916 | 50.7 KiB       |  1729 | comma     |
|    3 | /content/sample_data/iris.csv                     | iris                     | .csv      |    4606 | 4.5 KiB        |   151 | comma     |
|    4 | /content/sample_data/wine_quality.csv             | wine_quality             | .csv      |  414831 | 405.1 KiB      |  6498 | comma     |
|    5 | /content/sample_data/california_housing_test.csv  | california_housing_test  | .csv      |  301141 | 294.1 KiB      |  3001 | comma     |
|    6 | /content/sample_data/california_housing_train.csv | california_housing_train | .csv      | 1706430 | 1.6 MiB        | 17001 | comma     |
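To make the columns of the table concrete, here is a minimal sketch of how such metadata could be collected with the Python standard library alone. It is not the implementation used by afes: the names `human_readable`, `describe_file`, `explore`, and `SEPARATOR_NAMES` are hypothetical, and counting every line (header included) as a row is an assumption.

```python
import csv
from pathlib import Path

# Hypothetical mapping from delimiter characters to the labels shown in the table.
SEPARATOR_NAMES = {",": "comma", ";": "semicolon", "\t": "tab", "|": "pipe"}


def human_readable(size: int) -> str:
    """Format a byte count with binary units (KiB, MiB, GiB)."""
    value = float(size)
    for unit in ("B", "KiB", "MiB"):
        if value < 1024:
            return f"{value:.1f} {unit}"
        value /= 1024
    return f"{value:.1f} GiB"


def describe_file(path: Path) -> dict:
    """Collect size, line count, and a guessed separator for one file."""
    with path.open(newline="", encoding="utf-8", errors="replace") as handle:
        sample = handle.read(4096)          # small sample for delimiter sniffing
        handle.seek(0)
        rows = sum(1 for _ in handle)       # assumption: header counts as a row
    try:
        delimiter = csv.Sniffer().sniff(sample, delimiters=",;\t|").delimiter
    except csv.Error:
        delimiter = ""                      # sniffing failed, e.g. empty file
    size = path.stat().st_size
    return {
        "path": str(path),
        "name": path.stem,
        "extension": path.suffix,
        "size": size,
        "human_readable": human_readable(size),
        "rows": rows,
        "separator": SEPARATOR_NAMES.get(delimiter, "unknown"),
    }


def explore(folder: str) -> list[dict]:
    """Describe every .csv and .txt file found under the folder, recursively."""
    paths = sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and p.suffix.lower() in {".csv", ".txt"}
    )
    return [describe_file(p) for p in paths]
```

Calling `explore("<PATH_TO_FILES_TO_EXPLORE>")` returns one dictionary per file, with the same keys as the columns of the table above. Excel files are left out of this sketch because reading them needs a third-party library.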

Check out the example.py file, then run it with Python from a terminal or in a Jupyter notebook.

Generate code

Using the dataframe df_files generated in the explore phase, you can generate working Python code that loads each file with pandas.

The function generate_code() produces one pandas read statement per file.

from afes import afe

# Path to folder with files to be explored
TARGET_FOLDER = "<PATH_TO_FILES_TO_EXPLORE>"
OUTPUT_FOLDER = "<PATH_TO_OUTPUTS>"

df_files = afe.explore_files(TARGET_FOLDER)
afe.generate_code(df_files)

The generated code will look like this:

### Start of the code ###
import pandas as pd

df_auto_mpg = pd.read_csv('/content/sample_data/auto_mpg.csv', sep = ',')
df_car_evaluation = pd.read_csv('/content/sample_data/car_evaluation.csv', sep = ',')
df_iris = pd.read_csv('/content/sample_data/iris.csv', sep = ',')
df_wine_quality = pd.read_csv('/content/sample_data/wine_quality.csv', sep = ',')
df_california_housing_test = pd.read_csv('/content/sample_data/california_housing_test.csv', sep = ',')
df_california_housing_train = pd.read_csv('/content/sample_data/california_housing_train.csv', sep = ',')

### End of the code ###

By default, the generated Python code is printed to the standard output and also written to the ./code.txt file.

Note: you can change the .txt extension to .py to make it a working Python script.
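The generation step can be pictured as a simple mapping from exploration records to `pd.read_csv` lines. The sketch below illustrates that idea only and is not the afes implementation: `generate_loading_code` and `SEPARATORS` are hypothetical names, and replacing non-identifier characters in file names with underscores is an assumption.

```python
import re

# Hypothetical mapping from separator labels to the text written into the code.
# The tab entry is backslash + "t" so the generated line contains the literal '\t'.
SEPARATORS = {"comma": ",", "semicolon": ";", "tab": "\\t", "pipe": "|"}


def generate_loading_code(files: list[dict]) -> str:
    """Build pandas loading code, one read_csv line per explored file."""
    lines = ["import pandas as pd", ""]
    for record in files:
        # Assumption: make the file name safe to use as a Python identifier.
        variable = "df_" + re.sub(r"\W", "_", record["name"])
        separator = SEPARATORS[record["separator"]]
        lines.append(
            f"{variable} = pd.read_csv('{record['path']}', sep = '{separator}')"
        )
    return "\n".join(lines) + "\n"


# Two records taken from the example table in the Explore section.
files = [
    {"name": "auto_mpg", "path": "/content/sample_data/auto_mpg.csv", "separator": "comma"},
    {"name": "iris", "path": "/content/sample_data/iris.csv", "separator": "comma"},
]
code = generate_loading_code(files)
print(code)
```

Writing the returned string to a file with a .py extension gives a script that can be run directly, provided pandas is installed.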

Profile

Using the dataframe df_files generated in the explore phase, the function profile_files(df_files) will automatically load and profile the files using ydata-profiling or sweetviz.

from afes import afe

# Path to folder with files to be explored
TARGET_FOLDER = "<PATH_TO_FILES_TO_EXPLORE>"
OUTPUT_FOLDER = "<PATH_TO_OUTPUTS>"

# Run exploration on the files
df_files = afe.explore_files(TARGET_FOLDER)

afe.profile_files(df_files, profile_tool="ydata-profiling", output_path=OUTPUT_FOLDER) # or
afe.profile_files(df_files, profile_tool="sweetviz", output_path=OUTPUT_FOLDER)

By default, the files are profiled with ydata-profiling in order of size, starting with the smallest. The reports are exported in HTML format and stored in the directory where the code is running, or in a given directory when the output_path = '<YOUR_OUTPUT_PATH>' argument is set.
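The smallest-first ordering described above can be sketched in a few lines. This is an illustration under assumptions, not the afes implementation: `plan_reports` is a hypothetical helper, and naming each report `<name>.html` is a guess, since the actual report file names are not documented here.

```python
from pathlib import Path


def plan_reports(files: list[dict], output_path: str = ".") -> list[tuple[str, Path]]:
    """Pair each input file with a report path, smallest file first."""
    ordered = sorted(files, key=lambda record: record["size"])
    return [
        # Assumption: one HTML report per file, named after the file.
        (record["path"], Path(output_path) / f"{record['name']}.html")
        for record in ordered
    ]


# Sizes taken from the example table in the Explore section.
files = [
    {"name": "auto_mpg", "path": "/content/sample_data/auto_mpg.csv", "size": 20854},
    {"name": "iris", "path": "/content/sample_data/iris.csv", "size": 4606},
    {"name": "wine_quality", "path": "/content/sample_data/wine_quality.csv", "size": 414831},
]
plan = plan_reports(files, output_path="reports")
for source, report in plan:
    print(source, "->", report)
```

Processing small files first means the first reports appear quickly, while the larger files, which take longest to profile, run last.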

Contributing

Open an issue to request more functionality or to give feedback.
