This is a personal fun project where I play with and learn about Parquet and Python.
Apache Parquet is a columnar storage file format optimized for use with big data processing frameworks. It is designed for efficiency and performance, making it a popular choice for storing and analyzing large datasets.
So far, I have built two features, both inspired by my own struggles with Parquet files.
Using this tab, you can upload a CSV file (max 200 MB in size) and convert it to a Parquet file.
Using this tab, you can upload a Parquet file, and it will show you details about its metadata and schema, plus a preview of the first 20 rows.
Right now, the whole app is in one file: `app/streamlit_app.py`
```
.
├── app                # parqueology app files
├── data               # Sample data
├── requirements.txt
└── README.md
```
Note: This app uses Streamlit, an open-source framework for building data apps. For simple experimental projects like this one, Streamlit is often recommended over heavier frameworks like Django or Flask.
Assuming you're on a Debian/Ubuntu-based Linux environment (personally, I use Linux Mint), here are the basic steps to follow:
- Ensure you have the latest versions of `python3`, `python3-pip`, `python3-venv`, and `git`:

```shell
sudo apt update
sudo apt install --only-upgrade python3 python3-pip python3-venv git
python3 --version
pip3 --version
git --version
```
- Clone this repo:

```shell
cd ~/Projects/  # Replace with your preferred directory for Git repos
git clone https://github.com/monjacoder/parqueology.git
cd parqueology
```
- Set up a virtual environment:

```shell
python3 -m venv venv
source venv/bin/activate
```
- Install dependencies:

```shell
pip install -r requirements.txt
```
- Run the app using `streamlit`:

```shell
streamlit run app/streamlit_app.py
```
If you have suggestions, feedback, or ideas, feel free to open an issue or reach out to me!