This Python script extracts text from a PDF file, reads data from a CSV file, and generates a new PDF that combines the extracted text with tabular data.
- Extracts text from a PDF file.
- Reads structured data from a CSV file.
- Generates a new PDF with extracted text and CSV data using reportlab.
- Handles multi-page output if data exceeds one page.
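The multi-page behaviour can be sketched as follows. This is a minimal illustration, not the code in script.py: the helper names `paginate` and `write_pdf`, the 14-point line height, and the 50-point margin are all assumptions chosen for the example. It splits the lines into page-sized chunks and calls reportlab's `showPage()` after each chunk.

```python
from typing import Iterable, List


def paginate(lines: Iterable[str], lines_per_page: int) -> List[List[str]]:
    """Split lines into consecutive pages of at most lines_per_page lines."""
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    items = list(lines)
    return [items[i:i + lines_per_page] for i in range(0, len(items), lines_per_page)]


def write_pdf(lines, out_path="output.pdf", line_height=14, margin=50):
    """Draw lines top to bottom, starting a new page when one fills up.

    Hypothetical helper: the layout values are illustrative assumptions.
    """
    # Imported here so paginate() stays usable without reportlab installed.
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    width, height = letter
    lines_per_page = int((height - 2 * margin) // line_height)
    pdf = canvas.Canvas(out_path, pagesize=letter)
    for page in paginate(lines, lines_per_page):
        y = height - margin
        for line in page:
            pdf.drawString(margin, y, line)
            y -= line_height
        pdf.showPage()  # close the current page before drawing the next chunk
    pdf.save()
```

Keeping the page-splitting logic separate from the drawing code makes it possible to check the pagination without generating a PDF.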
Before running the script, ensure you have the following installed:
- Python 3.7 or later
- pip (Python package manager)
- Clone the repository:
git clone https://github.com/Sachinbisht27/pdf_csv_merger.git
cd pdf_csv_merger
- Install required dependencies:
pip install -r requirements.txt
Ensure the required Python libraries (PyPDF2, pandas, reportlab) are installed:
pip install PyPDF2 pandas reportlab
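To confirm that the installation worked, a quick check along these lines may help. It is a sketch using only the standard library; the function name `missing_modules` is an assumption for illustration and is not part of the repository.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names of the three libraries listed above.
    missing = missing_modules(["PyPDF2", "pandas", "reportlab"])
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```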
- Place your input files in the same directory:
  - input.pdf (PDF file to extract text from)
  - data.csv (CSV file containing structured data)
- Run the script:
python script.py
- The generated output will be saved as output.pdf in the same directory.
pdf_csv_merger/
│── input.pdf # Source PDF file
│── data.csv # CSV file with data
│── output.pdf # Generated PDF file
│── script.py # Main Python script
│── requirements.txt # Required dependencies
│── README.md # Project documentation
The generated PDF (output.pdf) will contain:
- Extracted text from input.pdf (truncated if too long).
- CSV data formatted as plain text.
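The two output rules above could look roughly like this. This is a hedged sketch: the 1000-character limit, the `" | "` separator, and the function names `truncate` and `csv_to_lines` are assumptions, since the actual values used by script.py are not stated here. It also uses the standard-library `csv` module rather than pandas so that the example runs without extra dependencies.

```python
import csv
import io
from typing import List


def truncate(text: str, max_chars: int = 1000) -> str:
    """Cut text to max_chars characters and mark the cut with an ellipsis."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + "..."


def csv_to_lines(csv_text: str, sep: str = " | ") -> List[str]:
    """Turn CSV content into one plain-text line per row, skipping blank rows."""
    rows = csv.reader(io.StringIO(csv_text))
    return [sep.join(cell.strip() for cell in row) for row in rows if row]
```

For example, `csv_to_lines("name,age\nAda,36\n")` returns `["name | age", "Ada | 36"]`, and each of those strings can then be drawn as one line in the PDF.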
This project is licensed under the MIT License.