A Python script that extracts text from a PDF, reads data from a CSV file, and generates a new PDF combining the extracted text and CSV data. Uses PyPDF2, pandas, and reportlab for PDF handling and text rendering.


PDF & CSV Merger

Overview

This Python script extracts text from a PDF file, reads data from a CSV file, and generates a new PDF that combines the extracted text with tabular data.

Features

  • Extracts text from a PDF file.
  • Reads structured data from a CSV file.
  • Generates a new PDF with extracted text and CSV data using reportlab.
  • Handles multi-page output if data exceeds one page.
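A minimal sketch of how these features might fit together. This is not the repository's actual script.py: the function names (`extract_text`, `read_csv_lines`, `paginate`, `write_pdf`), the letter page size, the 50-line page limit, and the margins are all assumptions chosen for illustration. The third-party imports are done inside the functions so that the pagination helper can be used on its own.

```python
from typing import List


def extract_text(pdf_path: str) -> str:
    """Concatenate the text of every page in a PDF using PyPDF2."""
    from PyPDF2 import PdfReader  # assumes PyPDF2 2.0 or later

    reader = PdfReader(pdf_path)
    # extract_text() may yield nothing for pages without a text layer
    return "\n".join((page.extract_text() or "") for page in reader.pages)


def read_csv_lines(csv_path: str) -> List[str]:
    """Render a CSV file as plain-text lines using pandas."""
    import pandas as pd

    frame = pd.read_csv(csv_path)
    return frame.to_string(index=False).splitlines()


def paginate(lines: List[str], lines_per_page: int) -> List[List[str]]:
    """Split lines into consecutive chunks of at most lines_per_page."""
    if lines_per_page < 1:
        raise ValueError("lines_per_page must be at least 1")
    return [
        lines[start:start + lines_per_page]
        for start in range(0, len(lines), lines_per_page)
    ]


def write_pdf(lines: List[str], out_path: str, lines_per_page: int = 50) -> None:
    """Draw lines onto as many pages as needed using reportlab."""
    from reportlab.lib.pagesizes import letter
    from reportlab.pdfgen import canvas

    pdf = canvas.Canvas(out_path, pagesize=letter)
    _, height = letter
    for page in paginate(lines, lines_per_page):
        y = height - 50  # hypothetical top margin, in points
        for line in page:
            pdf.drawString(50, y, line)
            y -= 14  # hypothetical line spacing, in points
        pdf.showPage()  # finish this page and start the next
    pdf.save()
```

Under these assumptions, a run would amount to `write_pdf(extract_text("input.pdf").splitlines() + read_csv_lines("data.csv"), "output.pdf")`.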

Prerequisites

Before running the script, ensure you have the following installed:

  • Python 3.7 or later
  • pip (Python package manager)

Installation

  1. Clone the repository:
git clone https://github.com/Sachinbisht27/pdf_csv_merger.git  
cd pdf_csv_merger  
  2. Install required dependencies:
pip install -r requirements.txt  

Dependencies

The script depends on PyPDF2, pandas, and reportlab. If you did not install from requirements.txt, install them directly:

pip install PyPDF2 pandas reportlab  
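To confirm the libraries are importable before running the script, a small check along these lines may help. It is a sketch using only the standard library; the helper name `missing_modules` is hypothetical and is not part of the repository.

```python
import importlib.util
from typing import Iterable, List

# Import names of the three libraries listed above
REQUIRED_MODULES = ("PyPDF2", "pandas", "reportlab")


def missing_modules(names: Iterable[str] = REQUIRED_MODULES) -> List[str]:
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing dependencies: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```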

Usage

  1. Place your input files in the same directory:

    • input.pdf (PDF file to extract text from)
    • data.csv (CSV file containing structured data)
  2. Run the script:

python script.py  
  3. The generated output will be saved as output.pdf in the same directory.
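Because the script expects fixed file names in the working directory, a pre-flight check like the following can catch a missing input early. This is a sketch under the assumption that the names are exactly input.pdf and data.csv as listed above; `missing_inputs` is a hypothetical helper, not a function from script.py.

```python
from pathlib import Path
from typing import List, Sequence

# File names taken from the usage steps above
EXPECTED_INPUTS = ("input.pdf", "data.csv")


def missing_inputs(
    directory: str = ".", names: Sequence[str] = EXPECTED_INPUTS
) -> List[str]:
    """Return the expected input files that are absent from directory."""
    base = Path(directory)
    return [name for name in names if not (base / name).is_file()]


if __name__ == "__main__":
    absent = missing_inputs()
    if absent:
        print("Missing input files: " + ", ".join(absent))
    else:
        print("Inputs found; ready to run script.py.")
```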

File Structure

pdf_csv_merger/  
│── input.pdf     # Source PDF file  
│── data.csv      # CSV file with data  
│── output.pdf    # Generated PDF file  
│── script.py     # Main Python script  
│── requirements.txt # Required dependencies  
│── README.md     # Project documentation  

Example Output

The generated PDF (output.pdf) will contain:

  • Extracted text from input.pdf (truncated if too long).
  • CSV data formatted as plain text.
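The two behaviours above could look roughly like this. The sketch uses the standard-library `csv` module rather than pandas to stay dependency-free, and the 1000-character limit, the `...` marker, and the ` | ` column separator are assumptions: this README does not state the actual truncation length or text layout used by script.py.

```python
import csv
import io
from typing import List


def truncate_text(text: str, max_chars: int = 1000, marker: str = "...") -> str:
    """Cut text to max_chars characters, appending marker if it was shortened."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars].rstrip() + marker


def csv_to_lines(csv_text: str, separator: str = " | ") -> List[str]:
    """Turn CSV content into one plain-text line per non-empty row."""
    reader = csv.reader(io.StringIO(csv_text))
    return [separator.join(row) for row in reader if row]
```

For example, `csv_to_lines("name,age\nAda,36\n")` returns `["name | age", "Ada | 36"]`.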

License

This project is licensed under the MIT License.
