README

Presentation

irspdf is a simple text-based information retrieval system for PDF documents.

Text is extracted from the PDF files with pdfplumber.
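As a rough illustration of this step, the sketch below reads the text of every page of one file with pdfplumber. It is not taken from the irspdf source: the function name `extract_pdf_text` is hypothetical, and only the pdfplumber calls `pdfplumber.open`, `pdf.pages` and `page.extract_text` are assumed.

```python
def extract_pdf_text(pdf_path):
    """Return the text of all pages of a PDF, joined by newlines.

    Hypothetical helper for illustration; irspdf may extract text differently.
    """
    # Imported here so the module still loads when pdfplumber is not installed.
    import pdfplumber  # third-party: pip install pdfplumber

    parts = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # extract_text() may yield nothing for image-only pages.
            parts.append(page.extract_text() or "")
    return "\n".join(parts)
```

Scanned pages that contain only images yield no text this way, so such files would need OCR before they can be indexed.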

Standard text preprocessing for information retrieval is applied:

Stop word removal
Stemming
Punctuation removal
Lowercase conversion
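The four steps above can be sketched with the standard library alone. This is a minimal illustration, not irspdf's actual pipeline: the stop word list is a tiny hypothetical sample, and `naive_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's.

```python
import string

# Hypothetical, deliberately tiny stop word list for illustration only.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}


def naive_stem(token: str) -> str:
    """Strip one common English suffix; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text: str) -> list[str]:
    """Lowercase, drop punctuation, remove stop words, then stem."""
    lowered = text.lower()
    no_punct = lowered.translate(str.maketrans("", "", string.punctuation))
    tokens = no_punct.split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The Indexing of PDF documents, explained!"))
# ['index', 'pdf', 'document', 'explain']
```

The same function has to be applied to both documents and queries, otherwise query terms would not match the indexed terms.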

The ranking function used is BM25.
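For readers unfamiliar with BM25, here is a minimal sketch of one common variant, using the non-negative IDF form `log((N - df + 0.5) / (df + 0.5) + 1)`. The function name `bm25_scores` and the defaults `k1=1.5` and `b=0.75` are assumptions made for illustration; the parameters and exact formula used inside irspdf may differ.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document in `docs` against the tokenized `query`."""
    n_docs = len(docs)
    if n_docs == 0:
        return []
    # Guard against a collection made only of empty documents.
    avg_len = sum(len(d) for d in docs) / n_docs or 1.0
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Length normalization: longer documents are penalized via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["pdf", "retrieval", "system"], ["cooking", "recipe"], ["pdf", "pdf", "parser"]]
scores = bm25_scores(["pdf"], docs)
# The third document mentions "pdf" twice at the same length, so it ranks first.
```

Documents that contain none of the query terms score exactly zero, and repeated occurrences of a term raise the score with diminishing returns controlled by `k1`.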

Installation

Install with pip

pip install irspdf

Or install from GitHub

git clone https://github.com/Jibril-Frej/irspdf.git
cd irspdf && python setup.py install

Usage

Build a collection

from irspdf import build
build(folder_path, collection_path)

folder_path : path of the folder that contains all the PDF files to include in the collection.

collection_path : file where the collection will be saved

Query the collection

from irspdf import query
query(collection_path)

collection_path : file where the collection is saved

Update the collection

from irspdf import update
update(folder_path, collection_path)

folder_path : path of the folder that contains all the PDF files to add to the existing collection.

collection_path : file where the original collection is saved

Useful links

Documentation: https://irspdf.readthedocs.io/en/latest/

Source Code: https://github.com/Jibril-Frej/irspdf

Package: https://pypi.org/project/irspdf/
