Commit

Cleanup & version bump.
Preparing for the next version release, with some breaking changes in the API.
dipietrantonio committed Apr 19, 2020
1 parent eec5dae commit 4dfeb1c
Showing 7 changed files with 28 additions and 35 deletions.
11 changes: 5 additions & 6 deletions README.md
@@ -12,8 +12,8 @@ of a PDF document and uses its entries to give the user the ability to locate PD
 the file and parse them into suitable Python objects.
 
 **DISCLAIMER**: this package hasn't reached a stable version (>= 1.0.0) yet. Although the parser
-API is quite simple it may change suddenly from one release to anther. All breaking changes will
-be properly notified in the release notes.
+API is quite simple it may change suddenly from one release to the next one. All breaking changes
+will be properly notified in the release notes.
 
 
 ## Quick example
@@ -76,7 +76,7 @@ a better way to understand the PDF than writing a parser for it?
 
 ## Documentation
 
-You can read the documentation for this package on [readthedocs.io](https://pdf4py.readthedocs.io/en/latest/).
+You can read the documentation on [readthedocs.io](https://pdf4py.readthedocs.io/en/latest/).
 
 
 ## Contributing
@@ -92,8 +92,7 @@ Contributions are more than welcome! Please, when writing code or documentation
 - to adopt as much as possible a test-driven development process. Each contribution must be accompanied by a
 test addition/modification.
 
-If you are wondering in which way you can help, check the [TO-DO list](todo.md). For now it will do as a
-simple "road map".
+If you are wondering in which way you can help, check the [TODO list](https://github.com/Halolegend94/pdf4py/blob/master/TODO.md). For now it will do as a simple "road map".
 
-If you found a bug, please file a new issue here on GitHub. Proposing fixes, changes and additions can
+If you have found a bug, please file a new issue here on GitHub. Proposing fixes, changes and additions can
 be done through a pull request.
6 changes: 2 additions & 4 deletions todo.md → TODO.md
@@ -16,9 +16,7 @@ and can assume the value `LOW`, `MEDIUM` or `HIGH`.
 - [HIGH] (TO DO) To implement tests for some of the stream filters.
 - [MEDIUM] (TO DO) To analyze performances and to compare them with other libraries.
 - [LOW] (TO DO) To go through the 2.0 standard and see if there are major changes.
-- [LOW] (TO DO) To implement support for the 'Extends' keyword in a object stream.
-- [HIGH] (TO DO) Not to decrypt string in a cross reference dictionary.
+- [HIGH] (TO DO) Not to decrypt strings in a cross reference dictionary.
 - [HIGH] (TO DO) High some information about the cross reference table or about Cross Reference
 Streams (their identifiers).
-- [MEDIUM] (TO DO) Better handling of Compressed Object Streams (parse them only once and save them
-in a Python object)
+- [MEDIUM] (TO DO) Better handling of Compressed Object Streams.
2 changes: 1 addition & 1 deletion docs/source/conf.py
@@ -22,7 +22,7 @@
 author = 'Cristian Di Pietrantonio'
 
 # The full version, including alpha/beta/rc tags
-release = '0.0.2'
+release = '0.1.0'
 
 
 # -- General configuration ---------------------------------------------------
25 changes: 13 additions & 12 deletions docs/source/index.rst
@@ -1,3 +1,11 @@
+.. toctree::
+   :maxdepth: 3
+   :hidden:
+
+   tutorials
+   modules/index
+   standard_coverage
+
 pdf4py's documentation
 ==================================
 
@@ -8,7 +16,6 @@ extraction). In particular, it defines the class `Parser` that reads the *Cross
 of a PDF document and uses its entries to give the user the ability to locate PDF objects within
 the file and parse them into suitable Python objects.
 
-
 .. image:: https://travis-ci.org/Halolegend94/pdf4py.svg?branch=master
    :target: https://travis-ci.org/Halolegend94/pdf4py
    :alt: Build Status
@@ -20,6 +27,11 @@ the file and parse them into suitable Python objects.
 .. image:: https://img.shields.io/pypi/dm/pdf4py?color=brightgreen
    :target: https://pypi.org/project/pdf4py/
 
+**DISCLAIMER**: this package hasn't reached a stable version (>= 1.0.0) yet. Although the parser
+API is quite simple it may change suddenly from one release to the next one. All breaking changes
+will be properly notified in the release notes.
+
+
 Quick example
 -------------
 
@@ -81,14 +93,3 @@ there was not an established Python module to easily parse a PDF document. In or
 why I delved into the PDF 1.7 specification: since that moment I've got interested more and more
 in the inner workings of one of the most important and ubiquitous file format. And what's
 a better way to understand the PDF than writing a parser for it?
-
-
-Table of Contents
------------------
-
-.. toctree::
-   :maxdepth: 3
-
-   tutorials
-   modules/index
-   standard_coverage
10 changes: 5 additions & 5 deletions docs/source/standard_coverage.rst
@@ -6,7 +6,7 @@ PDF 1.7 standard coverage
 In this file the progress in implementing all the features in the `PDF 1.7 standard <http://wwwimages.adobe.com/www.adobe.com/content/dam/acom/en/devnet/pdf/pdfs/PDF32000_2008.pdf>`_ is tracked.
 Chapters 1 to 6 of the standard are devoted to give a general introduction to the standard whereas Chapter 7 is where the PDF syntax is
 defined. It follows that the best way to keep track of the progress is to specify for each section whether the illustrated features have
-been implemented or not. As the development goes on, the various sections decribing features that have been supported will be marked with
+been implemented or not. As the development goes on, the various sections describing features that have been supported will be marked with
 an check symbol (✓) in the following table. Moreover, the tilde symbol (~) means almost every aspect is supported or that the implementation
 seems to work but more testing is necessary. Finally, the cross symbol (✗) informs that there is no support at this stage for the associated
 feature.
@@ -68,15 +68,15 @@ feature.
 +-------------------+---------------------------------+----------------------------------------+
 | 7.5.6 | Incremental updates ||
 +-------------------+---------------------------------+----------------------------------------+
-| 7.5.7 | Object streams | ~ (`Extend` option support missing) |
+| 7.5.7 | Object streams | |
 +-------------------+---------------------------------+----------------------------------------+
 | 7.5.8 | Cross Reference Streams ||
 +-------------------+---------------------------------+----------------------------------------+
-| *7.6* | *Encription* | ~ (no embedded files for now) |
+| *7.6* | *Encryption* | ~ (no File Specs and Public Key Crypto)|
 +-------------------+---------------------------------+----------------------------------------+
 | 7.6.1 | General ||
 +-------------------+---------------------------------+----------------------------------------+
-| 7.6.2 | General Encription Algorithm ||
+| 7.6.2 | General Encryption Algorithm ||
 +-------------------+---------------------------------+----------------------------------------+
 | 7.6.3 | Standard Security Handler | ~ (permission bits ignored) |
 +-------------------+---------------------------------+----------------------------------------+
@@ -144,7 +144,7 @@ feature.
 +-------------------+---------------------------------+----------------------------------------+
 
 Subsequent chapters describe higher level aspects that are built on top of the PDF syntax and elementary objects.
-As of now there is no support for those features, as exmplained in the landing page of the documentation.
+As of now there is no support for those features, as explained in the landing page of the documentation.
 
 In addition, the AESV3 encryption method specified in the
 `PDF 1.7 Extension 3 document <https://www.adobe.com/content/dam/acom/en/devnet/acrobat/pdfs/adobe_supplement_iso32000.pdf>`_
2 changes: 1 addition & 1 deletion setup.py
@@ -5,7 +5,7 @@
 
 setuptools.setup(
     name="pdf4py",
-    version="0.0.2",
+    version="0.1.0",
     author="Cristian Di Pietrantonio",
     author_email="[email protected]",
     description="A PDF parser written in Python3 with no external dependencies.",
7 changes: 1 addition & 6 deletions tests/functional_tests.py
@@ -3,7 +3,7 @@
 import logging
 from binascii import unhexlify
 
-KEYWORDS_OF_INTEREST = ['Extends', 'F']
+
 
 def parse_object(parser, obj, visited):
     if isinstance(obj, parpkg.PDFStream):
@@ -15,16 +15,11 @@ def parse_object(parser, obj, visited):
         for x in obj:
             parse_object(parser, x, visited)
     elif isinstance(obj, dict):
-        interesting_keys = set(obj.keys()).intersection(KEYWORDS_OF_INTEREST)
-        if len(interesting_keys) > 0:
-            raise Exception('Found keyword(s) {} in dictionary {}'.format(interesting_keys, obj))
         for k in obj:
             parse_object(parser, obj[k], visited)
     elif isinstance(obj, parpkg.PDFReference) and obj not in visited:
         visited.add(obj)
         x = parser.parse_reference(obj)
-        if isinstance(x, parpkg.PDFIndirectObject):
-            x = x.value
         parse_object(parser, x, visited)
 
 
