fix small bugs with docx reader such as non-integer sizes in docx sty… #367

Merged: NastyBoget merged 2 commits into develop from fix_docx on Nov 7, 2023

Conversation

IlyaKozlov
Contributor

I use dedoc and ran into some problems in the wild (errors during docx handling). I cannot share the real documents, but I have created an artificial one that reproduces the same problem and added it to the tests.
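The commit message for this PR names two failure modes: non-integer sizes in docx styles, and file names containing dots and spaces. The sketch below only illustrates why such inputs can break naive parsing. It is not the code from this PR: `parse_size` and `split_name` are hypothetical helpers, and truncating a fractional size to an integer is an assumption about the desired behavior.

```python
import os
from typing import Tuple


def parse_size(raw: str, default: int = 0) -> int:
    """Parse a size attribute that may be '24', '23.5', or malformed.

    Hypothetical helper: int('23.5') raises ValueError, so the value is
    parsed as a float first and then truncated (an assumed policy).
    """
    try:
        return int(float(raw))
    except (TypeError, ValueError, OverflowError):
        # None, non-numeric text, NaN, or infinity: fall back to the default.
        return default


def split_name(filename: str) -> Tuple[str, str]:
    """Split a file name into (stem, extension).

    Hypothetical helper: os.path.splitext splits on the last dot only, so
    inner dots and spaces stay in the stem, unlike filename.split('.').
    """
    stem, extension = os.path.splitext(os.path.basename(filename))
    return stem, extension.lower()
```

For instance, `parse_size("23.5")` yields `23` instead of raising, and `split_name("my report v1.2 final.docx")` keeps `my report v1.2 final` as the stem with `.docx` as the extension.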

NastyBoget merged commit 3dc8b61 into develop on Nov 7, 2023
2 checks passed
NastyBoget deleted the fix_docx branch on November 7, 2023 at 13:50
NastyBoget added a commit that referenced this pull request on Nov 24, 2023
* Use older pydantic version (#364)

* Added rtf format to docx convertor (#366)

Co-authored-by: Alexander Golodkov <[email protected]>

* fix small bugs with docx reader such as non-integer sizes in docx sty… (#367)

* fix small bugs with docx reader such as non-integer sizes in docx style and filename with dots and spaces

* Rename test

---------

Co-authored-by: Nasty <[email protected]>

* TLDR-462 gpu for 1.1 (#365)

* TLDR-462 - test on GPU work

* TLDR-354 images attachments extraction from PDF (#368)

* Benchmarks before changes

* Add image extraction to tabby

* Fix document partial parsing

* Use start_page, end_page in java tabby execution

* Fix txtlayer classification tests

* Fixes in partial parsing

* Fix tests

* TLDR-518: Fix tabby partially read (#372)

* Fix tabby partially read

* Add more tests

* Fix tabby page slice parameters

* Fix extract table in tabby with page range parameter

---------

Co-authored-by: Nasty <[email protected]>

* TLDR-514 creating document classes tutorial (#369)

* TLDR-517 attachments_dir (#370)

* TLDR-533 extract images from PDF to attachments_dir (#374)

* new version 1.1.1 (#375)

---------

Co-authored-by: Alexander Golodkov <[email protected]>
Co-authored-by: IlyaKozlov <[email protected]>
Co-authored-by: raxtemur <[email protected]>
Co-authored-by: Andrey Mikhailov <[email protected]>
Co-authored-by: Nikita Shevtsov <[email protected]>
2 participants