Added dynamic section restructuring for non-PMC articles. #20

Open
wants to merge 75 commits into
base: main

Conversation

Thomas-Rowlands
Collaborator

Sections in articles from journals such as Human Molecular Genetics are not wrapped in their own containers; instead, their contents are listed as siblings throughout the main text.

This update detects matches that display this type of layout, then moves them into newly created parent divs, more akin to the structure of PMC articles. The rest of the AC code base can then treat these matches in the same manner as before.

Whitespace removal has been commented out for now, until a few kinks are worked out.
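For illustration, the restructuring described above could be sketched roughly as follows. This is not the code in this PR: it uses the standard library's `xml.etree.ElementTree` rather than AC's own HTML handling, and the `HEADING_TAGS` set, the `wrap_sibling_sections` name, and the `section` class are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumption: a heading tag marks the start of a new section.
HEADING_TAGS = {"h2", "h3"}


def wrap_sibling_sections(parent):
    """Move each heading and the siblings that follow it into a new <div>.

    Children that appear before the first heading stay where they are.
    """
    children = list(parent)
    for child in children:
        parent.remove(child)

    current = None  # the section <div> currently being filled
    for child in children:
        if child.tag in HEADING_TAGS:
            # Hypothetical class name, used only for this sketch.
            current = ET.SubElement(parent, "div", {"class": "section"})
        if current is None:
            parent.append(child)
        else:
            current.append(child)
    return parent


# A flat layout: headings and paragraphs are all siblings.
flat = ET.fromstring(
    "<main>"
    "<p>Preamble</p>"
    "<h2>Introduction</h2><p>Intro text.</p>"
    "<h2>Results</h2><p>First result.</p><p>Second result.</p>"
    "</main>"
)
wrap_sibling_sections(flat)
print(ET.tostring(flat, encoding="unicode"))
```

After the call, each heading and its following paragraphs sit inside their own parent div, so downstream code can handle a section as a single element.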

Thomas-Rowlands and others added 30 commits April 26, 2022 19:48
…quality improvements, as well as a major refactoring of the Tables.py code design. Python code throughout AC has been mostly standardised for PEP compliance; however, more work will be needed to cover the entire code base.

Bugs Fixed:
- Offset incorrect for the first "data_section"
- HTML comments present in AC output.
- Table title offset should now be correct.
- Offset value is now assigned to 'table_section_title'.
- Empty 'table_section_title' fields are no longer output.
- Removed duplicate pipe characters from table headers.
- Table header strings should now display in the correct order, separated by pipes.
- The first column has returned to the table outputs, even with no cell values.
- Corrected ordering of data section rows.
- Table footers are now duplicated for each split table output.
- Tables should now split into multiple smaller tables under the correct circumstances, namely when encountering subheaders.
- Super rows will no longer cause tables to be split.
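
As a rough illustration of the table behaviour listed above (splitting on subheaders only, repeating the footer for each split table, and joining header strings with single pipes), a minimal sketch might look like this. It is not the Tables.py implementation: the row representation as `(kind, cells)` pairs, the `"subheader"` and `"super"` kind labels, and both function names are assumptions made for the example.

```python
def join_header_levels(levels):
    """Join stacked header labels top to bottom with single pipes.

    Empty labels are skipped so that no duplicate pipes appear.
    """
    return "|".join(label.strip() for label in levels if label.strip())


def split_on_subheaders(columns, rows, footer):
    """Split one table into several, starting a new table at each subheader.

    `rows` is a list of (kind, cells) pairs. Only the hypothetical kind
    "subheader" triggers a split; every other kind (including "super")
    stays in the current table. The footer is copied into every table.
    """

    def new_table(subheader):
        return {
            "subheader": subheader,
            "columns": list(columns),
            "rows": [],
            "footer": list(footer),  # a separate copy per split table
        }

    tables = []
    current = new_table(None)
    for kind, cells in rows:
        if kind == "subheader":
            # Close the table in progress unless it is still empty.
            if current["rows"] or current["subheader"] is not None:
                tables.append(current)
            current = new_table(cells[0])
        else:
            current["rows"].append(cells)
    if current["rows"] or current["subheader"] is not None:
        tables.append(current)
    return tables


columns = [join_header_levels(["Gene", ""]), join_header_levels(["Count", "n"])]
rows = [
    ("subheader", ["Cases"]),
    ("data", ["BRCA1", "3"]),
    ("super", ["Replication cohort"]),  # does not cause a split
    ("data", ["TP53", "5"]),
    ("subheader", ["Controls"]),
    ("data", ["BRCA1", "1"]),
]
footer = ["n = number of samples"]
tables = split_on_subheaders(columns, rows, footer)
for table in tables:
    print(table["subheader"], table["columns"], len(table["rows"]), table["footer"])
```

Copying the footer into each table keeps every split table readable on its own, at the cost of some duplication in the output.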
@Thomas-Rowlands
Collaborator Author

Re-using this PR since it is based on the same branch of my fork. This now includes a merge of all existing branches within my fork (see changes above).

Main changes:
- Improved handling of poorly structured HTML in articles from other journals.
- Reworked BioC tables code from several months ago (see above).
- Up-to-date supplementary material processing changes.
- XML output for BioC tables.

Labels
enhancement New feature or request

2 participants