Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Deploying to gh-pages from @ 9b2525a 🚀
1 parent 98c2fc4, commit ff52991
Showing 95 changed files with 3,341 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
*.pyc | ||
__pycache__ | ||
*.swp | ||
.~lock* | ||
tables | ||
venv*/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Changelog | ||
|
||
|
||
## [1.0.0] - 20.11.2023 | ||
|
||
### Changed features | ||
- Python 3.9+ required | ||
- requirements.txt replaced with a pyproject.toml + Poetry build config | ||
- `rfhg` is now callable as a Python module (`python -m rfhg`) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
ClarTable | ||
========= | ||
|
||
Installation | ||
------------ | ||
Works with Python `^3.9` | ||
```bash | ||
git clone git@github.com:clarin-eric/resource-families-html-generator.git # via SSH or | ||
git clone https://github.com/clarin-eric/resource-families-html-generator.git # via HTTPS | ||
cd ./resource-families-html-generator/ | ||
pip install . | ||
``` | ||
|
||
About | ||
----- | ||
*ClarTable* is a Python module that generates an HTML presentation layer for tabular data stored in .csv files. | ||
|
||
### Usage | ||
|
||
#### Locally: | ||
```bash | ||
usage: python -m rfhg [-h] -i PATH -r PATH -o PATH | ||
|
||
Create html table from given data and rules. | ||
To navigate static resources within the module prepend `static.` | ||
to the path, eg. `-r static.rules/rules.json` | ||
|
||
optional arguments: | ||
-h, --help show this help message and exit | ||
-i PATH path to a .csv file or folder with .csv files | ||
-r PATH path to a .json file with rules | ||
-o PATH path to file where output html table will be generated | ||
``` | ||
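
The `static.` prefix described in the help text can be pictured with the following minimal sketch. It is not the module's actual `resolve_static` implementation: the function name `resolve_static_path` and the `STATIC_ROOT` location are assumptions made for illustration, based only on the `rfhg/static` folder named in the build configuration.

```python
from pathlib import Path

# Assumed location of the packaged static resources (hypothetical;
# the installed package may keep them elsewhere).
STATIC_ROOT = Path("rfhg") / "static"


def resolve_static_path(path: str, static_root: Path = STATIC_ROOT) -> Path:
    """Map `static.<relative path>` onto the static folder.

    Paths without the prefix are returned unchanged as Path objects.
    """
    prefix = "static."
    if path.startswith(prefix):
        return static_root / path[len(prefix):]
    return Path(path)
```

With this sketch, `-r static.rules/rules.json` would point inside the static folder, while a plain path such as `my_rules/rules.json` would be used as given.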
|
||
|
||
#### Via CI: | ||
The HTML tables for resource families can also be generated via GitHub: push new .csv files to `/resource_families`, and once they have been processed the generated tables will appear on the gh-pages branch. | ||
|
||
### CSV format | ||
To create an HTML table from a .csv file with the default rules, the file must contain __all of the following columns__ (their order does not matter). Column names are case sensitive. If you need the generator to handle additional columns, contact <[email protected]> or adjust __rules.json__. | ||
|
||
Make sure that your .csv files __use ; (semicolon)__ as the column separator. | ||
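
A quick way to check a file before handing it to the generator is sketched below. This is only an illustration using the standard `csv` module, not part of `rfhg`; the helper name `missing_columns` is hypothetical, and the column list is copied from the example table in this section.

```python
import csv
import io

# Column names as they appear in the example table of this README.
# The comparison is case sensitive, matching the note above.
REQUIRED_COLUMNS = [
    "Corpus", "Corpus_URL", "Language", "Size", "Annotation", "Licence",
    "Description", "Buttons", "Buttons_URL", "Publication",
    "Publication_URL", "Note",
]


def missing_columns(csv_text: str) -> list[str]:
    """Return the required columns absent from a semicolon-separated header."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=";")
    header = reader.fieldnames or []
    return [column for column in REQUIRED_COLUMNS if column not in header]
```

An empty result means the header is complete. A file saved with commas instead of semicolons would show up as having every column missing, because the whole header is then read as a single field.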
|
||
A single cell may contain multiple paragraphs or structures separated by the __#SEP__ separator. In the example below, the Description cell consists of 3 paragraphs. Some cells depend on others: the Buttons cell holds 2 button names split with the separator, and Buttons_URL holds the respective URLs. | ||
|
||
Corpus | Corpus_URL | Language | Size | Annotation | Licence | Description | Buttons | Buttons_URL | Publication | Publication_URL | Note | ||
-------|------------|----------|------|------------|---------|-------------|---------|-------------|-------------|-----------------|------- | ||
Example Corpus Name | www.examplaryurl.com | English | 100 million tokens | tokenised, PoS-tagged, lemmatised | CC-BY | First examplary sentence #SEPSecond examplary sentence to be started from new line #SEPExample with ```<a href="http://some.url">hyperlink</a>``` in it | Concordancer#SEPDownload | https://www.concordancer.com/ #SEPhttps://www.download.com | Smith et al. (3019) | https://publication.url | Note text to be displayed in button field | ||
|
||
Resulting table: | ||
![Examplary table](docs/media/example.png) | ||
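
The `#SEP` convention can be sketched in a few lines of Python. This is an illustrative reading of the format described above, not the generator's own code: `split_cell` and `pair_buttons` are hypothetical helper names, and stripping whitespace around each part is an assumption.

```python
SEP = "#SEP"


def split_cell(cell: str) -> list[str]:
    """Split a cell on #SEP and trim whitespace around each part."""
    return [part.strip() for part in cell.split(SEP)]


def pair_buttons(buttons: str, urls: str) -> list[tuple[str, str]]:
    """Pair each button name with the URL at the same position."""
    names = split_cell(buttons)
    links = split_cell(urls)
    if len(names) != len(links):
        raise ValueError("Buttons and Buttons_URL must have the same number of parts")
    return list(zip(names, links))
```

Applied to the example row, `pair_buttons` would match `Concordancer` with `https://www.concordancer.com/` and `Download` with `https://www.download.com`. A mismatch in the number of parts is reported as an error here, since a button without a URL cannot be rendered meaningfully.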
|
||
### Table titles and ordering | ||
The table title is derived from the .csv file name, which follows the format X-table_title.csv, where X is the index used for table ordering. | ||
Tables can be grouped into sections by placing them in an intermediate directory inside the corpora directory; directory names follow the same indexing principle as .csv files. | ||
For example, corpora with the structure: | ||
```bash | ||
Historical corpora | ||
├── 1-Historical corpora in the CLARIN infrastructure | ||
│   ├── 1-Monolingual corpora.csv | ||
│   └── 2-Multilingual corpora.csv | ||
└── 2-Other historical corpora | ||
    ├── 1-Monolingual corpora.csv | ||
    └── 2-Multilingual corpora.csv | ||
``` | ||
Will produce: | ||
|
||
![Examplary corpora](docs/media/corpora.png) | ||
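
The naming scheme above can be illustrated with a small parser. This is a sketch under assumptions, not the module's `table_title` or `section_title` functions: the names `parse_indexed_name` and `sort_by_index` are hypothetical, and the sketch assumes the index is a run of digits followed by a single hyphen.

```python
import re
from pathlib import Path


def parse_indexed_name(path: str) -> tuple[int, str]:
    """Split `X-title.csv` (or an `X-title` directory) into (index, title)."""
    stem = Path(path).stem  # drops a trailing .csv if present
    match = re.fullmatch(r"(\d+)-(.+)", stem)
    if match is None:
        raise ValueError(f"name does not follow the X-title pattern: {path!r}")
    return int(match.group(1)), match.group(2)


def sort_by_index(names: list[str]) -> list[str]:
    """Order names by their numeric index rather than as plain strings."""
    return sorted(names, key=lambda name: parse_indexed_name(name)[0])
```

Sorting by the parsed integer matters once a directory holds ten or more tables: a plain string sort places `10-...` before `2-...`, whereas the numeric key keeps the intended order. Directory names containing a dot would need extra care, because `Path.stem` treats the text after the last dot as a suffix.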
|
||
### Rules format | ||
Rules are written as nested JSON objects made of tags and fields. | ||
Given the rule: | ||
```javascript | ||
{"tags": [ | ||
    {"tag": "<table class=\"table\" cellspacing=\"2\">", "tags": [ | ||
        {"tag": "<thead>", "tags": [ | ||
            {"tag": "<tr>", "tags": [ | ||
                {"tag": "<th>", "text": "Corpus name"} | ||
            ]} | ||
        ]}, | ||
        {"tag": "<tbody>", "tags": [ | ||
            {"tag": "<tr>", "tags": [ | ||
                {"tag": "<td valign=\"top\">", "tags": [ | ||
                    {"tag": "<p>", "fields": [ | ||
                        {"text": "<strong>Field data</strong> will be inserted here: %s", "columns": ["column_name_in_csv_file"]} | ||
                    ]} | ||
                ]} | ||
            ]} | ||
        ]} | ||
    ]} | ||
]} | ||
``` | ||
|
||
The generated HTML table with the names of corpora, assuming there were only 2 rows in the .csv file: | ||
```html | ||
<table class ="table" cellspacing="2"> | ||
<thead> | ||
<tr> | ||
<th valign="top">Corpus name | ||
</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<td valign="top"> | ||
<p> | ||
<strong>Field data</strong> will be inserted here: NKJP 2.1.4 | ||
</p> | ||
</td> | ||
</tr> | ||
</tbody> | ||
<tbody> | ||
<tr> | ||
<td valign="top"> | ||
<p> | ||
<strong>Field data</strong> will be inserted here: Common Crawl | ||
</p> | ||
</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
|
||
``` | ||
<table class ="table" cellspacing="2"> | ||
<thead> | ||
<tr> | ||
<th valign="top">Corpus name | ||
</th> | ||
</tr> | ||
</thead> | ||
<tbody> | ||
<tr> | ||
<td valign="top"> | ||
<p>Some text here | ||
<strong>Field data</strong> will be inserted here: NKJP 2.1.4 | ||
</p> | ||
</td> | ||
</tr> | ||
</tbody> | ||
<tbody> | ||
<tr> | ||
<td valign="top"> | ||
<p>Some text here | ||
<strong>Field data</strong> will be inserted here: Common Crawl | ||
</p> | ||
</td> | ||
</tr> | ||
</tbody> | ||
</table> | ||
|
||
|
||
|
||
The \<tbody\> tag encloses the tags and fields used for row creation. Only tags nested within \<tbody\> ... \</tbody\> can contain "fields": [] | ||
|
||
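
To make the rules format more concrete, here is a minimal sketch of how a nested rule could be turned into HTML for a single .csv row. It is not the `Clartable` implementation: the function names `closing_tag` and `render`, the order in which text, fields and nested tags are emitted, and the derivation of the closing tag from the opening one are all assumptions made for illustration.

```python
import re


def closing_tag(tag: str) -> str:
    """Derive `</name>` from an opening tag such as `<td valign="top">`."""
    name = re.match(r"<\s*([A-Za-z][A-Za-z0-9]*)", tag).group(1)
    return f"</{name}>"


def render(rule: dict, row: dict) -> str:
    """Render one rule and its nested rules for one row of data."""
    parts = [rule["tag"]]
    if "text" in rule:
        parts.append(rule["text"])
    for field in rule.get("fields", []):
        # Fill each %s placeholder with the value of the listed column.
        values = tuple(row[column] for column in field["columns"])
        parts.append(field["text"] % values)
    for child in rule.get("tags", []):
        parts.append(render(child, row))
    parts.append(closing_tag(rule["tag"]))
    return "".join(parts)
```

Calling `render` on the `<tr>` rule from the example once per row would yield one table row per .csv record. Cell values are inserted without escaping in this sketch, which is consistent with the example row containing a raw hyperlink, but it also means the .csv content has to be trusted.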
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
[tool.poetry] | ||
name = "resource_families_html_generator" | ||
description = "CLARIN presentation layer generator for Resource Families" | ||
version = "1.0.0-dev" | ||
license = "./LICENSE.txt" | ||
authors = [ | ||
"Michał Gawor <[email protected]>", | ||
"Alexander König <[email protected]>", | ||
] | ||
maintainers = [ | ||
"Michał Gawor <[email protected]>", | ||
"Alexander König <[email protected]>", | ||
] | ||
packages = [ | ||
{ include = "rfhg" }, | ||
] | ||
include = [ | ||
"rfhg/static/*", | ||
] | ||
|
||
[tool.poetry.dependencies] | ||
json5 = '0.9.14' | ||
numpy = '1.26.2' | ||
pandas = '2.1.3' | ||
python = "^3.12" | ||
python-dateutil = '2.8.2' | ||
|
||
[build-system] | ||
requires = ["poetry-core>=1.0.0"] | ||
build-backend = "poetry.core.masonry.api" |
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
#!/usr/bin/env python3 | ||
|
||
import argparse | ||
import os | ||
import re | ||
|
||
from .clartable import Clartable | ||
from .reader import read_data, read_rules, resolve_static | ||
from .utils import table_title, section_title | ||
|
||
parser = argparse.ArgumentParser(description='Create html table from given data and rules. To use static resources as arguments use `static.<path_inside_rfhg/static>`') | ||
parser.add_argument('-i', metavar='PATH', default='static.resource_families/', help='path to a .csv file or folder with .csv files. Note that nesting data files inside multiple directories will generate nested tables that mirror the directory nesting.') | ||
parser.add_argument('-r', metavar='PATH', default='static.rules/rules.json', help='path to json file with rules') | ||
parser.add_argument('-o', metavar='PATH', required=True, help='path to file where output html table will be written') | ||
|
||
if __name__ == "__main__": | ||
    # Parse arguments only when run as a script, so importing this module has no side effects | ||
    args = parser.parse_args() | ||
    rules = read_rules(args.r) | ||
    clartable = Clartable(rules) | ||
|
||
    # A missing output path is created as a directory; the file inside is named after that directory | ||
    output_path = args.o | ||
    if not os.path.exists(output_path): | ||
        os.makedirs(output_path) | ||
    if os.path.isdir(output_path): | ||
        file_name = os.path.basename(os.path.normpath(output_path)) + '.html' | ||
        output_path = os.path.join(output_path, file_name) | ||
|
||
    input_path = resolve_static(args.i) | ||
    # The context manager closes the output file even if generation fails | ||
    with open(output_path, 'w') as output: | ||
        # input is a single file | ||
        if os.path.isfile(os.path.normpath(input_path)): | ||
            print("Processing file: ", input_path) | ||
            print(input_path) | ||
            data = read_data(input_path) | ||
            print(data) | ||
            title = table_title(input_path) | ||
            table = title + clartable.generate(data) | ||
            output.write(table) | ||
        # input is a folder | ||
        else: | ||
            print("Processing directory: ", input_path) | ||
            for root, subdir, files in os.walk(input_path): | ||
                # Sorting in place makes os.walk visit subdirectories and files in sorted name order | ||
                subdir.sort() | ||
                files.sort() | ||
                if len(files) > 0: | ||
                    if os.path.basename(root) != '': | ||
                        output.write(section_title(root)) | ||
                    for _file in files: | ||
                        print("Processing file: ", _file) | ||
                        data = read_data(os.path.join(root, _file)) | ||
                        # generate table: | ||
                        if _file != '': | ||
                            table = table_title(_file) | ||
                        else: | ||
                            table = '' | ||
                        table += clartable.generate(data) | ||
                        output.write(table) |