Deploying to gh-pages from @ 9b2525a 🚀

clarin-eric · Nov 20, 2023 · ff52991 · ff52991
1 parent 98c2fc4
commit ff52991
Showing 95 changed files with 3,341 additions and 0 deletions.
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,6 @@
+*.pyc
+__pycache__
+*.swp
+.~lock*
+tables
+venv*/
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -0,0 +1,10 @@
+# Changelog
+
+
+## [1.0.0] - 20.11.2023
+
+### Changed features
+- Python 3.9+ required
+- requirements.txt replaced with a pyproject.toml + Poetry build config
+- `rfhg` is callable as a Python module
+
diff --git a/LICENSE.txt b/LICENSE.txt
diff --git a/README.md b/README.md
@@ -0,0 +1,155 @@
+ClarTable
+=========
+
+Installation
+------------
+Works with Python `^3.9`
+```bash
+git clone git@github.com:clarin-eric/resource-families-html-generator.git # via SSH or
+git clone https://github.com/clarin-eric/resource-families-html-generator.git # via HTTPS
+cd ./resource-families-html-generator/
+pip install .
+```
+
+About
+-----
+*ClarTable* is a Python module that generates an HTML presentation layer for tabular data stored in .csv files.
+
+### Usage
+
+#### Locally:
+```bash
+usage: python -m rfhg [-h] -i PATH -r PATH -o PATH
+
+Create html table from given data and rules. 
+To navigate static resources within the module prepend `static.` 
+to the path, eg. `-r static.rules/rules.json`
+
+optional arguments:
+  -h, --help  show this help message and exit
+  -i PATH     path to a .csv file or folder with .csv files
+  -r PATH     path to a .json file with rules
+  -o PATH     path to file where output html table will be generated
+```
+
+
+#### Via CI:
+The HTML tables for resource families can be generated via GitHub. Push new .csv files to `/resource_families`; after processing, the generated tables will appear in the `gh-pages` branch.
+
+### CSV format
+To create an HTML table from a .csv file with the default rules, the file must contain __all of the following columns__ (in any order). Column names are case sensitive. If you need the generator to consider additional columns, contact <[email protected]> or adjust __rules.json__.
+
+Make sure that your .csv files __use ; (semicolon)__ as the column separator.
+
+A single cell may contain multiple paragraphs or structures separated by the __#SEP__ separator. In the example below, the Description cell consists of 3 paragraphs. Some cells depend on others: the Buttons cell holds 2 button names split by the separator, and Buttons_URL holds the respective URLs.
+
+Corpus | Corpus_URL | Language | Size | Annotation | Licence | Description | Buttons | Buttons_URL | Publication | Publication_URL | Note
+-------|------------|----------|------|------------|---------|-------------|---------|-------------|-------------|-----------------|-------
+Example Corpus Name | www.examplaryurl.com | English | 100 million tokens | tokenised, PoS-tagged, lemmatised | CC-BY | First examplary sentence #SEPSecond examplary sentence to be started from new line #SEPExample with ```<a href="http://some.url">hyperlink</a>``` in it | Concordancer#SEPDownload | https://www.concordancer.com/ #SEPhttps://www.download.com | Smith et al. (3019) | https://publication.url | Note text to be displayed in button field
+
+Resulting table:
+![Example table](docs/media/example.png)
+
+### Table titles and ordering
+The table title is derived from the .csv file name, which follows the format X-table_title.csv, where X is an index used for table ordering.
+Tables can be grouped into sections by placing them in an intermediate directory within the corpora directory; directory names follow the same indexing principle as .csv files.
+For example, a corpora directory with the structure:
+```bash
+Historical corpora
+├── 1-Historical corpora in the CLARIN infrastructure
+│   ├── 1-Monolingual corpora.csv
+│   └── 2-Multilingual corpora.csv
+└── 2-Other historical corpora
+    ├── 1-Monolingual corpora.csv
+    └── 2-Multilingual corpora.csv
+```
+Will produce:
+
+![Examplary corpora](docs/media/corpora.png)
+
+### Rules format
+Rules are written as nested JSON objects made up of tags and fields.
+Given the rule:
+```javascript
+{"tags": [
+	{"tag": "<table class=\"table\" cellspacing=\"2\">", "tags": [
+		{"tag": "<thead>", "tags": [
+			{"tag": "<tr>", "tags": [
+				{"tag": "<th>", "text": "Corpus name"}
+			]}	
+		]},
+		{"tag": "<tbody>", "tags": [
+			{"tag": "<tr>", "tags": [
+				{"tag": "<td valign=\"top\">", "tags": [
+					{"tag": "<p>", "fields": [
+						{"text": "<strong>Field data</strong> will be inserted here: %s", "columns": ['column_name_in_csv_file']}
+					]}
+				]}
+			]}
+		]}
+	]}
+]}
+```
+
+The generated HTML table with names of corpora, assuming there were only 2 rows in the .csv file:
+```html
+<table class ="table" cellspacing="2">
+        <thead>
+                <tr>
+                        <th valign="top">Corpus name
+                        </th>
+                </tr>
+        </thead>
+        <tbody>
+                <tr>
+                        <td valign="top">
+                                <p>
+                                <strong>Field data</strong> will be inserted here: NKJP 2.1.4
+                                </p>
+                        </td>
+                </tr>
+        </tbody>
+        <tbody>
+                <tr>
+                        <td valign="top">
+                                <p>
+                                <strong>Field data</strong> will be inserted here: Common Crawl
+                                </p>
+                        </td>
+                </tr>
+        </tbody>
+</table>
+
+```
+<table class ="table" cellspacing="2">
+        <thead>
+                <tr>
+                        <th valign="top">Corpus name
+                        </th>
+                </tr>
+        </thead>
+        <tbody>
+                <tr>
+                        <td valign="top">
+                                <p>Some text here
+                                <strong>Field data</strong> will be inserted here: NKJP 2.1.4
+                                </p>
+                        </td>
+                </tr>
+        </tbody>
+        <tbody>
+                <tr>
+                        <td valign="top">
+                                <p>Some text here
+                                <strong>Field data</strong> will be inserted here: Common Crawl
+                                </p>
+                        </td>
+                </tr>
+        </tbody>
+</table>
+
+
+
+The \<tbody\> tag encloses the tags and fields used to create each row; only tags nested within \<tbody\> ... \</tbody\> can contain "fields": []
+
+
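The README's CSV format section describes semicolon-separated files whose cells may hold several values joined by `#SEP`, with Buttons and Buttons_URL paired by position. A minimal sketch of that convention follows, using only the standard library. It is not the `rfhg` implementation (whose `read_data` is not shown in this diff); `split_cell` and `read_rows` are hypothetical helper names, and the sample row is shortened from the README example.

```python
import csv
import io

SEP = "#SEP"  # in-cell separator named in the README


def split_cell(cell):
    """Split one cell on #SEP and strip whitespace around each part."""
    return [part.strip() for part in cell.split(SEP)]


def read_rows(text):
    """Parse semicolon-separated CSV text into a list of dicts (hypothetical helper)."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


# Shortened version of the README example row (assumption: only three columns shown).
sample = (
    "Corpus;Buttons;Buttons_URL\n"
    "Example Corpus Name;Concordancer#SEPDownload;"
    "https://www.concordancer.com/ #SEPhttps://www.download.com\n"
)

for row in read_rows(sample):
    names = split_cell(row["Buttons"])
    urls = split_cell(row["Buttons_URL"])
    # Pair each button name with the URL at the same position.
    buttons = list(zip(names, urls))
    print(buttons)
```

Pairing by position means a missing `#SEP` in either cell silently drops a button, since `zip` stops at the shorter list; a stricter reader might compare the two lengths first.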
diff --git a/pyproject.toml b/pyproject.toml
@@ -0,0 +1,30 @@
+[tool.poetry]
+name = "resource_families_html_generator"
+description = "CLARIN presentation layer generator for Resource Families"
+version = "1.0.0-dev"
+license = "./LICENSE.txt"
+authors = [
+    "Michał Gawor <[email protected]>",
+    "Alexander König <[email protected]>",
+]
+maintainers = [
+    "Michał Gawor <[email protected]>",
+    "Alexander König <[email protected]>",
+]
+packages = [
+    { include = "rfhg" },
+]
+include = [
+    "rfhg/static/*",
+]
+
+[tool.poetry.dependencies]
+json5 = '0.9.14'
+numpy = '1.26.2'
+pandas = '2.1.3'
+python = "^3.12"
+python-dateutil = '2.8.2'
+
+[build-system]
+requires = ["poetry-core>=1.0.0"]
+build-backend = "poetry.core.masonry.api"
diff --git a/rfhg/__init__.py b/rfhg/__init__.py
diff --git a/rfhg/__main__.py b/rfhg/__main__.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python3
+
+import argparse
+import os
+import re
+
+from .clartable import Clartable
+from .reader import read_data, read_rules, resolve_static
+from .utils import table_title, section_title
+
+parser = argparse.ArgumentParser(description='Create html table from given data and rules. To use static resources as arguments use `static.<path_inside_rfhg/static>`')
+parser.add_argument('-i', metavar='PATH', default='static.resource_families/', help='path to a .csv file or folder with .csv files. Note that nesting data files inside multiple directories will generate nested tables that mirror the directory nesting.')
+parser.add_argument('-r', metavar='PATH', default='static.rules/rules.json', help='path to json file with rules')
+parser.add_argument('-o', metavar='PATH', required=True, help='path to file where output html table will be written')
+
+args = parser.parse_args()
+
+
+if __name__ == "__main__":
+    rules = read_rules(args.r)
+    clartable = Clartable(rules)
+
+    output_path = args.o
+    if not os.path.exists(output_path):
+        os.makedirs(output_path)
+    if os.path.isdir(output_path):
+        file_name = os.path.basename(os.path.normpath(output_path)) + '.html'
+        output_path = os.path.join(output_path, file_name)
+    output = open(output_path, 'w')
+
+    # input is a single file
+    input_path = resolve_static(args.i)
+    if os.path.isfile(os.path.normpath(input_path)):
+        print("Processing file: ", input_path)
+        print(input_path)
+        data = read_data(input_path)
+        print(data)
+        title = table_title(input_path)
+        table = title + clartable.generate(data)
+        output.write(table)
+    # input is a folder
+    else:
+        print("Processing directory: ", input_path)
+        for root, subdir, files in os.walk(input_path):
+            subdir.sort()
+            files.sort()
+            if len(files) > 0:
+                if os.path.basename(root) != '':
+                    output.write(section_title(root))
+                for _file in files:
+                    print("Processing file: ", _file)
+                    data = read_data(os.path.join(root, _file))
+                    # generate table:
+                    if _file != '':
+                        table = table_title(_file)
+                    else:
+                        table = ''
+                    table += clartable.generate(data)
+                    output.write(table)
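The directory walk above orders entries with plain `sort()`, which compares names as strings, so an index of 10 would sort before 2. The sketch below shows one way to split the `X-table_title.csv` naming scheme from the README into an index and a title, and to sort by the numeric index instead. It is only an illustration: `split_index` and `index_key` are hypothetical names, and the real `table_title` and `section_title` in `rfhg.utils` are not shown in this diff and may behave differently.

```python
import os
import re

# Matches "<digits>-<rest>", e.g. "1-Monolingual corpora".
INDEX_RE = re.compile(r"^(\d+)-(.*)$")


def split_index(name):
    """Return (index, title) for 'X-title.csv' or 'X-title'; index is None if absent."""
    stem = os.path.splitext(name)[0] if name.lower().endswith(".csv") else name
    match = INDEX_RE.match(stem)
    if match:
        return int(match.group(1)), match.group(2)
    return None, stem


def index_key(name):
    """Sort key: indexed names first, ordered numerically, then by title."""
    index, title = split_index(name)
    return (index is None, index or 0, title)


files = [
    "10-Parallel corpora.csv",
    "2-Multilingual corpora.csv",
    "1-Monolingual corpora.csv",
]

print(sorted(files))                 # string order: 1, 10, 2
print(sorted(files, key=index_key))  # numeric order: 1, 2, 10
```

With single-digit indices, as in the README example, both orderings agree; the difference only appears once a directory holds ten or more tables.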