This repository contains a set of files to parse documentation of experiments in eLabFTW.
A more detailed explanation regarding LISTER is provided in the following paper: https://pubs.acs.org/doi/10.1021/acs.jcim.3c00744
Additional resources are available to try out LISTER on your own:
- Demo server for eLabFTW, with contents that can be parsed with LISTER: https://elabftw.pharm.hhu.de/login.php. Please use anonymous login to browse the content.
- eLabFTW-importable Structure and comments to see how we implement our eLabFTW structure: https://github.com/CPCLab/lister-container.
- eLabFTW-importable Materials and methods examples, annotated with LISTER annotation rules: https://github.com/CPCLab/materials-and-methods.
LISTER is an effective tool that simplifies metadata extraction for your eLabFTW experiments, improving the findability and reproducibility of your work. By using LISTER, you can save time and effort in documenting your research and annotating your research data while ensuring that your metadata is well-organized and easily accessible. LISTER formats your metadata in both JSON and Excel formats, making it easy for machines to process and for humans to read, as illustrated in Table 1.
Par. No. | Key | Value | Measure | Unit |
---|---|---|---|---|
- | section level 0 | Remarks | | |
- | section level 0 | Precultures | | |
4 | Date of experiment | 29.09.2017 | | |
5 | expression strain | P. putida KT2440 pVLT33::pigC | | |
5 | negative control | empty vector strain | | |
5 | inoculum | single colony | | |
5 | growth media | LB Kan | 5 | mL |
5 | temperature | 30 | | °C |
5 | shaking | 250 | | rpm |
5 | time | overnight | | |
Table 1. An example of the extracted metadata from LISTER in tabular Excel format.
LISTER's semi-automated metadata extraction is powered by a user-friendly annotation mechanism that empowers experiment documentation writers to effortlessly highlight metadata pairs they consider crucial. The appeal of this annotation system lies in its simplicity and readability for both humans and machines. It works as illustrated in Figure 1.
Figure 1. Lightweight annotated experiment document snippets are extracted for (a) metadata in JSON and Excel format and (b) clean Word documentation
Once the metadata is extracted, LISTER creates a clean Word document along with the metadata extraction. When exporting to a Word document, LISTER removes the annotations and simplifies the content, presenting you with a clean and simplified document that is ready for sharing or publication, ensuring that others can reproduce your experiments with ease, as illustrated in Figure 2.
Figure 2. An example of a Word experiment documentation generated by LISTER.
LISTER's annotation process can be streamlined by integrating with your eLabFTW implementation's protocol/materials and methods catalog. Our collection of materials and methods is illustrated in Figure 3. These protocols are pre-annotated using LISTER's annotation scheme (see below for details). When you import these pre-annotated catalog entries into your experiment entries (which can be done with a click in eLabFTW), you ensure consistency throughout your research documentation without having to write and annotate the documentation from scratch. The adaptation of the actual value being used, however, has to be done manually.
Figure 3. A collection of pre-annotated reusable catalog entries that can be imported to relevant experiments.
If your lab's eLabFTW implementation uses specific groupings (we call these groupings containers), such as by publication, project, study, or system, LISTER can extract metadata for all experiments within a group with a single click. LISTER can also include additional information related to your groupings (such as a publication), which adds context to the extracted metadata. In the following example, LISTER provides the publication's title, authors, journal, status, and DOI for experiments belonging to a publication, along with related project, study, and system information. This is illustrated in Figure 4.
Figure 4. A publication consists of several linked experiments, all of which can be extracted for their metadata in a single click.
By embracing a structured approach to your eLabFTW organization, your experiments will remain well-organized and easily accessible. LISTER's one-click metadata parsing for experiments belonging to, for example, a study is an illustration of LISTER's metadata extraction efficiency. With a single click, LISTER generates a comprehensive folder structure where each experiment directory contains a Word document detailing the experiment without bracket annotations, an Excel file with a table of metadata entries, a JSON metadata file, and any attachments you have provided in your corresponding eLabFTW experiment entries. This structured output ensures that your experiment documentation is clear, concise, and ready for research data publication, sharing, and archival. This is illustrated in Figure 5.
Figure 5. A study container, once extracted, will have its experiments' metadata, documentation, and attachments extracted into respective subfolders.
The resulting metadata, once uploaded into a platform based on the Data Portal CMS DSpace 7, will be indexed for searchability. DSpace 7 also supports logical search operators to search the key-value pairs of your metadata.
Figure 6. User interface for parsing an experiment.
Figure 7. User interface for parsing a container (e.g., Publications, Project, etc).
Figure 8. How the annotation is done to enable parsing via LISTER. This annotated fragment can be derived from reusable, lab-curated experiment protocols/materials and methods. See the Annotation Mechanism section below.
Figure 9. Linked items section of an experiment, in which the tabular content will be parsed to gather more context w.r.t. e.g., Study, Project, and System.
Figure 10. Metadata output in the Excel sheet, after parsing with LISTER.
LISTER is distributed as an executable file for Windows, Linux, or macOS (with an Intel chipset). The executable file for each platform is available on the release page, along with another, platform-specific file.
- For Windows and Linux, place the executable file (`lister.exe` on Windows or `lister` on Linux) within the same folder as the `config.json` file.
- For macOS, create the directory `~/Apps/lister` first and place the executable `lister.app` and `config.json` in this directory.
Parsing an eLabFTW entry requires:
- General parameters:
  - eLabFTW `API token` and `API endpoint`, which can be obtained from the administrator of the lab's or university's eLabFTW instance.
  - Default output directory, i.e., a directory path used to store the parsing outputs.
- A specific parameter: `Experiment ID` or `Database ID` for the entry to be parsed.
The annotation mechanism allows extracting metadata from experiment documentation as .xlsx and .json files. In the following points, the basic elements of annotating protocol/MM to be parsed by LISTER are described.
- Key-Value (KV) elements.
  - A KV pair is written as `{value|key}` in an experiment entry.
  - If applicable, a KV pair can be extended with a measure and a unit. Therefore, there are two more variations for writing a KV pair:
    - `{measure|unit|key}`: the measure and unit will be mapped into value and unit.
    - `{measure|unit|value|key}`: the measure and unit will be taken as given.
  - For example, “Two {100|mL|LB Kan|expression media} cultures in {unbaffled Erlenmeyer|:flasks:}” consists of two patterns:
    - `{measure|unit|value|key}` -> `{100|mL|LB Kan|expression media}`
    - `{value|key}` -> `{unbaffled Erlenmeyer|:flasks:}`
  - Keys are hidden by default in the .docx output file to avoid superfluous text.
  - To make a key visible, place it within colons as `{value|:key:}`, such as `{unbaffled Erlenmeyer|:flasks:}`.
- Order.
  - As there can be identical keys within an experiment entry, disambiguation is needed.
  - The disambiguation is done through the paragraph number, which will be extracted and associated with each KV pair.
- Comments. There are three types of comments supported in LISTER.
  - Comments parsed as-is.
    - This retains both brackets and content in the Word document.
    - Annotation is done using regular brackets `()`.
    - Annotation example: `(This comment will be parsed as is, retaining both the content and the brackets in the .docx file.)`.
  - Invisible comments.
    - Used to specify additional notes (regarding, e.g., parameter use) that should be hidden from the final experiment documentation output.
    - Annotation is done using a pair of underscores inside a regular comment: `(_ _)`.
    - Annotation example: `(_This comment will be invisible in the .docx output file._)`.
  - Comments retained without brackets.
    - This is typically used for comments within KV pairs.
    - Annotation is done using brackets and a pair of colons: `(: :)`.
    - Annotation example: `(:This comment's bracket will be invisible in the .docx output file, but the text content will be kept.:)`.
- Conditionals and iterations handling.
  - LISTER supports documenting conditionals and iterations, but this should be used cautiously: as the final experiment documentation is unlikely to have these conditional and iteration clauses, researchers are required to resolve them by replacing the experiment parameter values with the actual values used during the experiment.
- Reference management.
  - References can be provided if the referred source has a DOI.
  - Annotation is done using regular brackets and providing the DOI (not URI) in the bracket.
  - The DOI will be converted into Arabic numerals in square brackets, which refer to the reference provided at the bottom of the document.
  - References are only retained in the .docx output, but not in the metadata outputs (.xlsx/.json).
  - Example: `(DOI_CODE)`, such as `(10.1073/pnas.062492699)`, will be written as `[1]` in the experiment body, and as a numbered list of DOI codes at the end of the experiment documentation.
- Sections.
  - The keywords `section` and `subsection` are designated to provide a separation between sections or subsections.
  - This is done by using the `<section|section name>` annotation.
  - Multiple subsection levels are also supported, e.g., `<subsubsection|section name>`, which will output different sectioning levels in the .xlsx and .json files and different heading levels in the .docx file.
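The effect of these rules on the clean .docx text can be sketched in a few lines of Python. This is a minimal illustration of the annotation rules described above, not LISTER's actual implementation: the function name `render_clean_text` and the regular expressions are assumptions, and the sketch assumes that measure, unit, and value are kept in the clean text while hidden keys are dropped. Sections and control flows are not handled here.

```python
import re

# Hypothetical patterns for illustration; LISTER's own parser may differ.
KV_PATTERN = re.compile(r"\{([^{}]*)\}")
INVISIBLE_COMMENT = re.compile(r"\s*\(_.*?_\)")
BRACKETLESS_COMMENT = re.compile(r"\(:(.*?):\)")
DOI_REFERENCE = re.compile(r"\((10\.\d{4,9}/[^\s()]+)\)")


def render_clean_text(annotated):
    """Return (clean_text, references) for one annotated paragraph."""
    references = []

    def replace_doi(match):
        # Each distinct DOI gets a running number, e.g. [1], [2], ...
        doi = match.group(1)
        if doi not in references:
            references.append(doi)
        return f"[{references.index(doi) + 1}]"

    def replace_kv(match):
        *leading, key = [part.strip() for part in match.group(1).split("|")]
        # A key wrapped in colons stays visible; otherwise it is dropped.
        if len(key) > 2 and key.startswith(":") and key.endswith(":"):
            leading.append(key[1:-1])
        return " ".join(part for part in leading if part)

    text = INVISIBLE_COMMENT.sub("", annotated)          # (_..._) disappears
    text = BRACKETLESS_COMMENT.sub(lambda m: m.group(1), text)  # (:...:) keeps text only
    text = DOI_REFERENCE.sub(replace_doi, text)          # (DOI) becomes [n]
    text = KV_PATTERN.sub(replace_kv, text)              # {..|key} loses annotation
    return text, references


if __name__ == "__main__":
    sentence = "Two {100|mL|LB Kan|expression media} cultures in {unbaffled Erlenmeyer|:flasks:}"
    print(render_clean_text(sentence))
```

Regular comments in plain brackets are left untouched by this sketch, which matches the "parsed as-is" rule.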
Extracted item | Description | Representation | Example | Extracted order, key, value, and optionally measure and unit in the metadata |
---|---|---|---|---|
Section | The section name | `<section\|section name>` | `<section\|Structure Preparation>` | "-", section level 0, Structure Preparation, -, - |
Order | The order of the steps, based on the order of the paragraphs in the experiment documentation | - | - | - |
Key | The key for the metadata, connected to the value | `{value\|key}` | `{sequence alignment\|stage}` | `<order>`, stage, sequence alignment, -, - |
Comment (see also the bullet points about comments above for variations) | Comments are allowed within the key-value annotation, represented within regular brackets. Comments can be placed before and/or after the key and/or the value | `{value\|(comment) key}` or `{value (comment)\|key}` or `{value (comment)\|(comment) key}` | `{receptor residue\|(minimization) target}` | `<order>`, target, receptor residue, -, - |
Value | The value of the metadata is the first item within the curly brackets | `{value\|key}` | `{sequence alignment\|stage}` | `<order>`, stage, sequence alignment, -, - |
Measure and Unit | The measure and unit of corresponding key/value pairs | `{measure\|unit\|value\|key}` | `{100\|mL\|LB Kan\|expression media}` | `<order>`, expression media, LB Kan, 100, mL |
Value and Unit | In some cases, a value is attached to a unit directly, without having to provide a measure | `{value\|unit\|key}` | `{250\|rpm\|shaking}` | `<order>`, shaking, 250, -, rpm |
Control flow: for each | Extract multiple key-value pairs related to a for each iteration | `<for each\|key>` | `<for each\|generated pose>` | `<order>`, step type, iteration, -, -<br>`<order>`, flow type, for each, -, -<br>`<order>`, flow parameter, generated pose, -, - |
Control flow: for | Extract multiple key-value pairs related to a for iteration | `<for\|key\|[range]\|iteration operation\|magnitude>` | `<for\|pH\|[1-7]\|+\|1>` | `<order>`, step type, iteration, -, -<br>`<order>`, flow type, for, -, -<br>`<order>`, flow parameter, pH, -, -<br>`<order>`, flow range, [1-7], -, -<br>`<order>`, start iteration value, 1, -, -<br>`<order>`, end iteration value, 7, -, -<br>`<order>`, flow operation, +, -, -<br>`<order>`, flow magnitude, 1, -, - |
Control flow: while | Extract multiple key-value pairs related to a while iteration | `<while\|key\|logical operator\|value> ... <iterate\|iteration operation\|magnitude>` | `<while\|pH\|lte\|7> ... <iterate\|+\|1>` | `<order>`, step type, iteration, -, -<br>`<order>`, flow type, while, -, -<br>`<order>`, flow parameter, pH, -, -<br>`<order>`, flow logical parameter, lte, -, -<br>`<order>`, flow compared value, 7, -, -<br>`<order>`, flow type, iterate, -, - (after while)<br>`<order>`, flow operation, +, -, -<br>`<order>`, flow magnitude, 1, -, - |
Control flow: if | Extract multiple key-value pairs related to an if conditional | `<if\|key\|logical operator\|value>` | `<if\|pH\|lte\|7>` | `<order>`, step type, conditional, -, -<br>`<order>`, flow type, if, -, -<br>`<order>`, flow parameter, pH, -, -<br>`<order>`, flow logical parameter, lte, -, -<br>`<order>`, flow compared value, 7, -, - |
Control flow: else if | Extract multiple key-value pairs related to an else if conditional | `<else if\|key\|logical operator\|value>` | `<else if\|pH\|between\|[8-12]>` | `<order>`, step type, conditional, -, -<br>`<order>`, flow type, else if, -, -<br>`<order>`, flow parameter, pH, -, -<br>`<order>`, flow logical parameter, between, -, -<br>`<order>`, flow range, [8-12], -, -<br>`<order>`, start iteration value, 8, -, -<br>`<order>`, end iteration value, 12, -, - |
Control flow: else | Extract multiple key-value pairs related to an else conditional | `<else>` | `<else>` | `<order>`, step type, conditional, -, -<br>`<order>`, flow type, else, -, - |
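The mapping from annotations to metadata rows in the table above can be sketched as follows. This is an illustrative sketch under stated assumptions, not LISTER's parser: the function names are hypothetical, paragraphs are simply numbered from 1 (the real numbering may differ), the section level is assumed to equal the number of `sub` prefixes, and only sections, key-value pairs, `for each`, and `for` are handled.

```python
import re

# Hypothetical helpers for illustration only.
ANNOTATION = re.compile(r"\{([^{}]*)\}|<([^<>]*)>")
COMMENT = re.compile(r"\([^()]*\)")


def clean(text):
    """Remove bracketed comments, visibility colons, and extra whitespace."""
    without_comments = COMMENT.sub(" ", text)
    return " ".join(without_comments.split()).strip(":").strip()


def kv_rows(order, body):
    """Turn the inside of a {...} annotation into one metadata row."""
    parts = [clean(part) for part in body.split("|")]
    if len(parts) == 2:
        value, key = parts
        return [(order, key, value, "-", "-")]
    if len(parts) == 3:
        value, unit, key = parts
        return [(order, key, value, "-", unit)]
    if len(parts) == 4:
        measure, unit, value, key = parts
        return [(order, key, value, measure, unit)]
    raise ValueError(f"unsupported key-value annotation: {body!r}")


def flow_rows(order, body):
    """Turn the inside of a <...> annotation into metadata rows."""
    parts = [part.strip() for part in body.split("|")]
    keyword = parts[0]
    if keyword.endswith("section") and len(parts) == 2:
        level = keyword.count("sub")  # assumption: one level per "sub" prefix
        return [("-", f"section level {level}", parts[1], "-", "-")]
    if keyword == "for each" and len(parts) == 2:
        pairs = [("step type", "iteration"), ("flow type", "for each"),
                 ("flow parameter", parts[1])]
    elif keyword == "for" and len(parts) == 5:
        _, parameter, flow_range, operation, magnitude = parts
        start, end = flow_range.strip("[]").split("-")
        pairs = [("step type", "iteration"), ("flow type", "for"),
                 ("flow parameter", parameter), ("flow range", flow_range),
                 ("start iteration value", start), ("end iteration value", end),
                 ("flow operation", operation), ("flow magnitude", magnitude)]
    else:
        raise ValueError(f"unsupported annotation in this sketch: <{body}>")
    return [(order, key, value, "-", "-") for key, value in pairs]


def extract_rows(paragraphs):
    """Collect (order, key, value, measure, unit) rows from annotated paragraphs."""
    rows = []
    for order, paragraph in enumerate(paragraphs, start=1):
        for match in ANNOTATION.finditer(paragraph):
            kv_body, flow_body = match.groups()
            if kv_body is not None:
                rows.extend(kv_rows(order, kv_body))
            else:
                rows.extend(flow_rows(order, flow_body))
    return rows
```

Calling `extract_rows(["<section|Precultures>", "Shake at {250|rpm|shaking}"])` yields a section row followed by a row for the `shaking` key with `rpm` in the unit column.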
A logical operator is used to decide whether a particular condition is met in an iteration/conditional block. It is available for while, if, and else if control flows. The following logical operators are supported:
- `e`: equal
- `ne`: not equal
- `lt`: less than
- `lte`: less than or equal
- `gt`: greater than
- `gte`: greater than or equal
- `between`: between
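LISTER records these operators as metadata. For anyone who consumes that metadata downstream, their intended meaning can be sketched in Python as below. The function name `condition_holds` is hypothetical, and treating `between` as inclusive on both ends is an assumption that the text above does not confirm.

```python
import operator

# Operator keywords from the list above mapped to Python comparisons.
LOGICAL_OPERATORS = {
    "e": operator.eq,
    "ne": operator.ne,
    "lt": operator.lt,
    "lte": operator.le,
    "gt": operator.gt,
    "gte": operator.ge,
}


def parse_range(text):
    """Parse a range such as "[8-12]" into (8.0, 12.0); non-negative bounds only."""
    low, high = text.strip("[]").split("-")
    return float(low), float(high)


def condition_holds(current, logical_operator, compared):
    """Evaluate e.g. <if|pH|lte|7> for a current numeric value."""
    if logical_operator == "between":
        low, high = parse_range(compared)
        return low <= current <= high  # assumption: inclusive bounds
    return LOGICAL_OPERATORS[logical_operator](current, float(compared))


if __name__ == "__main__":
    print(condition_holds(6.5, "lte", "7"))
    print(condition_holds(9.5, "between", "[8-12]"))
```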
An iteration operator is used to change the value of a variable in a loop. It is available for while and for. The following iteration operators are supported:
- `+`: iteration using addition
- `-`: iteration using subtraction
- `%`: iteration using modulo
- `*`: iteration using multiplication
- `/`: iteration using division
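When resolving a documented `for` loop into the actual values used, it can help to enumerate the values the annotation implies. The following is a minimal sketch with a hypothetical helper `expand_for`; it assumes non-negative numeric range bounds that are both included, and it is not part of LISTER itself.

```python
import operator

# Iteration operators from the list above mapped to Python arithmetic.
ITERATION_OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "%": operator.mod,
    "*": operator.mul,
    "/": operator.truediv,
}


def expand_for(flow_range, operation, magnitude, max_steps=1000):
    """List the values implied by e.g. <for|pH|[1-7]|+|1>."""
    start, end = (float(bound) for bound in flow_range.strip("[]").split("-"))
    low, high = min(start, end), max(start, end)
    step = ITERATION_OPERATORS[operation]
    values = []
    current = start
    # max_steps guards against loops that never leave the range.
    while low <= current <= high and len(values) < max_steps:
        values.append(current)
        following = step(current, magnitude)
        if following == current:  # e.g. "+ 0" or "* 1" would never progress
            break
        current = following
    return values


if __name__ == "__main__":
    print(expand_for("[1-7]", "+", 1))
```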
Instance type | Instance example | Can the table be parsed as metadata? | Can annotated text be parsed? | Should the metadata specified before (be it table or annotated text) be parsed? |
---|---|---|---|---|
Experiment entries | - | No, we still need to define how heterogeneous tables can be extracted into key-value pairs | Yes | Yes |
Management-instance database entries | Project, System, Study | Yes, as long as it is in a two-column structure | No, it is deemed unnecessary as KV pairs are already in tabular form | Yes |
SOP-instance database entries | MM, Method, Methods, Protocol, Protocols | No | Yes | No, it should have already been inserted into the experiment instead, and the parameters should have been adapted |
Container-instance database entries | Publication | Yes | No, it is deemed unnecessary as KV pairs are already in tabular form | Yes |
LISTER checks and reports the following syntax issues upon parsing:
- Orphaned brackets.
- Mismatched data types for conditionals and iterations.
- Mismatched argument numbers for conditionals and iterations.
- Invalid control flows.
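To give an idea of what the first check involves, the sketch below finds brackets without a matching partner using a simple stack. It is an illustration only: the function name `find_orphaned_brackets` is hypothetical, LISTER's own check may work differently, and the sketch would also flag `<` or `>` used as plain comparison signs in the text.

```python
# Closing bracket -> expected opening bracket.
PAIRS = {")": "(", "}": "{", ">": "<"}
OPENERS = set(PAIRS.values())


def find_orphaned_brackets(text):
    """Return (bracket, position) tuples for brackets without a partner."""
    stack = []    # opening brackets still waiting for their closing partner
    orphans = []  # closing brackets that did not match the latest opener
    for position, char in enumerate(text):
        if char in OPENERS:
            stack.append((char, position))
        elif char in PAIRS:
            if stack and stack[-1][0] == PAIRS[char]:
                stack.pop()
            else:
                orphans.append((char, position))
    orphans.extend(stack)  # openers that were never closed
    return sorted(orphans, key=lambda item: item[1])


if __name__ == "__main__":
    print(find_orphaned_brackets("Shake at {250|rpm|shaking overnight"))
```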
Images are extracted from the experiment documentation, but there is no metadata or naming scheme for the extracted images.
- Avoid referring to, e.g., a section without explicitly using a key-value pair (avoid, e.g., "Repeat step 1 with similar parameters"), as this will make the metadata extraction for that particular implicit step impossible.
- To minimize confusion regarding units of measurement (e.g., `fs` vs `ps`), please explicitly state the units within the value portion of the key-value pair, e.g., `{0.01|ps|gamma_ln}`.
- The base directory contains the metadata extraction script.
- The output directory contains the extracted metadata (step order, key, value, measure, unit) in JSON and XLSX format.
- Packaging is done through the PyInstaller library and has to be done on the respective platform. PyInstaller should be installed first.
- A .spec file to build LISTER can be generated using the pyi-makespec command, e.g., `pyi-makespec --onedir lister.py` to create a spec file to package the LISTER app as one directory instead of one file.
- The spec file for each platform is provided in the root folder of the LISTER GitHub repository.
- The resulting packaged app will be available under the dist directory, which is created automatically during the build process.
- It is recommended to use a virtual environment, created with Python's venv or Anaconda.
- Using venv:
  - Create a virtual environment named `venv` inside the lister directory, which will use python3.9 as the interpreter: `python3.9 -m venv venv`
  - Set your IDE to use the created `venv` environment as the Python interpreter. In PyCharm, this is under Settings - Project: lister - Python Interpreter - Add Interpreter, set to `/lister/venv/bin/python`.
  - Activate the venv environment: `source venv/bin/activate`
  - Install the required libraries: `pip install xlsxwriter gooey python-docx elabapy beautifulsoup4 pyinstaller pandas latex2mathml`
  - Build using the build scripts mentioned below.
- Windows:
  - One directory version - on the root folder of the repo, run `pyinstaller .\build-scripts\build-windows-onedir.spec`
  - One file version - on the root folder of the repo, run `pyinstaller .\build-scripts\build-windows-onefile.spec`
- Linux:
  - One file version - on the root folder of the repo, run `pyinstaller build-scripts/build-linux-onefile.spec`
- macOS:
  - One file version - on the root folder of the repo, run `pyinstaller build-scripts/build-macos-onefile.spec`
Decompressing a single-executable LISTER app into a temporary directory likely caused this problem. The multi-file distribution (i.e., the one-directory version) can be used instead, although it is not as tidy as the single-executable LISTER app.
LISTER only supports the operating system's default light theme. Custom user themes and dark themes are therefore not supported.
When the error `'charmap' codec can't encode characters in position...` appears, open `cmd.exe` as an administrator before running LISTER and type the following:
setx /m PYTHONUTF8 1
setx PATHEXT "%PATHEXT%;.PY"
The error `win32ctypes.pywin32.pywintypes.error: (110, 'EndUpdateResourceW', 'The system cannot open the device or file specified.'` happens because of file access problems on Windows. Ensure that the directory is neither read-only nor auto-synced to cloud storage, exclude the repo folder from antivirus scanning, and/or try removing both the `build` and `dist` directories. Both of these directories are automatically generated upon packaging.
Please consider using an environment management system such as Anaconda to package the app. Install conda locally along with the dependencies stated in `requirements.txt`. In the release, Python 3.9.15 was used. LISTER runs fine on macOS v13.0.1 and macOS v10.12.4 on Intel machines.
Running `lister.py` directly from your IDE on macOS may lead to the following message:
This program needs access to the screen. Please run with a Framework build of python, and only when you are logged in on the main display of your Mac.
Run the script from the terminal using `pythonw lister.py` instead.
The following warning appears when running `lister.py` directly in the terminal:
Python[67201:349757] WARNING: Secure coding is not enabled for restorable state! Enable secure coding by implementing NSApplicationDelegate.applicationSupportsSecureRestorableState: and returning YES.
This warning can be ignored and does not affect the functionality of LISTER.
The error `requests.exceptions.HTTPError: 403 Client Error: Forbidden for url:...` happens because the specified API token/key does not have access rights to an entry (or its underlying entries).
Check that the user with the specified token has access to the entries directly linked to the experiments/database items/containers.
If the error `requests.exceptions.ConnectionError: HTTPConnectionPool(host='..., port=80): Max retries exceeded with url: ... (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at ...>: Failed to establish a new connection: [Errno 61] Connection refused'))` occurs, use `https` instead of `http` in the API endpoint.
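If the endpoint is assembled or checked by a script before it is entered into LISTER, the scheme can be corrected programmatically. The snippet below is a small sketch using only the Python standard library; the function name `force_https` and the host `elab.example.org` are placeholders, not part of LISTER or of any real eLabFTW instance.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(endpoint):
    """Return the endpoint with an http scheme upgraded to https."""
    parts = urlsplit(endpoint)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


if __name__ == "__main__":
    # Placeholder host for illustration only.
    print(force_https("http://elab.example.org/api/v1/"))
```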