Extract metadata from DB5 document #736

Closed
wants to merge 7 commits into from


tomschr
Collaborator

@tomschr tomschr commented Aug 7, 2024

Issue

When Docserv needs to extract metadata, it has to deal with different sources: DocBook 5, assemblies, or AsciiDoc. To ease parsing of all these different formats, a common interface would be helpful.

For example, daps could implement a daps meta or daps metadata subcommand that calls an XSLT stylesheet to extract all of that.

Implementation

This PR contains the stylesheet that the daps meta or daps metadata command calls.

Conceptually, the stylesheet should only be applied to a fully profiled XML file. Perhaps daps bigfile (or its internal target) could be used for that.

  • Outputs plain text with each metadata item on a separate line
  • The following metadata is detected:
    • meta[@name = 'productname'] and productname/productnumber
    • info/title, or title on the parent element
    • info/subtitle, or subtitle on the parent element
    • meta[@name='title']
    • meta[@name='description']
    • meta[@name='social-descr']
    • revhistory/revision[1]/date
    • meta[@name='task']
    • meta[@name='series']
    • meta[@name='category']
    • meta[@name='type']
    • The list comes from https://confluence.suse.com/x/f4GZW
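
To make the lookups in the list above more concrete, here is a minimal Python sketch of the same idea using only the standard library. It is not the stylesheet from this PR: the helper names `meta_value` and `title` and the sample document are made up for illustration, and the fallback order for the title is an assumption based on the list.

```python
import xml.etree.ElementTree as ET

# DocBook 5 namespace in ElementTree's {uri}tag notation
DB = "{http://docbook.org/ns/docbook}"


def meta_value(root, name):
    """Return the text of the first info/meta[@name=name], or None."""
    info = root.find(f"{DB}info")
    if info is None:
        return None
    for meta in info.findall(f"{DB}meta"):
        if meta.get("name") == name:
            return "".join(meta.itertext()).strip()
    return None


def title(root):
    """Prefer info/title, fall back to a title child of the root element."""
    for path in (f"{DB}info/{DB}title", f"{DB}title"):
        node = root.find(path)
        if node is not None:
            return "".join(node.itertext()).strip()
    return None


# Hypothetical sample document, not taken from doc-sle
doc = """<article xmlns="http://docbook.org/ns/docbook">
  <title>Fallback Title</title>
  <info>
    <meta name="series">Products &amp; Solutions</meta>
  </info>
</article>"""

root = ET.fromstring(doc)
print(title(root))
print(meta_value(root, "series"))
print(meta_value(root, "type"))
```

A missing item comes back as `None` here, which is where the stylesheet would emit its warning instead.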

By default, the stylesheet prints a warning to stderr for each metadata item it cannot find. This can help with debugging, but it can also produce false positives. To suppress the warnings, pass the parameter with-warn=0 to your XSLT processor.
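
A caller such as Docserv might wrap the processor call along these lines. This is only a sketch: the function names are hypothetical, and passing the parameter with xsltproc's `--stringparam` option assumes the stylesheet compares `with-warn` as a string. If it expects a number, `--param with-warn 0` would be the variant to try.

```python
import subprocess


def build_command(stylesheet, bigfile, warnings=True):
    """Build the xsltproc argument list; add with-warn=0 to silence warnings."""
    cmd = ["xsltproc"]
    if not warnings:
        # Assumption: the stylesheet accepts with-warn as a string parameter
        cmd += ["--stringparam", "with-warn", "0"]
    cmd += [stylesheet, bigfile]
    return cmd


def extract(stylesheet, bigfile, warnings=True):
    """Run xsltproc and return (metadata text from stdout, warnings from stderr)."""
    proc = subprocess.run(
        build_command(stylesheet, bigfile, warnings),
        capture_output=True,
        text=True,
        check=True,
    )
    return proc.stdout, proc.stderr


# Only print the command here; the bigfile name is a placeholder
print(build_command("daps-xslt/metadata/extract-metadata.xsl", "bigfile.xml", warnings=False))
```

Keeping stdout and stderr separate means the warnings never mix with the metadata lines.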

Example output from DC-SLES-modules using the bigfile from daps bigfile:

$ xsltproc daps-xslt/metadata/extract-metadata.xsl /home/toms/repos/GH/SUSE/doc-sle/build/.tmp/article-modules_bigfile.xml
WARNING: Missing element: meta[@name='type']
# Metadata output
productname=[sles;15 SP6]SUSE Linux Enterprise Server
title=Modules and Extensions Quick Start
seo-title=Modules and Extensions Quick Start
seo-description=How to use the modules and extensions available for the SUSE Linux Enterprise family
seo-social-descr=Use modules and extensions for SLE
date=2024-06-26
task=Administration;Installation
series=Products & Solutions
category=High Availability
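
On the consuming side, output like the above could be read with a few lines of Python. This is a sketch under assumptions: `parse_metadata` is a made-up helper name, lines starting with `#` are treated as comments, and splitting `task` on `;` assumes that a semicolon separates multiple values.

```python
def parse_metadata(text):
    """Parse key=value lines into a dict, skipping blank lines and '#' comments."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so values may contain '=' themselves
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line
        result[key.strip()] = value.strip()
    return result


sample = """# Metadata output
productname=[sles;15 SP6]SUSE Linux Enterprise Server
title=Modules and Extensions Quick Start
date=2024-06-26
task=Administration;Installation
"""

meta = parse_metadata(sample)
print(meta["title"])
print(meta["task"].split(";"))
```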

* This is the stylesheet that makes the "daps meta[data]" command work
* Outputs plain text with each metadata item on a separate line
* The following metadata is detected:
  - meta[@name = 'productname'] and productname/productnumber
  - info/title or title
  - info/subtitle or subtitle
  - meta[@name='title']
  - meta[@name='description']
  - meta[@name='social-descr']
  - revhistory/revision[1]/date
  - meta[@name='task']
  - meta[@name='series']
  - meta[@name='category']
  - meta[@name='type']
@fsundermeyer
Member

Do not merge into beta3; this will be released as beta4.
