GitHub - brick/structured-data: Microdata, RDFa Lite & JSON-LD structured data reader for PHP

Brick\StructuredData

A PHP library to read Microdata, RDFa Lite & JSON-LD structured data in HTML pages.

This library is a foundation to read schema.org structured data in brick/schema, but may be used with other vocabularies.

Installation

This library is installable via Composer:

composer require brick/structured-data

Requirements

This library requires PHP 8.1 or later. It makes use of the following extensions:

These extensions are enabled by default, and should be available in most PHP installations.

Project status & release process

This library is under development. It is likely to change fast in the early 0.x releases. However, the library follows a strict BC break convention:

The current releases are numbered 0.x.y. When a non-breaking change is introduced (adding new methods, fixing bugs, optimizing existing code, etc.), y is incremented.

When a breaking change is introduced, a new 0.x version cycle is always started.

It is therefore safe to lock your project to a given release cycle, such as 0.2.*.

If you need to upgrade to a newer release cycle, check the release history for a list of changes introduced by each further 0.x.0 version.

Introduction

The library unifies reading the 3 supported formats (Microdata, RDFa Lite & JSON-LD) under a common interface:

interface Brick\StructuredData\Reader
{
    /**
     * Reads the items contained in the given document.
     *
     * @param DOMDocument $document The DOM document to read.
     * @param string      $url      The URL the document was retrieved from. This will be used only to resolve relative
     *                              URLs in property values. No attempt will be performed to connect to this URL.
     *
     * @return Item[] The top-level items.
     */
    public function read(DOMDocument $document, string $url) : array;
}

There are 3 implementations of this interface, one for each format:

MicrodataReader
RdfaLiteReader
JsonLdReader

The read() method returns the top-level items found in the document. Every Item consists of:

An optional id (itemid in Microdata, resource in RDFa Lite, @id in JSON-LD)
An array of zero or more types; each type is a URL, for example http://schema.org/Product
An associative array of zero or more properties; each property has a URL as a key, for example http://schema.org/price, and maps to an array of one or more values; values can be plain strings, or nested Item objects

Quickstart

Here is a working example that reads Microdata from a web page. Just change the URL and give it a try:

use Brick\StructuredData\Reader\MicrodataReader;
use Brick\StructuredData\HTMLReader;
use Brick\StructuredData\Item;

// Let's read Microdata here;
// You could also use RdfaLiteReader, JsonLdReader,
// or even use all of them by chaining them in a ReaderChain
$microdataReader = new MicrodataReader();

// Wrap into HTMLReader to be able to read HTML strings or files directly,
// i.e. without manually converting them to DOMDocument instances first
$htmlReader = new HTMLReader($microdataReader);

// Replace this URL with that of a website you know is using Microdata
$url = 'http://www.example.com/';
$html = file_get_contents($url);

// Read the document and return the top-level items found
// Note: the URL is only required to resolve relative URLs; no attempt will be made to connect to it
$items = $htmlReader->read($html, $url);

// Loop through the top-level items
foreach ($items as $item) {
    echo implode(',', $item->getTypes()), PHP_EOL;

    foreach ($item->getProperties() as $name => $values) {
        foreach ($values as $value) {
            if ($value instanceof Item) {
                // We're only displaying the class name in this example; you would typically
                // recurse through nested Items to get the information you need
                $value = '(' . implode(', ', $value->getTypes()) . ')';
            }

            // If $value is not an Item, then it's a plain string

            echo "  - $name: $value", PHP_EOL;
        }
    }
}

Current limitations

No support for the itemref attribute in MicroDataReader
No support for the prefix attribute in RdfaLiteReader; only predefined prefixes are supported right now
No proper support for @context in JsonLdReader; right now, only strings are accepted in @context, and they are considered a vocabulary identifier; this works fine with simple markup like the one used in the examples on schema.org, but may fail with more complex documents.

Note about JSON-LD's `@context`

While JsonLdReader should be able to handle a proper context object in the future, its goal will never be to be a fully compliant JSON-LD parser; in particular, it will never attempt to fetch a JSON-LD context referenced by a URL.

This is consistent with how indexing robots typically crawl the web, they do not fetch remote contexts, which relieves them from fetching additional documents to extract structured data from a web page.

The aim of JsonLdReader, and the other Reader implementations for that matter, is to be able to parse a document with the same capabilities as Google Structured Data Testing Tool or Yandex Structured data validator, no more, no less. These tools do not load external context files.

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
.github		.github
src		src
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
composer.json		composer.json
phpunit.xml		phpunit.xml
psalm-baseline.xml		psalm-baseline.xml
psalm.xml		psalm.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Repository files navigation

Brick\StructuredData

Installation

Requirements

Project status & release process

Introduction

Quickstart

Current limitations

Note about JSON-LD's `@context`

About

Uh oh!

Releases 3

Sponsor this project

Uh oh!

Packages

Uh oh!

Languages

Uh oh!

License

brick/structured-data

Folders and files

Latest commit

History

Repository files navigation

Brick\StructuredData

Installation

Requirements

Project status & release process

Introduction

Quickstart

Current limitations

Note about JSON-LD's @context

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 3

Sponsor this project

Uh oh!

Packages 0

Uh oh!

Languages

Note about JSON-LD's `@context`

Packages