The Utilities Package


Regular expressions can get quite complicated at times. The RegEx package is designed to make them easier to create, more readable and easier to maintain.

Using RegEx

As an example of how RegEx is used, the following code implements a parser for HTTP URLs according to RFC 1738. The comment above each element quotes its BNF (Backus-Naur Form) definition from Section 5 of the RFC.

use Joomla\Utilities\RegEx;

// For character classes, the native way is simplest
$digit      = '[0-9]';
$alpha      = '[a-zA-Z]';
$alphadigit = '[a-zA-Z0-9]';
$hex        = '[0-9a-fA-F]';
$safe       = '[$\-_.+]';
$extra      = '[!*\'(),]';

// BNF: digits = 1*digit
$digits = RegEx::oneOrMore($digit);

// BNF: unreserved = alpha | digit | safe | extra
$unreserved = RegEx::anyOf($alpha, $digit, $safe, $extra);

// BNF: escape = "%" hex hex
$escape = '%' . $hex . $hex;

// BNF: uchar = unreserved | escape
$uchar = RegEx::anyOf($unreserved, $escape);

// BNF: domainlabel = alphadigit | alphadigit *[ alphadigit | "-" ] alphadigit
$domainlabel = RegEx::anyOf(
    $alphadigit . RegEx::noneOrMore(RegEx::anyOf(array($alphadigit, '-'))) . $alphadigit,
    $alphadigit
);

// BNF: toplabel = alpha | alpha *[ alphadigit | "-" ] alphadigit
$toplabel = RegEx::anyOf(
    $alpha . RegEx::noneOrMore(RegEx::anyOf(array($alphadigit, '-'))) . $alphadigit,
    $alpha
);

// Add the toplabel to the result with key 'tld'
$toplabel = RegEx::capture($toplabel, 'tld');

// BNF: hostname = *[ domainlabel "." ] toplabel
$hostname = RegEx::noneOrMore($domainlabel . '\.') . $toplabel;

// Add the hostname to the result with key 'domain'
$hostname = RegEx::capture($hostname, 'domain');

// BNF: hostnumber = digits "." digits "." digits "." digits
$hostnumber = $digits . '\.' . $digits . '\.' . $digits . '\.' . $digits;

// Add the hostnumber to the result with key 'ip'
$hostnumber = RegEx::capture($hostnumber, 'ip');

// BNF: host = hostname | hostnumber
$host = RegEx::anyOf($hostname, $hostnumber);

// Add the host to the result with key 'host'
$host = RegEx::capture($host, 'host');

// BNF: port = digits
$port = $digits;

// Add the port to the result with key 'port'
$port = RegEx::capture($port, 'port');

// BNF: hostport = host [ ":" port ]
$hostport = $host . RegEx::optional(':' . $port);

// BNF: hsegment = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
$hsegment = RegEx::noneOrMore(RegEx::anyOf($uchar, '[;:@&=]'));

// BNF: hpath = hsegment *[ "/" hsegment ]
$hpath = $hsegment . RegEx::noneOrMore('/' . $hsegment);

// Add the hpath to the result with key 'path'
$hpath = RegEx::capture($hpath, 'path');

// BNF: search = *[ uchar | ";" | ":" | "@" | "&" | "=" ]
$search = RegEx::noneOrMore(RegEx::anyOf(array($uchar, '[;:@&=]')));

// Add the search to the result with key 'query'
$search = RegEx::capture($search, 'query');

// BNF: httpurl = "http://" hostport [ "/" hpath [ "?" search ]]
$httpurl = 'http://' . $hostport . RegEx::optional('/' . $hpath . RegEx::optional('\?' . $search));

$regex = '~^' . $httpurl . '$~';
$subject = '';

$parts = RegEx::match($regex, $subject);


    [host] =>
    [domain] =>
    [tld] => com
    [port] => 8080
    [path] => index.php
    [query] => foo=bar


As the example above shows, RegEx::match() returns the matches that were marked with RegEx::capture(), keyed by the name given there. Only matches that have a value are returned. If the regular expression does not match, the result is an empty array.
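The filtering described here can be approximated with plain `preg_match()`. The following is a minimal sketch, not the package's implementation: the helper name `namedMatches()` and the sample pattern are hypothetical and serve only to illustrate the idea of keeping named, non-empty groups.

```php
<?php
// Hypothetical helper, NOT the package's own code: keeps only the named
// groups of a preg_match() result that captured a non-empty value.
function namedMatches(string $regex, string $subject): array
{
    $matches = array();

    if (preg_match($regex, $subject, $matches) !== 1) {
        // No match (or a pattern error): return an empty array
        return array();
    }

    return array_filter(
        $matches,
        function ($value, $key) {
            // Named groups appear under a string key and an integer index;
            // only the string keys with a value are kept
            return is_string($key) && $value !== '';
        },
        ARRAY_FILTER_USE_BOTH
    );
}

// Sample pattern (an assumption for this sketch): a name with an optional port
$regex = '~^(?P<name>[a-z]+)(?::(?P<port>[0-9]+))?$~';

print_r(namedMatches($regex, 'localhost:8080')); // name and port
print_r(namedMatches($regex, 'localhost'));      // name only
print_r(namedMatches($regex, '???'));            // empty array
```

In the second call the optional port group does not take part in the match, so only the `name` key is left in the result.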


RegEx::capture() assigns a key to an expression.

use Joomla\Utilities\RegEx;

$regex = RegEx::capture('[0-9]+', 'number');
print_r(RegEx::match($regex, 'abc123def'));


    [number] => 123
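For comparison, the same lookup can be written with native PCRE syntax. This sketch assumes that a capture with the key `number` corresponds to the named group `(?P<number>...)`; the text above does not show the generated pattern, so treat that mapping as an assumption. The `~` delimiters are added by hand.

```php
<?php
// Assumption: a capture keyed "number" corresponds to the PCRE named
// group (?P<number>...). The "~" delimiters are written out by hand.
$pattern = '~(?P<number>[0-9]+)~';

$matches = array();
preg_match($pattern, 'abc123def', $matches);

// preg_match() stores a named group under its name and under its index
echo $matches['number'], "\n"; // 123
echo $matches[1], "\n";        // 123
```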


RegEx::optional() adds a 'zero or one' quantifier to an expression.

use Joomla\Utilities\RegEx;

print(RegEx::optional('regex')); // (?:regex)?


RegEx::oneOrMore() adds a 'one or more' quantifier to an expression.

use Joomla\Utilities\RegEx;

print(RegEx::oneOrMore('regex')); // (?:regex)+


RegEx::noneOrMore() adds a 'zero or more' quantifier to an expression.

use Joomla\Utilities\RegEx;

print(RegEx::noneOrMore('regex')); // (?:regex)*
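The quantifier helpers above all wrap the expression in a non-capturing group before appending the quantifier. The sketch below illustrates why that matters, using a hypothetical stand-in function `oneOrMoreSketch()` that merely reproduces the documented output format. It is not the package's code.

```php
<?php
// Hypothetical stand-in that reproduces the documented output "(?:regex)+";
// it is not the package's own code.
function oneOrMoreSketch(string $regex): string
{
    return '(?:' . $regex . ')+';
}

// Without a group, "+" binds only to the last character: "a", then one or more "b"
$ungrouped = '~^ab+$~';

// With the non-capturing group, "+" repeats the whole expression "ab"
$grouped = '~^' . oneOrMoreSketch('ab') . '$~';

var_dump(preg_match($ungrouped, 'abab')); // int(0)
var_dump(preg_match($grouped, 'abab'));   // int(1)
var_dump(preg_match($ungrouped, 'abbb')); // int(1)
var_dump(preg_match($grouped, 'abbb'));   // int(0)
```

Because the group is non-capturing, it does not add numbered entries to the match result.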


RegEx::anyOf() defines a list of alternative expressions. The alternatives can be passed as separate arguments or as a single array.

use Joomla\Utilities\RegEx;

print(RegEx::anyOf('a', 'b', 'c')); // (?:a|b|c)

$array = array('a', 'b', 'c');
print(RegEx::anyOf($array)); // (?:a|b|c)
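A function that accepts both call styles and produces the output shown above could look roughly like this. `anyOfSketch()` is a hypothetical re-creation for illustration; the real method may be implemented differently. The last two lines show why the surrounding group is useful when the alternation is combined with anchors.

```php
<?php
// Hypothetical re-creation of the documented behaviour; the real
// RegEx::anyOf() may be implemented differently.
function anyOfSketch(...$alternatives): string
{
    // Accept a single array as well as a list of arguments
    if (count($alternatives) === 1 && is_array($alternatives[0])) {
        $alternatives = $alternatives[0];
    }

    return '(?:' . implode('|', $alternatives) . ')';
}

echo anyOfSketch('a', 'b', 'c'), "\n";        // (?:a|b|c)
echo anyOfSketch(array('a', 'b', 'c')), "\n"; // (?:a|b|c)

// Without the group, the anchors attach to single alternatives:
// "^cat|dog$" means "starts with cat" OR "ends with dog"
var_dump(preg_match('~^cat|dog$~', 'catalog'));                           // int(1)
var_dump(preg_match('~^' . anyOfSketch('cat', 'dog') . '$~', 'catalog')); // int(0)
```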