node-unicode-data

JavaScript-compatible Unicode data generator. It produces arrays of code points, arrays of symbols, and regular expressions for every Unicode version’s categories, scripts, script extensions, blocks, bidi data, and other properties, published as a separate npm package per Unicode version.

Using the data in your scripts

To use the generated data, install one of the npm packages generated by this script. A separate package is available for each Unicode version. For example:

// Get an array of all code points with the `White_Space` property:
const codePoints = require('@unicode/unicode-6.3.0/Binary_Property/White_Space/code-points');
// Get an array of strings (containing one symbol each) in the `Lu` category:
const symbols = require('@unicode/unicode-6.3.0/General_Category/Uppercase_Letter/symbols');
// Get a regular expression that matches any symbol in the `Aegean Numbers` block:
const regex = require('@unicode/unicode-6.3.0/Block/Aegean_Numbers/regex');
// Get an array of all code points in the `Egyptian_Hieroglyphs` script:
const hieroglyphs = require('@unicode/unicode-6.3.0/Script/Egyptian_Hieroglyphs/code-points');
// Get the canonical category a given code point belongs to:
// (Note: U+0041 is LATIN CAPITAL LETTER A)
const category = require('@unicode/unicode-6.3.0/General_Category').get(0x41);
// Get an array of all code points with a given bidi class:
const lre = require('@unicode/unicode-6.3.0/Bidi_Class/Left_To_Right_Embedding/code-points');
// Get the directionality of a given code point:
const directionality = require('@unicode/unicode-6.3.0/Bidi_Class').get(0x41);
// What glyph is the mirror image of `«` (U+00AB)?
const mirrored = require('@unicode/unicode-6.3.0/Bidi_Mirroring_Glyph').get(0xAB);
// Get a regular expression that matches all opening brackets:
const openingBrackets = require('@unicode/unicode-6.3.0/Bidi_Paired_Bracket_Type/Open/regex');
// …you get the idea.
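
The arrays returned by these modules are plain JavaScript values, so they combine with ordinary language features. Here is a minimal sketch, not part of any package: `makeMatcher` is a hypothetical helper, and `sampleCodePoints` is a short hand-picked stand-in for the array that `require('@unicode/unicode-6.3.0/Binary_Property/White_Space/code-points')` would return.

```javascript
// Hypothetical helper (not part of any @unicode package): turn an array of
// code points into a membership test backed by a Set.
function makeMatcher(codePoints) {
  const set = new Set(codePoints);
  // codePointAt reads astral symbols (above U+FFFF) as a single code point.
  return (symbol) => set.has(symbol.codePointAt(0));
}

// Stand-in data: a few White_Space code points, listed by hand for this demo.
// In real use, pass the array loaded from the package instead.
const sampleCodePoints = [0x09, 0x0A, 0x20, 0xA0, 0x2028];

const isWhiteSpace = makeMatcher(sampleCodePoints);
console.log(isWhiteSpace(' '));      // true
console.log(isWhiteSpace('\u00A0')); // true
console.log(isWhiteSpace('A'));      // false

// Mapping code points to one-symbol strings gives the same shape of data
// that the `symbols` modules are described as containing.
const sampleSymbols = sampleCodePoints.map((cp) => String.fromCodePoint(cp));
console.log(sampleSymbols.length); // 5
```

A `Set` is used here because repeated lookups against a long array would otherwise scan it each time.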

For more information, see the README for the package you’re interested in.

Note that these READMEs are auto-generated by this script, too – they describe all the data that is available for that particular Unicode version. To programmatically get this list of available categories, scripts, script extensions, blocks, and properties for a given Unicode version, just require the main module for that version:

> require('@unicode/unicode-6.3.0');
{
	'Binary_Property': [
		'Alphabetic', 'Any', 'ASCII', 'ASCII_Hex_Digit', 'Assigned', …
	],
	'General_Category': [
		'Cased_Letter','Close_Punctuation','Connector_Punctuation', …
	],
	'Script': [
		'Arabic', 'Armenian', 'Avestan', …
	],
	'Script_Extensions': [
		'Arabic', 'Armenian', 'Avestan', …
	],
	'Block': [
		'Aegean_Numbers', 'Alchemical_Symbols', …
	],
	'Case_Folding': [
		'C', 'F', 'S', 'T'
	],
	'Bidi_Class': [
		'Arabic_Letter', 'Arabic_Number', 'Boundary_Neutral', …
	],
	'Bidi_Mirroring_Glyph': [],
	'Bidi_Paired_Bracket_Type': [
		'Close', 'None', 'Open'
	]
}
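
Because the main module returns a plain object of arrays, it can drive lookups programmatically. The sketch below is illustrative only: `listModulePaths` is a hypothetical helper, `sampleIndex` is an abbreviated stand-in for the real object, and the `<package>/<type>/<value>/<kind>` path layout is an assumption inferred from the `require` calls shown earlier. Not every type necessarily offers every kind of module, so check the package README before relying on a path.

```javascript
// Hypothetical helper: build require() paths from the index object.
// Assumes the `<package>/<type>/<value>/<kind>` layout seen in the examples.
function listModulePaths(packageName, index, kind = 'code-points') {
  const paths = [];
  for (const [type, values] of Object.entries(index)) {
    for (const value of values) {
      paths.push(`${packageName}/${type}/${value}/${kind}`);
    }
  }
  return paths;
}

// Abbreviated stand-in for the object returned by the main module.
const sampleIndex = {
  Binary_Property: ['Alphabetic', 'White_Space'],
  Bidi_Paired_Bracket_Type: ['Close', 'None', 'Open'],
  Bidi_Mirroring_Glyph: [], // no per-value entries, so no paths are produced
};

const paths = listModulePaths('@unicode/unicode-6.3.0', sampleIndex);
console.log(paths.length); // 5
console.log(paths[0]);
// @unicode/unicode-6.3.0/Binary_Property/Alphabetic/code-points
```

Each resulting string could then be passed to `require` in a project that has the corresponding package installed.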

For project maintainers

After cloning this repository, before doing anything else, run:

./clone-repos.sh

This clones all the generated repositories to your local output folder. You can then make changes to node-unicode-data, and use ./bootstrap.sh to commit and push changes to each of these repositories.

Generating the data

npm run-script download (re-)downloads the Unicode source files for all the Unicode versions defined in data/resources.js, saving them in the data folder.

npm run-script build generates data for categories, scripts, blocks, and properties for all the Unicode versions defined in data/resources.js. This may take a few minutes… In total, roughly 1.5 GB of data is generated. The regular expressions are generated using Regenerate.
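
To give a feel for what turning a set of code points into a regular expression involves, here is a simplified sketch. It is not Regenerate’s implementation or API: `toRanges`, `toEscape`, and `toRegex` are hypothetical names, and the output relies on the `u` flag, which is a choice made for this example only.

```javascript
// Collapse a list of code points into inclusive [start, end] ranges.
function toRanges(codePoints) {
  const sorted = [...new Set(codePoints)].sort((a, b) => a - b);
  const ranges = [];
  for (const cp of sorted) {
    const last = ranges[ranges.length - 1];
    if (last && cp === last[1] + 1) {
      last[1] = cp; // extend the current run
    } else {
      ranges.push([cp, cp]); // start a new run
    }
  }
  return ranges;
}

// Format a code point as a `\u{…}` escape, valid in regexes with the `u` flag.
const toEscape = (cp) => `\\u{${cp.toString(16).toUpperCase()}}`;

// Build a character-class regular expression from a list of code points.
function toRegex(codePoints) {
  const body = toRanges(codePoints)
    .map(([start, end]) =>
      start === end ? toEscape(start) : `${toEscape(start)}-${toEscape(end)}`)
    .join('');
  return new RegExp(`[${body}]`, 'u');
}

const sampleRegex = toRegex([0x43, 0x41, 0x42, 0x1F600]);
console.log(sampleRegex.source);    // [\u{41}-\u{43}\u{1F600}]
console.log(sampleRegex.test('B')); // true
console.log(sampleRegex.test('D')); // false
```

Merging consecutive code points into ranges keeps the pattern short even for properties that cover many thousands of code points.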

Testing

npm test generates the data for the oldest and latest available Unicode version. This is a good way to test changes to the generator scripts before running npm run-script build.
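
Picking the oldest and latest version from a list needs a numeric comparison, because a plain string sort would place `10.0.0` before `6.3.0`. The sketch below shows one way to do that. It is an illustration only and does not reflect how the test script actually selects versions; `compareVersions`, `oldestAndLatest`, and the sample version list are hypothetical.

```javascript
// Compare dotted version strings numerically, segment by segment.
function compareVersions(a, b) {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] || 0) - (pb[i] || 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Return [oldest, latest] without mutating the input array.
function oldestAndLatest(versions) {
  const sorted = [...versions].sort(compareVersions);
  return [sorted[0], sorted[sorted.length - 1]];
}

// Hypothetical version list, for demonstration only.
console.log(oldestAndLatest(['6.3.0', '10.0.0', '1.1.5', '9.0.0']));
// [ '1.1.5', '10.0.0' ]
```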

npm run-script cover generates the code coverage report.

Author


Mathias Bynens

License

This module is available under the MIT license.
