-
Notifications
You must be signed in to change notification settings - Fork 0
Tools to squeeze knowledge out of the Unicode standard
Codepoints/unicodeinfo
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
U N I C O D E I N F O ======================= Tools to squeeze knowledge out of the Unicode standard SQLITE DATABASE: ------------------ The directory db contains tools to create an Sqlite3 database from Unicode’s data. Run `make` to create the database db/ucd.sqlite. The database itself contains every field as defined in “Unicode Character Database in XML” <http://www.unicode.org/reports/tr42/>. Boolean fields (in the original represented by literal "Y" and "N" values) are converted to INTEGER(1) fields, referenced codepoints are put in a separate table. The “#” notation mentioned in TR42 is resolved. Every codepoint in the BMP has a graphical representation (PNG, 16x16px) attached. This is provided by the GNU Unifont project (see license below). The db also contains tables of all blocks, planes, scripts and named character sequences as well as usual aliases for codepoints (including HTML entities and digraph names). For building the database you need: * Python 2.5 or newer * GNU make or compatible * wget for file retreaval * basic tools like Perl, sed, cat. Should be there on any *NIX (incl. Mac and Cygwin) * for the images: + ImageMagick's convert + pngcrush All in all it should be straight forward to get from downloading to the database. PYTHON TOOLS: --------------- The folder python/unicodeinfo contains a python package for handling the information of the SQLite database. It's not fully finished yet unfortunately. Use at your own risk. -- here be dragons. THE SMALL PRINT: ------------------ The database tools are provided in the public domain, as long as this doesn’t collide with the Terms of Usage of the Unicode Consortium <http://www.unicode.org/copyright.html>. The name “unicodeinfo” of this project does not imply any kind of affiliation of the author with Unicode Inc. The graphical representations retrieved via db/unifont.sh are subject to the UNIFONT LICENSE below. The applicable license, where public domain dedication is not possible, is the Creative Commons Zero License: <http://creativecommons.org/publicdomain/zero/1.0/> UNIFONT LICENSE: ------------------ If you want to use the images provided by db/unifont.sh you are bound to the following usage terms re GNU Unifont: Roman Czyborra released his work (Perl scripts and font .hex files) under the following terms: All of my works you find here are freeware. You may freely copy, use, quote, modify or redistribute them as long as you properly attribute my contribution and have given a quick thought about whether Roman might perhaps be interested to read what you did with his stuff. Horizontal rules don't apply. David Starner released what in this package is named "hex2bdf-split" under the same license as Roman Czyborra's work. License for all of Paul Hardy's work (except "johab2ucs2" and "blanks.hex", mentioned separately), Makefile and debian/ mods by Anthony Fok, and modified software from Luis Gonzalez Miranda (with permission granted to Paul Hardy): These are released under the terms of the GNU General Public License version 2, or (at your option) a later version. License for Fonts: Any fonts using glyphs from the "wqy-cjk.hex" file (including the default TrueType font) are bound by the terms of the Wen Quan Yi font license. Those fonts are released under the terms of the GNU General Public License (GPL) versionn 2, with the exception that embedding the font in a document does not by itself bind that document to the terms of the GNU GPL. Any fonts that do not use glyphs from the "wqy-cjk.hex" file fall under the above "License for all of Roman Czyborra's work". The fonts in "./font/precompiled" do use wqy-cjk.hex, and so are licensed under the GNU GPL version 2, with the exception that embedding the font in a document does not in itself bind that document to the terms of the GNU GPL. The following paragraphs explaining the exception is taken from the Wen Quan Yi font distribution: ** GPL v2.0 license with font embedding exception: As a special exception, if you create a document which uses this font, and embed this font or unaltered portions of this font into the document, this font does not by itself cause the resulting document to be covered by the GNU General Public License. This exception does not however invalidate any other reasons why the document might be covered by the GNU General Public License. If you modify this font, you may extend this exception to your version of the font, but you are not obligated to do so. If you do not wish to do so, delete this exception statement from your version. License for "blanks.hex": There is one exception to the above rules: Paul Hardy earlier released the "blanks.hex" file into the public domain.
About
Tools to squeeze knowledge out of the Unicode standard
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published