Guidelines for editing dictionary

Koko edited this page Apr 6, 2025 · 126 revisions

⚠️ Warning: the dictionary is still in flux. This document might be outdated.

ilo Token makes use of a dictionary. It is like a typical dictionary, but it is made for computers to read, although it should still be fairly human-readable. The dictionary defines a word-to-word or word-to-phrase translation for each word.

ilo Token has two dictionaries: the global dictionary and the custom dictionary.

The global dictionary contains definitions that everyone will see. The global dictionary lives within the source code and may be edited by anyone through GitHub.

The custom dictionary lives within the website and is different for everyone. Its main purpose is to let you customize the dictionary and test it without having to use GitHub. The custom dictionary can also be used to extend ilo Token with more non-pu words, although this comes with limitations.

Note to readers

We want this document to be as approachable and accessible as possible. We don't want it to be overly technical. If something is confusing, please report it:

Target vocabulary for global dictionary

We're not going to add more words unless they get added to the "common" category in lipu Linku.

The target vocabulary for ilo Token is:

  • Words listed in pu.
  • Words listed in nimi ku suli.
  • Words used in su.
  • Words listed in lipu Linku from core to common.

Once words are added, they stay there: words that were in the "common" category but later moved to a lower category stay in ilo Token.

Using the custom dictionary

In order to use the custom dictionary, you need to be familiar with the syntax of the dictionary. You can read the syntax guidelines further below; make use of the table of contents found in the sidebar. We also provide some quick dictionary setups.

The interface should be straightforward. At the top of the modal window is the Import Word text box. It is used to import existing definitions from the global dictionary so you can then extend or modify them.

At the center is the custom dictionary. You don't need to add every word there. You can just add words you'll modify or introduce.

You can "delete" a word so that ilo Token no longer recognizes it. This is done by adding the definition head but leaving the body blank. Add a comment so you know it's intentional.

kokosila:
    # (deleted)

Particles are hardcoded and therefore cannot be completely removed.
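One way to picture how the custom dictionary relates to the global one is as a merge in which a custom entry replaces the global entry for the same word, and an entry with an empty body hides the word. The Python sketch below is only a mental model under that assumption. The name `merge_dictionaries` and the use of plain `dict` objects are made up for illustration, it is not ilo Token's actual code, and it ignores the exception for particles.

```python
def merge_dictionaries(global_dictionary, custom_dictionary):
    """Return the entries visible after applying the custom dictionary.

    Both arguments map a headword to its list of definitions.
    Hypothetical model only, not ilo Token's implementation.
    """
    merged = dict(global_dictionary)
    for word, definitions in custom_dictionary.items():
        if definitions:
            # Assumption: a custom entry with a body replaces the global entry.
            merged[word] = list(definitions)
        else:
            # A head with a blank body "deletes" the word.
            merged.pop(word, None)
    return merged


global_dictionary = {
    "kasi": ["plant(n);"],
    "kokosila": [
        "speak(v) a(d article singular) non-Toki Pona(adj qualifier) language(n singular);"
    ],
}
custom_dictionary = {
    "kokosila": [],                   # head only, blank body: deleted
    "telo": ["liquid(n singular);"],  # modified or newly introduced word
}
visible = merge_dictionaries(global_dictionary, custom_dictionary)
```

In this model, `visible` keeps "kasi" from the global dictionary, gains "telo", and no longer contains "kokosila".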

If you think your custom dictionary setup is better than the global dictionary, please tell us so it can be added to the global dictionary:

Custom dictionary limitations

If you're interested in extending ilo Token with more non-pu words, know that the dictionary is designed only for words in the global dictionary. Even the global dictionary comes with limitations. Some, but not all, of the limitations are:

  • You cannot add new particles or other grammatical words such as "alu".
  • With preverbs, your only choices are linking verbs, catenative verbs, and modal verbs.
  • Words starting with an uppercase letter, such as "Pingo", are undefinable since they differ from the convention. They can only be recognized as proper words. "kalamARR", however, is definable.
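The uppercase rule in the last item can be sketched as a one-line check. This is a rough illustration only: the function name `is_definable_headword` is hypothetical, and the check covers only the first-letter rule, not the other limitations listed above.

```python
def is_definable_headword(word: str) -> bool:
    """Rough check: a headword must start with a lowercase letter.

    Hypothetical helper; later letters may be uppercase.
    """
    return bool(word) and word[0].islower()


pingo = is_definable_headword("Pingo")        # starts with an uppercase letter
kalamarr = is_definable_headword("kalamARR")  # starts with a lowercase letter
```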

Code structure

We recommend taking a look at the global dictionary first to get a sense of how it is written down. You'll see that it is defined like the following:

word:
    definitions;
    definitions;

another word:
    definitions;
    definitions;

Syntax is important; please don't forget the semicolon.

Sometimes, words are considered synonyms like "ale" and "ali". In these cases, we merge them together:

ale, ali:
    definitions;
    definitions;

Each definition may contain any of these: word units, tags, and placeholders. Consider the following.

seli:
    burn(v) [object];

burn is the word unit, (v) is the tag, and [object] is the placeholder. Word units and tags always come together; the tag represents what kind of word it is, usually its part of speech. Placeholders represent a place that ilo Token may fill in, although placeholders aren't used much. They are mainly used to keep the definitions as unambiguous as possible.
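To make the three building blocks concrete, here is a small Python sketch that splits one definition into word units with their tags and placeholders. It is an illustration under simplifying assumptions: the names `PIECE` and `split_definition` are made up, the sketch ignores comments and backtick escaping, and it is not the parser ilo Token uses.

```python
import re

# One piece of a definition: either "[placeholder]" or "word unit(tag)".
PIECE = re.compile(
    r"\s*(?:\[(?P<placeholder>[^\]]+)\]"
    r"|(?P<unit>[^()\[\];]+?)\((?P<tag>[^)]*)\))"
)


def split_definition(definition):
    """Split a definition into ("unit", word, tag) and ("placeholder", name) tuples."""
    text = definition.strip().rstrip(";").strip()
    pieces = []
    position = 0
    while position < len(text):
        match = PIECE.match(text, position)
        if match is None:
            raise ValueError(f"cannot read definition at: {text[position:]!r}")
        if match.group("placeholder") is not None:
            pieces.append(("placeholder", match.group("placeholder")))
        else:
            pieces.append(("unit", match.group("unit"), match.group("tag")))
        position = match.end()
    return pieces


example = split_definition("burn(v) [object];")
```

Here `example` holds the word unit "burn" with the tag "v", followed by the placeholder "object".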

A word unit may span multiple words:

jan:
    human being(n);

Sometimes, word units are separated by a forward slash /. Its function depends on the tag, but it tends to be used for defining different forms or conjugations.

ona:
    they/them(personal pronoun plural);
    it/it(personal pronoun singular);

A tag may contain more information, which is sometimes needed depending on the tag.

pan:
    baked(adj qualifier) goods(n plural);

A definition may have multiple word units, tags, and placeholders, forming a phrase.

olin:
    have(v) strong(adj opinion) emotional(adj opinion) bond(n singular) with(prep) [object];

This syntax isn't free-form; it must follow certain patterns. ilo Token isn't going to magically understand it all. Definitions may need to be rewritten or simplified in order to fit within the limitations.

"kokosila", for example, has to be written like the following. We can't add the "in an environment where Toki Pona is more appropriate" part.

kokosila:
    speak(v) a(d article singular) non-Toki Pona(adj qualifier) language(n singular);

The patterns are explained further below.

Adding comments

Comments are a way to tell computers "just ignore what I've written here". In the dictionary, a comment is denoted by the hash sign #. Whatever follows # is ignored. This is useful for disabling pieces of code as well as writing notes meant for contributors instead of computers.

You may find a couple of comments in the code.
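As a rough illustration of what "ignored" means, the Python sketch below removes a comment from a single line. It assumes a comment runs from # to the end of the line, and that a # wrapped in backticks (see the Escaping section) is a literal character rather than a comment. The function name `strip_comment` is hypothetical and this is not ilo Token's own code.

```python
def strip_comment(line: str) -> str:
    """Return the line without its comment and without trailing whitespace."""
    index = 0
    while index < len(line):
        # Assumption: an escape wraps exactly one character, like `#` or `:`.
        if line[index] == "`" and line[index + 2 : index + 3] == "`":
            index += 3
            continue
        if line[index] == "#":
            return line[:index].rstrip()
        index += 1
    return line.rstrip()


kept = strip_comment("    plant(n); # a note for contributors")
```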

Escaping

If a word contains a special symbol, wrap the symbol in backticks `. ilo Token will not include the backticks in its output.

pu:
    interact(v) with(prep) the(d article) book(n singular) titled(adj) Toki Pona`:` The Language of Good(n proper);

In case you're wondering, you can escape a backtick itself by wrapping it in backticks: ```.

The following symbols need to be escaped. Not all are currently in use, but they are reserved in case the syntax changes: #, (, ), *, +, /, :, ;, <, =, >, @, [, \, ], ^, `, {, |, }, ~.
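If you generate definitions with a script, the escaping rule can be applied mechanically. The Python sketch below wraps every reserved symbol from the list above in its own pair of backticks. The names `RESERVED_SYMBOLS` and `escape_word_unit` are made up for this example, and wrapping each symbol individually is an assumption based on the examples in this section.

```python
# The reserved symbols listed above, as a set of single characters.
RESERVED_SYMBOLS = set("#()*+/:;<=>@[\\]^`{|}~")


def escape_word_unit(text: str) -> str:
    """Wrap each reserved symbol in backticks so it is read literally."""
    return "".join(
        f"`{character}`" if character in RESERVED_SYMBOLS else character
        for character in text
    )


title = escape_word_unit("Toki Pona: The Language of Good")
```

Note that this also escapes the slash /, so only use it on text where the slash is meant literally and not as a separator between forms.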

Defining nouns

Use the tag (n) to define nouns. With some exceptions, you may also use this for pronouns since pronouns tend to act like nouns.

kasi:
    plant(n);

You may add determiners and adjectives before it.

sewi:
    highest(adj origin) part(n);

Adjectives before nouns may not be compounded. Simply removing the word "and" is a good workaround.

palisa:
    # bad
    long(adj size) and(c) hard(adj material) thing(n);

    # good
    long(adj size) hard(adj material) thing(n);

You may add an adjective and proper noun after the noun.

pu:
    the(d article) book(n) titled(adj) Toki Pona`:` The Language of Good(n proper);

ilo Token will automatically apply conjugations e.g. singular and plural forms, but if you wish to force it to be singular only or plural only, add singular or plural to the tag.

telo:
    liquid(n singular);

mani:
    savings(n plural);

In some cases, automatic conjugation can fail. This tends to happen with pronouns. In these cases, you may use a slash / to manually define the singular and plural forms, or limit the word to singular only or plural only as explained above.

ni:
    this/these(n);
    that/those(n);

seme:
    what/what(n);
    which/which(n);

TODO: mention gerund

Defining personal pronouns

Because personal pronouns have different forms when used as subject or object, we need to define them separately from nouns. Use the tag (personal pronoun) to define them.

There is no automatic conjugation. Use slashes / and define them as follows: singular subject, singular object, plural subject, and then plural object.

You also need to specify whether it is first, second, or third person.

mi:
    I/me/we/us(personal pronoun first);

Sometimes, pronouns only have a singular form or a plural form. In these cases, include singular or plural in the tag. You'll only need to write the subject and object form.

ona:
    they/them(personal pronoun third plural);
    it/it(personal pronoun third singular);
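The slot order can be summarized with a short Python sketch that labels each slash-separated form. The function name `pronoun_forms` and the returned dictionary shape are assumptions made for this illustration; they do not reflect ilo Token's internals.

```python
def pronoun_forms(word_unit: str, tag: str) -> dict:
    """Label the forms of a personal pronoun definition.

    With "singular" or "plural" in the tag, two forms are expected:
    subject, object. Otherwise four forms are expected: singular subject,
    singular object, plural subject, plural object.
    """
    forms = word_unit.split("/")
    fixed_number = [k for k in tag.split() if k in ("singular", "plural")]
    if fixed_number:
        if len(forms) != 2:
            raise ValueError("expected subject/object")
        number = fixed_number[0]
        return {(number, "subject"): forms[0], (number, "object"): forms[1]}
    if len(forms) != 4:
        raise ValueError("expected four forms")
    slots = [
        ("singular", "subject"),
        ("singular", "object"),
        ("plural", "subject"),
        ("plural", "object"),
    ]
    return dict(zip(slots, forms))


mi = pronoun_forms("I/me/we/us", "personal pronoun first")
ona = pronoun_forms("they/them", "personal pronoun third plural")
```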

Just an amusing side note: to ilo Token, it is it/it not it/its.

Remember to only consider the grammatical number and not the semantic number: they/them, while it can refer to a single person, is always grammatically plural, as it always takes "are" when used as a subject.

Remember to define possessives as well; these are determiners.

Defining adjectives

Use the (adj) tag to define adjectives. You'll need to classify what kind of adjective it is, which is needed for reordering chains of adjectives. Apparently, it's "big red balloon" and not "red big balloon".

pona:
    good(adj opinion);

Here are the classifications for adjectives. Adjectives are ordered from left to right following this list, which is based on the list found on Wikipedia.

  • opinion
  • size
  • physical quality – Particularly a visible quality e.g. flat, circular
  • age
  • color
  • origin – Where it comes from or where it is located e.g. "nearby object"
  • material – Including the property of the material e.g. "hard object"
  • qualifier – Particularly a modifier of compound nouns e.g. "transgender person"

These are just rough categories to aid in sorting adjectives and are not set in stone. If new categories are needed, please open a new issue.

Some adjectives may belong in two or more categories. In these cases, test it out. Here's an example: the "land" in "land animal" can be origin or qualifier. Pick another adjective whose category sits between origin and qualifier, say "hard", which is material. Then compare "hard land animal" with "land hard animal". The former feels less awkward, so "land" comes after the material adjective, and we can determine that "land" in "land animal" is a qualifier.
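The reordering these categories enable can be sketched as a sort by category position. This Python example is a minimal illustration: the names `ADJECTIVE_ORDER` and `order_adjectives` are hypothetical, and ilo Token's real reordering may differ in detail.

```python
# Categories in the left-to-right order listed above.
ADJECTIVE_ORDER = [
    "opinion",
    "size",
    "physical quality",
    "age",
    "color",
    "origin",
    "material",
    "qualifier",
]


def order_adjectives(adjectives):
    """Sort (word, category) pairs by category position.

    sorted() is stable, so adjectives in the same category keep their order.
    """
    return sorted(adjectives, key=lambda pair: ADJECTIVE_ORDER.index(pair[1]))


ordered = order_adjectives([("red", "color"), ("big", "size")])
phrase = " ".join(word for word, _ in ordered) + " balloon"
```

With these inputs, "big" is placed before "red" because size comes before color in the list.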

Adjectives may be preceded by an adverb.

jelo:
    lime(adv) yellow(adj color);

Adjectives may be compounded using and(c). This form is currently limited: there can't be adverbs; there can't be more than 2 adjectives; and there can't be conjunctions other than "and". If you need these limitations lifted, please open an issue and explain why.

linja:
    long(adj size) and(c) flexible(adj material);

ilo Token may remove the word "and" when translating: "moku linja" becomes "long flexible food".

TODO: mention gerund-like

Defining determiners

To ilo Token, determiners and adjectives are different classifications. Determiners act as limiters instead of modifiers.

Use the tag (d). You'll need to specify its classification:

ale, ali:
    every(d distributive);

Here are the classifications of determiners:

  • article e.g. "the", "a", and "an"
  • demonstrative e.g. "that balloon"
  • distributive e.g. "every balloon" or "each balloon"
  • interrogative e.g. "which balloon"
  • possessive e.g. "my balloon"
  • quantifier e.g. "few balloons" or "many balloons"
  • negative e.g. "not balloon"
  • numeral e.g. "1 balloon" – For pu numbers, use numerals instead

Sometimes, a determiner limits what grammatical number the noun can have. In these cases, define this inside the tag as well, using the keyword singular or plural after the determiner classification.

ale, ali:
    all(d distributive plural);

The determiner "all" forces the noun to be plural e.g. "all apples".

Remember to only consider the grammatical number:

ala:
    zero(d quantifier plural);

To explain with an example noun: "zero apples" refers to no apples at all, yet it is grammatically plural in form.

Note that "zero" is only used as an example here; don't actually use this definition. "zero" is better defined as 0(num) using a numeral definition.

Sometimes, determiners themselves have singular and plural forms. In these cases, use a slash /. There is no automatic conjugation for this.

ni:
    this/these(d demonstrative);
    that/those(d demonstrative);

Defining numerals

Numerals are technically a kind of determiner or noun, but since numbers in Toki Pona have interesting grammatical functions, numerals are defined separately. Remember that these are for exact numbers, like actual integers. For words describing a rough number, e.g. "few" or "many", use a determiner instead.

Use the tag (num). Use Arabic numerals instead of English words: use 5, not five.

luka:
    5(num);

Defining verbs

Use the tag (v) to define verbs. Verbs may be followed by particles, in which case the two are treated as a single word.

awen:
    continue(v);

open:
    turn on(v) [object];

Verb definitions may come with direct or indirect objects.

musi:
    have(v) fun(n singular);

ku:
    interact(v) with(prep) the(d article) Toki Pona(adj qualifier) Dictionary(n singular);

Use the placeholder [object] for transitive verbs. You may use prepositions.

ante:
    change(v) [object];

lukin, oko:
    look(v) at(prep) [object];

Verb definitions are also used for preverbs. ilo Token supports translating preverbs into linking verbs, catenative verbs, and modal verbs. Make sure to use the [predicate] placeholder.

awen:
    remain(v linking) [predicate];

alasa:
    try to(v) [predicate];

ken:
    can(v modal) [predicate];

These preverbial definitions are also used for non-preverbs: "mi alasa" will translate to "I try to".

Automatic conjugation may fail. In these cases, provide all the needed conjugations in the following order: present plural or infinitive, present singular, and finally past.

mu:
    hiss/hisses/hissed(v);

Defining adverbs

Defining adverbs is as easy as it gets. Use the tag (adv).

pona:
    nicely(adv);

Don't add so(adv) to the word "a"; this is hardcoded instead. (adv) definitions are for content words.

Defining fillers

These are for the words "a", "n", and other similar words. Use the tag (f).

Fillers are permitted to be elongated. You'll have to provide elongations of different lengths in a strict pattern: only one letter can be repeated, and each form must be exactly one letter longer than the previous one.

a:
    ah/aah/aaah(f);

n:
    hm/hmm/hmmm(f);

You may provide just 2 forms, but we recommend sticking to 3.

a:
    ah/aah(f);

You may also provide no elongation at all. Such definitions won't be used when "a" or "n" is elongated.

a:
    ah(f);
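The elongation pattern is strict enough that it can be checked mechanically. The Python sketch below is one possible reading of the rule, where every form is the first form with a single chosen letter repeated one more time than in the previous form. The function name `is_valid_elongation` is hypothetical, and ilo Token's own check may be stricter or looser.

```python
def is_valid_elongation(word_unit: str) -> bool:
    """Check that slash-separated filler forms grow by one repeated letter.

    "ah/aah/aaah" is accepted because each form adds one more "a".
    A single form such as "ah" is accepted as "no elongation".
    """
    forms = word_unit.split("/")
    base = forms[0]
    if not base:
        return False
    # Try each letter of the first form as the one being repeated.
    for position, letter in enumerate(base):
        expected = [
            base[:position] + letter * step + base[position:]
            for step in range(len(forms))
        ]
        if forms == expected:
            return True
    return False


a_ok = is_valid_elongation("ah/aah/aaah")
n_ok = is_valid_elongation("hm/hmm/hmmm")
skipped = is_valid_elongation("ah/aaah")  # jumps by two letters
```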

Translating "a a a" to "hahaha" is hardcoded. You don't need to define it.

See also interjection.

Defining interjections

Defining interjections is as easy as it gets. Use the tag (i).

mu:
    bark(i);

Interjection definitions are only used when the Toki Pona word is used alone or with "a" in the sentence.

Don't use interjections for the particles "a" and "n"; use fillers instead. (i) definitions are for content words.

Defining prepositions

These are for Toki Pona prepositions. Toki Pona prepositions happen to be translatable into English prepositions. Use the tag (prep). The placeholder [indirect object] is required.

lon:
    in(prep) [indirect object];

A bit of laziness on the developer's part: you may define adjective-preposition phrases as well as nested prepositions as a single preposition.

sama:
    similar to(prep) [indirect object];

kepeken:
    by means of(prep) [indirect object];

Defining noun-preposition phrases

For example, "kili lili" can mean "part of fruit". You may define this kind of definition as follows.

lili:
    part(n) of(prep) [headword];

Defining particle definitions

These are for Toki Pona particles. The functionality of particles is hardcoded and cannot be customized with the dictionary alone. These definitions are only used in dictionary mode, when ilo Token is queried with a single word. Use the closest English word that the particle can translate to. Use the tag (particle def).

anu:
    or(particle def);

You may instead describe how the word is used. Wrap the description in square brackets []; you'll have to wrap the brackets in backticks ` too, because square brackets are special characters used for placeholders.

a:
    `[`placed after something for emphasis or emotion`]`(particle def);

Definition order

Order matters: ilo Token will usually, though not always, try to use the first definition and output it first. So please order the definitions from most likely to least likely.

Avoiding calques and confusing definitions

We borrow definitions from lipu Linku, which itself avoids calques. However, we still need to avoid words that have multiple meanings and could cause confusion. For example, the word "cool" can translate both "lete" and "epiku", which have different meanings, so "cool" should be avoided.

Shrinking down definition number

ilo Token can show a very large number of outputs. To counteract this, please reduce the number of definitions where possible. Try to use words with broad meanings that align well with the Toki Pona word.

Using lipu Linku

We recommend using lipu Linku as a reference. lipu Linku is very high quality. You may borrow definitions from it. You may deviate from lipu Linku if needed. Consider contributing to lipu Linku as well.