US/UK Spelling Converter

You provide the text, in either US or UK spelling.

We return the same text, converted to the other system.

We have you covered -- for about 20,000 words.

TOC

  1. TOC
  2. Online Demos
  3. Features
  4. Functionality
  5. Example Usage
  6. Code Structure and Design

Online Demos

Check out the code in an online demo...

Simple Demo Hosted by Us

Editable, Online Sandbox Demo (at IDEone.com)

Note: Since there are text limits to online compilers, we reduced the actual list of words covered to make this demo run.

Features

Regularly updated! Please submit corrections, additions, fixes, anything!

How many words are covered?

  • Total of 20,000 words covered, with multiple sources.
    • Source: VarCon/ISpell (18,000 words).
    • Source: WordsWorldWide (8,000 words).
    • Source: Our own personal list.
      • BtA List: Literary and archaic British variants (1500s to 1900s): (~500 words).
      • BtA List: Alternative Latinized spellings of Russian and French names: (~1,500 words).
      • BtA List: Alternative dashed-form words ("hundredfold" versus "hundred-fold"): (~2,000 words).
    • These lists were used to cross-check each other, correct errors, and remove duplicates.
    • Letter-sorted lists for easily updating and checking on words: A (1,314 words), B (687 words), C (1,807 words), D (1,427 words), E (948 words), F (678 words), G (654 words), H (1,066 words), I (590 words), J (149 words), K (264 words), L (641 words), M (1,312 words), N (716 words), O (532 words), P (2,273 words), Q (57 words), R (1,071 words), S (2,024 words), T (800 words), U (1,259 words), V (450 words), W (177 words), X (0 words), Y (75 words), Z (63 words).
  • Variants for British words.
    • For example, "unrealisable" and "unrealiseable".
  • Words are defined with a simple associative array, making for a quick transfer to Perl, C++, Java, etc.
    • For example, the syntax of somekey=>"somevalue" is widely used in many languages, or is easily converted to their versions of this syntax.
  • Permissively-licensed
    • Do whatever you want with the code!
    • For example, see what others are doing with the personal and commercial rights granted by the BSD 3-Clause license.
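
As a rough illustration of the associative-array format and the British variants mentioned above, a word list could be shaped like the sketch below. The names $american_to_british, invert_spellings(), and preferred_british() are hypothetical and are not taken from the project's source, and "unrealizable" is assumed here as the American counterpart of the two British variants.

```php
<?php
// Hypothetical word list: American spelling => British spelling(s).
// A string value is a single British form; an array lists alternates,
// with the preferred form first.
$american_to_british = [
    'neighbor'     => 'neighbour',
    'unrealizable' => ['unrealisable', 'unrealiseable'],
];

// Build the reverse (British => American) map. Every British alternate
// points back to the single American spelling.
function invert_spellings(array $american_to_british): array
{
    $british_to_american = [];
    foreach ($american_to_british as $american => $british) {
        foreach ((array) $british as $variant) {
            $british_to_american[$variant] = $american;
        }
    }
    return $british_to_american;
}

// Reduce each entry to its preferred (first listed) British form.
function preferred_british(array $american_to_british): array
{
    return array_map(
        fn($british) => is_array($british) ? $british[0] : $british,
        $american_to_british
    );
}
```

Keeping alternates in a plain nested array means the same data can be copied into any language that has map and list literals.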

Functionality

General Behavior

How in general does it work?

  • Exact / Error-Resistant
    • British/American Spelling Converter uses regular expression checking with /\b$word\b/, so only whole words match and a known word that appears inside a longer word is left untouched.
    • For example, "Ax" becomes "Axe", but "Axiomatic" will remain as "Axiomatic", and cannot become "Axeiomatic", which would be incorrect.
  • Fast / Efficient
    • Every mass-replace is done within a single preg_replace() call, using arrays as arguments.
    • This avoids a separate function call per word, so the script finishes sooner.
  • Reliable / Atomic / Deterministic
    • Americanizing or Britishifying text will not corrupt its meaning.
    • For example, 'discus' and 'diskus' have reversed meanings in US and UK usage. Swapping them in or out would cause the text to change each time you "Americanize" or "Britishify" it, so we don't do these types of swaps.
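
The whole-word matching and single-call replacement described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: replace_whole_words() and its $replacements argument are hypothetical names, and the word list is reduced to one entry.

```php
<?php
// Sketch only: convert whole words using one preg_replace() call.
// $replacements maps a search word to its replacement (hypothetical data).
function replace_whole_words(string $text, array $replacements): string
{
    $patterns = [];
    foreach (array_keys($replacements) as $word) {
        // \b anchors the match to word boundaries; preg_quote() escapes
        // any regex metacharacters in the word itself.
        $patterns[] = '/\b' . preg_quote((string) $word, '/') . '\b/';
    }
    // Pattern and replacement arrays are paired by position in one call.
    return preg_replace($patterns, array_values($replacements), $text);
}

// "Ax" is a whole word, "Axiomatic" only starts with it.
echo replace_whole_words('Ax the Axiomatic door.', ['Ax' => 'Axe']), "\n";
// prints: Axe the Axiomatic door.
```

When given arrays, preg_replace() applies each pattern in turn to the result of the previous one, so a word list should not contain a replacement that a later pattern would match again.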

Precise Behavior - Use Cases

How exactly does it work?

  • Only all lower case, all upper case, or first letter capitalized versions are converted.
    • Example: American=>English, "axe"=>"ax". "AXE" would be converted to "AX" or vice versa, but "AxE" would not be converted to "Ax".
  • Apostrophes are treated as word boundaries.
    • Example: American=>English, "axe"=>"ax". "the ax's handle" would be converted to "the axe's handle".
  • Only precisely whole, known words are converted.
    • Example: American=>English, "axe"=>"ax". This will not convert "axed" to "axd", because the concluding "-d" indicates that it is an entirely different word.
  • Words embedded in a longer dashed compound are not converted.
    • Example: American=>English, "affecteffect"=>"affect-effect". This will convert "the affect-effect of it" to "the affecteffect of it", but it will not convert "these every-night-affect-effect-happenings are" to "these every-night-affecteffect-happenings are", because the surrounding dashes give the word a different meaning than it has standing alone.
  • British alternates are handled.
    • Example: American=>English, "amoebas"=>["amoebae", "amebas", "amebae"]. If converting to English, "amoebas" will be replaced with "amoebae", the most contemporary term. If converting to American, "amoebae", "amebas", etc., will all be converted to the single American equivalent.
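
The case, apostrophe, whole-word, and dash rules above could be approximated with patterns like the ones in this sketch. It assumes lookarounds are an acceptable way to express the dash rule and that a dash on either side is enough to skip a word; the project itself may build its regexes differently. build_case_variants() and convert() are hypothetical names, and the one-word list is for illustration only.

```php
<?php
// Sketch only: build case-sensitive search/replace pairs for one word.
// Produces lower-case, UPPER-CASE, and First-letter-capitalized variants,
// so mixed-case input such as "AxE" matches none of them.
function build_case_variants(string $search, string $replace): array
{
    $cases = [
        fn(string $w): string => strtolower($w),
        fn(string $w): string => strtoupper($w),
        fn(string $w): string => ucfirst(strtolower($w)),
    ];
    $pairs = [];
    foreach ($cases as $case) {
        // The lookarounds reject a match that touches a word character or
        // a dash, so words inside dashed compounds are skipped. An
        // apostrophe is neither, so it still acts as a boundary.
        $pattern = '/(?<![\w-])' . preg_quote($case($search), '/') . '(?![\w-])/';
        $pairs[$pattern] = $case($replace);
    }
    return $pairs;
}

// Apply every case variant of every word in a single preg_replace() call.
function convert(string $text, array $words): string
{
    $patterns = [];
    $replacements = [];
    foreach ($words as $search => $replace) {
        foreach (build_case_variants((string) $search, $replace) as $pattern => $value) {
            $patterns[] = $pattern;
            $replacements[] = $value;
        }
    }
    return preg_replace($patterns, $replacements, $text);
}

$words = ['ax' => 'axe']; // hypothetical one-entry word list

echo convert("the ax's handle", $words), "\n"; // prints: the axe's handle
echo convert('AX Ax AxE axed', $words), "\n";  // prints: AXE Axe AxE axed
echo convert('cut-ax-marks', $words), "\n";    // prints: cut-ax-marks
```

Generating the three case variants explicitly, instead of using a case-insensitive flag, is what lets the replacement keep the casing of the matched word.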

Some test sentences...

The neighbour walked to the theatre's centre, manoeuvred about the sabre, and proceeded to reconnoitre the sepulchre in ochre.

The rumour spread that splendour and flavour were affected by our behaviour, so walk a metre in my mitre while carrying a litre of nitre.

The connexion with industrialisation remains with the municipalisation of the calibre of the fibre of the spectre, not with the meagre and sombre saltpetre with all its colour and honour.

Example Usage

How do I use the British/American Spelling Converter?

Americanize Text Example

How do I convert British-spelling text to American-spelling text?

require('AmericanBritishSpellings.php');
$american_british_spellings = new AmericanBritishSpellings([]);

$text = "Axiomatically ax that door, would you, my neighbour?";     // British input text source

$americanized = $american_british_spellings->SwapBritishSpellingsForAmericanSpellings(['text'=>$text]);

print($americanized);   // output: Axiomatically axe that door, would you, my neighbor?

Britishize Text Example

How do I convert American-spelling text to British-spelling text?

require('AmericanBritishSpellings.php');
$american_british_spellings = new AmericanBritishSpellings([]);

$text = "Axiomatically axe that door, would you, my neighbor?";     // American input text source

$britishized = $american_british_spellings->SwapAmericanSpellingsForBritishSpellings(['text'=>$text]);

print($britishized);   // output: Axiomatically ax that door, would you, my neighbour?

Code Structure and Design

Coding Languages

What coding languages are used in the British/American Spelling Converter?

The entire project is coded in the following...

  • PHP - For processing the text and storing the US/UK words.

Exclude List

How do you avoid adding words that would break the deterministic / atomic model of functionality?

We do this with an exclude list, which also details the conflict in the words themselves.

Check it out: Exclude List.
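
One way such conflicts might be spotted before they reach the word lists is sketched below. This is an assumption about how a check could work, not a description of how the exclude list is actually maintained: find_conflicting_words() is a hypothetical helper, and the sample data is modeled on the 'discus'/'diskus' case described earlier.

```php
<?php
// Sketch only: flag words that appear on both sides of a spelling map.
// Such a word is a valid spelling in both dialects, so converting it
// would not round-trip cleanly and it is a candidate for exclusion.
function find_conflicting_words(array $american_to_british): array
{
    $american = array_map('strval', array_keys($american_to_british));
    $british = [];
    foreach ($american_to_british as $variants) {
        foreach ((array) $variants as $variant) {
            $british[] = $variant;
        }
    }
    $conflicts = array_values(array_unique(array_intersect($american, $british)));
    sort($conflicts);
    return $conflicts;
}

// Hypothetical candidate entries, not the project's real data.
$candidate_words = [
    'neighbor' => 'neighbour',
    'discus'   => 'diskus',
    'diskus'   => 'discus',
];

print_r(find_conflicting_words($candidate_words));
```

Here the two words that map onto each other are reported, while "neighbor" is not, since "neighbour" never appears as an American spelling.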

AmericanBritishSpellings.php - Technical Overview

What are the functions in the source code files for?

AmericanBritishSpellings.php

Class for converting text between US and UK spellings.

  • __construct($args)
    • Constructor.
    • Load the words into the converter class for ready use.
  • SwapBritishSpellingsForAmericanSpellings($args)
    • Convert text with British spellings to text with American spellings.
  • SwapAmericanSpellingsForBritishSpellings($args)
    • Convert text with American spellings to text with British spellings.
  • GetSpellingsAndReplacements($args)
    • Get spellings and replacements based on the desired end language.
  • BuildSpellingAlternates($args)
    • Build spelling alternates for the British and American dialects.
  • BuildSpellingAlternatesForLanguage($args)
    • Build spelling alternates for a single dialect of a language (either British or American, in our case).
  • BuildSearchRegex($args)
    • Build an array of search regexes when given an array of search terms.
  • BuildSearchRegex($args)
    • Build a single search regex for a single search term.
  • BuildSpellingReplacements()
    • Build the replacements to be used for the search terms.

AmericanBritishSpellings_Words.php

Class for building word lists for converting between UK and US English dialects.

  • __construct($args)
    • Constructor.
    • Nothing to do here.
  • GetBritishToAmericanSpellings()
    • Build a mapping of British to American spellings.
  • GetAmericanToBritishSpellings()
    • Build a mapping of American to British spellings from the /Language/Words/AmericanBritish/ classes.

AmericanBritishWords_A.php ... AmericanBritishWords_Z.php

  • __construct($args)
    • Constructor.
    • Load the words into the converter class for ready use.
  • AmericanBritishWords()
    • List of US/UK spellings for words starting with the file's letter (A...Z).