This is the Abraxa Lexicon, a multilingual Milvus vector database of phonetic transcriptions for finding cognates and homophones across languages.
Much of the code in this repository is language-processing modules that parse the source data, format it, and upload it to the database.
Work in progress.
Languages supported right now, in version 1: English, Greek, Arabic, Farsi, German, French, Burmese, Dutch, Turkish.
Now that I've figured out the vector mapping, support is coming quickly for
Japanese, Chinese, Hebrew, Finnish, Swedish, Frisian, Swahili, Kazakh, Tamil, and Middle Egyptian.
Update 5/17/2023: the Abraxa Lexicon IPA (International Phonetic Alphabet) mapping key is finalized.
(link to key), (link to key breakdown),
and (link to how the Abraxa Lexicon works for deep language searches)
TODO: make graphics showing the Abraxa workflow for user literacy:
input word (in English or another language) -> Epitran to phonemes -> phonemes to key -> key to embedding -> embedding to Abraxa
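A rough Python sketch of the phoneme-to-key-to-embedding steps in that workflow. Everything here is an assumption for illustration: TOY_KEY is a made-up table, not the finalized Abraxa key, and key_to_embedding is a simple symbol-count vector, not the real embedding. The IPA string is typed in by hand where Epitran would normally produce it.

```python
# Hypothetical sketch: IPA phonemes -> key symbols -> embedding.
# TOY_KEY and the embedding are stand-ins, not the actual Abraxa mapping.

# Toy phoneme-to-key table (assumption): voiced/voiceless pairs share a symbol.
TOY_KEY = {
    "p": "P", "b": "P",
    "t": "T", "d": "T",
    "k": "K", "g": "K",
    "m": "M", "n": "N",
    "a": "A", "e": "E", "i": "I", "o": "O", "u": "U",
}
KEY_SYMBOLS = sorted(set(TOY_KEY.values()))


def phonemes_to_key(ipa: str) -> str:
    """Map each IPA character to its key symbol, skipping unknown characters."""
    return "".join(TOY_KEY[ch] for ch in ipa if ch in TOY_KEY)


def key_to_embedding(key: str) -> list[float]:
    """Toy embedding: normalized count of each key symbol (ignores order)."""
    counts = [float(key.count(sym)) for sym in KEY_SYMBOLS]
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# In the real workflow the IPA string would come from Epitran.
ipa = "bad"
key = phonemes_to_key(ipa)
vector = key_to_embedding(key)
print(key, vector)
```

With this toy table "bad" and "pat" collapse to the same key, which is the kind of near-homophone grouping the key is meant to make searchable.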
Search configuration: overall homophone accuracy strength 0-100,
consonant strength 0-100, vowel strength 0-100.
TODO: group consonant embeddings and vowel embeddings, and offer a configuration variable for string comparison strength.
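One way the three 0-100 settings could fit together, sketched in Python. This is an assumption, not how the search is implemented: it compares key strings with difflib from the standard library instead of comparing embeddings, and VOWELS, weighted_similarity and is_match are hypothetical names.

```python
from difflib import SequenceMatcher

VOWELS = set("aeiouAEIOU")  # assumption: toy vowel set, not the Abraxa key


def split_classes(key: str) -> tuple[str, str]:
    """Split a key string into its consonant symbols and its vowel symbols."""
    consonants = "".join(ch for ch in key if ch not in VOWELS)
    vowels = "".join(ch for ch in key if ch in VOWELS)
    return consonants, vowels


def weighted_similarity(a: str, b: str,
                        consonant_strength: int = 100,
                        vowel_strength: int = 100) -> float:
    """Return a 0-100 score; each strength (0-100) weights its sound class."""
    cons_a, vow_a = split_classes(a)
    cons_b, vow_b = split_classes(b)
    cons_score = SequenceMatcher(None, cons_a, cons_b).ratio()
    vow_score = SequenceMatcher(None, vow_a, vow_b).ratio()
    total = consonant_strength + vowel_strength
    if total == 0:
        return 0.0
    return 100.0 * (consonant_strength * cons_score
                    + vowel_strength * vow_score) / total


def is_match(a: str, b: str, accuracy: int = 80, **strengths) -> bool:
    """Overall accuracy strength acts as the minimum score for a hit."""
    return weighted_similarity(a, b, **strengths) >= accuracy
```

Setting vowel_strength to 0 makes the comparison consonant-only, so is_match("PAT", "PIT", vowel_strength=0) passes while the default settings reject it.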
Exception options, from old literature, for letters that can be grouped in a search:
M and W, z and N, A and V, H and I, f and s.
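A small Python sketch of how those optional groupings might be toggled per search. The pairs are copied from the list above. The idea of folding the second symbol of each pair into the first before comparing is an assumption, and build_grouping and normalize are hypothetical names.

```python
# Pairs from the README; each one is an optional grouping for a search.
EXCEPTION_PAIRS = [("M", "W"), ("z", "N"), ("A", "V"), ("H", "I"), ("f", "s")]


def build_grouping(enabled_pairs):
    """Build a translation table that maps the second symbol of each pair to the first."""
    return str.maketrans({second: first for first, second in enabled_pairs})


def normalize(key: str, enabled_pairs) -> str:
    """Rewrite a key so grouped symbols compare as equal."""
    return key.translate(build_grouping(enabled_pairs))


# Only the M/W grouping is switched on for this search.
enabled = [("M", "W")]
print(normalize("WAN", enabled) == normalize("MAN", enabled))
```

Passing an empty list leaves keys unchanged, so the exceptions stay opt-in.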
Once search is live, I will take input from philologists
to support a wide range of vector search configurations.
-- I just finished formatting all the JSON files and am uploading them into Milvus vector memory today.