compiling with HFST tools #13

Open
jaxon-shi opened this issue May 13, 2015 · 2 comments

Comments

@jaxon-shi

Wondering if you have any pointers or instructions for compiling the latest version of TRmorph with HFST tools. These instructions from Apertium are for the SFST version. Similarly, these pre-compiled HFST Turkish resources are based on trmorph-0.2.1. It appears an older version is also being used by the HFST web demo, as it does not have nearly the coverage of your current system.

If possible, I would like to build trmorph.hfst or trmorph.hfst.ol (an hfst transducer in optimized lookup format) from a recent version of TRmorph for testing. It's unclear whether to use hfst-xfst for this (with something like hfst-xfst source ...?), hfst-lexc, hfst-foma, or something else. Any guidance you could provide would be very much appreciated.

@jaxon-shi
Author

As a workaround, the following seems to work for converting trmorph.fst to the optimized HFST format.

This requires:

  1. trmorph.fst built with foma ~ 0.9.17
  2. a local installation of HFST with foma support enabled. For example, with something like:
    ./configure --enable-proc --enable-lexc --enable-all-tools

Once HFST is properly installed, hfst-fst2fst can convert the trmorph.fst file to other formats. The only caveat is that the foma .fst file must first be manually decompressed in order for the HFST tools to read it (see the note at the bottom of this page).

# create a decompressed copy of the foma transducer

cp trmorph.fst trmorph.foma.gz
gunzip trmorph.foma.gz

# then the invert + convert steps are the same as before 

# for an analyzer
hfst-invert -i trmorph.foma | hfst-fst2fst -O -o trmorph.hfst.ol

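# for a generator (no inversion; input assumed to be the decompressed trmorph.foma,
# and written to a separate file so the analyzer above is not overwritten)
hfst-fst2fst -O -i trmorph.foma -o trmorph.gen.hfst.ol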

echo "kuşlarda" | hfst-lookup trmorph.hfst.ol
> kuşlarda kuş<N><pl><loc>    0.000000
kuşlarda   kuş<N><pl><loc><0><V>  0.000000
kuşlarda   kuş<N><pl><loc><0><V><cpl:pres><3p>    0.000000
kuşlarda   kuş<N><pl><loc><0><V><cpl:pres><3s>    0.000000

One advantage of converting trmorph.fst to the optimized lookup format (trmorph.hfst.ol) is that it can be used with the standalone HFST Runtime Interface, which is an Apache-licensed C++ library.
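
For convenience, the decompress and convert steps could be collected into a small script along these lines. This is only a sketch: the function names and the `.analyzer` / `.generator` output names are made up for illustration, and using the decompressed foma file as the generator input is an assumption. It relies only on the `-i`, `-o` and `-O` options already used in this comment, and it reads the foma file through `gzip -dc` on stdin, so no renamed copy is needed.

```shell
#!/bin/sh
# Sketch: convert a foma-built trmorph.fst into HFST optimized-lookup files.
# Function and output file names are illustrative, not part of TRmorph or HFST.

# foma saves its binary files gzip-compressed; write a decompressed copy.
decompress_foma() {
    # $1: foma binary (e.g. trmorph.fst)
    # $2: decompressed output (e.g. trmorph.foma)
    gzip -dc < "$1" > "$2"
}

# Build an analyzer (inverted) and a generator (not inverted) in
# optimized-lookup format from a decompressed foma file.
convert_to_ol() {
    # $1: decompressed foma file, $2: prefix for the output files
    for tool in hfst-invert hfst-fst2fst; do
        command -v "$tool" >/dev/null 2>&1 || {
            echo "missing $tool; is HFST installed?" >&2
            return 1
        }
    done
    # Analyzer: invert first, then convert to optimized lookup.
    hfst-invert -i "$1" | hfst-fst2fst -O -o "$2.analyzer.hfst.ol" &&
    # Generator (assumption: same input, just without the inversion).
    hfst-fst2fst -O -i "$1" -o "$2.generator.hfst.ol"
}

# Example:
#   decompress_foma trmorph.fst trmorph.foma &&
#   convert_to_ol trmorph.foma trmorph
```

Writing the analyzer and the generator to different files avoids one conversion silently overwriting the other.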

@coltekin
Owner

Thanks for the report and the workaround.

The issue is with newer versions of foma, which I believe HFST uses for compiling xfst source. TRmorph does not compile with foma 0.9.18. I have been aware of this for a while, but could not find time to isolate the problem. It seems foma 0.9.18 fails on lexc entries that contain regular expressions. TRmorph uses these in quite a few places. A workaround within the TRmorph source code is not trivial.

I have just opened an issue with foma. Hopefully, it will be fixed soon. Meanwhile, your workaround is probably the best solution for using TRmorph with HFST.
