Create data syllabifier #2

Open
bumbu opened this issue Apr 17, 2014 · 4 comments

bumbu commented Apr 17, 2014

We need a script that takes data_mined.json as input and syllabifies each phrase. The expected result is a JSON file in the following format:

[{
  "phrase": "Post title",
  "url": "http://9vo.lt/news/post1",
  "metadata": [
    {
      "word": "Post",
      "syllables": ["post"],
      "accent_on": 0
    },
    {
      "word": "title",
      "syllables": ["ti", "tle"],
      "accent_on": 0
    }
  ]
},
{
  "phrase": "Post Timofti",
  "url": "http://9vo.lt/news/post1",
  "metadata": [
    {
      "word": "Post",
      "syllables": ["post"],
      "accent_on": 0
    },
    {
      "word": "Timofti",
      "syllables": ["Ti", "mof", "ti"],
      "accent_on": 1
    }
  ]
}]

Expected output file name is data_syllabified.json.
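
A minimal sketch of how such a script could be structured, in Python. Several things here are assumptions and not part of the issue: that data_mined.json is a list of objects with "phrase" and "url" keys, that words are separated by whitespace, and the names `syllabify_entry`, `syllabify_file`, and `placeholder_syllabifier`, which are hypothetical. The real syllabification logic is left pluggable, because the issue does not yet settle on a library.

```python
import json
from typing import Callable, List, Tuple

# Hypothetical interface: word -> (syllables, index of the accented syllable).
Syllabifier = Callable[[str], Tuple[List[str], int]]


def placeholder_syllabifier(word: str) -> Tuple[List[str], int]:
    """Stand-in only: treats the whole word as one accented syllable.

    Swap in a real syllabifier (one that handles Romanian) here.
    """
    return [word], 0


def syllabify_entry(entry: dict, syllabify: Syllabifier) -> dict:
    """Build one output record in the format described in the issue."""
    metadata = []
    # Assumption: whitespace splitting; punctuation stays attached to words.
    for word in entry["phrase"].split():
        syllables, accent_on = syllabify(word)
        metadata.append(
            {"word": word, "syllables": syllables, "accent_on": accent_on}
        )
    return {"phrase": entry["phrase"], "url": entry["url"], "metadata": metadata}


def syllabify_file(
    src: str, dst: str, syllabify: Syllabifier = placeholder_syllabifier
) -> list:
    """Read the mined data, syllabify every phrase, and write the result."""
    with open(src, encoding="utf-8") as f:
        entries = json.load(f)  # assumed shape: [{"phrase": ..., "url": ...}]
    result = [syllabify_entry(entry, syllabify) for entry in entries]
    with open(dst, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps Romanian diacritics readable in the output.
        json.dump(result, f, ensure_ascii=False, indent=2)
    return result
```

Calling `syllabify_file("data_mined.json", "data_syllabified.json")` would produce the output file named above. Keeping the syllabifier as a parameter means a JS or Java tool could be wrapped later without changing the file handling.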

idanci commented Apr 17, 2014

bumbu commented Apr 17, 2014

Need to check how it works with Romanian words.

As there are always problems with Romanian, some solutions might be:

bumbu commented Apr 17, 2014

Some JS hyphenators with Romanian language support:

minivan commented Apr 18, 2014

Silabisitor gives you the accent as well and is available as a Java thingie
