Tagalog Words Stemmer using Python

Description:

Tagalog Words Stemmer is a program that processes Tagalog words by removing their affixes and returning the root of each word.
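
To make the idea of affix removal concrete, here is a minimal sketch. It is not the code in TglStemmer.py: the function name `strip_affixes`, the tiny affix lists, and the minimum root length are all assumptions chosen for illustration, and a real stemmer needs far larger affix tables and more careful rules.

```python
# Toy affix lists (hypothetical and deliberately tiny; longest prefix first).
PREFIXES = ("pag", "ma", "pa")
INFIXES = ("in", "um")
SUFFIXES = ("ng",)
MIN_ROOT = 3  # assumed guard so short words like "pa" or "ng" stay intact


def strip_affixes(word):
    """Return a candidate root by peeling off one affix of each kind."""
    stem = word.lower().strip(".,!?\"'")

    # 1. Prefix: remove the first matching prefix if enough letters remain.
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_ROOT:
            stem = stem[len(prefix):]
            break

    # 2. Infix: look for an infix right after an initial consonant.
    for infix in INFIXES:
        end = 1 + len(infix)
        if (stem[:1] not in "aeiou" and stem[1:end] == infix
                and len(stem) - len(infix) >= MIN_ROOT):
            stem = stem[0] + stem[end:]
            break

    # 3. Partial reduplication: drop a repeated two-letter first syllable.
    if len(stem) >= 5 and stem[:2] == stem[2:4]:
        stem = stem[2:]

    # 4. Suffix: remove the first matching suffix if enough letters remain.
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_ROOT:
            stem = stem[:-len(suffix)]
            break

    return stem


for w in ("Patuloy", "paghahanap", "malubhang", "ina."):
    print(w, ":", strip_affixes(w))
```

Each step removes at most one affix, and the length guard is a crude stand-in for the dictionary or rule checks a full stemmer would use to avoid over-stemming.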

Sample Output:

Input: "Patuloy pa din sila sa paghahanap ng posibleng gamot sa malubhang sakit ng dinaramdam ng kanyang ina."

word : root

patuloy : tuloy
pa : pa
din : din
sila : sila
sa : sa
paghahanap : hanap
ng : ng
posibleng : posible
gamot : gamot
sa : sa
malubhang : lubha
sakit : sakit
ng : ng
dinaramdam : daramdam
ng : ng
kanyang : kanya
ina. : ina

word_info

{'prefix': ['pa'], 'clean': [], 'infix': [], 'root': 'tuloy', 'repeat': [], 'suffix': [], 'word': 'Patuloy', 'dupli': []}
{'prefix': '[]', 'clean': '[]', 'infix': '[]', 'root': 'pa', 'repeat': '[]', 'suffix': '[]', 'word': 'pa', 'dupli': '[]'}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'din', 'repeat': [], 'suffix': [], 'word': 'din', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'sila', 'repeat': [], 'suffix': [], 'word': 'sila', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'sa', 'repeat': [], 'suffix': [], 'word': 'sa', 'dupli': []}
{'prefix': ['pag'], 'clean': [], 'infix': [], 'root': 'hanap', 'repeat': ['ha'], 'suffix': [], 'word': 'paghahanap', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'ng', 'repeat': [], 'suffix': [], 'word': 'ng', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'posible', 'repeat': [], 'suffix': ['ng'], 'word': 'posibleng', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'gamot', 'repeat': [], 'suffix': [], 'word': 'gamot', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'sa', 'repeat': [], 'suffix': [], 'word': 'sa', 'dupli': []}
{'prefix': ['ma'], 'clean': [], 'infix': [], 'root': 'lubha', 'repeat': [], 'suffix': ['ng'], 'word': 'malubhang', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'sakit', 'repeat': [], 'suffix': [], 'word': 'sakit', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'ng', 'repeat': [], 'suffix': [], 'word': 'ng', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': ['in'], 'root': 'daramdam', 'repeat': [], 'suffix': [], 'word': 'dinaramdam', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'ng', 'repeat': [], 'suffix': [], 'word': 'ng', 'dupli': []}
{'prefix': [], 'clean': [], 'infix': [], 'root': 'kanya', 'repeat': [], 'suffix': ['ng'], 'word': 'kanyang', 'dupli': []}
{'prefix': [], 'clean': ['.'], 'infix': [], 'root': 'ina', 'repeat': [], 'suffix': [], 'word': 'ina.', 'dupli': []}
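
Each word_info record is a plain dictionary, so it can be inspected directly. The sketch below lists only the parts that were stripped from a word. The helper name `removed_parts` is hypothetical, and the reading of the keys (`repeat` as a repeated syllable, `clean` as stripped punctuation) is inferred from the sample output above rather than from the source code.

```python
# One record copied from the sample output above.
info = {'prefix': ['pag'], 'clean': [], 'infix': [], 'root': 'hanap',
        'repeat': ['ha'], 'suffix': [], 'word': 'paghahanap', 'dupli': []}

# Keys that appear to hold removed material (an assumption based on the sample).
AFFIX_KEYS = ("prefix", "infix", "repeat", "dupli", "suffix", "clean")


def removed_parts(record):
    """Return only the non-empty affix lists of a word_info record."""
    return {key: record[key] for key in AFFIX_KEYS if record[key]}


print(info['word'], '->', info['root'], removed_parts(info))
```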

validation

Accuracy: 94.12%
Errors: ['daramdam']
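
The reported accuracy is consistent with 16 correct roots out of the 17 words in the sample sentence. The sketch below reproduces that arithmetic. The function name `accuracy` is hypothetical, and how TglStemmer.py actually computes the figure (presumably against validation.txt) is an assumption.

```python
def accuracy(total_words, errors):
    """Percentage of words whose root was not flagged as an error."""
    correct = total_words - len(errors)
    return round(100 * correct / total_words, 2)


# 17 words in the sample sentence, one flagged root.
print(accuracy(17, ['daramdam']))  # 94.12
```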

Usage:

python TglStemmer.py [mode] [source] [info]

mode: [1: text_file] [2: raw_string]
source: [1: file_name] [2: "raw_string"]
info: [1: word-root] [2: show_word_info]
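
As a rough illustration of how the three positional arguments fit together, here is a sketch of a parser for them. The function `parse_cli` and the keys of the returned dictionary are hypothetical; TglStemmer.py may read its arguments differently.

```python
def parse_cli(argv):
    """Interpret [mode, source, info] as described in the usage text.

    argv is the argument list without the script name.
    """
    if len(argv) != 3:
        raise ValueError("expected: [mode] [source] [info]")
    mode, source, info = argv
    if mode not in ("1", "2"):
        raise ValueError("mode must be 1 (text_file) or 2 (raw_string)")
    if info not in ("1", "2"):
        raise ValueError("info must be 1 (word-root) or 2 (show_word_info)")
    return {
        "from_file": mode == "1",       # source is a file name when mode is 1
        "source": source,               # file name or the raw string itself
        "show_word_info": info == "2",  # otherwise print word : root pairs
    }


# Equivalent to: python TglStemmer.py 2 "malubhang sakit" 1
print(parse_cli(["2", "malubhang sakit", "1"]))
```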

crlwingen/TagalogStemmerPython

Folders and files

Latest commit

History

Repository files navigation

Tagalog Words Stemmer using Python

Description:

Sample Output:

word : root

word_info

validation

Usage:

Fix List:

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages