IndexError: list index out of range #68
Running into the same problem. Is there a way to sanitize the string so it doesn't trigger this error?
Seems to be a problem with some Unicode characters. Encoding to ASCII and then decoding back to UTF-8 works:

```python
import unicodedata
...
text = "My Random Character text"
text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode("utf-8")
annotations = skill_extractor.annotate(text)
```
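To see what the normalization line does on its own, here is a minimal sketch that wraps it in a helper (`to_ascii` is a hypothetical name, not part of SkillNER). NFKD decomposition splits accented letters and compatibility characters such as ligatures into plain base characters plus combining marks, and `encode('ascii', 'ignore')` then silently drops every code point that is still outside ASCII.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Hypothetical helper: reduce text to ASCII before annotation."""
    # NFKD: "é" -> "e" + U+0301 (combining acute), U+FB01 ligature -> "fi"
    decomposed = unicodedata.normalize("NFKD", text)
    # errors="ignore" drops anything still outside ASCII: the combining
    # marks, plus characters with no decomposition (e.g. U+2014 em dash)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Café résumé"))            # Cafe resume
print(to_ascii("\ufb01nancial analyst"))  # financial analyst
```

Text that is already pure ASCII comes back unchanged (`to_ascii(s) == s` whenever `s.isascii()`), so this step cannot help when the crashing input contains no special characters at all.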
I am still running into the same issue with the encoding/decoding step:

```python
import spacy
from spacy.matcher import PhraseMatcher
import unicodedata

# load default skills database
from skillNer.general_params import SKILL_DB
# import skill extractor
from skillNer.skill_extractor_class import SkillExtractor

# init params of skill extractor
nlp = spacy.load("en_core_web_lg")
# init skill extractor
skill_extractor = SkillExtractor(nlp, SKILL_DB, PhraseMatcher)

text = "Learn how to become a professional wedding makeup artist"
text = unicodedata.normalize('NFKD', text).encode('ascii', 'ignore').decode("utf-8")
annotations = skill_extractor.annotate(text)
```

I still get the same error:
```
IndexError Traceback (most recent call last)
Cell In[2], line 4
2 text = "Learn how to become a professional wedding makeup artist"
3 text = unicodedata.normalize('NFKD', text ).encode('ascii', 'ignore').decode("utf-8")
----> 4 annotations = skill_extractor.annotate(text )

File ~/anaconda3/envs/skillner/lib/python3.9/site-packages/skillNer/skill_extractor_class.py:129, in SkillExtractor.annotate(self, text, tresh)
123 skills_abv, text_obj = self.skill_getters.get_abv_match_skills(
124 text_obj, self.matchers['abv_matcher'])
126 skills_uni_full, text_obj = self.skill_getters.get_full_uni_match_skills(
127 text_obj, self.matchers['full_uni_matcher'])
--> 129 skills_low_form, text_obj = self.skill_getters.get_low_match_skills(
130 text_obj, self.matchers['low_form_matcher'])
132 skills_on_token = self.skill_getters.get_token_match_skills(
133 text_obj, self.matchers['token_matcher'])
134 full_sk = skills_full + skills_abv

File ~/anaconda3/envs/skillner/lib/python3.9/site-packages/skillNer/matcher_class.py:332, in SkillsGetter.get_low_match_skills(self, text_obj, matcher)
329 for match_id, start, end in matcher(doc):
330 id_ = matcher.vocab.strings[match_id]
--> 332 if text_obj[start].is_matchable:
333 skills.append({'skill_id': id_+'_lowSurf',
334 'doc_node_value': str(doc[start:end]),
335 'doc_node_id': list(range(start, end)),
336 'type': 'lw_surf'})
338 return skills, text_obj

File ~/anaconda3/envs/skillner/lib/python3.9/site-packages/skillNer/text_class.py:304, in Text.__getitem__(self, index)
277 def __getitem__(
278 self,
279 index: int
280 ) -> Word:
281 """To get the word at the specified position by index
282
283 Parameters
(...)
302 english
303 """
--> 304 return self.list_words[index]

IndexError: list index out of range
```
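Reading the traceback, `start` comes from `matcher(doc)`, so it is a token offset into the spaCy `doc` built inside `get_low_match_skills`, while `text_obj[start]` indexes SkillNER's own `Text.list_words`. If those two token sequences differ in length, `start` can point past the end of `list_words`, which would explain why plain ASCII input still fails. That is an inference from the traceback, not a confirmed diagnosis. The sketch below mimics the loop in pure Python with hypothetical names (`collect_low_form_skills`, `matchable_flags`) and adds a bounds check; it is not SkillNER's code and not the upstream fix.

```python
from typing import Dict, List, Sequence, Tuple

# (skill id, start, end): token offsets into the matcher's tokenization
Match = Tuple[str, int, int]


def collect_low_form_skills(
    matches: Sequence[Match],
    doc_tokens: Sequence[str],
    matchable_flags: Sequence[bool],
) -> List[Dict]:
    """Hypothetical, bounds-checked version of the loop in the traceback.

    doc_tokens stands in for the spaCy doc; matchable_flags stands in for
    [w.is_matchable for w in text_obj.list_words], which may be shorter.
    """
    skills = []
    for skill_id, start, end in matches:
        # Guard: an offset from one tokenization may not exist in the other.
        if start >= len(matchable_flags):
            continue
        if matchable_flags[start]:
            skills.append({
                "skill_id": skill_id + "_lowSurf",
                # approximation of str(doc[start:end]) using plain strings
                "doc_node_value": " ".join(doc_tokens[start:end]),
                "doc_node_id": list(range(start, end)),
                "type": "lw_surf",
            })
    return skills


doc_tokens = ["professional", "wedding", "makeup", "artist"]
flags = [True, True]  # shorter word list than the matcher's doc
print(collect_low_form_skills([("KS1", 0, 1), ("KS2", 2, 4)], doc_tokens, flags))
```

Skipping out-of-range matches trades the crash for silently missing some skills. If the two tokenizations are misaligned, even in-range lookups may consult the wrong word, so keeping both sides on the same token sequence is the more robust fix.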
Facing this issue as well. Did you ever find a solution, @Jibril-Frej?
No real fix. I just wrap the call in a try/except:

```python
try:
    skill_extractor.annotate(target_text)
except (IndexError, ValueError):
    pass
```
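The try/except above also throws away the result of every call that succeeds. Here is a minimal sketch of a wrapper that keeps successful annotations and logs the failures instead. `annotate_safely` is a hypothetical helper. It accepts any callable, such as `skill_extractor.annotate`, and catches only the two exceptions named in this thread, so unrelated bugs still surface.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def annotate_safely(annotate: Callable[[str], Any], text: str) -> Optional[Any]:
    """Hypothetical helper: return annotate(text), or None if it hits the known crash."""
    try:
        return annotate(text)
    except (IndexError, ValueError) as exc:
        # Record which input failed so it can be inspected later.
        logger.warning("annotate failed on %r: %s", text[:60], exc)
        return None
```

Usage, assuming `texts` is a list of strings: `results = [annotate_safely(skill_extractor.annotate, t) for t in texts]`, then filter out the `None` entries.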
I am also encountering this error. I would really like to use SkillNER, but this issue is preventing me from doing so.
Hello, I found the solution. I think the package has not been updated. First, find `matcher_class.py` in your package directory and modify the function `get_low_match_skills` to match https://github.com/AnasAito/SkillNER/blob/master/skillNer/matcher_class.py:
add:
complete function:
Alternatively, you can reinstall the package from Git.
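For the reinstall-from-Git route, here is a hedged sketch that builds the pip command in Python. It uses pip's `git+https://` form pointing at the repository linked above. The `--force-reinstall --no-deps` flags are an assumption on my part: they make pip replace the existing copy even if the version number has not changed, and they leave spaCy and the other dependencies untouched on the assumption that they are already installed. `build_reinstall_command` and `reinstall_from_git` are hypothetical names.

```python
import subprocess
import sys
from typing import List

# Repository from the link in the comment above; "git+" is pip's prefix
# for installing directly from a Git repository.
SKILLNER_GIT_URL = "git+https://github.com/AnasAito/SkillNER.git"


def build_reinstall_command(url: str = SKILLNER_GIT_URL) -> List[str]:
    """Return a pip command that reinstalls skillNer from Git.

    Uses the current interpreter (sys.executable -m pip) so the package
    lands in the active environment, e.g. the conda env in the traceback.
    """
    return [sys.executable, "-m", "pip", "install",
            "--force-reinstall", "--no-deps", url]


def reinstall_from_git(url: str = SKILLNER_GIT_URL) -> None:
    # Raises subprocess.CalledProcessError if pip exits with a non-zero status.
    subprocess.check_call(build_reinstall_command(url))
```

Call `reinstall_from_git()` once, then restart the Jupyter kernel so that the updated `matcher_class.py` is imported instead of the copy already loaded in memory.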
Some strings make the annotate function crash:
If you run the code above you should get the following error: