IndexError: list index out of range #3

Open
danilyef opened this issue Dec 14, 2021 · 5 comments

Comments

@danilyef

Unfortunately, the "index out of range" issue is not fixed:

    246     if only_nouns:
    247         # workaround to prevent unwanted behaviour (only nouns are eligible)
--> 248         results[0] = results[0][0].upper() + results[0][1:]
    249         for ri in range(len(results) - 1):
    250             if results[ri].islower():

Example words: 'Rechtsanwält','Schätzmeister','Ferialjob','Infrasturktur'
dissection = comp_split.dissect(one_of_example_words, ahocs, make_singular=True)
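
The exception message hints at the cause: "list index out of range" on that line suggests that results itself is empty, since an empty string in results[0] would raise "string index out of range" instead. A minimal sketch, independent of the library and assuming results is a plain Python list of strings (capitalize_first and error_message are hypothetical helper names used only for illustration):

```python
def capitalize_first(results):
    """Mirror of the failing line: capitalize the first character of results[0]."""
    results[0] = results[0][0].upper() + results[0][1:]
    return results


def error_message(results):
    """Return the IndexError message raised for the given input, or None."""
    try:
        capitalize_first(results)
    except IndexError as exc:
        return str(exc)
    return None


print(error_message([]))        # list index out of range
print(error_message([""]))      # string index out of range
print(error_message(["haus"]))  # None
```

If that reading is right, dissect presumably ends up with no parts at all for these words, so any fix needs to handle an empty results list before touching results[0].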

@repodiac
Owner

Thanks @danilyef - I will look into it. In any case, you are more than welcome to provide a PR :-)

@PythonJDoe

I'm not sure if this is the right place to post, but I couldn't find a forum for this, so I'm posting here. I'm having a problem working with german_compound_splitter. I have a large list of German texts that I want to split and use for text mining. texts is the list containing the German text I want to split, and the results are stored in another list, text. So I wrote the following code:

text=list()
for i in range(length):
    s=comp_split.merge_fractions(comp_split.dissect(texts[i], ahocs, make_singular=True))
    text.append(s)

But I'm getting the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
/tmp/ipykernel_483/1238906536.py in <module>
      1 text=list()
      2 for i in range(length):
----> 3     s=comp_split.merge_fractions(comp_split.dissect(texts[i], ahocs, make_singular=True))
      4     text.append(s)

/opt/conda/lib/python3.9/site-packages/german_compound_splitter/comp_split.py in dissect(compound, ahocs, only_nouns, make_singular, mask_unknown)
    246     if only_nouns:
    247         # workaround to prevent unwanted behaviour (only nouns are eligible)
--> 248         results[0] = results[0][0].upper() + results[0][1:]
    249         for ri in range(len(results) - 1):
    250             if results[ri].islower():

IndexError: list index out of range

Can you please guide me to resolve the error?

@repodiac
Owner

Hi @PythonJDoe - thanks for your inquiry. Sorry to hear you ran into this error. My time is limited at the moment, but I will look into it and get back to you asap. It seems to be the same or a similar error, and you are the second to mention it - so it should be addressed, I agree.
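
Until the splitter itself is fixed, a caller-side stopgap might be to catch the IndexError per item, so that one failing word does not abort the whole loop. This is only a sketch: split_all, split_one and fake_split are hypothetical names, and the commented usage assumes the same comp_split.dissect / comp_split.merge_fractions calls shown in the loop above.

```python
def split_all(texts, split_one):
    """Apply split_one to each text; collect inputs that raise IndexError."""
    results, failed = [], []
    for text in texts:
        try:
            results.append(split_one(text))
        except IndexError:
            failed.append(text)  # keep the input so it can be inspected later
    return results, failed


# Hypothetical usage with the library (not executed here):
# split_one = lambda t: comp_split.merge_fractions(
#     comp_split.dissect(t, ahocs, make_singular=True))
# text, failed = split_all(texts, split_one)


# Stand-in splitter so the sketch runs without the library installed.
def fake_split(word):
    if word == "bad":
        raise IndexError("list index out of range")
    return [word.title()]


print(split_all(["haus", "bad", "tür"], fake_split))
```

The failed list then doubles as a collection of reproducible inputs for this issue.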

@emphasize

emphasize commented Jul 15, 2022

You shouldn't remove list items inside a for loop. This should solve it:

# Blanks out the list entry (if necessary) and removes it afterwards.
# For single letters, searches backwards for a non-empty entry and appends the letter to it.
# str.title() capitalizes the first letter of each word and lowercases the rest.

    if only_nouns and results:
        results[0] = results[0].title()
        for ri in range(len(results) - 1):
            if results[ri].islower():
                merged = results[ri] + results[ri + 1].lower()
                if ahocs.exists(merged):   # does ahocs.exists() disregard capitalization?
                    results[ri] = merged.title()
                    results[ri + 1] = ""
                elif len(results[ri]) == 1:
                    artifact_single_letter = results[ri]
                    for i in range(1, ri + 1):
                        if results[ri - i]:
                            results[ri - i] += artifact_single_letter
                            break
                    results[ri] = ""

    results = list(filter(None, results))
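
The "blank out, then filter" idea can be tried in isolation. The following is a simplified sketch of just the single-letter handling, not the library's code: fold_single_letters is a hypothetical name, and the dictionary lookup via ahocs is left out entirely.

```python
def fold_single_letters(parts):
    """Append single-letter entries to the nearest preceding non-empty entry."""
    parts = list(parts)  # work on a copy; the list length never changes in the loop
    for i, part in enumerate(parts):
        if len(part) == 1:
            # search backwards for a non-empty entry to attach the letter to
            for j in range(i - 1, -1, -1):
                if parts[j]:
                    parts[j] += part
                    break
            parts[i] = ""  # mark for removal instead of deleting in place
    # drop the blanked entries in a single pass after the loop
    return [p for p in parts if p]


print(fold_single_letters(["Arbeit", "s", "Platz"]))  # ['Arbeits', 'Platz']
print(fold_single_letters([]))                        # []
```

Because entries are only blanked during iteration and removed afterwards, the loop indices stay valid, and an empty input simply yields an empty list.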

@repodiac
Owner

Thanks @emphasize - I appreciate your efforts. I haven't had the time yet to look further into this issue, I am sorry. I'll try to check it asap.
