Key error while preprocessing a text using AraBERT #186
ReemAlJunaid asked this question in Q&A (unanswered, 2 comments)
I even tried deleting that row, but the problem still occurs for the text at index 1278.
I realized that the text with index 1278 is not part of the training set, but the problem occurs while training the model on the training set: training stops at that index. Here is my complete code as well. Why does the trainer look up an index that belongs to the test data?
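One possible explanation, sketched under assumptions because the full training code is not shown here: if `train_HARD` was produced by splitting a larger DataFrame, it keeps the original row labels, so a label such as 1278 can exist only in the test split. A dataset wrapper that walks positions 0..n-1 but reads rows with label-based indexing would then fail at exactly that number. The toy frame, the `text` column, the every-third-row split, and the `fetch_all` helper below are hypothetical stand-ins, not the original code.

```python
import pandas as pd

# Hypothetical stand-in for the full dataset (row labels 0..9).
full = pd.DataFrame({"text": [f"question {i}" for i in range(10)]})

# Assumed split: every third row goes to the test set; original labels are kept.
test_df = full.iloc[::3]             # labels 0, 3, 6, 9
train_df = full.drop(test_df.index)  # labels 1, 2, 4, 5, 7, 8

def fetch_all(df):
    """Mimic a dataset wrapper that reads rows 0..len(df)-1 via label lookup."""
    return [df["text"][i] for i in range(len(df))]

try:
    fetch_all(train_df)
    failed_label = None
except KeyError as err:
    failed_label = err.args[0]  # first label that is missing from train_df

# Renumbering the rows 0..n-1 makes labels and positions agree again.
train_fixed = train_df.reset_index(drop=True)
texts = fetch_all(train_fixed)
```

In this sketch the lookup fails on a label that lives in the test split, which matches the symptom described above. Calling `reset_index(drop=True)` on both splits right after splitting (or reading rows with `.iloc`) is one way to avoid it.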
I have several questions and I tried to preprocess them using AraBERT, but I get an error and I don't know what could be causing it.
This is my code to preprocess all text in the data column:
arabert_prep = ArabertPreprocessor(model_name.split("/")[-1])
train_HARD[DATA_COLUMN] = train_HARD[DATA_COLUMN].apply(lambda x: arabert_prep.preprocess(x))
test_HARD[DATA_COLUMN] = test_HARD[DATA_COLUMN].apply(lambda x: arabert_prep.preprocess(x))
I tried to print the value for the following indexes:
train_HARD[DATA_COLUMN][1277]
This prints 'سوره ق من السور المكيه التنزيل الا ايه واحده نزلت بالمدينه فما هي'
train_HARD[DATA_COLUMN][1279]
This prints 'سوره ق من السور المكيه التنزيل الا ايه واحده نزلت بالمدينه فما هي'
train_HARD[DATA_COLUMN][1278]
This one should print: 'سوره يس من السور المكيه التنزيل الا ايه واحده نزلت بالمدينه فما هي'
but it gave me this error:
`
KeyError Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3360 try:
-> 3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
5 frames
/usr/local/lib/python3.7/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
/usr/local/lib/python3.7/dist-packages/pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()
KeyError: 1278
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
in
----> 1 train_HARD[DATA_COLUMN][1278]
/usr/local/lib/python3.7/dist-packages/pandas/core/series.py in __getitem__(self, key)
940
941 elif key_is_scalar:
--> 942 return self._get_value(key)
943
944 if is_hashable(key):
/usr/local/lib/python3.7/dist-packages/pandas/core/series.py in _get_value(self, label, takeable)
1049
1050 # Similar to Index.get_value, but we do not fall back to positional
-> 1051 loc = self.index.get_loc(label)
1052 return self.index._get_values_for_loc(self, loc, label)
1053
/usr/local/lib/python3.7/dist-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
3361 return self._engine.get_loc(casted_key)
3362 except KeyError as err:
-> 3363 raise KeyError(key) from err
3364
3365 if is_scalar(key) and isna(key) and not self.hasnans:
KeyError: 1278
I thought the problem was caused by some characters in the text itself, so I copied the text and preprocessed it separately to see the result:
from arabert.preprocess import ArabertPreprocessor
model_name = 'aubmindlab/bert-base-arabertv02'
arabert_prep = ArabertPreprocessor(model_name.split("/")[-1])
text = "سوره المرسلات من السور المكيه التنزيل الا ايه واحده نزلت بالمدينه فما هي"
print(arabert_prep.preprocess(text))
text2 = "سوره يس من السور المكيه التنزيل الا ايه واحده نزلت بالمدينه فما هي"
print(arabert_prep.preprocess(text2))
`
This prints the following without any error:
سوره المرسلات من السور المكيه التنزيل الا ايه واحده نزلت بالمدينه فما هي
سوره يس من السور المكيه التنزيل الا ايه واحده نزلت بالمدينه فما هي
Any idea how to solve this error?
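A quick way to check whether this is an indexing issue and not a preprocessing one: on an integer-indexed Series, `train_HARD[DATA_COLUMN][1278]` is a label lookup, so it raises `KeyError` whenever no row carries the label 1278, whatever the text contains. The sketch below uses a small hypothetical Series (not the real data) to show the membership check and the positional alternative `.iloc`.

```python
import pandas as pd

# Hypothetical Series whose labels have a gap at 2, like a frame that lost a row.
col = pd.Series(["a", "b", "c", "d"], index=[0, 1, 3, 4])

label = 2
has_label = label in col.index  # membership test on the row labels
by_position = col.iloc[2]       # third row by position, whatever its label is
by_label = col.loc[3]           # the same row, looked up by its label

try:
    col[label]                  # label lookup, like train_HARD[DATA_COLUMN][1278]
    raised = False
except KeyError:
    raised = True
```

Running the same membership test on the real data (`1278 in train_HARD.index`) would show whether the label is simply absent from the training frame.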