Commit

Update tokenization.py
KINGNEWBLUSH authored Mar 10, 2024
1 parent dbe8936 commit 8fd96b2
Showing 1 changed file with 2 additions and 2 deletions.
4 changes: 2 additions & 2 deletions EduNLP/SIF/tokenization/text/tokenization.py
@@ -77,7 +77,7 @@ def tokenize(text,
                 token for token in word_tokenize(text)
                 if token not in stopwords and token.strip()
             ]
-        except:
+        except OSError:
             nltk.download('punkt')
             return [
                 token for token in word_tokenize(text)
@@ -101,7 +101,7 @@ def tokenize(text,
             huggingface_tokenizer.models.BPE())
         try:
             tokenizer.load(bpe_json, pretty=True)
-        except:
+        except OSError:
             if (bpe_trainfile is None):
                 raise OSError("bpe train file not found, using %s." %
                               bpe_trainfile)
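
Note on the narrowed handlers: replacing a bare `except:` with a specific exception type only preserves the download-and-retry fallback if the guarded call actually raises that type. NLTK reports a missing resource such as `punkt` with `LookupError`, which is not a subclass of `OSError`, so it is worth confirming which exception `word_tokenize` raises in the installed version. The sketch below illustrates the pattern with a stand-in tokenizer; `FakeTokenizer`, `tokenize_with_fallback`, and `retry_on` are hypothetical names for illustration and are not part of EduNLP or NLTK.

```python
def tokenize_with_fallback(text, tokenizer, fetch_resource,
                           retry_on=(LookupError, OSError)):
    """Tokenize text; on a missing-resource error, fetch the resource and retry once.

    `retry_on` lists the exception types treated as "resource missing".
    Any other exception propagates to the caller unchanged.
    """
    try:
        return [tok for tok in tokenizer(text) if tok.strip()]
    except retry_on:
        fetch_resource()
        return [tok for tok in tokenizer(text) if tok.strip()]


class FakeTokenizer:
    """Hypothetical stand-in: raises LookupError until its resource is installed."""

    def __init__(self):
        self.ready = False

    def install(self):
        # Plays the role of a download step such as nltk.download('punkt').
        self.ready = True

    def __call__(self, text):
        if not self.ready:
            raise LookupError("resource not found")
        return text.split()


if __name__ == "__main__":
    tok = FakeTokenizer()
    print(tokenize_with_fallback("a b c", tok, tok.install))
```

Passing `retry_on=(OSError,)` to the same function lets the `LookupError` from the stand-in escape without ever calling `fetch_resource`, which is the failure mode to check for after a change like this one.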