
Commit

Extract sentence splitting in SemanticChunker into a private method (#30)

This change lets users override how SemanticChunker splits text into
sentences, so they can plug in their own sentence-splitting algorithm.
Because the splitting logic itself is unchanged, the existing unit tests
still pass.
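Because `split_text` now gets its sentences through `_get_single_sentences_list`, a subclass can replace the splitter by overriding that one method. The sketch below is self-contained so it runs without langchain_experimental installed. `SentenceSplittingChunker` is a hypothetical stand-in that carries only the hook and the `sentence_split_regex` attribute seen in the diff. The default pattern `(?<=[.?!])\s+` is an assumption about the library's default, and the abbreviation list is invented for illustration. Against the real library you would subclass `SemanticChunker` and override `_get_single_sentences_list` in the same way.

```python
import re
from typing import List


class SentenceSplittingChunker:
    """Hypothetical stand-in for SemanticChunker: only the sentence-splitting hook."""

    def __init__(self, sentence_split_regex: str = r"(?<=[.?!])\s+") -> None:
        # Assumed default: split on whitespace that follows '.', '?' or '!'.
        self.sentence_split_regex = sentence_split_regex

    def _get_single_sentences_list(self, text: str) -> List[str]:
        return re.split(self.sentence_split_regex, text)


class AbbreviationAwareChunker(SentenceSplittingChunker):
    """Overrides the hook so common abbreviations do not end a sentence."""

    # Hypothetical, deliberately tiny abbreviation list for illustration.
    ABBREVIATIONS = ("e.g.", "i.e.", "Dr.", "Mr.", "Ms.")

    def _get_single_sentences_list(self, text: str) -> List[str]:
        pieces = super()._get_single_sentences_list(text)
        sentences: List[str] = []
        for piece in pieces:
            # Glue this piece onto the previous one if that one ended in an abbreviation.
            if sentences and sentences[-1].endswith(self.ABBREVIATIONS):
                sentences[-1] = f"{sentences[-1]} {piece}"
            else:
                sentences.append(piece)
        return sentences


if __name__ == "__main__":
    chunker = AbbreviationAwareChunker()
    print(chunker._get_single_sentences_list("Dr. Smith arrived. He left."))
    # ['Dr. Smith arrived.', 'He left.']
```

Rejoined pieces are separated by a single space, so the original whitespace between them is not preserved. The leading underscore marks the hook as private, so an override like this depends on an internal name that could change between releases.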

Implements issue #29.

Co-authored-by: levara <[email protected]>
Levara and levara authored Dec 23, 2024
1 parent dce0640 commit b3172d8
Showing 1 changed file with 4 additions and 1 deletion.
libs/experimental/langchain_experimental/text_splitter.py (5 changes: 4 additions, 1 deletion)
@@ -208,12 +208,15 @@ def _calculate_sentence_distances(

         return calculate_cosine_distances(sentences)

+    def _get_single_sentences_list(self, text: str) -> List[str]:
+        return re.split(self.sentence_split_regex, text)
+
     def split_text(
         self,
         text: str,
     ) -> List[str]:
         # Splitting the essay (by default on '.', '?', and '!')
-        single_sentences_list = re.split(self.sentence_split_regex, text)
+        single_sentences_list = self._get_single_sentences_list(text)

         # having len(single_sentences_list) == 1 would cause the following
         # np.percentile to fail.
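The change is a plain extract-method refactor: the default `_get_single_sentences_list` runs the same `re.split` call that `split_text` previously made inline, which is why the existing tests are unaffected. The self-contained check below compares the two shapes on a few inputs. `BeforeCommit`, `AfterCommit` and `first_step` are hypothetical stand-ins that cover only the splitting step (the real `split_text` goes on to compute sentence distances), and the default regex is an assumption.

```python
import re
from typing import List

# Assumed default, matching the "by default on '.', '?', and '!'" comment in the diff.
DEFAULT_SENTENCE_SPLIT_REGEX = r"(?<=[.?!])\s+"


class BeforeCommit:
    """Hypothetical stand-in: sentence splitting inlined in split_text."""

    sentence_split_regex = DEFAULT_SENTENCE_SPLIT_REGEX

    def first_step(self, text: str) -> List[str]:
        return re.split(self.sentence_split_regex, text)


class AfterCommit:
    """Hypothetical stand-in: splitting extracted into an overridable hook."""

    sentence_split_regex = DEFAULT_SENTENCE_SPLIT_REGEX

    def _get_single_sentences_list(self, text: str) -> List[str]:
        return re.split(self.sentence_split_regex, text)

    def first_step(self, text: str) -> List[str]:
        # split_text now delegates to the hook instead of calling re.split directly.
        return self._get_single_sentences_list(text)


samples = [
    "One sentence only.",
    "First point. Second point? Third point!",
    "No terminal punctuation here",
]
for sample in samples:
    # Same input, same output: the refactor does not change default behaviour.
    assert BeforeCommit().first_step(sample) == AfterCommit().first_step(sample)

print(AfterCommit().first_step("One sentence only."))  # ['One sentence only.']
```

The one-sentence input yields a single-element list, which is exactly the case the comment at the end of the hunk flags as one that would make the later `np.percentile` call fail.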
