Sentence complete classifier #389

LeonardPuettmannKern · 2023-10-19T17:45:07Z

refinery

Tested by creator on refinery
Tested by reviewer on refinery
Ensured that output of brick conforms with refinery structure (to be checked by reviewer)

API

Tested by creator on localhost:8000/docs
Tested by reviewer on localhost:8000/docs

common code

Common code tested in notebook/ script by creator
Common code tested in notebook/ script by reviewer
Common code contains docstrings and type hints

additional points:

Docstring and README exist
Import statements (in __init__.py)
(If necessary) Added dependency to requirements.txt
(If necessary) Added dependency to issue for refinery env here
Published brick to Strapi CMS (locally)

@FelixKirschKern I'm not sure at all about the logic behind this brick and would love to get your feedback. Classifying whether or not a sentence is complete is not easy, but I think it's alright to check whether a sentence has some features we would expect in a typical sentence (starts with an uppercase character, ends with punctuation, and contains nouns and a verb). This will of course miss some sentences. I also expect that the input will rarely be just one sentence; more often it will be multiple sentences in a text that might be cut off at the end due to chunking. There might be better ways to do the aggregation part, too.

FelixKirschKern · 2023-10-23T08:27:59Z

classifiers/reference_quality/sentence_complete_classifier/README.md

@@ -0,0 +1 @@
+Languages can be very dynamic and complicated. This brick does not actually try to be able to accurately classify all sentences, which would be quite complex. Instead, this brick is meant to check if some characteristics apply that a lot of complete sentences have. These characteristics being: does the sentence starts with an uppercase character, if it ends on a punctuation and if it contains at least two nouns and a verb. The name `starts_with_uppercase_ends_with_punctuation_and_contains_two_nouns_and_a_verb` would be a bit long for a brick, though.


From the name and description of the brick, I would not expect the aggregation logic.
I suggest mentioning it in the README
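To make the README note concrete, here is a minimal sketch of what a per-text aggregation over per-sentence labels could look like. The actual aggregation in this PR is not shown in this thread, so the function name `aggregate_labels`, the all/none/some rule, and the exact label string `"partly complete"` are assumptions for illustration only.

```python
from typing import List


def aggregate_labels(labels: List[str]) -> str:
    """Collapse per-sentence labels into one label for the whole text.

    Assumed rule (not taken from the PR): every sentence complete
    -> "complete", none complete -> "incomplete", anything in
    between -> "partly complete".
    """
    if not labels:
        # No sentences found: treat the text as incomplete (assumption).
        return "incomplete"
    n_complete = sum(label == "complete" for label in labels)
    if n_complete == len(labels):
        return "complete"
    if n_complete == 0:
        return "incomplete"
    return "partly complete"
```

A text whose last sentence was cut off by chunking would come out as `"partly complete"` under this rule, which is the kind of behavior a reader would not guess from the brick name alone.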

FelixKirschKern · 2023-10-23T08:55:19Z

classifiers/reference_quality/sentence_complete_classifier/config.py

+        cognition_init_mapping = {
+            "incomplete": "Needs fix",
+            "complete": "null"
+        },


The mapping for `partly complete` is missing.

FelixKirschKern · 2023-10-23T08:57:51Z

classifiers/reference_quality/sentence_complete_classifier/config.py

+            "incomplete": "Needs fix",
+            "complete": "null"
+        },
+        integrator_inputs={


outputs are missing

FelixKirschKern · 2023-10-23T09:23:02Z

classifiers/reference_quality/sentence_complete_classifier/__init__.py

+    classifications = []
+    for sent in doc.sents:
+        if sent[0].is_title and sent[-1].is_punct:
+            has_noun = 2
+            has_verb = 1
+            for token in sent:
+                if token.pos_ in ["NOUN", "PROPN", "PRON"]:
+                    has_noun -= 1
+                elif token.pos_ == "VERB":
+                    has_verb -= 1
+            if has_noun < 1 and has_verb < 1:
+                classifications.append("complete")
+            else:
+                classifications.append("incomplete")
+        else:
+            classifications.append("incomplete")


What do you think of the following restructuring of the code?

    classifications = []
    for sent in doc.sents:
        if not (sent[0].is_title and sent[-1].is_punct):
            classifications.append("incomplete")
            continue
        has_noun = 2
        has_verb = 1
        for token in sent:
            if token.pos_ in ["NOUN", "PROPN", "PRON"]:
                has_noun -= 1
            elif token.pos_ == "VERB":
                has_verb -= 1
        if has_noun < 1 and has_verb < 1:
            classifications.append("complete")
            continue
        classifications.append("incomplete")

Another option would be to encapsulate the per-sentence classification in a function and call it via a list comprehension.
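A rough sketch of that second option, assuming the same checks as in the diff (title-cased first token, punctuation as last token, at least two noun-like tokens and one verb). The names `classify_sentence`, `classify_sentences`, and the `Tok` stand-in are hypothetical; `Tok` only mimics the three spaCy token attributes used here (`pos_`, `is_title`, `is_punct`) so the example runs without a model.

```python
from dataclasses import dataclass
from typing import List, Sequence

NOUN_TAGS = {"NOUN", "PROPN", "PRON"}


@dataclass
class Tok:
    """Hypothetical stand-in for a spaCy token (illustration only)."""
    text: str
    pos_: str
    is_title: bool = False
    is_punct: bool = False


def classify_sentence(sent: Sequence) -> str:
    """Label one sentence using the heuristic from the diff."""
    if len(sent) == 0:
        return "incomplete"
    # Early exit: must start title-cased and end with punctuation.
    if not (sent[0].is_title and sent[-1].is_punct):
        return "incomplete"
    n_nouns = sum(1 for token in sent if token.pos_ in NOUN_TAGS)
    n_verbs = sum(1 for token in sent if token.pos_ == "VERB")
    return "complete" if n_nouns >= 2 and n_verbs >= 1 else "incomplete"


def classify_sentences(sents) -> List[str]:
    """Apply the per-sentence check via a list comprehension."""
    return [classify_sentence(sent) for sent in sents]


full = [Tok("She", "PRON", is_title=True), Tok("reads", "VERB"),
        Tok("books", "NOUN"), Tok(".", "PUNCT", is_punct=True)]
cut_off = [Tok("She", "PRON", is_title=True), Tok("reads", "VERB")]
labels = classify_sentences([full, cut_off])
```

In the brick itself the call would presumably be `classify_sentences(doc.sents)`, since spaCy sentence spans support `len()` and indexing in the same way.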

LeonardPuettmannKern added 2 commits October 19, 2023 17:08

Inital version for sentence complete clf

485be62

Added some aggregation logic

935a27f

LeonardPuettmannKern requested a review from FelixKirschKern October 19, 2023 17:45

Fixed typo

511c6c5

FelixKirschKern marked this pull request as ready for review October 23, 2023 08:24

FelixKirschKern reviewed Oct 23, 2023

solves merge conflicts

a9db596
