Implement bias detection system with detailed analysis and reporting #49
Conversation
…ation

- Add BiasDetector class for analyzing various cognitive biases in text
- Implement async document analysis with chunking support
- Add confidence scoring using embedding similarity
- Support multiple bias types: confirmation, stereotypical, ingroup-outgroup, anchoring, and availability
- Include comprehensive error handling and logging
- Remove example narrative stories file

Technical details:
- Uses Ollama for LLM integration and embeddings
- Implements cosine similarity for confidence scoring
- Supports async processing of large documents
- Preserves semantic boundaries in text chunking
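As a rough illustration of the boundary-preserving chunking described above, a minimal sketch that splits on paragraphs first and packs them into size-bounded chunks (the function name and the 1000-character budget are illustrative, not the PR's actual `_split_text`):

```python
def split_text(text: str, chunk_size: int = 1000) -> list[str]:
    chunks: list[str] = []
    current = ""
    for paragraph in text.split("\n\n"):
        # Start a new chunk rather than splitting a paragraph mid-thought.
        if current and len(current) + len(paragraph) > chunk_size:
            chunks.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        chunks.append(current.strip())
    return [chunk for chunk in chunks if chunk.strip()]
```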
Reviewer's Guide by Sourcery

This pull request introduces a bias detection system and analysis of political speech, focusing on identifying and analyzing different types of cognitive biases (confirmation, stereotypical, ingroup-outgroup, anchoring, and availability bias). The implementation includes a Python-based bias detector class that uses language models for analysis, along with documentation and example analysis reports.

Class diagram for Bias Detection System

```mermaid
classDiagram
    class BiasType {
        <<enumeration>>
        CONFIRMATION
        STEREOTYPICAL
        INGROUP_OUTGROUP
        ANCHORING
        AVAILABILITY
    }
    class BiasDetectionResult {
        BiasType bias_type
        float confidence
        string explanation
        List~string~ affected_segments
    }
    class BiasDetector {
        -string model_name
        -string embeddings
        -Dict~BiasType, string~ prompts
        +BiasDetector(string model_name, string embeddings_model_name)
        +Dict~BiasType, string~ _load_bias_prompts()
        +List~float~ get_embedding(string text)
        +List~BiasDetectionResult~ detect_bias(string text, List~BiasType~ bias_types)
        +float _calculate_confidence(List~float~ text_embedding, string explanation)
        +Dict~BiasType, List~BiasDetectionResult~~ analyze_document(Path file_path)
        +List~string~ _split_text(string text, int chunk_size)
        +void save_analysis_report(Dict~BiasType, List~BiasDetectionResult~~ results, Path output_path)
    }
    BiasDetectionResult --> BiasType
    BiasDetector --> BiasDetectionResult
    BiasDetector --> BiasType
```
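For readers skimming the diagram, a minimal Python rendering of the two data types, assuming standard `enum`/`dataclass` definitions (the string values are illustrative; the actual module may differ):

```python
from dataclasses import dataclass
from enum import Enum
from typing import List

class BiasType(Enum):
    CONFIRMATION = "confirmation"
    STEREOTYPICAL = "stereotypical"
    INGROUP_OUTGROUP = "ingroup_outgroup"
    ANCHORING = "anchoring"
    AVAILABILITY = "availability"

@dataclass
class BiasDetectionResult:
    bias_type: BiasType
    confidence: float  # 0.0-1.0, derived from embedding similarity
    explanation: str
    affected_segments: List[str]
```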
File-Level Changes
Hey @leonvanbokhorst - I've reviewed your changes - here's some feedback:
Overall Comments:
- Consider adding more robust error handling and validation around the model calls and JSON parsing to handle potential API failures gracefully.
- The text chunking could be improved by using a proper NLP tokenizer instead of basic string splitting to better handle complex sentence structures.
- The confidence calculation using cosine similarity is quite basic - consider implementing more sophisticated metrics or ensemble methods for bias detection confidence scoring.
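For reference, the scoring being critiqued reduces to a plain cosine similarity between the text embedding and the explanation embedding; a minimal sketch of that approach (not the PR's exact code):

```python
import numpy as np

def cosine_confidence(text_emb: list[float], explanation_emb: list[float]) -> float:
    # Cosine similarity clamped to [0, 1]; this is the "quite basic" metric
    # the reviewer refers to.
    a, b = np.asarray(text_emb), np.asarray(explanation_emb)
    denom = float(np.linalg.norm(a) * np.linalg.norm(b))
    return max(0.0, float(a @ b) / denom) if denom else 0.0
```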
Here's what I looked at during the review
- 🟡 General issues: 3 issues found
- 🟢 Security: all looks good
- 🟢 Testing: all looks good
- 🟡 Complexity: 1 issue found
- 🟡 Documentation: 1 issue found
```python
        response = ollama.embeddings(model=self.model_name, prompt=text)
        return response["embedding"]

    async def detect_bias(
```
issue (performance): The method is marked async but makes blocking calls to ollama.generate and ollama.embeddings
Consider using async versions of these calls to prevent blocking the event loop. This is especially important when processing multiple chunks of text.
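One low-risk fix using only the standard library is to push the blocking SDK call onto a worker thread (a sketch; recent versions of the `ollama` package also ship an `AsyncClient` if a native async API is preferred):

```python
import asyncio
import ollama

async def get_embedding(self, text: str) -> list[float]:
    # Run the blocking call in a thread so the event loop stays free
    # while chunks are processed concurrently.
    response = await asyncio.to_thread(
        ollama.embeddings, model=self.model_name, prompt=text
    )
    return response["embedding"]
```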
```python
        return results

    def _calculate_confidence(
```
suggestion (performance): Multiple separate embedding calls could be optimized by batching or caching
Consider caching the embeddings or batching the calls to reduce API usage and improve performance, especially when processing multiple chunks of text.
```python
from functools import lru_cache
from typing import Tuple

@lru_cache(maxsize=1024)
def _calculate_confidence(
    self, text_embedding: Tuple[float, ...], explanation: str
) -> float:
    # Note: the embedding must be a hashable Tuple (not a List) for lru_cache.
```
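If caching at the embedding layer is preferred instead (avoiding `lru_cache`'s hashability requirement and its habit of pinning `self`), a plain dict keyed by text is a minimal alternative; a sketch with a hypothetical helper class:

```python
from typing import Dict, List

import ollama

class EmbeddingCache:
    """Memoize embeddings so repeated chunks skip the API call."""

    def __init__(self, model_name: str):
        self.model_name = model_name
        self._cache: Dict[str, List[float]] = {}

    def get(self, text: str) -> List[float]:
        if text not in self._cache:
            response = ollama.embeddings(model=self.model_name, prompt=text)
            self._cache[text] = response["embedding"]
        return self._cache[text]
```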
```python
        for paragraph in paragraphs:
            # Split paragraph into sentences (basic splitting)
            sentences = [
```
issue: Basic sentence splitting might miss common abbreviations
Consider using a proper sentence tokenizer (like nltk.sent_tokenize) to handle cases with abbreviations like 'Mr.', 'Dr.', etc.
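A sketch of the suggested replacement, assuming NLTK is added as a dependency (it requires a one-time `nltk.download("punkt")` to fetch the tokenizer model):

```python
import nltk

# nltk.download("punkt")  # one-time model download

def split_into_sentences(paragraph: str) -> list[str]:
    # Punkt handles abbreviations such as "Mr." and "Dr." correctly.
    return nltk.sent_tokenize(paragraph)
```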
```markdown
- Schoof's response directly contradicts this assumption by stating that antisemitism is an issue for all of Netherlands and needs to be discussed across various sectors, including conversations with Jewish organizations.
- The author does not seem to acknowledge or address Schoof's answer which goes against their preconceived notion.

### Instance (Confidence: 0.70)
```
suggestion (documentation): Consider adding explanation of confidence scores
The document uses confidence scores throughout but never explains what they mean or how they're calculated. This context would be valuable for readers.
```diff
- ### Instance (Confidence: 0.70)
+ ### Instance (Confidence: 0.70 - scores range from 0.0 to 1.0, indicating analysis certainty)
```
```python
        return [chunk for chunk in chunks if chunk.strip()]  # Remove empty chunks

    def save_analysis_report(
```
issue (complexity): Consider extracting report generation functionality into a separate class with dedicated responsibilities.
The report generation logic should be extracted to a separate class to improve maintainability and reusability. This would also make the BiasDetector class more focused on its core responsibility. Here's a suggested refactor:
```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List

import numpy as np


@dataclass
class BiasAnalysisReport:
    results: Dict[BiasType, List[BiasDetectionResult]]


class BiasReportGenerator:
    def generate_markdown(self, analysis: BiasAnalysisReport, output_path: Path) -> None:
        total_instances = sum(
            len(bias_results) for bias_results in analysis.results.values()
        )
        avg_confidences = {
            bias_type: np.mean([r.confidence for r in bias_results])
            for bias_type, bias_results in analysis.results.items()
        }
        # Generate report content (existing logic)
        report = self._generate_report_content(
            total_instances, avg_confidences, analysis.results
        )
        # Save report
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text("\n".join(report))


class BiasDetector:
    def save_analysis_report(
        self, results: Dict[BiasType, List[BiasDetectionResult]], output_path: Path
    ) -> None:
        report = BiasAnalysisReport(results=results)
        generator = BiasReportGenerator()
        generator.generate_markdown(report, output_path)
```
This change:
- Encapsulates report generation in a dedicated class
- Makes it easier to add new report formats
- Simplifies testing of report generation
- Reduces the responsibilities of BiasDetector
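A hypothetical usage after the refactor (method names taken from the class diagram above; the model names and file paths are illustrative, and `analyze_document` is awaited as the async design implies):

```python
detector = BiasDetector(model_name="llama3", embeddings_model_name="nomic-embed-text")
results = await detector.analyze_document(Path("data/speech.md"))
detector.save_analysis_report(results, Path("reports/speech_bias.md"))
```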
```python
        try:
            # Get embedding for the explanation
            explanation_embedding = self.get_embedding(explanation)
```
issue (code-quality): Extract code out into method (extract-method)
Summary by Sourcery
Implement a bias detection system to analyze text for cognitive biases and generate detailed reports. Document the findings in comprehensive and mini-analysis reports, and provide educational material on bias types and their impacts.
New Features:
Documentation: