Make SchemaScanner Database-Agnostic #2507

Open
tnaum-ms opened this issue Dec 4, 2024 · 0 comments
tnaum-ms (Collaborator) commented Dec 4, 2024

Feature Description: Make SchemaScanner Database-Agnostic

The goal is to extend SchemaScanner so that it works across various NoSQL databases, not just MongoDB. SchemaScanner should be able to analyze and interpret schemas for a broader range of NoSQL databases with minimal dependency on database-specific features or formats. We also aim to explore exposing SchemaScanner as a standalone package and/or repository to increase its usability and adoption across projects.

What is SchemaScanner?

SchemaScanner is a tool designed to analyze a set of JSON documents and compute a unified, non-strict JSON schema from them. Its functionality includes:

  • Field and Datatype Analysis: Identifies fields within documents and infers their data types.
  • Simple Statistics: Provides basic statistics such as field occurrence and data distribution patterns.
  • Schema Insights: Offers a unified view of the structure, which is useful for understanding document schemas, analyzing data, or providing schema validation and document autocompletion in various applications.
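The issue does not describe SchemaScanner's internals. As a rough illustration of the field, datatype, and occurrence analysis listed above, here is a minimal TypeScript sketch that works on plain parsed JSON with no BSON dependency. The names `scanDocuments` and `FieldStats`, and the `[]` path notation for array elements, are hypothetical and not SchemaScanner's actual API.

```typescript
// Hypothetical sketch of the core scan: for every field path, record how many
// documents contain it and which JSON types were observed there.
type JsonType = "string" | "number" | "boolean" | "null" | "object" | "array";

interface FieldStats {
  occurrences: number; // number of documents in which the path appears
  types: Map<JsonType, number>; // how many values of each type were seen at the path
}

// Maps a parsed JSON value to its JSON type name.
function jsonTypeOf(value: unknown): JsonType {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  const t = typeof value;
  if (t === "string" || t === "number" || t === "boolean") return t;
  return "object"; // JSON.parse output has no other value kinds
}

function scanDocuments(docs: Record<string, unknown>[]): Map<string, FieldStats> {
  const stats = new Map<string, FieldStats>();

  const visit = (value: unknown, path: string, seenInDoc: Set<string>): void => {
    let entry = stats.get(path);
    if (!entry) {
      entry = { occurrences: 0, types: new Map<JsonType, number>() };
      stats.set(path, entry);
    }
    // Count each path once per document, even when reached via several array elements.
    if (!seenInDoc.has(path)) {
      seenInDoc.add(path);
      entry.occurrences++;
    }
    const type = jsonTypeOf(value);
    entry.types.set(type, (entry.types.get(type) ?? 0) + 1);

    if (type === "object") {
      for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
        visit(child, `${path}.${key}`, seenInDoc);
      }
    } else if (type === "array") {
      // All array elements share one path segment, "[]", regardless of index.
      for (const element of value as unknown[]) visit(element, `${path}[]`, seenInDoc);
    }
  };

  for (const doc of docs) {
    const seenInDoc = new Set<string>();
    for (const [key, value] of Object.entries(doc)) visit(value, key, seenInDoc);
  }
  return stats;
}

// Usage: two documents that disagree on the type of "age".
const stats = scanDocuments([
  { name: "a", age: 30 },
  { name: "b", age: "unknown", tags: ["x", "y"] },
]);
stats.forEach((s, path) => console.log(path, s.occurrences, Array.from(s.types)));
```

Mixed types at one path (here `age` as both number and string) are kept rather than rejected, which is what makes the resulting view "non-strict".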

Objectives

  • Refactor SchemaScanner to decouple it from MongoDB-specific implementations.
  • Introduce a modular architecture in which database-specific adapters or plugins add support for individual NoSQL databases (e.g., Azure Cosmos DB for NoSQL).
  • Ensure compatibility with common NoSQL data models, such as key-value, document-oriented, and wide-column stores.
  • Potentially expose SchemaScanner as a new package and/or repository for broader adoption and ease of integration.
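One way to read the adapter/plugin objective is a small interface that the database-neutral core consults before falling back to plain JSON types. The sketch below assumes such an interface; `DatabaseAdapter`, `classifyValue`, `isSystemField`, `resolveType`, and both example adapters are hypothetical names, not an existing SchemaScanner API. The MongoDB Extended JSON `$oid`/`$date` wrappers and the Cosmos DB system properties (`_rid`, `_self`, `_etag`, `_attachments`, `_ts`) are real conventions of those databases.

```typescript
// Hypothetical plugin contract: the core knows only generic JSON types;
// each database adapter may refine a value's type or flag database-managed fields.
type JsonType = "string" | "number" | "boolean" | "null" | "object" | "array";

interface DatabaseAdapter {
  readonly name: string;
  // Returns a database-specific type name, or undefined to use the generic JSON type.
  classifyValue(value: unknown): string | undefined;
  // Returns true for top-level fields that the database adds to documents itself.
  isSystemField(fieldName: string): boolean;
}

function jsonTypeOf(value: unknown): JsonType {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  const t = typeof value;
  if (t === "string" || t === "number" || t === "boolean") return t;
  return "object";
}

// The core asks the adapter first and falls back to the database-neutral rules.
function resolveType(value: unknown, adapter: DatabaseAdapter): string {
  return adapter.classifyValue(value) ?? jsonTypeOf(value);
}

// Example adapter for documents serialized as MongoDB Extended JSON,
// where an ObjectId appears as { "$oid": "<hex>" } and a date as { "$date": ... }.
const mongoExtendedJsonAdapter: DatabaseAdapter = {
  name: "mongodb-extended-json",
  classifyValue(value) {
    if (value === null || typeof value !== "object" || Array.isArray(value)) return undefined;
    const keys = Object.keys(value);
    if (keys.length !== 1) return undefined;
    if (keys[0] === "$oid") return "objectId";
    if (keys[0] === "$date") return "date";
    return undefined;
  },
  isSystemField: () => false,
};

// Example adapter for Azure Cosmos DB for NoSQL, which adds system properties to each item.
const cosmosNoSqlSystemFields = new Set(["_rid", "_self", "_etag", "_attachments", "_ts"]);
const cosmosNoSqlAdapter: DatabaseAdapter = {
  name: "cosmosdb-nosql",
  classifyValue: () => undefined,
  isSystemField: (fieldName) => cosmosNoSqlSystemFields.has(fieldName),
};

// Usage: the same value is classified differently depending on the adapter.
const oid = { $oid: "507f1f77bcf86cd799439011" };
console.log(resolveType(oid, mongoExtendedJsonAdapter)); // prints objectId
console.log(resolveType(oid, cosmosNoSqlAdapter)); // prints object
```

Keeping the core limited to JSON types means supporting another database would mean writing a new adapter object rather than changing the scanning logic.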

Scope

  • Replace BSON-specific logic with a generic data parsing mechanism.
  • Design an abstraction layer to handle database-specific differences (e.g., data types, metadata retrieval).
  • Provide default implementations for popular NoSQL databases while keeping the core SchemaScanner flexible for future database support.
  • Explore: Package the SchemaScanner into a reusable and independent tool, with clear documentation and examples.
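A reusable, database-independent package also needs a neutral output format. Assuming that output is an ordinary JSON Schema object, the sketch below merges sample documents into a single schema. It reads "non-strict" as "no `required` list and no closed `additionalProperties`"; that interpretation, the helper names (`inferSchema`, `mergeValue`, `addType`), and the `x-occurrence` keyword are assumptions, not SchemaScanner's documented behavior.

```typescript
// Hypothetical sketch: folds sample documents into one JSON Schema-shaped object.
// No "required" list and no "additionalProperties": false are emitted, so the
// schema describes what was observed without rejecting unseen shapes.
interface Schema {
  type?: string | string[];
  properties?: Record<string, Schema>;
  items?: Schema;
  "x-occurrence"?: number; // hypothetical, non-standard annotation keyword
}

function jsonTypeOf(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string" | "number" | "boolean" | "object" for JSON.parse output
}

// Adds a type to the schema, keeping a single string while only one type has been seen.
function addType(schema: Schema, type: string): void {
  const types: string[] = schema.type === undefined ? [] : ([] as string[]).concat(schema.type);
  if (!types.includes(type)) types.push(type);
  schema.type = types.length === 1 ? types[0] : types;
}

function mergeValue(schema: Schema, value: unknown): void {
  const type = jsonTypeOf(value);
  addType(schema, type);
  if (type === "object") {
    if (!schema.properties) schema.properties = {};
    const properties = schema.properties;
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      let childSchema = properties[key];
      if (!childSchema) {
        childSchema = {};
        properties[key] = childSchema;
      }
      // Number of parent objects (documents or nested objects) that contained this key.
      childSchema["x-occurrence"] = (childSchema["x-occurrence"] ?? 0) + 1;
      mergeValue(childSchema, child);
    }
  } else if (type === "array") {
    if (!schema.items) schema.items = {};
    const items = schema.items;
    for (const element of value as unknown[]) mergeValue(items, element);
  }
}

function inferSchema(docs: unknown[]): Schema {
  const root: Schema = {};
  for (const doc of docs) mergeValue(root, doc);
  return root;
}

// Usage: "id" is a number in one document and a string in the other.
const schema = inferSchema([{ id: 1, name: "a" }, { id: "x-2", tags: ["t"] }]);
console.log(JSON.stringify(schema, null, 2));
```

Because the result is plain JSON Schema, consumers such as validators or editor autocompletion could use it without knowing which database the samples came from.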
@tnaum-ms tnaum-ms self-assigned this Dec 4, 2024
@tnaum-ms tnaum-ms changed the title Mache SchemaScanner Database-Agnostic Make SchemaScanner Database-Agnostic Dec 4, 2024
@tnaum-ms tnaum-ms added this to the Backlog milestone Dec 6, 2024