[DX] Create objects from other objects #1177

Open
CShorten opened this issue Jul 16, 2024 · 2 comments

Comments

@CShorten
Member

What

Say one Weaviate collection has a List[str] property such as chunks, or a JSON property. We want an API that populates another collection with one object per string value, optionally inheriting other properties from the source collection as well.

Why

We believe one of the killer use cases of GFLs is having an LLM split long documents such as PDFs into chunks with metadata descriptions, so we have a JSON property that stores the list of chunk and metadata strings per entry. It would be great to have an API that flows this from, say, "WeaviateBlogPosts" --> "WeaviateBlogChunks".

How

weaviate_blog_posts.data.transfer(
  to_collection="WeaviateBlogChunks",
  split_properties="ChunkAndMetadataJSON",
  inherit_properties=["title", "author", "date_published"],
  add_cref=True,
  uuids=uuids
)

Assuming ChunkAndMetadataJSON is populated with a GFL such as:

weaviate_blog_posts.data.gfl.update(
  instruction="Please break up this markdown file into semantic chunks with metadata further describing their context in the original document",
  view_properties=["content"],
  on_property=["ChunkAndMetadataJSON"],
  uuids=uuids
)

^ We still need to figure out how to expose composite types like this to the GFL. Alternatively, this could be a List[ChunkWithMetadata] type.
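
To make the intended semantics of the proposed transfer call a bit more concrete, here is a minimal pure-Python sketch of the splitting step. It does not call the Weaviate client at all. Everything in it is an assumption for illustration: the split_objects helper, the ChunkWithMetadata fields (chunk, metadata), the plain-dict object shape, the chunk_index property, the fromPost reference name, and the sample post data are all hypothetical, not an existing or agreed API.

```python
import json
import uuid
from dataclasses import dataclass
from typing import Any


@dataclass
class ChunkWithMetadata:
    """Hypothetical composite type: one chunk plus a description of its context."""
    chunk: str
    metadata: str


def split_objects(
    source_objects: list[dict[str, Any]],
    split_property: str,
    inherit_properties: list[str],
    add_cref: bool = True,
) -> list[dict[str, Any]]:
    """Build one target object per entry found in `split_property`.

    Each source object is a plain dict with "uuid" and "properties" keys,
    standing in for whatever the client would actually return.
    """
    targets = []
    for obj in source_objects:
        props = obj["properties"]
        raw = props[split_property]
        # Accept either a JSON string or an already-decoded list of dicts.
        entries = json.loads(raw) if isinstance(raw, str) else raw
        inherited = {name: props[name] for name in inherit_properties}
        for index, entry in enumerate(entries):
            item = ChunkWithMetadata(**entry)
            target = {
                # Deterministic ID derived from the source UUID and chunk position,
                # so re-running the split yields the same IDs.
                "uuid": str(uuid.uuid5(uuid.UUID(obj["uuid"]), str(index))),
                "properties": {
                    **inherited,
                    "chunk": item.chunk,
                    "metadata": item.metadata,
                    "chunk_index": index,
                },
            }
            if add_cref:
                # Assumed reference name pointing back at the source object.
                target["references"] = {"fromPost": obj["uuid"]}
            targets.append(target)
    return targets


# Made-up sample data, only to exercise the sketch.
post = {
    "uuid": "12345678-1234-5678-1234-567812345678",
    "properties": {
        "title": "Example post",
        "author": "Jane Doe",
        "date_published": "2024-07-16",
        "ChunkAndMetadataJSON": json.dumps([
            {"chunk": "First chunk.", "metadata": "Introduction"},
            {"chunk": "Second chunk.", "metadata": "Conclusion"},
        ]),
    },
}
chunks = split_objects(
    [post],
    split_property="ChunkAndMetadataJSON",
    inherit_properties=["title", "author", "date_published"],
)
```

Deriving each chunk's UUID from the source UUID and chunk index is one possible design choice: it would keep a repeated transfer idempotent, at the cost of tying chunk identity to its position in the list.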

@tsmith023
Contributor

@jfrancoa also brought this use case to me, since migrating collections either within or between instances is a frequent user journey. Developing this during the next sprint would be a good idea, I think!

@CShorten
Member Author

Awesome!! Super happy to hear it, thanks @tsmith023!


2 participants