
Releases: enjalot/latent-scope

SAE Features

20 Dec 16:09

If you've embedded a dataset with nomic-embed-text-v1.5, you can choose "process SAE" in the embed step.
This annotates each row with SAE (sparse autoencoder) features from https://enjalot.github.io/latent-taxonomy/articles/about
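
A minimal sketch of what annotating a row with SAE features could look like, assuming a top-k sparse autoencoder. The embedding is centered by a decoder bias, projected into a much wider feature space, and passed through a ReLU, and only the k strongest activations are kept. The function name, weights, and sizes are hypothetical placeholders. This is not the trained latent-taxonomy model or latent-scope's own code.

```python
from typing import List, Tuple


def sae_top_k(
    embedding: List[float],
    w_enc: List[List[float]],  # hypothetical: one weight row per SAE feature
    b_enc: List[float],        # hypothetical: per-feature encoder bias
    b_dec: List[float],        # hypothetical: decoder bias, subtracted before encoding
    k: int,
) -> List[Tuple[int, float]]:
    """Return (feature_index, activation) pairs for the k strongest features."""
    centered = [x - b for x, b in zip(embedding, b_dec)]
    acts = []
    for idx, (row, bias) in enumerate(zip(w_enc, b_enc)):
        pre = sum(w * x for w, x in zip(row, centered)) + bias
        acts.append((idx, max(0.0, pre)))  # ReLU
    # Keep only the k largest activations, dropping any that are zero
    acts.sort(key=lambda pair: pair[1], reverse=True)
    return [(i, a) for i, a in acts[:k] if a > 0.0]
```

Storing only the (feature, activation) pairs keeps each row's annotation small, because most of the wide feature space is zero for any given row.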

You can then explore the concepts the embedding model uses to represent each data point.
You can also filter by a particular SAE feature to see which rows strongly activate it.
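
Filtering by a feature amounts to scanning each row's stored activations and keeping the rows where the chosen feature fires above some threshold, strongest first. The sketch below assumes a row layout of {feature_index: activation} per row and a threshold parameter. Both are assumptions for illustration, not latent-scope's actual storage format.

```python
from typing import Dict, List, Tuple


def rows_for_feature(
    row_features: List[Dict[int, float]],  # hypothetical: per-row {feature_index: activation}
    feature: int,
    threshold: float = 0.0,
) -> List[Tuple[int, float]]:
    """Return (row_index, activation) for rows where `feature` exceeds `threshold`, strongest first."""
    hits = [
        (row, feats[feature])
        for row, feats in enumerate(row_features)
        if feats.get(feature, 0.0) > threshold
    ]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)
```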

0.5: UI Revamp

20 Dec 15:58

Numerous updates to the Explore and Setup pages, with many contributions from @jzhang621

Highlights:

  • Explore page redesign with better filtering UX and more screen real estate for the map
  • Step-by-step redesign of the setup process
  • More options for cluster labeling: Hugging Face, Ollama, or custom URLs
  • Initial support for Sparse Autoencoder features

See closed milestone issues:
https://github.com/enjalot/latent-scope/issues?q=is%3Aissue+milestone%3A0.5+is%3Aclosed

More writing on the changes:
https://enjalot.substack.com/p/hidden-states-and-latent-scope-05

Table improvements & embedding visualization

26 Jul 17:38

This release fixes a few bugs in the Explore page table UI and nearest neighbor search, making both much more reliable and performant.

Thank you to @hydrosquall for issues & PRs! #49 #50 #52

A new experimental feature for visualizing embeddings directly in the table is ready to try:
[Screenshot: embedding visualization in the table]

Use any Sentence Transformer from HuggingFace

23 Jul 19:40

This release adopts Sentence Transformers for embedding, using local open-source models downloaded automatically from the Hugging Face Hub.

It also keeps track of recently used models and brings everything together in a much-improved model selector component on the frontend.

It also includes a PR from @hydrosquall that fixes a bug with truncated embeddings in nearest neighbor search.

One minor note: for now, truncation isn't supported for Sentence Transformers models, because we have no way to tell whether an arbitrary model supports it. We could maintain a separate list of Matryoshka-enabled models.
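
For context on truncated embeddings: a Matryoshka-trained model packs the most important information into the leading dimensions, so its embeddings can be cut to the first `dims` values and re-normalized before comparison. The sketch below is a brute-force cosine nearest neighbor search under that assumption. The function names are hypothetical, and latent-scope's actual search may be implemented differently.

```python
import math
from typing import List, Tuple


def truncate_and_normalize(vec: List[float], dims: int) -> List[float]:
    """Keep the first `dims` values and rescale them to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm > 0 else head


def nearest_neighbors(
    query: List[float],
    corpus: List[List[float]],
    dims: int,
    k: int = 5,
) -> List[Tuple[int, float]]:
    """Brute-force cosine search with the query and corpus truncated to the same `dims`."""
    q = truncate_and_normalize(query, dims)
    scored = []
    for idx, vec in enumerate(corpus):
        v = truncate_and_normalize(vec, dims)
        # Both vectors are unit length, so the dot product is the cosine similarity
        scored.append((idx, sum(a * b for a, b in zip(q, v))))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Whatever the model, the query has to be truncated to the same number of dimensions as the stored embeddings. Comparing vectors of different lengths with `zip` would silently drop the extra values.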

export interactive plots

05 Jul 15:53

Optionally export interactive DataMapPlot plots instead of static ones, thanks to @dhruv-anand-aintech

Fixes an unpinned dependency that was breaking transformers models

Export static plots

21 Jun 17:27

Implements #23: a UI for easily exporting static plots with datamapplot

Supports more input file types, thanks to #40
Supports proxy servers and alternate OpenAI-compatible endpoints (#44)

requirements.txt has been loosened, so Python 3.12 should now work and more recent versions of some important pip modules will be installed

new models

15 May 13:54

Improve setup flow

13 May 13:42

Minor improvements to the setup flow

Refined data export

05 May 23:47

Creating a scope now also creates a combined parquet of the input data and the scope annotations.

This makes loading curated scopes into other workflows much easier.
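
A rough sketch of what "combined" means here, assuming the scope annotations line up with the input data row by row. Each input row is extended with its annotation columns. The column names (`cluster`, `label`) and the function name are hypothetical. Writing the result to parquet would need a library such as pandas with pyarrow, which is left out to keep the example dependency-free.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def combine_with_scope(input_rows: List[Row], scope_rows: List[Row]) -> List[Row]:
    """Attach scope annotations to input rows, matching them by row position."""
    if len(input_rows) != len(scope_rows):
        raise ValueError("input data and scope annotations must have the same number of rows")
    combined = []
    for original, annotations in zip(input_rows, scope_rows):
        # Refuse to silently overwrite an input column with an annotation column
        clash = original.keys() & annotations.keys()
        if clash:
            raise ValueError(f"column name collision: {sorted(clash)}")
        combined.append({**original, **annotations})
    return combined
```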

0.2.0 Explore Overhaul

01 May 22:06

This release makes a number of improvements to exploration and curation in Latent Scope. You can now filter in several ways from a unified interface and perform bulk actions on the filtered points.
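
One way to picture a unified filter with bulk actions is to treat every active filter as a predicate on a row. The filters are combined with AND, and the action is applied to every row that passes. The sketch below assumes that model. The row fields (`text`, `cluster`), the "set a column" action, and the function names are hypothetical, not latent-scope's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]


def apply_filters(rows: List[Row], filters: List[Predicate]) -> List[int]:
    """Indices of rows that pass every active filter (filters are combined with AND)."""
    return [i for i, row in enumerate(rows) if all(f(row) for f in filters)]


def bulk_update(rows: List[Row], indices: List[int], column: str, value: Any) -> None:
    """Hypothetical bulk action: set one column on every filtered row, in place."""
    for i in indices:
        rows[i][column] = value
```

With no active filters, `all()` on an empty list is `True`, so every row passes. That matches the intuition of an unfiltered view.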

The following issues were closed:

  • #12 guide for setup page
  • #11 guide for explore page
  • #19 filtering by dataset column

This issue wasn't closed, but the data table can now show images when a row contains an image URL:

  • #24 showing images
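
A simple heuristic for deciding whether a cell should render as an image is to check that it is an http(s) URL whose path ends in a common image extension. The extension list and this exact check are assumptions, and the logic latent-scope uses may differ.

```python
from urllib.parse import urlparse

# Assumed set of extensions treated as images
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".svg")


def looks_like_image_url(value: object) -> bool:
    """Heuristic: an http(s) URL whose path ends with a known image extension."""
    if not isinstance(value, str):
        return False
    parsed = urlparse(value.strip())
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return False
    # Check only the path, so query strings like ?size=large don't interfere
    return parsed.path.lower().endswith(IMAGE_EXTENSIONS)
```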

Improved documentation and several guides have been published at https://enjalot.github.io/latent-scope/