Releases · ddangelov/Top2Vec

15 Oct 21:42

1.0.13

12b68ac

1.0.13

update Top2Vec version


15 Oct 19:56

ddangelov

1.0.12

6fa6cf9

added pre-trained universal sentence encoder and BERT sentence transformer options

Top2Vec now has an option to choose the embedding model. The options are doc2vec, universal-sentence-encoder, universal-sentence-encoder-multilingual, and distiluse-base-multilingual-cased.

A get_documents_topics method was added.


08 Oct 22:05

ddangelov

1.0.11

a5e9fc4

added delete_documents methods and bug fixes

Added a method for deleting documents from the model.

Fixed a bug when using corpus_file that resulted in documents getting dropped. Fixed a bug when using add_documents and delete_documents that resulted in improper ordering of topic words.


29 Aug 17:04

ddangelov

1.0.10

2c27b9a

UMAP install bug fix

An issue with the UMAP install, caused by a missing comma in the setup.py file, has been fixed. An optional min_count parameter has been added; the default is still 50. All words with a total frequency lower than min_count are ignored by the model.
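To illustrate what a min_count threshold does, here is a minimal sketch in plain Python. It is not Top2Vec's code: the function name `filter_vocabulary` and the toy documents are hypothetical, and the only thing taken from the note above is the rule that words with a total frequency lower than min_count are ignored.

```python
from collections import Counter


def filter_vocabulary(tokenized_docs, min_count=50):
    """Return the words whose total frequency across all documents is >= min_count.

    Hypothetical helper for illustration; not part of Top2Vec.
    """
    counts = Counter(word for doc in tokenized_docs for word in doc)
    return {word for word, n in counts.items() if n >= min_count}


# Toy corpus: "topic" appears 3 times, "model" 2 times, the rest once.
docs = [["topic", "model", "topic"], ["topic", "vector"], ["model", "rare"]]
vocab = filter_vocabulary(docs, min_count=2)
print(sorted(vocab))
```

With a small corpus, a threshold of 50 can discard almost every word, which is presumably why the parameter was made configurable.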


26 Jun 21:35

ddangelov

1.0.9

2484e4a

Hierarchical Topic Reduction

Added functionality to perform hierarchical topic reduction. Added the ability to add new documents to an already trained model. Added a use_corpus option, which may lead to faster training with very large datasets in multi-worker environments.
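The release note does not say how the reduction is done, so the following is only a sketch of one plausible scheme, under stated assumptions: the smallest topic is repeatedly merged into its most similar topic (by cosine similarity), and the merged topic vector is the size-weighted mean of the two. The function `reduce_topics` and the toy vectors are hypothetical and not Top2Vec's API.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def reduce_topics(topic_vecs, topic_sizes, num_topics):
    """Merge the smallest topic into its most similar topic until num_topics remain.

    Assumed merging strategy, for illustration only.
    Returns (vectors, sizes, groups), where groups lists the original topic
    indices that ended up in each reduced topic.
    """
    if num_topics < 1:
        raise ValueError("num_topics must be at least 1")
    vecs = [list(v) for v in topic_vecs]
    sizes = list(topic_sizes)
    groups = [[i] for i in range(len(vecs))]
    while len(vecs) > num_topics:
        smallest = min(range(len(vecs)), key=lambda i: sizes[i])
        target = max(
            (i for i in range(len(vecs)) if i != smallest),
            key=lambda i: cosine(vecs[smallest], vecs[i]),
        )
        total = sizes[smallest] + sizes[target]
        # Size-weighted mean of the two topic vectors.
        vecs[target] = [
            (a * sizes[target] + b * sizes[smallest]) / total
            for a, b in zip(vecs[target], vecs[smallest])
        ]
        sizes[target] = total
        groups[target].extend(groups[smallest])
        del vecs[smallest], sizes[smallest], groups[smallest]
    return vecs, sizes, groups


# Three toy topics; the small middle topic is closest to the first one.
vecs, sizes, groups = reduce_topics(
    [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]], [10, 2, 5], num_topics=2
)
print(sizes, groups)
```

Keeping the `groups` mapping is what makes the reduction hierarchical: each reduced topic can be traced back to the original topics it absorbed.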


18 Apr 15:09

ddangelov

1.0.8

e0b5e7a

Custom document ids, tokenizer input, option to save documents

Added an option for custom document ids, which can be strings or ints. Added an option to not save documents in the model, which allows the trained model to be used as an index and makes saved models smaller. Added the ability to pass in a custom tokenizer that overrides the default. Added a verbose mode that logs the status of training. Also added the ability to search documents by multiple documents, with positive and negative semantic search.
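As an illustration of the custom tokenizer option, a tokenizer can be thought of as a callable that takes a document string and returns a list of tokens. The sketch below defines one such function; `simple_tokenizer` is a hypothetical name, and the keyword names in the commented-out call are assumptions that should be checked against the Top2Vec documentation for the installed version.

```python
import re


def simple_tokenizer(document):
    """Hypothetical tokenizer: lowercase the text and keep runs of 2+ letters."""
    return re.findall(r"[a-z]{2,}", document.lower())


tokens = simple_tokenizer("Custom IDs can be strings or ints.")
print(tokens)

# Assumed usage (not executed here; parameter names are assumptions):
# model = Top2Vec(
#     documents,
#     tokenizer=simple_tokenizer,   # overrides the default tokenizer
#     document_ids=my_ids,          # strings or ints, one per document
#     keep_documents=False,         # do not store documents in the model
#     verbose=True,                 # log training status
# )
```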


07 Apr 20:09

ddangelov

1.0.7

bf601b4

Topic size and deduplication

Topic size is defined as the number of document vectors that have the topic vector as their nearest topic vector. Search by topic has been modified to only show documents that have the topic as their nearest topic, in order to avoid overlapping results from similar topics.

Topic deduplication was added to make topics more robust.
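The topic size definition in this release can be sketched in a few lines of plain Python. This is an illustration rather than Top2Vec's implementation: the helper names and toy vectors are hypothetical, and using cosine similarity to decide which topic vector is "nearest" is an assumption.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest_topic(doc_vec, topic_vecs):
    """Index of the topic vector most similar to doc_vec."""
    return max(range(len(topic_vecs)), key=lambda t: cosine(doc_vec, topic_vecs[t]))


def topic_sizes(doc_vecs, topic_vecs):
    """Return (sizes, assignments): documents per topic and each document's topic."""
    assignments = [nearest_topic(d, topic_vecs) for d in doc_vecs]
    counts = Counter(assignments)
    return [counts.get(t, 0) for t in range(len(topic_vecs))], assignments


def documents_for_topic(topic, assignments):
    """Only documents whose nearest topic is `topic`, so results never overlap."""
    return [doc for doc, t in enumerate(assignments) if t == topic]


topic_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]
sizes, assignments = topic_sizes(doc_vecs, topic_vecs)
print(sizes, documents_for_topic(0, assignments))
```

Because every document is assigned to exactly one topic, the topic sizes sum to the number of documents and a search by topic cannot return the same document under two similar topics.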


25 Mar 22:06

ddangelov

1.0.6

4c5926f

First Release

Top2Vec initial release.

