Releases: ddangelov/Top2Vec

updated code documentation

15 Oct 21:42
1.0.13

update Top2Vec version

added pre-trained universal sentence encoder and BERT sentence transformer options

15 Oct 19:56
6fa6cf9

Top2Vec now lets you choose the embedding model. The available options are doc2vec, universal-sentence-encoder, universal-sentence-encoder-multilingual, and distiluse-base-multilingual-cased.
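
As a rough illustration of selecting one of these options, the sketch below wraps model construction in a small helper. The helper name build_model and the constant SUPPORTED_EMBEDDING_MODELS are hypothetical and exist only for this example. The constructor argument name embedding_model is an assumption not stated in this release note, so check the library documentation before relying on it.

```python
# Option names copied from the release note above.
SUPPORTED_EMBEDDING_MODELS = (
    "doc2vec",
    "universal-sentence-encoder",
    "universal-sentence-encoder-multilingual",
    "distiluse-base-multilingual-cased",
)


def build_model(documents, embedding_model="doc2vec"):
    """Hypothetical helper: validate the option name, then train a model."""
    if embedding_model not in SUPPORTED_EMBEDDING_MODELS:
        raise ValueError(f"unknown embedding model: {embedding_model!r}")
    # Imported lazily so the validation above works without top2vec installed.
    from top2vec import Top2Vec

    # Assumption: the constructor accepts an `embedding_model` keyword.
    return Top2Vec(documents=documents, embedding_model=embedding_model)
```

Calling build_model(docs, "universal-sentence-encoder") would train with a pre-trained encoder, provided the optional dependencies for that encoder are installed.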

A get_documents_topics method was added.

added delete_documents methods and bug fixes

08 Oct 22:05

Added a method for deleting documents from the model.

Fixed a bug when using corpus_file that caused documents to be dropped. Fixed a bug when using add_documents and delete_documents that resulted in improper ordering of topic words.

UMAP install bug fix

29 Aug 17:04
2c27b9a

A missing comma in the setup.py file caused an issue with the UMAP install; this has been fixed. An optional min_count parameter has been added; the default is still 50. All words with a total frequency lower than min_count are ignored by the model.
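
To make the min_count rule concrete, here is a minimal sketch of frequency-based vocabulary filtering in plain Python. It is not the library's implementation. The function name filter_vocabulary and the toy corpus are made up for illustration, and the example uses min_count=2 rather than the default of 50 so the effect is visible on a tiny input.

```python
from collections import Counter


def filter_vocabulary(tokenized_docs, min_count=50):
    """Return the words whose total corpus frequency is at least min_count."""
    counts = Counter(word for doc in tokenized_docs for word in doc)
    return {word for word, n in counts.items() if n >= min_count}


# Toy corpus: "topic" appears 3 times, "model" twice, the rest once.
docs = [["topic", "model", "topic"], ["topic", "vector"], ["model", "rare"]]
vocab = filter_vocabulary(docs, min_count=2)
print(sorted(vocab))  # ['model', 'topic']
```

Lowering min_count keeps rarer words, which can matter for small corpora where few words reach a frequency of 50.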

Hierarchical Topic Reduction

26 Jun 21:35

Added functionality to perform hierarchical topic reduction. Added the ability to add new documents to an already trained model. Added a use_corpus option, which may lead to faster training with very large datasets in multi-worker environments.

Custom document ids, tokenizer input, option to save documents

18 Apr 15:09

Added an option for custom document ids, which can be strings or ints. Added an option to not save documents in the model, which allows the trained model to be used as an index and makes saved models smaller. Added the ability to pass in a custom tokenizer that overrides the default. Added a verbose mode that logs the status of training. Also added the ability to search documents by multiple documents, with positive and negative semantic search.

Topic size and deduplication

07 Apr 20:09

Topic size is defined as the number of document vectors that have the topic as their nearest topic vector. Search by topic has been modified to show only documents that have the topic as their nearest topic, in order to avoid overlapping results from similar topics.
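
The definition above can be sketched in a few lines of plain Python. This is an illustration only, not the library's code. It assumes "nearest" means highest cosine similarity, which the release note does not state, and the function names and toy vectors are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_topic(doc_vec, topic_vecs):
    """Index of the topic vector most similar to doc_vec (assumed: cosine)."""
    return max(range(len(topic_vecs)), key=lambda t: cosine(doc_vec, topic_vecs[t]))


def topic_sizes(doc_vecs, topic_vecs):
    """Assign each document to its nearest topic and count documents per topic."""
    assignments = [nearest_topic(d, topic_vecs) for d in doc_vecs]
    counts = Counter(assignments)
    return [counts.get(t, 0) for t in range(len(topic_vecs))], assignments


topic_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7]]
sizes, assignments = topic_sizes(doc_vecs, topic_vecs)

# Search by topic returns only documents assigned to that topic,
# so results for different topics never overlap.
docs_for_topic_0 = [i for i, t in enumerate(assignments) if t == 0]
```

Because every document is assigned to exactly one topic, the sizes always sum to the number of documents.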

Topic deduplication was added to make topics more robust.

First Release

25 Mar 22:06
4c5926f

Top2Vec initial release.