About the demo
Daniel-Robbins committed Dec 12, 2023
1 parent 550c92b commit de045da
Showing 1 changed file with 34 additions and 4 deletions.
38 changes: 34 additions & 4 deletions examples/chDB_vector_search.ipynb
@@ -11,18 +11,48 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Recommendation systems these years\n",
"\n",
"# Briefing about Word2Vec:\n",
"The recommendation system has made several major advancements over the past 10 years:\n",
"\n",
"1. 2009-2015: LR (Logistic Regression) combined with sophisticated feature engineering defeated SVM and collaborative filtering, which were algorithms of the previous generation.\n",
"1. 2012-2015: NN (Neural Networks) changed the CV (Computer Vision) and NLP (Natural Language Processing) industries, then returned to the recommendation system field, greatly reducing the importance of traditional skill in feature combination.\n",
"1. 2013: Embedding was taken out from Google's archives and later developed into techniques like Item2vec, sparking a trend in mining user behavior.\n",
"1. 2015-2016: Wide & Deep inspired \"grafting\" NN with various old models.\n",
"1. 2016-2017: Experienced a strong counterattack from tree models such as XGBoost and LightGBM that were fast, good, and efficient.\n",
"1. 2017: Transformer became popularized to the point where \"Attention Is All You Need.\"\n",
"1. 2018-now: Mainly focused on deep exploration of features, especially user features. Representatively famous is DIEN."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# About this demo\n",
"\n",
"Item2vec technology is developed based on Word2vec. Its core idea is to treat the user's historical behavior sequence as a sentence, and then train the vector representation of each item through Word2vec. Finally, item recommendations are made based on the similarity of item vectors. The core of Item2vec technology is to treat the user's historical behavior sequence as a sentence, and then train the vector representation of each item through Word2vec. Finally, item recommendations are made based on the similarity of item vectors.\n",
"\n",
"The main purpose of this demo is to demonstrate how to train the vector representation of items using Word2vec and make item recommendations based on the similarity of item vectors. It mainly consists of 2 parts:\n",
"1. Prepare item sequences based on user behavior.\n",
"2. Train a CBOW model using the Word2Vec module of the gensim library.\n",
"3. Extract all embedding data and write it to chDB.\n",
"4. Perform queries on chDB based on cosine distance to find similar movies to the input movie."
]
},
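Steps 1 and 4 of the demo can be illustrated without gensim or chDB. The following is a minimal sketch under stated assumptions: the event tuples, the function names (`build_sequences`, `cosine_distance`, `most_similar`), and the hand-written 3-dimensional vectors are all hypothetical stand-ins. In the notebook itself the vectors come from a trained Word2Vec model and the ranking is done as a chDB query.

```python
import math
from collections import defaultdict

# Toy (user_id, movie, timestamp) events; hypothetical data for illustration.
events = [
    ("u1", "Toy Story", 3), ("u1", "Aladdin", 1), ("u1", "Lion King", 2),
    ("u2", "Alien", 1), ("u2", "Aliens", 2),
]


def build_sequences(events):
    """Step 1: group events by user and order each user's items by time."""
    per_user = defaultdict(list)
    for user, item, ts in events:
        per_user[user].append((ts, item))
    # Each user's ordered item list plays the role of one "sentence".
    return [[item for _, item in sorted(pairs)] for pairs in per_user.values()]


def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def most_similar(target, embeddings, top_n=2):
    """Step 4: rank all other items by cosine distance to the target item."""
    query = embeddings[target]
    scored = [
        (cosine_distance(query, vec), name)
        for name, vec in embeddings.items()
        if name != target
    ]
    return [name for _, name in sorted(scored)[:top_n]]


# Hand-written vectors standing in for trained item embeddings (assumption).
embeddings = {
    "Toy Story": [0.9, 0.1, 0.0],
    "Aladdin": [0.8, 0.2, 0.1],
    "Lion King": [0.7, 0.3, 0.0],
    "Alien": [0.0, 0.1, 0.9],
}

sequences = build_sequences(events)
neighbours = most_similar("Toy Story", embeddings)
print(sequences)
print(neighbours)
```

Sorting by distance in ascending order puts the closest items first, which is the same ordering a query sorted by cosine distance would return.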
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n",
"# Briefing about Word2Vec\n",
"\n",
"Word2Vec was introduced in two papers by a team of researchers at Google, published between September and October 2013. Alongside the papers, the researchers released their implementation in C. The Python implementation followed shortly after the first paper, courtesy of Gensim.\n",
"\n",
"The fundamental premise of Word2Vec is that words with similar contexts also have similar meanings and consequently share a comparable vector representation within the model. For example, \"dog,\" \"puppy,\" and \"pup\" are frequently used in analogous situations with similar surrounding words like \"good,\" \"fluffy,\" or \"cute.\" According to Word2Vec, they will thus possess a corresponding vector representation.\n",
"\n",
"Based on this assumption, Word2Vec can be utilized to discover relationships between words in a dataset, calculate their similarity, or employ the vector representation of these words as input for other applications such as text classification or clustering.\n",
"\n",
"<img src=\"https://mccormickml.com/assets/word2vec/skip_gram_net_arch.png\" alt=\"Word2Vec\" style=\"max-width:800px\">\n",
"\n",
"\n"
"<img src=\"https://mccormickml.com/assets/word2vec/skip_gram_net_arch.png\" alt=\"Word2Vec\" style=\"max-width:800px\">"
]
},
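The "similar contexts" premise rests on how training examples are cut from a sentence with a sliding window. The sketch below is a simplified illustration, not gensim's implementation: `cbow_examples` is a hypothetical helper that pairs each center word with its neighbours, which is the input/target layout CBOW uses. Skip-gram, shown in the figure above, reverses the direction and predicts each context word from the center word. Real implementations add refinements that are omitted here.

```python
def cbow_examples(tokens, window=2):
    """Return (context_words, center_word) pairs for one tokenized sentence."""
    examples = []
    for i, center in enumerate(tokens):
        left = tokens[max(0, i - window):i]    # up to `window` words before
        right = tokens[i + 1:i + 1 + window]   # up to `window` words after
        context = left + right
        if context:                            # skip one-word sentences
            examples.append((context, center))
    return examples


sentence = ["the", "fluffy", "puppy", "is", "cute"]
examples = cbow_examples(sentence, window=1)
for context, center in examples:
    print(context, "->", center)
```

Words such as "puppy" and "dog" that keep appearing with the same context words receive similar training signals, which is why their vectors end up close together.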
{