
Commit

Fix path to notebook in Colab button (#228)
miguelgrinberg authored Apr 30, 2024
1 parent 006c5ff commit 1351a40
Showing 1 changed file with 1 addition and 1 deletion.
notebooks/document-chunking/tokenization.ipynb (1 addition, 1 deletion)

@@ -9,7 +9,7 @@
"source": [
"# Calculating tokens for Semantic Search (ELSER and E5)\n",
"\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/search/tokenization.ipynb)\n",
"[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/elastic/elasticsearch-labs/blob/main/notebooks/document-chunking/tokenization.ipynb)\n",
"\n",
"Elasticsearch offers [semantic search](https://www.elastic.co/what-is/semantic-search) models, most notably [ELSER](https://www.elastic.co/guide/en/machine-learning/current/ml-nlp-elser.html) and [E5](https://www.elastic.co/search-labs/blog/articles/multilingual-vector-search-e5-embedding-model), to search through documents in a way that takes the text's meaning into account. Part of the semantic search process is breaking up texts into tokens (both for documents and for queries). Tokens are commonly thought of as words, but this is not completely accurate. Different semantic models use different concepts of tokens. Many treat punctuation separately and some break up compound words. For example ELSER (our English language model) uses the [`bert-base-uncased`](https://huggingface.co/bert-base-uncased) tokenizer.\n",
"\n",
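The unchanged context line above notes that ELSER relies on the `bert-base-uncased` tokenizer. As a quick illustration (not part of this commit), a minimal sketch of inspecting and counting those tokens with the Hugging Face `transformers` library might look like this; the sample sentence is hypothetical:

```python
# Minimal sketch: tokenize a sentence with the bert-base-uncased tokenizer
# mentioned in the notebook. Assumes the `transformers` package is installed.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Elasticsearch offers semantic search models."  # hypothetical sample text
tokens = tokenizer.tokenize(text)

print(tokens)       # word pieces; compound or rare words may be split into sub-tokens
print(len(tokens))  # token count, which can exceed the whitespace word count
```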
