feat: Dynamically scrape Ollama model names #14

Merged on Oct 31, 2023 (7 commits)

ericmjl (Owner) commented on Oct 31, 2023

  • Added a function to dynamically scrape Ollama model names from the Ollama website.
  • The function uses BeautifulSoup and requests to parse the HTML and find the model names.
  • If the website cannot be reached, a static list of model names is used as a fallback.
  • The static list of model names is stored in a new file, ollama_model_names.txt.
  • The function is cached using lru_cache to improve performance.
  • Updated the create_model function to use the new dynamic function.
  • Added a new Jupyter notebook to test the scraping function.
  • Included the new text file in the MANIFEST.in file for distribution.
  • This change improves the flexibility and maintainability of the code by
    allowing it to adapt to changes in the Ollama model library.
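
The behaviour described above (scrape, fall back to a static list, cache the result) can be sketched roughly as follows. This is not the code from the PR: to stay dependency-free it swaps in the standard library's urllib and html.parser for requests and BeautifulSoup, and the URL, the /library/<name> link pattern, the fallback names, and all function names are assumptions made for illustration.

```python
from functools import lru_cache
from html.parser import HTMLParser
from urllib.request import urlopen

# Assumptions for illustration only: the URL, the link pattern, and the
# fallback names are not taken from the PR.
LIBRARY_URL = "https://ollama.ai/library"
FALLBACK_MODELS = ("llama2", "mistral")


class _ModelLinkParser(HTMLParser):
    """Collect model names from anchors whose href starts with /library/."""

    def __init__(self):
        super().__init__()
        self.names = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if href.startswith("/library/"):
            name = href[len("/library/"):].strip("/")
            # Keep first-seen order and skip duplicates.
            if name and name not in self.names:
                self.names.append(name)


def parse_model_names(html: str) -> list[str]:
    """Extract model names from the HTML of the library page."""
    parser = _ModelLinkParser()
    parser.feed(html)
    return parser.names


def fetch_html(url: str, timeout: float = 5.0) -> str:
    """Download a page and decode it as UTF-8."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


@lru_cache(maxsize=1)
def ollama_model_names() -> tuple[str, ...]:
    """Return scraped names, or the static fallback if scraping fails."""
    try:
        names = parse_model_names(fetch_html(LIBRARY_URL))
    except (OSError, ValueError):
        # Network errors (URLError is an OSError) and bad URLs land here.
        names = []
    return tuple(names) if names else FALLBACK_MODELS
```

One consequence of caching this way is that a fallback result is cached too, so a process that starts offline keeps the static list until `ollama_model_names.cache_clear()` is called.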

This commit adds a newline at the end of the v0.0.86 release notes file.
This change is in line with the standard file formatting conventions.

This commit separates the 'Commit release notes' step from the
'Write release notes' step in the release-python-package workflow.
The 'pre-commit' package installation has been moved to the
'Commit release notes' step.

- Added beautifulsoup4, lxml, and requests to the environment.yml file.
  These packages are necessary for the automatic scraping of ollama models.

This commit adds the content.code.copy feature to the theme
configuration in mkdocs.yaml. This feature allows users to easily copy
code snippets from the documentation.

…for model names

The method ollama_model_keywords() in model_dispatcher.py has been refactored.
The dynamic scraping of model names from the Ollama website has been removed.
Instead, the model names are now read from a static text file distributed with the package.
This change simplifies the code and removes the dependency on the BeautifulSoup and requests libraries.

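
Reading the names from a packaged text file might look something like the sketch below. The function names and the one-name-per-line file layout are assumptions; the actual ollama_model_keywords() in model_dispatcher.py may differ.

```python
from functools import lru_cache
from pathlib import Path


def read_model_names(path: Path) -> tuple[str, ...]:
    """Return the non-empty, stripped lines of a names file."""
    text = Path(path).read_text(encoding="utf-8")
    return tuple(line.strip() for line in text.splitlines() if line.strip())


@lru_cache(maxsize=None)
def cached_model_names(path: str) -> tuple[str, ...]:
    """Read the file once per path; later calls reuse the cached tuple."""
    return read_model_names(Path(path))
```

Inside a package, the path would typically be built relative to the module, for example `Path(__file__).parent / "ollama_model_names.txt"`, which is why the file also has to be listed in MANIFEST.in to ship with the distribution.
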
This commit introduces a new feature that automatically updates the list of Ollama models.
A new Python script has been added to the hooks in the pre-commit configuration file.
This script scrapes the Ollama AI library webpage to get the latest model names and writes them to a text file.
The dependencies required for this script are now specified in the pre-commit configuration file instead of the environment file.
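
The write step of such a hook script could be sketched as below. Only the file-update part is shown; the helper names, the sorting, and the de-duplication are assumptions, not details confirmed by this PR.

```python
from pathlib import Path


def render_names(names) -> str:
    """Render names as sorted, de-duplicated lines with a trailing newline."""
    unique = sorted({name.strip() for name in names if name.strip()})
    if not unique:
        return ""
    return "\n".join(unique) + "\n"


def update_names_file(names, path: Path) -> bool:
    """Write the names to path; return True only if the file changed."""
    new_text = render_names(names)
    old_text = path.read_text(encoding="utf-8") if path.exists() else None
    if new_text == old_text:
        return False
    path.write_text(new_text, encoding="utf-8")
    return True
```

Skipping the write when nothing changed keeps the hook quiet on most commits. pre-commit generally treats a hook that modifies files as failed, so a changed list would need to be staged before committing again.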

ericmjl (Owner, Author) commented on Oct 31, 2023

GitBot Summary of Changes

The pull request includes several changes:

  1. In the GitHub workflow file, the installation of pre-commit has been moved to a new step named "Commit release notes".

  2. A new pre-commit hook named "Autoupdate Ollama Models" has been added to the .pre-commit-config.yaml file. This hook runs a Python script that automatically updates the list of Ollama models.

  3. The file llamabot/bot/ollama_model_names.txt has been added to the MANIFEST.in file to be included in the distribution.

  4. The list of Ollama model keywords in the model_dispatcher.py file has been replaced with a function that reads the model names from the newly added file llamabot/bot/ollama_model_names.txt.

  5. The "content.code.copy" feature has been enabled in the theme configuration in the mkdocs.yaml file.

  6. A new Jupyter notebook file named scrape_ollama_models.ipynb has been added. This notebook contains code to scrape the Ollama model names from the Ollama website.

  7. A new Python script named autoupdate_ollama_models.py has been added. This script is similar to the Jupyter notebook but is intended to be run as a pre-commit hook. It updates the list of Ollama models in the llamabot/bot/ollama_model_names.txt file.

cc: @ericmjl, please check for correctness!
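
As a rough illustration of item 4, a dispatcher could consult the name list along these lines. This is a hypothetical sketch: the helper name, the matching on the part before a ":" tag, and the case-insensitive comparison are all assumptions, and the real logic in model_dispatcher.py may work differently.

```python
def is_ollama_model(model_name: str, keywords) -> bool:
    """Return True if the base model name appears in the keyword list."""
    # Hypothetical rule: ignore a ":tag" suffix such as "llama2:13b".
    base = model_name.split(":", 1)[0].strip().lower()
    return base in {keyword.lower() for keyword in keywords}
```

With keywords `("llama2", "mistral")`, a name like `"llama2:13b"` would be routed to Ollama, while `"gpt-4"` would not.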

@ericmjl ericmjl merged commit a69eb73 into main Oct 31, 2023
@ericmjl ericmjl deleted the ollama-model-names branch October 31, 2023 13:58