Built with Streamlit
Powered by Ollama
ABOUT:
This web scraper uses the Ollama LLM to assist in web scraping. It prompts the user to supply a URL and a description of what they want scraped from the specified website. These instructions are then read by the LLM, parsed by the Python code, and the results are returned to the user.
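The actual implementation lives in main.py; as a rough illustration of the flow described above, here is a minimal sketch. It is not the repository's code: the prompt wording, widget labels, and default model name are all assumptions.

```python
# Hypothetical sketch of the scrape-then-ask flow; main.py may differ.
import requests
import streamlit as st
from bs4 import BeautifulSoup
import ollama

url = st.text_input("URL to scrape")
description = st.text_area("What do you want scraped from this page?")

if st.button("Scrape") and url and description:
    # Fetch the page and reduce it to plain text for the LLM.
    html = requests.get(url, timeout=30).text
    page_text = BeautifulSoup(html, "html.parser").get_text(" ", strip=True)

    # Ask the locally running Ollama model to extract what the user asked for.
    response = ollama.chat(
        model="llama3.1",  # assumes this model has already been pulled
        messages=[{
            "role": "user",
            "content": f"From the following page text, extract: {description}\n\n{page_text}",
        }],
    )
    st.write(response["message"]["content"])
```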
Clone the repository & install the requirements:
pip3 install -r requirements.txt
Then, from the command line, in the /AIWebScraper directory, run:
streamlit run main.py
To run and chat with Llama 3.1:
To pull a model, run:
ollama pull llama3.1
Then run it interactively on the command line:
ollama run llama3.1
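The same local model can also be queried programmatically, which is how this scraper drives it under the hood. Here is a minimal sketch against Ollama's standard REST endpoint (the default port 11434 assumes a stock Ollama install; the prompt is just an example):

```python
# Query the local Ollama server directly over its REST API.
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.1",
        "prompt": "Summarize what a web scraper does in one sentence.",
        "stream": False,  # return one complete response instead of a token stream
    },
    timeout=120,
)
print(resp.json()["response"])
```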
Before pulling a model, keep in mind the memory constraints and limitations of your machine. Check the table below.
The following information is from Ollama's GitHub and is relevant to its use in this web scraper.
Ollama supports a list of models available on ollama.com/library
Here are some example models that can be downloaded:
| Model | Parameters | Size | Download |
| --- | --- | --- | --- |
| Llama 3.1 | 8B | 4.7GB | `ollama run llama3.1` |
| Llama 3.1 | 70B | 40GB | `ollama run llama3.1:70b` |
| Llama 3.1 | 405B | 231GB | `ollama run llama3.1:405b` |
| Phi 3 Mini | 3.8B | 2.3GB | `ollama run phi3` |
| Phi 3 Medium | 14B | 7.9GB | `ollama run phi3:medium` |
| Gemma 2 | 2B | 1.6GB | `ollama run gemma2:2b` |
| Gemma 2 | 9B | 5.5GB | `ollama run gemma2` |
| Gemma 2 | 27B | 16GB | `ollama run gemma2:27b` |
| Mistral | 7B | 4.1GB | `ollama run mistral` |
| Moondream 2 | 1.4B | 829MB | `ollama run moondream` |
| Neural Chat | 7B | 4.1GB | `ollama run neural-chat` |
| Starling | 7B | 4.1GB | `ollama run starling-lm` |
| Code Llama | 7B | 3.8GB | `ollama run codellama` |
| Llama 2 Uncensored | 7B | 3.8GB | `ollama run llama2-uncensored` |
| LLaVA | 7B | 4.5GB | `ollama run llava` |
| Solar | 10.7B | 6.1GB | `ollama run solar` |
Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models.
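To turn that rule of thumb into a quick preflight check before pulling a model, here is a small hypothetical helper. The thresholds simply restate the note above, and psutil is an extra dependency not required by this project:

```python
# Rough preflight check against the RAM guidance above; the thresholds are
# the note's rule of thumb, not hard limits enforced by Ollama.
import psutil

GIB = 1024 ** 3
RAM_NEEDED_GIB = {"7B": 8, "13B": 16, "33B": 32}

def can_fit(model_size: str) -> bool:
    """Return True if total system RAM meets the rule of thumb for model_size."""
    total_gib = psutil.virtual_memory().total / GIB
    return total_gib >= RAM_NEEDED_GIB[model_size]

print(can_fit("7B"))   # e.g. True on a 16 GB machine
print(can_fit("33B"))  # e.g. False on a 16 GB machine
```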
Ollama supports importing GGUF models in the Modelfile:
- Create a file named `Modelfile`, with a `FROM` instruction with the local filepath to the model you want to import:

  FROM ./vicuna-33b.Q4_0.gguf
- Create the model in Ollama:

  ollama create example -f Modelfile
- Run the model:

  ollama run example
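Once created, the imported model behaves like any other local model, so the scraper could target it by name. A hypothetical sketch using the ollama Python package, where the model name `example` comes from the `ollama create` step above:

```python
# Query the model imported above by the name given to `ollama create`.
import ollama

response = ollama.generate(
    model="example",  # the name used in `ollama create example -f Modelfile`
    prompt="Hello! Briefly introduce yourself.",
)
print(response["response"])
```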