feat: explore the RAG technique, and methods to retain chat history #21
@cindyli I want to take a closer look at the ollama parts to compare them with my demo app, so I have not finished my review. But I ran into a couple of issues with the rag.py script and thought I should comment on those for the time being.
@@ -0,0 +1,71 @@
# Experiment with Retrieval-Augumented Generation (RAG)
Since this is a Python project, is there a minimum version of Python and pip that is required? It's likely at least Python 3.8.17 and pip 23.1.2 -- that's what I have at the moment, and it works. Are there version restrictions based on the packages in requirements.txt?
Python version >= 3.8.1 should work fine with the latest langchain-community module. If the Python version is lower than that, an older version of langchain should be installed instead. My laptop uses Python 3.11.2 and pip 23.2.1.
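For anyone who wants to check this before installing the requirements, here is a minimal sketch of a version guard. It assumes the 3.8.1 floor quoted in this thread; the names `MIN_PYTHON` and `python_is_supported` are hypothetical and not part of the pull request.

```python
import sys

# Assumption: 3.8.1 is the floor mentioned in this thread for the latest
# langchain-community; adjust it if requirements.txt pins something else.
MIN_PYTHON = (3, 8, 1)


def python_is_supported(version_info=None, minimum=MIN_PYTHON):
    """Return True when the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major, minor, and micro as plain tuples.
    return tuple(version_info[:3]) >= tuple(minimum)


print("Python version OK:", python_is_supported())
```

Something like this could run at the top of a script, or simply be used by hand before `pip install -r requirements.txt`.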
### Run Scripts
* Run `rag.py` with a parameter providing the path to the directory of a sentence transformer model
- `python rag.py ./all-MiniLM-L6-v2/`
This error occurs:
OSError: Error no file named pytorch_model.bin, model.safetensors, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory ../../all-MiniLM-L6-v2/.
In case the location of the all-MiniLM-L6-v2 folder mattered, I moved it into the same directory as rag.py, but the same error occurred.
I also tried following the all-MiniLM-L6-v2 README, which advises running pip install -U sentence-transformers. But that didn't help either.
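A quick way to confirm whether the model directory actually holds any of the weight files named in that error is sketched below. The file names are copied from the error message; the helper name `find_weight_files` is hypothetical, and the check only looks at the top level of the directory.

```python
from pathlib import Path

# File names copied from the OSError message above.
WEIGHT_FILES = (
    "pytorch_model.bin",
    "model.safetensors",
    "tf_model.h5",
    "model.ckpt.index",
    "flax_model.msgpack",
)


def find_weight_files(model_dir):
    """Return the names of the weight files found directly inside `model_dir`."""
    root = Path(model_dir)
    return [name for name in WEIGHT_FILES if (root / name).is_file()]


# Example: an empty list means the directory has none of the expected files.
print(find_weight_files("./all-MiniLM-L6-v2/"))
```

An empty result points at the download step rather than at `rag.py` itself.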
You are correct in that there are a couple of error messages from the git clone ... at step 2 of RAG.md. It took a while to figure out, but the root cause was that git-lfs was missing from my system. Here is the output from the git clone ... command at step 2 of RAG.md:
Cloning into 'all-MiniLM-L6-v2'...
remote: Enumerating objects: 61, done.
remote: Counting objects: 100% (61/61), done.
remote: Compressing objects: 100% (39/39), done.
remote: Total 61 (delta 22), reused 54 (delta 19), pack-reused 0 (from 0)
Unpacking objects: 100% (61/61), 316.23 KiB | 2.82 MiB/s, done.
git-lfs filter-process: git-lfs: command not found
fatal: the remote end hung up unexpectedly
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'
I first tried to figure out why the "Clone succeeded, but checkout failed." warning appeared, and to either use git restore, as the error message suggested, or git reset the pending deletions. Regarding the latter, the result of the suggested git status shows that a number of files have been deleted:
On branch main
Your branch is up to date with 'origin/main'.
Changes to be committed:
(use "git restore --staged <file>..." to unstage)
deleted: .gitattributes
deleted: 1_Pooling/config.json
deleted: README.md
deleted: config.json
deleted: config_sentence_transformers.json
deleted: data_config.json
deleted: model.safetensors
deleted: modules.json
deleted: onnx/model.onnx
deleted: pytorch_model.bin
deleted: rust_model.ot
deleted: sentence_bert_config.json
deleted: special_tokens_map.json
deleted: tf_model.h5
deleted: tokenizer.json
deleted: tokenizer_config.json
deleted: train_script.py
deleted: vocab.txt
However, I noticed an earlier error: git-lfs: command not found. That's the real reason that the clone did not succeed and left the repository in an odd state. Once I installed it, the script worked.
I think there needs to be a note or warning that git-lfs is required before running git clone ..., something like:
- Download the model
  - Make sure that your system has the git-lfs command installed. See Git Large File Storage for instructions.
  - Download the selected model to a local directory. For example, to download the all-MiniLM-L6-v2 model, use the following command: ...
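The suggested prerequisite could also be checked programmatically before cloning. This is only a sketch, using the standard library's `shutil.which` to look for a `git-lfs` executable on the PATH; the function name `git_lfs_available` is hypothetical.

```python
import shutil


def git_lfs_available(which=shutil.which):
    """Return True when a `git-lfs` executable is found on the PATH.

    `which` is injectable only so the check can be exercised without
    depending on what is installed on the current machine.
    """
    return which("git-lfs") is not None


if not git_lfs_available():
    print("git-lfs not found: install it before running git clone on the model repository.")
```

Running the check first avoids the half-finished checkout shown in the git status output above.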
docs/RAG.md
Outdated
### Run Scripts
* Run `rag.py` with a parameter providing the path to the directory of a sentence transformer model
- `python rag.py ./all-MiniLM-L6-v2/`
- The last two responses in the exectution result shows the language model's output
Misspelled "exectution".
jobs/RAG/rag.py
Outdated
loader = TextLoader(user_doc)
documents = loader.load()
# print(f"Loaded documents (first 2 rows):\n{documents[:2]}\n\n")
As above in chat_history_with_summary.py, another commented-out print() -- for debugging?
jobs/RAG/rag.py
Outdated
text_splitter = CharacterTextSplitter(chunk_size=200, chunk_overlap=0)
splitted_docs = text_splitter.split_documents(documents)
# print(f"Splitted documents (first 2 rows):\n{splitted_docs[:2]}\n\n")
As above; for debugging?
from langchain_core.prompts import ChatPromptTemplate

# from langchain_core.globals import set_debug
# set_debug(True)
Are these statements commented out because they are only for debugging? If so, you probably want to delete them.
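If the debug output is still useful now and then, one alternative to commented-out print() calls is the standard `logging` module, so the output can be switched on without editing the script. This is a sketch only; the logger name `rag` and the helper `preview` are hypothetical, not code from this pull request.

```python
import logging

logger = logging.getLogger("rag")


def preview(label, items, limit=2):
    """Log the first `limit` items at DEBUG level and return the logged text."""
    message = f"{label} (first {limit} rows):\n{items[:limit]}"
    logger.debug(message)
    return message


# Opt in to the debug output only when it is needed, for example:
# logging.basicConfig(level=logging.DEBUG)
preview("Loaded documents", ["doc one", "doc two", "doc three"])
```

With the default logging configuration nothing is printed, which matches the current behaviour of the commented-out statements.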
All catches on typos have been fixed. Other than that, when I ran ...
docs/RAG.md
Outdated
**Note:** Accessing a local sentence transformer model is much faster than accessing it via the
`sentence-transformers` Python package.

### Create/Activitate Virtual Environment
Spelling error: "Activitate", change to "Activate".
### Run Scripts
* Run `rag.py` with a parameter providing the path to the directory of a sentence transformer model
- `python rag.py ./all-MiniLM-L6-v2/`
Thanks for capturing the missing step on git-lfs. All comments so far have been addressed.
README.md
Outdated
@@ -29,7 +29,7 @@ git clone https://github.com/your-username/baby-bliss-bot
cd baby-bliss-bot
```

-### Create/Activitate Virtual Environment
+### Create/Activiate Virtual Environment
The fix is wrong :-(. There is an extra "i" before the final "a".
Activiate
Activate
Good catch. Sorry I cannot spell. :) Now fixed.
LGTM.
Refer to the documentation in the pull request to find information on the scripts, how to run them, and the exploration results.