Update demo-intrinsic to build granite3-rag #724
base: main
Signed-off-by: Jing Chen <[email protected]>
```python
import os
import subprocess
import sys

# `found` is set earlier in the program (not shown in this excerpt)
if not found:
    # Install dependencies
    # Define the repository and directory
    llama_cpp_repo = "https://github.com/ggerganov/llama.cpp.git"
    llama_cpp_dir_name = "llama.cpp"

    # Clone the repository if it doesn't already exist
    if not os.path.isdir(llama_cpp_dir_name):
        subprocess.run(["git", "clone", "--depth", "1", llama_cpp_repo], check=True)

    # Install the required packages with the pip that belongs to the running
    # interpreter, so they land in the same environment as this code
    subprocess.run([
        sys.executable, "-m", "pip", "install",
        "git+https://github.com/ibm-granite-community/utils.git",
        "huggingface_hub",
        "langchain_community",
        "langchain_ollama",
        "langchain-milvus",
        "docling",
        "-r", f"{llama_cpp_dir_name}/requirements.txt",
    ], check=True)
```
Not sure if this is the best approach to installing dependencies. I didn't use a `lang: command` block because the `git clone` action creates a repository, so we would need additional logic, such as checking whether the repo is already cloned, and I feel the command could get overly complicated. I also wonder whether installing dependencies via code blocks goes against PDL's design, since that arguably should be handled outside the program itself. Finally, `lang: command` code blocks can create side effects that aren't obvious. For example, `git clone` creates a new repo. How do we handle this? Thoughts?
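One way to make the side effects of a code or command block visible is to compare the working directory before and after the step. The sketch below uses a hypothetical helper, `run_and_report_new_entries`, which is not part of PDL; it only detects new top-level entries in the given directory, so modified files and anything written elsewhere (for example by `pip` into site-packages) go unnoticed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_and_report_new_entries(cmd: list[str], cwd: Path) -> set[str]:
    """Run `cmd` in `cwd` and return the top-level entries it created there.

    Hypothetical helper: it compares directory listings before and after the
    command, so modified files and anything written outside `cwd` are missed.
    """
    before = {entry.name for entry in cwd.iterdir()}
    subprocess.run(cmd, cwd=cwd, check=True)
    after = {entry.name for entry in cwd.iterdir()}
    return after - before


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for `git clone`: a command that leaves a file behind.
        created = run_and_report_new_entries(
            [sys.executable, "-c", "open('Modelfile', 'w').close()"],
            Path(tmp),
        )
        print(sorted(created))  # ['Modelfile']
```

With a `git clone` command in place of the stand-in, the same helper would be expected to report the new `llama.cpp` directory, which an interpreter could then log or offer to clean up.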
Shouldn't we add a `requirements:` field to `code` blocks, so that the interpreter handles this part? It's more complex than it looks: there are venv issues, for example, that need to be managed.
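If such a `requirements:` field existed, the interpreter could sidestep some of the venv issues by giving each program its own virtual environment and installing into it with that environment's own interpreter. A minimal sketch using the standard `venv` module; `create_program_env` is a hypothetical name, and the one-venv-per-program policy is an assumption, not current PDL behavior.

```python
import os
import subprocess
import tempfile
import venv
from pathlib import Path


def create_program_env(env_dir: Path, requirements: list[str], with_pip: bool = True) -> Path:
    """Create an isolated venv for one program and install its requirements into it.

    Returns the path of the environment's own Python interpreter.
    """
    if requirements and not with_pip:
        raise ValueError("installing requirements needs with_pip=True")
    builder = venv.EnvBuilder(with_pip=with_pip, clear=True, symlinks=(os.name != "nt"))
    builder.create(env_dir)
    bin_dir, exe = ("Scripts", "python.exe") if os.name == "nt" else ("bin", "python")
    python = env_dir / bin_dir / exe
    if requirements:
        # Use the venv's interpreter so packages land in the venv, not in
        # whatever environment is running the PDL interpreter.
        subprocess.run([str(python), "-m", "pip", "install", *requirements], check=True)
    return python


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        py = create_program_env(Path(tmp) / "env", requirements=[], with_pip=False)
        result = subprocess.run(
            [str(py), "-c", "import sys; print(sys.prefix != sys.base_prefix)"],
            capture_output=True, text=True, check=True,
        )
        print(result.stdout.strip())  # True
```

The demo passes `with_pip=False` and no requirements so it runs offline; with real requirements, `with_pip=True` bootstraps pip into the new environment before installing.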
This again raises the question of switching to IPython and handling this the way notebooks do.
```python
import os
import shutil
import subprocess

# Write Modelfile for granite-rag:8b for Ollama
model_file = "Modelfile"
with open(model_file, "w") as modelfile:
    modelfile.write(f"""\
FROM {granite3_base_model}
ADAPTER {lora_gguf}
""")

# Ideally we should use ollama.create() but there is no option to pass a modelfile
# check=True surfaces failures instead of silently discarding the captured output
subprocess.run(["ollama", "create", granite3_rag_model, "-f", model_file],
               check=True, capture_output=True, text=True)

# Remove the temporary Modelfile and the llama.cpp checkout
os.remove(model_file)
if os.path.isdir(llama_cpp_dir_name):
    shutil.rmtree(llama_cpp_dir_name)
```
I'm also wondering whether this is a good design. A Modelfile is created as a side effect of the Python code block. What should we do about such files? As you can see at the end, I remove them. Should cleanup be part of the PDL code, or should we leave it to the end user?
FWIW, Kubernetes and other declarative approaches manage this via "mounts": you declaratively specify which files you want to mount into the filesystem, and where to mount them.
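A mount-like approach could let a block declare the files it needs (such as the Modelfile) and have them materialized in a scratch directory that is removed when the step finishes, even if it fails. A minimal sketch, assuming a hypothetical `mounted_files` context manager that is not part of PDL; the Modelfile values and `model_name` are placeholders.

```python
import contextlib
import tempfile
from pathlib import Path, PurePath
from typing import Dict, Iterator


@contextlib.contextmanager
def mounted_files(files: Dict[str, str]) -> Iterator[Path]:
    """Materialize declared files in a scratch directory for one step.

    `files` maps a relative path to its text content. The directory and
    everything in it are removed when the `with` block exits, even on error.
    """
    for rel_path in files:
        path = PurePath(rel_path)
        if path.is_absolute() or ".." in path.parts:
            raise ValueError(f"mount path must stay inside the scratch dir: {rel_path}")
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for rel_path, content in files.items():
            target = root / rel_path
            target.parent.mkdir(parents=True, exist_ok=True)
            target.write_text(content)
        yield root


if __name__ == "__main__":
    # Placeholder values; a real program would use its base model and LoRA adapter.
    modelfile = "FROM my-base-model\nADAPTER ./my-adapter.gguf\n"
    with mounted_files({"Modelfile": modelfile}) as mount:
        print((mount / "Modelfile").read_text(), end="")
        # A real step might run:
        # subprocess.run(["ollama", "create", model_name, "-f", str(mount / "Modelfile")], check=True)
    print(mount.exists())  # False
```

Scoping the files to the `with` block answers the cleanup question by construction: nothing is left in the user's working directory, whether the step succeeds or raises.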
It would be nice to have such a `requirements` (or `dependencies`) field, for example:

```yaml
requirements: # (or dependencies)
  # In the PDL code, the llama_cpp package can be accessed using the name `llama_cpp`
  - llama_cpp:
      action: git clone https://github.com/ggerganov/llama.cpp.git
      mountPath: ~/.pdl/llama_cpp # path where this dependency can be accessed. For pip packages, it might be /.venv?
  # In PDL code, we will be able to do `import utils`
  - utils:
      # Question: how do we address conflicts? Leave it for the user to figure out?
      action: |
        pip install git+https://github.com/ibm-granite-community/utils.git \
          huggingface_hub \
          langchain_community \
          langchain_ollama \
          ...
      # since mountPath is not provided, we will use the PDL home dir as the default (if there is one?)
```
Some of the examples also require dependencies like `wikipedia`. Today, users need to install these dependencies with `pip install pdl[examples]`. Perhaps this design would help us move away from manual installation, so that those programs just work once their dependencies are specified in a PDL-native way. Comments?
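To make the `requirements` proposal a little more concrete, here is a sketch of how an interpreter might turn such a spec into a plan: default the mount path to a PDL home directory when `mountPath` is omitted, and skip any dependency whose mount path already exists (which also covers the "is the repo already cloned?" check). The spec is written as Python data mirroring the YAML to avoid a YAML dependency; `plan_requirements`, `Step`, and the skip-if-present rule are assumptions, and nothing is executed.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    mount_path: Path
    action: Optional[str]  # None means the dependency is already present


def plan_requirements(spec: List[dict], pdl_home: Path) -> List[Step]:
    """Turn a hypothetical `requirements:` list into an ordered plan (nothing is run)."""
    steps = []
    for entry in spec:
        # Each list item is a one-key mapping: {name: {"action": ..., "mountPath": ...}}
        if len(entry) != 1:
            raise ValueError(f"expected exactly one dependency per item, got {list(entry)}")
        name, fields = next(iter(entry.items()))
        if "mountPath" in fields:
            mount_path = Path(fields["mountPath"]).expanduser()
        else:
            mount_path = pdl_home / name  # default: under the PDL home directory
        # Skip dependencies that are already materialized (e.g. an existing clone).
        action = None if mount_path.exists() else fields["action"]
        steps.append(Step(name, mount_path, action))
    return steps


if __name__ == "__main__":
    spec = [
        {"llama_cpp": {
            "action": "git clone https://github.com/ggerganov/llama.cpp.git",
            "mountPath": "~/.pdl/llama_cpp",
        }},
        {"utils": {
            "action": "pip install git+https://github.com/ibm-granite-community/utils.git",
        }},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        for step in plan_requirements(spec, pdl_home=Path(tmp)):
            print(step.name, step.mount_path, step.action or "(already present, skipped)")
```

Separating planning from execution would also leave room to answer the conflict question in the YAML comment, for example by inspecting the whole plan before running any `pip install`.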