Update demo-intrinsic to build granite3-rag #724

Draft: wants to merge 5 commits into base: main


Collaborator

@jgchn jgchn commented Mar 7, 2025

No description provided.

Comment on lines +32 to +52
if not found:
    # Install dependencies
    # Define the repository and directory
    llama_cpp_repo = "https://github.com/ggerganov/llama.cpp.git"
    llama_cpp_dir_name = "llama.cpp"

    # Clone the repository if it doesn't already exist
    if not os.path.isdir(llama_cpp_dir_name):
        subprocess.run(["git", "clone", "--depth", "1", llama_cpp_repo], check=True)

    # Install the required packages
    subprocess.run([
        "pip", "install",
        "git+https://github.com/ibm-granite-community/utils.git",
        "huggingface_hub",
        "langchain_community",
        "langchain_ollama",
        "langchain-milvus",
        "docling",
        "-r", f"{llama_cpp_dir_name}/requirements.txt"
    ], check=True)
Collaborator Author

I'm not sure this is the best approach to installing dependencies. I didn't use the lang: command block because git clone creates a repository, so we need additional logic such as checking whether the repo has already been cloned, and I feel the command could get overly complicated. I also wonder whether installing dependencies via code blocks violates PDL's design, since that should be handled outside the program itself. In addition, lang: command code blocks can create side effects that aren't obvious; for example, git clone creates a new repo. How do we handle this? Thoughts?

Member

@starpit starpit Mar 7, 2025

Shouldn't we add a requirements: field to code blocks, so that the interpreter handles this part? It's more complex than it looks: there are venv issues, for example, that need to be managed.
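To make the venv concern concrete, here is a minimal sketch of what an interpreter might do with such a field: create a dedicated virtual environment once, then install into it rather than into whatever environment happens to be active. The helper names (venv_python, ensure_venv, pip_install_command) are hypothetical and not part of PDL; only the standard-library venv module is assumed.

```python
import sys
import venv
from pathlib import Path


def venv_python(venv_dir: Path) -> Path:
    """Return the path of the Python executable inside a virtual environment."""
    if sys.platform == "win32":
        return venv_dir / "Scripts" / "python.exe"
    return venv_dir / "bin" / "python"


def ensure_venv(venv_dir: Path, with_pip: bool = True) -> Path:
    """Create the virtual environment only if it does not exist yet."""
    if not venv_python(venv_dir).exists():
        venv.EnvBuilder(with_pip=with_pip).create(venv_dir)
    return venv_python(venv_dir)


def pip_install_command(venv_dir: Path, requirements: list[str]) -> list[str]:
    """Build the pip command that installs `requirements` into the venv.

    Running pip through the venv's own interpreter keeps the install
    isolated from the environment the interpreter itself runs in.
    """
    return [str(venv_python(venv_dir)), "-m", "pip", "install", *requirements]
```

An interpreter could call ensure_venv(...) once per program and then pass the result of pip_install_command(...) to subprocess.run(..., check=True). Conflict resolution between blocks that declare incompatible requirements is not addressed by this sketch.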

Collaborator

This again raises the question of switching to IPython and handling this the way notebooks do.

Comment on lines +65 to +80
# Write Modelfile for granite-rag:8b for Ollama
model_file = "Modelfile"
with open(model_file, "w") as modelfile:
    modelfile.write(f"""\
FROM {granite3_base_model}
ADAPTER {lora_gguf}
""")

# Ideally we should use ollama.create() but there is no option to pass a modelfile
subprocess.run(f"ollama create {granite3_rag_model} -f Modelfile",
               shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)

# Rm dependencies
os.remove(model_file)
if os.path.isdir(llama_cpp_dir_name):
    shutil.rmtree(llama_cpp_dir_name)
Collaborator Author

I'm also wondering whether this is a good design. A Modelfile is created as a side effect of the Python code block. What should we do about such files? At the end you can see that I remove them. Should we incorporate this cleanup into the PDL code or leave it for the end user to handle?
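One option that keeps the cleanup out of the user's hands is to write such files into a scratch directory that is removed automatically. The following is only a sketch of that idea using the standard-library tempfile module; write_modelfile and with_scratch_modelfile are hypothetical names, and the action callback stands in for whatever consumes the file (for example the ollama create call in this PR).

```python
import tempfile
from pathlib import Path


def write_modelfile(directory: Path, base_model: str, adapter_path: str) -> Path:
    """Write a Modelfile with FROM and ADAPTER lines into `directory`."""
    model_file = directory / "Modelfile"
    model_file.write_text(f"FROM {base_model}\nADAPTER {adapter_path}\n")
    return model_file


def with_scratch_modelfile(base_model: str, adapter_path: str, action):
    """Create the Modelfile in a temporary directory, run `action` on it,
    and let the directory (and the file) be deleted afterwards, even if
    `action` raises."""
    with tempfile.TemporaryDirectory(prefix="pdl-") as scratch:
        model_file = write_modelfile(Path(scratch), base_model, adapter_path)
        return action(model_file)
```

With this shape, the explicit os.remove(model_file) at the end of the block would no longer be needed; the action could be something like lambda path: subprocess.run(["ollama", "create", granite3_rag_model, "-f", str(path)], check=True).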

Member

FWIW, Kubernetes and other declarative approaches manage this via "mounts", i.e. you declaratively specify which files you want to mount into the filesystem, and where to mount them...

Collaborator Author

It would be nice if there were such a requirements or dependencies field, for example:

requirements: # (or dependencies)
# In the PDL code, llama_cpp package can be accessed using the name `llama_cpp`
- llama_cpp:
    action: git clone https://github.com/ggerganov/llama.cpp.git
    mountPath: ~/.pdl/llama_cpp  # this is the path where this dependency can be accessed. For pip packages, it might be /.venv?

# In PDL code, we will be able to do `import utils`
- utils:
    # Question: how do we address conflicts? Leave it for user to figure out?
    action: |
      pip install git+https://github.com/ibm-granite-community/utils.git \
      huggingface_hub \
      langchain_community \
      langchain_ollama \
      ...
    # since mountPath is not provided, we will use the PDL home dir as the default (if there is one?)

Some of the examples also require dependencies such as wikipedia. Today, users need to install these via pip install pdl[examples]. Perhaps this design would help remove manual installation and make those programs just work by specifying dependencies in a PDL-native way. Comments?
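As a rough illustration of how an interpreter might consume the proposed field, the sketch below takes the already-parsed list of entries (one single-key mapping per dependency, as in the YAML above) and resolves each one to a name, an action, and a mount path, falling back to a PDL home directory when mountPath is omitted. Everything here is an assumption for discussion: the Requirement class, resolve_requirements, and the pdl_home parameter do not exist in PDL today.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Requirement:
    name: str         # name under which the dependency is exposed to PDL code
    action: str       # shell command that fetches or installs the dependency
    mount_path: Path  # where the dependency is made available


def resolve_requirements(entries: list[dict], pdl_home: Path) -> list[Requirement]:
    """Turn parsed `requirements:` entries into Requirement objects.

    Each entry is expected to be a mapping with exactly one key (the
    dependency name) whose value holds `action` and, optionally, `mountPath`.
    """
    resolved = []
    for entry in entries:
        if len(entry) != 1:
            raise ValueError(f"expected one dependency per entry, got {list(entry)}")
        ((name, spec),) = entry.items()
        action = spec["action"].strip()
        mount = spec.get("mountPath")
        # Default to the PDL home directory when no mountPath is given.
        mount_path = Path(mount).expanduser() if mount else pdl_home
        resolved.append(Requirement(name, action, mount_path))
    return resolved
```

The open question from the proposal, how to handle conflicting dependencies, is left untouched here; the sketch only shows that the declared shape is straightforward to validate and normalize before any action runs.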

3 participants