To try this out, you can run a local Streamlit application that communicates with the Covalent backend.
- Make a conda environment with:

  ```
  covalent-cloud>=0.65.1
  streamlit==1.35.0
  torch==2.2.2
  transformers==4.39.3
  ```

- Run the `chatbot-backend.ipynb` notebook to deploy the GPU backend with Covalent.
- Run the command `streamlit run chatbot-frontend.py` to start the local Streamlit app.
- Copy-paste the deployment address and API key into the app's "Settings" sidebar, as sketched below.
That's all. You can now chat with Curtis Covalent!
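Under the hood, the frontend uses the deployment address and API key to send each prompt to the backend, presumably over HTTP. The snippet below is only an illustrative sketch: the route name, header name, and payload fields are assumptions rather than the app's actual request format, so check `chatbot-frontend.py` for the real details.

```python
import requests


def query_backend(address: str, api_key: str, prompt: str, max_tokens: int = 275) -> str:
    """Send a prompt to the deployed chatbot backend and return its reply.

    NOTE: "/generate", the "x-api-key" header, and the payload/response fields
    are hypothetical placeholders, not taken from chatbot-frontend.py.
    """
    response = requests.post(
        f"{address}/generate",
        headers={"x-api-key": api_key},
        json={"prompt": prompt, "max_new_tokens": max_tokens},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["response"]
```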
This example is designed to run a Llama-based chatbot. There are two adjustable settings:
```python
MEMORY_LENGTH = 50
MAX_RESPONSE_TOKENS = 275
```
The memory length determines the number of prior messages (one prompt-response pair equals 2 messages) that the chatbot "remembers". The chatbot's memory lives in the Streamlit app cache, so refreshing the app window will clear the bot's memory. The maximum number of response tokens caps the length of each reply.
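For illustration, here is one way the memory cap could be enforced, assuming the history is kept in `st.session_state` (a common way to hold per-session chat state in Streamlit); the helper name and message layout are assumptions, not taken from the app.

```python
import streamlit as st

MEMORY_LENGTH = 50  # prior messages kept; one prompt-response pair = 2 messages

# Streamlit reruns the script on every interaction, so the chat history is held
# in session state and is lost whenever the app window is refreshed.
if "messages" not in st.session_state:
    st.session_state.messages = []


def remember(role: str, content: str) -> None:
    """Append a message and trim the history to the last MEMORY_LENGTH messages."""
    st.session_state.messages.append({"role": role, "content": content})
    st.session_state.messages = st.session_state.messages[-MEMORY_LENGTH:]
```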
The model is prompted as follows.
```python
LLAMA_PROMPT_TEMPLATE = """<s>[INST] <<SYS>>
You are a friendly Canadian chatbot named 'Curtis Covalent'. Introduce yourself as such.
Please consider the message history below, then provide a brief and polite response.
<</SYS>>
{message_memory} [/INST] """  # NOTE: trailing space is important!
```
The `message_memory` field is a formatted reproduction of the chat displayed to the user.
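As a rough sketch, the history could be rendered into that field as follows; the speaker labels and `build_prompt` helper are illustrative assumptions, and the app's actual formatting may differ.

```python
def build_prompt(messages: list[dict]) -> str:
    """Render the chat history into the LLAMA_PROMPT_TEMPLATE defined above."""
    lines = []
    for msg in messages:
        # Label each message by speaker; the labels here are assumptions.
        speaker = "User" if msg["role"] == "user" else "Curtis"
        lines.append(f"{speaker}: {msg['content']}")
    return LLAMA_PROMPT_TEMPLATE.format(message_memory="\n".join(lines))
```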
The app will play a screen animation if the words "snow" or "balloons" are detected in the bot's response.
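These animations presumably map to Streamlit's built-in effects; a minimal sketch of that check (the function name is an assumption) might look like this:

```python
import streamlit as st


def maybe_animate(response_text: str) -> None:
    """Trigger a screen animation when certain words appear in the bot's reply."""
    text = response_text.lower()
    if "snow" in text:
        st.snow()       # falling-snow animation
    if "balloons" in text:
        st.balloons()   # rising-balloons animation
```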