Commit
Chore: clean up LLM (prompt caching, supports fn calling), leftover renames (#6095)
enyst authored Feb 1, 2025
1 parent 3b0bbce commit eb8d160
Showing 21 changed files with 119 additions and 187 deletions.
8 changes: 4 additions & 4 deletions .github/workflows/ghcr-build.yml
@@ -219,7 +219,7 @@ jobs:
exit 1
fi
# Run unit tests with the EventStream runtime Docker images as root
# Run unit tests with the Docker runtime Docker images as root
test_runtime_root:
name: RT Unit Tests (Root)
needs: [ghcr_build_runtime]
@@ -286,7 +286,7 @@ jobs:
image_name=ghcr.io/${{ github.repository_owner }}/runtime:${{ env.RELEVANT_SHA }}-${{ matrix.base_image }}
image_name=$(echo $image_name | tr '[:upper:]' '[:lower:]')
TEST_RUNTIME=eventstream \
TEST_RUNTIME=docker \
SANDBOX_USER_ID=$(id -u) \
SANDBOX_RUNTIME_CONTAINER_IMAGE=$image_name \
TEST_IN_CI=true \
@@ -297,7 +297,7 @@ jobs:
env:
CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}

# Run unit tests with the EventStream runtime Docker images as openhands user
# Run unit tests with the Docker runtime Docker images as openhands user
test_runtime_oh:
name: RT Unit Tests (openhands)
runs-on: ubuntu-latest
@@ -363,7 +363,7 @@ jobs:
image_name=ghcr.io/${{ github.repository_owner }}/runtime:${{ env.RELEVANT_SHA }}-${{ matrix.base_image }}
image_name=$(echo $image_name | tr '[:upper:]' '[:lower:]')
TEST_RUNTIME=eventstream \
TEST_RUNTIME=docker \
SANDBOX_USER_ID=$(id -u) \
SANDBOX_RUNTIME_CONTAINER_IMAGE=$image_name \
TEST_IN_CI=true \
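For context, a minimal sketch of how a test suite can key off the renamed `TEST_RUNTIME` variable used in the workflow above. This is not the repository's actual test fixture code; the helper name and fallback behaviour are assumptions.

```python
# Illustrative only: how a conftest-style helper might read TEST_RUNTIME.
import os


def selected_runtime() -> str:
    """Return the runtime under test; 'docker' replaces the old 'eventstream' value."""
    name = os.environ.get('TEST_RUNTIME', 'docker').lower()
    if name == 'eventstream':  # tolerate the pre-rename value
        name = 'docker'
    return name
```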
@@ -1,8 +1,8 @@


# 📦 Runtime EventStream
# 📦 Runtime Docker

Le Runtime EventStream d'OpenHands est le composant principal qui permet l'exécution sécurisée et flexible des actions des agents d'IA.
Le Runtime Docker d'OpenHands est le composant principal qui permet l'exécution sécurisée et flexible des actions des agents d'IA.
Il crée un environnement en bac à sable (sandbox) en utilisant Docker, où du code arbitraire peut être exécuté en toute sécurité sans risquer le système hôte.

## Pourquoi avons-nous besoin d'un runtime en bac à sable ?
@@ -163,7 +163,7 @@ Les options de configuration de base sont définies dans la section `[core]` du

- `runtime`
- Type : `str`
- Valeur par défaut : `"eventstream"`
- Valeur par défaut : `"docker"`
- Description : Environnement d'exécution

- `default_agent`
@@ -114,7 +114,7 @@ Pour créer un workflow d'évaluation pour votre benchmark, suivez ces étapes :
def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
config = AppConfig(
default_agent=metadata.agent_class,
runtime='eventstream',
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=SandboxConfig(
base_container_image='your_container_image',
@@ -1,8 +1,8 @@
以下是翻译后的内容:

# 📦 EventStream 运行时
# 📦 Docker 运行时

OpenHands EventStream 运行时是实现 AI 代理操作安全灵活执行的核心组件。
OpenHands Docker 运行时是实现 AI 代理操作安全灵活执行的核心组件。
它使用 Docker 创建一个沙盒环境,可以安全地运行任意代码而不会危及主机系统。

## 为什么我们需要沙盒运行时?
@@ -162,7 +162,7 @@

- `runtime`
- 类型: `str`
- 默认值: `"eventstream"`
- 默认值: `"docker"`
- 描述: 运行时环境

- `default_agent`
@@ -112,7 +112,7 @@ OpenHands 的主要入口点在 `openhands/core/main.py` 中。以下是它的
def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
config = AppConfig(
default_agent=metadata.agent_class,
runtime='eventstream',
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=SandboxConfig(
base_container_image='your_container_image',
4 changes: 2 additions & 2 deletions docs/modules/usage/architecture/runtime.md
@@ -1,6 +1,6 @@
# 📦 EventStream Runtime
# 📦 Docker Runtime

The OpenHands EventStream Runtime is the core component that enables secure and flexible execution of AI agent's action.
The OpenHands Docker Runtime is the core component that enables secure and flexible execution of AI agent's action.
It creates a sandboxed environment using Docker, where arbitrary code can be run safely without risking the host system.

## Why do we need a sandboxed runtime?
2 changes: 1 addition & 1 deletion docs/modules/usage/configuration-options.md
@@ -126,7 +126,7 @@ The core configuration options are defined in the `[core]` section of the `confi

- `runtime`
- Type: `str`
- Default: `"eventstream"`
- Default: `"docker"`
- Description: Runtime environment

- `default_agent`
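As an illustration of the renamed default, here is a short sketch assuming `AppConfig` is importable from `openhands.core.config` (as in the evaluation example below); it is not taken from these docs.

```python
# Sketch: the `runtime` option now defaults to 'docker' instead of 'eventstream'.
from openhands.core.config import AppConfig  # assumed import path

config = AppConfig()                  # no explicit runtime -> expected 'docker'
print(config.runtime)

config = AppConfig(runtime='docker')  # explicit override, same effect
```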
2 changes: 1 addition & 1 deletion docs/modules/usage/how-to/evaluation-harness.md
@@ -112,7 +112,7 @@ To create an evaluation workflow for your benchmark, follow these steps:
def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
config = AppConfig(
default_agent=metadata.agent_class,
runtime='eventstream',
runtime='docker',
max_iterations=metadata.max_iterations,
sandbox=SandboxConfig(
base_container_image='your_container_image',
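As a usage sketch for the `get_config` helper shown above: the stand-in metadata class and its field values are illustrative, not the harness's real `EvalMetadata`, and the snippet assumes the imports used above (pandas, `AppConfig`, `SandboxConfig`) are in scope.

```python
# Illustrative call shape for get_config(); not part of the evaluation harness.
from dataclasses import dataclass

import pandas as pd


@dataclass
class FakeEvalMetadata:           # stand-in for EvalMetadata, illustration only
    agent_class: str = 'CodeActAgent'
    max_iterations: int = 10


instance = pd.Series({'instance_id': 'example-001'})
config = get_config(instance, FakeEvalMetadata())  # AppConfig(runtime='docker', ...)
```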
6 changes: 0 additions & 6 deletions openhands/core/exceptions.py
@@ -98,12 +98,6 @@ def __init__(self, message='Operation was cancelled'):
super().__init__(message)


class CloudFlareBlockageError(Exception):
"""Exception raised when a request is blocked by CloudFlare."""

pass


# ============================================
# LLM function calling Exceptions
# ============================================
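For context on this deletion, here is a sketch of the pattern it retires, reconstructed from the lines removed in `openhands/llm/llm.py` below: the completion wrapper used to translate a Cloudflare-blocked `APIError` into `CloudFlareBlockageError`; after this commit the `APIError` simply propagates. The wrapper function name is illustrative.

```python
# Reconstructed from the removed code, for illustration only.
from litellm.exceptions import APIError


class CloudFlareBlockageError(Exception):
    """Exception raised when a request is blocked by CloudFlare (now removed)."""


def call_with_cloudflare_translation(completion_fn, *args, **kwargs):
    try:
        return completion_fn(*args, **kwargs)
    except APIError as e:
        if 'Attention Required! | Cloudflare' in str(e):
            raise CloudFlareBlockageError('Request blocked by CloudFlare') from e
        raise
```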
194 changes: 93 additions & 101 deletions openhands/llm/llm.py
@@ -27,7 +27,6 @@
from litellm.types.utils import CostPerToken, ModelResponse, Usage
from litellm.utils import create_pretrained_tokenizer

from openhands.core.exceptions import CloudFlareBlockageError
from openhands.core.logger import openhands_logger as logger
from openhands.core.message import Message
from openhands.llm.debug_mixin import DebugMixin
@@ -218,99 +217,86 @@ def wrapper(*args, **kwargs):
# log the entire LLM prompt
self.log_prompt(messages)

if self.is_caching_prompt_active():
# Anthropic-specific prompt caching
if 'claude-3' in self.config.model:
kwargs['extra_headers'] = {
'anthropic-beta': 'prompt-caching-2024-07-31',
}

# set litellm modify_params to the configured value
# True by default to allow litellm to do transformations like adding a default message, when a message is empty
# NOTE: this setting is global; unlike drop_params, it cannot be overridden in the litellm completion partial
litellm.modify_params = self.config.modify_params

try:
# Record start time for latency measurement
start_time = time.time()
# we don't support streaming here, thus we get a ModelResponse
resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)
# Record start time for latency measurement
start_time = time.time()

# Calculate and record latency
latency = time.time() - start_time
response_id = resp.get('id', 'unknown')
self.metrics.add_response_latency(latency, response_id)
# we don't support streaming here, thus we get a ModelResponse
resp: ModelResponse = self._completion_unwrapped(*args, **kwargs)

non_fncall_response = copy.deepcopy(resp)
if mock_function_calling:
assert len(resp.choices) == 1
assert mock_fncall_tools is not None
non_fncall_response_message = resp.choices[0].message
fn_call_messages_with_response = (
convert_non_fncall_messages_to_fncall_messages(
messages + [non_fncall_response_message], mock_fncall_tools
)
# Calculate and record latency
latency = time.time() - start_time
response_id = resp.get('id', 'unknown')
self.metrics.add_response_latency(latency, response_id)

non_fncall_response = copy.deepcopy(resp)
if mock_function_calling:
assert len(resp.choices) == 1
assert mock_fncall_tools is not None
non_fncall_response_message = resp.choices[0].message
fn_call_messages_with_response = (
convert_non_fncall_messages_to_fncall_messages(
messages + [non_fncall_response_message], mock_fncall_tools
)
fn_call_response_message = fn_call_messages_with_response[-1]
if not isinstance(fn_call_response_message, LiteLLMMessage):
fn_call_response_message = LiteLLMMessage(
**fn_call_response_message
)
resp.choices[0].message = fn_call_response_message

message_back: str = resp['choices'][0]['message']['content'] or ''
tool_calls: list[ChatCompletionMessageToolCall] = resp['choices'][0][
'message'
].get('tool_calls', [])
if tool_calls:
for tool_call in tool_calls:
fn_name = tool_call.function.name
fn_args = tool_call.function.arguments
message_back += f'\nFunction call: {fn_name}({fn_args})'

# log the LLM response
self.log_response(message_back)

# post-process the response first to calculate cost
cost = self._post_completion(resp)

# log for evals or other scripts that need the raw completion
if self.config.log_completions:
assert self.config.log_completions_folder is not None
log_file = os.path.join(
self.config.log_completions_folder,
# use the metric model name (for draft editor)
f'{self.metrics.model_name.replace("/", "__")}-{time.time()}.json',
)
fn_call_response_message = fn_call_messages_with_response[-1]
if not isinstance(fn_call_response_message, LiteLLMMessage):
fn_call_response_message = LiteLLMMessage(
**fn_call_response_message
)
resp.choices[0].message = fn_call_response_message

message_back: str = resp['choices'][0]['message']['content'] or ''
tool_calls: list[ChatCompletionMessageToolCall] = resp['choices'][0][
'message'
].get('tool_calls', [])
if tool_calls:
for tool_call in tool_calls:
fn_name = tool_call.function.name
fn_args = tool_call.function.arguments
message_back += f'\nFunction call: {fn_name}({fn_args})'

# log the LLM response
self.log_response(message_back)

# post-process the response first to calculate cost
cost = self._post_completion(resp)

# log for evals or other scripts that need the raw completion
if self.config.log_completions:
assert self.config.log_completions_folder is not None
log_file = os.path.join(
self.config.log_completions_folder,
# use the metric model name (for draft editor)
f'{self.metrics.model_name.replace("/", "__")}-{time.time()}.json',
)

# set up the dict to be logged
_d = {
'messages': messages,
'response': resp,
'args': args,
'kwargs': {k: v for k, v in kwargs.items() if k != 'messages'},
'timestamp': time.time(),
'cost': cost,
}

# if non-native function calling, save messages/response separately
if mock_function_calling:
# Overwrite response as non-fncall to be consistent with messages
_d['response'] = non_fncall_response

# Save fncall_messages/response separately
_d['fncall_messages'] = original_fncall_messages
_d['fncall_response'] = resp
with open(log_file, 'w') as f:
f.write(json.dumps(_d))

# set up the dict to be logged
_d = {
'messages': messages,
'response': resp,
'args': args,
'kwargs': {k: v for k, v in kwargs.items() if k != 'messages'},
'timestamp': time.time(),
'cost': cost,
}

# if non-native function calling, save messages/response separately
if mock_function_calling:
# Overwrite response as non-fncall to be consistent with messages
_d['response'] = non_fncall_response

# Save fncall_messages/response separately
_d['fncall_messages'] = original_fncall_messages
_d['fncall_response'] = resp
with open(log_file, 'w') as f:
f.write(json.dumps(_d))

return resp
except APIError as e:
if 'Attention Required! | Cloudflare' in str(e):
raise CloudFlareBlockageError(
'Request blocked by CloudFlare'
) from e
raise
return resp

self._completion = wrapper

@@ -414,6 +400,25 @@ def init_model_info(self):
):
self.config.max_output_tokens = self.model_info['max_tokens']

# Initialize function calling capability
# Check if model name is in our supported list
model_name_supported = (
self.config.model in FUNCTION_CALLING_SUPPORTED_MODELS
or self.config.model.split('/')[-1] in FUNCTION_CALLING_SUPPORTED_MODELS
or any(m in self.config.model for m in FUNCTION_CALLING_SUPPORTED_MODELS)
)

# Handle native_tool_calling user-defined configuration
if self.config.native_tool_calling is None:
self._function_calling_active = model_name_supported
elif self.config.native_tool_calling is False:
self._function_calling_active = False
else:
# try to enable native tool calling if supported by the model
self._function_calling_active = litellm.supports_function_calling(
model=self.config.model
)

def vision_is_active(self) -> bool:
with warnings.catch_warnings():
warnings.simplefilter('ignore')
@@ -455,24 +460,11 @@ def is_caching_prompt_active(self) -> bool:
)

def is_function_calling_active(self) -> bool:
# Check if model name is in our supported list
model_name_supported = (
self.config.model in FUNCTION_CALLING_SUPPORTED_MODELS
or self.config.model.split('/')[-1] in FUNCTION_CALLING_SUPPORTED_MODELS
or any(m in self.config.model for m in FUNCTION_CALLING_SUPPORTED_MODELS)
)
"""Returns whether function calling is supported and enabled for this LLM instance.
# Handle native_tool_calling user-defined configuration
if self.config.native_tool_calling is None:
return model_name_supported
elif self.config.native_tool_calling is False:
return False
else:
# try to enable native tool calling if supported by the model
supports_fn_call = litellm.supports_function_calling(
model=self.config.model
)
return supports_fn_call
The result is cached during initialization for performance.
"""
return self._function_calling_active

def _post_completion(self, response: ModelResponse) -> float:
"""Post-process the completion response.
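Because the interleaved diff above is hard to read, here is a condensed, self-contained sketch of the behaviour it introduces: the function-calling support check moves into initialization and `is_function_calling_active()` just returns the cached flag. `SimpleConfig`, `FnCallSupport`, and the model list are stand-ins, not the real OpenHands classes.

```python
# Condensed sketch of the caching pattern added above; illustration only.
import litellm

FUNCTION_CALLING_SUPPORTED_MODELS = ['claude-3-5-sonnet', 'gpt-4o']  # illustrative subset


class SimpleConfig:
    def __init__(self, model: str, native_tool_calling: bool | None = None):
        self.model = model
        self.native_tool_calling = native_tool_calling


class FnCallSupport:
    def __init__(self, config: SimpleConfig):
        self.config = config
        # Decided once at init time (mirrors the new init_model_info code) ...
        supported = (
            config.model in FUNCTION_CALLING_SUPPORTED_MODELS
            or config.model.split('/')[-1] in FUNCTION_CALLING_SUPPORTED_MODELS
            or any(m in config.model for m in FUNCTION_CALLING_SUPPORTED_MODELS)
        )
        if config.native_tool_calling is None:
            self._function_calling_active = supported
        elif config.native_tool_calling is False:
            self._function_calling_active = False
        else:
            # user forced it on: defer to litellm's model capability check
            self._function_calling_active = litellm.supports_function_calling(
                model=config.model
            )

    def is_function_calling_active(self) -> bool:
        # ... and only read back here instead of being recomputed per call.
        return self._function_calling_active
```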
6 changes: 4 additions & 2 deletions openhands/llm/retry_mixin.py
@@ -24,7 +24,7 @@ def retry_decorator(self, **kwargs):
A retry decorator with the parameters customizable in configuration.
"""
num_retries = kwargs.get('num_retries')
retry_exceptions = kwargs.get('retry_exceptions')
retry_exceptions: tuple = kwargs.get('retry_exceptions', ())
retry_min_wait = kwargs.get('retry_min_wait')
retry_max_wait = kwargs.get('retry_max_wait')
retry_multiplier = kwargs.get('retry_multiplier')
@@ -39,7 +39,9 @@ def before_sleep(retry_state):
before_sleep=before_sleep,
stop=stop_after_attempt(num_retries) | stop_if_should_exit(),
reraise=True,
retry=(retry_if_exception_type(retry_exceptions)),
retry=(
retry_if_exception_type(retry_exceptions)
), # retry only for these types
wait=wait_exponential(
multiplier=retry_multiplier,
min=retry_min_wait,
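A condensed sketch of the change above: `retry_exceptions` now defaults to an empty tuple, so `retry_if_exception_type(())` matches nothing and no retries happen unless exception types are explicitly configured. The builder function name is illustrative, not the mixin API.

```python
# Illustrative reduction of retry_decorator(); not the actual RetryMixin.
from tenacity import retry, retry_if_exception_type, stop_after_attempt, wait_exponential


def build_retry_decorator(num_retries: int = 3,
                          retry_exceptions: tuple = (),   # new default: retry nothing
                          retry_min_wait: int = 1,
                          retry_max_wait: int = 10,
                          retry_multiplier: float = 1.0):
    return retry(
        reraise=True,
        stop=stop_after_attempt(num_retries),
        retry=retry_if_exception_type(retry_exceptions),  # retry only for these types
        wait=wait_exponential(multiplier=retry_multiplier,
                              min=retry_min_wait, max=retry_max_wait),
    )


@build_retry_decorator(num_retries=2, retry_exceptions=(ConnectionError,))
def flaky_call() -> None:
    ...
```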
2 changes: 1 addition & 1 deletion tests/runtime/test_bash.py
@@ -1,4 +1,4 @@
"""Bash-related tests for the EventStreamRuntime, which connects to the ActionExecutor running in the sandbox."""
"""Bash-related tests for the DockerRuntime, which connects to the ActionExecutor running in the sandbox."""

import os
import time
2 changes: 1 addition & 1 deletion tests/runtime/test_browsing.py
@@ -1,4 +1,4 @@
"""Browsing-related tests for the EventStreamRuntime, which connects to the ActionExecutor running in the sandbox."""
"""Browsing-related tests for the DockerRuntime, which connects to the ActionExecutor running in the sandbox."""

from conftest import _close_test_runtime, _load_runtime

