merge the current development branch into master #81

Merged
merged 78 commits into from
Aug 10, 2024
f0109a3
Refactored code so that documentation is handled by separate class
DianaStrauss Jun 10, 2024
630f571
refactored code
DianaStrauss Jun 10, 2024
bef16c0
Adjusted prompt_engineer to create better prompts
DianaStrauss Jun 13, 2024
74062ff
Refactored documentation_handler.py to update .yaml file when it get …
DianaStrauss Jun 13, 2024
7591be3
Created SubmitHTTPMethod.py for better separation
DianaStrauss Jun 13, 2024
73fe5c4
Created Converter and parser for handeling yaml and json files
DianaStrauss Jun 13, 2024
430cb1f
Refactored converter and parser
DianaStrauss Jun 14, 2024
cef43e9
Added token count so that prompts are not too long -> WIP shorten pro…
DianaStrauss Jun 14, 2024
89956d7
Refactored code and added yamlFile.py
DianaStrauss Jun 17, 2024
e7ce9ae
Refactored code
DianaStrauss Jun 19, 2024
6051342
switch from RoundBasedUseCase to Agent
andreashappe Jun 24, 2024
9739d17
switch from RoundBasedUseCase to Agent
andreashappe Jun 24, 2024
dfd9dbe
switch from RoundBasedUseCase to Agent
andreashappe Jun 24, 2024
9f119d8
switch from RoundBasedUseCase to Agent
andreashappe Jun 24, 2024
15f7a64
rename RoundBasedUseCse into AutonomousUseCase
andreashappe Jun 24, 2024
69c0340
add `perform_round` to agent as abstract method
andreashappe Jun 24, 2024
6d66889
add type information to Agent
andreashappe Jun 24, 2024
be78320
Convert PrivescWithHintFile from UseCase to Agent
andreashappe Jun 24, 2024
a401e07
convert the privescLSE usecase from UseCase to Agent
andreashappe Jun 24, 2024
c75d374
move AutonomousUseCase into base package too
andreashappe Jun 24, 2024
2dc037d
add some TODO notes to prior to split-up
andreashappe Jun 24, 2024
995b199
Added simple scoring to prompt engineer
DianaStrauss Jul 4, 2024
cbafdf2
changed order of setuo methods in simple_openai_documentation
DianaStrauss Jul 4, 2024
34593e3
changed order of setuo methods in simple_openai_documentation
DianaStrauss Jul 4, 2024
b95dd31
changed order of setuo methods in simple_openai_documentation
DianaStrauss Jul 4, 2024
e267621
Addition of examples works with redocly
DianaStrauss Jul 9, 2024
56bc5ff
Added yaml file assistant
DianaStrauss Jul 9, 2024
7c681af
Can create openapi spec with examples
DianaStrauss Jul 9, 2024
120b09f
Cleaned up code
DianaStrauss Jul 12, 2024
2fcca09
Refactor code
DianaStrauss Jul 12, 2024
29aa192
Refactor code
DianaStrauss Jul 12, 2024
b2632ab
Cleaned up code
DianaStrauss Jul 12, 2024
3af909a
Cleaned up code
DianaStrauss Jul 12, 2024
b1f9886
Cleaned up code
DianaStrauss Jul 12, 2024
fc37bfd
start with agent/usecase rework
andreashappe Jul 16, 2024
7d75a2c
Fixes configurations and changes over:
Neverbolt Jul 16, 2024
d6a99d8
reintroduce agent.setup() and make more use-cases work again
andreashappe Jul 17, 2024
deddab7
reintroduce agent.setup()
andreashappe Jul 17, 2024
2f8edc3
explicitely define the UseCase (instead of annotation)
andreashappe Jul 17, 2024
1bc86b5
make LinuxPrivescWithHintFile a usecase
andreashappe Jul 17, 2024
48f7852
Changes over the UseCases to full classes
Neverbolt Jul 20, 2024
7f9f43a
Merge pull request #73 from ipa-lab/explorative_refactoring
andreashappe Jul 22, 2024
5915187
Merge branch 'main' of https://github.com/DianaStrauss/hackingBuddyGP…
andreashappe Jul 22, 2024
f84a556
Fixes `use_case` decorator return type
Neverbolt Jul 22, 2024
8e58cad
Merge branch 'development' into DianaStrauss-main
andreashappe Jul 22, 2024
bbb8133
update dependencies
andreashappe Jul 22, 2024
fd4323e
some simple renames
andreashappe Jul 22, 2024
ec3a0ee
Fixed attribute initialization of use_cases and transparent types
Neverbolt Jul 26, 2024
0babd39
Refactored code and fixed import bugs in simple_web_api_testing and s…
DianaStrauss Aug 1, 2024
09c8e3d
Merge pull request #74 from ipa-lab/DianaStrauss-main
andreashappe Aug 1, 2024
e289ad6
update readme.md a bit
andreashappe Aug 1, 2024
653a119
Update README.md
andreashappe Aug 1, 2024
7dd36ea
Update README.md
andreashappe Aug 1, 2024
99d6134
introduct before_run/after_run hooks and use them within the hintfile…
andreashappe Aug 2, 2024
676a960
re-do the LinuxPrivescWithLSE use-case to directly call agents
andreashappe Aug 2, 2024
58e144c
Adjusted code for better testing of web_api_documentation
DianaStrauss Aug 2, 2024
9a14af2
Adjusted code for better testing of web_api_documentation
DianaStrauss Aug 2, 2024
fb05d87
added tolerance for web_api_testing
DianaStrauss Aug 2, 2024
45832a5
Update README.md
andreashappe Aug 2, 2024
71e5eb8
Merge remote-tracking branch 'refs/remotes/origin/web_api_testing' in…
DianaStrauss Aug 2, 2024
e4a2285
Replaced spacy with nltk as tokenizer for shortening prompts
DianaStrauss Aug 2, 2024
d2134d8
finished mocking web_api_documentation testing
Aug 5, 2024
7c0b84a
finished adding simple mock tests for web_api_testing
Aug 5, 2024
38bfbc0
Merge pull request #76 from ipa-lab/development_without_spacy
andreashappe Aug 5, 2024
3e52a55
also run testcases when changes to development happen
andreashappe Aug 5, 2024
a337520
fixed web_api_documentation test and removed unnecessary imports
DianaStrauss Aug 6, 2024
947c8a7
Added test for prompt engineer
DianaStrauss Aug 6, 2024
1640538
Added optional dependencies to .toml file for testing, instructions w…
DianaStrauss Aug 6, 2024
86cf648
Changed name of documentation_handler of web_api as there were other …
DianaStrauss Aug 6, 2024
44af818
Added tests for llm_handler and response_handler
DianaStrauss Aug 6, 2024
9bdd6bd
Added tests for openapi converter and parser
DianaStrauss Aug 6, 2024
70a9018
add upcoming talk of manuel
andreashappe Aug 6, 2024
e4ef23a
optimizeded code
DianaStrauss Aug 6, 2024
d013162
adjusted tests
DianaStrauss Aug 6, 2024
88fcf70
fixed wrong import
DianaStrauss Aug 6, 2024
033b598
Merge pull request #80 from ipa-lab/development_without_spacy
andreashappe Aug 6, 2024
ea56264
make lse-based example work
andreashappe Aug 6, 2024
aafabf1
Merge branch 'development' of github.com:ipa-lab/hackingBuddyGPT into…
andreashappe Aug 6, 2024
4 changes: 2 additions & 2 deletions .github/workflows/python-app.yml
Original file line number Diff line number Diff line change
Expand Up @@ -5,9 +5,9 @@ name: Python application

 on:
   push:
-    branches: [ "main" ]
+    branches: [ "main", "development" ]
   pull_request:
-    branches: [ "main" ]
+    branches: [ "main", "development" ]

 permissions:
   contents: read
Expand Down
3 changes: 3 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -11,3 +11,6 @@ src/hackingBuddyGPT.egg-info/
build/
dist/
.coverage
src/hackingBuddyGPT/usecases/web_api_testing/openapi_spec/
src/hackingBuddyGPT/usecases/web_api_testing/converted_files/
/src/hackingBuddyGPT/usecases/web_api_testing/utils/openapi_spec/
42 changes: 27 additions & 15 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,11 +8,18 @@ HackingBuddyGPT helps security researchers use LLMs to discover new attack vecto

We aim to become **THE go-to framework for security researchers** and pen-testers interested in using LLMs or LLM-based autonomous agents for security testing. To aid their experiments, we also offer re-usable [linux priv-esc benchmarks](https://github.com/ipa-lab/benchmark-privesc-linux) and publish all our findings as open-access reports.

How can LLMs aid or even emulate hackers? Threat actors are [already using LLMs](https://arxiv.org/abs/2307.00691). To better protect against this new threat, we must learn more about LLMs' capabilities and help blue teams prepare for them.
If you want to use hackingBuddyGPT and need help selecting the best LLM for your tasks, [we have a paper comparing multiple LLMs](https://arxiv.org/abs/2310.11409).

**[Join us](https://discord.gg/vr4PhSM8yN) / Help us, more people need to be involved in the future of LLM-assisted pen-testing:**
## hackingBuddyGPT in the News

To ground our research in reality, we performed a comprehensive analysis into [understanding hackers' work](https://arxiv.org/abs/2308.07057). There seems to be a mismatch between some academic research and the daily work of penetration testers; please help us create more visibility for this issue by citing this paper (if suitable and fitting).
- **upcoming** 2024-11-20: [Manuel Reinsperger](https://www.github.com/neverbolt) will present hackingBuddyGPT at the [European Symposium on Security and Artificial Intelligence (ESSAI)](https://essai-conference.eu/)
- 2024-07-26: The [GitHub Accelerator Showcase](https://github.blog/open-source/maintainers/github-accelerator-showcase-celebrating-our-second-cohort-and-whats-next/) features hackingBuddyGPT
- 2024-07-24: [Juergen](https://github.com/citostyle) speaks at [Open Source + mezcal night @ GitHub HQ](https://lu.ma/bx120myg)
- 2024-05-23: hackingBuddyGPT is part of [GitHub Accelerator 2024](https://github.blog/news-insights/company-news/2024-github-accelerator-meet-the-11-projects-shaping-open-source-ai/)
- 2023-12-05: [Andreas](https://github.com/andreashappe) presented hackingBuddyGPT at FSE'23 in San Francisco ([paper](https://arxiv.org/abs/2308.00121), [video](https://2023.esec-fse.org/details/fse-2023-ideas--visions-and-reflections/9/Towards-Automated-Software-Security-Testing-Augmenting-Penetration-Testing-through-L))
- 2023-09-20: [Andreas](https://github.com/andreashappe) presented preliminary results at [FIRST AI Security SIG](https://www.first.org/global/sigs/ai-security/)

## Original Paper

hackingBuddyGPT is described in [Getting pwn'd by AI: Penetration Testing with Large Language Models](https://arxiv.org/abs/2308.00121); help us by citing it through:

Expand All @@ -29,7 +36,6 @@ hackingBuddyGPT is described in [Getting pwn'd by AI: Penetration Testing with L
}
~~~


## Getting help

If you need help or want to chat about using AI for security or education, please join our [discord server where we talk about all things AI + Offensive Security](https://discord.gg/vr4PhSM8yN)!
Expand Down Expand Up @@ -74,12 +80,10 @@ The following would create a new (minimal) linux privilege-escalation agent. Thr
template_dir = pathlib.Path(__file__).parent
template_next_cmd = Template(filename=str(template_dir / "next_cmd.txt"))

@use_case("minimal_linux_privesc", "Showcase Minimal Linux Priv-Escalation")
@dataclass

class MinimalLinuxPrivesc(Agent):

conn: SSHConnection = None

_sliding_history: SlidingCliHistory = None

def init(self):
Expand All @@ -89,28 +93,33 @@ class MinimalLinuxPrivesc(Agent):
self.add_capability(SSHTestCredential(conn=self.conn))
self._template_size = self.llm.count_tokens(template_next_cmd.source)

def perform_round(self, turn):
got_root : bool = False
def perform_round(self, turn: int) -> bool:
got_root: bool = False

with self.console.status("[bold green]Asking LLM for a new command..."):
with self._log.console.status("[bold green]Asking LLM for a new command..."):
# get as much history as fits into the target context size
history = self._sliding_history.get_history(self.llm.context_size - llm_util.SAFETY_MARGIN - self._template_size)

# get the next command from the LLM
answer = self.llm.get_response(template_next_cmd, capabilities=self.get_capability_block(), history=history, conn=self.conn)
cmd = llm_util.cmd_output_fixer(answer.result)

with self.console.status("[bold green]Executing that command..."):
self.console.print(Panel(answer.result, title="[bold cyan]Got command from LLM:"))
result, got_root = self.get_capability(cmd.split(" ", 1)[0])(cmd)
with self._log.console.status("[bold green]Executing that command..."):
self._log.console.print(Panel(answer.result, title="[bold cyan]Got command from LLM:"))
result, got_root = self.get_capability(cmd.split(" ", 1)[0])(cmd)

# log and output the command and its result
self.log_db.add_log_query(self._run_id, turn, cmd, result, answer)
self._log.log_db.add_log_query(self._log.run_id, turn, cmd, result, answer)
self._sliding_history.add_command(cmd, result)
self.console.print(Panel(result, title=f"[bold cyan]{cmd}"))
self._log.console.print(Panel(result, title=f"[bold cyan]{cmd}"))

# if we got root, we can stop the loop
return got_root


@use_case("Showcase Minimal Linux Priv-Escalation")
class MinimalLinuxPrivescUseCase(AutonomousAgentUseCase[MinimalLinuxPrivesc]):
pass
~~~
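The history budget used in `perform_round` is the model's context size minus a safety margin and the template's own token count. The following is a minimal sketch of that budgeting idea, not the project's `SlidingCliHistory`: the `SlidingHistory` class, the `history_budget` helper and the whitespace-based `count_tokens` are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


def count_tokens(text: str) -> int:
    """Hypothetical token counter: one token per whitespace-separated word."""
    return len(text.split())


def history_budget(context_size: int, safety_margin: int, template_size: int) -> int:
    """Tokens left for history after reserving the template and a safety margin."""
    return max(0, context_size - safety_margin - template_size)


@dataclass
class SlidingHistory:
    """Keeps (command, result) pairs and returns the newest ones that fit a budget."""
    entries: List[Tuple[str, str]] = field(default_factory=list)

    def add_command(self, cmd: str, result: str) -> None:
        self.entries.append((cmd, result))

    def get_history(self, token_budget: int) -> str:
        kept: List[str] = []
        used = 0
        # walk from newest to oldest and stop once the budget would be exceeded
        for cmd, result in reversed(self.entries):
            block = f"$ {cmd}\n{result}"
            cost = count_tokens(block)
            if used + cost > token_budget:
                break
            kept.append(block)
            used += cost
        # restore chronological order for the prompt
        return "\n".join(reversed(kept))
```

Dropping the oldest entries first keeps the most recent command output in the prompt, which is usually what the next decision depends on.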

The corresponding `next_cmd.txt` template would be:
Expand Down Expand Up @@ -170,6 +179,9 @@ wintermute.py: error: the following arguments are required: {linux_privesc,windo

# start wintermute, i.e., attack the configured virtual machine
$ python wintermute.py minimal_linux_privesc

# install dependencies for testing if you want to run the tests
$ pip install .[testing]
~~~

## Publications about hackingBuddyGPT
Expand Down
10 changes: 9 additions & 1 deletion pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -29,11 +29,14 @@ dependencies = [
'requests == 2.32.0',
'rich == 13.7.1',
'tiktoken == 0.6.0',
'instructor == 1.2.2',
'instructor == 1.3.5',
'PyYAML == 6.0.1',
'python-dotenv == 1.0.1',
'pypsexec == 0.3.0',
'pydantic == 2.8.2',
'openai == 1.28.0',
'BeautifulSoup4',
'nltk'
]

[project.urls]
Expand All @@ -54,6 +57,11 @@ pythonpath = "src"
addopts = [
"--import-mode=importlib",
]
[project.optional-dependencies]
testing = [
'pytest',
'pytest-mock'
]

[project.scripts]
wintermute = "hackingBuddyGPT.cli.wintermute:main"
Expand Down
11 changes: 10 additions & 1 deletion src/hackingBuddyGPT/capabilities/http_request.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,16 @@ def __call__(self,
) -> str:
if body is not None and body_is_base64:
body = base64.b64decode(body).decode()

if self.host[-1] != "/":
path = "/" + path
resp = self._client.request(
method,
self.host + path,
params=query,
data=body,
headers=headers,
allow_redirects=self.follow_redirects,
)
try:
resp = self._client.request(
method,
Expand Down
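The added check prepends a slash whenever the host lacks a trailing one, so a path that already starts with `/` would end up with a double slash. A hedged sketch of a normalising alternative follows; `join_url` is a hypothetical helper name and is not part of this PR.

```python
def join_url(host: str, path: str) -> str:
    """Join host and path with exactly one slash between them."""
    # strip slashes on both sides of the seam, then add a single one back
    return host.rstrip("/") + "/" + path.lstrip("/")
```

The same helper covers all four combinations of trailing and leading slashes, which avoids branching on `host[-1]`.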
78 changes: 78 additions & 0 deletions src/hackingBuddyGPT/capabilities/submit_http_method.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,78 @@
import base64
from dataclasses import dataclass, field
from typing import Set, Dict, Callable, Literal, Optional
import inspect

import requests
from pydantic import create_model, BaseModel

from . import Capability


@dataclass
class SubmitHTTPMethod(Capability):
    http_methods_format: str
    valid_http_methods: Set[str]
    _client = requests.Session()
    host: str
    follow_redirects: bool = False
    success_function: Callable[[], None] = None

    submitted_valid_http_methods: Set[str] = field(default_factory=set, init=False)

    def describe(self) -> str:
        return f"Submits an HTTP method, which is the goal of all activities. The HTTP methods format is {self.http_methods_format}. If you find a flag following that format that you have not yet submitted, then the most important first step is to submit it, ignoring all other possibilities of further action"

    def to_model(self) -> BaseModel:
        """
        Converts the parameters of the `__call__` function of the capability to a pydantic model, that can be used to
        interface with an LLM using e.g. instructor or the openAI function calling API.
        The model will have the same name as the capability class and will have the same fields as the `__call__`,
        the `__call__` method can then be accessed by calling the `execute` method of the model.
        """
        sig = inspect.signature(self.__call__)
        fields = {param: (param_info.annotation, ...) for param, param_info in sig.parameters.items()}
        model_type = create_model(self.__class__.__name__, __doc__=self.describe(), **fields)

        def execute(model):
            m = model.dict()
            return self(**m)

        model_type.execute = execute

        return model_type

    def __call__(self, method: Literal["GET", "HEAD", "POST", "PUT", "DELETE", "OPTIONS", "PATCH"],
                 path: str,
                 query: Optional[str] = None,
                 body: Optional[str] = None,
                 body_is_base64: Optional[bool] = False,
                 headers: Optional[Dict[str, str]] = None
                 ) -> str:

        if body is not None and body_is_base64:
            body = base64.b64decode(body).decode()

        resp = self._client.request(
            method,
            self.host + path,
            params=query,
            data=body,
            headers=headers,
            allow_redirects=self.follow_redirects,
        )
        try:
            resp.raise_for_status()
        except requests.exceptions.HTTPError as e:
            return str(e)

        headers = "\r\n".join(f"{k}: {v}" for k, v in resp.headers.items())
        if len(self.submitted_valid_http_methods) == len(self.valid_http_methods):
            if self.success_function is not None:
                self.success_function()
            else:
                return "All methods submitted, congratulations"
        # turn the response into "plain text format" for responding to the prompt
        return f"HTTP/1.1 {resp.status_code} {resp.reason}\r\n{headers}\r\n\r\n{resp.text}"
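`to_model` derives a model from the parameters of `__call__` and attaches an `execute` method that forwards the model's fields back to the capability. The sketch below illustrates the same introspection idea using only the standard library: it swaps pydantic's `create_model` for `dataclasses.make_dataclass`, so it performs no validation. `Echo`, `fields_from_call` and `to_dataclass` are hypothetical names used only for this example.

```python
import inspect
from dataclasses import asdict, make_dataclass
from typing import Optional


def fields_from_call(func):
    """Return (name, annotation) pairs for each parameter of func."""
    sig = inspect.signature(func)
    return [(name, param.annotation) for name, param in sig.parameters.items()]


def to_dataclass(capability):
    """Build a dataclass whose fields mirror the capability's __call__ parameters."""
    # inspecting the bound method drops `self`, leaving only the call arguments
    model_type = make_dataclass(
        type(capability).__name__ + "Model",
        fields_from_call(capability.__call__),
    )

    def execute(model):
        # forward the model's field values to the capability as keyword arguments
        return capability(**asdict(model))

    model_type.execute = execute
    return model_type


class Echo:
    """Hypothetical capability used only for this example."""

    def __call__(self, method: str, path: str, body: Optional[str] = None) -> str:
        return f"{method} {path} {body}"
```

As in the original, every derived field is required, even when the `__call__` parameter has a default.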

44 changes: 44 additions & 0 deletions src/hackingBuddyGPT/capabilities/yamlFile.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
from dataclasses import dataclass, field
from typing import Tuple, List

import yaml

from . import Capability

@dataclass
class YAMLFile(Capability):

def describe(self) -> str:
return "Takes a Yaml file and updates it with the given information"

def __call__(self, yaml_str: str) -> str:
"""
Updates a YAML string based on provided inputs and returns the updated YAML string.

Args:
yaml_str (str): Original YAML content in string form.
updates (dict): A dictionary representing the updates to be applied.

Returns:
str: Updated YAML content as a string.
"""
try:
# Load the YAML content from string
data = yaml.safe_load(yaml_str)

print(f'Updates:{yaml_str}')

# Apply updates from the updates dictionary
#for key, value in updates.items():
# if key in data:
# data[key] = value
# else:
# print(f"Warning: Key '{key}' not found in the original data. Adding new key.")
# data[key] = value
#
## Convert the updated dictionary back into a YAML string
#updated_yaml_str = yaml.safe_dump(data, sort_keys=False)
#return updated_yaml_str
except yaml.YAMLError as e:
print(f"Error processing YAML data: {e}")
return "None"
9 changes: 5 additions & 4 deletions src/hackingBuddyGPT/cli/wintermute.py
Original file line number Diff line number Diff line change
Expand Up @@ -8,12 +8,13 @@ def main():
parser = argparse.ArgumentParser()
subparser = parser.add_subparsers(required=True)
for name, use_case in use_cases.items():
use_case.build_parser(subparser.add_parser(
subb = subparser.add_parser(
name=use_case.name,
help=use_case.description
))

parsed = parser.parse_args(sys.argv[1:])
)
use_case.build_parser(subb)
x= sys.argv[1:]
parsed = parser.parse_args(x)
instance = parsed.use_case(parsed)
instance.init()
instance.run()
Expand Down
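`main()` registers one sub-command per use case and then reads the selected use case from the parsed namespace. A minimal sketch of that dispatch pattern follows. It assumes that `build_parser` records the handler through `set_defaults`, which this diff does not show; the `UseCase` class, the `--max_turns` option and the `parse` function are hypothetical stand-ins.

```python
import argparse
from typing import Dict, List


class UseCase:
    """Hypothetical stand-in for a registered use case."""

    def __init__(self, name: str, description: str):
        self.name = name
        self.description = description

    def build_parser(self, parser: argparse.ArgumentParser) -> None:
        # each use case adds its own options and records itself as the handler
        parser.add_argument("--max_turns", type=int, default=10)
        parser.set_defaults(use_case=self)


def parse(argv: List[str], use_cases: Dict[str, UseCase]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(required=True)
    for use_case in use_cases.values():
        sub = subparsers.add_parser(name=use_case.name, help=use_case.description)
        use_case.build_parser(sub)
    return parser.parse_args(argv)
```

Passing `argv` explicitly, rather than reading `sys.argv` inside the function, keeps the parser easy to exercise from tests.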
35 changes: 26 additions & 9 deletions src/hackingBuddyGPT/usecases/agents.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,18 +4,33 @@
from rich.panel import Panel
from typing import Dict

from hackingBuddyGPT.usecases.base import Logger
from hackingBuddyGPT.utils import llm_util

from hackingBuddyGPT.capabilities.capability import Capability, capabilities_to_simple_text_handler
from .common_patterns import RoundBasedUseCase
from hackingBuddyGPT.utils.openai.openai_llm import OpenAIConnection


@dataclass
class Agent(RoundBasedUseCase, ABC):
class Agent(ABC):
_capabilities: Dict[str, Capability] = field(default_factory=dict)
_default_capability: Capability = None
_log: Logger = None

llm: OpenAIConnection = None

def init(self):
super().init()
pass

def before_run(self):
pass

def after_run(self):
pass

# callback
@abstractmethod
def perform_round(self, turn: int) -> bool:
pass

def add_capability(self, cap: Capability, default: bool = False):
self._capabilities[cap.get_name()] = cap
Expand All @@ -29,6 +44,7 @@ def get_capability_block(self) -> str:
capability_descriptions, _parser = capabilities_to_simple_text_handler(self._capabilities)
return "You can either\n\n" + "\n".join(f"- {description}" for description in capability_descriptions.values())


@dataclass
class AgentWorldview(ABC):

Expand All @@ -40,6 +56,7 @@ def to_template(self):
def update(self, capability, cmd, result):
pass


class TemplatedAgent(Agent):

_state: AgentWorldview = None
Expand All @@ -59,7 +76,7 @@ def set_template(self, template:str):
def perform_round(self, turn:int) -> bool:
got_root : bool = False

with self.console.status("[bold green]Asking LLM for a new command..."):
with self._log.console.status("[bold green]Asking LLM for a new command..."):
# TODO output/log state
options = self._state.to_template()
options.update({
Expand All @@ -70,16 +87,16 @@ def perform_round(self, turn:int) -> bool:
answer = self.llm.get_response(self._template, **options)
cmd = llm_util.cmd_output_fixer(answer.result)

with self.console.status("[bold green]Executing that command..."):
self.console.print(Panel(answer.result, title="[bold cyan]Got command from LLM:"))
with self._log.console.status("[bold green]Executing that command..."):
self._log.console.print(Panel(answer.result, title="[bold cyan]Got command from LLM:"))
capability = self.get_capability(cmd.split(" ", 1)[0])
result, got_root = capability(cmd)

# log and output the command and its result
self.log_db.add_log_query(self._run_id, turn, cmd, result, answer)
self._log.log_db.add_log_query(self._log.run_id, turn, cmd, result, answer)
self._state.update(capability, cmd, result)
# TODO output/log new state
self.console.print(Panel(result, title=f"[bold cyan]{cmd}"))
self._log.console.print(Panel(result, title=f"[bold cyan]{cmd}"))

# if we got root, we can stop the loop
return got_root
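With this change `Agent` no longer inherits the round loop; it only exposes the `before_run` and `after_run` hooks and the abstract `perform_round`. How a caller might drive those callbacks can be sketched as follows. The `run_agent` function and `CountingAgent` class are assumptions for illustration and do not reproduce the project's `AutonomousAgentUseCase`.

```python
from abc import ABC, abstractmethod
from typing import List


class Agent(ABC):
    """Reduced version of the agent interface: two hooks and one abstract callback."""

    def before_run(self) -> None:
        pass

    def after_run(self) -> None:
        pass

    @abstractmethod
    def perform_round(self, turn: int) -> bool:
        ...


def run_agent(agent: Agent, max_turns: int) -> bool:
    """Hypothetical driver: call the hooks around a bounded loop of rounds."""
    agent.before_run()
    done = False
    try:
        for turn in range(1, max_turns + 1):
            done = agent.perform_round(turn)
            if done:
                break
    finally:
        # run the cleanup hook even if a round raises
        agent.after_run()
    return done


class CountingAgent(Agent):
    """Toy agent that reports success on its third round."""

    def __init__(self) -> None:
        self.events: List[str] = []

    def before_run(self) -> None:
        self.events.append("before")

    def after_run(self) -> None:
        self.events.append("after")

    def perform_round(self, turn: int) -> bool:
        self.events.append(f"round {turn}")
        return turn == 3
```

Keeping the loop outside the agent lets one use case call several agents in sequence, which is presumably what the reworked LSE-based use case relies on.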