Release 1.6 (#58)
Co-authored-by: Mason Davis <[email protected]>
Co-authored-by: Remy <[email protected]>
3 people authored Dec 23, 2023
1 parent 28bb109 commit 8683d33
Showing 18 changed files with 723 additions and 89 deletions.
50 changes: 33 additions & 17 deletions README.md
@@ -8,28 +8,43 @@


Gato, or GitHub Attack Toolkit, is an enumeration and attack tool that allows both
blue teamers and offensive security practitioners to evaluate the blast radius
of a compromised personal access token within a GitHub organization.
blue teamers and offensive security practitioners to identify and exploit
pipeline vulnerabilities within a GitHub organization's public and private
repositories.

The tool also allows searching for and thoroughly enumerating public
repositories that utilize self-hosted runners. GitHub recommends that
self-hosted runners only be utilized for private repositories, however, there
are thousands of organizations that utilize self-hosted runners.
The tool has post-exploitation features to leverage a compromised personal
access token in addition to enumeration features to identify poisoned pipeline
execution vulnerabilities against public repositories that use self-hosted GitHub Actions
runners.

## Version 1.5 Released
GitHub recommends that self-hosted runners only be utilized for private repositories; however, there are thousands of organizations that utilize self-hosted runners. Default configurations are often vulnerable, and Gato uses a mix of workflow file analysis and run-log analysis to identify potentially vulnerable repositories at scale.

Gato version 1.5 was released on June 27th, 2023!
## Version 1.6

#### New Features
Gato version 1.6 improves the public repository enumeration feature set.

* Secrets Enumeration
* Secrets Exfiltration
* API-only Enumeration
* JSON Output
* Improved Code Search
* GitHub Enterprise Server Support
* PAT Validation Only Mode
* Quality of life and UX improvements
Previously, Gato's code search functionality by default only looked for
yaml files that explicitly had "self-hosted" in the name. Now, the
code search functionality supports a SourceGraph query. This query has a
lower false negative rate and is not limited by GitHub's code search limit.

For example, the following command will identify public repositories that use
self-hosted runners:

`gato search --sourcegraph --output-text public_repos.txt`

This can be fed back into Gato's enumeration feature:

`gato enumerate --repositories public_repos.txt --output-json enumeration_results.json`

Additionally, the release contains several under-the-hood improvements that speed up the enumeration process. These include limiting redundant run-log downloads (the slowest part of Gato's enumeration process) and using the GraphQL API to download workflow files when enumerating an entire organization. Finally, Gato will use a heuristic to detect whether an attached runner is non-ephemeral. Most poisoned pipeline execution attacks require a non-ephemeral runner to be exploitable.

### New Features

* SourceGraph Search Functionality
* Improved Public Repository Enumeration Speed
* Improved Workflow File Analysis
* Non-ephemeral self-hosted runner detection
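
The GraphQL-based workflow download mentioned above could be batched roughly as follows. This is a minimal sketch and not Gato's `GqlQueries` implementation: the function name `build_workflow_queries`, the variable name `node_ids`, and the query text are assumptions made for illustration. The fields used (`nodes(ids:)`, `nameWithOwner`, `object(expression:)`, `entries`, `text`) are part of GitHub's GraphQL API, and the batch size of 100 follows the "100 nodes at a time" note in the repository enumeration code.

```python
# Hypothetical query text: fetch every file under .github/workflows/ for a
# batch of repositories identified by their GraphQL node IDs.
WORKFLOW_QUERY = """
query WorkflowFiles($node_ids: [ID!]!) {
  nodes(ids: $node_ids) {
    ... on Repository {
      nameWithOwner
      object(expression: "HEAD:.github/workflows/") {
        ... on Tree {
          entries {
            name
            object {
              ... on Blob {
                text
              }
            }
          }
        }
      }
    }
  }
}
"""


def build_workflow_queries(node_ids, batch_size=100):
    """Return one GraphQL request body per batch of repository node IDs."""
    queries = []
    for start in range(0, len(node_ids), batch_size):
        batch = node_ids[start:start + batch_size]
        queries.append(
            {"query": WORKFLOW_QUERY, "variables": {"node_ids": batch}}
        )
    return queries
```

Each returned dictionary would be posted to the `/graphql` endpoint as a JSON body; a non-200 response can then be handled by falling back to per-repository REST calls.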

## Who is it for?

@@ -44,6 +59,7 @@ Gato version 1.5 was released on June 27th, 2023!

* GitHub Classic PAT Privilege Enumeration
* GitHub Code Search API-based enumeration
* SourceGraph Search enumeration
* GitHub Action Run Log Parsing to identify Self-Hosted Runners
* Bulk Repo Sparse Clone Features
* GitHub Action Workflow Parsing
13 changes: 4 additions & 9 deletions gato/attack/attack.py
@@ -181,28 +181,23 @@ def __execute_and_wait_workflow(
"""

workflow_id = None
branch_created = self.api.create_branch(target_repo, branch)

if not branch_created:
Output.error("Failed to create branch!")
return False

if self.author_email and self.author_name:
rev_hash = self.api.commit_file(
rev_hash = self.api.commit_workflow(
target_repo,
branch,
f".github/workflows/{yaml_name}.yml",
yaml_contents.encode(),
f"{yaml_name}.yml",
commit_author=self.author_name,
commit_email=self.author_email,
message=commit_message
)
else:
rev_hash = self.api.commit_file(
rev_hash = self.api.commit_workflow(
target_repo,
branch,
f".github/workflows/{yaml_name}.yml",
yaml_contents.encode(),
f"{yaml_name}.yml",
message=commit_message
)

52 changes: 42 additions & 10 deletions gato/cli/cli.py
@@ -276,19 +276,35 @@ def search(args, parser):
http_proxy=args.http_proxy,
github_url=args.api_url
)
if args.sourcegraph:
if args.query and args.target:
parser.error(
f"{Fore.RED}[-]{Style.RESET_ALL} You cannot select an organization "
"with a custom query!"
)

if not (args.query or args.target):
parser.error(
f"{Fore.RED}[-]{Style.RESET_ALL} You must select an organization "
"or pass a custom query!"
)

if args.query:
gh_search_runner.use_search_api(
organization=args.target, query=args.query
results = gh_search_runner.use_sourcegraph_api(
organization=args.target,
query=args.query
)
else:
gh_search_runner.use_search_api(organization=args.target)
if not (args.query or args.target):
parser.error(
f"{Fore.RED}[-]{Style.RESET_ALL} You must select an organization "
"or pass a custom query!"
)
if args.query:
results = gh_search_runner.use_search_api(
organization=args.target,
query=args.query
)
else:
results = gh_search_runner.use_search_api(
organization=args.target
)

if results:
gh_search_runner.present_results(results, args.output_text)
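
The argument checks in `search` above can be read as a small validation rule. The sketch below restates it with plain `argparse`; it is an illustration, not Gato's code. The flag names `--target` and `--query` are assumptions inferred from `args.target` and `args.query`, and `build_parser` and `validate_search_args` are hypothetical helper names.

```python
import argparse


def build_parser():
    """Build a tiny parser with only the options relevant to the check."""
    parser = argparse.ArgumentParser(prog="search-sketch")
    parser.add_argument("--target", metavar="ORGANIZATION")  # assumed flag name
    parser.add_argument("--query", metavar="QUERY")          # assumed flag name
    parser.add_argument("--sourcegraph", "-sg", action="store_true")
    return parser


def validate_search_args(args):
    """Return an error message, or None when the combination is allowed."""
    # A custom Sourcegraph query cannot be combined with an organization.
    if args.sourcegraph and args.query and args.target:
        return "You cannot select an organization with a custom query!"
    # Both search backends need an organization or a query.
    if not (args.query or args.target):
        return "You must select an organization or pass a custom query!"
    return None
```

Returning a message instead of calling `parser.error` directly keeps the rule easy to unit test; a caller would pass any non-`None` result to `parser.error`.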


def configure_parser_general(parser):
@@ -563,3 +579,19 @@ def configure_parser_search(parser):
metavar="QUERY",
required=False
)

parser.add_argument(
"--sourcegraph", "-sg",
help="Use Sourcegraph API to search for self-hosted runners.",
required=False,
action="store_true"
)

parser.add_argument(
"--output-text", "-oT",
help=(
"Save enumeration output to text file."
),
metavar="TEXT_FILE",
type=StringType(256)
)
15 changes: 13 additions & 2 deletions gato/enumerate/enumerate.py
@@ -1,6 +1,7 @@
import logging

from gato.github import Api
from gato.github import GqlQueries
from gato.models import Repository, Organization
from gato.cli import Output
from gato.enumerate.repository import RepositoryEnum
@@ -173,12 +174,22 @@ def enumerate_organization(self, org: str):
f"the {organization.name} organization!"
)

Output.info("Querying and caching workflow YAML files!")
wf_queries = GqlQueries.get_workflow_ymls(enum_list)

for wf_query in wf_queries:
result = self.org_e.api.call_post('/graphql', wf_query)
# Sometimes we don't get a 200, fall back in this case.
if result.status_code == 200:
self.repo_e.construct_workflow_cache(result.json()['data']['nodes'])
else:
Output.warn("GraphQL query failed, will revert to REST workflow query for impacted repositories!")
for repo in enum_list:

Output.tabbed(
f"Enumerating: {Output.bright(repo.name)}!"
)
self.repo_e.enumerate_repository(repo)

self.repo_e.enumerate_repository(repo, large_org_enum=len(enum_list) > 100)
self.repo_e.enumerate_repository_secrets(repo)

Recommender.print_repo_secrets(
7 changes: 6 additions & 1 deletion gato/enumerate/recommender.py
@@ -140,7 +140,7 @@ def print_repo_runner_info(repository: Repository):
Output.result(
f"The repository contains a workflow: "
f"{Output.bright(repository.sh_workflow_names[0])} that "
"executes on self-hosted runners!"
"might execute on self-hosted runners!"
)

if repository.accessible_runners:
@@ -157,6 +157,11 @@ def print_repo_runner_info(repository: Repository):
f"{Output.bright(repository.accessible_runners[0].machine_name)}"
)

for runner in repository.accessible_runners:
if runner.non_ephemeral:
Output.owned("The repository contains a non-ephemeral self-hosted runner!")
break

if repository.runners:
Output.result(
f"The repository has {len(repository.runners)} repo-level"
63 changes: 52 additions & 11 deletions gato/enumerate/repository.py
@@ -21,6 +21,7 @@ def __init__(self, api: Api, skip_log: bool, output_yaml):
api (Api): GitHub API wrapper object.
"""
self.api = api
self.workflow_cache = {}
self.skip_log = skip_log
self.output_yaml = output_yaml

@@ -40,11 +41,12 @@ def __perform_runlog_enumeration(self, repository: Repository):
)

if wf_runs:
runner = Runner(
wf_runs[0]['runner_name'], wf_runs[0]['machine_name']
)
for wf_run in wf_runs:
runner = Runner(
wf_run['runner_name'], wf_run['machine_name'], non_ephemeral=wf_run['non_ephemeral']
)

repository.add_accessible_runner(runner)
repository.add_accessible_runner(runner)
runner_detected = True

return runner_detected
@@ -60,12 +62,15 @@ def __perform_yml_enumeration(self, repository: Repository):
list: List of workflows that execute on sh runner, empty otherwise.
"""
runner_wfs = []
ymls = self.api.retrieve_workflow_ymls(repository.name)

if repository.name in self.workflow_cache:
ymls = self.workflow_cache[repository.name]
else:
ymls = self.api.retrieve_workflow_ymls(repository.name)

for (wf, yml) in ymls:
try:
parsed_yml = WorkflowParser(yml, repository.name, wf)

self_hosted_jobs = parsed_yml.self_hosted()

if self_hosted_jobs:
@@ -79,12 +84,13 @@ def __perform_yml_enumeration(self, repository: Repository):
# At this point we only know the extension, so handle and
# ignore malformed yml files.
except Exception as parse_error:
print(parse_error)

print(f"{wf}: {str(parse_error)}")
logger.warning("Attempted to parse invalid yaml!")

return runner_wfs

def enumerate_repository(self, repository: Repository):
def enumerate_repository(self, repository: Repository, large_org_enum=False):
"""Enumerate a repository, and check everything relevant to
self-hosted runner abuse that the user has permissions to check.
@@ -119,15 +125,25 @@ def enumerate_repository(self, repository: Repository):

repository.set_runners(repo_runners)

if not self.skip_log and self.__perform_runlog_enumeration(repository):
runner_detected = True

workflows = self.__perform_yml_enumeration(repository)

if len(workflows) > 0:
repository.add_self_hosted_workflows(workflows)
runner_detected = True

if not self.skip_log:
# If we are enumerating an organization, only enumerate runlogs if
# the workflow suggests a sh_runner.
if large_org_enum and runner_detected:
self.__perform_runlog_enumeration(repository)

# If we are doing internal enum, get the logs, because coverage is
# more important here and it's ok if it takes time.
elif not repository.is_public() and self.__perform_runlog_enumeration(repository):
runner_detected = True
else:
runner_detected = self.__perform_runlog_enumeration(repository)

if runner_detected:
# Only display permissions (beyond having none) if runner is
# detected.
@@ -158,3 +174,28 @@ def enumerate_repository_secrets(

if org_secrets:
repository.set_accessible_org_secrets(org_secrets)

def construct_workflow_cache(self, yml_results):
"""Creates a cache of workflow yml files retrieved from graphQL. Since
graphql and REST do not have parity, we still need to use rest for most
enumeration calls. This method saves off all yml files, so during org
level enumeration if we perform yml enumeration the cached file is used
instead of making github REST requests.
Args:
yml_results (list): List of results from individual GraphQL queries
(100 nodes at a time).
"""
for result in yml_results:
owner = result['nameWithOwner']

self.workflow_cache[owner] = list()

if not result['object']:
continue

for yml_node in result['object']['entries']:
yml_name = yml_node['name']
if yml_name.lower().endswith('yml') or yml_name.lower().endswith('yaml'):
contents = yml_node['object']['text']
self.workflow_cache[owner].append((yml_name, contents))
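
To make the response shape that `construct_workflow_cache` expects more concrete, here is a standalone sketch of the same idea. The function name `build_workflow_cache` and the sample repositories are made up for illustration; the key names (`nameWithOwner`, `object`, `entries`, `name`, `text`) mirror those read by the method above.

```python
def build_workflow_cache(nodes):
    """Map 'owner/repo' to a list of (file name, file text) tuples."""
    cache = {}
    for node in nodes:
        repo_name = node["nameWithOwner"]
        cache[repo_name] = []

        # Repositories without a .github/workflows directory return no object.
        if not node["object"]:
            continue

        for entry in node["object"]["entries"]:
            name = entry["name"]
            # Keep only YAML files, matching the extension check above.
            if name.lower().endswith(("yml", "yaml")):
                cache[repo_name].append((name, entry["object"]["text"]))
    return cache


# Hypothetical sample shaped like the 'nodes' list of a GraphQL response.
sample_nodes = [
    {
        "nameWithOwner": "octo-org/app",
        "object": {
            "entries": [
                {"name": "ci.yml", "object": {"text": "on: push"}},
                {"name": "README.md", "object": {"text": "docs"}},
            ]
        },
    },
    {"nameWithOwner": "octo-org/empty", "object": None},
]

cache = build_workflow_cache(sample_nodes)
```

A repository with no workflow directory still gets an empty list, so a later lookup by repository name finds the entry and does not fall back to a REST request.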
1 change: 1 addition & 0 deletions gato/github/__init__.py
@@ -1,2 +1,3 @@
from .api import Api
from .gql_queries import GqlQueries
from .search import Search