Release 1.6 (#58)

Co-authored-by: Mason Davis <[email protected]>
Co-authored-by: Remy <[email protected]>
praetorian-inc · Dec 23, 2023 · 8683d33
1 parent 28bb109

Showing 18 changed files with 723 additions and 89 deletions.
diff --git a/README.md b/README.md
@@ -8,28 +8,43 @@
 
 
 Gato, or GitHub Attack Toolkit, is an enumeration and attack tool that allows both 
-blue teamers and offensive security practitioners to evaluate the blast radius 
-of a compromised personal access token within a GitHub organization.
+blue teamers and offensive security practitioners to identify and exploit 
+pipeline vulnerabilities within a GitHub organization's public and private 
+repositories.
 
-The tool also allows searching for and thoroughly enumerating public
-repositories that utilize self-hosted runners. GitHub recommends that
-self-hosted runners only be utilized for private repositories, however, there
-are thousands of organizations that utilize self-hosted runners.
+The tool has post-exploitation features to leverage a compromised personal
+access token in addition to enumeration features to identify poisoned pipeline
+execution vulnerabilities against public repositories that use self-hosted GitHub Actions 
+runners.
 
-## Version 1.5 Released
+GitHub recommends that self-hosted runners only be utilized for private repositories, however, there are thousands of organizations that utilize self-hosted runners. Default configurations are often vulnerable, and Gato uses a mix of workflow file analysis and run-log analysis to identify potentially vulnerable repositories at scale.
 
-Gato version 1.5 was released on June 27th, 2023!
+## Version 1.6
 
-#### New Features
+Gato version 1.6 improves the public repository enumeration feature set.
 
-* Secrets Enumeration
-* Secrets Exfiltration
-* API-only Enumeration
-* JSON Output
-* Improved Code Search
-* GitHub Enterprise Server Support
-* PAT Validation Only Mode
-* Quality of life and UX improvements
+Previously, Gato's code search functionality by default only looked for
+yaml files that explicitly had "self-hosted" in the name. Now, the
+code search functionality supports a SourceGraph query. This query has a 
+lower false negative rate and is not limited by GitHub's code search limit.
+
+For example, the following query will identify public repositories that use 
+self-hosted runners:
+
+`gato search --sourcegraph --output-text public_repos.txt`
+
+This can be fed back into Gato's enumeration feature:
+
+`gato enumerate --repositories public_repos.txt --output-json enumeration_results.json`
+
+Additionally the release contains several improvements under the hood to speed up the enumeration process. This includes changes to limit redundant run-log downloads (which are the slowest part of Gato's enumeration process) and using the GraphQL API to download workflow files when enumerating an entire organization. Finally, Gato will use a heuristic to detect if an attached runner is non-ephemeral. Most poisoned pipeline execution attacks require a non-ephemeral runner in order to exploit.
+
+### New Features
+
+* SourceGraph Search Functionality
+* Improved Public Repository Enumeration Speed
+* Improved Workflow File Analysis
+* Non-ephemeral self-hosted runner detection
 
 ## Who is it for?
 
@@ -44,6 +59,7 @@ Gato version 1.5 was released on June 27th, 2023!
 
 * GitHub Classic PAT Privilege Enumeration
 * GitHub Code Search API-based enumeration
+* SourceGraph Search enumeration
 * GitHub Action Run Log Parsing to identify Self-Hosted Runners
 * Bulk Repo Sparse Clone Features
 * GitHub Action Workflow Parsing

diff --git a/gato/attack/attack.py b/gato/attack/attack.py
@@ -181,28 +181,23 @@ def __execute_and_wait_workflow(
         """
 
         workflow_id = None
-        branch_created = self.api.create_branch(target_repo, branch)
-
-        if not branch_created:
-            Output.error("Failed to create branch!")
-            return False
 
         if self.author_email and self.author_name:
-            rev_hash = self.api.commit_file(
+            rev_hash = self.api.commit_workflow(
                 target_repo,
                 branch,
-                f".github/workflows/{yaml_name}.yml",
                 yaml_contents.encode(),
+                f"{yaml_name}.yml",
                 commit_author=self.author_name,
                 commit_email=self.author_email,
                 message=commit_message
             )
         else:
-            rev_hash = self.api.commit_file(
+            rev_hash = self.api.commit_workflow(
                 target_repo,
                 branch,
-                f".github/workflows/{yaml_name}.yml",
                 yaml_contents.encode(),
+                f"{yaml_name}.yml",
                 message=commit_message
             )
 

diff --git a/gato/cli/cli.py b/gato/cli/cli.py
@@ -276,19 +276,35 @@ def search(args, parser):
         http_proxy=args.http_proxy,
         github_url=args.api_url
     )
+    if args.sourcegraph:
+        if args.query and args.target:
+            parser.error(
+                f"{Fore.RED}[-]{Style.RESET_ALL} You cannot select an organization "
+                "with a custom query!"
+            )
 
-    if not (args.query or args.target):
-        parser.error(
-            f"{Fore.RED}[-]{Style.RESET_ALL} You must select an organization "
-            "or pass a custom query!."
-        )
-
-    if args.query:
-        gh_search_runner.use_search_api(
-            organization=args.target, query=args.query
+        results = gh_search_runner.use_sourcegraph_api(
+            organization=args.target,
+            query=args.query
         )
     else:
-        gh_search_runner.use_search_api(organization=args.target)
+        if not (args.query or args.target):
+            parser.error(
+                f"{Fore.RED}[-]{Style.RESET_ALL} You must select an organization "
+                "or pass a custom query!."
+            )
+        if args.query:
+            results = gh_search_runner.use_search_api(
+                organization=args.target,
+                query=args.query
+            )
+        else:
+            results = gh_search_runner.use_search_api(
+                organization=args.target
+            )
+
+    if results:
+        gh_search_runner.present_results(results, args.output_text)
 
 
 def configure_parser_general(parser):
@@ -563,3 +579,19 @@ def configure_parser_search(parser):
         metavar="QUERY",
         required=False
     )
+
+    parser.add_argument(
+        "--sourcegraph", "-sg",
+        help="Use Sourcegraph API to search for self-hosted runners.",
+        required=False,
+        action="store_true"
+    )
+
+    parser.add_argument(
+        "--output-text", "-oT",
+        help=(
+            "Save enumeration output to text file."
+        ),
+        metavar="TEXT_FILE",
+        type=StringType(256)
+    )
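
The branching added to `search()` above can be read as a small decision rule: with `--sourcegraph`, an organization and a custom query are mutually exclusive; without it, at least one of the two is required. The following is a minimal standalone sketch of that rule, not Gato's actual parser. The function names `build_search_parser` and `select_backend` are hypothetical, the `--target` and `--query` flag spellings are assumed from the `args.target` and `args.query` attributes, and plain strings stand in for Gato's `StringType(256)`.

```python
import argparse


def build_search_parser() -> argparse.ArgumentParser:
    """Stand-in for the search subparser (hypothetical, not Gato's real one)."""
    parser = argparse.ArgumentParser(prog="search-sketch")
    # Assumed flag names; the diff only shows args.target and args.query.
    parser.add_argument("--target", metavar="ORGANIZATION")
    parser.add_argument("--query", metavar="QUERY")
    # These two flags mirror the ones added in this commit.
    parser.add_argument("--sourcegraph", "-sg", action="store_true")
    parser.add_argument("--output-text", "-oT", metavar="TEXT_FILE")
    return parser


def select_backend(args: argparse.Namespace) -> str:
    """Return which search backend the arguments select, or raise ValueError."""
    if args.sourcegraph:
        # Sourcegraph mode accepts an organization or a custom query, not both.
        if args.query and args.target:
            raise ValueError("cannot select an organization with a custom query")
        return "sourcegraph"
    # GitHub code search needs at least one of the two.
    if not (args.query or args.target):
        raise ValueError("select an organization or pass a custom query")
    return "github"


if __name__ == "__main__":
    parsed = build_search_parser().parse_args(["--sourcegraph"])
    print(select_backend(parsed))
```

Raising `ValueError` here replaces the `parser.error(...)` calls in the diff only so that the rule can be exercised without exiting the process.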
diff --git a/gato/enumerate/enumerate.py b/gato/enumerate/enumerate.py
@@ -1,6 +1,7 @@
 import logging
 
 from gato.github import Api
+from gato.github import GqlQueries
 from gato.models import Repository, Organization
 from gato.cli import Output
 from gato.enumerate.repository import RepositoryEnum
@@ -173,12 +174,22 @@ def enumerate_organization(self, org: str):
             f"the {organization.name} organization!"
         )
 
+        Output.info(f"Querying and caching workflow YAML files!")
+        wf_queries = GqlQueries.get_workflow_ymls(enum_list)
+
+        for wf_query in wf_queries:
+            result = self.org_e.api.call_post('/graphql', wf_query)
+            # Sometimes we don't get a 200, fall back in this case.
+            if result.status_code == 200:
+                self.repo_e.construct_workflow_cache(result.json()['data']['nodes'])
+            else:
+                Output.warn("GraphQL query failed, will revert to REST workflow query for impacted repositories!")
         for repo in enum_list:
-
             Output.tabbed(
                 f"Enumerating: {Output.bright(repo.name)}!"
             )
-            self.repo_e.enumerate_repository(repo)
+
+            self.repo_e.enumerate_repository(repo, large_org_enum=len(enum_list) > 100)
             self.repo_e.enumerate_repository_secrets(repo)
 
             Recommender.print_repo_secrets(
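
The prefetch loop in this hunk sends one GraphQL request per prepared query and carries on when a request does not return 200, leaving the affected repositories to the REST path. The body of `GqlQueries.get_workflow_ymls` is not part of this view, so the sketch below only illustrates the general pattern under stated assumptions: identifiers are split into batches of 100 (the batch size mentioned in the `construct_workflow_cache` docstring), and `post` and `on_success` are hypothetical callables standing in for the API call and the cache update.

```python
from typing import Any, Callable, Iterator, List, Tuple


def batches(items: List[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def prefetch(
    node_ids: List[str],
    post: Callable[[List[str]], Tuple[int, Any]],
    on_success: Callable[[Any], None],
) -> List[List[str]]:
    """Send one request per batch; return the batches that did not succeed.

    `post` is a hypothetical callable returning (status_code, payload).
    """
    failed = []
    for batch in batches(node_ids):
        status, payload = post(batch)
        if status == 200:
            on_success(payload)
        else:
            # Nothing is cached for these repositories, so a later lookup
            # would miss the cache and use the slower per-repository path.
            failed.append(batch)
    return failed


if __name__ == "__main__":
    ids = [f"node-{i}" for i in range(250)]
    cached = []
    # Fake transport: fail any batch that contains node-100.
    fake_post = lambda b: (502, None) if "node-100" in b else (200, b)
    missed = prefetch(ids, fake_post, cached.extend)
    print(len(cached), len(missed))
```

Returning the failed batches is an addition for illustration; the code in the diff only emits a warning.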

diff --git a/gato/enumerate/recommender.py b/gato/enumerate/recommender.py
@@ -140,7 +140,7 @@ def print_repo_runner_info(repository: Repository):
             Output.result(
                 f"The repository contains a workflow: "
                 f"{Output.bright(repository.sh_workflow_names[0])} that "
-                "executes on self-hosted runners!"
+                "might execute on self-hosted runners!"
             )
 
         if repository.accessible_runners:
@@ -157,6 +157,11 @@ def print_repo_runner_info(repository: Repository):
                 f"{Output.bright(repository.accessible_runners[0].machine_name)}"
             )
 
+            for runner in repository.accessible_runners:
+                if runner.non_ephemeral:
+                    Output.owned("The repository contains a non-ephemeral self-hosted runner!")
+                    break
+
         if repository.runners:
             Output.result(
                 f"The repository has {len(repository.runners)} repo-level"

diff --git a/gato/enumerate/repository.py b/gato/enumerate/repository.py
@@ -21,6 +21,7 @@ def __init__(self, api: Api, skip_log: bool, output_yaml):
             api (Api): GitHub API wraper object.
         """
         self.api = api
+        self.workflow_cache = {}
         self.skip_log = skip_log
         self.output_yaml = output_yaml
 
@@ -40,11 +41,12 @@ def __perform_runlog_enumeration(self, repository: Repository):
         )
 
         if wf_runs:
-            runner = Runner(
-                wf_runs[0]['runner_name'], wf_runs[0]['machine_name']
-            )
+            for wf_run in wf_runs:
+                runner = Runner(
+                    wf_run['runner_name'], wf_run['machine_name'], non_ephemeral=wf_run['non_ephemeral']
+                )
 
-            repository.add_accessible_runner(runner)
+                repository.add_accessible_runner(runner)
             runner_detected = True
 
         return runner_detected
@@ -60,12 +62,15 @@ def __perform_yml_enumeration(self, repository: Repository):
             list: List of workflows that execute on sh runner, empty otherwise.
         """
         runner_wfs = []
-        ymls = self.api.retrieve_workflow_ymls(repository.name)
+
+        if repository.name in self.workflow_cache:
+            ymls = self.workflow_cache[repository.name]
+        else:
+            ymls = self.api.retrieve_workflow_ymls(repository.name)
 
         for (wf, yml) in ymls:
             try:
                 parsed_yml = WorkflowParser(yml, repository.name, wf)
-
                 self_hosted_jobs = parsed_yml.self_hosted()
 
                 if self_hosted_jobs:
@@ -79,12 +84,13 @@ def __perform_yml_enumeration(self, repository: Repository):
             # At this point we only know the extension, so handle and
             #  ignore malformed yml files.
             except Exception as parse_error:
-                print(parse_error)
+
+                print(f"{wf}: {str(parse_error)}")
                 logger.warning("Attmpted to parse invalid yaml!")
 
         return runner_wfs
 
-    def enumerate_repository(self, repository: Repository):
+    def enumerate_repository(self, repository: Repository, large_org_enum=False):
         """Enumerate a repository, and check everything relevant to
         self-hosted runner abuse that that the user has permissions to check.
 
@@ -119,15 +125,25 @@ def enumerate_repository(self, repository: Repository):
 
                 repository.set_runners(repo_runners)
 
-        if not self.skip_log and self.__perform_runlog_enumeration(repository):
-            runner_detected = True
-
         workflows = self.__perform_yml_enumeration(repository)
 
         if len(workflows) > 0:
             repository.add_self_hosted_workflows(workflows)
             runner_detected = True
 
+        if not self.skip_log:
+            # If we are enumerating an organization, only enumerate runlogs if
+            # the workflow suggests a sh_runner.
+            if large_org_enum and runner_detected:
+                self.__perform_runlog_enumeration(repository)
+
+            # If we are doing internal enum, get the logs, because coverage is
+            # more important here and it's ok if it takes time.
+            elif not repository.is_public() and self.__perform_runlog_enumeration(repository):
+                runner_detected = True
+            else:
+                runner_detected = self.__perform_runlog_enumeration(repository)
+
         if runner_detected:
             # Only display permissions (beyond having none) if runner is
             # detected.
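
The new run-log branch in this hunk is easier to review as a pure function. The sketch below is a simplified model of the `if` / `elif` / `else` chain as written, with a hypothetical `probe` callable standing in for `__perform_runlog_enumeration` and plain booleans standing in for the repository state. It is a reading aid under those assumptions, not Gato's code.

```python
class CountingProbe:
    """Hypothetical stand-in for a run-log lookup that counts its calls."""

    def __init__(self, result: bool):
        self.result = result
        self.calls = 0

    def __call__(self) -> bool:
        self.calls += 1
        return self.result


def runlog_step(skip_log, large_org_enum, runner_detected, is_public, probe):
    """Mirror the branch structure of the hunk and return runner_detected."""
    if not skip_log:
        if large_org_enum and runner_detected:
            # Workflow analysis already flagged the repository.
            probe()
        elif not is_public and probe():
            runner_detected = True
        else:
            # Reached for public repositories, and also for private ones
            # whose probe in the elif condition returned False.
            runner_detected = probe()
    return runner_detected


if __name__ == "__main__":
    probe = CountingProbe(result=False)
    outcome = runlog_step(False, False, True, False, probe)
    print(outcome, probe.calls)
```

Under this model, a private repository whose run-log lookup finds nothing is probed twice, and a detection made earlier from workflow files is overwritten by the second result. A large-organization run with no workflow hit on a public repository also reaches the `else` branch and still probes the run logs. Whether either behavior is intended cannot be determined from the diff alone.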
@@ -158,3 +174,28 @@ def enumerate_repository_secrets(
 
             if org_secrets:
                 repository.set_accessible_org_secrets(org_secrets)
+
+    def construct_workflow_cache(self, yml_results):
+        """Creates a cache of workflow yml files retrieved from graphQL. Since
+        graphql and REST do not have parity, we still need to use rest for most
+        enumeration calls. This method saves off all yml files, so during org
+        level enumeration if we perform yml enumeration the cached file is used
+        instead of making github REST requests. 
+
+        Args:
+            yml_results (list): List of results from individual GraphQL queries
+            (100 nodes at a time).
+        """
+        for result in yml_results:
+            owner = result['nameWithOwner']
+
+            self.workflow_cache[owner] = list()
+
+            if not result['object']:
+                continue
+
+            for yml_node in result['object']['entries']:
+                yml_name = yml_node['name']
+                if yml_name.lower().endswith('yml') or yml_name.lower().endswith('yaml'):
+                    contents = yml_node['object']['text']
+                    self.workflow_cache[owner].append((yml_name, contents))
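
To make the expected input shape of `construct_workflow_cache` concrete, here is a standalone sketch of the same transformation applied to a hand-written sample. The function name `build_workflow_cache` and the sample payload are illustrative assumptions; only the keys (`nameWithOwner`, `object`, `entries`, `name`, `text`) are taken from the method above.

```python
from typing import Any, Dict, List, Tuple


def build_workflow_cache(
    yml_results: List[Dict[str, Any]],
) -> Dict[str, List[Tuple[str, str]]]:
    """Map 'owner/repo' to a list of (file name, file contents) tuples."""
    cache: Dict[str, List[Tuple[str, str]]] = {}
    for result in yml_results:
        repo = result["nameWithOwner"]
        # Repositories without a workflows directory still get an empty
        # entry, so a later lookup hits the cache instead of calling REST.
        cache[repo] = []
        if not result["object"]:
            continue
        for entry in result["object"]["entries"]:
            name = entry["name"]
            # Same suffix check as the method above (no leading dot).
            if name.lower().endswith(("yml", "yaml")):
                cache[repo].append((name, entry["object"]["text"]))
    return cache


if __name__ == "__main__":
    # Hand-written sample in the shape the method above indexes into.
    sample = [
        {
            "nameWithOwner": "example-org/with-workflows",
            "object": {
                "entries": [
                    {"name": "build.yml", "object": {"text": "on: push"}},
                    {"name": "README.md", "object": {"text": "docs"}},
                ]
            },
        },
        {"nameWithOwner": "example-org/no-workflows", "object": None},
    ]
    print(build_workflow_cache(sample))
```

Because the suffix check has no leading dot, a file whose name merely ends in `yml` would also be cached. Any such file would then go through the parser's existing handling of malformed YAML.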
diff --git a/gato/github/__init__.py b/gato/github/__init__.py
@@ -1,2 +1,3 @@
 from .api import Api
+from .gql_queries import GqlQueries
 from .search import Search