[Deployment revisited][Staging][PoC] Fix create config #4724

Open · wants to merge 45 commits into master

Commits (45):
6294b84
Fix open browser
vitorguidi Mar 12, 2025
05292e4
Launch butler with python3
vitorguidi Mar 13, 2025
0b4a092
Remove redis from deployment manager administration
vitorguidi Mar 14, 2025
34609c3
Deploy terraform/k8s before appengine to bootstrap redis
vitorguidi Mar 14, 2025
2bb4be8
Add K8S_PROJECT to env, so deploy_k8s can work
vitorguidi Mar 14, 2025
5311c56
Replace python for python3 in appengine template creation
vitorguidi Mar 14, 2025
0450d39
Add WindowRateLimitTask to indexes in appengine
vitorguidi Mar 18, 2025
f421522
Fix extra space
vitorguidi Mar 18, 2025
e5d1bae
Enable secret manager api in create_config
vitorguidi Mar 18, 2025
98d43c1
Add boilerplate for blob migration
vitorguidi Mar 24, 2025
54cc8df
Implement blob migration script
vitorguidi Mar 27, 2025
291b105
Add exponential backoff to avoid a fatal exception when one cp comman…
vitorguidi Mar 27, 2025
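The commit above replaces a fatal exception on a failed `cp` command with retries. A minimal sketch of that retry pattern, assuming exponential backoff with jitter (the function and parameter names here are illustrative, not taken from the PR):

```python
import random
import time


def run_with_backoff(operation, max_attempts=5, base_delay=1.0):
  """Runs |operation|, retrying transient failures with exponential backoff.

  The delay doubles on each attempt (base_delay, 2x, 4x, ...) plus a small
  random jitter, so concurrent retries do not stampede in lockstep.
  """
  for attempt in range(max_attempts):
    try:
      return operation()
    except OSError:
      if attempt == max_attempts - 1:
        # Out of attempts: surface the original error instead of looping.
        raise
      time.sleep(base_delay * (2**attempt) + random.uniform(0, base_delay))
```

A command that fails transiently (for example, a flaky copy) then succeeds is returned normally; only a persistent failure propagates.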
214327d
Trying to figure out why appengine does not reach redis
vitorguidi Mar 27, 2025
68a7d52
Check if redis is pointing to localhost wrongly
vitorguidi Mar 27, 2025
a318946
Revert "Check if redis is pointing to localhost wrongly"
vitorguidi Mar 28, 2025
438433c
Revert "Trying to figure out why appengine does not reach redis"
vitorguidi Mar 28, 2025
99d5954
Use gcloud storage rsync --recursive instead of list_blob + copy_blob
vitorguidi Mar 28, 2025
d970379
Treat errors from external process
vitorguidi Mar 28, 2025
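The two commits above switch bucket copying to a single `gcloud storage rsync --recursive` invocation and make failures of that external process visible. A sketch of the pattern, assuming a thin wrapper that raises on a nonzero exit code (`run_checked` and `rsync_bucket` are hypothetical helper names):

```python
import subprocess


def run_checked(command):
  """Runs an external command, raising if it exits with a nonzero code."""
  result = subprocess.run(command, capture_output=True, text=True, check=False)
  if result.returncode != 0:
    # Treat errors from the external process instead of silently ignoring them.
    raise RuntimeError(
        f'Command {command[0]} failed ({result.returncode}): {result.stderr}')
  return result.stdout


def rsync_bucket(source_url, target_url):
  """Mirrors one GCS location into another via gcloud storage rsync."""
  return run_checked(
      ['gcloud', 'storage', 'rsync', '--recursive', source_url, target_url])
```

Compared with a `list_blob` + `copy_blob` loop, one rsync call delegates enumeration and parallelism to the `gcloud` CLI and leaves a single exit code to check.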
a4e9129
Ignore __common* databundles, which are deprecated
vitorguidi Mar 31, 2025
8b55e6e
Run assertions on redis hist
vitorguidi Mar 31, 2025
d17b13e
Bump redis
vitorguidi Mar 31, 2025
b7b8c32
Bump redis again
vitorguidi Mar 31, 2025
1a4b80e
Fix python invocation for requirements.txt
vitorguidi Mar 31, 2025
91543c5
Improve log for data bundle sync failure and enable local task debugging
vitorguidi Apr 1, 2025
e4c6ed3
Add preprocess, utask_main and postprocess queues to configs/test
vitorguidi Apr 2, 2025
48f35fa
Migrate blobs by avoiding legacy blob keys in the target project
vitorguidi Apr 3, 2025
0bed848
Log result outcome of corpus rsync in setup.py
vitorguidi Apr 3, 2025
f50cad5
Change reference to the migrated DataBundle bucket in migrate_blobs.py
vitorguidi Apr 3, 2025
6863366
[Logging] Task structured log (#4722)
javanlacerda Mar 21, 2025
71853d7
[fix] move default labels to json formatter (#4740)
javanlacerda Mar 25, 2025
9cd0260
[Logging] Adding progression structured log (#4736)
javanlacerda Mar 27, 2025
53d38e1
[Logging] Adding task argument and job_name to struc log, remove old …
javanlacerda Mar 27, 2025
533e21e
[Fix] Fix metadata to retrieve fuzz target in testcase-based logging …
ViniciustCosta Apr 1, 2025
38e4383
[Logging] check and log exceptions in log context (#4754)
javanlacerda Apr 5, 2025
b1cf0b8
[Logging] Add regression to structured logs (#4756)
ViniciustCosta Apr 7, 2025
ba4c689
[Fix] Add fuzz target as argument for logs context to avoid access de…
ViniciustCosta Apr 8, 2025
1bacd12
Consider clusterfuzz-staging a chrome project so crrev behaves correctly
vitorguidi Apr 9, 2025
58a63e2
Add logging to troubleshoot crrev failures on some chrome jobs
vitorguidi Apr 9, 2025
0ecefef
Fix bad syntax
vitorguidi Apr 9, 2025
66a46a0
[Staging Poc] Mute the issue tracker on staging environments (#4778)
vitorguidi Apr 30, 2025
833a599
Revert "[Staging Poc] Mute the issue tracker on staging environments …
vitorguidi Apr 30, 2025
ae00db2
[Staging Poc] Mute the issue tracker on staging environments (#4779)
vitorguidi May 2, 2025
8481fbd
Remove redis host and port assertions
vitorguidi May 5, 2025
8d56c96
Log exception when issue filing fails
vitorguidi May 5, 2025
1887278
Assert emptyness on LabelStore instead of List for ccs and collaborators
vitorguidi May 13, 2025
3 changes: 3 additions & 0 deletions configs/test/project.yaml

@@ -75,6 +75,9 @@ env:
   # Application ID for the Google Cloud Project. In production, this will have a s~ prefix.
   APPLICATION_ID: test-clusterfuzz

+  # GCP project where the GKE cluster running cronjobs will be.
+  K8S_PROJECT: test-clusterfuzz
+
   # Default project name unless overridden in a job definition.
   PROJECT_NAME: test-project

6 changes: 6 additions & 0 deletions configs/test/pubsub/queues.yaml

@@ -66,3 +66,9 @@ resources:
     type: queue.jinja
   - name: ml-jobs-linux
     type: queue.jinja
+  - name: utask_main
+    type: queue.jinja
+  - name: preprocess
+    type: queue.jinja
+  - name: postprocess
+    type: queue.jinja

2 changes: 1 addition & 1 deletion src/Pipfile

@@ -36,7 +36,7 @@ pyOpenSSL = "==22.0.0"
 python-dateutil = "==2.8.1"
 PyYAML = "==6.0"
 pytz = "==2023.3"
-redis = "==3.3.11"
+redis = "==4.6.0"
 requests = "==2.21.0"
 sendgrid = "==6.0.4"
 wrapt = "==1.16.0"

932 changes: 521 additions & 411 deletions src/Pipfile.lock

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions src/appengine/index.yaml

@@ -588,3 +588,10 @@ indexes:
   properties:
   - name: keywords
   - name: bot_name
+
+- kind: WindowRateLimitTask
+  properties:
+  - name: job_name
+  - name: task_argument
+  - name: task_name
+  - name: timestamp

2 changes: 1 addition & 1 deletion src/clusterfuzz/_internal/base/utils.py

@@ -998,7 +998,7 @@ def is_oss_fuzz():

 def is_chromium():
   """If this is an instance of chromium fuzzing."""
-  return default_project_name() in ('chromium', 'chromium-testing')
+  return default_project_name() in ('chromium', 'chromium-testing', 'clusterfuzz-staging')


 def file_hash(file_path):

20 changes: 13 additions & 7 deletions src/clusterfuzz/_internal/bot/tasks/commands.py

@@ -277,10 +277,6 @@ def process_command(task):
                        task.high_end, task.is_command_override)


-def _get_task_id(task_name, task_argument, job_name):
-  return f'{task_name},{task_argument},{job_name},{uuid.uuid4()}'
-
-
 # pylint: disable=too-many-nested-blocks
 # TODO(mbarbella): Rewrite this function to avoid nesting issues.
 @set_task_payload
@@ -296,8 +292,11 @@ def process_command_impl(task_name, task_argument, job_name, high_end,
     # "postprocess".
     task_id = None
   else:
-    task_id = _get_task_id(task_name, task_argument, job_name)
+    task_id = uuid.uuid4()
   environment.set_value('CF_TASK_ID', task_id)
+  environment.set_value('CF_TASK_NAME', task_name)
+  environment.set_value('CF_TASK_ARGUMENT', task_argument)
+  environment.set_value('CF_TASK_JOB_NAME', job_name)
   if job_name != 'none':
     job = data_types.Job.query(data_types.Job.name == job_name).get()
     # Job might be removed. In that case, we don't want an exception
@@ -451,6 +450,9 @@ def process_command_impl(task_name, task_argument, job_name, high_end,
   uworker_env['TASK_ARGUMENT'] = task_argument
   uworker_env['JOB_NAME'] = job_name
   uworker_env['CF_TASK_ID'] = task_id
+  uworker_env['CF_TASK_NAME'] = task_name
+  uworker_env['CF_TASK_ARGUMENT'] = task_argument
+  uworker_env['CF_TASK_JOB_NAME'] = job_name

   # Match the cpu architecture with the ones required in the job definition.
   # If they don't match, then bail out and recreate task.
@@ -474,5 +476,9 @@ def process_command_impl(task_name, task_argument, job_name, high_end,
   finally:
     # Final clean up.
     cleanup_task_state()
-    if 'CF_TASK_ID' in os.environ:
-      del os.environ['CF_TASK_ID']
+    tear_down_envs = [
+        'CF_TASK_ID', 'CF_TASK_NAME', 'CF_TASK_ARGUMENT', 'CF_TASK_JOB_NAME'
+    ]
+    for env_key in tear_down_envs:
+      if env_key in os.environ:
+        del os.environ[env_key]

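The `CF_TASK_*` bookkeeping above exports task metadata into the environment and removes it again in the `finally` block. The same set-then-clean-up behavior can be sketched as a context manager (an illustrative reworking, not code from the PR):

```python
import contextlib
import os


@contextlib.contextmanager
def task_environment(task_name, task_argument, job_name):
  """Exports task metadata to the environment, removing it on exit."""
  pairs = {
      'CF_TASK_NAME': task_name,
      'CF_TASK_ARGUMENT': task_argument,
      'CF_TASK_JOB_NAME': job_name,
  }
  os.environ.update(pairs)
  try:
    yield
  finally:
    # Mirror the teardown in the diff: remove every key we set, and only
    # those keys, even if the task body raised.
    for key in pairs:
      os.environ.pop(key, None)
```

This guarantees the variables never leak into the next task processed by the same bot, which is exactly what the explicit `tear_down_envs` loop achieves.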
2 changes: 2 additions & 0 deletions src/clusterfuzz/_internal/bot/tasks/setup.py

@@ -495,6 +495,7 @@ def update_data_bundle(
     logs.info('Data bundles: normal path.')
     result = corpus_manager.sync_data_bundle_corpus_to_disk(
         data_bundle_corpus, data_bundle_directory)
+    logs.info(f'Result = {result}')
   else:
     logs.info('Data bundles: untrusted runner path.')
     from clusterfuzz._internal.bot.untrusted_runner import \
@@ -509,6 +510,7 @@ def update_data_bundle(
         data_bundle_corpus.gcs_url,
         worker_data_bundle_directory,
         delete=False)
+    logs.info(f'Result = {result}')
     result = result.return_code == 0

   if not result:

3 changes: 3 additions & 0 deletions src/clusterfuzz/_internal/bot/tasks/task_types.py

@@ -146,6 +146,7 @@ def execute(self, task_argument, job_type, uworker_env):
     assert swarming.is_swarming_task(command, job_type)
     swarming.push_swarming_task(command, download_url, job_type)

+  @logs.task_stage_context(logs.Stage.PREPROCESS)
   def preprocess(self, task_argument, job_type, uworker_env):
     result = utasks.tworker_preprocess(self.module, task_argument, job_type,
                                        uworker_env)
@@ -170,6 +171,7 @@ def __init__(self, module):
     # many different tasks.
     super().__init__('none')

+  @logs.task_stage_context(logs.Stage.POSTPROCESS)
   def execute(self, task_argument, job_type, uworker_env):
     """Executes postprocessing of a utask."""
     # These values are None for now.
@@ -190,6 +192,7 @@ def __init__(self, module):
     del module
     super().__init__('none')

+  @logs.task_stage_context(logs.Stage.MAIN)
   def execute(self, task_argument, job_type, uworker_env):
     """Executes uworker_main of a utask."""
     # These values are None for now.

5 changes: 5 additions & 0 deletions src/clusterfuzz/_internal/bot/tasks/utasks/__init__.py

@@ -240,6 +240,7 @@ def _start_web_server_if_needed(job_type):
     logs.error('Failed to start web server, skipping.')


+@logs.task_stage_context(logs.Stage.PREPROCESS)
 def tworker_preprocess_no_io(utask_module, task_argument, job_type,
                              uworker_env):
   """Executes the preprocessing step of the utask |utask_module| and returns the
@@ -253,6 +254,7 @@ def tworker_preprocess_no_io(utask_module, task_argument, job_type,
   return uworker_io.serialize_uworker_input(uworker_input)


+@logs.task_stage_context(logs.Stage.MAIN)
 def uworker_main_no_io(utask_module, serialized_uworker_input):
   """Executes the main part of a utask on the uworker (locally if not using
   remote executor)."""
@@ -283,6 +285,7 @@ def uworker_main_no_io(utask_module, serialized_uworker_input):

 # TODO(metzman): Stop passing module to this function and `uworker_main_no_io`.
 # Make them consistent with the I/O versions.
+@logs.task_stage_context(logs.Stage.POSTPROCESS)
 def tworker_postprocess_no_io(utask_module, uworker_output, uworker_input):
   """Executes the postprocess step on the trusted (t)worker (in this case it is
   the same bot as the uworker)."""
@@ -330,6 +333,7 @@ def set_uworker_env(uworker_env: dict) -> None:
     environment.set_value(key, value)


+@logs.task_stage_context(logs.Stage.MAIN)
 def uworker_main(input_download_url) -> None:
   """Executes the main part of a utask on the uworker (locally if not using
   remote executor)."""
@@ -380,6 +384,7 @@ def uworker_bot_main():
   return 0


+@logs.task_stage_context(logs.Stage.POSTPROCESS)
 def tworker_postprocess(output_download_url) -> None:
   """Executes the postprocess step on the trusted (t)worker."""
   logs.info('Starting postprocess untrusted worker.')

153 changes: 78 additions & 75 deletions src/clusterfuzz/_internal/bot/tasks/utasks/progression_task.py

@@ -436,38 +436,38 @@ def _set_regression_testcase_upload_url(
 def utask_preprocess(testcase_id, job_type, uworker_env):
   """Runs preprocessing for progression task."""
   testcase = data_handler.get_testcase_by_id(testcase_id)
-  if not testcase:
-    return None
-
-  if testcase.fixed:
-    logs.error(f'Fixed range is already set as {testcase.fixed}, skip.')
-    return None
-
-  # Set a flag to indicate we are running progression task. This shows pending
-  # status on testcase report page and avoid conflicting testcase updates by
-  # triage cron.
-  testcase.set_metadata('progression_pending', True)
-  data_handler.update_testcase_comment(testcase, data_types.TaskState.STARTED)
-  blob_name, blob_upload_url = blobs.get_blob_signed_upload_url()
-  progression_input = uworker_msg_pb2.ProgressionTaskInput(  # pylint: disable=no-member
-      custom_binary=build_manager.is_custom_binary(),
-      bad_revisions=build_manager.get_job_bad_revisions(),
-      blob_name=blob_name,
-      stacktrace_upload_url=blob_upload_url)
-  # Setup testcase and its dependencies.
-  setup_input = setup.preprocess_setup_testcase(testcase, uworker_env)
-
-  _set_regression_testcase_upload_url(progression_input, testcase)
-  uworker_input = uworker_msg_pb2.Input(  # pylint: disable=no-member
-      job_type=job_type,
-      testcase_id=str(testcase_id),
-      uworker_env=uworker_env,
-      progression_task_input=progression_input,
-      testcase=uworker_io.entity_to_protobuf(testcase),
-      setup_input=setup_input)
-
-  testcase_manager.preprocess_testcase_manager(testcase, uworker_input)
-  return uworker_input
+  with logs.progression_log_context(testcase, testcase.get_fuzz_target()):
+    if not testcase:
+      return None
+    if testcase.fixed:
+      logs.error(f'Fixed range is already set as {testcase.fixed}, skip.')
+      return None
+
+    # Set a flag to indicate we are running progression task. This shows pending
+    # status on testcase report page and avoid conflicting testcase updates by
+    # triage cron.
+    testcase.set_metadata('progression_pending', True)
+    data_handler.update_testcase_comment(testcase, data_types.TaskState.STARTED)
+    blob_name, blob_upload_url = blobs.get_blob_signed_upload_url()
+    progression_input = uworker_msg_pb2.ProgressionTaskInput(  # pylint: disable=no-member
+        custom_binary=build_manager.is_custom_binary(),
+        bad_revisions=build_manager.get_job_bad_revisions(),
+        blob_name=blob_name,
+        stacktrace_upload_url=blob_upload_url)
+    # Setup testcase and its dependencies.
+    setup_input = setup.preprocess_setup_testcase(testcase, uworker_env)
+
+    _set_regression_testcase_upload_url(progression_input, testcase)
+    uworker_input = uworker_msg_pb2.Input(  # pylint: disable=no-member
+        job_type=job_type,
+        testcase_id=str(testcase_id),
+        uworker_env=uworker_env,
+        progression_task_input=progression_input,
+        testcase=uworker_io.entity_to_protobuf(testcase),
+        setup_input=setup_input)
+
+    testcase_manager.preprocess_testcase_manager(testcase, uworker_input)
+    return uworker_input


 def find_fixed_range(uworker_input):
@@ -684,8 +684,10 @@ def utask_main(uworker_input):
   """Executes the untrusted part of progression_task."""
   testcase = uworker_io.entity_from_protobuf(uworker_input.testcase,
                                              data_types.Testcase)
-  uworker_io.check_handling_testcase_safe(testcase)
-  return find_fixed_range(uworker_input)
+  with logs.progression_log_context(
+      testcase, testcase_manager.get_fuzz_target_from_input(uworker_input)):
+    uworker_io.check_handling_testcase_safe(testcase)
+    return find_fixed_range(uworker_input)


 _ERROR_HANDLER = uworker_handle_errors.CompositeErrorHandler({
@@ -713,53 +715,54 @@ def utask_postprocess(output: uworker_msg_pb2.Output):  # pylint: disable=no-mem
   """Trusted: Cleans up after a uworker execute_task, writing anything needed to
   the db."""
   testcase = data_handler.get_testcase_by_id(output.uworker_input.testcase_id)
-  _maybe_clear_progression_last_min_max_metadata(testcase, output)
-  _cleanup_stacktrace_blob_from_storage(output)
-  task_output = None
-
-  if output.issue_metadata:
-    _update_issue_metadata(testcase, json.loads(output.issue_metadata))
-
-  if output.HasField('progression_task_output'):
-    task_output = output.progression_task_output
-    _update_build_metadata(output.uworker_input.job_type,
-                           task_output.build_data_list)
-
-  if output.error_type != uworker_msg_pb2.ErrorType.NO_ERROR:  # pylint: disable=no-member
-    _ERROR_HANDLER.handle(output)
-    return
-
-  # If there is a fine grained bisection service available, request it. Both
-  # regression and fixed ranges are requested once. Regression is also requested
-  # here as the bisection service may require details that are not yet available
-  # (e.g. issue ID) at the time regress_task completes.
-  bisection.request_bisection(testcase)
-
-  if task_output and task_output.crash_on_latest:
-    crash_on_latest(output)
-    return
-
-  if output.uworker_input.progression_task_input.custom_binary:
-    # Retry once on another bot to confirm our results and in case this bot is
-    # in a bad state which we didn't catch through our usual means.
-    if data_handler.is_first_attempt_for_task(
-        'progression', testcase, reset_after_retry=True):
-      tasks.add_task('progression', output.uworker_input.testcase_id,
-                     output.uworker_input.job_type)
-      data_handler.update_progression_completion_metadata(
-          testcase, task_output.crash_revision)
-      return
-
-    # The bug is fixed.
-    testcase.fixed = 'Yes'
-    testcase.open = False
-    data_handler.update_progression_completion_metadata(
-        testcase, task_output.crash_revision)
-    return
-
-  testcase = data_handler.get_testcase_by_id(output.uworker_input.testcase_id)
-  if task_output.HasField('min_revision'):
-    _save_fixed_range(output.uworker_input.testcase_id,
-                      task_output.min_revision, task_output.max_revision)
+  with logs.progression_log_context(testcase, testcase.get_fuzz_target()):
+    _maybe_clear_progression_last_min_max_metadata(testcase, output)
+    _cleanup_stacktrace_blob_from_storage(output)
+    task_output = None
+
+    if output.issue_metadata:
+      _update_issue_metadata(testcase, json.loads(output.issue_metadata))
+
+    if output.HasField('progression_task_output'):
+      task_output = output.progression_task_output
+      _update_build_metadata(output.uworker_input.job_type,
+                             task_output.build_data_list)
+
+    if output.error_type != uworker_msg_pb2.ErrorType.NO_ERROR:  # pylint: disable=no-member
+      _ERROR_HANDLER.handle(output)
+      return
+
+    # If there is a fine grained bisection service available, request it.
+    # Both regression and fixed ranges are requested once. Regression is also
+    # requested here as the bisection service may require details that
+    # are not yet available (e.g. issue ID) at the time regress_task completes.
+    bisection.request_bisection(testcase)
+
+    if task_output and task_output.crash_on_latest:
+      crash_on_latest(output)
+      return
+
+    if output.uworker_input.progression_task_input.custom_binary:
+      # Retry once on another bot to confirm our results and in case this bot is
+      # in a bad state which we didn't catch through our usual means.
+      if data_handler.is_first_attempt_for_task(
+          'progression', testcase, reset_after_retry=True):
+        tasks.add_task('progression', output.uworker_input.testcase_id,
+                       output.uworker_input.job_type)
+        data_handler.update_progression_completion_metadata(
+            testcase, task_output.crash_revision)
+        return
+
+      # The bug is fixed.
+      testcase.fixed = 'Yes'
+      testcase.open = False
+      data_handler.update_progression_completion_metadata(
+          testcase,
+          task_output.crash_revision,
+          message='fixed on latest custom build')
+      return
+
+    testcase = data_handler.get_testcase_by_id(output.uworker_input.testcase_id)
+    if task_output.HasField('min_revision'):
+      _save_fixed_range(output.uworker_input.testcase_id,
+                        task_output.min_revision, task_output.max_revision)