CDAP-21091: Increasing timeout for task workers to recover when app fabric restarts #15855

Open
wants to merge 1 commit into
base: develop

Conversation

anshumanks

@anshumanks anshumanks commented Feb 5, 2025

Increasing timeout for task workers to recover when app fabric restarts

Jira: CDAP-21091

Description

This change lets the task workers recover gracefully when the app fabric pod is restarted.

Code change

  • Modified ComputeEngineCredentials.java

@anshumanks anshumanks added the build Triggers github actions build label Feb 5, 2025
Comment on lines +62 to +64
private static final int NUMBER_OF_RETRIES = 20;
private static final int MIN_WAIT_TIME_MILLISECOND = 2000;
private static final int MAX_WAIT_TIME_MILLISECOND = 60000;
Member

Can we please move these to cconf?

Contributor

Let's move only the NUMBER_OF_RETRIES to cconf.
That's the value we would need to modify mainly to configure the retry behavior.

Contributor

I think we should keep min / max wait time the same and make NUMBER_OF_RETRIES configurable in cconf.

If we later find that the number of retries is insufficient, we won't have to make data plane changes for it.
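
To make the suggestion in this thread concrete, here is a minimal sketch of keeping the min / max wait times as constants while reading the retry count from configuration. It is not the code in this PR: `java.util.Properties` stands in for cconf, the key name `task.worker.credentials.retries` is hypothetical, and the doubling backoff between the two bounds is an assumption, since the retry strategy itself is not shown in the diff.

```java
import java.util.Properties;

class RetryConfigSketch {
  // Hypothetical key; the real cconf property name is not given in this thread.
  static final String RETRIES_KEY = "task.worker.credentials.retries";

  // Values quoted in the diff under review.
  static final int DEFAULT_NUMBER_OF_RETRIES = 20;
  static final int MIN_WAIT_TIME_MILLISECOND = 2000;
  static final int MAX_WAIT_TIME_MILLISECOND = 60000;

  /** Reads the retry count from configuration, falling back to the default. */
  static int numberOfRetries(Properties conf) {
    String raw = conf.getProperty(RETRIES_KEY);
    if (raw == null) {
      return DEFAULT_NUMBER_OF_RETRIES;
    }
    try {
      int value = Integer.parseInt(raw.trim());
      // Negative values make no sense here, so treat them as unset.
      return value >= 0 ? value : DEFAULT_NUMBER_OF_RETRIES;
    } catch (NumberFormatException e) {
      return DEFAULT_NUMBER_OF_RETRIES;
    }
  }

  /** Delay before the given 1-based attempt: doubles from the min, capped at the max (assumed policy). */
  static long delayMillis(int attempt) {
    long delay = MIN_WAIT_TIME_MILLISECOND;
    for (int i = 1; i < attempt && delay < MAX_WAIT_TIME_MILLISECOND; i++) {
      delay *= 2;
    }
    return Math.min(delay, MAX_WAIT_TIME_MILLISECOND);
  }

  /** Upper bound on the total time spent waiting across all retries. */
  static long totalWaitMillis(int retries) {
    long total = 0;
    for (int attempt = 1; attempt <= retries; attempt++) {
      total += delayMillis(attempt);
    }
    return total;
  }

  public static void main(String[] args) {
    Properties conf = new Properties();
    conf.setProperty(RETRIES_KEY, "30"); // override without a code change
    int retries = numberOfRetries(conf);
    System.out.println(retries + " retries, worst-case wait "
        + totalWaitMillis(retries) + " ms");
  }
}
```

Under that doubling assumption, the default of 20 retries bounds the total wait at 962,000 ms (about 16 minutes), and raising only the retry count extends the recovery window by up to 60 seconds per extra retry.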

Member

@itsankit-google itsankit-google left a comment

Please add the Jira title to the PR description.

Member

@itsankit-google itsankit-google left a comment

PR description should be of the form: data-integrations/google-cloud#1473 (comment)


sonarqubecloud bot commented Feb 5, 2025

@anshumanks anshumanks changed the title Increasing timeout for task workers to recover when app fabric restarts CDAP-21091: Increasing timeout for task workers to recover when app fabric restarts Feb 5, 2025
Labels
build Triggers github actions build
4 participants