Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

roachprod: add ip address expander func #140150

Merged
merged 3 commits into from
Feb 6, 2025

Conversation

DarrylWong
Copy link
Contributor

@DarrylWong DarrylWong commented Jan 30, 2025

This change adds ip to the list of supported roachprod expander funcs. e.g. {ip:1-3} now expands to the private ip addresses of nodes 1 to 3. :public can be appended, e.g. {ip:1-3:public} to specify the public ip addresses.

Additionally, this change also logs expanded commands for roachtests if they differ from the original command. This should ease debugging as we can know what command was actually run on the node.

Release note: none
Epic: none
Fixes: none


Additional context is that this PR is motivated by my playing around with iptables on a roachprod cluster. Found running roachprod ip repeatedly and copy pasting tedious.

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@DarrylWong DarrylWong force-pushed the ip-expander branch 2 times, most recently from 5bcb804 to 647c00c Compare January 30, 2025 22:01
@DarrylWong
Copy link
Contributor Author

DarrylWong commented Jan 30, 2025

I also considered logging the expanded command unconditionally in runCmdOnSingleNode. This would affect roachprod CLI calls too. After testing it out a bit I found the expanded commands noisy - it's easy enough to just run echo {expanderFunc} on the nodes if you really want to know unlike a roachtest cluster that's deleted. If people think that would be better I have no issues going back to that.

@DarrylWong DarrylWong marked this pull request as ready for review January 31, 2025 18:10
@DarrylWong DarrylWong requested a review from a team as a code owner January 31, 2025 18:10
@DarrylWong DarrylWong requested review from herkolategan and golgeek and removed request for a team January 31, 2025 18:10
@@ -782,6 +782,7 @@ func (r *RunResultDetails) Output(decorate bool) string {
type RunCmdOptions struct {
combinedOut bool
includeRoachprodEnvVars bool
logExpandedCmd bool // log expanded result if it differs from the original command
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: comment syntax

// logExpandedCmd If true, logs the expanded result if it differs from the original command
logExpandedCmd bool

Copy link
Collaborator

@herkolategan herkolategan Feb 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

on second thought, seems like godoc says comments like these are allowed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I sometimes like the inline comments so it keeps the indenting uniform. This comment was a bit lengthy so I changed it to your original suggestion anyway.

@@ -826,6 +827,9 @@ func (c *SyncedCluster) runCmdOnSingleNode(
if err != nil {
return &noResult, errors.WithDetailf(err, "error expanding command: %s", cmd)
}
if opts.logExpandedCmd && expandedCmd != cmd {
l.Printf("Node %d expanded cmd > %s", e.node, expandedCmd)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit: "cmd: %s"

for _, ip := range ips {
c.Run(ctx, option.WithNodes(controller), "sh", "-c", fmt.Sprintf(`"ssh-keyscan -t rsa %s >> .ssh/known_hosts"`, ip))
for _, worker := range workers {
c.Run(ctx, option.WithNodes(controller), "sh", "-c", fmt.Sprintf(`"ssh-keyscan -t rsa {ip:%d} >> .ssh/known_hosts"`, worker))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! And the other areas you utilise this, much cleaner.

This change adds ip to the list of supported roachprod
expander funcs. e.g. {ip:1-3} now expands to the private
ip addresses of nodes 1 to 3. :public can be appended,
e.g. {ip:1-3:public} to specify the public ip addresses.
This change now logs expanded commands if they differ from
the original command. This should ease debugging as we can
know what command was actually run on the node.
@DarrylWong DarrylWong force-pushed the ip-expander branch 2 times, most recently from 0995b81 to 59f61f0 Compare February 4, 2025 20:03
return c
}

func TestIPExpander(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

@DarrylWong
Copy link
Contributor Author

TFTRs!

bors r=srosenberg, herkolategan

craig bot pushed a commit that referenced this pull request Feb 5, 2025
140150: roachprod: add ip address expander func r=srosenberg,herkolategan a=DarrylWong

This change adds ip to the list of supported roachprod expander funcs. e.g. `{ip:1-3}` now expands to the private ip addresses of nodes 1 to 3. `:public` can be appended, e.g. `{ip:1-3:public}` to specify the public ip addresses.

Additionally, this change also logs expanded commands for roachtests if they differ from the original command. This should ease debugging as we can know what command was actually run on the node.

Release note: none
Epic: none
Fixes: none

----

Additional context is that this PR is motivated by my playing around with iptables on a roachprod cluster. Found running `roachprod ip` repeatedly and copy pasting tedious.

Co-authored-by: DarrylWong <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 5, 2025

Build failed:

@rickystewart
Copy link
Collaborator

bors r=srosenberg, herkolategan

craig bot pushed a commit that referenced this pull request Feb 5, 2025
140021: roachtest: add test for data migration between clusters r=tbg a=shailendra-patel

Adds a roachtest to verify cluster behavior when data stores are relocated to different nodes on a new cluster. The test:

- Starts a source cluster and runs kv workload
- Stops the cluster and copies data directory to destination nodes
- Restarts cluster on new nodes with copied volumes
- Validates data consistency between source and destination clusters

This test improves coverage for scenarios where operators need to physically relocate storage volumes between nodes.

Epic: none
Fixes: #136622

Release note: None

140150: roachprod: add ip address expander func r=srosenberg,herkolategan a=DarrylWong

This change adds ip to the list of supported roachprod expander funcs. e.g. `{ip:1-3}` now expands to the private ip addresses of nodes 1 to 3. `:public` can be appended, e.g. `{ip:1-3:public}` to specify the public ip addresses.

Additionally, this change also logs expanded commands for roachtests if they differ from the original command. This should ease debugging as we can know what command was actually run on the node.

Release note: none
Epic: none
Fixes: none

----

Additional context is that this PR is motivated by my playing around with iptables on a roachprod cluster. Found running `roachprod ip` repeatedly and copy pasting tedious.

140461: replica_rac2: add commentary about ready and tick processing for live… r=pav-kv,miraradeva a=sumeerbhola

…ness

This can be used to guide optimizations without breaking RACv2 liveness.

Epic: none

Release note: None

140465: changefeedccl: add more obs to RunNemesis r=asg0451 a=wenyihu6

Previously, it is hard to track which sql smith generated queries take a long time 
to execute. This patch adds more observability by adding more print stmts.

Informs: #140355
Release note: none

140479: admission: fix shutdown logic for store grant coordinator r=RaduBerinde a=aadityasondhi

We were incorrectly waiting on the ticker outside of the select statement, which would inherently cause a delay in case of the stopping sequence being initiated.

Informs #140454.

Release note: None

140490: roachtest: increase the verbosity around tests that kills nodes with traffic r=rickystewart a=iskettaneh

The PR increases the verbosity around some roachtests that kills nodes while sending traffic. The roachtests affected by this PR:

1. restart/down-for-2m
2. tpcc/multiregion/survive=region/chaos=true
3. tpcc/multiregion/survive=zone/chaos=true
4. tpcc/w=100/nodes=3/chaos=true

References: #138028

Release note: None

140522: catalog/ingesting: disallow writing cross database references r=fqazi a=fqazi

Previously, the ingest API supported writing cross database references, which could allow tables to be ingested in a manner that is deprecated. To avoid issues, this patch adds an error when cross database references of any kind are detected.

Fixes: #138230

Release note: None

Note: This functionality is only blocked for LDR, since backup / restore and imports have to work with cross database references until we fully remove them.

Co-authored-by: Shailendra Patel <[email protected]>
Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Wenyi Hu <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Ibrahim Kettaneh <[email protected]>
Co-authored-by: Faizan Qazi <[email protected]>
@rickystewart
Copy link
Collaborator

bors r=srosenberg, herkolategan

craig bot pushed a commit that referenced this pull request Feb 5, 2025
…140539

140021: roachtest: add test for data migration between clusters r=tbg a=shailendra-patel

Adds a roachtest to verify cluster behavior when data stores are relocated to different nodes on a new cluster. The test:

- Starts a source cluster and runs kv workload
- Stops the cluster and copies data directory to destination nodes
- Restarts cluster on new nodes with copied volumes
- Validates data consistency between source and destination clusters

This test improves coverage for scenarios where operators need to physically relocate storage volumes between nodes.

Epic: none
Fixes: #136622

Release note: None

140150: roachprod: add ip address expander func r=srosenberg,herkolategan a=DarrylWong

This change adds ip to the list of supported roachprod expander funcs. e.g. `{ip:1-3}` now expands to the private ip addresses of nodes 1 to 3. `:public` can be appended, e.g. `{ip:1-3:public}` to specify the public ip addresses.

Additionally, this change also logs expanded commands for roachtests if they differ from the original command. This should ease debugging as we can know what command was actually run on the node.

Release note: none
Epic: none
Fixes: none

----

Additional context is that this PR is motivated by my playing around with iptables on a roachprod cluster. Found running `roachprod ip` repeatedly and copy pasting tedious.

140461: replica_rac2: add commentary about ready and tick processing for live… r=pav-kv,miraradeva a=sumeerbhola

…ness

This can be used to guide optimizations without breaking RACv2 liveness.

Epic: none

Release note: None

140465: changefeedccl: add more obs to RunNemesis r=asg0451 a=wenyihu6

Previously, it is hard to track which sql smith generated queries take a long time 
to execute. This patch adds more observability by adding more print stmts.

Informs: #140355
Release note: none

140479: admission: fix shutdown logic for store grant coordinator r=RaduBerinde a=aadityasondhi

We were incorrectly waiting on the ticker outside of the select statement, which would inherently cause a delay in case of the stopping sequence being initiated.

Informs #140454.

Release note: None

140489: roachpb: add pretty-printing for three remaining value tags r=rickystewart a=yuzefovich

TIMETZ, GEO, and BOX2D value tags were missing support for pretty-printing.

Epic: None
Release note: None

140492: rowenc/valueside: fix recently added pretty-printing of arrays values r=yuzefovich a=yuzefovich

We incorrectly returned VarBitArray as the element type for BitArray encoding (which - confusingly - corresponds to BIT and VARBIT types).

Fixes: #140487.
Release note: None

140538: replicationtestutils: disable autocommit behavior in test setup r=rafiss a=rafiss

We've seen tests that use this setup helper flake under race. The tests are pretty expensive as they set up two test clusters, and the recent change to make autocommit_before_ddl default to true seems to have made the tests more likely to flake due to overload.

This patch addresses the flake by avoiding autocommit during test setup.

fixes #140216
fixes #140215
fixes #140214
fixes #140213
fixes #140212
fixes #140211
fixes #140210
fixes #140208
fixes #140207
fixes #140206
fixes #140169


Release note: None

140539: changefeedccl: fix avro schema bug r=wenyihu6 a=asg0451

Fix bug in avro schema generation introduced by #139655, which
caused extraneous fields to appear in the schema.

Epic: none
Release note: None


Co-authored-by: Shailendra Patel <[email protected]>
Co-authored-by: DarrylWong <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Wenyi Hu <[email protected]>
Co-authored-by: Aaditya Sondhi <[email protected]>
Co-authored-by: Yahor Yuzefovich <[email protected]>
Co-authored-by: Rafi Shamim <[email protected]>
Co-authored-by: Miles Frankel <[email protected]>
@craig
Copy link
Contributor

craig bot commented Feb 5, 2025

Build failed (retrying...):

@rickystewart
Copy link
Collaborator

bors r-

@craig
Copy link
Contributor

craig bot commented Feb 5, 2025

Canceled.

@rickystewart
Copy link
Collaborator

Lints are failing, please check your CI results before attempting to bors

@DarrylWong
Copy link
Contributor Author

bors r=srosenberg, herkolategan

@craig craig bot merged commit 3c19abb into cockroachdb:master Feb 6, 2025
22 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants