v3.3.1: Backup to GCS still requires fake s3 region #3994

Open
alice-viola opened this issue Aug 22, 2024 · 3 comments

@alice-viola

Using version 3.3.1, in order to back up to Google Cloud Storage I had to fake an S3 configuration in the /etc/scylla-manager-agent/scylla-manager-agent.yaml file, specifying a fake region ('aaa'). Otherwise the check-location script fails with this error:

{"L":"INFO","T":"2024-08-22T13:41:39.277Z","N":"rclone","M":"registered s3 provider [name=s3, provider=AWS, upload_concurrency=2, disable_checksum=true, memory_pool_use_mmap=true, chunk_size=50M, no_check_bucket=true, env_auth=true, memory_pool_flush_time=5m]"}
{"L":"INFO","T":"2024-08-22T13:41:39.277Z","N":"rclone","M":"registered gcs provider [name=gcs, allow_create_bucket=false, memory_pool_use_mmap=true, chunk_size=50M, memory_pool_flush_time=5m, bucket_policy_only=true, service_account_file=/etc/scylla-manager-agent/gcs-service-account.json]"}
{"L":"INFO","T":"2024-08-22T13:41:39.277Z","N":"rclone","M":"registered azure provider [name=azure, disable_checksum=true, use_msi=true, memory_pool_use_mmap=true, chunk_size=50M, memory_pool_flush_time=5m]"}
{"L":"DEBUG","T":"2024-08-22T13:41:39.277Z","N":"rclone","M":"Creating backend with remote \"gcs:sp-scraper-scylla-backup\""}
{"L":"DEBUG","T":"2024-08-22T13:41:39.277Z","N":"rclone","M":"Creating backend with remote \"/tmp/scylla-manager-agent-3506209747\""}
{"L":"DEBUG","T":"2024-08-22T13:41:39.371Z","N":"rclone","M":"GCS bucket sp-scraper-scylla-backup: Waiting for checks to finish"}
{"L":"DEBUG","T":"2024-08-22T13:41:39.371Z","N":"rclone","M":"GCS bucket sp-scraper-scylla-backup: Waiting for transfers to finish"}
{"L":"DEBUG","T":"2024-08-22T13:41:39.473Z","N":"rclone","M":"scylla-manager-agent-3506209747/test: Copied (new)"}
{"L":"DEBUG","T":"2024-08-22T13:41:39.599Z","N":"rclone","M":"Creating backend with remote \"gcs:sp-scraper-scylla-backup/scylla-manager-agent-3506209747\""}
{"L":"DEBUG","T":"2024-08-22T13:41:39.688Z","N":"rclone","M":"Waiting for deletions to finish"}
{"L":"DEBUG","T":"2024-08-22T13:41:39.775Z","N":"rclone","M":"test: Deleted"}```
@mykaul
Contributor

mykaul commented Aug 26, 2024

Do we need the s3 and Azure providers? Can you share your JSON file, after anonymization?

@gdubicki

We are seeing this too, also with 3.3.1.

We run the Manager in k8s using Scylla Operator with these params:

      -c /etc/scylla-manager-agent/scylla-manager-agent.yaml
      -c /mnt/scylla-agent-config/scylla-manager-agent.yaml
      -c /mnt/scylla-agent-config/auth-token.yaml

...where the 1st config is fully commented out, the 2nd contains only:

gcs:
  service_account_file: /etc/scylla-manager-agent/credentials/gcs-service-account.json

...and the 3rd contains only:

auth_token: <redacted>

Adding:

s3:
  region: aaa

...to the second config did in fact remove the error about the missing region, which for us looked like this:

{"L":"ERROR","T":"2024-08-26T10:28:32.542Z","N":"rclone","M":"parse instance region: invalid character '<' looking for beginning of value","S":"github.com/scylladb/go-log.Logger.log\n\tgithub.com/scylladb/[email protected]/logger.go:101\ngithub.com/scylladb/go-log.Logger.Error\n\tgithub.com/scylladb/[email protected]/logger.go:84\nmain.setupCommand.RedirectLogPrint.func1\n\tgithub.com/scylladb/scylla-manager/v3/pkg/rclone/logger.go:19\ngithub.com/rclone/rclone/fs.LogPrintf\n\tgithub.com/rclone/[email protected]/fs/log.go:152\ngithub.com/rclone/rclone/fs.Errorf\n\tgithub.com/rclone/[email protected]/fs/log.go:167\ngithub.com/scylladb/scylla-manager/v3/pkg/rclone.awsRegionFromMetadataAPI\n\tgithub.com/scylladb/scylla-manager/v3/pkg/rclone/aws.go:55\ngithub.com/scylladb/scylla-manager/v3/pkg/rclone.(*S3Options).AutoFill\n\tgithub.com/scylladb/scylla-manager/v3/pkg/rclone/options.go:138\ngithub.com/scylladb/scylla-manager/v3/pkg/rclone.RegisterS3Provider\n\tgithub.com/scylladb/scylla-manager/v3/pkg/rclone/providers.go:52\nmain.setupCommand\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/agent/setup.go:35\nmain.init.func3\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/agent/check_location.go:40\ngithub.com/spf13/cobra.(*Command).execute\n\tgithub.com/spf13/[email protected]/command.go:983\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\tgithub.com/spf13/[email protected]/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\tgithub.com/spf13/[email protected]/command.go:1039\nmain.main\n\tgithub.com/scylladb/scylla-manager/v3/pkg/cmd/agent/main.go:12\nruntime.main\n\truntime/proc.go:271"}

...but we still get the logs about initializing the S3 and Azure providers, which we don't use:

{"L":"INFO","T":"2024-08-26T16:24:36.185Z","N":"rclone","M":"registered s3 provider [name=s3, memory_pool_use_mmap=true, env_auth=true, disable_checksum
=true, region=aaa, memory_pool_flush_time=5m, chunk_size=50M, upload_concurrency=2, no_check_bucket=true, provider=AWS]"}
{"L":"INFO","T":"2024-08-26T16:24:36.185Z","N":"rclone","M":"registered gcs provider [name=gcs, allow_create_bucket=false, service_account_file=/etc/scylla-manager-agent/credentials/gcs-service-account.json, memory_pool_use_mmap=true, chunk_size=50M, bucket_policy_only=true, memory_pool_flush_time=5m]"}
{"L":"INFO","T":"2024-08-26T16:24:36.186Z","N":"rclone","M":"registered azure provider [name=azure, chunk_size=50M, memory_pool_use_mmap=true, use_msi=true, memory_pool_flush_time=5m, disable_checksum=true]"}

Not a big deal for us, as it seems harmless, but perhaps it would be better not to load unnecessary providers at all while fixing this small bug.
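
For illustration only, here is a rough sketch (with hypothetical, simplified names, not the actual agent code) of what registering only the configured providers could look like:

```go
package main

import "fmt"

// Hypothetical, simplified config structs; the real scylla-manager-agent
// config types differ.
type S3Options struct{ Region, Endpoint string }
type GCSOptions struct{ ServiceAccountFile string }
type AzureOptions struct{ Account string }

type Config struct {
	S3    S3Options
	GCS   GCSOptions
	Azure AzureOptions
}

// registerConfiguredProviders registers only the providers that are actually
// present in the agent config, instead of always registering s3, gcs and azure.
func registerConfiguredProviders(cfg Config) {
	if cfg.S3 != (S3Options{}) {
		fmt.Println("registering s3 provider")
	}
	if cfg.GCS.ServiceAccountFile != "" {
		fmt.Println("registering gcs provider")
	}
	if cfg.Azure.Account != "" {
		fmt.Println("registering azure provider")
	}
}

func main() {
	// Example: only gcs is configured, so only the gcs provider is registered.
	registerConfiguredProviders(Config{
		GCS: GCSOptions{ServiceAccountFile: "/etc/scylla-manager-agent/credentials/gcs-service-account.json"},
	})
}
```

The real agent would of course keep its existing registration functions; the point is only to gate them on the presence of the corresponding config section.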

@karol-kokoszka
Collaborator

@alice-viola SM is expected to just log an error, but it proceeds even though it cannot read the region.
So, updating the region in the config just clears the error message from the logs.

The error appears because SM tries to discover the region when s3.endpoint and s3.region are not provided, by querying http://169.254.169.254/latest/dynamic/instance-identity/document
See

```go
func awsRegionFromMetadataAPI() string {
	const docURL = "http://169.254.169.254/latest/dynamic/instance-identity/document"

	// Step 1: Request an IMDSv2 session token
	token, err := awsAPIToken()
	// fallback to IMDSv1 when error on token retrieval
	if err != nil {
		fs.Errorf(nil, "%+v", err)
		token = ""
	}

	// Step 2: Use the session token to retrieve instance metadata
	reqMetadata, err := http.NewRequestWithContext(context.Background(), http.MethodGet, docURL, http.NoBody)
	if err != nil {
		fs.Errorf(nil, "create metadata request: %+v", err)
		return ""
	}
	if token != "" {
		reqMetadata.Header.Set("X-aws-ec2-metadata-token", token)
	}

	metadataClient := http.Client{
		Timeout: 2 * time.Second,
	}
	resMetadata, err := metadataClient.Do(reqMetadata)
	if err != nil {
		fs.Errorf(nil, "IMDSv2 failed to fetch instance identity: %+v", err)
		return ""
	}
	defer resMetadata.Body.Close()

	metadata := struct {
		Region string `json:"region"`
	}{}
	if err := json.NewDecoder(resMetadata.Body).Decode(&metadata); err != nil {
		fs.Errorf(nil, "parse instance region: %+v", err)
		return ""
	}

	return metadata.Region
}
```

If the deployment is outside of AWS, then this API call just fails.

We could change the log level from ERROR to INFO with a WARN prefix.
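
For example, just as a sketch of the relevant lines in pkg/rclone/aws.go (not an actual patch), the decode failure could then be logged like this:

```go
// Sketch only: downgrade the metadata lookup failure, which is expected
// outside of AWS, from ERROR to INFO with a WARN prefix.
if err := json.NewDecoder(resMetadata.Body).Decode(&metadata); err != nil {
	fs.Infof(nil, "WARN: parse instance region: %+v", err)
	return ""
}
```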
