Feature: Object Lifecycle Management for Gateway Mode NasXL #27

Open

wants to merge 23 commits into base: iternity-rb

Conversation

rluetzner
Collaborator

Description

This allows us to use ILM in combination with Gateway Mode NasXL.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation updated
  • Unit tests added/updated

aweisser and others added 23 commits December 17, 2021 20:52
The only reason the useful profiling function of the admin API
doesn't work in gateway mode is that gateway-main doesn't initialize
the global variable globalLocalNodeName.

We simply use the same initialization logic as in server mode.
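For context, a minimal sketch of the kind of initialization meant here, assuming the gateway knows its listen address; initLocalNodeName and the fallback host are illustrative, only globalLocalNodeName comes from the commit text.

```go
import "net"

// globalLocalNodeName stands in for the MinIO global of the same name.
var globalLocalNodeName string

// initLocalNodeName mirrors the server-mode logic in spirit: derive
// "host:port" from the address the process listens on.
func initLocalNodeName(listenAddr string) error {
	host, port, err := net.SplitHostPort(listenAddr)
	if err != nil {
		return err
	}
	if host == "" {
		host = "127.0.0.1" // assumed fallback for ":9000"-style addresses
	}
	globalLocalNodeName = net.JoinHostPort(host, port)
	return nil
}
```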
* Set MINIO_GATEWAY_SCANNER_CANDIDATE to 'on' to make a gateway instance take part in the leader election for running the data scanner process
* The active leader executes the data scanner process
* Leader election is implemented with the etcd concurrency API (see the sketch below)
* TODO: remove info logs
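A compressed sketch of how leader election with the etcd concurrency API typically looks; the key prefix, node name, and runDataScanner call are placeholders, not the exact values used in this PR.

```go
import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// runScannerIfLeader blocks in Campaign until this instance becomes leader,
// then runs the data scanner.
func runScannerIfLeader(ctx context.Context, endpoints []string, nodeName string) error {
	cli, err := clientv3.New(clientv3.Config{Endpoints: endpoints})
	if err != nil {
		return err
	}
	defer cli.Close()

	sess, err := concurrency.NewSession(cli)
	if err != nil {
		return err
	}
	defer sess.Close()

	election := concurrency.NewElection(sess, "/minio/scanner-leader")
	if err := election.Campaign(ctx, nodeName); err != nil {
		return err
	}
	// This instance is now the leader; run the scanner until ctx is done.
	runDataScanner(ctx) // placeholder for the scanner entry point
	return nil
}
```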
In erasure or server mode MinIO keeps track of changes
and stores the object paths that have been updated in a
bloom filter that is shared between the server instances.
Changes to this filter are published via RPC notification.

The data scanner uses this information to decide whether a
folder needs to be scanned.

Because nas(xl) mode currently does not support RPC
notifications, we decided to disable checking of this filter
within the data scanner.
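A sketch of the shape of this change, assuming a bloom-filter handle with a containsDir-style lookup; both names are stand-ins for the actual scanner types.

```go
// dirFilter is a stand-in for the scanner's bloom-filter handle.
type dirFilter interface {
	containsDir(folder string) bool
}

// shouldScanFolder illustrates the decision described above: in nas(xl)
// gateway mode the filter never receives RPC updates, so it is ignored
// and every folder is scanned.
func shouldScanFolder(isNASGateway bool, filter dirFilter, folder string) bool {
	if isNASGateway || filter == nil {
		return true // no usable filter: scan unconditionally
	}
	return filter.containsDir(folder)
}
```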
…ateDirCycles

The value of globalDataScannerStartDelay can be set via env 'MINIO_SCANNER_START_DELAY_SECONDS'.
The value of globalDataUsageUpdateDirCycles can be set via env 'MINIO_USAGE_UPDATE_DIR_CYCLES'.
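A sketch of how the two environment variables could be parsed; the global variable types and defaults below are assumptions.

```go
import (
	"os"
	"strconv"
	"time"
)

// Globals named as in the commit message; types and defaults are assumed.
var (
	globalDataScannerStartDelay    = 1 * time.Minute
	globalDataUsageUpdateDirCycles = uint(16)
)

func applyScannerEnv() {
	if v := os.Getenv("MINIO_SCANNER_START_DELAY_SECONDS"); v != "" {
		if secs, err := strconv.Atoi(v); err == nil {
			globalDataScannerStartDelay = time.Duration(secs) * time.Second
		}
	}
	if v := os.Getenv("MINIO_USAGE_UPDATE_DIR_CYCLES"); v != "" {
		if cycles, err := strconv.ParseUint(v, 10, 32); err == nil {
			globalDataUsageUpdateDirCycles = uint(cycles)
		}
	}
}
```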
* Activation of workers (BackgroundTransition, BackgroundExpiry, TierDeletionJournal)
* Implementation of TransitionObject and RestoreObject in fs-v1.go
* Linting
* Activated router endpoints for tier configuration
* Initializing the subsystem
* Handler methods now allow nasxl mode
* Only works in single-server mode, because tier config changes are not yet propagated to other servers
The response code returned on success did not
match the AWS spec. See 'Responses' (200 OK vs. 202 Accepted):

https://docs.aws.amazon.com/AmazonS3/latest/API/API_RestoreObject.html
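A tiny sketch of the two success paths per the linked RestoreObject spec; the alreadyRestored flag is a stand-in for whatever the handler actually checks.

```go
import "net/http"

// writeRestoreSuccess: 200 OK if a restored copy already exists,
// 202 Accepted if the restore was just initiated.
func writeRestoreSuccess(w http.ResponseWriter, alreadyRestored bool) {
	if alreadyRestored {
		w.WriteHeader(http.StatusOK) // restored copy already present
		return
	}
	w.WriteHeader(http.StatusAccepted) // restore accepted and in progress
}
```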
Updates to tier configs on server A now trigger server B to reload its
tier configs. This is done using a watch on an etcd key. Each time an
instance updates a tier config, it updates the etcd key and all other
instances reload their tier configs.
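A sketch of that mechanism; the key name and the reload callback are placeholders.

```go
import (
	"context"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// watchTierConfig: every instance watches one etcd key and reloads its
// tier configuration whenever the key changes.
func watchTierConfig(ctx context.Context, cli *clientv3.Client, reload func()) {
	for resp := range cli.Watch(ctx, "minio/tier-config-updated") {
		for range resp.Events {
			reload()
		}
	}
}

// After writing a new tier config, the updating instance bumps the key, e.g.:
//   cli.Put(ctx, "minio/tier-config-updated", time.Now().String())
```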
Restoring files that had been uploaded as multipart did not
work with the mc client. The mc client triggered completion of the
multipart upload and at the same time polled for the status of the
final object via GetObjectInfo.

The locks in GetObjectInfo (fs-v1.go L:875 and L:888) in conjunction
with the locks of complete multipart upload (fs-v1-multipart.go
L:604 and L:773) caused a deadlock at fs-v1-multipart.go L:773.

The status-polling request was stuck in fs-v1.go L:888 due to the
filesystem write lock held by the complete multipart upload request.
Thus the polling request did not release the NS read lock, which caused
the completion request to get stuck in fs-v1-multipart.go L:773.

I have moved the filesystem write lock to the same location as the
NS lock.
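Illustrative only, not the actual fs-v1 code: the deadlock came from the two paths taking the namespace (NS) lock and the filesystem lock in different orders; moving the filesystem lock next to the NS lock gives both paths one consistent order and removes the cycle.

```go
import "sync"

type objectLocks struct {
	ns sync.RWMutex // per-object namespace lock
	fs sync.RWMutex // filesystem metadata lock
}

func (l *objectLocks) completeMultipartUpload(finish func()) {
	l.ns.Lock() // 1. NS write lock
	defer l.ns.Unlock()
	l.fs.Lock() // 2. filesystem write lock, taken right after the NS lock
	defer l.fs.Unlock()
	finish()
}

func (l *objectLocks) getObjectInfo(read func()) {
	l.ns.RLock() // same order: NS first ...
	defer l.ns.RUnlock()
	l.fs.RLock() // ... then filesystem
	defer l.fs.RUnlock()
	read()
}
```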
Restoring files that had been uploaded as multipart resulted in
the deletion of the file in the remote tier and an empty xl.meta.

This was caused by a missing metadata property (the restore-status header).
The same property was already set in the non-multipart restore process.
I have applied it to both kinds of restore process: multipart and
non-multipart.
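A sketch of the missing piece; the key and value follow the S3 x-amz-restore convention, while how MinIO stores it internally is assumed here.

```go
import (
	"fmt"
	"net/http"
	"time"
)

// setRestoreStatus records the restore status in the object metadata so the
// restored object is not treated as still transitioned. Called from both the
// multipart and the non-multipart restore paths.
func setRestoreStatus(meta map[string]string, expiry time.Time) {
	meta["x-amz-restore"] = fmt.Sprintf(
		`ongoing-request="false", expiry-date="%s"`,
		expiry.UTC().Format(http.TimeFormat))
}
```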
The meta lock file used to prevent the parallel creation of
buckets with the same name was not deleted after
successfully creating the bucket.

1. The meta file is now deleted during cleanup. As the cleanup
method only deletes newly created meta files when an error
is passed to the function, I introduced a kind of pseudo
error. This takes care of deleting the xl.meta file.
-- seems a bit hacky though

2. Although the xl.meta is now deleted, the folder of the
pseudo bucket still remains in the meta tmp folder. So I changed
the .lck file's location to be directly within the
meta tmp folder and not within a subfolder (pseudo bucket).
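A simplified model of the intent (paths and naming are assumptions, not the actual cleanup code): the .lck file lives directly in the meta tmp folder and is removed whether bucket creation succeeds or fails, so neither the lock file nor a pseudo-bucket folder is left behind.

```go
import (
	"os"
	"path/filepath"
)

func withBucketCreateLock(metaTmpDir, bucket string, create func() error) error {
	lockPath := filepath.Join(metaTmpDir, bucket+".lck")
	f, err := os.OpenFile(lockPath, os.O_CREATE|os.O_EXCL, 0o644)
	if err != nil {
		return err // another instance is creating the same bucket
	}
	f.Close()
	defer os.Remove(lockPath) // clean up on success and on error
	return create()
}
```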
Removed env 'MINIO_SCANNER_START_DELAY_SECONDS' as this does not work
correctly and there is already a working solution in place with
env 'MINIO_SCANNER_CYCLE'.
This change allows users to limit the maximum number of noncurrent
versions of an object.

To enable this rule you need the following *ilm.json*:
```
cat >> ilm.json <<EOF
{
    "Rules": [
        {
            "ID": "test-max-noncurrent",
            "Status": "Enabled",
            "Filter": {
                "Prefix": "user-uploads/"
            },
            "NoncurrentVersionExpiration": {
                "MaxNoncurrentVersions": 5
            }
        }
    ]
}
EOF
mc ilm import myminio/mybucket < ilm.json
```
- Rename MaxNoncurrentVersions tag to NewerNoncurrentVersions

Note: We apply overlapping NewerNoncurrentVersions rules such that
we honor the highest among applicable limits, e.g. if two overlapping rules
are configured with 2 and 3 noncurrent versions to be retained, we
will retain 3 (see the sketch after this list).

- Expire newer noncurrent versions after noncurrent days
- MinIO extension: allow noncurrent days to be zero, allowing expiry
  of noncurrent version as soon as more than configured
  NewerNoncurrentVersions are present.
- Allow NewerNoncurrentVersions rules on object-locked buckets
- No x-amz-expiration when NewerNoncurrentVersions configured
- ComputeAction should skip rules with NewerNoncurrentVersions > 0
- Add unit tests for lifecycle.ComputeAction
- Support lifecycle rules with MaxNoncurrentVersions
- Extend ExpectedExpiryTime to work with zero days
- Fix all-time comparisons to be relative to UTC
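The sketch below restates the overlap rule from the note above: among all rules matching an object, the highest retention count wins. The function name and input are simplified for illustration.

```go
// effectiveNewerNoncurrentVersions returns the highest retention limit among
// the applicable NewerNoncurrentVersions rules.
func effectiveNewerNoncurrentVersions(limits []int) int {
	max := 0
	for _, n := range limits {
		if n > max {
			max = n
		}
	}
	return max
}
```

With the two overlapping rules from the note (2 and 3), this yields 3, i.e. three noncurrent versions are retained.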
Caused by 'LatestModTime: versions[0].ModTime' when the versions
slice was empty.

Instances now observe the name of the current leader and do not restart
scanning if their name does not match the leader name. In this case the former
leader starts campaigning for the leader role.

In addition, the campaign winner rechecks the leader name after the call
to Campaign().

To create a random name for each instance, the randString function has been
moved from the tests to utils.go.
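A sketch of that behaviour on top of the etcd concurrency API: observe the current leader, only campaign if this instance is not the leader, and recheck the leader name after Campaign() returns. The function name is illustrative; nodeName is the instance's random name.

```go
import (
	"context"

	"go.etcd.io/etcd/client/v3/concurrency"
)

// ensureLeadership reports whether this instance should (re)start scanning.
func ensureLeadership(ctx context.Context, e *concurrency.Election, nodeName string) (bool, error) {
	// Observe the current leader, if any.
	if resp, err := e.Leader(ctx); err == nil && len(resp.Kvs) > 0 {
		if string(resp.Kvs[0].Value) == nodeName {
			return true, nil // already the leader: keep scanning, do not restart
		}
	}
	// Not the leader: campaign for the role.
	if err := e.Campaign(ctx, nodeName); err != nil {
		return false, err
	}
	// Recheck who actually holds the leadership after Campaign() returns.
	resp, err := e.Leader(ctx)
	if err != nil {
		return false, err
	}
	return len(resp.Kvs) > 0 && string(resp.Kvs[0].Value) == nodeName, nil
}
```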
@rluetzner
Collaborator Author

@tristanessquare, well, this was stupid of me. Now we have a pull request, but I can no longer review it. 🤣
I'll figure out a way.

Labels: None yet
Projects: None yet
4 participants