Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Update installation mechanism for Transforms in Fleet according to new specifications #140046

Merged
merged 27 commits into from
Sep 20, 2022

Conversation

qn895
Copy link
Member

@qn895 qn895 commented Sep 6, 2022

Summary

This PR addresses #134321 and can be tested in conjunction with the test example package elastic/integrations#4138. Changes include:

  • Ensure legacy schema is still supported
  • Handle creating an index template when either the fields.yml file is present or manifest.yml contains destination_index_template:
    • Generate the mappings from the fields.yml file. This can be done by leveraging loadFieldsFromYaml and generateMappings
    • Combine the mappings and other index template settings from manifest.yml into a single index template
    • Create the index template and track the template in EsAssetReferences
  • Create the destination index if it doesn't already exist
    • Open question: how will new mappings and settings get applied across upgrades if the destination index is not deleted?
  • After the index template / destination index has been created and initialized, create the transform and start it. This largely already exists, we just need a couple adjustments:
    • Update the logic to read the transform from the transform.yml file rather than a json file.
    • Do not start the transform if the manifest.yml file contains start: false (the default is to start the transform if this isn't specified).
  • Handle creating the transforms in the correct order. For instance, if one transform's src index is the dest index of another transform, the second transform and index need to be created first.

Checklist

Delete any items that are not applicable to this PR.

Risk Matrix

Delete this section if it is not applicable to this PR.

Before closing this PR, invite QA, stakeholders, and other developers to identify risks that should be tested prior to the change/feature release.

When forming the risk matrix, consider some of the following examples and how they may potentially impact the change:

Risk Probability Severity Mitigation/Notes
Multiple Spaces—unexpected behavior in non-default Kibana Space. Low High Integration tests will verify that all features are still supported in non-default Kibana Space and when user switches between spaces.
Multiple nodes—Elasticsearch polling might have race conditions when multiple Kibana nodes are polling for the same tasks. High Low Tasks are idempotent, so executing them multiple times will not result in logical error, but will degrade performance. To test for this case we add plenty of unit tests around this logic and document manual testing procedure.
Code should gracefully handle cases when feature X or plugin Y are disabled. Medium High Unit tests will verify that any feature flag or plugin combination still results in our service operational.
See more potential risk examples

For maintainers

@qn895 qn895 self-assigned this Sep 6, 2022
@qn895 qn895 marked this pull request as ready for review September 6, 2022 23:37
@qn895 qn895 requested a review from a team as a code owner September 6, 2022 23:37
@elasticmachine
Copy link
Contributor

Pinging @elastic/ml-ui (:ml)

@botelastic botelastic bot added the Team:Fleet Team label for Observability Data Collection Fleet team label Sep 7, 2022
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@kevinlog
Copy link
Contributor

@qn895 thank you for working on this and adding the legacy support for the Endpoint package!

I checked out the code locally and have been testing with our legacy schema. The installation is working great! However, I think we're missing the cleanup of the older, legacy transforms. More below:

  • The legacy transform schema is successfully installed on a fresh install
  • When I upgrade the the package, the legacy transforms are not uninstalled when we install new versions. We would need to be sure we clean up any transforms following the old schema and naming convention for users who are upgrading from older versions.

I think the problem is just that when we check for transforms to remove, the code is only looking for transforms with the new naming convention in their IDs.

I see this in the logs:

[2022-09-12T08:18:02.992-04:00][INFO ][plugins.fleet] Found previous transform references:
 [{"id":"logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_current/default.json-default-8.4.1","type":"transform"},{"id":"logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_united/default.json-default-8.4.1","type":"transform"}]
[2022-09-12T08:18:02.992-04:00][INFO ][plugins.fleet] Deleting currently installed transform ids logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_current/default.json-default-8.4.1,logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_united/default.json-default-8.4.1
[2022-09-12T08:18:02.998-04:00][INFO ][plugins.fleet] Deleted: logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_united/default.json-default-8.4.1
[2022-09-12T08:18:02.998-04:00][WARN ][plugins.fleet] cannot find transform for logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_united/default.json-default-8.4.1
[2022-09-12T08:18:02.999-04:00][INFO ][plugins.fleet] Deleted: logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_current/default.json-default-8.4.1
[2022-09-12T08:18:02.999-04:00][WARN ][plugins.fleet] cannot find transform for logs-endpoint.endpoint-8.4.1/elasticsearch/transform/metadata_current/default.json-default-8.4.1

And in the Transforms UI, you can see that 4 transforms are installed on upgrade, while there should be 2. The 8.4.1 versions should have been removed when upgrading to 8.5.0-dev .

image

Endpoint's legacy transforms will have id/names like the below which differ from the new naming convention.

endpoint.metadata_current-default-8.4.1
endpoint.metadata_united-default-8.4.1

When doing cleanup of old transforms we would also need to make sure we delete those for users who are upgrading from before the new schema was in place.

Let me know if the above makes sense - happy to answer any questions and help test/debug further

@sophiec20
Copy link
Contributor

@kevinlog As a reminder, this piece of work is an initial iteration. It is only intended to create a reference package for transforms in Fleet in staging - we do not intend for this to move into production it its current form.

We had initially wanted to use sample data, but we found we could make faster progress by following the endpoint source for reasons concerning index privileges. Sorry to get your hopes up so early.

Our next steps will be

  • define best practice for upgrade
  • define best practice for privileges

In order to get this PR moving, we will ensure it is clearly labeled for non-production.

We do not expect to be working on the clean up of old endpoint transforms, although we expect to provide best practice guidelines in follow up iterations.

@qn895 qn895 changed the title [ML] Update installation mechanism for Transforms in Fleet [ML] Showcase new installation mechanism for Transforms in Fleet Sep 12, 2022
Copy link
Member

@kpollich kpollich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went through the code with an eye for performance/stability improvements and I think everything looks good except for one minor improvement. 🚀

droberts195 added a commit to elastic/elasticsearch that referenced this pull request Sep 15, 2022
As a companion to elastic/kibana#140046 an example transform
package is being created in elastic/integrations#4138.

Initially the intention was that this example would use the
Kibana sample data. Then, after realising that wouldn't work
due to lack of permissions, it was changed to reinstall a
transform that is created by the security solution. However,
it has been pointed out that this is dangerous because even
if we tell people not to install the example in production
it could somehow happen by mistake and then we'd have created
objects that clash with production code.

Therefore, it is best that the example transform package be
switched back to the original idea of using the Kibana sample
data, so that there is no risk of it ever clashing with a
production package.
@qn895
Copy link
Member Author

qn895 commented Sep 19, 2022

@elasticmachine merge upstream

@kevinlog
Copy link
Contributor

@qn895

I checked out the branch again after the refactors and the Endpoint package still installs correctly. I also ran through some upgrade scenarios with particular focus on ensuring the existing functionality for installing and upgrading legacy transforms still works (i.e. maintaining existing Endpoint package support). Some details on the scenarios I tested below, I don't see anything that should stop this PR from merging in terms of functionality.

I tested the following scenarios and they work as expected:

  1. Install the legacy Endpoint package with the legacy transform schema ✅
  2. Install the updated Endpoint package with the new compliant transform schema ✅

Upgrade scenarios:
In this section, I tested some upgrade scenarios.

  1. Install Endpoint package with legacy transforms -> upgrade to an Endpoint package with compliant transforms ✅
  2. (Existing functionality) Install Endpoint package with legacy transforms -> upgrade to an Endpoint package still with legacy transforms ✅
  3. (Out of scope functionality) Install Endpoint package with compliant transforms -> upgrade to an Endpoint package with compliant transforms ❌
    • I know that this scenario is out of scope for the PR, so it shouldn't stop us from merging.

@kevinlog
Copy link
Contributor

kevinlog commented Sep 19, 2022

@qn895 I took a closer look at the 3rd upgrade scenario I referenced above.

  1. (Out of scope functionality) Install Endpoint package with compliant transforms -> upgrade to an Endpoint package with compliant transforms ❌

I originally thought that this was part of more advanced upgrade capabilities that was mentioned in the "out of scope" section in the original ticket.

However, I took a closer look at the Kibana logs I had locally. It looks like to me that we just need to delete any existing index patterns along with the transforms. The logs below upgrade a 8.5.0 package version to 8.6.0.

[2022-09-19T18:20:27.780-04:00][INFO ][plugins.fleet] Deleting currently installed transform ids logs-endpoint.metadata_current-default-8.5.0-dev.0,logs-endpoint.metadata_united-default-8.5.0-dev.0
[2022-09-19T18:20:30.165-04:00][INFO ][plugins.fleet] Deleted: logs-endpoint.metadata_united-default-8.5.0-dev.0
[2022-09-19T18:20:32.678-04:00][INFO ][plugins.fleet] Deleted: logs-endpoint.metadata_current-default-8.5.0-dev.0
[2022-09-19T18:20:33.249-04:00][WARN ][plugins.fleet] Failure to install package [endpoint]: [ResponseError: illegal_argument_exception: [illegal_argument_exception] Reason: index template [logs-endpoint.metadata_current-template-8.6.0-dev.0] has index patterns [metrics-endpoint.metadata_current_default] matching patterns from existing templates [logs-endpoint.metadata_current-template-8.5.0-dev.0] with patterns (logs-endpoint.metadata_current-template-8.5.0-dev.0 => [metrics-endpoint.metadata_current_default]) that have the same priority [250], multiple index templates may not match during index creation, please use a different priority]
[2022-09-19T18:20:33.935-04:00][ERROR][plugins.fleet] rolling back to endpoint-8.5.0-dev.0 after error installing endpoint-8.6.0-dev.0
[2022-09-19T18:20:33.971-04:00][WARN ][plugins.fleet] Failed installing package [endpoint] due to error: [ResponseError: illegal_argument_exception: [illegal_argument_exception] Reason: index template [logs-endpoint.metadata_current-template-8.6.0-dev.0] has index patterns [metrics-endpoint.metadata_current_default] matching patterns from existing templates [logs-endpoint.metadata_current-template-8.5.0-dev.0] with patterns (logs-endpoint.metadata_current-template-8.5.0-dev.0 => [metrics-endpoint.metadata_current_default]) that have the same priority [250], multiple index templates may not match during index creation, please use a different priority]
[2022-09-19T18:20:34.002-04:00][INFO ][plugins.fleet] Encountered non fatal errors during Fleet setup

Looking at this from the original description in the Fleet ticket (cc @joshdover):

The transform uninstallation code should not need to be updated. Our existing code for deleting transforms and index templates should already work as long as the EsAssetReferences are updated correctly to track the assets that were created by the install code.

Do we just need to be sure that the index templates are accounted for in EsAssetReferences when installing transforms that follow the new schema? I see references to this functionality in your PR, so I may be misunderstanding the results.

Again, I don't want to increase scope of this PR beyond original expectations, but if we thought that we had already accounted for deleting older templates, then maybe this is a bug.

Endpoint package can still ship using the legacy transform schema. I just wanted to be sure I called the above out as I think it would need to work to start shipping packages with the new schema. We can open a separate ticket and follow up if needed.

@joshdover
Copy link
Contributor

joshdover commented Sep 20, 2022

@kevinlog Good catch. I had thought that we removed the old assets before we install the new ones, but that is not the case:

return await _installPackage({
savedObjectsClient,
savedObjectsImporter,
savedObjectTagAssignmentService,
savedObjectTagClient,
esClient,
logger,
installedPkg,
paths,
packageInfo,
installType,
spaceId,
verificationResult,
installSource: 'registry',
})
.then(async (assets) => {
await removeOldAssets({
soClient: savedObjectsClient,
pkgName: packageInfo.name,
currentVersion: packageInfo.version,
});

I think the solution here is to:

  • Exclude the package version number from the template name (so we generate the same template name for each version of the package) and instead overwrite the existing template during installation.
  • Make sure to remove the EsAssetReference for the old version so it does not get deleted in this removeOldAssets function.
  • Double check the rollback behavior. We need to be sure that on package rollback (due to a failure during installation) we re-install the template from the previous version. I believe this would be covered by the two suggestions above, but we should test it. You can probably test this in _install_package.test.ts by mocking a step to throw an exception.

Was there a reason we need to have the package version in the template name? I must have missed this discussion. We don't do this for other index / data stream templates, so I'm not sure it's necessary here either.

@kevinlog
Copy link
Contributor

@joshdover

Was there a reason we need to have the package version in the template name? I must have missed this discussion. We don't do this for other index / data stream templates, so I'm not sure it's necessary here either.

I don't think it's necessary to include the package version in the template names.

If it helps, updating the Endpoint package following the legacy schema with a new Endpoint package following the legacy schema, transforms and templates included, is working in my tests with this PR. Are we able to follow the same logic with the new transform schema?

These two upgrade scenarios were working, but both start with an Endpoint package following the legacy schema:

  1. Install Endpoint package with legacy transforms -> upgrade to an Endpoint package with compliant transforms ✅
  2. (Existing functionality) Install Endpoint package with legacy transforms -> upgrade to an Endpoint package still with legacy transforms ✅

@qn895
Copy link
Member Author

qn895 commented Sep 20, 2022

Thanks @joshdover and @kevinlog for the clarification. I've removed the version from the template names here 09a7c08 (#140046)

Installing a transform with package name transform and folder name kibana_ecommerce_pivot will install the following:

GET _transform/logs-transform.kibana_ecommerce_pivot-default-0.2.0 
GET _index_template/logs-transform.kibana_ecommerce_pivot-template
GET _component_template/logs-transform.kibana_ecommerce_pivot-template@custom
GET _component_template/logs-transform.kibana_ecommerce_pivot-template@package

Uninstalling and upgrading the package to a newer version will delete all 4 assets from the old version.

To not block the PR for FF, I'll add integration tests using compliant transforms in a follow up.

@kibana-ci
Copy link
Collaborator

💚 Build Succeeded

Metrics [docs]

✅ unchanged

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @qn895

@kevinlog
Copy link
Contributor

@qn895 thanks for the changes!

I tested the below scenarios again with the new branch and it's all working.

I tested the following scenarios and they work as expected:

  1. Install the legacy Endpoint package with the legacy transform schema ✅
  2. Install the updated Endpoint package with the new compliant transform schema ✅

Upgrade scenarios:
In this section, I tested some upgrade scenarios.

  1. Install Endpoint package with legacy transforms -> upgrade to an Endpoint package with compliant transforms ✅
  2. (Existing functionality) Install Endpoint package with legacy transforms -> upgrade to an Endpoint package still with legacy transforms ✅
  3. (Now working, was broken before) Install Endpoint package with compliant transforms -> upgrade to an Endpoint package with compliant transforms ✅

@qn895 qn895 requested a review from hop-dev September 20, 2022 16:52
Copy link
Contributor

@alvarezmelissa87 alvarezmelissa87 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM ⚡

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport:skip This commit does not require backporting :ml release_note:skip Skip the PR/issue when compiling release notes Team:Fleet Team label for Observability Data Collection Fleet team v8.5.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.