[Transform] Transforms with unattended flag don't create destination index unless all conditions/fields exist in source index #104146

Open
susan-shu-c opened this issue Jan 9, 2024 · 8 comments
Labels
>enhancement :ml/Transform Transform Team:ML Meta label for the ML team


@susan-shu-c
Member

susan-shu-c commented Jan 9, 2024

Description

We added the unattended flag to transforms shipped in integration packages (example: elastic/integrations#8320).

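In the past, without the unattended flag, once the package is installed on a fresh cluster:

  • Transform is installed
  • Destination index is created on install
  • .latest and .all aliases for the destination index are created

Now, with the unattended flag, on a fresh cluster:

  • Transform is installed
  • Destination index doesn't seem to be created successfully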

After testing on v8.11.1 (so that this fix #101627 would be there), transforms with the unattended flag don't seem to create the destination index like without the unattended flag.

It turns out, the destination index is only created when there is exact data that matches the criteria (e.g. fields host.name, destination.ip, etc. exist in logs-*) for the transform to run. Before, the destination index was created regardless of the available data. This gives the impression that the package hasn't fully been installed.

What we want to clarify is: Is this expected behavior with the unattended flag?
If so, can it be implemented so the behavior is the same as before (create destination index regardless of available data) so that it's clearer to users when the transform and associated indices have been created?


@susan-shu-c susan-shu-c added >enhancement :ml/Transform Transform needs:triage Requires assignment of a team area label labels Jan 9, 2024
@elasticsearchmachine elasticsearchmachine added the Team:ML Meta label for the ML team label Jan 9, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/ml-core (Team:ML)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jan 9, 2024
@susan-shu-c susan-shu-c changed the title [Transform] [Transform] Transforms with unattended flag doesn't install destination index with caveats Jan 9, 2024
@susan-shu-c susan-shu-c changed the title [Transform] Transforms with unattended flag doesn't install destination index with caveats [Transform] Transforms with unattended flag doesn't install destination index unless all conditions/fields exist in source index Jan 9, 2024
@susan-shu-c susan-shu-c changed the title [Transform] Transforms with unattended flag doesn't install destination index unless all conditions/fields exist in source index [Transform] Transforms with unattended flag don't create destination index unless all conditions/fields exist in source index Jan 9, 2024
@przemekwitek przemekwitek self-assigned this Jan 10, 2024
@przemekwitek
Contributor

Hi,

> In the past, without the unattended flag, once the package is installed on a fresh cluster:
> Transform is installed
> Destination index is created on install
> .latest and .all aliases for the destination index are created

Confirmed, that's the correct behavior with unattended set to false.

> After testing on v8.11.1 (so that this fix #101627 would be there), transforms with the unattended flag don't seem to create the destination index like without the unattended flag.

That's true. For an unattended transform, we explicitly skip destination index creation on the _start call (which is part of package install).

> It turns out, the destination index is only created when there is exact data that matches the criteria (e.g. fields host.name, destination.ip, etc. exist in logs-*) for the transform to run

With unattended, the destination index is created when the first document is written to it.
So, if the source index contains matching data, you'll eventually see the destination index created and the first results written to it.
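
To make the difference concrete, here is a toy Python model of the behavior described above. It is not Elasticsearch code: the function name and its parameters are made up for illustration, and it only encodes what this thread states (attended transforms create the destination index on _start; unattended ones create it when the first document is written).

```python
def destination_index_exists(unattended: bool, docs_written: int) -> bool:
    """Toy model (not Elasticsearch code) of whether the destination index
    exists after _start, as described in this thread."""
    if not unattended:
        # Attended: the index is created explicitly as part of _start.
        return True
    # Unattended: creation is deferred until the first document is written.
    return docs_written > 0


# Fresh cluster, no matching source data yet.
print(destination_index_exists(unattended=False, docs_written=0))
print(destination_index_exists(unattended=True, docs_written=0))
```

On a fresh cluster with no matching data, only the attended case reports an existing index, which matches what was observed after package install.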

> This gives the impression that the package hasn't fully been installed.

I understand your concern here. Without explicit destination index creation, it is less predictable when exactly the destination index (and its aliases) will be set up.

> Is this expected behavior with the unattended flag?

Confirmed, this is working as intended.

> If so, can it be implemented so the behavior is the same as before (create destination index regardless of available data) so that it's clearer to users when the transform and associated indices have been created?

I'll need to think about how such a change would fit the current codebase.
I'll update this issue soon.

@przemekwitek
Contributor

przemekwitek commented Jan 23, 2024

@susan-shu-c, it seems you can achieve what you need by reverting the transform to non-unattended.
Specifically, you want these 2 settings in your transform config:

  "settings": {
    "unattended": false,
    "num_failure_retries": -1
  }

This way the transform will not be unattended (so it will create the destination index just like it used to), but it will still retry most failures indefinitely (without limit).
That said, some failures (such as script exceptions) will still not be retried, so the transform will not be fully unattended.

Are there any reasons (other than unlimited retries) that made you switch to unattended?
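
For reference, a minimal Python sketch of applying those two settings to an existing transform config before sending it to the cluster. The helper name, the destination index name, and the pivot are placeholders invented for this example; only logs-*, host.name, and the two settings come from this thread.

```python
import json


def with_retry_settings(transform_config: dict) -> dict:
    """Return a copy of the config with the two settings suggested above."""
    updated = dict(transform_config)
    settings = dict(updated.get("settings", {}))
    settings["unattended"] = False
    settings["num_failure_retries"] = -1  # -1: retry without limit, per this thread
    updated["settings"] = settings
    return updated


# Hypothetical transform body; the dest index name and pivot are placeholders.
example = {
    "source": {"index": ["logs-*"]},
    "dest": {"index": "example-dest-index"},
    "pivot": {
        "group_by": {"host.name": {"terms": {"field": "host.name"}}},
        "aggregations": {"doc_count": {"value_count": {"field": "host.name"}}},
    },
    "settings": {"unattended": True},
}

print(json.dumps(with_retry_settings(example)["settings"], indent=2))
```

The resulting body would typically be sent with the create transform API (PUT _transform/<transform_id>). The helper copies the config rather than mutating it, so the original package definition stays untouched.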

@susan-shu-c
Member Author

Pasting our Slack conversation for reference:

We added unattended: true so that the install would work on Serverless

(as requested by Sophie Chang, not going to link it here as it was an internal GitHub discussion)

@przemekwitek
Contributor

I was able to reproduce the issue locally.
The problem is that if the transform destination index is created dynamically (not on _start but later, during indexing), we do not set up this index's aliases.
This is a bug that we need to fix in our backend code.
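
One way to spot this symptom from the outside is to compare the aliases declared in the transform's dest section with the aliases actually attached to the index. The sketch below is an illustration under assumptions: the index and alias names are hypothetical, the helper is not part of any client library, and the two sample responses are hand-written in the shape that GET <index>/_alias is expected to return.

```python
def missing_aliases(transform_config: dict, alias_response: dict) -> list[str]:
    """List aliases declared in the transform's dest that the index lacks.

    alias_response is assumed to have the shape
    {index_name: {"aliases": {alias_name: {...}}}}.
    """
    dest = transform_config["dest"]
    declared = [entry["alias"] for entry in dest.get("aliases", [])]
    present = alias_response.get(dest["index"], {}).get("aliases", {})
    return [name for name in declared if name not in present]


# Hypothetical dest section with .latest and .all style aliases.
config = {
    "dest": {
        "index": "example-dest-index-1",
        "aliases": [
            {"alias": "example-dest.latest", "move_on_creation": True},
            {"alias": "example-dest.all", "move_on_creation": False},
        ],
    }
}

# Hand-written sample: index created dynamically with no aliases (the bug).
buggy = {"example-dest-index-1": {"aliases": {}}}
# Hand-written sample: index with both aliases attached.
healthy = {
    "example-dest-index-1": {
        "aliases": {"example-dest.latest": {}, "example-dest.all": {}}
    }
}

print(missing_aliases(config, buggy))    # both declared aliases are reported
print(missing_aliases(config, healthy))  # nothing is reported
```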

@przemekwitek
Contributor

FYI: I have opened a PR with the fix (#105499).

@susan-shu-c
Member Author

Awesome, thank you! So with #105499 we can install packages with unattended: true or unattended: false and in both cases, the destination index will be created on package install?

@przemekwitek
Contributor

> Awesome, thank you! So with #105499 we can install packages with unattended: true or unattended: false and in both cases, the destination index will be created on package install?

Not exactly.
This bugfix ensures that the destination index and its aliases are set up correctly once the transform sees source indices and is ready to start processing them.
This should solve your immediate problem of missing aliases and should be enough for your setup to work correctly (but of course let us know if that is not the case and there are further issues).

Creating the destination index before source indices are ready is a more complex topic that we want to tackle too, but we won't have a solution for it in 8.13.
We need to redesign the transform's workflow to accommodate this change, which is why we don't want to rush it before feature freeze.

prwhelan added a commit that referenced this issue Feb 23, 2024
For Unattended Transforms, if we fail to create the destination index on
the first run, we will retry the transformation iteration, but we will
not retry the destination index creation on that next iteration.

This change stops the Unattended Transform from progressing beyond the
0th checkpoint, so all retries will include the destination index
creation.

Fix #105683
Relate #104146
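
The retry change in this commit message can be pictured with a small toy model in Python. This is not the actual transform code: the function, its parameters, and the list of simulated outcomes are invented for illustration, and the model only tracks the transform's own index creation step.

```python
def run_first_checkpoint(create_index_results: list[bool],
                         stay_at_checkpoint_zero: bool) -> bool:
    """Toy model of the retry behavior described in the commit message.

    create_index_results holds the simulated outcome of each creation
    attempt, in order. Returns True if the transform's own destination
    index creation step eventually succeeds.
    """
    attempts = iter(create_index_results)
    # First run: index creation is always attempted.
    if next(attempts, False):
        return True
    if not stay_at_checkpoint_zero:
        # Old behavior: later iterations retry the transform itself but
        # skip index creation, so this step never runs again.
        return False
    # New behavior: the transform stays at checkpoint 0, so every retry
    # includes another creation attempt.
    return any(attempts)


# Creation fails once, then would succeed on a retry.
print(run_first_checkpoint([False, True], stay_at_checkpoint_zero=False))
print(run_first_checkpoint([False, True], stay_at_checkpoint_zero=True))
```

With the old behavior the second, successful attempt is never made; with the change, the retry picks it up.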
prwhelan added a commit to prwhelan/elasticsearch that referenced this issue Feb 23, 2024
elasticsearchmachine pushed a commit that referenced this issue Feb 23, 2024
@przemekwitek przemekwitek removed their assignment Jun 21, 2024