Add data_stream fields to spec #38
base: main
@felixbarny Deprecating The current guidance included in the |
@felixbarny I would also assume that
|
That's how I specified it. Both
I've updated the spec. But to be clear: the loggers would not set |
I'm torn on whether it should be set by default or not. The advantage of also setting it in the log is that even if another tool picks the logs up, it will contain all the data needed. But adds to each log line :-( |
I really think it's the responsibility of the "other tool" to add the defaults. It already needs to add metadata about the host, pod, container, etc anyway. |
To add some context from the Logs UI side: It uses |
@ruflin is adding Should ecs loggers add both The advantage is that this would work with the widest range of Filebeat and the metrics UI. The downside is that there's a duplication of data. Also, if |
Elastic Agent does not have any templates, I assume you are referring to the templates shipped by Elasticsearch? As you mentioned, there is a problem around event.dataset already existing. Filebeat is the only shipper that we control, there might be others. Also there are the parts which don't use the new indexing strategy yet where still event.dataset is used. Maybe that could be the differentiator? For indices which are |
I was referring to the templates shipped with Filebeat that are installed by Elastic Agent, IIUC.
But where should that decision be made? The ECS loggers don't and shouldn't know who's shipping the logs. If ECS loggers add Maybe ECS loggers should just add To serve the log UI's log rate anomaly ML job, even datastream-based |
The templates are loaded by Elasticsearch directly. Here is an open PR to modify these, it should indicate where these are: elastic/elasticsearch#64978 Agree, ECS loggers should not care who ships the logs. What if we use the Let's try to first discuss the end goal and then work backwards from there to see what we can do to mitigate the issues that arise in the short and mid term. |
Let's try to get this one over the finishing line.
The goal of ecs loggers is to be compatible with the widest range of stack versions possible. For that reason, I think it makes sense to include both the |
++, let's move forward with both as you suggested @felixbarny. We can always adjust / optimise later. |
I hit an issue when testing this end-to-end with elastic/ecs-logging-java#124 and Elastic Agent 7.11.2: The |
I think we need to open an issue for this in Beats. It basically means we need some logic around:
It is possible that apm-server can do this already today but not the logfile input AFAIK. |
I've created an issue: elastic/beats#24683 I'll mark this PR as blocked, at least until we have a consensus that this should be implemented in the near to mid-term. |
@felixbarny I assume the routing feature might bring a new twist on this discussion? |
It definitely does. Reading this thread is almost nostalgic 😄. This thread was the reason I got pulled in more and more into routing and it seems elastic/elasticsearch#76511 is finally landing soon. The reroute processor will also unblock |
Data streams are a new way of index management that is used by Elastic Agent. See also elastic/ecs#1145.

To enable users to influence the data stream (and consequently, the index) the data is sent to, ECS loggers should offer configuration options for `data_stream.dataset` and `data_stream.namespace`. The data stream for logs is `logs-{data_stream.dataset}-{data_stream.namespace}`. As the settings are optional, Filebeat will fall back to default values (`generic` for the dataset and `default` for the namespace), so that the data stream would be `logs-generic-default`.

The `event.dataset` field is going to be deprecated in favor of `data_stream.dataset` and scheduled to be removed for 8.0.

As we're planning to GA Elastic Agent in the 7.x timeframe, we'll have to support both `event.dataset` and `data_stream.dataset` for a while. Therefore, ECS loggers should set both fields with the same value, ideally before their initial GA release. This will make 7.x ECS loggers compatible with the 8.x stack.

When 8.0 comes along, ECS loggers should drop support for the `event.dataset` field. This would still allow 8.x loggers to be used with a 7.x stack. The only implication would be that the ML categorization job, which relies on `event.dataset`, wouldn't work. To fix that, users can use an ingest node processor to copy the value of `data_stream.dataset` to `event.dataset`, or downgrade their ECS logger to 7.x.