WIP: [io-elastic] Introduce elasticsearch io. #546
base: master
…ase. Co-authored-by: Zdenek Tison <[email protected]> Co-authored-by: Lukas Drbal <[email protected]>
public void write(StreamElement element, CommitCallback commitCallback) {
  Preconditions.checkArgument(!element.isDelete(), "Delete not supported.");
  Preconditions.checkArgument(
      !element.getAttributeDescriptor().isWildcard(), "Wildcard not supported.");
What is the reason for this? I don't see anything that would stop it from working.
I think ... it wasn't needed so I didn't test it. I don't think there is any other reason.
SonarCloud Quality Gate failed.
final IndexRequest request =
    new IndexRequest(accessor.getIndexName())
        .id(element.getKey())
Looks like the key should be element's key-attribute, no? I'm not familiar with elastic's datamodel, but if documents with the same key are replaced (upserted), then it looks like that.
Do you mean wildcard elements? It is true that the document will be replaced if the key is the same for a wildcard attribute. Maybe this is the real reason why I didn't implement support for wildcard elements.
That is not related to wildcard attributes. The key will be the same for all attributes of a given entity (key). It seems that in the current version these are overwritten, that is, an update to attribute X overwrites an update to attribute Y.
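One way to picture the overwrite concern is a document ID that includes the attribute name, so that two attributes of the same entity key no longer share an ID. This is only a sketch under assumptions: `DocumentIds`, `documentId`, and the `#` separator are hypothetical names chosen for illustration and are not part of this PR.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the PR: builds one document ID per
// (entity key, attribute) pair instead of using the entity key alone.
class DocumentIds {

  // Both parts are URL-encoded, so a literal '#' inside a key or attribute
  // becomes "%23" and cannot be confused with the separator.
  static String documentId(String key, String attribute) {
    return URLEncoder.encode(key, StandardCharsets.UTF_8)
        + "#"
        + URLEncoder.encode(attribute, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Same entity key, different attributes: the IDs differ, so an update
    // to one attribute would not replace the document of the other.
    System.out.println(documentId("user-1", "name"));  // prints "user-1#name"
    System.out.println(documentId("user-1", "email")); // prints "user-1#email"
  }
}
```

With an ID like this, the value passed to `.id(...)` would differ per attribute; whether that fits the intended index layout is a separate question.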
Every attribute has to have its own Elastic index, otherwise there could be problems with field mappings. Elastic is not a schemaless storage. Older versions of Elastic had the concept of index types, which could be used for different attributes within a single index, but it was problematic and was removed. So the current recommendation is to have a single index per data schema.
Oh. I think forcing a separate index per attribute will not work for me. What was the use-case behind this implementation?
I would say that it is best practice. If you share a single index for several document types, you can run into mapping issues: a field with the same name always has to have the same field type.
A different approach could be like this:
Instead of:
{"key": "my-key", "attribute": "my-attr", "data": {....}}
we can have:
{"key": "my-key", "my-attr": {....}}
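The two layouts above can be sketched as plain maps. This is a minimal illustration only: `DocumentShapes` and its method names are hypothetical, and the maps stand in for whatever JSON source the writer would actually send.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the two document layouts discussed above.
class DocumentShapes {

  // Layout 1: every attribute writes its payload into the same "data" field,
  // so that field needs one consistent mapping across all attributes.
  static Map<String, Object> genericDocument(String key, String attribute, Object data) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("key", key);
    doc.put("attribute", attribute);
    doc.put("data", data);
    return doc;
  }

  // Layout 2: the attribute name becomes the field name, so each attribute
  // gets its own field (and its own mapping) inside the document.
  static Map<String, Object> attributeNamedDocument(String key, String attribute, Object data) {
    Map<String, Object> doc = new LinkedHashMap<>();
    doc.put("key", key);
    doc.put(attribute, data);
    return doc;
  }

  public static void main(String[] args) {
    Map<String, Object> payload = Map.of("x", 1);
    System.out.println(genericDocument("my-key", "my-attr", payload));
    System.out.println(attributeNamedDocument("my-key", "my-attr", payload));
  }
}
```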
Yes, I understand that documents inside an index should have the same structure. But that is fine, I think that we can have the same structure for all attributes. My use-case would be to be able to search for documents (representing entity-key-attribute tuples) with values of some attributes matching a full-text query. I think we can do that by unifying the document and using https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-annotated-text-usage.html
We would break down the attributes of the entity and annotate values (as string) with a dot-notation of the inner attribute.
WDYT?
Hmm, interesting approach. I haven't used annotated text before, it is a relatively new feature. So it means that the data would not be structured JSON, but just a single string with annotations? Try it.
Probably something like
{
"entity": "...",
"key": " ...",
"attribute": "...",
"stamp": "...",
"jsonValue": "value serialized as json string",
"value": "[field_value](field_alias1&field_alias2&...&field_aliasN)
[field_value](field_alias1&field_alias2&...&field_aliasN)
...."
}
I think it might then be searchable using the value and any of the aliases. A field alias would be a stripped dot notation. Say we have a nested structure like
message A {
  message B {
    message C {
      int32 val = 1;
    }
    C c = 1;
  }
  B b = 1;
}
The field would then be b.c.val, other aliases would be c.val and val. That should make possible queries like value:"field_value=b.c.val". Would that work for your case?
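The alias scheme described above could be sketched roughly as follows. This is an assumption-laden illustration, not the PR's implementation: `AnnotatedValues`, `aliases`, and `annotate` are hypothetical names, and the output follows the `[text](annotation1&annotation2)` form shown earlier in this thread, with annotation values URL-encoded.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: derives "stripped dot notation" aliases for a field
// path and renders one annotated token for the "value" field.
class AnnotatedValues {

  // "b.c.val" -> ["b.c.val", "c.val", "val"]: drop the leading path segment
  // repeatedly until only the leaf name remains.
  static List<String> aliases(String path) {
    List<String> result = new ArrayList<>();
    String rest = path;
    while (true) {
      result.add(rest);
      int dot = rest.indexOf('.');
      if (dot < 0) {
        break;
      }
      rest = rest.substring(dot + 1);
    }
    return result;
  }

  // Renders "[value](alias1&alias2&...)". Escaping of ']' or ')' inside the
  // value itself is NOT handled here; a real writer would need to address it.
  static String annotate(String value, String path) {
    String annotations =
        aliases(path).stream()
            .map(a -> URLEncoder.encode(a, StandardCharsets.UTF_8))
            .collect(Collectors.joining("&"));
    return "[" + value + "](" + annotations + ")";
  }

  public static void main(String[] args) {
    System.out.println(aliases("b.c.val"));        // prints "[b.c.val, c.val, val]"
    System.out.println(annotate("42", "b.c.val")); // prints "[42](b.c.val&c.val&val)"
  }
}
```

The "value" field of the unified document would then be the concatenation of such tokens, one per leaf field of the serialized attribute value.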