WIP: [io-elastic] Introduce elasticsearch io. #546

Open
wants to merge 2 commits into
base: master

@dmvk dmvk commented May 18, 2021

No description provided.

public void write(StreamElement element, CommitCallback commitCallback) {
  Preconditions.checkArgument(!element.isDelete(), "Delete not supported.");
  Preconditions.checkArgument(
      !element.getAttributeDescriptor().isWildcard(), "Wildcard not supported.");

What is the reason for this? I don't see anything that would stop it from working.

@tisonet tisonet May 18, 2021

I think ... it wasn't needed so I didn't test it. I don't think there is any other reason.

@je-ik je-ik self-assigned this May 18, 2021

final IndexRequest request =
    new IndexRequest(accessor.getIndexName())
        .id(element.getKey())

Looks like the key should be the element's key-attribute, no? I'm not familiar with Elastic's data model, but if documents with the same key are replaced (upserted), then it looks that way.

Do you mean wildcard elements? It is true that the document will be replaced if the key is the same for a wildcard. Maybe this is the real reason why I didn't implement support for wildcard elements.

That is not related to wildcard attributes. The key will be the same for all attributes of a given entity (key). It seems that in the current version these are overwritten; that is, an update to attribute X overwrites an update to attribute Y.
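
The overwrite described above can be illustrated with a small sketch. It assumes, purely for illustration, that a document ID could be built by joining the entity key and the attribute name with a `#` separator. The `DocumentIds` class, the `documentId` helper, and the separator are hypothetical and not part of this PR, and a plain in-memory map stands in for an index that replaces documents sharing an ID.

```java
import java.util.HashMap;
import java.util.Map;

final class DocumentIds {

  // Hypothetical helper: derive one document ID per (key, attribute) pair.
  // Assumes '#' never appears in keys; a real implementation would need escaping.
  static String documentId(String key, String attribute) {
    return key + "#" + attribute;
  }

  public static void main(String[] args) {
    // In-memory stand-in for an index that replaces documents sharing an ID.
    Map<String, String> idIsKey = new HashMap<>();
    idIsKey.put("my-key", "value of attribute X");
    idIsKey.put("my-key", "value of attribute Y"); // replaces the X document
    System.out.println(idIsKey.size()); // prints 1

    Map<String, String> idIsKeyAndAttribute = new HashMap<>();
    idIsKeyAndAttribute.put(documentId("my-key", "X"), "value of attribute X");
    idIsKeyAndAttribute.put(documentId("my-key", "Y"), "value of attribute Y");
    System.out.println(idIsKeyAndAttribute.size()); // prints 2
  }
}
```

With the key alone as the ID, the second write replaces the first; with the composite ID, both attribute updates survive as separate documents.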

Every attribute has to have its own Elastic index; otherwise there could be problems with field mappings. Elastic is not a schemaless storage. Older versions of Elastic had the concept of index types, which could be used for different attributes within a single index, but it was problematic and was removed. So the current recommendation is to have a single index per data schema.
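
The mapping concern above can be sketched with a deliberately simplified model. This is not Elasticsearch's actual mapping logic: Java runtime classes stand in for field types, and the `MappingConflicts` class and `firstConflict` method are hypothetical names introduced only for this illustration. The sketch reports a field that appears in two documents under the same name but with different value types, which is the situation that would cause trouble when both documents share one index.

```java
import java.util.Map;
import java.util.Optional;

final class MappingConflicts {

  // Returns the name of a field that is present in both documents but whose
  // values have different Java types (a stand-in for differing field types).
  static Optional<String> firstConflict(Map<String, Object> first, Map<String, Object> second) {
    for (Map.Entry<String, Object> entry : first.entrySet()) {
      Object mine = entry.getValue();
      Object other = second.get(entry.getKey());
      if (mine != null && other != null && !mine.getClass().equals(other.getClass())) {
        return Optional.of(entry.getKey());
      }
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    Map<String, Object> attributeX = Map.of("key", "my-key", "data", "free text");
    Map<String, Object> attributeY = Map.of("key", "my-key", "data", 42);
    // "data" is a String in one document and an Integer in the other.
    System.out.println(firstConflict(attributeX, attributeY).isPresent()); // prints true
  }
}
```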

Oh. I think forcing a separate index per attribute will not work for me. What was the use case behind this implementation?

@tisonet tisonet May 19, 2021

I would say that it is best practice. If you share a single index across several document types, you can run into mapping issues: a field with the same name always has to have the same field type.

A different approach could look like this.
Instead of:

{"key": "my-key", "attribute": "my-attr", "data": {....}} 

we can have:

{"key": "my-key", "my-attr": {....}} 
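
The two layouts above can be sketched as plain maps. The `DocumentLayouts` class and its two methods are hypothetical names used only for illustration; how such a map would be serialized and sent to Elasticsearch is left out on purpose.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class DocumentLayouts {

  // First layout: {"key": ..., "attribute": ..., "data": {...}}
  static Map<String, Object> perAttribute(String key, String attribute, Map<String, Object> data) {
    Map<String, Object> document = new LinkedHashMap<>();
    document.put("key", key);
    document.put("attribute", attribute);
    document.put("data", data);
    return document;
  }

  // Second layout: {"key": ..., "<attribute name>": {...}}
  // Assumes no attribute is itself named "key", which would clash with the key field.
  static Map<String, Object> perEntity(String key, String attribute, Map<String, Object> data) {
    Map<String, Object> document = new LinkedHashMap<>();
    document.put("key", key);
    document.put(attribute, data);
    return document;
  }

  public static void main(String[] args) {
    Map<String, Object> data = Map.of("x", 1);
    System.out.println(perAttribute("my-key", "my-attr", data).keySet()); // prints [key, attribute, data]
    System.out.println(perEntity("my-key", "my-attr", data).keySet()); // prints [key, my-attr]
  }
}
```

In the second layout the attribute name becomes a field name, so each attribute gets its own field and the same-name-same-type constraint applies per attribute rather than to one shared `data` field.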

Yes, I understand that documents inside an index should have the same structure. But that is fine; I think we can have the same structure for all attributes. My use case would be to search for documents (representing entity-key-attribute tuples) where the values of some attributes match a full-text query. I think we can do that by unifying the document and using https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-annotated-text-usage.html
We would break down the attributes of the entity and annotate the values (as strings) with the dot notation of the inner attribute.
WDYT?

Hmm, interesting approach. I haven't used annotated text before; it is a relatively new feature. So the data would not be structured JSON, but just a single string with annotations? Try it.

Probably something like

{
"entity": "...",
"key": " ...",
"attribute": "...",
"stamp": "...",
"jsonValue": "value serialized as json string",
"value": "[field_value](field_alias1&field_alias2&...&field_aliasN)
               [field_value](field_alias1&field_alias2&...&field_aliasN)
               ...."
}

I think it might then be searchable using the value and any of the aliases. A field alias would be a stripped dot notation. Suppose we have a nested structure like:

syntax = "proto3";

message A {
  message B {
    message C {
      int32 val = 1;
    }
    C c = 1;
  }
  B b = 1;
}

The full field name would then be b.c.val, and the other aliases would be c.val and val. That should make queries like value:"field_value=b.c.val" possible. Would that work for your case?
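
The alias scheme described above can be sketched as follows. The `AnnotatedValues` class and its methods are hypothetical helpers, not code from this PR. The sketch only derives the stripped dot-notation aliases and formats a value as `[field_value](alias1&alias2&...)`; it does not escape special characters in values or aliases, which a real implementation would likely need.

```java
import java.util.ArrayList;
import java.util.List;

final class AnnotatedValues {

  // Derives the full path plus every suffix obtained by stripping leading segments.
  static List<String> aliases(String path) {
    List<String> result = new ArrayList<>();
    String rest = path;
    while (true) {
      result.add(rest);
      int dot = rest.indexOf('.');
      if (dot < 0) {
        return result;
      }
      rest = rest.substring(dot + 1);
    }
  }

  // Formats one value as [value](alias1&alias2&...&aliasN).
  static String annotate(String value, String path) {
    return "[" + value + "](" + String.join("&", aliases(path)) + ")";
  }

  public static void main(String[] args) {
    System.out.println(aliases("b.c.val")); // prints [b.c.val, c.val, val]
    System.out.println(annotate("1", "b.c.val")); // prints [1](b.c.val&c.val&val)
  }
}
```

Concatenating the annotated strings for every leaf field of a message would then give the content of the `value` field in the document sketched earlier in this thread.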

3 participants