WIP: [io-elastic] Introduce elasticsearch io. #546

dmvk · 2021-05-18T13:54:45Z

No description provided.


je-ik · 2021-05-18T13:58:55Z

direct/io-elastic/src/main/java/cz/o2/proxima/direct/elastic/ElasticWriter.java

+  public void write(StreamElement element, CommitCallback commitCallback) {
+    Preconditions.checkArgument(!element.isDelete(), "Delete not supported.");
+    Preconditions.checkArgument(
+        !element.getAttributeDescriptor().isWildcard(), "Wildcard not supported.");


What is the reason for this? I don't see anything that would stop it from working.

I think ... it wasn't needed so I didn't test it. I don't think there is any other reason.

sonarqubecloud · 2021-05-18T17:36:08Z

SonarCloud Quality Gate failed.

0 Bugs
0 Vulnerabilities
0 Security Hotspots
14 Code Smells

16.6% Coverage
0.0% Duplication

je-ik · 2021-05-18T18:13:25Z

direct/io-elastic/src/main/java/cz/o2/proxima/direct/elastic/ElasticWriter.java

+
+    final IndexRequest request =
+        new IndexRequest(accessor.getIndexName())
+            .id(element.getKey())


Looks like the ID should be the element's key-attribute pair, no? I'm not familiar with Elastic's data model, but if documents with the same ID are replaced (upserted), then it looks that way.

Do you mean wildcard elements? It is true that the document will be replaced if the key is the same for a wildcard attribute. Maybe this is the real reason why I didn't implement support for wildcard elements.

That is not related to wildcard attributes. The key will be the same for all attributes of a given entity (key). It seems that in the current version these are overwritten, that is, an update to attribute X overwrites an update to attribute Y.
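To make the overwrite concern concrete, here is a minimal sketch that simulates an index in which writing an existing ID replaces the stored document. The class `DocumentIdSketch`, the `#` separator, and the sample keys are hypothetical and not part of the PR; a real implementation would need a separator that cannot make two different key-attribute pairs collide.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not code from the PR.
final class DocumentIdSketch {

  // Assumed separator; it must not be ambiguous for real keys and attribute names.
  static final String SEPARATOR = "#";

  /** Builds a document ID that differs for each (key, attribute) pair. */
  static String compositeId(String key, String attribute) {
    return key + SEPARATOR + attribute;
  }

  /**
   * Simulates an index where a write to an existing ID replaces the document.
   * Each write is a triple {key, attribute, value}.
   */
  static Map<String, String> index(String[][] writes, boolean useCompositeId) {
    Map<String, String> index = new LinkedHashMap<>();
    for (String[] write : writes) {
      String key = write[0];
      String attribute = write[1];
      String value = write[2];
      String id = useCompositeId ? compositeId(key, attribute) : key;
      index.put(id, value); // same ID: the previous document is replaced
    }
    return index;
  }

  public static void main(String[] args) {
    String[][] writes = {{"user1", "X", "x-value"}, {"user1", "Y", "y-value"}};
    // Key-only ID: the update to Y replaces the update to X.
    System.out.println(index(writes, false)); // {user1=y-value}
    // Composite ID: both updates survive as separate documents.
    System.out.println(index(writes, true)); // {user1#X=x-value, user1#Y=y-value}
  }
}
```

With a key-only ID the two writes collapse into one document, while the composite ID keeps one document per attribute.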

Every attribute has to have its own Elastic index, otherwise there could be problems with field mappings. Elastic is not a schemaless storage. Older versions of Elastic had the concept of index types (mapping types), which could be used for different attributes within a single index, but it was problematic and was removed. So the recommendation now is to have a single index per data schema.

Oh. I think forcing a separate index per attribute will not work for me. What was the use case behind this implementation?

I would say that it is best practice. If you share a single index for several document types, you can run into mapping issues: a field with the same name always has to have the same field type.

A different approach could look like this.
Instead of:

{"key": "my-key", "attribute": "my-attr", "data": {....}}

we can have:

{"key": "my-key", "my-attr": {....}}
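The second shape implies one document per key, with each attribute stored as its own field. A rough sketch of that merge behavior follows, using an in-memory map in place of an index. The class `MergedDocumentSketch` and its methods are hypothetical; in Elasticsearch this would presumably require a partial update rather than a plain index request, since indexing with an existing ID replaces the whole document.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; an in-memory stand-in for an index.
final class MergedDocumentSketch {

  // One document per entity key; each attribute becomes a field of that document.
  private final Map<String, Map<String, Object>> documents = new HashMap<>();

  /** Merges a single attribute update into the document for the given key. */
  void write(String key, String attribute, Object value) {
    Map<String, Object> doc =
        documents.computeIfAbsent(
            key,
            k -> {
              Map<String, Object> created = new LinkedHashMap<>();
              created.put("key", k);
              return created;
            });
    // Assumption: no attribute is itself named "key", otherwise it would clash.
    doc.put(attribute, value);
  }

  /** Returns the merged document for the key, or null if nothing was written. */
  Map<String, Object> get(String key) {
    return documents.get(key);
  }

  public static void main(String[] args) {
    MergedDocumentSketch index = new MergedDocumentSketch();
    index.write("my-key", "my-attr", "first");
    index.write("my-key", "other-attr", "second");
    // Both attributes live side by side in one document.
    System.out.println(index.get("my-key"));
  }
}
```

The mapping constraint discussed above still applies to this shape: every attribute field needs a stable type across all documents in the index.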

Yes, I understand that documents inside an index should have the same structure. But that is fine, I think that we can have the same structure for all attributes. My use case would be to be able to search for documents (representing entity-key-attribute tuples) with values of some attributes matching a full-text query. I think we can do that by unifying the document and using https://www.elastic.co/guide/en/elasticsearch/plugins/current/mapper-annotated-text-usage.html
We would break down the attributes of the entity and annotate values (as string) with a dot-notation of the inner attribute.
WDYT?

Hmm, interesting approach. I haven't used annotated text before; it is a relatively new feature. So it means that the data would not be structured JSON, but just a single string with annotations? Try it.

Probably something like

{
  "entity": "...",
  "key": "...",
  "attribute": "...",
  "stamp": "...",
  "jsonValue": "value serialized as json string",
  "value": "[field_value](field_alias1&field_alias2&...&field_aliasN) [field_value](field_alias1&field_alias2&...&field_aliasN) ...."
}

I think it might then be searchable using the value and any of the aliases. A field alias would be a stripped dot notation. Say we have a nested structure like

message A {
  message B {
    message C {
      int32 val = 1;
    }
    C c = 1;
  }
  B b = 1;
}

The field would then be b.c.val; the other aliases would be c.val and val. That should make queries like value:"field_value=b.c.val" possible. Would that work for your case?
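The alias scheme described above can be sketched as follows: starting from the full dot-notation path, each alias drops one more leading segment, and the value is wrapped in the `[value](alias1&alias2&...)` form. The class `AnnotatedValueSketch` and its method names are hypothetical, and the sketch assumes that neither the value nor the path needs escaping, which real data may well violate.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not code from the PR.
final class AnnotatedValueSketch {

  /** Returns the path and every suffix obtained by stripping leading segments. */
  static List<String> aliases(String path) {
    List<String> result = new ArrayList<>();
    String rest = path;
    while (true) {
      result.add(rest);
      int dot = rest.indexOf('.');
      if (dot < 0) {
        break; // no more segments to strip
      }
      rest = rest.substring(dot + 1);
    }
    return result;
  }

  /** Formats one annotated token; assumes value and path need no escaping. */
  static String annotate(String value, String path) {
    return "[" + value + "](" + String.join("&", aliases(path)) + ")";
  }

  public static void main(String[] args) {
    System.out.println(aliases("b.c.val")); // [b.c.val, c.val, val]
    System.out.println(annotate("1", "b.c.val")); // [1](b.c.val&c.val&val)
  }
}
```

Several such tokens joined by spaces would form the "value" field of the proposed document.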

[io-elastic] Initial import of elastic connector from Seznam.cz codeb…

91b4561

…ase. Co-authored-by: Zdenek Tison <[email protected]> Co-authored-by: Lukas Drbal <[email protected]>

probot-autolabeler bot added direct direct-io legal labels May 18, 2021

je-ik reviewed May 18, 2021


je-ik self-assigned this May 18, 2021

[proxima-direct-io-elastic] build with JDK11

04fa5e8

je-ik reviewed May 18, 2021



