Maturing Shotover - part 1 #378

benbromhead · 2021-03-27T00:26:27Z

benbromhead
Mar 27, 2021
Maintainer

Next Steps

The following is a list of tasks and changes required to support Redis caching, Redis authentication, and general development on Shotover.

Message/Messages structure

This is probably the biggest one (in terms of impact of changes + ease of development with Shotover).
Currently we pass around a vector of messages. This has some speed benefits (especially with Redis), where we can read a bunch of queued messages, process them really fast, and bulk send them upstream.
We could also do this with Cassandra messages, since Cassandra supports multiple in-flight requests per connection. Currently, however, the codec doesn't do this.
The downside is that it dramatically complicates some transforms and how we process them. If we need to match data from a response to its request on the response path, it becomes tricky.
We might need to start thinking about whether we treat transform chains as a set of futures that we pass data to, or whether we build a set of futures per message we receive (à la Tower.rs).
Some other areas to explore:
- Transforms could act on either one message or multiple messages, and the transform chain would figure out how it needs to iterate over that group of messages.
- If a transform operates on one message within a group, we then have the problem of how to get the response for that single message inside the transform (given that responses are not always split per message).
- Need to really think this one through.
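A rough sketch of the first option, where the chain does the iterating and a transform only ever sees one request/response pair. None of this is Shotover's actual API: `Message`, `SingleMessageTransform`, `run_batch` and `TagWithRequest` are made-up names, and it assumes upstream returns exactly one response per request in request order (true for Redis pipelining; Cassandra would need matching by stream id instead).

```rust
/// Hypothetical, simplified message type; the real one is much richer.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub payload: String,
}

/// A transform that only knows how to handle one request/response pair.
pub trait SingleMessageTransform {
    fn on_request(&mut self, request: &mut Message);
    fn on_response(&mut self, request: &Message, response: &mut Message);
}

/// Drives a single-message transform over a whole batch.
/// `send_upstream` stands in for the rest of the chain. It is assumed to
/// return one response per request, in request order.
pub fn run_batch<T, F>(
    transform: &mut T,
    mut requests: Vec<Message>,
    send_upstream: F,
) -> Result<Vec<Message>, String>
where
    T: SingleMessageTransform,
    F: FnOnce(&[Message]) -> Vec<Message>,
{
    for request in requests.iter_mut() {
        transform.on_request(request);
    }
    let mut responses = send_upstream(&requests);
    if responses.len() != requests.len() {
        // Without a 1:1 mapping we cannot pair by position, so fail loudly.
        return Err(format!(
            "expected {} responses, got {}",
            requests.len(),
            responses.len()
        ));
    }
    // Pair each response with the request at the same position.
    for (request, response) in requests.iter().zip(responses.iter_mut()) {
        transform.on_response(request, response);
    }
    Ok(responses)
}

/// Example transform: tags each response with the request it answers.
pub struct TagWithRequest;

impl SingleMessageTransform for TagWithRequest {
    fn on_request(&mut self, _request: &mut Message) {}

    fn on_response(&mut self, request: &Message, response: &mut Message) {
        response.payload = format!("{} -> {}", request.payload, response.payload);
    }
}
```

The length check is the interesting part: as soon as responses are merged or dropped upstream, positional pairing stops working and the chain needs some other correlation key.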

Shared config maps:

Currently transforms like cassandra_destination, redis cluster and redis cache all need some configuration or understanding of the underlying schema (for Cassandra). The only option we have today is to define the schema multiple times in the configuration file, which is not amazing.
To address this we either need a shared model, where each transform gets passed a reference, or we make the config templatable (yuck?).
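For the shared-model option, a minimal sketch could hand every transform an `Arc` to one schema that is parsed once. The names here (`SharedSchema`, the two stand-in transform structs, `cache_key`) are hypothetical and only illustrate the shape, not the real transforms.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical shared schema: table name -> primary key column names.
#[derive(Debug, Default)]
pub struct SharedSchema {
    pub primary_keys: HashMap<String, Vec<String>>,
}

/// Stand-ins for two transforms that both need the schema.
pub struct CassandraDestination {
    schema: Arc<SharedSchema>,
}

pub struct RedisCache {
    schema: Arc<SharedSchema>,
}

impl CassandraDestination {
    pub fn new(schema: Arc<SharedSchema>) -> Self {
        CassandraDestination { schema }
    }

    pub fn knows_table(&self, table: &str) -> bool {
        self.schema.primary_keys.contains_key(table)
    }
}

impl RedisCache {
    pub fn new(schema: Arc<SharedSchema>) -> Self {
        RedisCache { schema }
    }

    /// Builds a cache key from the table name and its primary key values.
    /// Returns None if the table or one of its key columns is unknown.
    pub fn cache_key(&self, table: &str, row: &HashMap<String, String>) -> Option<String> {
        let columns = self.schema.primary_keys.get(table)?;
        let mut key = table.to_string();
        for column in columns {
            key.push(':');
            key.push_str(row.get(column)?);
        }
        Some(key)
    }
}
```

Both transforms read the same definition, so the schema only appears once in the config file. The trade-off versus templating is that transforms become coupled to a shared type.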

Colored connections

Currently there is no real mechanism or place to store information about the state of the connection that a transform is attached to. This could be things like who the authenticated user is, whether the connection has been established, etc.
We could probably provide a simple mechanism that lets transforms read and write state about the connection they are attached to.
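One possible shape for that mechanism, as a sketch only: a per-connection state struct that the chain passes to every transform by mutable reference. `ConnectionState`, `Authenticate`, `RequireAuth` and `run_chain` are invented for illustration, and messages are plain strings to keep it short.

```rust
/// Hypothetical per-connection state shared by every transform in a chain.
#[derive(Debug, Default)]
pub struct ConnectionState {
    pub authenticated_user: Option<String>,
    pub established: bool,
}

/// Simplified transform trait: takes a message and returns the message to
/// pass on, with read/write access to the connection state.
pub trait Transform {
    fn transform(&mut self, state: &mut ConnectionState, message: String) -> String;
}

/// Records who authenticated on this connection.
pub struct Authenticate;

impl Transform for Authenticate {
    fn transform(&mut self, state: &mut ConnectionState, message: String) -> String {
        if let Some(user) = message.strip_prefix("AUTH ") {
            state.authenticated_user = Some(user.to_string());
            state.established = true;
            return "OK".to_string();
        }
        message
    }
}

/// Reads state written by an earlier transform.
pub struct RequireAuth;

impl Transform for RequireAuth {
    fn transform(&mut self, state: &mut ConnectionState, message: String) -> String {
        match state.authenticated_user {
            Some(_) => message,
            None => "ERR not authenticated".to_string(),
        }
    }
}

/// Runs a message through each transform in order, sharing one state object.
pub fn run_chain(
    chain: &mut [Box<dyn Transform>],
    state: &mut ConnectionState,
    message: String,
) -> String {
    chain
        .iter_mut()
        .fold(message, |msg, transform| transform.transform(state, msg))
}
```

The state lives with the connection rather than inside any one transform, so `RequireAuth` does not need to know which transform performed the authentication.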

Connection setup

There is a pre-setup stage that gets called for transforms, and it could be put to better use.

Stop abusing Clone trait

Currently we overuse clone on a transform chain. The main case is when we get a new connection: we clone an existing chain that has never been used. This is problematic because it limits us to synchronous function calls when we need to set up a new chain for a new connection. We should probably move this to a dedicated constructor function that is also async.
The other option would be to store the config struct in the source/listener, which then creates a new transform chain from the config (which is already an async function).
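A sketch of the second option, with the listener keeping only the config and building a fresh chain per connection. The type names and the `build` / `new_chain` functions are hypothetical. `build` is written as a plain function here so the sketch runs without an async runtime; in practice it would be the async function mentioned above.

```rust
/// Hypothetical transform configs, as they might be parsed from the config file.
#[derive(Debug, Clone)]
pub enum TransformConfig {
    RedisDestination { address: String },
    Printer,
}

/// Hypothetical live transforms; real ones would own sockets, caches, etc.
#[derive(Debug, PartialEq)]
pub enum Transform {
    RedisDestination { address: String, requests_seen: u64 },
    Printer,
}

impl TransformConfig {
    /// Builds a brand-new transform from config. Because this is a real
    /// constructor rather than Clone, it can fail (and could be async).
    pub fn build(&self) -> Result<Transform, String> {
        match self {
            TransformConfig::RedisDestination { address } if address.is_empty() => {
                Err("redis destination needs an address".to_string())
            }
            TransformConfig::RedisDestination { address } => Ok(Transform::RedisDestination {
                address: address.clone(),
                requests_seen: 0, // every connection starts with fresh state
            }),
            TransformConfig::Printer => Ok(Transform::Printer),
        }
    }
}

/// The source/listener stores the config, not a template chain to clone.
pub struct Listener {
    pub chain_config: Vec<TransformConfig>,
}

impl Listener {
    /// Called once per accepted connection. Stops at the first build error.
    pub fn new_chain(&self) -> Result<Vec<Transform>, String> {
        self.chain_config.iter().map(TransformConfig::build).collect()
    }
}
```

Compared with cloning, this also gives chain setup a natural place to report errors, which `Clone::clone` cannot do.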

Module structure refactor

The current set of modules is a bit of a mess, and we should probably move to a Cargo workspace with multiple crates to simplify things. This should also speed up compilation dramatically. Currently we have to compile around 400 dependencies, which is a bit insane.
Another approach would be to start making use of Cargo features to enable/disable certain parts of Shotover to shorten build / test / iteration cycles.

Error handling audit

There are some areas and some transforms that just straight up swallow an error, or log it, without doing anything sensible with it.
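As an illustration of what "something sensible" could look like (a sketch with invented names, not the current code): transforms return a typed error and propagate it with `?`, and a single place at the chain boundary decides what the client sees.

```rust
use std::fmt;

/// Hypothetical error type shared by all transforms.
#[derive(Debug, PartialEq)]
pub enum TransformError {
    Upstream(String),
    Protocol(String),
}

impl fmt::Display for TransformError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            TransformError::Upstream(msg) => write!(f, "upstream error: {}", msg),
            TransformError::Protocol(msg) => write!(f, "protocol error: {}", msg),
        }
    }
}

impl std::error::Error for TransformError {}

/// Stand-in for a call to the upstream database.
fn send_upstream(request: &str) -> Result<String, TransformError> {
    if request.is_empty() {
        Err(TransformError::Protocol("empty request".to_string()))
    } else if request == "FAIL" {
        Err(TransformError::Upstream("connection reset".to_string()))
    } else {
        Ok(format!("OK {}", request))
    }
}

/// A transform propagates failures with `?` instead of logging and moving on.
pub fn transform(request: &str) -> Result<String, TransformError> {
    let response = send_upstream(request)?;
    Ok(response.to_uppercase())
}

/// The chain boundary turns every failure into an explicit error response.
pub fn handle(request: &str) -> String {
    match transform(request) {
        Ok(response) => response,
        Err(error) => format!("ERR {}", error),
    }
}
```

An audit would then mostly be a search for places that discard a `Result` or match on `Err` without returning it.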

benbromhead · 2021-06-21T02:31:37Z

benbromhead
Jun 21, 2021
Maintainer Author

Transform Chains

Currently transform chains only allow a single wrapper struct to traverse them at a time. This is because the wrapper contains a set of mutable references to each transform in the chain and holds them for the life of the request. The transform method requires a mutable reference to the transform struct.

This makes it easier to reason about and easier to manage on a per-connection basis, but it also (maybe) has an impact on performance, as it's harder to have multiple requests in flight across the chain structure.

Moving transforms to a generator approach may allow us to keep the transform function semantics, simplify some things, and process multiple messages across a chain at any given point in time.
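To make the constraint concrete, here is a heavily simplified sketch of the current shape (the names `Wrapper`, `call_next`, `Uppercase` and `Exclaim` are illustrative and messages are plain strings). The wrapper holds a mutable borrow of the remaining transforms, so the borrow checker only ever allows one wrapper per chain to be alive.

```rust
pub trait Transform {
    /// A transform may edit the request, call the rest of the chain through
    /// the wrapper, and then edit the response on the way back.
    fn transform<'a>(&mut self, wrapper: Wrapper<'a>) -> String;
}

/// Carries the message plus mutable access to the transforms still to run.
pub struct Wrapper<'a> {
    pub message: String,
    remaining: &'a mut [Box<dyn Transform>],
}

impl<'a> Wrapper<'a> {
    pub fn new(message: String, chain: &'a mut [Box<dyn Transform>]) -> Self {
        Wrapper { message, remaining: chain }
    }

    /// Hands the message to the next transform. At the end of the chain the
    /// message is echoed back, standing in for the upstream response.
    pub fn call_next(self) -> String {
        let Wrapper { message, remaining } = self;
        match remaining.split_first_mut() {
            Some((head, tail)) => head.transform(Wrapper { message, remaining: tail }),
            None => message,
        }
    }
}

/// Acts on the request path.
pub struct Uppercase;

impl Transform for Uppercase {
    fn transform<'a>(&mut self, mut wrapper: Wrapper<'a>) -> String {
        wrapper.message = wrapper.message.to_uppercase();
        wrapper.call_next()
    }
}

/// Acts on the response path.
pub struct Exclaim;

impl Transform for Exclaim {
    fn transform<'a>(&mut self, wrapper: Wrapper<'a>) -> String {
        let response = wrapper.call_next();
        format!("{}!", response)
    }
}

// Creating two `Wrapper`s over the same chain and keeping both alive does not
// compile, because each one needs its own `&mut` borrow of the chain.
```

Any design that wants several messages in flight has to give up this exclusive borrow, for example by making transforms take `&self` or by giving each in-flight message its own chain.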

Claudenw · 2021-08-27T22:29:15Z

Claudenw
Aug 27, 2021

RDF as a potential solution

At the last place I worked we had a system where, as a request moved through the system, various modules/transforms would add data to it. Since anyone could write a transform, we never knew what data would be added to the request. To solve this we made the request an RDF graph and stored the graph in a Fuseki (Apache Jena component) data store. Any transform could then access any data inserted by any other transform or process. This makes it possible to do Colored Connections easily. It also makes it easy to share information about which nodes are closest (latency-wise) and which nodes have duplicates of which data sources (though the clustering software already knows). I gave a talk about this application at ApacheCon 2 years ago (the last face-to-face meeting).

I used a similar approach to manage a system with several disparate configuration files that had to be kept in sync. In effect we merged all the configs into one space and could then request a data point by any alias. We could also execute some logic to derive new data points.

Since RDF is generally not well known, I could give a talk about it and the solution we used for the application I mentioned above, as well as an early application that did data mapping.
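For anyone who has not seen RDF: the data model is just a set of (subject, predicate, object) triples. The toy in-memory graph below is only meant to show how one transform can write facts that another reads back without either knowing about the other. It is not Fuseki, Jena or SPARQL, the identifiers such as `conn:1` are made up, and real RDF would use IRIs rather than bare strings.

```rust
/// One RDF-style statement: subject, predicate, object.
#[derive(Debug, Clone, PartialEq)]
pub struct Triple {
    pub subject: String,
    pub predicate: String,
    pub object: String,
}

/// Toy in-memory graph standing in for a real triple store.
#[derive(Debug, Default)]
pub struct Graph {
    triples: Vec<Triple>,
}

impl Graph {
    /// Any transform can add a fact about any subject.
    pub fn insert(&mut self, subject: &str, predicate: &str, object: &str) {
        self.triples.push(Triple {
            subject: subject.to_string(),
            predicate: predicate.to_string(),
            object: object.to_string(),
        });
    }

    /// Any transform can ask for every object matching a subject and predicate.
    pub fn objects(&self, subject: &str, predicate: &str) -> Vec<&str> {
        self.triples
            .iter()
            .filter(|t| t.subject == subject && t.predicate == predicate)
            .map(|t| t.object.as_str())
            .collect()
    }
}
```

An authentication transform could insert `("conn:1", "authenticatedAs", "bob")` and a later transform could look it up, which is the Colored Connections use case expressed as graph data.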

benbromhead · 2021-08-28T22:46:31Z

benbromhead
Aug 28, 2021
Maintainer Author

I think doing a talk / bit of education on this approach would be super useful. Do you have any background reading you would recommend?

Claudenw · 2021-08-29T09:16:26Z

Claudenw
Aug 29, 2021

A very short introduction: https://sites.google.com/site/restframework/introduction-to-rdf
A slightly longer introduction: https://www.w3schools.com/XML/xml_rdf.asp

The above examples talk about writing RDF in XML as though that is the only format. RDF is a conceptual framework; RDF/XML is the XML serialization of a specific dataset. There are other serializations that are, in my opinion, easier to read. So don't get caught up in trying to read the RDF/XML; just understand how it works.

The best source, and probably the largest rabbit hole, is the W3C itself. RDF is, like HTML, a W3C recommendation. The most recent W3C primer on RDF can be found at: https://www.w3.org/TR/rdf11-primer/

If you want to play with it, check out Fuseki on Docker:
https://github.com/stain/jena-docker

Fuseki is part of the Apache Jena project (https://jena.apache.org) and a reference implementation for the W3C RDF and SPARQL recommendations.

Finally, I have a very old implementation of a Jena storage layer on Cassandra. I have never tested it under load, and it is based on old Jena code, so it would need to be updated. The code is at https://github.com/Claudenw/jena-on-cassandra
