-
Notifications
You must be signed in to change notification settings - Fork 31
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Friendly complementary project #66
Comments
@windelbouwman sorry for late reply. |
No problemo! The TSDB is all in memory, so it might even be more a simple data structure, and it supports timestamp / f64 pairs as well as timestamp / string pairs. I guess you could always convert the data to a string as a catch all case? Strings are rendered as text, f64 values as a chart. |
Thanks for the response. HawkTracer events are more like JSON structures, and we'd like to be able to render some of the fields on the graph. E.g. we can have event like:
and we'd like to use |
Okay, now I understand what you mean by structured format. No, lognplot cannot handle this directly, the simplest solution to this would be to write a python adapter script which takes in this format and forwards it to the lognplot GUI. Take a look for example at this MQTT adapter script: https://github.com/windelbouwman/lognplot/blob/master/adapters/mqtt_adapter.py For now, I deliberately kept the data ingestion API super simple, and made some external python adatper scripts, since python lends itself very well for this job. In the case of hawktracer, one would process each structured event, take the timestamp, and process each field and forward it to the lognplot GUI tool. So, no, the database does not support these kind of events as a whole. What would be the benefit of this vs splitting out the event in seperate traces? |
@windelbouwman thanks for explanation. I don't think there's any particular benefit of supporting unstructured data in the TSDB other than simplicity, but I see that having adapter for that would completely solve the problem. |
That would be an interesting idea. Currently my incentive is low, since I do not use hawktracer yet. My approach would be to take the hello world example, run the demo, and then collect the data as a json stream. My question here would be: is there a mechanism to convert the binary stream to json while the stream is live? Of is this only possible as a post processing batch operation? The idea would be this chain: hello-world-c.exe --> TCP/IP (binary trace data) --> conversion from binary into JSON --> TCP/IP (json data) --> lognplot adapter script.py --> TCP/IP (lognplot binary protocol) --> lognplot GUI Is there a script which converts the hawktracer binary data into a JSON stream? Or is there a python library to decode the incoming binary hawktracer data? |
Even though you don't use HawkTracer yet, I think it'd be valuable for the lognplot project to do the integration as it might bring quite a few new users to the project (at the moment HawkTracer does not have 1st class UI integraiton at all, so I'm sure many users, including myself, would like to try it out). There's a python example for reading HawkTracer stream and convert it to python dictionary and then plots the data: https://github.com/amzn/hawktracer/blob/master/bindings/python3/plot-events.py Many thanks for doing this, I think this integration will be beneficial for users of both projects! |
Allright, this example looks straight forward, and I know how to hook it up to lognplot GUI. This seems rather easy to take the applicable parts of the plot-events.py and the mqtt-adapter.py scripts. |
Okay, a first prototype is here: Things to note:
Not all events are handled, so there might be more interesting events. |
That looks great, thanks for that! |
@windelbouwman I've looked closer into the project; am I right that python and rust are 2 completely separate implementations? Somehow I assumed that the python lognplot is just a wrapper for Rust, but from what I see in the source code, it's not. What's the reason of having 2 separate implementations rather than having a binding layer in Python? Many thanks |
Hi, yes, it's two fully redundant implementations. The python version lends itself for experiments, but is many times slower. I did not benchmark this, but my advice is to use the rust variant, since it has more performance and contains less bugs. |
Thanks
I use latest stable rust compiler ( |
Okay, this is interesting, the project should compile well. This might be caused by the fact that there is no Latest rust version should be good. |
Thanks, |
Okay, that explains. Please update the lognplotgtk/Cargo.toml file to use ndarray 0.13 instead of 0.12, that should fix the issue. Btw, to understand this problem, hdf5 uses ndarray 0.13.1 and lognplotgtk uses 0.12.1. This is caused by the relax spec for the hdf5 version (and actually the missing Cargo.lock file). Having two versions of ndarray prevents from passing ndarray 0.12 data to ndarray 0.13 functions (in this case). The HDF5 support was an option in the past, maybe it should be an build option again? |
That worked, thanks!
but that gave me a bit unexpected result; in the UI, instead of aggregating all the fields into the same label, there's a lot of duplicated labels with |
The |
I see, thanks for the explanation. |
Ah, that's interesting. For now for sure, the best fixing place would be in de adapter script in the pull request, simply ignoring events at an earlier time. The internal data structure of lognplot expects monotonically increasing timestamps for subsequent events, and this is not something that will change probably. |
I'm afraid this issue will be a blocker for integrating HawkTracer with the lognplot library as multi-threaded environment is quite a common usecase for the profiler, and ignoring some events comes with some implications:
As mentioned, the first problem is not a big one, but the second one sounds to me like an issue which will block the integration. |
The main reason is performance, since the data is appended in increasing time, a quick query can be made on intervals. This is one of the reasons why you are able to smoothly zoom / pan data sets with millions of points. This, combined with the fact that events occur at increasing timestamps (after one another) led to the current implementation. I don't think discarding data is a good idea, all events should be preserved. For my understanding of hawk tracer, I have some questions:
|
Events are timestamped when the event happens in the system, not when the buffer gets flushed
Monotonic clock is used for internal implementation and timestamping events, but users (if they really want to) can override the timestamp and e.g. for specific group of events can use different clocks.
Yes, different threads can emit the same type of event (e.g. function Overall, HawkTracer is not very strict regarding the values of the events being passed around - that makes it more flexible and suitable for many usecases. We don't want to put more restrictions on that, as that could break quite a few existing usecases. Since in most of the cases incomming events are (more or less) sorted by timestamp, would it be possible in lognplot to sort the data set on insert (I guess some kind of implementation of insertsort would be optimal, as the data is likely to be added somewhere to the end of the dataset)? It could be an optional feature so we could enable it only when really needed (e.g. for HawkTracer usecase)? Thanks |
This is great news, so events will have increasing timestamps!
So one solution would be to include the thread ID in the name of the event, as a sort of prefix? Or would this only work when users explicitly attach the thread ID?
Agreed, this API should remain flexible.
Sure, I guess this could be added, but for the moment being I would rather implement this logic in the adapter script. My proposal would be to have a sort of ordered queue of events for each signal name, and only forward events when they have a certain age. This would mean in python something like this: queues = []
name_to_queue = {}
def on_sample(trace_name, timestamp, value):
q = name_to_queue[trace_name]
q.insert_ordered((timestamp, value))
def once_in_a_while():
for q in queues:
while q.first.timestamp is old_enough:
lognplot.send_value(trace_name, q.pop_first()) Something like that. Is there an example application which has the behavior you describe? I do not know how to reproduce the issue with the backup signals. I guess it has to be a multithreaded demo? |
User has the flexibility to attach the thread (or not). Also, even if user decides to attach the thread, it's up to the user to come up with the field name (it doesn't have to be always
No, actually it doesn't even have to be multithreaded demo. If you consider the following example:
In this case, HawkTracer generates 2 events (one for measuring scope of bar, and one for foo). The callstack would look something like that:
Since the You should be able to reproduce this by running one of the hello-world examples, e.g. https://github.com/amzn/hawktracer/blob/master/examples/tutorials/hello_world/hawktracer-hello-world.cpp The proposal with keeping track of age would work for many cases, unfortunately, I saw quite a few cases where people do:
That means, that the One solution to overcome it would be to generate 2 events (start and stop) for the function, but that again is just a pre-defined HawkTracer event, that means that user might implement the old behaviour anyway. Also, we try to avoid having 2 events for one function call to save some bandwidth (in many cases, the bandwidth is critical). I'm happy to consider other ideas as well, but at the moment nothing that works comes to my mind :) |
Aw snap, I understand the problem now. So the issue is, an event can be received with a timestamp much longer ago, since the timestamp will be of the start event. The value of the event will be the duration of that event, say the function call. One solution would be as you suggest, emit a start event and a stop event for functions. Another option would be to take the timestamp of the function end event as timestamp. Both options are pretty lame I must say. Too bad, it would be nice to visualize hawktracer output in realtime, instead of via post processing. lognplot is realy intended for live data. One option would be to add code which allows measurements to be inserted in the middle of a trace, but this would add extra code, which I'm reluctant to add. The usecase for the backup signal is that if a trace starts over, say from zero, it is added as a new trace. This may happen in embedded systems where the clock starts at zero after a reset. |
Hi,
Thanks for the talk at fosdem! Interesting stuff. This might be interesting as a visualization tool for incoming data:
https://github.com/windelbouwman/lognplot
The text was updated successfully, but these errors were encountered: