Grapher could support computed traces as a display option. #181

Open
btchiaro opened this issue May 24, 2016 · 28 comments

@btchiaro

It would be really useful if the grapher could plot computed traces. For example, in the Ramsey function we store the envelope as a data column, although this is redundant data. Another example is the rapid RTO: the raw output, a time series of 1s and 0s, is not human readable and not worth displaying in the grapher. It would be nice to save the raw binary data but have the grapher plot the spectrum after Fourier analysis.

@btchiaro
Author

btchiaro commented Jun 1, 2016

Bump. I just deleted a computed trace from a scan that I often use and merged this change to master. This measurement takes a lot of time and it is useful to watch the data come in. This would be a really useful feature.

@joshmutus
Contributor

Yeah, this would be a really useful feature, but will take serious thinking to implement properly as it might need changes to the datavault server, the format of storage on the datavault and the grapher itself.

@btchiaro what is the actual formula for the data you would want to have plotted? There may be a way to add part of this feature on the frontend, but it would be nice to know what you need to plot.

@btchiaro
Author

btchiaro commented Jun 2, 2016

I'm storing the data from each tomography phase as a data column, but what
I want to plot is the envelope value. So the formula I have in mind is
envelope = np.sqrt((data[0]-data[2])**2 + (data[1]-data[3])**2). We don't
want to just store the envelope to the dataset because it is redundant.
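
The formula above can be tried out with a short NumPy sketch. It assumes `data` is indexed by tomography phase, so that `data[0]` through `data[3]` hold the four phase measurements at each delay point. The function name and the sample numbers are made up for illustration and are not taken from the scan code.

```python
import numpy as np


def ramsey_envelope(data):
    """Envelope from four tomography-phase columns.

    Assumes data[0]/data[2] and data[1]/data[3] are the two pairs
    that are differenced in the formula from the comment above.
    """
    data = np.asarray(data, dtype=float)
    return np.sqrt((data[0] - data[2]) ** 2 + (data[1] - data[3]) ** 2)


# Made-up probabilities for three delay points (one row per phase).
example = np.array([
    [0.9, 0.7, 0.5],
    [0.5, 0.5, 0.5],
    [0.1, 0.3, 0.5],
    [0.5, 0.5, 0.5],
])
envelope = ramsey_envelope(example)
```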

@DanielSank
Member

DanielSank commented Jun 2, 2016

@joshmutus I think we could do this by adding properties to the dataset. No need for modifications to the datavault server. This is just a rendering issue.

It seems totally reasonable to read a parameter from the dataset which could even just be a bit of js that says how to compute the extra curves.

If we don't like allowing arbitrary code to be run from user data (which does indeed sound kinda jank) we could parametrize some of the most common curves.
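
A rough sketch of the "parametrize the most common curves" option, written in Python for consistency with the formula earlier in the thread (the grapher itself would do this in its frontend code). Everything here is hypothetical: the registry, the `kind` and `inputs` keys, and the function name are illustrative and not part of the data vault or grapher API.

```python
import numpy as np

# Hypothetical registry of pre-approved computations. A dataset parameter
# names one of these instead of carrying arbitrary code.
DERIVED_TRACES = {
    "envelope": lambda c: np.sqrt((c[0] - c[2]) ** 2 + (c[1] - c[3]) ** 2),
    "difference": lambda c: c[0] - c[1],
}


def compute_derived(spec, columns):
    """Evaluate one derived trace described by a dataset parameter.

    spec is a hypothetical parameter value such as
    {"kind": "envelope", "inputs": [0, 1, 2, 3]}.
    """
    kind = spec["kind"]
    if kind not in DERIVED_TRACES:
        raise ValueError(f"unknown derived trace kind: {kind!r}")
    cols = [np.asarray(columns[i], dtype=float) for i in spec["inputs"]]
    return DERIVED_TRACES[kind](cols)


# Made-up stored columns and a made-up parameter value.
columns = [[0.9, 0.7], [0.5, 0.5], [0.1, 0.3], [0.5, 0.5]]
spec = {"kind": "envelope", "inputs": [0, 1, 2, 3]}
trace = compute_derived(spec, columns)
```

Because only names from the registry are accepted, nothing stored in a dataset is ever executed as code, which is the concern raised above.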

@btchiaro
Author

btchiaro commented Jun 2, 2016

Another example is when I make the noise spectrum measurements: the raw 1,0 output is not human readable at all. It would be nice to save the raw 1,0 time series, but plot the power spectral density using our code in pyle. Ideally, we could access whatever processing parameters are exposed in the pyle functions, e.g. 'frequency_average' for the rapid RTO data. This would be extremely useful to me. This is also kind of an extreme case, since this type of processing requires other datasets as inputs (spectroscopy_z_func data to calibrate frequency noise to flux noise).

I'm not sure I like using a dataset parameter; I think we want this to be mutable. I can easily imagine people wanting to plot the data in different ways, for example as above. I think it would be really good to allow the user to call pyle processing code for the plotting.
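
To give a sense of the processing involved, here is a minimal NumPy sketch of turning a 1,0 time series into a power spectral density. It is not the pyle implementation: the function name, the segment-averaging scheme, and the `n_segments` parameter are assumptions, and the calibration from frequency noise to flux noise is left out entirely.

```python
import numpy as np


def averaged_psd(bits, sample_rate, n_segments=4):
    """One-sided periodogram of a 0/1 series, averaged over segments."""
    x = np.asarray(bits, dtype=float)
    seg_len = len(x) // n_segments
    if seg_len < 2:
        raise ValueError("time series too short for the requested segments")
    segments = x[: seg_len * n_segments].reshape(n_segments, seg_len)
    # Remove each segment's mean so the DC bin does not dominate.
    segments = segments - segments.mean(axis=1, keepdims=True)
    spectra = np.abs(np.fft.rfft(segments, axis=1)) ** 2
    psd = spectra.mean(axis=0) / (sample_rate * seg_len)
    # Double the bins that have a negative-frequency partner.
    if seg_len % 2 == 0:
        psd[1:-1] *= 2
    else:
        psd[1:] *= 2
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / sample_rate)
    return freqs, psd
```

Note that the whole time series has to be available before anything can be plotted, which is part of why this case is harder than a point-by-point computed trace.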

@joshmutus
Contributor

The idea is that you could have a dataset parameter that would dictate the default way it was plotted. You could still do whatever you wanted with it later.

@DanielSank
Member

DanielSank commented Jun 2, 2016

> but plot the power spectral density using our code in pyle

@btchiaro It sounds like you want to put the pyle analysis code into the web browser. Is this because you're looking for a graphical intuitive way to browse through your processed data? Let's try to distill the thing you want without the context of the grapher, and then we can decide how to implement it.

@joshmutus
Contributor

@btchiaro we should probably chat about this Friday after group meeting. I can't think of an easy way to implement pyle in a browser.

@btchiaro
Author

btchiaro commented Jun 6, 2016

@DanielSank I'd like to be able to watch my data come in in some human
readable format. The grapher is a nice, organized, easy access data
repository and real-time viewer. If we're going to be more strict about
having computed traces stored with the datasets then it would be nice to
have the grapher be able to generate them, but I don't know where to draw
the line on what the grapher should be able to do. It sounds like the case
that I mentioned with showing the Ramsey envelope computed from the stored
tomography data should be doable so that would be a great start.
Displaying the noise data in a useful form seems more difficult. The raw
noise data is not human readable at all, so it would be nice to be able to
view spectra in the grapher, either as it comes in or some time after the
fact. However, it sounds like the processing required to make the data
human readable is just too much to put into the grapher. I suppose I could
do the data processing outside the grapher and write additional processed
files to the datavault, but this seems like it would violate data
redundancy best practices. If what I want is to see properly scaled noise
spectra in the grapher, what do you think is the best way for me to get
there? What is the major challenge with having the grapher call arbitrary
logic from, say, pyle to generate the displayed data?

@DanielSank
Member

This is an interesting question! I think @btchiaro is saying that it would be nice to easily associate processed data and plots with the raw data entries in the data vault, and view them in the same application as the grapher. Right now I'm not really sure how to do that, although I do have an interesting idea: put the analyzed data in Drive (slides or otherwise) and link to that document in the comments box in the grapher.

At this point I would like to formally say to @maffoo that he was right about hyperlinks being a good reason to rewrite the grapher using the web. You were right and I was wrong :)

@jwenner
Contributor

jwenner commented Jun 6, 2016

Instead of dealing with the difficulty (and security risk) of running arbitrary (possibly Python) code in the grapher, I would propose the following:

  1. For after-the-fact, either follow Dan's suggestion above or do as Ben said of writing the processed data to the data vault (with a comment to link the raw and processed data sets). Although, do we still have a comments box? I know we had one in the Delphi grapher...
  2. For realtime processing, I would propose to do the plotting using strictly pyle. The data would then be fetched using the DataVaultWrapper (although we would need to ensure that refreshing the cache works - see martinisgroup/pyle#1285).

@btchiaro
Author

btchiaro commented Jun 6, 2016

I think that there is a pretty limited security risk if we restrict the
code to pyle/master. What if the grapher had a checkout of pyle/master and
you could specify functions to be called on the data from functions in that
repo? My mention of writing separately processed data seems bad from a
redundancy point of view (eliminating redundant traces is actually the
motivation for this issue). Being able to put hyperlinks as dataset
parameters could provide some interesting opportunities though.

@jwenner
Contributor

jwenner commented Jun 6, 2016

@btchiaro, note this scalabrad-web project is a public project which other groups are using. As such, the approach I could see for what you suggest is having an environment variable specifying external projects to use for plotting. @joshmutus, is this even possible?

@btchiaro
Author

btchiaro commented Jun 6, 2016

Yeah, what I'm thinking is that we have our own instance of the grapher
running on a server somewhere. It would be cool if we could attach a pyle
checkout to that instance. Like a plug-in.

@joshmutus
Contributor

So @maffoo and I talked about this briefly, and what you're talking about is making scalabrad-web into a fully featured analysis tool, which is waaaay beyond the scope of this project. Generally speaking, adding even simple features introduces all sorts of interactions and bugs and makes a project super hard to maintain. Adding a complex feature like this is particularly daunting. We can talk about it in detail later. Basically, numpy doesn't exist for JavaScript, and all the things you take for granted in pyle don't exist in the browser.

I don't see the problem of storing a computed trace in this case. You have all the code on your end to compute it and adding it is trivial. It's not like we're hard up for hard drive space and the live update is a useful feature. Why do we have to kill ourselves for the DRY principle here? @DanielSank @maffoo

@DanielSank
Member

Storing computed data in the datavault is certainly possible and in particular there's no way to stop people from doing that.

However, I wouldn't. I prefer to keep the thing I collected in the experiment well separated from everything else in the project. In my mind, data has a very special role as completely immutable and un-erasable. I prefer to use more user-friendly tools like Google Drive, which support link sharing, editing, commenting, etc. for my analysis and general "lab notebook" style work.

tl;dr: I recommend using the data vault for storing raw data and nothing else. Use more appropriate tools for analysis. See IPython notebooks, for example.

@joshmutus
Contributor

joshmutus commented Jun 6, 2016

But that (ipython notebook) doesn't solve the use case of live-view

@DanielSank
Member

> But that doesn't solve the use case of live-view

Neither does storing processed data.

We can use derived traces to make live-view a little nicer, but I think processed data is a totally separate issue (already commented on it in my previous post).

@joshmutus
Contributor

I don't understand how storing processed data doesn't solve the live-view use case. You create a separate processed dependent and it live updates as you take data?

@jwenner
Contributor

jwenner commented Jun 6, 2016

I would say there are multiple separate live-view cases:
a. Where the data-processing is, e.g., point-by-point. Here, it should be possible to save processed data alongside raw data.
b. Where the data-processing depends on an entire row/column of data (for instance, a 2D swap spectroscopy, with data-processing being T1 vs bias/freq.)
c. Where the data-processing depends on the entire dataset (e.g., the noise spectrum)

In (b) and (c), it's probably best to use pyle for plotting so long as we can refresh the cache (martinisgroup/pyle#1285). I guess the question here is what to do about (a). Am I right @btchiaro @joshmutus?
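
As an illustration of why case (a) is the easy one, here is a small sketch in which each raw row is processed as soon as it arrives, so the processed column can be saved alongside the raw data and show up in live view. The row layout (a delay followed by four tomography-phase values) and the function name are assumptions made for the example.

```python
import math


def with_envelope(raw_rows):
    """Yield each raw row with a point-by-point envelope appended (case a)."""
    for delay, p0, p1, p2, p3 in raw_rows:
        envelope = math.sqrt((p0 - p2) ** 2 + (p1 - p3) ** 2)
        yield (delay, p0, p1, p2, p3, envelope)


# Made-up rows standing in for data arriving one point at a time.
incoming = [(0.0, 0.9, 0.5, 0.1, 0.5), (1.0, 0.7, 0.5, 0.3, 0.5)]
rows = list(with_envelope(incoming))
```

Cases (b) and (c) do not fit this pattern, because the processed value for one point depends on a whole row, column, or dataset that may not have been taken yet.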

@joshmutus
Contributor

joshmutus commented Jun 6, 2016

Yes, thank you for clarifying @jwenner
I'm only referring to case (a). I see (b) & (c) outside the scope of the grapher and agree with @DanielSank about the iPython notebook, etc.

@btchiaro
Author

btchiaro commented Jun 6, 2016

I'm personally OK with storing processed data along with the raw data.
That's what I had been doing and it worked fine for my purposes. The issue
here is that some code reviewers don't want processed data columns to be
present in scans that are merged into master. If we cannot merge that code
into master then it is really difficult to maintain scans with processed
data columns. I think we need to reach a decision as a group whether
processed traces can be present in master branch scans. If it's fine with
the group, this is a fine solution for me. At least for the time being.

@joshmutus
Contributor

@btchiaro does your processing update in the live view or does the whole dataset have to be taken first?

@btchiaro
Author

btchiaro commented Jun 6, 2016

I had been storing the computed trace point by point and it was present in
liveview.

@maffoo
Contributor

maffoo commented Jun 6, 2016

I would be ok with having a computed saved column in a dataset. This kind of reminds me of the idea of denormalizing data in a database (basically, storing redundant or derived data to improve read performance, or in our case, to avoid having to recompute it all the time). One simple example would be storing both P_0 and P_1. Clearly P_1 can be computed as 1 - P_0, but it's just easier to store both. (For that matter, the fact that we store probabilities instead of raw counts is also worth noting here.)

That said, I think this should be used judiciously, for things where there is one "obvious" way to compute the derived data column, because once it is stored it is immutable. If you're trying to compute and store something and then later decide that the computation needs to be modified slightly, then we're going to have a problem because all that old data is fixed. I don't know the specific case that @btchiaro is referring to, so I don't know what to think about that. @btchiaro, can you give some specifics?

@joshmutus
Contributor

Disclaimer: I'm totally unfamiliar with @btchiaro's code.

We store "processed" data all the time in, say, T1, where we convert from IQ points to the one-state probability. What's the difference here?

@joshmutus
Contributor

Can the computed column be tied to a commit so we know what code is used to compute it?
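
One possible way to do this, sketched under assumptions and not something the thread settles on: record the commit hash of the analysis code as parameters next to the computed column. The parameter names below are hypothetical, and how they would actually be written to the data vault is not shown.

```python
import subprocess


def current_commit(repo_dir="."):
    """Return the checked-out commit hash of the analysis code."""
    result = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def provenance_parameters(column_name, formula, commit):
    """Build hypothetical dataset parameters describing a computed column."""
    return {
        f"{column_name}.computed": True,
        f"{column_name}.formula": formula,
        f"{column_name}.commit": commit,
    }
```

A scan could call `provenance_parameters("envelope", "sqrt((d0-d2)**2 + (d1-d3)**2)", current_commit())` once when the dataset is created, so that an old computed trace can later be matched to the code that produced it.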

@btchiaro
Author

btchiaro commented Jun 7, 2016

@maffoo the use case that brought this all up is the rapid Ramsey scan that I wrote. This is just a Ramsey scan where the stats are sampled without qubit reset at a user-defined sampling interval. The issue there was that I was storing data from each tomography phase and also storing the Ramsey fringe envelope computed from those phases. During code review it was suggested that this is redundant information that should be computed by the grapher rather than stored as its own trace.

The other use case that I think would be useful is to save the raw binary data stream that is generated by the rapid RTO, but also save the noise spectrum that is revealed through Fourier analysis. In this case the raw data is not human readable at all and requires fairly significant processing to get it into a useful form. That said, there is a pretty clear "right" way to plot this data, and I think that it would be useful to store that processed trace for easy access through the grapher.

@joshmutus, the T1 example occurred to me too and I considered it for a while. I had thought that the logical implication of the no-processed-data viewpoint was that we should only ever save raw IQ data, and so even the idea of averaging over stats should be forbidden. Thinking about it more, though, I think that the issue is redundancy. Recording p1 data is not redundant unless you're storing the raw IQ also. You kind of get to set the "initial resolution" of your raw data without breaking the redundancy policy.

All that said, I think that storing processed data alongside the raw data can be useful both from a convenience standpoint and from a preservation standpoint. It often happens that the format of a scan and/or its associated processing code is changed. When this happens it can be difficult to go back in time and plot an old dataset if all you have is the raw (potentially human-unreadable) trace. Just having a processed trace right alongside the raw data can save a lot of time when stuff like this happens and you want to quickly compare with an old dataset.

@jwenner jwenner added this to the v2.1 milestone Jun 30, 2016