annotation of timelines / units / ... #131
Comments
Hi @yarikoptic, I'm glad you mentioned this! There are a bunch of possibilities floating around in my mind. I'll just toss a couple out there to see what you think. Let's start with just adding text notes. I know that structured annotations (labels on units, etc) are more useful, but it's good to start with a simple case. From a user's perspective they should be able to click to add a note to any neurodata object within an NWB file that is loaded into Neurosift. That note could be visible to them and also to any other viewer of that particular NWB file. This could include top-level notes that would apply to the entire file. It gets interesting when we start to think about (a) where those notes would get stored, and (b) how they get stored there. For (b), let's assume we have figured out the github authentication stuff so that the web app (neurosift) will have the ability to act on GitHub on behalf of the user. I can think of a few possibilities for (a)
Thoughts? |
I was thinking of the 3rd option:
so that the user has full control over it, and we benefit from however they choose to configure it (private, collaborators, etc.). Neurosift could provide a GitHub app which gets registered against the account to manipulate that repo. Somewhere we just store the URL for that repo... maybe even in a cookie, or maybe there is a way to store some settings (like a cookie) within the GitHub account itself for the app, so that as soon as the app is registered there is a way to discover which repo contains the annotations. |
Sounds good. |
good question/point... in principle it should all be up to the user. I guess there might be some configuration file where the user could instruct one way or another, and we could template it so that annotations are "announced" (or whatever best describes being shown to others) by default. |
How should we name the files in the repo containing the annotations? For example, if I am annotating 000582/sub-10073/sub-10073_ses-17010302_behavior+ecephys.nwb, would there be a file in the repo like this: dandisets/000582/sub-10073/sub-10073_ses-17010302_behavior+ecephys.nwb/annotations.jsonl, or what? I was thinking of using .jsonl (JSON Lines) because then each action would get appended to the file. An action could be Or maybe there's a better scheme. |
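A minimal sketch of the naming scheme and append-only log proposed above. The function names and the action fields (`type`, `text`) are hypothetical and only illustrate the idea; nothing here is taken from the Neurosift code.

```python
import json
from pathlib import PurePosixPath


def annotation_path(dandiset_id: str, asset_path: str) -> str:
    """Build the repo-relative path of the annotations file for one NWB asset."""
    # Strip a leading slash so the asset path cannot reset the join to an absolute path.
    relative = asset_path.lstrip("/")
    return str(PurePosixPath("dandisets") / dandiset_id / relative / "annotations.jsonl")


def append_action(log_text: str, action: dict) -> str:
    """Append one action as a new JSON line; earlier lines are never rewritten."""
    if log_text and not log_text.endswith("\n"):
        log_text += "\n"
    return log_text + json.dumps(action) + "\n"


path = annotation_path(
    "000582", "sub-10073/sub-10073_ses-17010302_behavior+ecephys.nwb"
)
log = append_action("", {"type": "add-note", "text": "first note"})  # hypothetical action
log = append_action(log, {"type": "add-note", "text": "second note"})
```

Because each action lands on its own line, a commit that records one action only adds one line to the file.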
here comes a tricky part -- if we want to associate it with content, it is better to use asset_id or blob_id, since content could change under that path. But those are too cryptic. I see two principled ways
|
Makes sense. How about
so you can see that the path has the asset ID embedded in it. And now I'm thinking of doing json-lines (jsonl) for a different reason. Instead of append-only log, each annotation would be on a separate line. That way you can add-action, delete-action, replace-action, etc., and all these operations would be (a) small-deltas and (b) the commit changes would be human-readable. This would be more difficult with a .json file. |
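A rough sketch of the one-annotation-per-line idea, to show why add, delete, and replace each touch only a line or two. The `id` field and all function names are assumptions made for illustration; the actual annotation schema is not specified in this thread.

```python
import json
from typing import Dict, List

Annotation = Dict[str, object]


def parse_jsonl(text: str) -> List[Annotation]:
    """Parse a JSON Lines document, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def dump_jsonl(annotations: List[Annotation]) -> str:
    """Serialize one annotation per line, with sorted keys for stable diffs."""
    return "".join(json.dumps(a, sort_keys=True) + "\n" for a in annotations)


def add_annotation(text: str, annotation: Annotation) -> str:
    return dump_jsonl(parse_jsonl(text) + [annotation])


def replace_annotation(text: str, annotation_id: str, new: Annotation) -> str:
    # Only the line whose (hypothetical) "id" matches is rewritten.
    return dump_jsonl(
        [new if a.get("id") == annotation_id else a for a in parse_jsonl(text)]
    )


def delete_annotation(text: str, annotation_id: str) -> str:
    return dump_jsonl([a for a in parse_jsonl(text) if a.get("id") != annotation_id])


def changed_lines(before: str, after: str) -> int:
    """Count lines that appear in only one version (a rough diff size)."""
    return len(set(before.splitlines()) ^ set(after.splitlines()))


doc = add_annotation("", {"id": "a1", "type": "note", "text": "top-level note"})
doc = add_annotation(doc, {"id": "a2", "type": "note", "text": "unit 3 looks noisy"})
edited = replace_annotation(doc, "a2", {"id": "a2", "type": "note", "text": "unit 3 is fine"})
trimmed = delete_annotation(doc, "a1")
```

Replacing one annotation changes one removed line and one added line in the commit, regardless of how many other annotations the file holds.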
Could you try this out when you have a chance? |
given that lists (and now even dicts) are ordered, I still do not see how jsonl would be beneficial, but that is ok -- you are the doer here, so do it the way you see fit. |
NB dang chatgpt can't spell correctly... need to send a PR |
@yarikoptic We'll need to think about how to share these annotations. What if I do some annotations (unit labels, or whatever), and then I want you to see them? What would I do? |
yeap... note: now it is under Some thoughts:
But it would be nice if people could benefit from discovery of the annotations present for any given dandiset; for that we need to either
|
BTW, the question is not entirely unlike our discussion on association of notebooks with dandisets with @bendichter and @waxlamp |
So it would be dandisets/dandi/000001/... ? Or dandi/dandisets/.... ? Or dandi/dandi/... ? One consideration: GitHub apps can only make 5000 API requests per hour (per user). So if we had 10 separate repos contributing annotations to a particular NWB file, then whenever the page gets reloaded we might end up with around 20-30 API requests. What I propose is to have a neurosift-annotations database that mirrors the annotation contents of the GitHub repos. This serves two purposes: (1) it avoids excessive GitHub API calls for the app, since data is retrieved from the database instead, with documents expiring every 60 seconds or so in case the contents get modified by some method other than the neurosift UI; (2) it allows retrieving annotations across all repos that have ever been loaded in neurosift. Note that the GitHub repos would still be the source of truth for all annotations. |
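The mirroring idea with a 60-second expiry could look roughly like the sketch below. The class name, the `fetch` callback (standing in for a GitHub API read), and the key format are all hypothetical; a real mirror would sit in a shared database rather than in process memory.

```python
import time
from typing import Callable, Dict, Tuple


class AnnotationMirror:
    """Cache annotation documents and refetch them once they are older than the TTL."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # stand-in for reading a file from a GitHub repo
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested without sleeping
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._cache.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # still fresh: no upstream request
        value = self._fetch(key)     # expired or missing: the repo stays the source of truth
        self._cache[key] = (now, value)
        return value


# Usage with a fake clock and a counting fetch function.
calls = []
fake_time = [0.0]


def fake_fetch(key: str) -> str:
    calls.append(key)
    return f"contents of {key} (fetch #{len(calls)})"


mirror = AnnotationMirror(fake_fetch, ttl_seconds=60.0, clock=lambda: fake_time[0])
first = mirror.get("user/repo:annotations.jsonl")
fake_time[0] = 30.0
second = mirror.get("user/repo:annotations.jsonl")   # served from the mirror
fake_time[0] = 61.0
third = mirror.get("user/repo:annotations.jsonl")    # expired, fetched again
```

With this shape, reloading the page many times within a minute costs one upstream request per document instead of one per reload.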
I thought of
an aggregating DB or some other way would indeed probably be needed eventually, but I wonder if we should first get close to hitting the limits, which might give better grounds for optimizations? ;-) |
But I don't know how else to retrieve annotations across all repos. I feel like this is an important step... and we still have the repos as source of truth... |
My point is that ATM nobody even has the capability to link multiple repositories yet, and I do not see people getting that many linked. Maybe we will hit the limits even without that somehow and would need to abandon the "(ab)use github" idea entirely. So why not make it available, advocate, and monitor if/when we get close to hitting those limits? FWIW in cron jobs of https://github.com/con/tinuous we also include output of
just to see, if we do error out, how close we were to hitting the limits. I wonder if it is worth querying them once in a while and reacting somehow when getting close, so we would get feedback on that. |
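One way to monitor how close an app is to the limit is GitHub's REST `GET /rate_limit` endpoint. The sketch below is an assumption about how such a check could be done, not the command tinuous actually runs; the function names and the 0.8 threshold are hypothetical.

```python
import json
import urllib.request

RATE_LIMIT_URL = "https://api.github.com/rate_limit"


def fetch_rate_limit(token: str) -> dict:
    """Query the rate-limit status for the given token (performs a network request)."""
    request = urllib.request.Request(
        RATE_LIMIT_URL,
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def core_usage(payload: dict) -> float:
    """Fraction of the hourly core quota already consumed, between 0 and 1."""
    core = payload["resources"]["core"]
    return (core["limit"] - core["remaining"]) / core["limit"]


def near_limit(payload: dict, threshold: float = 0.8) -> bool:
    """True once usage reaches the (hypothetical) warning threshold."""
    return core_usage(payload) >= threshold


# Offline example using the shape of the documented response.
sample = {"resources": {"core": {"limit": 5000, "remaining": 500, "reset": 0, "used": 4500}}}
usage = core_usage(sample)
warn = near_limit(sample)
```

Logging `core_usage` periodically would give the feedback discussed above before any request actually fails.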
But I'm not just talking about the rate limit issue. See my reason (2) above. Without an aggregation database I don't know how to discover other annotations from other repos. |
if there is a unique filename/path (e.g. some |
Okay I see. That lookup tool looks pretty useful. Does it do a github search for that key file through the github api? I suppose this would only work for public repos. I think I will move forward with the aggregated database approach since that is going to be easier for me. But I will make sure that the gh repos stay as the source of truth, so we could transition away from this if needed, to use a pure gh solution. I really would like an efficient way of querying all annotations for a particular nwb file... and if it is going to be done in a single request, it requires a query-able database. |
yes. here: https://github.com/datalad/datalad-usage-dashboard/blob/master/find_datalad_repos/github.py#L91 NB the results of those discoveries are then used to populate/update http://registry.datalad.org/ which you could navigate/query more interactively
I think so too.
agreed, that would require a single aggregated source. But IMHO the "DB" could be a (JSON or YAML) file for this purpose. But sure thing -- proceed as you see fit and with whichever is easier for you. If anything, it could be redone later. Cheers and Thanks! |
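To illustrate how a single aggregated JSON file could answer "all annotations for this NWB file" in one lookup, here is a small sketch. The repo names, asset keys, and function names are made up for the example; the real aggregation layer may look quite different.

```python
import json
from typing import Dict, List

# Hypothetical input: repo name -> {asset key -> annotations stored in that repo}.
RepoAnnotations = Dict[str, Dict[str, List[dict]]]


def build_index(repos: RepoAnnotations) -> Dict[str, List[dict]]:
    """Merge per-repo annotations into one mapping keyed by asset."""
    index: Dict[str, List[dict]] = {}
    for repo in sorted(repos):
        for asset_key, annotations in repos[repo].items():
            for annotation in annotations:
                # Record the source repo so the repos remain the source of truth.
                index.setdefault(asset_key, []).append({**annotation, "repo": repo})
    return index


def annotations_for(index_json: str, asset_key: str) -> List[dict]:
    """Look up every known annotation for one asset from the serialized index."""
    return json.loads(index_json).get(asset_key, [])


repos = {
    "alice/annotations": {"000582/sub-10073/file.nwb": [{"text": "good session"}]},
    "bob/annotations": {"000582/sub-10073/file.nwb": [{"text": "check unit 7"}]},
}
index_json = json.dumps(build_index(repos), indent=2, sort_keys=True)
found = annotations_for(index_json, "000582/sub-10073/file.nwb")
```

The index can always be rebuilt from the repositories, so it can be regenerated or replaced by a real database later without losing data.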
if DB: if there were an API or some way to check/get annotations (and maybe their number) for each dandiset and/or path within it, we should then look into adding that to the dandiarchive web UI. It would be great to have one more target use case in addition to notebooks, so that we finally come up with some "generic way" to integrate with such external resources, hopefully as flexible as what we do for external services linkage for individual files. |
Sounds great! |
@yarikoptic I have something working. Here's an example where I have put a top-level note in a public repo. You should be able to see it if you log in to neurosift-annotations. This has the following properties:
|
I remember that there was some initial prototype for annotation. I wonder what stage that development is at, and what could be a way to integrate it with annotation of data on DANDI? (e.g. maybe as a GitHub app of some kind to deposit the actual annotations to GitHub for each user)