Implementation of DICOMs3Client based on DICOMwebProtocol #57
It isn't clear to me what you are trying to do here. Can you ELI5? An architecture diagram would help (sorry, I am not familiar with this project at all).
As @pieper mentioned, I am working on a DICOMweb-based archive format as well as a serverless DICOMweb implementation. Here are some links:
@chafey, (AFAIK) this would be a Python client for fetching data from a serverless DICOMweb implementation (rather than specifying paths directly).
@ntenenz I am still confused: this is a client library that speaks DICOMweb, so shouldn't it work as is with any DICOMweb implementation, regardless of its architecture? I feel like I am missing something here.
I don't have enough context on the proposal to answer your question intelligently (does it match the existing DICOMweb spec and simply leverage a different form of implementation via static S3 blobs?). @pieper referenced it here: #56 (comment), so perhaps he's best positioned to discuss what he's thinking.
The concept is still coming together, but the general idea is to leverage object storage as much as possible for scalability. Storing pregenerated DICOMweb responses in object storage covers many access use cases directly, and serverless services (Lambda, DynamoDB, S3 Object Lambda) can be layered on top to implement the remaining DICOMweb functionality. This has been a regular topic of the Monday morning meetings: https://github.com/chafey/medical-imaging-community
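To make the idea of pregenerated responses concrete, here is a minimal Python sketch that maps a DICOMweb retrieve path onto an object key. The key layout, the `studies` prefix, and the `object_key` function are hypothetical and not necessarily what dicomweb-static uses; the point is only that a static response can be addressed by the same path segments as the DICOMweb request.

```python
from typing import Optional


def object_key(
    study_uid: str,
    series_uid: Optional[str] = None,
    instance_uid: Optional[str] = None,
    resource: str = "metadata",
    prefix: str = "studies",
) -> str:
    """Map a DICOMweb resource path to an object key (hypothetical layout).

    GET {root}/studies/{study}/series/{series}/metadata would be answered
    by reading the object stored under the returned key.
    """
    if instance_uid is not None and series_uid is None:
        raise ValueError("an instance UID requires a series UID")
    parts = [prefix, study_uid]
    if series_uid is not None:
        parts += ["series", series_uid]
    if instance_uid is not None:
        parts += ["instances", instance_uid]
    parts.append(resource)
    return "/".join(parts)


# A pregenerated QIDO-RS "search for series in a study" response.
print(object_key("1.2.3", resource="series"))
# Pregenerated WADO-RS series metadata.
print(object_key("1.2.3", "1.2.3.4"))
```

A serverless handler (or a CDN rule) could then rewrite an incoming request into a GET on the corresponding key; queries that cannot be pregenerated would still need compute.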
I've only been able to attend a couple of meetings thus far due to a standing conflict. That being said, if the proposal ends up being API-compatible with the existing DICOMweb standard (i.e., a new method of implementation), then the existing
In terms of this specific issue/request, I suppose another approach could be to implement the existing API but have it use AWS directly (S3, maybe other services). I am not sure this is really worth it, though, as we should be able to implement most if not all of DICOMweb using serverless as described above.
I think the pre-computed metadata works for archives (rarely changing data, like IDC where we plan monthly versions). @hackermd mentioned he wanted to use this client in a dynamic environment and was concerned about keeping static content in sync. |
We do want to trigger rebuilding of the metadata any time the source data changes, so that shouldn't be a problem. Another option is to store DICOM P10 in S3 with WADO-URI paths and then implement DICOMweb using serverless on top of it. It wouldn't be as fast, but it should be just as scalable.
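One way to trigger those rebuilds is to subscribe a function to S3 event notifications. The sketch below assumes a hypothetical key layout of `<study>/<series>/<instance>.dcm` for the source P10 objects and only collects the affected Study Instance UIDs; `studies_to_rebuild` and `handler` are illustrative names, and the actual rebuild step is left out. Object keys in S3 event records are URL-encoded, hence `unquote_plus`.

```python
from urllib.parse import unquote_plus


def studies_to_rebuild(event: dict) -> set:
    """Collect Study Instance UIDs whose pregenerated metadata is stale.

    Assumes a hypothetical layout "<study>/<series>/<instance>.dcm" for the
    source P10 objects.
    """
    stale = set()
    for record in event.get("Records", []):
        event_name = record.get("eventName", "")
        if not event_name.startswith(("ObjectCreated:", "ObjectRemoved:")):
            continue
        # S3 event notifications URL-encode object keys.
        key = unquote_plus(record["s3"]["object"]["key"])
        parts = key.split("/")
        # Ignore anything that is not a source instance, e.g. the
        # regenerated metadata objects themselves.
        if len(parts) == 3 and parts[2].endswith(".dcm"):
            stale.add(parts[0])
    return stale


def handler(event, context):
    # A real deployment would enqueue one rebuild job per study here.
    for study_uid in sorted(studies_to_rebuild(event)):
        print(f"rebuild metadata for study {study_uid}")
```

Filtering on the `.dcm` suffix keeps writes of the regenerated metadata objects from retriggering the function.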
@chafey the
Maybe you could start by taking a similar approach: just use the S3 ListObjects API and parse each object to build the master list. After that is done, you can figure out how to optimize it (e.g., cache offsets/tags in the metadata as you suggested, using an S3 Object Lambda?).
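A first, unoptimized version of that scan could look like the following sketch. To stay runnable without AWS credentials, `build_master_list` takes two callables: `list_page`, which must return a dict shaped like an S3 ListObjectsV2 response (`Contents`, `IsTruncated`, `NextContinuationToken`), and `read_uids`, a hypothetical helper that returns the study, series, and SOP Instance UIDs of an object, for example by parsing the first bytes fetched with a ranged GET.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional


def build_master_list(
    list_page: Callable[[Optional[str]], dict],
    read_uids: Callable[[str], tuple],
) -> Dict[str, Dict[str, List[str]]]:
    """Scan a bucket page by page and group instances by study and series."""
    index = defaultdict(lambda: defaultdict(list))
    token = None
    while True:
        page = list_page(token)
        for obj in page.get("Contents", []):
            # One extra request per object: this is the expensive part.
            study, series, instance = read_uids(obj["Key"])
            index[study][series].append(instance)
        if not page.get("IsTruncated"):
            break
        token = page["NextContinuationToken"]
    # Convert to plain dicts so the result serializes cleanly (e.g. to JSON).
    return {study: dict(series) for study, series in index.items()}
```

With boto3, `list_page` could wrap `s3.list_objects_v2(Bucket=..., ContinuationToken=token)`, omitting `ContinuationToken` on the first call. Note that `read_uids` costs one request per object, which is exactly the cost that pre-indexing would avoid.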
The main question for me is to what extent the data should be pre-indexed versus indexed dynamically by the client. The
I was hoping that we could provide the same functionality for S3 buckets via the
Better to pre-index it, since S3 is a shared resource and building the index is expensive. Ideally this architecture would be portable to other languages, so an index that is universally understandable would be good. Alternatively, you could package SQLite in a WASM module and create bindings for each language. Just thinking out loud here...
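As one illustration of a portable index, the sketch below stores only study-level attributes in a single SQLite file using Python's standard `sqlite3` module. SQLite has mature bindings in most languages, which might make a WASM build unnecessary just for reading the index. The schema, the column selection, and the function names are assumptions made for this example.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS studies (
    study_instance_uid TEXT PRIMARY KEY,
    patient_id         TEXT,
    patient_name       TEXT,
    study_date         TEXT,  -- DICOM DA format, YYYYMMDD
    modalities         TEXT   -- backslash-separated, like DICOM multi-values
);
CREATE INDEX IF NOT EXISTS idx_patient_id ON studies (patient_id);
CREATE INDEX IF NOT EXISTS idx_study_date ON studies (study_date);
"""


def open_index(path: str) -> sqlite3.Connection:
    """Open (or create) a study-level index stored as one SQLite file."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def upsert_study(conn: sqlite3.Connection, study: dict) -> None:
    """Insert or replace one study record keyed by DICOM keywords."""
    conn.execute(
        "INSERT OR REPLACE INTO studies VALUES (?, ?, ?, ?, ?)",
        (
            study["StudyInstanceUID"],
            study.get("PatientID"),
            study.get("PatientName"),
            study.get("StudyDate"),
            "\\".join(study.get("ModalitiesInStudy", [])),
        ),
    )
    conn.commit()


def search_studies(conn, patient_id=None, date_from=None, date_to=None):
    """Rough analogue of a QIDO-RS Search Studies query on two keys."""
    clauses, params = [], []
    if patient_id is not None:
        clauses.append("patient_id = ?")
        params.append(patient_id)
    if date_from is not None:
        clauses.append("study_date >= ?")
        params.append(date_from)
    if date_to is not None:
        clauses.append("study_date <= ?")
        params.append(date_to)
    where = " WHERE " + " AND ".join(clauses) if clauses else ""
    rows = conn.execute(
        "SELECT study_instance_uid FROM studies" + where + " ORDER BY study_date",
        params,
    )
    return [row[0] for row in rows]
```

The resulting file could be stored next to the data in the bucket and downloaded once by each client.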
Also, if you have fewer than 100,000 studies, you could simply store the contents of the index in a JSON file and build the indexes on the fly in memory.
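A minimal sketch of that approach, assuming a hypothetical JSON format with one object per study keyed by DICOM keyword (`StudyInstanceUID`, `PatientID`, ...). The in-memory indexes are plain dicts rebuilt every time the file is loaded; the function names are illustrative.

```python
import json
from collections import defaultdict
from typing import Dict, List, Tuple


def build_indexes(studies: List[dict]) -> Tuple[Dict[str, dict], Dict[str, List[str]]]:
    """Build lookup maps by Study Instance UID and by Patient ID."""
    by_uid = {s["StudyInstanceUID"]: s for s in studies}
    by_patient: Dict[str, List[str]] = defaultdict(list)
    for s in studies:
        by_patient[s.get("PatientID", "")].append(s["StudyInstanceUID"])
    return by_uid, dict(by_patient)


def save_index(path: str, studies: List[dict]) -> None:
    """Persist the study records; the indexes themselves are not stored."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(studies, f)


def load_index(path: str):
    """Load the study records and rebuild the in-memory indexes."""
    with open(path, encoding="utf-8") as f:
        return build_indexes(json.load(f))
```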
In principle, the DICOM standard already provides such indices via DICOMDIR files. They are not pleasant to work with, but are very compact and already supported by most DICOM libraries. The Study Resource, Series Resource, and Instance Resource are (more or less) a JSON representation of DICOMDIR files. Either way, I agree that it would be highly desirable to be able to share the indices between applications. We could just add methods to That would even be very useful for DICOMweb origin servers, because they could then also just be pointed to an S3 bucket and could import the index into their database without having to re-import all the data! |
@chafey, could you elaborate on the motivation (either here or in Monday's meeting) for not leveraging a database for metadata? I completely understand how serving pixel data could fully utilize a PACS's resources. Is the assumption that QIDO requests (could) do the same?
Databases are typically the performance and scalability choke point for medical image archives. You want to avoid them when possible and minimize their role when you can't avoid them completely. For smallish archives (< 100,000 studies), they are not needed at all, as the entire data set can easily fit in memory and you can implement the queries using simple scans (e.g., iterate over the series in a study to implement QIDO-RS StudySeries queries) or an in-memory index (e.g., a map of patient IDs to implement QIDO-RS Search Studies queries). Any query that is constrained to a single study can be done entirely in memory without a database. The only time you need an index is for study-level queries on large archives, and something like Redis or a NoSQL store is a far better option than an RDBMS.
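The single-study case can be sketched as a linear scan over that study's series records. This assumes each record is a dict keyed by DICOM keyword and approximates DICOM wildcard matching with `fnmatch.fnmatchcase` (which additionally supports `[...]` character sets that DICOM does not define); `search_study_series` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List, Optional


def _matches(value: Optional[object], pattern: str) -> bool:
    # A lone "*" acts as universal matching and also matches missing values.
    if value is None:
        return pattern == "*"
    return fnmatchcase(str(value), pattern)


def search_study_series(series_list: Iterable[dict], **filters: str) -> List[dict]:
    """QIDO-RS StudySeries query as a scan over one study's series records."""
    return [
        series
        for series in series_list
        if all(_matches(series.get(key), pattern) for key, pattern in filters.items())
    ]
```

No index is involved: the cost grows with the number of series in the one study being queried, not with the size of the archive.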
That is a big assumption. In pathology, not even a single study may fit in memory! Or are you just referring to the indexed metadata?
I am just referring to metadata |
A few questions:
I'm wondering if an easier solution would be to lean on a vanilla datastore, either a database or Redis, for DICOM metadata (e.g., QIDO-RS queries) and pre-compute the instances/series/studies resources (or a subset, with lambdas for aggregation/de-aggregation). I'd be curious to see whether any clinical or research workflow could strain such a setup.
Many PACS systems store everything they need to respond to QIDO-RS/C-FIND in an RDBMS. This is problematic because the DB becomes very large and also a chokepoint for data ingestion (each SOP Instance is an insert, possibly causing rebalancing of indexes). The strategy I have used to reach incredibly high scale is to store the metadata on disk and read from it to handle queries below the study level; this can be done via pre-generation of metadata, as we are discussing with the dicomweb-static project. When you take that approach, you just need to provide indexes for study-level attributes (patient name, patient ID, study date, etc.) and can use any technology you want (RDBMS, NoSQL, Redis, etc.), as the problem is much simpler. The strategy is independent of DB technology: object storage is always cheaper than a DB and performs well enough for the target use cases. The other advantage is that you can do all sorts of analytics on the storage tier with MapReduce without having to ETL/sync it to another reporting DB. Synchronization from server to client is a solved problem: there are many solutions for this (basically, use pub/sub to distribute data store changes to clients over WebSockets and such).
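The ingestion side of this strategy can be sketched as follows. Plain dicts stand in for the object store and the study-level index (in practice S3 plus Redis, a NoSQL store, or an RDBMS); the key layout, `STUDY_LEVEL_KEYS`, and the function names are assumptions. The point is that each instance produces one object write, while the index is touched only when a new study appears.

```python
from typing import Dict

# In-memory stand-ins (assumptions) for the object store and the index.
object_store: Dict[str, dict] = {}
study_index: Dict[str, dict] = {}

STUDY_LEVEL_KEYS = ("PatientID", "PatientName", "StudyDate", "AccessionNumber")


def ingest_instance(metadata: dict) -> None:
    """Store one instance's metadata; touch the index only at study level."""
    study = metadata["StudyInstanceUID"]
    series = metadata["SeriesInstanceUID"]
    sop = metadata["SOPInstanceUID"]
    # Below-study information lives only in object storage.
    key = f"studies/{study}/series/{series}/instances/{sop}/metadata"
    object_store[key] = metadata
    # The index is written once per study, not once per instance.
    if study not in study_index:
        study_index[study] = {k: metadata.get(k) for k in STUDY_LEVEL_KEYS}


def series_in_study(study: str) -> set:
    """Answer a below-study query from storage keys instead of a DB."""
    prefix = f"studies/{study}/series/"
    return {
        key[len(prefix):].split("/")[0]
        for key in object_store
        if key.startswith(prefix)
    }
```

A real implementation would also need to handle updates to study-level attributes (e.g. corrections), which this sketch ignores.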
I am fully on board with storing instance metadata as documents and keeping those in memory for fast queries. We've run into several issues with RDBMS-based implementations when storing many instances via the DICOMweb Store Transaction. The main question for me is whether those documents should be cached (relative to the full instances) and whether other applications (in particular serverless clients such as the
Let's try to find some time to discuss this on Monday if the agenda isn't already spoken for.
@chafey we may want to consider Supplement 223: Inventory IOD and Related Services. The proposed Inventory IOD could be used to represent and share a database index.
@pieper @chafey As discussed in #56, it would be great to implement a DICOMs3Client based on DICOMwebProtocol.
In general, I think we should consider the following aspects for the design of an S3 client:
(1) I've been thinking that we could put the metadata into the S3 object metadata. The PUT request header is limited to 8 KB in size (and the user-defined metadata within it to 2 KB), but that should be sufficient for a compact subset of the metadata. This could also include the offset to the Pixel Data, Float Pixel Data, or Double Float Pixel Data element (see 2).
(2) If the Basic Offset Table item of the encapsulated Pixel Data element has a value, then the client could just read it and cache the offsets locally. Otherwise, the client would have to determine the offset of each frame item by reading the header of each item. That could result in a lot of HTTP calls, so ideally we would make sure that the Basic Offset Table gets populated in image data sets.
(3) To support writes (including parallel writes), we shouldn't store any aggregate study- or series-level information, such as Number of Study Related Series, in the object metadata, but should compute it dynamically on the client side.
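Regarding point (1): AWS documents that the PUT request header is limited to 8 KB and that user-defined metadata within it is limited to 2 KB, measured as the sum of the UTF-8 bytes of all keys and values. The sketch below packs a subset of attributes plus a pixel data offset into a dict suitable for the `Metadata` argument of boto3's `put_object` (boto3 adds the `x-amz-meta-` prefix itself). The key names are hypothetical, and counting the prefix toward the limit is a deliberately conservative assumption.

```python
import json

# AWS limit for user-defined metadata; the whole PUT header is capped at 8 KB.
USER_METADATA_LIMIT = 2 * 1024
PREFIX = "x-amz-meta-"


def to_object_metadata(attributes: dict, pixel_data_offset: int) -> dict:
    """Pack selected DICOM attributes into S3 user-defined metadata.

    Raises ValueError if the (conservatively measured) size exceeds the limit.
    """
    metadata = {
        # Hypothetical key names; ASCII-only JSON avoids the special
        # encoding S3 applies to non-ASCII metadata values.
        "dicom": json.dumps(attributes, separators=(",", ":"), ensure_ascii=True),
        "pixel-data-offset": str(pixel_data_offset),
    }
    size = sum(
        len((PREFIX + key).encode("utf-8")) + len(value.encode("utf-8"))
        for key, value in metadata.items()
    )
    if size > USER_METADATA_LIMIT:
        raise ValueError(f"user-defined metadata is {size} bytes, limit is {USER_METADATA_LIMIT}")
    return metadata
```

For a full DICOM JSON header, 2 KB is quickly exceeded, so in practice only a small, fixed selection of attributes would fit.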
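Regarding point (2), the offset lookup could work roughly as in the following sketch, which parses the encapsulated Pixel Data value with `struct`. Per the DICOM standard, the first item is always the Basic Offset Table item, but its value may be empty; the table's offsets are relative to the first fragment item after it. `read(offset, length)` is an assumed callable standing in for an HTTP Range request against the object, and the fallback assumes one fragment per frame, which the standard does not guarantee.

```python
import struct
from typing import Callable, List

ITEM_TAG = b"\xfe\xff\x00\xe0"       # (FFFE,E000) Item, little endian
SEQ_DELIM_TAG = b"\xfe\xff\xdd\xe0"  # (FFFE,E0DD) Sequence Delimitation Item


def frame_offsets(read: Callable[[int, int], bytes]) -> List[int]:
    """Locate frame items in an encapsulated Pixel Data value.

    read(offset, length) returns bytes of the value, starting at the Basic
    Offset Table item. Returned offsets point at each frame's item tag,
    relative to the start of the value.
    """
    header = read(0, 8)
    if header[:4] != ITEM_TAG:
        raise ValueError("expected Basic Offset Table item")
    (bot_length,) = struct.unpack("<I", header[4:8])
    first_fragment = 8 + bot_length
    if bot_length > 0:
        # One extra request fetches the whole table.
        table = read(8, bot_length)
        offsets = struct.unpack(f"<{bot_length // 4}I", table)
        return [first_fragment + offset for offset in offsets]
    # Empty table: walk item headers, one 8-byte request per fragment.
    offsets, position = [], first_fragment
    while True:
        item = read(position, 8)
        if item[:4] == SEQ_DELIM_TAG:
            return offsets
        if item[:4] != ITEM_TAG:
            raise ValueError(f"unexpected tag at offset {position}")
        offsets.append(position)
        (length,) = struct.unpack("<I", item[4:8])
        position += 8 + length
```

With a non-empty table, the lookup costs two small reads regardless of the number of frames; without it, the client needs one read per fragment, which illustrates why populating the table matters.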
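Regarding point (3), the aggregate attributes could be derived on the client from the instance metadata alone. The sketch assumes DICOM JSON input (PS3.18 Annex F) for instances that all belong to one study; the function name is illustrative. Collecting SOP Instance UIDs in a set also tolerates the same instance being written twice by parallel writers.

```python
from collections import defaultdict
from typing import Dict, Iterable

# DICOM JSON attribute keys (group + element in hex).
SERIES_UID, SOP_UID, MODALITY = "0020000E", "00080018", "00080060"


def _first(instance: dict, tag: str):
    """Return the first value of an attribute in a DICOM JSON dataset."""
    return instance[tag]["Value"][0]


def study_aggregates(instances: Iterable[dict]) -> Dict[str, dict]:
    """Compute study-level aggregate attributes on the client.

    Nothing aggregate is stored, so concurrent writers never contend on a
    shared counter; readers recompute these values on demand.
    """
    series: Dict[str, set] = defaultdict(set)
    modalities: set = set()
    for inst in instances:
        series[_first(inst, SERIES_UID)].add(_first(inst, SOP_UID))
        if MODALITY in inst:
            modalities.add(_first(inst, MODALITY))
    n_instances = sum(len(sops) for sops in series.values())
    return {
        # Modalities in Study (0008,0061)
        "00080061": {"vr": "CS", "Value": sorted(modalities)},
        # Number of Study Related Series (0020,1206)
        "00201206": {"vr": "IS", "Value": [len(series)]},
        # Number of Study Related Instances (0020,1208)
        "00201208": {"vr": "IS", "Value": [n_instances]},
    }
```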
What do you think?