Docs/uc (#164)
* docs(UC): Add use cases

* test [branch ch46]
zflamig authored and philloooo committed Sep 25, 2018
1 parent 9094dc0 commit 9123541
Showing 3 changed files with 23 additions and 5 deletions.
24 changes: 21 additions & 3 deletions README.md
@@ -1,8 +1,8 @@
Indexd
===
-![version](https://img.shields.io/badge/version-0.0.1-orange.svg?style=flat) [![Apache license](http://img.shields.io/badge/license-Apache-blue.svg?style=flat)](LICENSE) [![Travis](https://travis-ci.org/uc-cdis/indexd.svg?branch=master)](https://travis-ci.org/uc-cdis/indexd)
+![version](https://img.shields.io/github/release/uc-cdis/indexd.svg) [![Apache license](http://img.shields.io/badge/license-Apache-blue.svg?style=flat)](LICENSE) [![Travis](https://travis-ci.org/uc-cdis/indexd.svg?branch=master)](https://travis-ci.org/uc-cdis/indexd)

-Indexd is a prototype data indexing and tracking service. It is intended to be
+Indexd is a data indexing and tracking service. It is intended to be
distributed, hash-based indexing service, designed to be accessed via a
REST-like API or via a client, such as the
[reference implementation](https://github.com/uc-cdis/indexclient).
@@ -17,6 +17,24 @@ Digital IDs are intended to be publicly readable documents, and therefore contai

The second layer of user-defined aliases is introduced to add the flexibility of supporting human-readable identifiers and to allow referencing existing identifiers that were created in other systems.

## Use Cases For Indexing Data

Data may be loaded into Indexd through a few different means supporting different use cases.

1. Index creation through Sheepdog.

When data files are submitted to a Gen3 data commons using Sheepdog, the files are automatically indexed into Indexd. Sheepdog checks whether the hash and file size of the submitted file match a record already in Indexd and, if so, uses the returned document GUID as the object ID reference. If no match is found in Indexd, a new record is created and stored there.

2. Indexing files on creation in object storage.

Using AWS SNS or Google PubSub, it is possible to receive streaming notifications when files are created, modified, or deleted in the respective cloud object storage services (S3, GCS). An AWS Lambda or GCP Cloud Function can then automatically index the new object into Indexd. If the file is large, computing the minimal set of hashes needed for indexing may require the batch processing services on AWS. This feature can be set up on a per-commons basis for any buckets of interest. The buckets do not have to be owned by the commons, but permissions to read the bucket objects and permissions for SNS or PubSub are necessary.

For existing data in buckets, the SNS or PubSub notifications may be simulated such that the indexing functions are started for each object in the bucket. This is useful because only a single code path is necessary for indexing the contents of an object.

3. Using the Indexd REST API for record insertion.

In rare cases, it may be necessary to interact directly with the Indexd API in order to create index records. This would be necessary if users are loading data into a data commons in non-standard ways or not utilizing Sheepdog as part of their data commons.
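
The matching behaviour described in use case 1, together with the hashing of large files mentioned in use case 2, can be sketched roughly as follows. This is an illustration only and not Indexd or Sheepdog code: `InMemoryIndex`, `lookup_or_create`, and `compute_sha256_and_size` are hypothetical names, the dictionary stands in for the real service, and the choice of SHA-256 and a 1 MiB chunk size are assumptions.

```python
import hashlib
import io
import uuid


def compute_sha256_and_size(stream, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks so a large file never has to fit in memory."""
    digest = hashlib.sha256()
    size = 0
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
        size += len(chunk)
    return digest.hexdigest(), size


class InMemoryIndex:
    """Hypothetical stand-in for Indexd: maps (hash, size) to a GUID."""

    def __init__(self):
        self._guid_by_key = {}

    def lookup_or_create(self, file_hash, size):
        """Return (guid, created): reuse the GUID on a hash and size match, else mint one."""
        key = (file_hash, size)
        if key in self._guid_by_key:
            return self._guid_by_key[key], False
        guid = str(uuid.uuid4())
        self._guid_by_key[key] = guid
        return guid, True


index = InMemoryIndex()
file_hash, size = compute_sha256_and_size(io.BytesIO(b"example file contents"))

# The first submission creates a record; the second finds the existing one.
first_guid, first_created = index.lookup_or_create(file_hash, size)
second_guid, second_created = index.lookup_or_create(file_hash, size)
```

Keying on both the hash and the size mirrors the check described above: a second submission of the same bytes resolves to the GUID that already exists instead of producing a duplicate record.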

## Documentation

[View in Swagger](http://petstore.swagger.io/?url=https://raw.githubusercontent.com/uc-cdis/indexd/master/openapis/swagger.yaml)
@@ -94,7 +112,7 @@ py.test -v tests/

## Testing with Docker

-Doesn't work with all the DB tests yet, but you can adjust to run specific tests as necessary.
+Doesn't work with all the DB tests yet, but you can adjust to run specific tests as necessary.

```
docker build -t indexd -f TestDockerfile .
2 changes: 1 addition & 1 deletion requirements.txt
@@ -1,4 +1,4 @@
-flask==0.10.1
+flask==0.12.4
jsonschema==2.5.1
sqlalchemy==1.0.8
sqlalchemy-utils>=0.32.21
2 changes: 1 addition & 1 deletion setup.py
@@ -10,7 +10,7 @@
]
},
install_requires=[
-'flask==0.10.1',
+'flask==0.12.4',
'jsonschema==2.5.1',
'sqlalchemy==1.0.8',
'sqlalchemy-utils>=0.32.21',