Skip to content

Commit

Permalink
Merge pull request #551 from xdelox/master
Browse files Browse the repository at this point in the history
feature/548_add_column_mode_mongo_sink
  • Loading branch information
frbattid committed Nov 4, 2015
2 parents e969b5d + 7247499 commit 9b0d63a
Show file tree
Hide file tree
Showing 8 changed files with 287 additions and 23 deletions.
1 change: 1 addition & 0 deletions CHANGES_NEXT_RELEASE
Original file line number Diff line number Diff line change
Expand Up @@ -15,3 +15,4 @@
- [HARDENING] Explain the batching mechanism in the performance document (#564)
- [HARDENING] Fix the documentation regarding the installation from sources (#596)
- [BUG] Add extra fields (servicePath in row-like modes, servicePath, entityId and entityType in column-like modes) to Hive tables (#598)
- [FEATURE] Column-like persistence in OrionMongoSink (#548)
100 changes: 95 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -476,23 +476,109 @@ NOTES:

MongoDB organizes the data in databases that contain collections of Json documents. Such organization is exploited by [`OrionMongoSink`](doc/desing/OrionMongoSink.md) each time a Flume event is taken from its channel.

Assuming `mongo_username=myuser` and `should_hash=false` as configuration parameters, the data within the body will be persisted as:
According to different combinations of the parameters `datamodel` and `attr_persistence`, the system will persist the data in different ways, as we will describe below.
Assuming `mongo_username=myuser` and `should_hash=false` and `data_model=collection-per-entity` and `attr_persistence=row` as configuration parameters, then `OrionMongoSink` will persist the data within the body as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
vehicles 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels_car1_car
system.indexes
> db['sth_/4wheels_car1_car'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db82"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrName" : "speed", "attrType" : "float", "attrValue" : "112.9" }
{ "_id" : ObjectId("5534d143fa701f0be751db83"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrName" : "oil_level", "attrType" : "float", "attrValue" : "74.6" }

If `data_model=collection-per-entity` and `attr_persistence=column` then `OrionMongoSink` will persist the data within the body as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
4wheels_car1_car
sth_/4wheels_car1_car
system.indexes
> db.4wheels_car1_car.find()
{ "_id" : ObjectId("5534d143fa701f0be751db82"), "recvTime" : ISODate("2015-04-20T12:13:22.41Z"), "attrName" : "speed", "attrType" : "float", "attrValue" : "112.9" }
> db['sth_/4wheels_car1_car'].find()
{"_id" : ObjectId("56337ea4c9e77c1614bfdbb7"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "speed" : "112.9", "oil_level" : "74.6"}

If `data_model=collection-per-service-path` and `attr_persistence=row` then `OrionMongoSink` will persist the data within the body in the same collection (i.e. `4wheels`) for all the entities of the same service path as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels
system.indexes
> db['sth_/4wheels'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db82"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car1", "entityType" : "car", "attrName" : "speed", "attrType" : "float", "attrValue" : "112.9" }
{ "_id" : ObjectId("5534d143fa701f0be751db83"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car1", "entityType" : "car", "attrName" : "oil_level", "attrType" : "float", "attrValue" : "74.6" }
{ "_id" : ObjectId("5534d143fa701f0be751db84"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car2", "entityType" : "car", "attrName" : "speed", "attrType" : "float", "attrValue" : "123.0" }
{ "_id" : ObjectId("5534d143fa701f0be751db85"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car2", "entityType" : "car", "attrName" : "oil_level", "attrType" : "float", "attrValue" : "40.9" }

Note: The first two documents were generated by the above flume-event, while the last two documents (`"entityId" : "car2"`) were originated by another event (not shown here).
We have left these documents in order to show that the same collection stores data of different entities, unlike what it happens with other value of `data_model` parameter.

Similarly, if `data_model=collection-per-service-path` and `attr_persistence=column` then `OrionMongoSink` will persist the data as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels
system.indexes
> db['sth_/4wheels'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db86"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car1", "entityType" : "car", "speed" : "112.9", "oil_level" : "74.6" }

If `data_model=collection-per-attribute` and `attr_persistence=row` then `OrionMongoSink` will persist the data as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels_car1_car_speed
sth_/4wheels_car1_car_oil_level
system.indexes
> db['sth_/4wheels_car1_car_speed'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db87"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrType" : "float", "attrValue" : "112.9" }
> db['sth_/4wheels_car1_oil_level'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db87"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrType" : "float", "attrValue" : "74.6" }

Finally, the pair of parameters `data_model=collection-per-attribute` and `attr_persistence=column` has no palpable sense if used together, thus **DON'T USE IT**.
In this case, in fact, `OrionMongoSink` will not persist anything; only a warning will be logged.

NOTES:

Expand Down Expand Up @@ -861,6 +947,10 @@ cygnusagent.sinks.mongo-sink.db_prefix = sth_
cygnusagent.sinks.mongo-sink.collection_prefix = sth_
# true is collection names are based on a hash, false for human redable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# specify if the sink will use a single collection for each service path, for each entity or for each attribute
cygnusagent.sinks.mongo-sink.data_model = collection-per-entity
# how the attributes are stored, either per row either per column (row, column)
cygnusagent.sinks.mongo-sink.attr_persistence = column

# ============================================
# OrionSTHSink configuration
Expand Down
4 changes: 4 additions & 0 deletions conf/agent.conf.template
Original file line number Diff line number Diff line change
Expand Up @@ -170,6 +170,10 @@ cygnusagent.sinks.mongo-sink.db_prefix = sth_
cygnusagent.sinks.mongo-sink.collection_prefix = sth_
# true is collection names are based on a hash, false for human redable collections
cygnusagent.sinks.mongo-sink.should_hash = false
# specify if the sink will use a single collection for each service path, for each entity or for each attribute
cygnusagent.sinks.mongo-sink.data_model = collection-per-entity
# how the attributes are stored, either per row either per column (row, column)
cygnusagent.sinks.mongo-sink.attr_persistence = column

# ============================================
# OrionSTHSink configuration
Expand Down
96 changes: 94 additions & 2 deletions doc/design/OrionMongoSink.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,8 +60,9 @@ Assuming the following Flume event is created from a notified NGSI context data
]
}
}

Assuming `mongo_username=myuser` and `should_hash=false` as configuration parameter, then `OrionMongoSink` will persist the data within the body as:

According to different combinations of the parameters `datamodel` and `attr_persistence`, the system will persist the data in different ways, as we will describe below.
Assuming `mongo_username=myuser` and `should_hash=false` and `data_model=collection-per-entity` and `attr_persistence=row` as configuration parameters, then `OrionMongoSink` will persist the data within the body as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
Expand All @@ -78,7 +79,92 @@ Assuming `mongo_username=myuser` and `should_hash=false` as configuration parame
system.indexes
> db['sth_/4wheels_car1_car'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db82"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrName" : "speed", "attrType" : "float", "attrValue" : "112.9" }
{ "_id" : ObjectId("5534d143fa701f0be751db83"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrName" : "oil_level", "attrType" : "float", "attrValue" : "74.6" }

If `data_model=collection-per-entity` and `attr_persistence=column` then `OrionMongoSink` will persist the data within the body as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels_car1_car
system.indexes
> db['sth_/4wheels_car1_car'].find()
{"_id" : ObjectId("56337ea4c9e77c1614bfdbb7"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "speed" : "112.9", "oil_level" : "74.6"}

If `data_model=collection-per-service-path` and `attr_persistence=row` then `OrionMongoSink` will persist the data within the body in the same collection (i.e. `4wheels`) for all the entities of the same service path as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels
system.indexes
> db['sth_/4wheels'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db82"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car1", "entityType" : "car", "attrName" : "speed", "attrType" : "float", "attrValue" : "112.9" }
{ "_id" : ObjectId("5534d143fa701f0be751db83"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car1", "entityType" : "car", "attrName" : "oil_level", "attrType" : "float", "attrValue" : "74.6" }
{ "_id" : ObjectId("5534d143fa701f0be751db84"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car2", "entityType" : "car", "attrName" : "speed", "attrType" : "float", "attrValue" : "123.0" }
{ "_id" : ObjectId("5534d143fa701f0be751db85"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car2", "entityType" : "car", "attrName" : "oil_level", "attrType" : "float", "attrValue" : "40.9" }

Note: The first two documents were generated by the above flume-event, while the last two documents (`"entityId" : "car2"`) were originated by another event (not shown here).
We have left these documents in order to show that the same collection stores data of different entities, unlike what it happens with other value of `data_model` parameter.

Similarly, if `data_model=collection-per-service-path` and `attr_persistence=column` then `OrionMongoSink` will persist the data as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels
system.indexes
> db['sth_/4wheels'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db86"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "entityId" : "car1", "entityType" : "car", "speed" : "112.9", "oil_level" : "74.6" }

If `data_model=collection-per-attribute` and `attr_persistence=row` then `OrionMongoSink` will persist the data as:

$ mongo -u myuser -p
MongoDB shell version: 2.6.9
connecting to: test
> show databases
admin (empty)
local 0.031GB
sth_vehicles 0.031GB
test 0.031GB
> use vehicles
switched to db vehicles
> show collections
sth_/4wheels_car1_car_speed
sth_/4wheels_car1_car_oil_level
system.indexes
> db['sth_/4wheels_car1_car_speed'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db87"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrType" : "float", "attrValue" : "112.9" }
> db['sth_/4wheels_car1_oil_level'].find()
{ "_id" : ObjectId("5534d143fa701f0be751db87"), "recvTimeTs": "1402409899391", "recvTime" : "2015-04-20T12:13:22.41.412Z", "attrType" : "float", "attrValue" : "74.6" }

Finally, the pair of parameters `data_model=collection-per-attribute` and `attr_persistence=column` has no palpable sense if used together, thus **DON'T USE IT**.
In this case, in fact, `OrionMongoSink` will not persist anything; only a warning will be logged.

NOTES:

* `mongo` is the MongoDB CLI for querying the data.
Expand Down Expand Up @@ -118,6 +204,8 @@ A configuration example could be:
cygnusagent.sinks.mongo-sink.db_prefix = cygnus_
cygnusagent.sinks.mongo-sink.collection_prefix = cygnus_
cygnusagent.sinks.mongo-sink.should_hash = false
cygnusagent.sinks.mongo-sink.data_model = collection-per-entity
cygnusagent.sinks.mongo-sink.attr_persistence = column

[Top](#top)

Expand Down Expand Up @@ -168,6 +256,10 @@ Creates a collection, given its name, if not exists in the given database.

Inserts a new document in the given collection within the given database. Such a document contains all the information regarding a single notification for a single attribute. See STH at [Github](https://github.com/telefonicaid/IoT-STH/blob/develop/README.md) for more details.

public void insertContextDataRaw(String dbName, String collectionName, long recvTimeTs, String recvTime, String entityId, String entityType, Map<String, String> attrs, Map<String, String> mds) throws Exception

Inserts a new document in the given collection within the given database. Such a document contains all the information regarding a single notification for multiple attributes. See STH at [Github](https://github.com/telefonicaid/IoT-STH/blob/develop/README.md) for more details.

[Top](#top)

##<a name="section5"></a>Contact
Expand Down
3 changes: 2 additions & 1 deletion doc/quick_start_guide.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,7 @@ $ export JAVA_HOME=/usr/lib/jvm/java-1.6.0-openjdk.x86_64
In order to do it permanently, edit `/root/.bash_profile` (root user) or `/etc/profile` (other users).

##Configuring a test agent

This kind of agent is the simplest one you can configure with Cygnus. It is based on a standard `HTTPSource`, a `MemoryChannel` and a `OrionTestSink`. Don't worry about the configuration details, specially those about the source; simply think on a Http listener waiting for Orion notifications on port TCP/5050 and sending that notifications in the form of Flume events to a testing purpose sink that will not really persist anything in a third-party storage, but will log the notified context data.

(1) Create and edit a `/usr/cygnus/conf/agent_test.conf` file (as a sudoer):
Expand Down Expand Up @@ -169,4 +170,4 @@ There are several channels suited for reporting issues and asking for doubts in
* [[email protected]]([email protected]) **[Contributor]**
* [[email protected]](mailto:[email protected]) **[Quality Assurance]**

**NOTE**: Please try to avoid personaly emailing the contributors unless they ask for it. In fact, if you send a private email you will probably receive an automatic response enforcing you to use [stackoverflow.com](stackoverflow.com) or [ask.fiware.org](https://ask.fiware.org/questions/). This is because using the mentioned methods will create a public database of knowledge that can be useful for future users; private email is just private and cannot be shared.
**NOTE**: Please try to avoid personaly emailing the contributors unless they ask for it. In fact, if you send a private email you will probably receive an automatic response enforcing you to use [stackoverflow.com](stackoverflow.com) or [ask.fiware.org](https://ask.fiware.org/questions/). This is because using the mentioned methods will create a public database of knowledge that can be useful for future users; private email is just private and cannot be shared.
Loading

0 comments on commit 9b0d63a

Please sign in to comment.