Commit
Merge pull request #34 from jason-fox/feature/github-actions
Add GitHub Actions, tidy markdown
anmunoz authored Dec 4, 2020
2 parents a819f61 + 9c7d0b9 commit b5da341
Showing 9 changed files with 302 additions and 212 deletions.
59 changes: 59 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,59 @@
name: CI
'on':
    push:
        branches:
            - master
    pull_request:
        branches:
            - master
jobs:
    # lint-dockerfile:
    #     name: Lint Dockerfile
    #     runs-on: ubuntu-latest
    #     steps:
    #         - name: Git checkout
    #           uses: actions/checkout@v2
    #         - name: Run Hadolint Dockerfile Linter
    #           uses: burdzwastaken/hadolint-action@master
    #           env:
    #               GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    #               HADOLINT_ACTION_DOCKERFILE_FOLDER: nifi-ngsi-resources/docker

    lint-markdown:
        name: Lint Markdown
        runs-on: ubuntu-latest
        steps:
            - name: Git checkout
              uses: actions/checkout@v2
            - name: Use Node.js 12.x
              uses: actions/setup-node@v1
              with:
                  node-version: 12.x
            - name: Run Remark Markdown Linter
              run: |
                  npm install
                  npm run lint:md
            - name: Run Textlint Markdown Linter
              run: npm run lint:text

    unit-test:
        name: Unit Tests
        runs-on: ubuntu-latest
        steps:
            - name: Git checkout
              uses: actions/checkout@v2
            - name: Use Java 8
              uses: actions/setup-java@v1
              with:
                  java-version: 8
            - name: 'Unit Tests with Java 8'
              run: |
                  cd nifi-ngsi-bundle
                  mvn -s ../settings.xml install -DskipTests=true -Dmaven.javadoc.skip=true -Padd-dependencies-for-IDEA > maven-install.log
                  mvn -s ../settings.xml verify -Padd-dependencies-for-IDEA > maven-verify.log
                  cd nifi-ngsi-processors
                  mvn -s ../../settings.xml clean test -Dtest=Test* cobertura:cobertura coveralls:report -Padd-dependencies-for-IDEA -DrepoToken="${COVERALLS_TOKEN}"
                  mvn clean cobertura:cobertura coveralls:report -DrepoToken="${COVERALLS_TOKEN}"
              env:
                  GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
                  COVERALLS_TOKEN: ${{ secrets.COVERALLS_TOKEN }}
38 changes: 0 additions & 38 deletions .travis.yml

This file was deleted.

2 changes: 1 addition & 1 deletion README.md
@@ -7,7 +7,7 @@
[![Support badge](https://img.shields.io/badge/support-askbot-yellowgreen.svg)](https://ask.fiware.org/questions/scope%3Aall/tags%3Adraco/)
<br/>
[![Documentation badge](https://readthedocs.org/projects/fiware-draco/badge/?version=latest)](http://fiware-draco.rtfd.io)
- [![Build Status](https://travis-ci.com/ging/fiware-draco.svg?branch=master)](https://travis-ci.com/ging/fiware-draco)
+ [![CI](https://github.com/ging/fiware-draco/workflows/CI/badge.svg)](https://github.com/ging/fiware-draco/actions?query=workflow%3ACI)
[![Coverage Status](https://coveralls.io/repos/github/ging/fiware-draco/badge.svg?branch=develop)](https://coveralls.io/github/ging/fiware-draco?branch=develop)
[![Known Vulnerabilities](https://snyk.io/test/github/ging/fiware-draco/badge.svg?targetFile=nifi-ngsi-bundle/nifi-ngsi-processors/pom.xml)](https://snyk.io/test/github/ging/fiware-draco?targetFile=nifi-ngsi-bundle/nifi-ngsi-processors/pom.xml)
![Status](https://nexus.lab.fiware.org/static/badges/statuses/draco.svg)
2 changes: 1 addition & 1 deletion docs/credits.md
@@ -7,5 +7,5 @@ Sonsoles López Pernas <sonsoleslp>
Jason Fox <jason-fox>

Pooja Pathak <pooja1pathak>

José Virseda <josevirseda>
262 changes: 150 additions & 112 deletions docs/processors_catalogue/ngsi_carto_sink.md

Large diffs are not rendered by default.

81 changes: 43 additions & 38 deletions docs/processors_catalogue/ngsi_cassandra_sink.md
@@ -28,8 +28,8 @@ objects) is put into the internal channels for future consumption (see next sections).

### Mapping `NGSIEvent`s to Cassandra data structures

Cassandra organizes the data in Keyspaces that contain tables of data rows. Such organization is exploited by
`NGSIToCassandra` each time a `NGSIEvent` is going to be persisted.

<a name="section1.2.1"></a>

@@ -38,10 +38,12 @@ each time a `NGSIEvent` is going to be persisted.
A Keyspace named after the notified `fiware-service` header value (or, in the absence of such a header, the default
value for the FIWARE service) is created if it does not exist yet.

It must be said Cassandra [only accepts](http://cassandra.apache.org/doc/latest/cql/definitions.html) alphanumerics, `$`
and `_`. This means a certain [encoding](#section2.3.3) is applied depending on the `enable_encoding` configuration
parameter.

Cassandra [Keyspace name length](http://cassandra.apache.org/doc/latest/cql/definitions.html) is limited to 64
characters.
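As an illustration, the keyspace naming rules above can be sketched as follows. This is a hypothetical helper, not the actual `NGSIToCassandra` implementation; the real character mapping depends on the `enable_encoding` parameter.

```python
import re

def build_keyspace_name(fiware_service, max_length=64):
    """Sketch: map characters Cassandra rejects to '_' and enforce the
    64-character keyspace name limit (hypothetical helper)."""
    # Cassandra identifiers only accept alphanumerics, '$' and '_'
    encoded = re.sub(r"[^A-Za-z0-9$_]", "_", fiware_service)
    return encoded[:max_length]

print(build_keyspace_name("smart-city/area1"))  # smart_city_area1
```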

#### Cassandra tables naming conventions

@@ -57,8 +59,9 @@ details):
`_` (underscore). If the FIWARE service path is the root one (`/`) then only the entity ID and type are
concatenated.

It must be said Cassandra [only accepts](http://cassandra.apache.org/doc/latest/cql/definitions.html) alphanumerics, `$`
and `_`. This means a certain [encoding](#section2.3.5) is applied depending on the `enable_encoding` configuration
parameter.

Cassandra [tables name length](http://cassandra.apache.org/doc/latest/cql/definitions.html) is limited to 64 characters.
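The table naming convention described above (service path, entity ID, and type joined by `_`, with the root path `/` contributing nothing) can be sketched as follows. The helper is hypothetical; the real processor additionally applies the configured encoding.

```python
def build_table_name(service_path, entity_id, entity_type, max_length=64):
    """Sketch of the naming convention: concatenate the FIWARE service
    path, entity ID and type with '_'; if the service path is the root
    one ('/'), only the entity ID and type are concatenated."""
    parts = [entity_id, entity_type]
    if service_path != "/":
        parts.insert(0, service_path.strip("/"))
    # Cassandra table names are limited to 64 characters
    return "_".join(parts)[:max_length]

print(build_table_name("/4wheels", "car1", "car"))  # 4wheels_car1_car
print(build_table_name("/", "car1", "car"))         # car1_car
```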

@@ -171,7 +174,8 @@ Using the new encoding:

#### Row-like storing

Assuming `attr_persistence=row` as the configuration parameter, `NGSIToCassandra` will persist the data within the body
as:

```cql
cqlsh> use vehicles;
@@ -211,19 +215,19 @@ cqlsh:vehicles> select * from 4wheels_car1_car;

`NGSIToCassandra` is configured through the following parameters (the names of required properties appear in bold):

| Name                              | Default Value | Allowable Values          | Description                                                                                                                                          |
| --------------------------------- | ------------- | ------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------- |
| **Cassandra Connection Provider** | no            |                           | Controller service for connecting to a specific Keyspace engine                                                                                        |
| **NGSI version**                  | v2            |                           | List of supported NGSI versions (v2 and ld); currently only v2 is supported                                                                            |
| **Data Model**                    | db-by-entity  |                           | The data model for creating the Columns when an event is received; you can choose between db-by-service-path and db-by-entity (default: db-by-service-path) |
| **Attribute persistence**         | row           | row, column               | The mode of storing the data inside the Column; allowable values are row and column                                                                    |
| Default Service                   | test          |                           | If you don't set the Fiware-Service header in the context broker, this value will be used as Fiware-Service                                            |
| Default Service path              | /path         |                           | If you don't set the Fiware-ServicePath header in the context broker, this value will be used as Fiware-ServicePath                                    |
| Enable encoding                   | true          | true, false               | true applies the new encoding; false applies the old encoding                                                                                          |
| Enable lowercase                  | true          | true, false               | true for creating the Schema and Column names in lowercase                                                                                             |
| **Batch size**                    | 10            |                           | The preferred number of FlowFiles to put to the Keyspace in a single transaction                                                                       |
| Consistency Level                 | Serial        | Serial, Local_serial      | The strategy for how many replicas must respond before results are returned                                                                            |
| Batch Statement Type              | Serial        | Logged, Unlogged, Counter | Specifies the type of 'Batch Statement' to be used                                                                                                     |

A configuration example could be:

@@ -239,8 +243,8 @@ Use `NGSIToCassandra` if you are looking for a Keyspace storage not growing so m

The Column type configuration parameter, as seen, is a method for <i>direct</i> aggregation of data: by <i>default</i>
destination (i.e. all the notifications about the same entity will be stored within the same Cassandra Column) or by
<i>default</i> service-path (i.e. all the notifications about the same service-path will be stored within the same
Cassandra Column).

#### About the persistence mode

@@ -263,13 +267,14 @@ deal with the persistence details of such a batch of events in the final backend

What is important regarding the batch mechanism is that it largely increases the performance of the sink, because the
number of writes is dramatically reduced. Let's see an example: assume a batch of 100 `NGSIEvent`s. In the best case,
all these events regard the same entity, which means all the data within them will be persisted in the same Cassandra
Column. If processing the events one by one, we would need 100 inserts into Cassandra; nevertheless, in this example
only one insert is required. Obviously, not all the events will always regard the same unique entity, and many entities
may be involved within a batch. But that's not a problem, since several sub-batches of events are created within a
batch, one sub-batch per final destination Cassandra Column. In the worst case, the 100 events will be about 100
different entities (100 different Cassandra Columns), but that will not be the usual scenario. Thus, assuming a
realistic number of 10-15 sub-batches per batch, we are replacing the 100 inserts of the event-by-event approach with
only 10-15 inserts.
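The sub-batching idea above can be sketched in a few lines. The `(table, payload)` representation of an event is a simplification made up for this example, not the actual `NGSIEvent` structure.

```python
from collections import defaultdict

def make_sub_batches(events):
    """Sketch: group a batch of events by destination table, so one
    insert per group replaces one insert per event."""
    sub_batches = defaultdict(list)
    for table, payload in events:
        sub_batches[table].append(payload)
    return dict(sub_batches)

batch = [("4wheels_car1_car", {"speed": 112.9}),
         ("4wheels_car1_car", {"speed": 115.8}),
         ("4wheels_car2_car", {"speed": 87.5})]
print(len(make_sub_batches(batch)))  # 2 inserts instead of 3
```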

The batch mechanism adds an accumulation timeout to prevent the sink from staying in an eternal state of batch building
when no new data arrives. If such a timeout is reached, then the batch is persisted as it is.
@@ -280,17 +285,17 @@ retry intervals can be configured. Such a list defines the first retry interval, then the second
one, and so on; if the TTL is greater than the length of the list, then the last retry interval is repeated as many
times as necessary.
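The retry-interval lookup described above amounts to the following sketch (the interval values are hypothetical, not defaults of the processor):

```python
def retry_interval(retry_intervals, attempt):
    """Sketch: the list gives the first retry intervals; once it is
    exhausted, the last interval is repeated (attempt is 0-based)."""
    if attempt < len(retry_intervals):
        return retry_intervals[attempt]
    return retry_intervals[-1]

intervals = [5, 10, 30]  # seconds; hypothetical values
print([retry_interval(intervals, n) for n in range(5)])  # [5, 10, 30, 30, 30]
```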

By default, `NGSIToCassandra` has a configured batch size and batch accumulation timeout of 1 and 30 seconds,
respectively. Nevertheless, as explained above, it is highly recommended to increase at least the batch size for
performance purposes. Which are the optimal values? The size of the batch is closely related to the transaction size of
the channel the events are taken from (it makes no sense for the first to be greater than the second), and it depends on
the number of estimated sub-batches as well. The accumulation timeout will depend on how often you want to see new data
in the final storage.
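Putting the two settings together, the flush policy is "persist when the batch is full or the accumulation timeout expires". A minimal sketch of that policy, assuming a hypothetical accumulator class (not the actual NiFi implementation):

```python
import time

class BatchAccumulator:
    """Sketch of the size-or-timeout flush policy described above."""

    def __init__(self, batch_size, timeout_s):
        self.batch_size = batch_size
        self.timeout_s = timeout_s
        self.events = []
        self.started = None

    def add(self, event):
        # Start the accumulation clock on the first event
        if self.started is None:
            self.started = time.monotonic()
        self.events.append(event)

    def should_flush(self):
        if len(self.events) >= self.batch_size:
            return True  # batch is full
        if self.started is not None and time.monotonic() - self.started >= self.timeout_s:
            return True  # timeout reached: persist the batch as it is
        return False

acc = BatchAccumulator(batch_size=3, timeout_s=30)
acc.add("e1"); acc.add("e2")
print(acc.should_flush())  # False: not full, timeout not reached
acc.add("e3")
print(acc.should_flush())  # True: batch full
```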

#### Time zone information

Time zone information is not added to Cassandra timestamps, since Cassandra stores that information as an environment
variable. Cassandra timestamps are stored in UTC time.

#### About the encoding

