Incredibly fast integration of data from different sources/silos with significantly reduced ETL needs
- Loading data as-is from external data sources (RSS, Twitter, Quandl)
- Peaceful coexistence of JSON and XML data in the same database, used in the same application
- Envelope pattern
- Using db tasks to schedule recurring jobs (such as pulling in data updates here)
- Using mlcp to load and parse PDF documents
- MarkLogic Search capabilities including free-text and structured search, facets and dynamic date bucketing
- Combining multiple JSON properties and XML elements into single Fields and defining Field range index
- Customizing the Search API to perform an on-the-fly transform of the search response
- Highlighting of matched keywords
- Building word clouds with the distinctive terms function
- MarkLogic mathematical functions (to calculate winner/loser stock of the day/week)
- Using both XQuery and Server-side JavaScript to write server-side code
- Exposing server-side functions as REST API to be consumed by the front end
- Some UI goodies provided by the Slush generator: Tag Cloud, AngularJS Highcharts, document viewer (PDF and HTML)
- Last but not least: the incredibly short time it takes to build such an application with MarkLogic and Slush. Total effort was 12 person-days
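The envelope pattern mentioned in the list above can be illustrated with a minimal sketch. This is plain JavaScript written for illustration, not code taken from this application: the function name `wrapInEnvelope` and the `headers`/`instance` property names are assumptions. The idea is to store the source document unchanged and keep harmonized metadata next to it, so that data from different sources can be queried in a uniform way.

```javascript
// Minimal sketch of the envelope pattern (hypothetical names, not the
// application's actual document model).
function wrapInEnvelope(original, meta) {
  return {
    envelope: {
      // Harmonized metadata, shared across all sources.
      headers: {
        source: meta.source,
        ingested: meta.ingested
      },
      // The source document, stored as-is.
      instance: original
    }
  };
}

// Example input; the tweet content is made up for illustration.
const tweet = { id: 1, text: 'DAX closes higher' };
const doc = wrapInEnvelope(tweet, {
  source: 'twitter',
  ingested: '2016-01-01T00:00:00Z'
});
console.log(JSON.stringify(doc, null, 2));
```

Because the original document is kept intact under `instance`, nothing is lost at load time, and the headers can be extended later without touching the source data.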
This application was generated by the MarkLogic-Node Slush generator, with the following components:
- AngularJS
- Gulp
- node.js: very thin layer, hosting the Angular code and proxying MarkLogic REST API requests
- Roxy Deployer: bootstrap MarkLogic databases, application servers, etc; scaffolding for MarkLogic REST API service extensions
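To make the role of the thin node.js layer more concrete, here is a minimal sketch of a proxy that forwards REST API requests to MarkLogic. It is not the code produced by the Slush generator: the `config` object, the function names and the choice to forward only paths under `/v1/` are assumptions, and authentication is left out entirely.

```javascript
const http = require('http');

// Hypothetical settings; the ports mirror the example configuration in
// this README, but the object shape and host are assumptions.
const config = { mlHost: 'localhost', mlPort: 9101, nodePort: 9070 };

// Build the options for the outgoing request to MarkLogic from an
// incoming request's method, URL and headers.
function buildProxyOptions(req, cfg) {
  return {
    hostname: cfg.mlHost,
    port: cfg.mlPort,
    method: req.method,
    path: req.url,
    headers: Object.assign({}, req.headers, {
      host: cfg.mlHost + ':' + cfg.mlPort
    })
  };
}

// Create (but do not start) a server that forwards /v1/* to MarkLogic
// and answers 404 for everything else.
function createProxyServer(cfg) {
  return http.createServer((req, res) => {
    if (!req.url.startsWith('/v1/')) {
      res.writeHead(404);
      res.end();
      return;
    }
    const upstream = http.request(buildProxyOptions(req, cfg), (mlRes) => {
      // Relay status, headers and body back to the browser.
      res.writeHead(mlRes.statusCode, mlRes.headers);
      mlRes.pipe(res);
    });
    upstream.on('error', () => {
      res.writeHead(502);
      res.end();
    });
    req.pipe(upstream);
  });
}

// To start it: createProxyServer(config).listen(config.nodePort);
```

Keeping the option-building logic in a separate pure function makes it easy to check what would be sent to MarkLogic without opening any sockets.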
- node.js
- npm: Built-in package manager for node (comes with node, but check to be sure you have the latest version: npm install -g npm)
- gulp: JavaScript task automation (npm install -g gulp)
- Bower: A package manager for front-end libraries (npm install -g bower)
- Git: Roxy depends on this version control system
- Ruby: Roxy depends on Ruby in order to run server configuration scripts
- mlpm: MarkLogic package installer (npm install -g mlpm)
Change the ports and username/password for your local deployment. Here is example content for deploy/local.properties:
#################################################################
# This file contains overrides to values in build.properties
# These only affect your local environment and should not be checked in
#################################################################
#
# The ports used by your application
#
app-port=9101
# Taking advantage of not needing a XCC Port for ML8
xcc-port=${app-port}
install-xcc=false
#
# the uris or IP addresses of your servers
# WARNING: if you are running these scripts on WINDOWS you may need to change localhost to 127.0.0.1
# There have been reported issues with dns resolution when localhost wasn't in the hosts file.
#
local-server=ea4-ml1
#
# Admin username/password that will exist on the local/dev/prod servers
#
user=admin
password=admin
server-version=9
mlcp-home=/Users/smitrovi/apps/mlcp-9.0-EA4
load-js-as-binary=false
(Make sure you have the required dependencies installed before you proceed.)
Run the following Roxy commands to bootstrap (configure) the database and the API servers, deploy server-side code (modules, triggers and external packages) and, finally, import data:
mlpm install ml-datetime --save
mlpm install ml-open-calais --save
./ml local bootstrap
./ml local deploy packages
./ml local deploy modules
./ml local deploy triggers
./import-config.sh
Install additional js dependencies:
npm install
bower install
gulp init-local
Edit ./local.json to set your desired ports: "ml-http-port" for the MarkLogic HTTP port and "node-port" for the node port. If these ports are not already taken within your environment, leave the default values.
The content of local.json should be:
{
"ml-version": "8",
"ml-host": "ea4-ml1",
"ml-admin-user": "admin",
"ml-admin-pass": "admin",
"ml-app-user": "admin",
"ml-app-pass": "admin",
"ml-http-port": "9101",
"node-port": 9070
}
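In the examples above, "ml-http-port" in local.json and app-port in deploy/local.properties are both 9101, and "ml-host" matches local-server. Assuming these values are meant to agree, the following sketch shows one way to check them. The helper names `parseProperties` and `findMismatches` are hypothetical and not part of Roxy or the Slush generator, and the parser handles only simple key=value lines.

```javascript
// Parse simple key=value lines, skipping blank lines and # comments.
function parseProperties(text) {
  const props = {};
  for (const line of text.split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === '' || trimmed.startsWith('#')) continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) continue;
    props[trimmed.slice(0, eq).trim()] = trimmed.slice(eq + 1).trim();
  }
  return props;
}

// Return a list of mismatches between the two configurations.
function findMismatches(props, local) {
  const problems = [];
  if (String(local['ml-http-port']) !== props['app-port']) {
    problems.push('ml-http-port differs from app-port');
  }
  if (local['ml-host'] !== props['local-server']) {
    problems.push('ml-host differs from local-server');
  }
  return problems;
}

// Inline copies of the relevant example values from this README.
const properties = parseProperties([
  '# The ports used by your application',
  'app-port=9101',
  'local-server=ea4-ml1'
].join('\n'));
const localJson = {
  'ml-host': 'ea4-ml1',
  'ml-http-port': '9101',
  'node-port': 9070
};

console.log(findMismatches(properties, localJson)); // prints []
```

In a real check the two inputs would be read from deploy/local.properties and local.json with `fs.readFileSync`; they are inlined here to keep the sketch self-contained.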
To run the web server:
gulp serve-local # this will watch the .less files for changes, compile them to .css, and run the node server
For best browsing experience, use Chrome or Safari.
See etc/INSTALL.md
To do an initial insert of data or a one-time, manual update, run:
./import-data.sh
./import-internal-docs.sh
This script will call other scripts that update data per data source.
Data used in this demo include several RSS feeds, Twitter status updates and prices of 160 stocks listed on the Frankfurt Stock Exchange, available on Quandl.com.
Tasks defined in src/tasks take care of periodically updating data from these data sources, so the amount of data in the database will grow over time.
To add new data sources, go to the config page (/config).
When adding an RSS source, make sure you add the correct encoding. For new Twitter sources, add a new Twitter screen name.
Don't forget to click the "Save" button when done to update the configuration file ("/config/sources.json" in the main/document database). Otherwise, the changes will not be saved.
New sources will be ingested at the next scheduled task run. If you want to ingest data from newly added sources immediately, run:
./import-data.sh
from the command line.
In order to enrich your RSS content with tags coming from the Open Calais API, you'll need to register for an API key here: http://www.opencalais.com/opencalais-api/
The key can be entered on the Config page of the Demo.