Index music albums from the MusicBrainz open music encyclopedia into Elasticsearch

achretien/musicbrainz-elasticsearch

 
 

MusicBrainz Elasticsearch

Like freedb, MusicBrainz is an open music encyclopedia that collects music metadata and makes it available to the public.

The musicbrainz-elasticsearch project is a Java batch that indexes release groups of the MusicBrainz database into an Elasticsearch index.
Among release groups, only "real" albums are indexed: release groups whose primary type is Single, EP or Broadcast are skipped, and Album release groups with a Compilation, Live, Remix or Soundtrack secondary type are skipped as well.
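
The selection rule above can be sketched as a small Java predicate. This is only an illustration: in the project the filtering is done by the SQL request, and the `AlbumFilter` class and its method name are hypothetical.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of the project: the batch filters with SQL.
class AlbumFilter {

    // Secondary types that exclude an Album release group from the index.
    static final Set<String> EXCLUDED_SECONDARY_TYPES =
            Set.of("Compilation", "Live", "Remix", "Soundtrack");

    /** Returns true when a release group would be indexed under the rule described above. */
    static boolean isIndexable(String primaryType, List<String> secondaryTypes) {
        // Single, EP and Broadcast release groups are never indexed.
        if (!"Album".equals(primaryType)) {
            return false;
        }
        // An Album is kept only if none of its secondary types is excluded.
        return Collections.disjoint(EXCLUDED_SECONDARY_TYPES, secondaryTypes);
    }

    public static void main(String[] args) {
        System.out.println(isIndexable("Album", List.of()));       // studio album
        System.out.println(isIndexable("Album", List.of("Live"))); // live album
        System.out.println(isIndexable("EP", List.of()));          // not an album
    }
}
```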

Features

  • SQL request selecting music albums from the MusicBrainz PostgreSQL database
  • Elasticsearch index settings and mapping of the musicalbum index in JSON format
  • Tasklet deleting previous index
  • Tasklets creating settings and mappings for the musicalbum index
  • Parallel Elasticsearch indexing using multiple threads in a single process
  • A Java main class to launch the batch (from the command line, an IDE or Maven)
  • End-to-end unit tests with U2 discography
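
To illustrate the idea behind multi-threaded indexing in a single process, here is a minimal sketch using only `java.util.concurrent`. It is not the project's implementation: the `ParallelIndexer` class, the chunk size and the `bulkIndexer` callback (which stands in for an Elasticsearch bulk request) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

// Hypothetical sketch: chunks of documents are indexed concurrently by a fixed thread pool.
class ParallelIndexer {

    /** Splits items into consecutive chunks of at most chunkSize elements. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(new ArrayList<>(items.subList(i, Math.min(i + chunkSize, items.size()))));
        }
        return chunks;
    }

    /** Passes every chunk to bulkIndexer on a worker thread and returns the number of items handled. */
    static <T> int indexAll(List<T> items, int chunkSize, int threads, Consumer<List<T>> bulkIndexer) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (List<T> chunk : partition(items, chunkSize)) {
                results.add(pool.submit(() -> {
                    bulkIndexer.accept(chunk); // placeholder for one bulk request
                    return chunk.size();
                }));
            }
            int total = 0;
            for (Future<Integer> result : results) {
                total += result.get(); // waits for the chunk and surfaces worker failures
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Indexing interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A chunk failed to index", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

A caller would supply the real indexing logic as the callback, for example `ParallelIndexer.indexAll(albums, 500, 4, chunk -> sendBulk(chunk))`, where `sendBulk` is whatever method sends the documents to Elasticsearch.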

Powered by

This project depends on several other open source projects:

Prerequisites

1. MusicBrainz

To index MusicBrainz data, the batch requires a connection to the MusicBrainz PostgreSQL relational database.
Musicbrainz.org does not provide public access to its database, so you have to install your own. There are two different methods to get a local database up and running; you can either:

For my part, I chose to download the MusicBrainz Server virtual machine. It is available as an Open Virtualization Archive (OVA); I deployed it in Oracle VirtualBox, but you may prefer VMware.
Once you have finished the MusicBrainz Server setup guide, follow the two final steps below to make the PostgreSQL database accessible from your host:

  1. Configuring port forwarding with NAT
    Port forwarding enables VirtualBox to listen on certain ports on the host and resend all packets that arrive there to the guest, on the same or a different port. You may use the same port on host and guest. Configure two rules (the second is optional):
      • PostgreSQL database: TCP - host: 5432 / guest: 5432
      • MusicBrainz web server: TCP - host: 5000 / guest: 5000
  2. Configuring PostgreSQL
    To enable remote access to the PostgreSQL database server, you may follow those instructions. Log into the VM (credentials: vm / musicbrainz) and edit the two configuration files pg_hba.conf and postgresql.conf.
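
A quick way to check that the forwarded ports answer from the host is to open a plain TCP connection to each of them. The sketch below does that with the standard `java.net.Socket` API; the `PortCheck` class name and the one-second timeout are arbitrary choices for the example, and a successful connection only shows that the port is open, not that PostgreSQL accepts your credentials.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
class PortCheck {

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out or host unreachable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from the two forwarding rules above.
        System.out.println("PostgreSQL (5432): " + isReachable("localhost", 5432, 1000));
        System.out.println("MusicBrainz web server (5000): " + isReachable("localhost", 5000, 1000));
    }
}
```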

Once these steps are done, you can connect to the database with any JDBC client (e.g. SQuirreL):

  • URL: jdbc:postgresql://localhost:5432/musicbrainz
  • Credentials: musicbrainz / musicbrainz
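
The same connection can be tested from Java with plain JDBC. The sketch below uses the URL and credentials listed above and assumes the PostgreSQL JDBC driver is on the classpath; the `MusicBrainzConnectionCheck` class is hypothetical and is not part of the batch.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical connectivity check; requires the PostgreSQL JDBC driver at runtime.
class MusicBrainzConnectionCheck {

    static final String URL = "jdbc:postgresql://localhost:5432/musicbrainz";
    static final String USER = "musicbrainz";
    static final String PASSWORD = "musicbrainz";

    /** Opens a connection and returns the version string reported by the PostgreSQL server. */
    static String serverVersion() throws SQLException {
        try (Connection connection = DriverManager.getConnection(URL, USER, PASSWORD);
             Statement statement = connection.createStatement();
             ResultSet resultSet = statement.executeQuery("SELECT version()")) {
            resultSet.next();
            return resultSet.getString(1);
        }
    }

    public static void main(String[] args) throws SQLException {
        System.out.println(serverVersion());
    }
}
```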

2. Elasticsearch

Before launching the batch, you have to download Elasticsearch v0.90.5 and configure it. Once unzipped, edit the config/elasticsearch.yml configuration file. Uncomment the cluster.name line and set it to the musicbrainz cluster name: cluster.name: musicbrainz. Alternatively, you may keep the default elasticsearch cluster name and change the name in the es-musicbrainz-batch.properties configuration file instead.

Quick Start

  • git clone https://github.com/arey/musicbrainz-elasticsearch.git
  • start Elasticsearch
  • start MusicBrainz database or VM
  • mvn install
  • mvn exec:java (execute the IndexBatchMain main class)

Demo

MusicBrainz database searching with Elasticsearch : http://musicsearch.javaetmoi.com/


For command line testing, you can run the following two curl scripts: musicbrainz_autocomplete_u2.sh and musicbrainz_fulltext_u2_war.sh
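
If you prefer to query the index from Java rather than curl, the sketch below sends a simple URI search to the musicalbum index. It does not reproduce the content of the two scripts: it relies on the generic Elasticsearch `_search?q=` endpoint, assumes Elasticsearch listens on its default HTTP port 9200 on localhost, and needs Java 11 or later for `java.net.http`. The `MusicAlbumSearch` class is hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical client: runs a URI search against the musicalbum index.
class MusicAlbumSearch {

    /** Builds the search URI for the given Elasticsearch base URL and query text. */
    static URI searchUri(String baseUrl, String query) {
        String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return URI.create(baseUrl + "/musicalbum/_search?q=" + encoded);
    }

    /** Sends the search request and returns the raw JSON response body. */
    static String search(String baseUrl, String query) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(searchUri(baseUrl, query)).GET().build();
        HttpResponse<String> response =
                HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Assumes a local node on the default HTTP port.
        System.out.println(search("http://localhost:9200", "u2 war"));
    }
}
```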

Contributing to MusicBrainz Elasticsearch project

  • GitHub is a social coding platform: if you want to write code, we encourage contributions through pull requests from forks of this repository. If you contribute code this way, please also reference a GitHub ticket covering the specific issue you are addressing.

Development environment installation

Download the code with git: git clone git://github.com/arey/musicbrainz-elasticsearch.git

Compile the code with Maven:

mvn clean install

If you're using an IDE that supports Maven-based projects (IntelliJ IDEA, NetBeans or m2eclipse), you can import the project directly from its POM. Otherwise, generate IDE metadata with the related IDE Maven plugin:

mvn eclipse:clean eclipse:eclipse

Documentation

French articles on the javaetmoi.com blog:

Release Note

| Version | Release date | Features |
|---|---|---|
| 1.1-SNAPSHOT | Next version | Elasticsearch 1.0.0 update |
| 1.0 | 26/10/2013 | Initial version developed for a workshop about Elasticsearch (0.90.5) |

Credits

  • Uses Maven as a build tool
  • Uses CloudBees and Travis CI for continuous integration builds whenever code is pushed to GitHub
  • Authors of all the open source libraries used

Build Status

Travis : Build Status

Cloudbees Jenkins : Build Status
