This repository has been archived by the owner on Sep 6, 2024. It is now read-only.
Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

datamindedbe/lighthouse

Lighthouse

Caution

This library has not been actively maintained for some time, so it was archived on 6 September 2024.

Lighthouse is a library for data lakes built on top of Apache Spark. It provides high-level APIs in Scala to streamline data pipelines and apply best practices.

Principles

  • Configuration as code
  • Idempotent execution
  • Utilities that make it easier to build and test Apache Spark-based applications
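
To make the idempotent-execution principle concrete, here is a minimal sketch in plain Scala. It does not use Lighthouse or Spark, and `IdempotentWriteSketch`, `writePartition`, and the `date=...` directory layout are hypothetical names chosen for illustration only. The idea shown is that a job writes each run date to its own location and overwrites it, so rerunning the job leaves the data lake in the same state.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object IdempotentWriteSketch {
  // Hypothetical helper (not part of Lighthouse): writes the output of one run
  // date into its own partition directory, replacing what an earlier run left.
  def writePartition(root: Path, runDate: String, rows: Seq[String]): Path = {
    val dir = root.resolve(s"date=$runDate")
    Files.createDirectories(dir) // no-op if the directory already exists
    val target = dir.resolve("part-00000.txt")
    // Files.write creates or truncates the file by default,
    // so a rerun overwrites instead of appending.
    Files.write(target, rows.mkString("\n").getBytes(StandardCharsets.UTF_8))
    target
  }

  def main(args: Array[String]): Unit = {
    val root = Files.createTempDirectory("lake")
    val rows = Seq("a,1", "b,2")
    val first  = writePartition(root, "2024-09-06", rows)
    val second = writePartition(root, "2024-09-06", rows) // rerun of the same job
    val content = new String(Files.readAllBytes(second), StandardCharsets.UTF_8)
    assert(first == second)        // same target location on every run
    assert(content == "a,1\nb,2")  // no duplicated rows after the rerun
  }
}
```

An append-based write would double the rows on the second run; overwriting a partition keyed by run date is one common way to avoid that.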

Start using Lighthouse

In your build.sbt, add this:

libraryDependencies += "be.dataminded" %% "lighthouse" % "<version>"
libraryDependencies += "be.dataminded" %% "lighthouse-testing" % "<version>" % Test

If you are using Maven, add this to your pom.xml (the _2.11 suffix in the artifact IDs is the Scala binary version):

<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse_2.11</artifactId>
    <version>[version]</version>
</dependency>
<dependency>
    <groupId>be.dataminded</groupId>
    <artifactId>lighthouse-testing_2.11</artifactId>
    <version>[version]</version>
    <scope>test</scope>
</dependency>
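
For sbt users, the two dependency lines can be combined into a fuller build.sbt fragment. This is a sketch under assumptions: the Scala version is inferred from the `_2.11` suffix of the Maven artifacts, `lighthouseVersion` is a hypothetical name, and the version string is left as a placeholder to be replaced with an actual release.

```scala
// build.sbt (sketch)

// Assumption: the published artifacts end in _2.11, so a Scala 2.11.x compiler is used.
scalaVersion := "2.11.12"

// Hypothetical value name; replace the placeholder with a released version.
val lighthouseVersion = "<version>"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version, resolving to lighthouse_2.11
  "be.dataminded" %% "lighthouse"         % lighthouseVersion,
  // Test scope keeps the testing utilities out of the production classpath
  "be.dataminded" %% "lighthouse-testing" % lighthouseVersion % Test
)
```

Defining the version once keeps the main and testing artifacts on the same release.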

Online Documentation

This README file only contains basic instructions. Here is a more complete tutorial: https://datamindedbe.github.io/lighthouse/tutorial/
