
xlparsec for Apache Spark

A Scala 2.12 library that uses Apache POI to parse Excel workbooks based on a flexible JSON configuration. It integrates with Apache Spark, so you can load Excel data into Spark DataFrames for further processing.

Features

  • JSON-Based Configuration: Easily define how to extract data from Excel sheets, including ranges, columns, and virtual columns using a simple JSON config.
  • Virtual Columns: Add computed or derived columns that don't exist in the original Excel sheet.

Installation

  1. Download the xlparsec-${VERSION}.jar.
  2. Place the JAR file in your ${SPARK_HOME}/jars/ directory.
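
If you cannot (or prefer not to) modify ${SPARK_HOME}/jars/, the JAR can also be registered when the session is created via Spark's standard spark.jars setting. This is generic Spark behaviour rather than anything xlparsec-specific, and the path below is a placeholder:

import org.apache.spark.sql.SparkSession

// Add the xlparsec JAR to the driver and executor classpaths at startup
// (replace the placeholder path with the location of the downloaded JAR)
val spark: SparkSession = SparkSession.builder()
  .appName("ExcelParserExample")
  .master("local[*]")
  .config("spark.jars", "/path/to/xlparsec-${VERSION}.jar")
  .getOrCreate()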

Usage

Basic Example

import de.fxttr.scala.xlparsec.Xlparsec
import org.apache.spark.sql.SparkSession

// xlparsec needs an implicit SparkSession in scope to build DataFrames
implicit val spark: SparkSession = SparkSession.builder()
  .appName("ExcelParserExample")
  .master("local[*]")
  .getOrCreate()

// JSON configuration describing the workbook: sheets, ranges, columns,
// and virtual columns (see /docs for the full schema)
val configJson = """{...}"""

val result = Xlparsec.toDFs(configJson)

// The result is nested: either a failure for the whole file, or per-sheet
// scopes, each of which may itself have failed to parse
result match {
  case Right(dataFrames) =>
    dataFrames.foreach { case (sheetName, scopes) =>
      scopes.foreach { case (scopeName, dfEither) =>
        dfEither match {
          case Right(df) => df.show()
          case Left(error) => println(s"Error parsing scope $scopeName: $error")
        }
      }
    }
  case Left(error) => println(s"Error parsing file: $error")
}
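
Once a scope parses successfully, the value you get back is an ordinary Spark DataFrame, so the usual DataFrame API applies for downstream processing. As a purely illustrative sketch (the output path and the choice of Parquet are assumptions, not part of xlparsec):

// Hypothetical follow-up: persist each successfully parsed scope as Parquet.
// The target path is a placeholder; adjust it to your environment.
result.foreach { dataFrames =>
  dataFrames.foreach { case (sheetName, scopes) =>
    scopes.foreach {
      case (scopeName, Right(df)) =>
        df.write
          .mode("overwrite")
          .parquet(s"/tmp/xlparsec/$sheetName/$scopeName")
      case (scopeName, Left(error)) =>
        println(s"Skipping scope $scopeName: $error")
    }
  }
}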

For more examples, refer to the /examples directory.

Documentation

Comprehensive documentation can be found in the /docs directory, including detailed configuration instructions, supported JSON schema, and advanced usage scenarios.

Contributing

Feel free to open issues or submit pull requests for improvements or bug fixes. Contributions are always welcome!

License

This project is licensed under the Apache-2.0 License.
