
Go CSV Importer: Multi-threaded Fast CSV to Database Tool

Overview

Go CSV Importer is a powerful, multi-threaded command-line tool designed to import large CSV files into databases quickly and efficiently. It supports various database types, offers customizable import modes, and is optimized for performance.

Key Features

  • Support for Large CSV Files: Handles millions of rows with ease.
  • Multi-Database Compatibility: Works with SQLite, MySQL, PostgreSQL, and Firebird.
  • Configurable Import Modes:
    • Transactional or non-transactional imports.
    • Batch SQL inserts or row-by-row operations.
    • Single or multiple database connections.
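The batch-insert mode above amounts to grouping many rows into one multi-row INSERT statement, so each database round trip carries a whole batch instead of a single row. The sketch below is illustrative only (the function name and placeholder style are assumptions, not the importer's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert builds one multi-row INSERT statement with
// positional placeholders, so a single round trip covers rowCount rows.
// Illustrative sketch; not the importer's internal implementation.
func buildBatchInsert(table string, cols []string, rowCount int) string {
	placeholders := "(" + strings.TrimRight(strings.Repeat("?,", len(cols)), ",") + ")"
	rows := make([]string, rowCount)
	for i := range rows {
		rows[i] = placeholders
	}
	return fmt.Sprintf(
		"INSERT INTO %s (%s) VALUES %s",
		table,
		strings.Join(cols, ", "),
		strings.Join(rows, ", "),
	)
}

func main() {
	fmt.Println(buildBatchInsert("vehicles", []string{"make", "model"}, 3))
	// INSERT INTO vehicles (make, model) VALUES (?,?), (?,?), (?,?)
}
```

Row-by-row mode would instead prepare a single-row statement once and execute it per record; batching trades a larger statement for far fewer round trips.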

Installation

Install the latest version directly from Go:

go install github.com/olbrichattila/gocsvimporter/cmd/csvimporter@latest

How It Works

  1. Analyze the CSV: Determines field names, data types, and field sizes from the data.
  2. Prepare the Database: Automatically creates the necessary table, dropping it if it already exists.
  3. Import Data: Optimizes the process based on database type and chosen parameters.
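Step 2 above boils down to generating DDL from the analyzed fields: drop the table if it exists, then recreate it. A minimal sketch, assuming a simple field struct (the names and layout here are illustrative, not the importer's internal types):

```go
package main

import (
	"fmt"
	"strings"
)

// field mirrors one analyzed CSV column: a name, an SQL type,
// and a maximum observed length. Assumed shape for illustration.
type field struct {
	Name   string
	Type   string
	Length int
}

// createTableSQL emits the DDL the prepare step needs: drop the
// table if it already exists, then recreate it from the fields.
func createTableSQL(table string, fields []field) string {
	defs := make([]string, len(fields))
	for i, f := range fields {
		if f.Type == "VARCHAR" {
			defs[i] = fmt.Sprintf("%s VARCHAR(%d)", f.Name, f.Length)
		} else {
			defs[i] = fmt.Sprintf("%s %s", f.Name, f.Type)
		}
	}
	return fmt.Sprintf(
		"DROP TABLE IF EXISTS %s;\nCREATE TABLE %s (%s);",
		table, table, strings.Join(defs, ", "),
	)
}

func main() {
	fmt.Println(createTableSQL("vehicles", []field{
		{Name: "make", Type: "VARCHAR", Length: 20},
		{Name: "year", Type: "INT"},
	}))
}
```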

Usage

Run the tool with:

csvimporter <csv_file> <table_name> [delimiter]

Parameters:

  1. CSV File: Path to the CSV file.
  2. Table Name: Target database table name.
  3. Delimiter (Optional): CSV delimiter (default: ,).

Example:

csvimporter data.csv vehicles ";"

Field Header Management for Faster Imports

The CSV importer command-line tool now includes a powerful feature for handling CSV headers efficiently.

Feature Overview

When importing CSV files, the tool automatically:

  1. Analyzes the CSV content: Deduplicates field names, determines proper field types, and calculates field sizes based on the data.
  2. Optimizes imports: Speeds up processing for repeated file formats by allowing you to save and reuse header definitions.

This analysis can take time, especially with large files. If you're importing the same file format multiple times, you can now save the calculated headers for reuse, bypassing recalculation for subsequent imports.
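The type and size analysis described above can be pictured as a scan over each column's values: if every value parses as an integer the column becomes INT, otherwise VARCHAR sized to the longest value seen. This is a simplified sketch of that idea, not the tool's exact inference rules:

```go
package main

import (
	"fmt"
	"strconv"
)

// inferField guesses an SQL type and length for one CSV column:
// INT if every value parses as an integer, otherwise VARCHAR sized
// to the longest value. Simplified sketch of the analysis step.
func inferField(values []string) (sqlType string, length int) {
	allInt := true
	for _, v := range values {
		if len(v) > length {
			length = len(v)
		}
		if _, err := strconv.Atoi(v); err != nil {
			allInt = false
		}
	}
	if allInt && len(values) > 0 {
		return "INT", length
	}
	return "VARCHAR", length
}

func main() {
	t, l := inferField([]string{"Budapest", "London"})
	fmt.Println(t, l) // VARCHAR 8
}
```

Scanning every row this way is exactly why the analysis is slow on large files, and why caching the result in a JSON header file pays off.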

How It Works

Save Field Definitions:

Use the -sh= parameter to save the field definitions in a JSON file. You can optionally modify this JSON file to fine-tune your header configurations.

csvimporter <csv_file> <table_name> [delimiter] -sh=customerFieldDefinition

This will generate a JSON file with the field definitions.

Load Field Definitions:

Use the -lh= parameter to load the pre-saved field definitions, skipping the recalculation step.

csvimporter <csv_file> <table_name> [delimiter] -lh=customerFieldDefinition

When this parameter is provided, the import process starts immediately, but progress percentage updates will not be displayed.

Example JSON Structure

Here’s an example of the auto-saved JSON file generated by the tool:

[
 {
  "name": "index",
  "Type": "INT",
  "Length": 7
 },
 {
  "name": "customer_id",
  "Type": "VARCHAR",
  "Length": 15
 },
 {
  "name": "first_name",
  "Type": "VARCHAR",
  "Length": 11
 },
 {
  "name": "last_name",
  "Type": "VARCHAR",
  "Length": 11
 },
 {
  "name": "company",
  "Type": "VARCHAR",
  "Length": 38
 },
 {
  "name": "city",
  "Type": "VARCHAR",
  "Length": 24
 },
 {
  "name": "country",
  "Type": "VARCHAR",
  "Length": 51
 },
 {
  "name": "phone_a1",
  "Type": "VARCHAR",
  "Length": 22
 },
 {
  "name": "phone_a2",
  "Type": "VARCHAR",
  "Length": 22
 },
 {
  "name": "email",
  "Type": "VARCHAR",
  "Length": 44
 },
 {
  "name": "subscription_date",
  "Type": "VARCHAR",
  "Length": 10
 },
 {
  "name": "website",
  "Type": "VARCHAR",
  "Length": 40
 }
]

Benefits

  • Time-Saving: Skip repetitive calculations for consistent file formats.
  • Customizable: Modify the saved JSON to align with specific import requirements.
  • Improved Efficiency: Ideal for workflows requiring frequent CSV imports.

Notes

  • The -lh parameter disables progress percentage display.
  • Ensure the JSON file accurately reflects your data structure for seamless imports.

By leveraging this feature, the CSV importer command-line tool becomes even more efficient for managing recurring import tasks and large datasets.

Supported Databases

SQLite

DB_CONNECTION=sqlite
DB_DATABASE=./database/database.sqlite

MySQL

DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=mydb
DB_USERNAME=myuser
DB_PASSWORD=mypassword

PostgreSQL

DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_SSLMODE=disable

Firebird

DB_CONNECTION=firebird
DB_HOST=127.0.0.1
DB_PORT=3050
DB_DATABASE=/path/to/database.fdb
DB_USERNAME=SYSDBA
DB_PASSWORD=masterkey

Configuration

Create a .env.csvimporter file in the application's directory or export environment variables:

BATCH_SIZE=500               # Rows per batch (default: 100)
MAX_CONNECTION_COUNT=25      # Maximum connections (default: 10)
BATCH_INSERT=on              # Enable/disable batch insert
MULTIPLE_CONNECTIONS=on      # Enable/disable multi-threading
TRANSACTIONAL=off            # Enable/disable transactions

Note: Unsupported options for certain databases (e.g., SQLite) are ignored.


Performance

Speed Test: 2 Million Rows

System Configuration:

  • OS: Ubuntu Linux
  • Processor: Intel® Core™ i7-3770S @ 3.10GHz
  • Storage: SSD
Database     Duration     Mode                            Threads
SQLite       52 seconds   Transactional, Batch SQL        1
MySQL        65 seconds   Transactional, Multi-Threaded   10
PostgreSQL   43 seconds   Transactional, Multi-Threaded   10
Firebird     5m 42s       Transactional, Multi-Threaded   10

Makefile Targets

Test Imports

make vehicles
make customers

Switch Environments

make switch-sqlite
make switch-mysql
make switch-pgsql
make switch-firebird

Local Testing with Docker

A docker-compose setup is provided for testing:

cd docker
docker-compose up -d

Roadmap

Planned Improvements

  • Distributed Import: Split CSV files across multiple instances (pods/servers) for faster parallel imports.
  • Enhanced Configuration: Support more advanced database-specific settings.

Start importing your CSV files faster and more efficiently today!
