Go CSV Importer
Go CSV Importer is a powerful, multi-threaded command-line tool for importing large CSV files into databases quickly and efficiently. It supports multiple database engines, offers configurable import modes, and is optimized for performance.
- Support for Large CSV Files: Handles millions of rows with ease.
- Multi-Database Compatibility: Works with SQLite, MySQL, PostgreSQL, and Firebird.
- Configurable Import Modes:
  - Transactional or non-transactional imports.
  - Batch SQL inserts or row-by-row operations.
  - Single or multiple database connections.
Install the latest version with the Go toolchain:
go install github.com/olbrichattila/gocsvimporter/cmd/csvimporter@latest
When run, the tool performs three steps:
- Analyze the CSV: Determines data types and structure (a sketch of the idea follows this list).
- Prepare the Database: Automatically creates the target table, dropping it first if it already exists.
- Import Data: Optimizes the process based on the database type and the chosen parameters.
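The analysis logic is internal to the tool; the following is only a minimal, hypothetical Go sketch of how a column's type and length might be inferred from sampled values, not the tool's actual implementation:

```go
package main

import (
	"fmt"
	"strconv"
)

// inferColumn guesses a SQL type and length for one CSV column by scanning
// its values: if every value parses as an integer the column becomes INT,
// otherwise VARCHAR sized to the longest value seen.
func inferColumn(values []string) (sqlType string, length int) {
	allInts := true
	for _, v := range values {
		if len(v) > length {
			length = len(v)
		}
		if _, err := strconv.Atoi(v); err != nil {
			allInts = false
		}
	}
	if allInts {
		return "INT", length
	}
	return "VARCHAR", length
}

func main() {
	t, l := inferColumn([]string{"Budapest", "London", "Wien"})
	fmt.Printf("%s(%d)\n", t, l) // VARCHAR(8)
}
```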
Run the tool with:
csvimporter <csv_file> <table_name> [delimiter]
- CSV File: Path to the CSV file.
- Table Name: Target database table name.
- Delimiter (Optional): CSV delimiter (default: ,).
Example:
csvimporter data.csv vehicles ";"
The CSV importer command line tool now includes a powerful feature for handling CSV headers efficiently.
When importing CSV files, the tool automatically:
- Analyzes the CSV content: Deduplicates field names, determines proper field types, and calculates field sizes based on the data.
- Optimizes imports: Speeds up processing for repeated file formats by allowing you to save and reuse header definitions.
This analysis can take time, especially with large files. If you're importing the same file format multiple times, you can now save the calculated headers for reuse, bypassing recalculation for subsequent imports.
Use the -sh= parameter to save the field definitions in a JSON file. You can optionally modify this JSON file to fine-tune your header configurations.
csvimporter <csv_file> <table_name> [delimiter] -sh=customerFieldDefinition
This will generate a JSON file with the field definitions.
Use the -lh= parameter to load the pre-saved field definitions, skipping the recalculation step.
csvimporter <csv_file> <table_name> [delimiter] -lh=customerFieldDefinition
When this parameter is provided, the import process starts immediately, but progress percentage updates will not be displayed.
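For example, with a hypothetical customers.csv, the first run can save the field definitions and subsequent runs can reuse them:
csvimporter customers.csv customers "," -sh=customerFieldDefinition
csvimporter customers.csv customers "," -lh=customerFieldDefinition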
Here’s an example of the auto-saved JSON file generated by the tool:
[
  {
    "name": "index",
    "Type": "INT",
    "Length": 7
  },
  {
    "name": "customer_id",
    "Type": "VARCHAR",
    "Length": 15
  },
  {
    "name": "first_name",
    "Type": "VARCHAR",
    "Length": 11
  },
  {
    "name": "last_name",
    "Type": "VARCHAR",
    "Length": 11
  },
  {
    "name": "company",
    "Type": "VARCHAR",
    "Length": 38
  },
  {
    "name": "city",
    "Type": "VARCHAR",
    "Length": 24
  },
  {
    "name": "country",
    "Type": "VARCHAR",
    "Length": 51
  },
  {
    "name": "phone_a1",
    "Type": "VARCHAR",
    "Length": 22
  },
  {
    "name": "phone_a2",
    "Type": "VARCHAR",
    "Length": 22
  },
  {
    "name": "email",
    "Type": "VARCHAR",
    "Length": 44
  },
  {
    "name": "subscription_date",
    "Type": "VARCHAR",
    "Length": 10
  },
  {
    "name": "website",
    "Type": "VARCHAR",
    "Length": 40
  }
]
- Time-Saving: Skip repetitive calculations for consistent file formats.
- Customizable: Modify the saved JSON to align with specific import requirements (see the example after this list).
- Improved Efficiency: Ideal for workflows requiring frequent CSV imports.
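As an illustration, a hypothetical edit to the generated file above widens the email column before importing a larger dataset; which types and lengths are valid depends on the target database:
  {
    "name": "email",
    "Type": "VARCHAR",
    "Length": 255
  }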
Keep in mind:
- The -lh parameter disables the progress percentage display.
- Ensure the JSON file accurately reflects your data structure for seamless imports.
By leveraging this feature, the CSV importer command line tool becomes even more efficient for managing recurring import tasks and large datasets.
Database connections are configured with environment variables. Example configurations:

SQLite:
DB_CONNECTION=sqlite
DB_DATABASE=./database/database.sqlite

MySQL:
DB_CONNECTION=mysql
DB_HOST=127.0.0.1
DB_PORT=3306
DB_DATABASE=mydb
DB_USERNAME=myuser
DB_PASSWORD=mypassword

PostgreSQL:
DB_CONNECTION=pgsql
DB_HOST=127.0.0.1
DB_PORT=5432
DB_DATABASE=postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_SSLMODE=disable

Firebird:
DB_CONNECTION=firebird
DB_HOST=127.0.0.1
DB_PORT=3050
DB_DATABASE=/path/to/database.fdb
DB_USERNAME=SYSDBA
DB_PASSWORD=masterkey
Create a .env.csvimporter file in the application's directory, or export the settings as environment variables. The following options tune import performance:
BATCH_SIZE=500 # Rows per batch (default: 100)
MAX_CONNECTION_COUNT=25 # Maximum connections (default: 10)
BATCH_INSERT=on # Enable/disable batch insert
MULTIPLE_CONNECTIONS=on # Enable/disable multi-threading
TRANSACTIONAL=off # Enable/disable transactions
Note: Unsupported options for certain databases (e.g., SQLite) are ignored.
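To illustrate what BATCH_SIZE and BATCH_INSERT control, here is a minimal, hypothetical Go sketch of assembling a multi-row INSERT statement. It is not the tool's actual implementation, and placeholder syntax varies by driver (e.g., PostgreSQL uses $1, $2, ...):

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchInsert assembles a single multi-row INSERT statement and its
// flattened argument list. Sending many rows per statement reduces
// round trips compared with one INSERT per row.
func buildBatchInsert(table string, columns []string, rows [][]any) (string, []any) {
	// One "(?,?,...)" group per row.
	group := "(" + strings.TrimRight(strings.Repeat("?,", len(columns)), ",") + ")"
	groups := make([]string, 0, len(rows))
	args := make([]any, 0, len(rows)*len(columns))
	for _, row := range rows {
		groups = append(groups, group)
		args = append(args, row...)
	}
	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(columns, ", "), strings.Join(groups, ", "))
	return query, args
}

func main() {
	query, args := buildBatchInsert(
		"vehicles",
		[]string{"make", "model"},
		[][]any{{"Ford", "Focus"}, {"Opel", "Astra"}},
	)
	fmt.Println(query) // INSERT INTO vehicles (make, model) VALUES (?,?), (?,?)
	fmt.Println(args)  // [Ford Focus Opel Astra]
}
```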
Benchmarks were run on the following system:
- OS: Ubuntu Linux
- Processor: Intel® Core™ i7-3770S @ 3.10GHz
- Storage: SSD
| Database | Duration | Mode | Threads |
|---|---|---|---|
| SQLite | 52 seconds | Transactional, Batch SQL | 1 |
| MySQL | 65 seconds | Transactional, Multi-Threaded | 10 |
| PostgreSQL | 43 seconds | Transactional, Multi-Threaded | 10 |
| Firebird | 5m 42s | Transactional, Multi-Threaded | 10 |
The included Makefile provides shortcuts for running sample imports and switching the active database configuration:
make vehicles
make customers
make switch-sqlite
make switch-mysql
make switch-pgsql
make switch-firebird
A docker-compose setup is provided for testing:
cd docker
docker-compose up -d
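When you are finished testing, the standard docker-compose teardown stops and removes the containers:
docker-compose down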
Planned improvements:
- Distributed Import: Split CSV files across multiple instances (pods/servers) for faster parallel imports.
- Enhanced Configuration: Support for more advanced database-specific settings.
Start importing your CSV files faster and more efficiently today!
- Learn more about me on my personal website: https://attilaolbrich.co.uk/menu/my-story
- Check out my latest blog post on my personal page: https://attilaolbrich.co.uk/blog/1/single