Skip to content

3rdcycle/csv2db

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

csv2db

csv2db helps with importing values from one or more csv files into a relational database.

Check out import.py in the example directory or read the documentation below. (The example is probably easier to understand)

/doc/concept.png?raw=true

Copyright (C) 2014, ETH Zurich

Import Specification

The importer creates one or more related DbRecord objects for each row that it reads from the csv file according to the import specification that is passed to CsvImporter on init. These objects can then be converted to sql insert statements. The import specification is a dictionary with the following format:

import_specs = {
    <table_name>: {
        <instance_name>: RecordSpec(<attribute_map> [, condition=<cond>])
    },
    <table name>: {
        <instance_name>: RecordSpec(<attribute_map> [, condition=<cond>]),
        <instance_name>: RecordSpec(<attribute_map> [, condition=<cond>]),
        ...
    },
    ...
}

Here's what the different parts mean:

<table_name>
Name of the table in the target database where objects specified by the RecordSpec instances under this key should be inserted.
<instance_name>
An identifier for each record instance that should be created when a csv row is imported. I.e. you can create multiple records in the same target table per csv row. The <instance_name> is just an arbitrary name which the script needs to resolve cross references (see below).
<attribute_map>
The attribute map tells the RecordSpec how to fill the columns in the target table. See below for details.
<cond>
An optional callable that tells the script whether the object should be created for one particular row or not. <cond> must accept exactly one argument which is the currently imported csv row dictionary.

Attribute map specification

A RecordSpec is initialized with an attr_map dictionary that specifies how each column of the target table should be filled with values. Options are:

  • Values extracted from a column in the csv (ColumnValue)
  • Values computed from multiple columns (MultiColumnValue)
  • Constant values (ConstValue)
  • Dynamically generated values (DynamicValue)
  • Values taken from other DbRecord objects (XReference)

In the example given below, the assumptions are that the target table has a column named _oid whose value is dynamically generated by a callable OidFactory class. The _parent_oid refers to the _oid of other_instance in other_table (corresponding to a <table_name> and <instance_name> in the import_spec. The value for x_value is extracted from the csv column csv_x_value. The column m m_name is always filled with the string 'Some Text'.

attr_map = {
    '_oid':        DynamicValue(OidFactory())
    '_parent_oid': XReference('other_table', 'other_instance', '_oid'),
    'x_value':     ColumnValue('csv_x_value'),
    'm_name':      ConstValue("'Some Text'"),
}

The script takes care of creating records in the correct order so that XReferences can be resolved (as long as there are no circular references). ColumnValue takes an additional optional argument convert, where you can specify a function or other callable that converts the value before inserting it into the record. For MultiColumnValue the convert argument is mandatory and provides the function that contracts the input values into a single output value.

class csv2db.ColumnValue(col_name, convert=None)

Read an input value from a csv column

Parameters:
  • col_name -- Column name to read the value from
  • convert -- Optional conversion function that takes exactly one argument which is the row dict for the currently imported row

class csv2db.ConstValue(value)

Always returns the same constant value

Parameters:value -- The value to return for each row

class csv2db.CsvImporter(path, dialect, import_specs)

CsvImporter imports values from a csv file into records and creates sql insert statements to create the corresponding rows in the target db.

Parameters:
  • path -- Path to the csv file to import
  • dialect -- Dictionary with csv reader dialect specifications (see http://docs.python.org/2/library/csv.html#csv-fmt-params)
  • import_specs -- Dictionary with import specifications for each table. RecordSpecs are used to tell the script how to extract the csv columns into db records. Each entry can have multiple RecordSpecs, identified by a unique key which is used to resolve cross references in the attr_map of each RecordSpec.

import_data(id_col=None)

Imports the csv into DbRecords and returns them.

The method uses the import specification (import_specs) that was passed to the importer on init to convert csv table columns to DbRecord objects.

class csv2db.DbRecord(table_name, row_id)

One or more DbRecords are created for each imported row accoding to the RecordSpecs.

import_attributes(attr_map, existing_records, row)

Import attributes according to the attr_map and resolve cross references to existing_records.

insert_statement()

Returns the insert statement sequence for the current object

class csv2db.DynamicValue(generate)

Creates a value dynamically using the callable generate

Parameters:generate -- A function or other callable that takes a single argument (the current row dict) and returns a single value

class csv2db.MultiColumnValue(col_names, convert)

Reads input from multiple columns and contracts them into a single value using the (non-optional) callable given in convert.

Parameters:
  • col_names -- List of column names to read values from
  • convert -- Conversion function that takes exactly one argument (the row dict of the currently imported row) and contracts the values into a single return value

class csv2db.RecordSpec(attr_map, condition=None)

Specifications for extracting csv columns into the corresponding database record.

Parameters:
  • attr_map -- A dictionary that maps database columns to csv columns using any of the ...Value classes below.
  • condition -- An optional callable that returns false if the object should not be created for the row that is currently. The callable must accept exactly one parameter (the current row).

class csv2db.XReference(table_name, instance_name, attribute_name)

Takes the value of a specific attribute of another record.

Parameters:
  • table_name -- Table name in the import_specs table given to the CsvImporter
  • instance_name -- Identifies a specific instance under table_name
  • attribute_name -- Name of the attribute to return

License (MIT)

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

About

A simple python ETL tool: CSV -> SQL

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages