tap-s3

This is a Singer tap that reads data from files located inside a given S3 bucket and produces JSON-formatted data following the Singer spec.

How to use it

tap-s3 works together with any other Singer Target to move data from s3 to any target destination.

Install

First, make sure Python 3 is installed on your system or follow these installation instructions for Mac or Ubuntu.

This project is set up using Python Poetry. Once cloned and within the project directory, install dependencies with:

poetry install
``

### Configuration

Here is an example of basic config, and a bit of a run down on each of the properties:

```json
{
    "aws_access_key_id": "key",
    "aws_secret_access_key": "secret",
    "start_date": "2020-08-21T00:00:00Z",
    "tables": {
        "people": {
            "bucket_name": "bucket_name",
            "search_prefix": "prefix",
            "search_pattern": "people",
            "file_type": "csv",
            "delimiter": "/",
            "tap_stream_id": "people",
            "primary_key": ["id"],
            "replication_method": "FULL_TABLE",
            "valid_replication_keys": [""],
            "replication_key": "",
            "object_type": "PERSON"
        },
        "houses": {
            "bucket_name": "bucket_name",
            "search_prefix": "prefix_1",
            "search_pattern": "houses",
            "file_type": "csv",
            "delimiter": "/",
            "tap_stream_id": "houses",
            "primary_key": ["id"],
            "replication_method": "INCREMENTAL",
            "valid_replication_keys": [""],
            "replication_key": "",
            "object_type": "HOUSE"
        }
    }
}

aws_access_key_id: This is your AWS access key
aws_secret_access_key: This is your AWS secret key.
start_date: This is the datetime that the tap will use to look for newly updated or created files, based on the modified timestamp of the file.
tables: A dictionary of how to find the tables you want to create from S3 Currently only supports CSV format.

    {
        "bucket_name": "bucket_name",
        "search_prefix": "prefix_1",
        "search_pattern": "houses.csv",
        "delimiter": "/",
        "tap_stream_id": "houses",
        "primary_key": ["id"],
        "replication_method": "INCREMENTAL",
        "valid_replication_keys": [""],
        "replication_key": "",
        "object_type": "HOUSE"
    }

bucket_name: This is the name of the bucket in which the objects reside.
search_prefix: This is a prefix to apply after the bucket, but before the file search pattern, to allow you to find files in "directories" below the bucket.
search_pattern: This is based on the key of the object you wish to retrieve.
file_type: This is the file extention of the object you wish to read. Currently only csv file extensions are supported
delimiter: This allows you to specify a custom delimiter, such as \t or |, if that applies to your files. Currently not used.
tap_stream_id: The name of the tap stream and the name of the resulting table. This should be unique within this tap.
primary_key: The primary key(s) for the stream.
replication_method: Whether the stream should sync all records from start_date on every run, or whether the stream should only sync new/updated records. Valid values: "INCREMENTAL", "FULL_TABLE"

A sample configuration is available inside config.sample.json

Name		Name	Last commit message	Last commit date
Latest commit History 67 Commits
.circleci		.circleci
tap_s3		tap_s3
tests		tests
.gitignore		.gitignore
README.md		README.md
config.sample.json		config.sample.json
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tap-s3

How to use it

Install

About

Releases

Packages

Languages

Mashey/tap-s3

Folders and files

Latest commit

History

Repository files navigation

tap-s3

How to use it

Install

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages