Commit

Bento616
Added capability to split transactions by file using the --split-transactions argument
AustinSMueller committed Sep 17, 2020
1 parent 371ab10 commit 78a2301
Showing 3 changed files with 20 additions and 3 deletions.
4 changes: 3 additions & 1 deletion config/data-loader-config.example.yml
@@ -30,8 +30,10 @@ Config:
     no_confirmation: false
     # Max violations to display, default is 10, can be overridden by -M/--max-violations argument
     max_violations: 10
-    #Disable saving parent IDs in children
+    # Disable saving parent IDs in children
     no_parents: false
+    # Split the loading transaction into separate transactions for each file
+    split_transactions: false

     # S3 bucket name, if you are loading from an S3 bucket, can be overridden by -b/--bucket argument
     s3_bucket:
6 changes: 6 additions & 0 deletions docs/data-loader.md
@@ -63,6 +63,7 @@ An example configuration file can be found in ````config/data-loader-config.exam
 * ````no_confirmation````: Automatically confirms any confirmation prompts that are displayed during the data loading
 * ````max_violations````: The maximum number of violations (per data file) to be displayed in the console output during data loading
 * ````no_parents````: Does not save parent node IDs in children nodes
+* ````split_transactions````: Splits the database load operations into separate transactions for each file
 * ````s3_bucket````: The name of the S3 bucket containing the data to be loaded
 * ````s3_folder````: The name of the S3 folder containing the data to be loaded
 * ````loading_mode````: The loading mode to be used
@@ -157,6 +158,11 @@ All of command line arguments can be specified in the configuration file. If an
     * Command : ````--no-parents````
     * Not Required
     * Default Value : ````false````
+* **Enable Split Transactions Mode**
+    * Creates a separate database transaction for each file while loading
+    * Command : ````--split-transactions````
+    * Not Required
+    * Default Value : ````false````
 * **Dataset Directory**
     * The directory containing the data to be loaded, a temporary directory if loading from an S3 bucket
     * Command : ````--dataset <dir>````
13 changes: 11 additions & 2 deletions loader.py
@@ -49,7 +49,8 @@ def parse_arguments():
                         default=UPSERT_MODE)
     parser.add_argument('--dataset', help='Dataset directory')
     parser.add_argument('--no-parents', help='Does not save parent IDs in children', action='store_true')
-
+    parser.add_argument('--split-transactions', help='Creates a separate transaction for each file',
+                        action='store_true')
     return parser.parse_args()
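
The new flag uses argparse's `store_true` action, so it takes no value and is `False` unless it is passed. A minimal sketch of that behaviour, assuming a stripped-down parser (`build_parser` is a hypothetical helper, not a function in `loader.py`):

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the loader's parser; only the new flag is modelled.
    parser = argparse.ArgumentParser(description='Sketch of the --split-transactions flag')
    parser.add_argument('--split-transactions', help='Creates a separate transaction for each file',
                        action='store_true')
    return parser


if __name__ == '__main__':
    parser = build_parser()
    # argparse maps the dashes in the flag name to underscores on the namespace.
    print(parser.parse_args([]).split_transactions)
    print(parser.parse_args(['--split-transactions']).split_transactions)
```

Because the attribute is always present on the parsed namespace, the later `if args.split_transactions:` check can be read as "was the flag passed on the command line".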


@@ -93,10 +94,16 @@ def process_arguments(args, log):
         sys.exit(1)
 
     # Conditionally Required Fields
+    if args.split_transactions:
+        config.split_transactions = args.split_transactions
     if args.no_backup:
         config.no_backup = args.no_backup
     if args.backup_folder:
         config.backup_folder = args.backup_folder
+    if config.split_transactions and config.no_backup:
+        log.error('--split-transactions and --no-backup cannot both be enabled, a backup is required when running'
+                  ' in split transactions mode')
+        sys.exit(1)
     if not config.backup_folder and not config.no_backup:
         log.error('Backup folder not specified! A backup folder is required unless the --no-backup argument is used')
         sys.exit(1)
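
The override-then-validate order in this hunk can be exercised in isolation. The sketch below is an illustration only: `resolve_backup_options` is a hypothetical name, `SimpleNamespace` objects stand in for the real `args` and `config`, and it returns an error string instead of calling `log.error` and `sys.exit(1)` so the outcome is easy to inspect.

```python
from types import SimpleNamespace


def resolve_backup_options(args, config):
    """Apply CLI overrides to config, then return an error message or None.

    Hypothetical helper mirroring the order of checks in the hunk above.
    """
    # A CLI value only overrides the config file when it is truthy.
    if args.split_transactions:
        config.split_transactions = args.split_transactions
    if args.no_backup:
        config.no_backup = args.no_backup
    if args.backup_folder:
        config.backup_folder = args.backup_folder
    # Validation runs on the merged config, so a conflict is caught
    # whether each option came from the file or the command line.
    if config.split_transactions and config.no_backup:
        return '--split-transactions and --no-backup cannot both be enabled'
    if not config.backup_folder and not config.no_backup:
        return 'Backup folder not specified'
    return None


if __name__ == '__main__':
    args = SimpleNamespace(split_transactions=True, no_backup=False, backup_folder=None)
    config = SimpleNamespace(split_transactions=False, no_backup=True, backup_folder=None)
    print(resolve_backup_options(args, config))
```

One consequence of the truthy-only override is that the command line can switch `split_transactions` on but cannot switch it off when the configuration file already sets it to `true`.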
@@ -163,6 +170,8 @@ def process_arguments(args, log):
     if args.no_parents:
         config.no_parents = args.no_parents
 
+
+
     return config


@@ -262,7 +271,7 @@ def main():
     loader = DataLoader(driver, schema, visit_creator)
 
     loader.load(file_list, config.cheat_mode, config.dry_run, config.loading_mode, config.wipe_db,
-                config.max_violations, config.no_parents)
+                config.max_violations, config.no_parents, split=config.split_transactions)
 
     if driver:
         driver.close()
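
This commit only passes `split` through to `DataLoader.load`; the loader's internals are not part of the diff. As a rough illustration of what one transaction per file means in practice, the sketch below uses a hypothetical in-memory `FakeTransaction` and a simplified `load` function. Neither is the project's code, and no real database driver is involved.

```python
class FakeTransaction:
    """In-memory stand-in for a database transaction (hypothetical)."""

    def __init__(self, store):
        self.store = store
        self.pending = []

    def write(self, row):
        self.pending.append(row)

    def commit(self):
        self.store.extend(self.pending)
        self.pending = []

    def rollback(self):
        self.pending = []


def load(files, store, split=False):
    """Load (name, rows) pairs; return True on success, False on the first failure."""
    tx = None
    try:
        for name, rows in files:
            # Split mode opens a fresh transaction per file;
            # otherwise a single transaction spans every file.
            if split or tx is None:
                tx = FakeTransaction(store)
            for row in rows:
                if row is None:  # a None row simulates a load error
                    raise ValueError(f'bad row in {name}')
                tx.write(row)
            if split:
                tx.commit()
        if not split and tx is not None:
            tx.commit()
        return True
    except ValueError:
        if tx is not None:
            tx.rollback()
        return False


if __name__ == '__main__':
    files = [('a.tsv', [1, 2]), ('b.tsv', [3, None])]
    single, per_file = [], []
    load(files, single, split=False)
    load(files, per_file, split=True)
    print(single, per_file)
```

In this sketch a failure in the second file rolls back everything in single-transaction mode, while split mode keeps the first file's rows. That partial-load behaviour is presumably why the commit refuses to combine `--split-transactions` with `--no-backup`.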