Replace a string across all files in an S3 bucket

DallasMorningNews/s3replace

s3replace

Sometimes we have to replace the same thing in lots of files in an S3 bucket - for example, when the paywall code changes or when we switch commenting vendors. This repo is our automated solution.

It uses the AWS API to roll through all of the objects in a bucket:

  1. It filters the object keys with a regular expression and downloads every object whose key matches.
  2. Of those objects that match, it uses another regular expression to find the relevant code to replace.
  3. If an object's content matches, you're shown a preview and asked to confirm before anything is changed.
  4. It replaces the code, copying metadata such as the ContentType, ContentDisposition and other key fields. A backup of the file is saved locally, just in case.
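
As a rough illustration of those four steps, here is a minimal sketch written against a boto3 S3 client. It is not the code in s3replace/main.py: the function names, the backups directory, the confirm callback and the exact list of copied metadata fields are assumptions, and it assumes every object whose key matches is UTF-8 text.

```python
import difflib
import re
from pathlib import Path

# Assumed list: fields that get_object returns and put_object accepts under the same names.
COPIED_FIELDS = ("ContentType", "ContentDisposition", "ContentEncoding",
                 "ContentLanguage", "CacheControl", "Metadata")


def iter_matching_keys(s3, bucket, key_pattern):
    """Step 1: page through every object and yield the keys matching key_pattern."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            if re.search(key_pattern, obj["Key"]):
                yield obj["Key"]


def preview(key, old, new):
    """Step 3: render the proposed change as a unified diff."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True), new.splitlines(keepends=True),
        fromfile=key, tofile=key))


def replace_in_bucket(s3, bucket, key_pattern, needle_pattern, replace_with,
                      backup_dir="backups", confirm=None, dry_run=False):
    """Return the keys that were (or, with dry_run, would be) rewritten."""
    if confirm is None:
        # Default: print the diff and ask on the terminal.
        confirm = lambda key, diff: input(diff + "\nReplace? [y/N] ").lower() == "y"
    changed = []
    for key in iter_matching_keys(s3, bucket, key_pattern):
        response = s3.get_object(Bucket=bucket, Key=key)  # the slow part: a download
        old = response["Body"].read().decode("utf-8")
        if not re.search(needle_pattern, old):  # step 2: is the needle in here?
            continue
        new = re.sub(needle_pattern, replace_with, old)
        if dry_run:
            print(preview(key, old, new))
            changed.append(key)
            continue
        if not confirm(key, preview(key, old, new)):
            continue
        # Step 4: keep a local backup, then upload with the original metadata.
        backup = Path(backup_dir) / key.lstrip("/")
        backup.parent.mkdir(parents=True, exist_ok=True)
        backup.write_text(old, encoding="utf-8")
        extra = {field: response[field] for field in COPIED_FIELDS if response.get(field)}
        s3.put_object(Bucket=bucket, Key=key, Body=new.encode("utf-8"), **extra)
        changed.append(key)
    return changed
```

In practice you would pass boto3.client("s3") as s3. The confirm parameter only exists so the interactive prompt can be swapped out, for example in tests.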

Requirements

  • Python 3 - brew install python
  • pipenv - brew install pipenv

Installation and setup

  1. Clone this repo
  2. Install the requirements with pipenv: pipenv install

Usage

Configuration

In s3replace/main.py:

  • update needle_pattern at the top of the file. It's passed to re.search to find matching documents, and the text it matches is what re.sub replaces.
  • set replace_with, also at the top of the file, to the text that should replace each needle_pattern match.
  • update key_pattern to match the keys you want to search with needle_pattern. The more specific it is, the better: objects whose keys don't match it aren't downloaded, and downloading is the slowest part of the process.
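
To make the roles of the three settings concrete, here is a hypothetical configuration for a commenting-vendor swap. The values and the helpers would_download and rewrite are illustrative only, and the sketch assumes the settings are plain strings rather than compiled patterns.

```python
import re

# Hypothetical values; replace them with the markup you actually need to change.
key_pattern = r"^news/.*\.html$"  # only these keys get downloaded
needle_pattern = r'<script src="https://old-vendor\.example/embed\.js"></script>'
replace_with = '<script src="https://new-vendor.example/embed.js"></script>'


def would_download(key):
    """True when key_pattern selects this key for downloading."""
    return re.search(key_pattern, key) is not None


def rewrite(content):
    """Return the rewritten content, or None when needle_pattern isn't present."""
    if not re.search(needle_pattern, content):
        return None
    return re.sub(needle_pattern, replace_with, content)
```

Because replace_with goes through re.sub, it's a replacement template rather than literal text: a backslash sequence such as \1 refers to a group captured by needle_pattern, so literal backslashes need escaping.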

Running

This runs as a command line tool. See all the options by running python s3replace --help:

$ python s3replace --help

Find and replace for an S3 bucket.

Usage:
  s3replace <bucket> [--dry-run] [--access-key-id=<key>] [--secret-access-key=<key>]
  s3replace -h | --help
  s3replace --version

Options:
  -h --help                 Show this screen.
  --version                 Show version.
  --dry-run                 Don't replace.
  --access-key-id=<key>     AWS access key ID
  --secret-access-key=<key> AWS secret access key
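
The help text above follows docopt's usage-string conventions; whether the tool actually parses it with docopt isn't stated here. For readers more familiar with the standard library, the same interface can be sketched with argparse. This is a swapped-in equivalent, not the project's parser, and the version string is a placeholder because the real version isn't given.

```python
import argparse

VERSION = "unknown"  # placeholder: the actual s3replace version isn't documented


def build_parser():
    """An argparse equivalent of the documented s3replace options."""
    parser = argparse.ArgumentParser(
        prog="s3replace", description="Find and replace for an S3 bucket.")
    parser.add_argument("bucket", help="Name of the S3 bucket to search")
    parser.add_argument("--dry-run", action="store_true", help="Don't replace.")
    parser.add_argument("--access-key-id", metavar="<key>", help="AWS access key ID")
    parser.add_argument("--secret-access-key", metavar="<key>", help="AWS secret access key")
    parser.add_argument("--version", action="version", version="%(prog)s " + VERSION)
    return parser
```

argparse turns the dashed flags into attributes such as args.dry_run and args.access_key_id, and supplies -h/--help automatically.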

Basic usage only requires a bucket name and credentials:

$ python s3replace <bucket> --access-key-id=<yourid> --secret-access-key=<yourkey>

You can pass your AWS credentials using the flags, as above, or you can provide them using any of the other methods supported by boto3.
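
One way to honor both options is to hand boto3 explicit keys only when the flags are given and otherwise let its default credential chain (environment variables, the shared credentials file, instance roles and so on) take over. The helper names below are hypothetical, and rejecting a lone key with ValueError is a design choice of this sketch rather than documented behavior.

```python
def s3_client_kwargs(access_key_id=None, secret_access_key=None):
    """Keyword arguments for boto3.client("s3", ...).

    Returns explicit credentials only when both flags were given; an empty
    dict leaves boto3 to find credentials through its usual lookup chain.
    """
    if access_key_id and secret_access_key:
        return {"aws_access_key_id": access_key_id,
                "aws_secret_access_key": secret_access_key}
    if access_key_id or secret_access_key:
        raise ValueError("pass both --access-key-id and --secret-access-key, or neither")
    return {}


def make_s3_client(access_key_id=None, secret_access_key=None):
    """Create the S3 client; boto3 is imported here so s3_client_kwargs needs no dependencies."""
    import boto3
    return boto3.client("s3", **s3_client_kwargs(access_key_id, secret_access_key))
```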

Copyright

© 2018 The Dallas Morning News
