Skip to content

steve-chavez/pg_bzip

Folders and files

NameName
Last commit message
Last commit date

Latest commit

9ab680d · Dec 17, 2023

History

5 Commits
Dec 15, 2023
Dec 17, 2023
Dec 15, 2023
Dec 17, 2023
Dec 17, 2023
Dec 15, 2023
Dec 15, 2023
Dec 15, 2023
Dec 17, 2023
Dec 15, 2023
Dec 15, 2023

Repository files navigation

pg_bzip

Motivation

If you get data compressed as bzip2, whether through HTTP or from a file, it's convenient to decompress it in SQL. pg_bzip does that, it provides functions to decompress and compress data using bzip2.

Functions

  • bzcat(data bytea) returns bytea

    This function mimics the bzcat command, which decompresses data using bzip2.

    For this example, we'll use the native pg_read_binary_file to read from a file.

    select convert_from(bzcat(pg_read_binary_file('/path/to/all_movies.csv.bz2')), 'utf8') as contents;
    
                                                                      contents
    --------------------------------------------------------------------------------------------------------------------------------------------
     "id","name","parent_id","date"                                                                                                            +
     "2","Ariel","8384","1988-10-21"                                                                                                           +
     "3","Varjoja paratiisissa","8384","1986-10-17"                                                                                            +
     "4","État de siège",\N,"1972-12-30"                                                                                                       +
     "5","Four Rooms",\N,"1995-12-22"                                                                                                          +
     "6","Judgment Night",\N,"1993-10-15"                                                                                                      +
     "8","Megacities - Life in Loops",\N,"2006-01-01"                                                                                          +
     "9","Sonntag, im August",\N,"2004-09-22"                                                                                                  +
     "11","Star Wars: Episode IV – A New Hope","10","1977-05-25"                                                                               +
     "12","Finding Nemo","112246","2003-05-30"                                                                                                 +
     ...
     ....
     .....
  • bzip2(data bytea, compression_level int default 9) returns bytea

    This function is a simplified version of the bzip2 command. It compresses data using bzip2.

    For this example we'll use fio_writefile from pgsql-fio, which offers a convenient way to write a file from SQL.

    select fio_writefile('/path/to/my_text.bz2', bzip2(repeat('my text to be compressed', 1000)::bytea)) as writesize;
    
     writesize
    -----------
           109

Installation

bzip2 is required. Under Debian/Ubuntu you can get it with

sudo apt install libbz2-dev

Then on this repo

make && make install

Now on SQL you can do:

CREATE EXTENSION bzip;

pg_bzip is tested to work on PostgreSQL >= 12.

Development

Nix is used to get an isolated and reproducible enviroment with multiple postgres versions.

# enter the Nix environment
$ nix-shell

# to run the tests
$ with-pg-16 make installcheck

# to interact with the isolated pg
$ with-pg-16 psql

# you can choose the pg version
$ with-pg-15 psql