Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change postgres column types to use less space #269

Closed
lyricnz opened this issue Sep 21, 2023 · 5 comments
Closed

Change postgres column types to use less space #269

lyricnz opened this issue Sep 21, 2023 · 5 comments
Labels
enhancement New feature or request

Comments

@lyricnz
Copy link
Contributor

lyricnz commented Sep 21, 2023

Following on from the work in #255
and related to (but maybe obsoleted by?) the work in #268
we might be able to save a little space by using shorter columns, and/or a numeric.

Current schema is

  gnaf_pid text NOT NULL,
  address text NOT NULL,
  locality_name text NOT NULL,
  postcode text NULL,
  state text NOT NULL,
  latitude numeric(10,8) NOT NULL,
  longitude numeric(11,8) NOT NULL

However:

  • gnaf_pid is always 14 characters
  • the longest address is 61 characters
  • the longest locality_name is 37 characters
  • postcode 4 digits
  • state is 2-3 characters (from small set)
  • the index (locality_name + state) is >100MB
@lyricnz
Copy link
Contributor Author

lyricnz commented Sep 21, 2023

I think postgres is fairly efficient - the storage of a text field is only the number of characters+1. So we'd save 1 byte per row by changing gnaf_pid, and 3 bytes per row changing postcode to a shortint.

At 14.7m rows, this would save ~57MB on data

@lyricnz
Copy link
Contributor Author

lyricnz commented Sep 22, 2023

Postgres doesn't have a one-byte integer (to use for a State enum) so changing the state to an int would only save 1-2 bytes. Probably not worth the hassle and extra code. Changing it to a CHAR(3) would do the same.

Edit: could use a CHAR type (a single character)

@lyricnz
Copy link
Contributor Author

lyricnz commented Sep 22, 2023

Updating gnaf_pid to CHAR(14) and postcode to SMALLINT:

❯ docker images
REPOSITORY                     TAG       IMAGE ID       CREATED          SIZE
<none>                         <none>    86ed372521d9   39 minutes ago   2.97GB
lukeprior/nbn-upgrade-map-db   latest    8a2b859be4ba   4 days ago       3.01GB

Or, more accurately 2972585117 vs 3014560925 (1.4% saving)

@lyricnz
Copy link
Contributor Author

lyricnz commented Sep 22, 2023

FWIW it seems (experimentally) that using CHAR(n) columns types actually consumes n+1 bytes of storage, which is the same as TEXT, so no point in changing gnaf_pid to CHAR(14). Ditto for state as CHAR(3).

@lyricnz
Copy link
Contributor Author

lyricnz commented Sep 22, 2023

We could save a couple of bytes per row using a CHAR for state and mapping it to/from client side, but this doesn't seem worth the bother.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants