compression
- Compressing & Archiving Files
Difference between compression and archiving:
Kind | Behaviour |
---|---|
Archiving only | Multiple files => one file (no compression) |
Compression only | Multiple files => multiple compressed files |
Archiving & Compression | Multiple files => one compressed file |
=> The most widespread tools are listed on the Arch Linux Wiki
Before compressing large amounts of data, it is worth doing a dry run with `cp`/`rsync` to check whether invalid filenames or similar problems turn up. The time is well spent, since compression may take hours or days on a standard PC.
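Such a dry run might look like the following. This is only a minimal sketch: the scratch directories and file names are placeholders created for the demonstration, and it assumes `rsync` is installed, falling back to a real `cp -a` into a scratch directory otherwise.

```shell
# Scratch source tree standing in for the data to be archived (hypothetical names)
src=$(mktemp -d)
dst=$(mktemp -d)
printf 'hello\n' > "$src/file01.txt"
printf 'world\n' > "$src/file02.txt"

if command -v rsync >/dev/null 2>&1; then
  # -a: archive mode, -n: dry run (nothing is written), -i: itemize what would be copied
  rsync -ani "$src/" "$dst/"
  dry_run_status=$?
else
  # cp has no dry-run mode, so this really copies into the scratch directory
  cp -a "$src/." "$dst/"
  dry_run_status=$?
fi

# A non-zero status means at least one file could not be read or created
echo "dry run exit status: $dry_run_status"
```

For real data, replace `$src` with the folder to be archived and read through the reported errors before starting the long-running compression.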
#            vv-- globbing
tar cf - file0* | xz -T 0 -4 -vv - > test-new.tar.xz
# `-` = stdout for tar / stdin for xz; no file is named because the data goes through a pipe
# (see also: https://unix.stackexchange.com/a/41829/116710)
# Is equivalent to:
#                          v-- use xz (J: xz, j: bzip2)
XZ_OPT="-T 0 -4 -vv" tar cfJ test-new.tar.xz file0*
# ^^^^       ^^--- compression preset/level
# ||||
# xz argument env var (for simple args you can write: XZ_OPT=-e9)
- `-T`: number of worker threads (`0`: use as many as there are CPU cores)
- `-[0...9]`: compression preset/level (`0`: fastest; `9`: slowest, best compression; `xz` scales this setting down automatically if it would exceed the memory limit)
- `-v`: verbose (shows (un-)compressed sizes, compression ratio)
- `-vv`: more verbose (additionally shows memory required for (de-)compression and threads)
- `-e`: extreme (trades CPU time for a better compression ratio; decompression memory does not increase)
- `-M`/`--memory`: limit memory usage, e.g. `-M 70%`, `-M 800MiB`, ... (`-M 0` restores the default, which means no limit in current `xz` releases)
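The options above can be combined in a single call. The following is a small sketch, assuming `xz` and `seq` are available; the scratch file `numbers.txt` is a made-up input used only to keep the example self-contained.

```shell
# Scratch input file (hypothetical name) so the example is self-contained
work=$(mktemp -d)
seq 1 200000 > "$work/numbers.txt"

# -k keeps the input file, -T 0 uses all cores, -6 is the preset,
# -M 70% caps memory usage at 70% of the installed RAM
xz -k -T 0 -6 -M 70% "$work/numbers.txt"

# -t tests the integrity of the compressed file without writing any output
xz -t "$work/numbers.txt.xz" && echo "archive OK"

# -l lists compressed/uncompressed sizes and the ratio
xz -l "$work/numbers.txt.xz"
```

Without `-k`, `xz` removes the input file after compressing it successfully, which is usually not what you want while experimenting with presets.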
# Auto-detect compression
tar xf test.tar.xz
# Explicitly state xz archive
tar xfJ test.tar.xz
# Decompress into PREEXISTING folder
tar xf test.tar.xz -C test-decomp
xz -l test.tar.xz
Strms  Blocks   Compressed Uncompressed  Ratio  Check   Filename
    1       1    690.9 KiB    940.0 KiB  0.735  CRC64   test.tar.xz
- `xz` dict sizes and compression levels/presets are comparable to those of LZMA2 (see below)
- `xz` uses LZMA2
- `tar`: if you state arguments via the dash syntax, make sure `f` is the last option (`-cfJ` does not work, `cfJ`/`-cJf` works)
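The notes on option order and on the target folder can be tried out with a short round trip. This is a sketch under assumptions: the file names are placeholders, and `tar` needs the `xz` binary on the system to handle `.tar.xz`.

```shell
# Scratch files (hypothetical names) to archive
work=$(mktemp -d)
printf 'a\n' > "$work/file01.txt"
printf 'b\n' > "$work/file02.txt"

# Dash syntax: f comes last because the next argument is taken as the archive name
tar -cJf "$work/test.tar.xz" -C "$work" file01.txt file02.txt

# List the archive contents without extracting
tar -tf "$work/test.tar.xz"

# -C does not create the target directory, so create it first
mkdir -p "$work/test-decomp"
tar -xf "$work/test.tar.xz" -C "$work/test-decomp"
```

With `-cfJ`, `tar` would treat `J` as the archive name, which is why that ordering fails.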
`lz4` is interesting because it offers a good balance between compression ratio and (de-)compression speed.
tar cf - /input/path/ | lz4 -6 - /output/archive.tar.lz4
#                           ^^-- compression level [1; 12]; I only recommend levels 1 to ~6 (higher levels are slow for the compression ratio they achieve)
Personally, I had problems with `7zip` for very large folders (> 500 GiB). I moved to `tar` + `lz4` for such use cases.
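For completeness, the reverse direction (unpacking a `.tar.lz4` archive) could look like this. It is a minimal sketch that assumes the `lz4` command line tool is installed; the directory and file names are invented for the demonstration.

```shell
# Scratch input and output folders (hypothetical names)
work=$(mktemp -d)
mkdir -p "$work/input" "$work/restored"
printf 'payload\n' > "$work/input/data.txt"

if command -v lz4 >/dev/null 2>&1; then
  # Compress: tar writes to stdout, lz4 reads stdin (-) and writes the named output file
  tar cf - -C "$work" input | lz4 -6 - "$work/archive.tar.lz4"

  # Decompress: -d decompresses, -c writes to stdout, tar reads the stream from stdin
  lz4 -dc "$work/archive.tar.lz4" | tar xf - -C "$work/restored"
else
  echo "lz4 is not installed; skipping" >&2
fi
```

Because both directions stream through a pipe, no intermediate `.tar` file is written to disk, which matters for very large folders.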
# Add files to archive
#                                files to compress (globbing is allowed)
#                                vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
7z a -m0=lzma2 -mx=7 backups.7z file1.txt aFolder file1.csv blahFiles*
#  ^     ^^^^^     ^ ^^^^^^^^^^
#  |     |         | archive name
#  |     |         compression level
#  |     compression method
#  add
7z l ./backups.7z
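Besides adding and listing, testing and extracting an archive follow the same command pattern. The sketch below assumes the binary is called `7z` (some distributions ship it as `7za` or `7zz`); the file and folder names are placeholders.

```shell
# Scratch file (hypothetical name) to put into the archive
work=$(mktemp -d)
printf 'x,y\n1,2\n' > "$work/file1.csv"

if command -v 7z >/dev/null 2>&1; then
  (
    cd "$work" || exit 1
    # a: add files to the archive (the archive is created if it does not exist)
    7z a -m0=lzma2 -mx=7 backups.7z file1.csv
    # t: test the integrity of the archive
    7z t backups.7z
    # x: extract with full paths; -o sets the output directory (no space after -o)
    7z x backups.7z -orestored
  )
else
  echo "7z is not installed; skipping" >&2
fi
```

Running `7z t` right after creating a large archive is a cheap way to catch a corrupted backup before the original data is deleted.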
https://quixdb.github.io/squash-benchmark/#results-table
- LZMA
- LZMA2
- PPMd
- BZip2
- Deflate
- Delta
- BCJ
- BCJ2
- Copy
Level | Meaning | Dict size |
---|---|---|
0 | Copy | -- |
1 | Fastest | 64 KB |
3 | Fast | 1 MB |
5 | Normal | 16 MB |
7 | Maximum | 32 MB |
9 | Ultra | 64 MB |
Format | 7z / 7zip | xz | OS |
---|---|---|---|
.tar.xz | r/-/m | r/w/m | Windows / Linux |
.7z | r/w/- | r/-/- | Windows / Linux (no metadata) |
=> `r/w/m`: read/write/modify
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License *.
Code (snippets) are licensed under a MIT License *.
* Unless stated otherwise