Use bzip2 utility to compress files in Linux

In this tutorial, we will learn about another Linux/Unix command-line tool, bzip2. It is used to compress and decompress files using the Burrows-Wheeler block-sorting text compression algorithm and Huffman coding. The command-line options are deliberately very similar to those of GNU gzip, but they are not identical.

How to install the bzip2 tool?

Debian based - apt install bzip2
Raspbian - apt-get install bzip2
Alpine - apk add bzip2
Arch Linux - pacman -S bzip2
CentOS - yum install bzip2
Fedora - dnf install bzip2
OS X - brew install bzip2
Docker - docker run cmd.cat/bzip2 bzip2

More about bzip2 command

bzip2 expects a list of file names to accompany the command-line flags/options. Each file is replaced by a compressed version of itself, with the same name plus a .bz2 suffix, i.e., "original_name.bz2". Each compressed file has the same modification date, permissions, and, when possible, ownership as the corresponding original, so that these properties can be correctly restored at decompression time. File name handling is naive in the sense that there is no mechanism for preserving original file names, permissions, ownerships or dates in file systems which lack these concepts, or have serious file name length restrictions, such as MS-DOS systems.
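
The attribute handling described above can be checked with a small experiment. This is only a sketch: the file name data.txt, the mode 640 and the timestamp are arbitrary choices for illustration, and -k (explained in the options list further down) is used so the original stays around for comparison.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'some data\n' > data.txt
chmod 640 data.txt                 # arbitrary, easy-to-spot permissions
touch -t 202001010000 data.txt     # arbitrary, fixed modification time

# -k keeps data.txt so both files can be listed side by side.
bzip2 -k data.txt

# Both lines should show the same permission bits and the same date.
ls -l data.txt data.txt.bz2
```

Ownership is not checked here, since it can only be carried over when the user running bzip2 is allowed to change it.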

Similarly, bzip2 -d decompresses all specified files. Files which were not created by bzip2 will be detected (based on their magic header bytes) and ignored, and a warning issued. bzip2 attempts to guess the filename for the decompressed file from that of the compressed file as follows:

filename.bz2 -> filename
filename.bz -> filename
filename.tbz2 -> filename.tar
filename.tbz -> filename.tar
anyothername -> anyothername.out


Notice that if the provided compressed file does not end in one of the recognized extensions (.bz2, .bz, .tbz2, .tbz), then the decompressed file will get ".out" appended to the original name.
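
The naming rules above can be tried out as follows. This is a minimal sketch with made-up file names (notes.txt, archive.tbz2, mystery); note that archive.tbz2 here is just compressed text given a .tbz2 name to show the renaming, not a real tar archive.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'sample text\n' > notes.txt

# Write the same compressed data under a recognized suffix and under a bare name.
bzip2 -c notes.txt > archive.tbz2
bzip2 -c notes.txt > mystery

# Recognized suffix: archive.tbz2 is replaced by archive.tar.
bzip2 -d archive.tbz2

# Unrecognized name: bzip2 prints a warning and writes mystery.out instead.
bzip2 -d mystery

# Lists archive.tar, mystery.out and notes.txt.
ls
```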

Syntax of bzip2 command:

bzip2 [ -cdfkqstvzVL123456789 ] [ filenames ...  ]
bzip2 [ -h|--help ]

Options of bzip2 command:

-c, --stdout You can compress or decompress files to the standard output by using this flag.
-d, --decompress Force decompression.
-z, --compress The complement to -d: forces compression.
-t, --test Check integrity of the specified file(s), but don't decompress them. This really performs a trial decompression and throws away the result.
-f, --force Force overwrite of output files. Normally, bzip2 will not overwrite existing output files. Also forces bzip2 to break hard links to files, which it otherwise wouldn't do. bzip2 normally declines to decompress files which don't have the correct magic header bytes. If forced (-f), however, it will pass such files through unmodified.
-k, --keep Keep (don't delete) input files during compression or decompression.
-s, --small Reduce memory usage, for compression, decompression and testing. Files are decompressed and tested using a modified algorithm which only requires 2.5 bytes per block byte. This means any file can be decompressed in 2300 k of memory, albeit at about half the normal speed. During compression, -s selects a block size of 200 k, which limits memory use to around the same figure, at the expense of your compression ratio. In short, if your machine is low on memory (8 megabytes or less), use -s for everything.
-q, --quiet Suppress non-essential warning messages. Messages pertaining to I/O errors and other critical events will not be suppressed.
-1 (or --fast) to -9 (or --best) Set the block size to 100 k, 200 k ... 900 k when compressing. Has no effect when decompressing. In particular, --fast doesn't make things significantly faster. And --best merely selects the default behavior.
-- Treats all subsequent arguments as file names, even if they start with a dash. This is so you can handle files with names beginning with a dash, for example: bzip2 -- -myfilename.
-v, --verbose Verbose mode - show the compression ratio for each file processed. Further -v's increase the verbosity level, spewing out lots of information which is primarily of interest for diagnostic purposes.
-L, --license, -V, --version Display the software version, license terms and conditions.
-h, --help Display the help message and exit.
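
A few of the options above in action. This is a rough sketch rather than a complete tour; the file names report.txt and -dashfile are invented for the example.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'line one\nline two\n' > report.txt

# -k keeps the input, so report.txt and report.txt.bz2 both exist afterwards.
bzip2 -k report.txt

# -t runs a trial decompression; exit status 0 means the file is intact.
bzip2 -t report.txt.bz2 && echo "report.txt.bz2 passed the integrity test"

# Without -f, bzip2 will not overwrite the existing report.txt.bz2.
bzip2 -k report.txt || echo "refused to overwrite without -f"

# With -f, the existing output file is replaced.
bzip2 -kf report.txt

# "--" ends option parsing, so a name starting with a dash is taken as a file.
printf 'dash\n' > ./-dashfile
bzip2 -- -dashfile
```

Combining -k with -f, as in the fourth command, is a convenient way to refresh a compressed copy while leaving the source file in place.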

Using bzip2 command in Linux

bzip2 command example

We will use this file for further examples:

$ echo -e "This is a file containing plain text. \nWe will use this text file for bzip2commands explanation" > foo
$
$ cat foo
This is a file containing plain text.
We will use this text file for bzip2commands explanation

1. Compress data

One of the most straightforward use cases of the bzip2 command is to compress the provided data and write it to a file. Let’s take a look at the ways to do so:
Read data from a file and replace that file with a compressed version that has .bz2 appended to its name, i.e., the file named foo becomes foo.bz2 after compression.

$ ls
foo
$ bzip2 foo
$ ls
foo.bz2

Read data from standard input (stdin) and compress it. If the output is not redirected, bzip2 refuses to write the compressed data to the terminal:

$ echo -e "This is a file containing plain text. \nWe will use this text file for bzip2 commands explanation" | bzip2
bzip2: I won't write compressed data to a terminal.
bzip2: For help, type: `bzip2 --help'.


If no file names are specified, bzip2 compresses from standard input to standard output. It declines to write compressed output to a terminal, as this would be entirely incomprehensible and therefore pointless, so redirect the output to a file:

$ echo -e "This is a file containing plain text. \nWe will use this text file for bzip2commands explanation" | bzip2 > bar
$ ls
bar  foo
$ cat -A bar
BZh91AY&SYM-QM-^KM-^?V^@^@^IM-SM-^@^@^P@^A^DM-^@?M-gM-^M-P ^@@M-SM"OSM-TM-rM-^^M-!M-dOBM-^@^@^C&F^]g9M-2M-sM-$M-0hM-?^YTV^QM-LM-aM-^J^\^&^KJM-kM-hM-jM-R1M->4+M-^AM-K_M-M-t^D?.lM-tM-\M-^WM-r7^CM-\M-ZM-^DiM-7M-XM-;M-^R)M-BM-^DM-^FM-^L_M-zM-0
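
The first four bytes of the output above, BZh9, are the stream header: the magic bytes BZh followed by a digit that records the block size chosen with -1 to -9 (9 is the default). A small sketch to see this, using an arbitrary input string and the hypothetical variable names fast_header and best_header:

```shell
# Compress the same tiny input at the smallest and largest block size
# and keep only the first four bytes of each compressed stream.
fast_header=$(printf 'hello\n' | bzip2 -1 | head -c 4)
best_header=$(printf 'hello\n' | bzip2 -9 | head -c 4)

# Prints: BZh1 BZh9
echo "$fast_header $best_header"
```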

Use the -zc (or -z -c) options to read data from a file, compress it, and write the result to standard output. The input file is left in place:

$ bzip2 -zc foo > foo.bz2
$ ls
foo  foo.bz2

2. Decompress data

Read data from a compressed file, write the decompressed data to a file with the .bz2 extension removed, and delete the compressed file, i.e., the file named foo.bz2 becomes foo after decompression.

$ ls
foo.bz2
$ bzip2 -d foo.bz2
$ ls
foo

Read data from standard input (stdin) and decompress it. If no file names are specified, bzip2 -d decompresses from standard input to standard output. When standard input is a terminal rather than a pipe or a file, bzip2 refuses to read from it:

$ bzip2 -d
bzip2: I won't read compressed data from a terminal.
bzip2: For help, type: `bzip2 --help'.

Instead, pipe (or redirect) the compressed data into bzip2 -d, and redirect the decompressed output to a file if you want to keep it. Here we decompress the file named bar created earlier:

$ cat bar | bzip2 -d > foo
$ ls
bar  foo
$ cat -A foo
This is a file containing plain text. $
We will use this text file for bzip2commands explanation$

Use the -dc (or -d -c) options to read data from a file, decompress it, and write the result to standard output. The compressed file is left in place:

$ bzip2 -dc foo.bz2 > foo
$ ls
foo  foo.bz2
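
To confirm that compression followed by decompression gives back exactly the original data, the two -c forms can be combined with cmp. This is a minimal sketch; the name foo.restored is chosen here only so the original foo is not overwritten.

```shell
# Work in a throwaway directory so nothing else is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'This is a file containing plain text.\n' > foo

# Compress to a separate file, then decompress to a new name.
bzip2 -zc foo > foo.bz2
bzip2 -dc foo.bz2 > foo.restored

# cmp -s is silent and exits 0 only when the two files are byte-identical.
if cmp -s foo foo.restored; then
    echo "round trip OK"
else
    echo "round trip FAILED" >&2
fi
```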

Conclusion

In this article, we explored the bzip2 command, which is used to compress and decompress data. Compression is generally considerably better than that achieved by more conventional LZ77/LZ78-based compressors, and approaches the performance of the PPM family of statistical compressors. A compressed file usually takes less storage space than the original did. Note that bzip2 compresses files one at a time and does not combine several files into one archive. Compared with gzip, it uses more memory and is slower.



About the author:
Pradeep has expertise in Linux, Go, Nginx, Apache, CyberSecurity, AppSec and various other technical areas. He has contributed to numerous publications and websites, providing his readers with insightful and informative content.