Compression and Archiving on CP/M

CP/M can handle many different compression and archive formats, which mattered because of the limited capacity of floppy disks and the cost of uploading and downloading files on BBSs. Each has its pros and cons, and this article explores some of the most common ones and where you can find programs on the Walnut Creek CD to handle them.

Compression Only

The first compression formats on CP/M only compressed single files and would change the middle letter of the file extension to signify that the file had been compressed.
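The renaming convention is simple enough to express in a few lines. The Python sketch below uses the markers from the list that follows ('Q' for squeezed, 'Z' for crunched, 'Y' for LZH). It assumes a three-letter extension, and `compressed_name` is a hypothetical helper, not part of any CP/M tool.

```python
def compressed_name(filename: str, marker: str) -> str:
    """Return the CP/M-style name for a compressed copy of `filename`.

    `marker` replaces the middle letter of the extension:
    'Q' = squeezed, 'Z' = crunched, 'Y' = LZH.
    """
    name, _, ext = filename.upper().partition(".")
    if len(ext) != 3:
        # Assumption: only three-letter extensions are handled here; the
        # real tools had their own rules for shorter or missing extensions.
        raise ValueError(f"expected a three-letter extension: {filename!r}")
    return f"{name}.{ext[0]}{marker}{ext[2]}"


for marker in "QZY":
    print(compressed_name("ED.COM", marker))  # ED.CQM, ED.CZM, ED.CYM
```

Because the rename overwrites the original middle letter, the original name can't be recovered from the new name alone.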

.?Q?
Squeeze was an early compression format that used Huffman encoding to compress files. These can be squeezed (compressed) with sq and unsqueezed (decompressed) with usq.
.?Z?
Crunch brought LZW compression to CP/M and these files can be handled with crunch.
.?Y?
These files, using LHA compression, were relatively uncommon. They can be handled with crlzh or, if you just want to decompress them, my favourite is uncr.
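The LZW idea behind Crunch is that the compressor builds a dictionary of byte strings as it reads the input and emits one code for each longest string it already knows. The decompressor rebuilds the same dictionary from the codes alone. The Python sketch below shows plain LZW with unbounded integer codes. It illustrates the technique only: real implementations such as Crunch pack codes into a limited number of bits and manage the dictionary when it fills, and none of that, nor Crunch's file header, is modelled here.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Plain LZW: emit a dictionary code for each longest known prefix."""
    table = {bytes([i]): i for i in range(256)}  # start with all single bytes
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate  # keep extending the match
        else:
            codes.append(table[current])
            table[candidate] = next_code  # learn the new string
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the dictionary in step with the compressor."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = table[codes[0]]
    out = bytearray(previous)
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == next_code:
            # The code refers to the entry that is being built right now.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code {code}")
        out += entry
        table[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return bytes(out)


text = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(text)
print(len(text), "bytes ->", len(codes), "codes")  # 24 bytes -> 16 codes
assert lzw_decompress(codes) == text
```

Repeated strings collapse into single codes, which is why crunching usually beats squeezing on text in the benchmarks below.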

Archive Only

.LBR
LBR was an early CP/M format that let you combine multiple files into a single archive. The files inside were often compressed first with tools such as squeeze or crunch. Because the format was so common it was well supported by other tools, such as QL, LRUN and LSWEEP, which can look inside a .LBR archive and use individual files without extracting them first. These files can be handled using nulu or, if you just want to extract files, delbr. For more information have a look at our article: Working with .LBR files on CP/M.
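Tools like these can read members in place because an .LBR starts with a directory that records where each member lives. The Python sketch below reads that directory. It assumes the layout as commonly documented for LU: 32-byte entries, each with a status byte (0 = active), an 8-byte name, a 3-byte extension, then the member's first record and its length in 128-byte records as little-endian 16-bit values. The CRC and date fields that follow are ignored. The first entry describes the directory itself. `read_lbr_directory` and `extract` are hypothetical helpers, and the byte layout should be checked against a format reference before you rely on it.

```python
import struct
from typing import NamedTuple

RECORD = 128  # CP/M record size in bytes
ENTRY = 32    # assumed size of one LBR directory entry
HEADER = struct.Struct("<B8s3sHH")  # status, name, ext, index, length


class Member(NamedTuple):
    name: str
    index: int   # first record of the member inside the .LBR
    length: int  # member length in 128-byte records


def _text(field: bytes) -> str:
    # Clear CP/M attribute (high) bits and strip trailing space padding.
    return bytes(b & 0x7F for b in field).decode("ascii").rstrip()


def read_lbr_directory(lbr: bytes) -> list[Member]:
    status, _, _, index, dir_records = HEADER.unpack_from(lbr, 0)
    if status != 0 or index != 0 or dir_records == 0:
        raise ValueError("not an LBR file: bad directory entry")
    members = []
    # Entry 0 describes the directory itself, so start at the second entry.
    for offset in range(ENTRY, dir_records * RECORD, ENTRY):
        status, name, ext, index, length = HEADER.unpack_from(lbr, offset)
        if status != 0:  # assumed: 0xFE = deleted, 0xFF = unused slot
            continue
        base, extension = _text(name), _text(ext)
        full = f"{base}.{extension}" if extension else base
        members.append(Member(full, index, length))
    return members


def extract(lbr: bytes, member: Member) -> bytes:
    """Return a member's records exactly as stored (still padded to 128 bytes)."""
    start = member.index * RECORD
    return lbr[start:start + member.length * RECORD]
```

Since each member is just a run of whole records, a tool only needs the directory to seek straight to a file, which is what makes running or viewing files from inside a library practical.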

Multiple File Archives with Compression

Later on CP/M adopted formats from other platforms, such as MS-DOS, which integrated file compression and archiving into a single format.

Compress and Decompress

.ARC/.ARK
This is the most common compressed archive format on CP/M. For each file it is asked to compress, it analyses the contents and tries to find the best compression method, such as squeezing or crunching. Archives can be created using arc and decompressed using unarc.
.LZH/LHA
Once a common format on MS-DOS, and still common on the Amiga. These can be handled using crlzh.
.PMA
This is a variant of LHA and as far as I'm aware was only used on CP/M. These files can be handled using PMarc.
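ARC's "try each method, keep the smallest" strategy can be sketched in a few lines of Python. This sketch only compares storing with a simplified run-length packer. The 0x90 repeat marker is in the spirit of ARC's packing, but the encoding is my own and is not byte-compatible with ARC. Squeezing and crunching, which ARC also tries, are left out for brevity; any function that takes and returns bytes can be added to `METHODS`.

```python
from typing import Callable

MARK = 0x90  # repeat marker, in the spirit of ARC's packing


def store(data: bytes) -> bytes:
    """Keep the file as-is."""
    return data


def pack(data: bytes) -> bytes:
    """Simplified run-length encoding (not byte-compatible with ARC).

    Runs of 3-255 identical bytes become: byte, MARK, count.
    Each literal MARK byte is escaped as: MARK, 0.
    """
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        run = 1
        while i + run < len(data) and data[i + run] == byte and run < 255:
            run += 1
        if byte == MARK:
            out += bytes([MARK, 0]) * run
        elif run >= 3:
            out += bytes([byte, MARK, run])
        else:
            out += bytes([byte]) * run
        i += run
    return bytes(out)


def unpack(data: bytes) -> bytes:
    """Reverse `pack`."""
    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte != MARK:
            out.append(byte)
            i += 1
            continue
        count = data[i + 1]
        if count == 0:
            out.append(MARK)  # escaped literal marker
        elif out:
            out += out[-1:] * (count - 1)  # repeat the previous byte
        else:
            raise ValueError("repeat marker with nothing to repeat")
        i += 2
    return bytes(out)


def choose_method(data: bytes,
                  methods: dict[str, Callable[[bytes], bytes]]) -> tuple[str, bytes]:
    """Try every method and keep the smallest output (ties go to the first)."""
    results = {name: method(data) for name, method in methods.items()}
    best = min(results, key=lambda name: len(results[name]))
    return best, results[best]


METHODS = {"stored": store, "packed": pack}
print(choose_method(b"A" * 100 + b"xyz", METHODS)[0])  # packed
print(choose_method(b"abc", METHODS)[0])               # stored
```

Listing `store` first means a file that no method can shrink is simply kept as it is.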

Decompress Only

CP/M can also decompress formats that were common on other platforms such as MS-DOS and Windows and, in the case of .ZIP, still is. They can't be created under CP/M, but being able to decompress them lets you read files created on other systems. Unfortunately, the unzip utilities I've found only handle files created with PKZIP 1.x, so they can't decompress files that use the DEFLATE algorithm introduced by Phil Katz's 1993 release of PKZIP 2.04g.
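Each member of a .ZIP records the compression method it uses, so you can check an archive on a modern machine before copying it to CP/M. The Python sketch below uses the standard `zipfile` module to list members whose method is newer than PKZIP 1.x, such as DEFLATE (method 8). The method numbers come from PKWARE's ZIP specification; which of the PKZIP 1.x methods a particular CP/M unzip actually supports is an assumption, and `members_needing_pkzip2` is a hypothetical helper name.

```python
import io
import zipfile
from typing import BinaryIO, Union

# ZIP compression method IDs used by PKZIP 1.x:
# 0 = stored, 1 = shrunk, 2-5 = reduced (factors 1-4), 6 = imploded.
# DEFLATE, introduced with PKZIP 2.x, is method 8.
PKZIP1_METHODS = {0, 1, 2, 3, 4, 5, 6}


def members_needing_pkzip2(source: Union[str, BinaryIO]) -> list[str]:
    """List members compressed with a method newer than PKZIP 1.x."""
    with zipfile.ZipFile(source) as zf:
        return [info.filename for info in zf.infolist()
                if info.compress_type not in PKZIP1_METHODS]


# Usage: build a small archive in memory with one member of each kind.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("OLD.TXT", b"stored member", compress_type=zipfile.ZIP_STORED)
    zf.writestr("NEW.TXT", b"deflated member " * 20,
                compress_type=zipfile.ZIP_DEFLATED)
print(members_needing_pkzip2(buffer))  # ['NEW.TXT']
```

An empty list suggests the archive avoids DEFLATE, though it is no guarantee that every older method is supported by the unzip you have.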

.ZIP
There are lots of .ZIP files on the Walnut Creek CD, so even though modern .ZIP files can't be decompressed under CP/M, it is still useful to be able to unzip the older ones. They can be unzipped with unzip.
.ARJ
This was pretty common at one time but got overtaken by .ZIP. To decompress use unarj.

Self-Extracting Archives

The PMarc tool mentioned above can also create self-extracting .COM files. These made it really easy to distribute multiple files, although they add extra overhead and reduce flexibility.

Benchmarks

The various compression formats produce different results. To compare them, I used some of the most common to compress two files: ED.COM and TAO.TXT. The originals appear in the first two rows of the table, followed by their various compressed versions.

Filename   Size (KB)  Size (records)  Description
ED.COM            10              73  Original binary file (CP/M Plus Editor)
TAO.TXT           27             214  Original text file
ED.CQM             8              63  Squeezed version of ED.COM
TAO.TQT           14             110  Squeezed version of TAO.TXT
ED.CZM             7              54  Crunched version of ED.COM
TAO.TZT           11              86  Crunched version of TAO.TXT
BOTH.LBR          36             288  LBR archive containing ED.COM and TAO.TXT (no compression)
BOTHS.LBR         22             174  LBR archive containing ED.CQM and TAO.TQT (squeezed)
BOTHC.LBR         18             141  LBR archive containing ED.CZM and TAO.TZT (crunched)
ED.ARK             7              56  Ark version of ED.COM (Ark crunched this file)
TAO.ARK           11              88  Ark version of TAO.TXT (Ark crunched this file)
BOTH.ARK          18             142  Ark archive containing ED.COM and TAO.TXT (Ark crunched both files)
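The table's numbers can be cross-checked with a little arithmetic. This Python sketch uses only values copied from the table and works out each compressed file's size as a percentage of its original, plus the overhead each .LBR adds over its members. The assumption that the KB column counts 1 KB allocation blocks (8 records each) is mine; it fits every row, but it depends on the disk format used.

```python
import math

# Sizes in 128-byte records, copied from the benchmark table.
ORIGINALS = {"ED.COM": 73, "TAO.TXT": 214}
VERSIONS = {  # compressed file -> (original it came from, records)
    "ED.CQM": ("ED.COM", 63), "ED.CZM": ("ED.COM", 54), "ED.ARK": ("ED.COM", 56),
    "TAO.TQT": ("TAO.TXT", 110), "TAO.TZT": ("TAO.TXT", 86), "TAO.ARK": ("TAO.TXT", 88),
}
LIBRARIES = {  # .LBR file -> (total records, records of each member)
    "BOTH.LBR": (288, [73, 214]),
    "BOTHS.LBR": (174, [63, 110]),
    "BOTHC.LBR": (141, [54, 86]),
}


def percent_of_original(name: str) -> float:
    original, records = VERSIONS[name]
    return 100 * records / ORIGINALS[original]


def lbr_overhead(name: str) -> int:
    """Records used by the library beyond its members."""
    total, members = LIBRARIES[name]
    return total - sum(members)


def kb_on_disk(records: int, block_kb: int = 1) -> int:
    # Assumption: space is allocated in whole blocks of `block_kb` KB.
    records_per_block = block_kb * 1024 // 128
    return math.ceil(records / records_per_block) * block_kb


for name in VERSIONS:
    print(f"{name:8} {percent_of_original(name):5.1f}% of original")
for name in LIBRARIES:
    print(f"{name:9} overhead: {lbr_overhead(name)} record(s)")
```

Crunching does best here: TAO.TZT is about 40% of TAO.TXT and ED.CZM about 74% of ED.COM, while squeezing only gets ED.COM down to about 86%. Each .LBR is exactly one record larger than the files it holds, which is consistent with a one-record directory.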


Creative Commons License
Compression and Archiving on CP/M by Lawrence Woodman is licensed under a Creative Commons Attribution 4.0 International License.

