Post History

80%
+6 −0
Q&A reduce gzip archive size if files are similar

I'd like to use the similarities between files to produce a gzip archive that's as small as possible. On gzip's Wikipedia entry it says that: gzip is not to be confused with the ZIP archive for...

1 answer  ·  posted 2y ago by Matthias Braun‭  ·  last activity 2y ago by Canina‭

Question gzip compression tar
#2: Post edited by Matthias Braun · 2022-05-04T19:01:51Z (almost 2 years ago)
add versions
reduce gzip archive size if files are similar
I'd like to use the similarities between files to produce a gzip archive that's as small as possible.

On [gzip's Wikipedia entry](https://en.wikipedia.org/wiki/Gzip) it says that:

> gzip is not to be confused with the ZIP archive format, which also uses DEFLATE. The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (solid compression).

I'd like to use that "redundancy between files" and for testing have tried to gzip multiple *identical* PNG files, assuming that their size would not add up: `tar -czf pngs.tar.gz *.png`

But as it turns out, the resulting `.tar.gz` file is considerably larger than one of the PNGs, indicating that the algorithm didn't pick up on the fact that all files were identical.

Also, creating a non-compressed tar file with `tar -cvf pngs.tar *.png` and then using `gzip` on it did not reduce the size: `gzip --keep pngs.tar`.

**Is there a way to create a `tar.gz` archive that uses the redundancies/similarities between its files?**

Tools and versions used:

* gzip 1.12
* GNU tar 1.34
* bsdtar 3.6.1

---

I know I can use 7-Zip with something like `bsdtar --auto-compress -cf pngs.tar.7z pngs.tar` to create an archive that's smaller than *one* PNG but I'm curious if it's also possible with gzip.
#1: Initial revision by Matthias Braun · 2022-05-04T18:54:35Z (almost 2 years ago)
gzip reduce archive size if files are similar
I'd like to use the similarities between files to produce a gzip archive that's as small as possible.

On [gzip's Wikipedia entry](https://en.wikipedia.org/wiki/Gzip) it says that:

> gzip is not to be confused with the ZIP archive format, which also uses DEFLATE. The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (solid compression).

I'd like to use that "redundancy between files" and for testing have tried to gzip multiple *identical* PNG files, assuming that their size would not add up: `tar -czf pngs.tar.gz *.png`

But as it turns out, the resulting `.tar.gz` file is considerably larger than one of the PNGs, indicating that the algorithm didn't pick up on the fact that all files were identical.

Also, creating a non-compressed tar file with `tar -cvf pngs.tar *.png` and then using `gzip` on it did not reduce the size: `gzip --keep pngs.tar`.

**Is there a way to create a `tar.gz` archive that uses the redundancies/similarities between its files?**

---

I know I can use 7-Zip with something like `bsdtar --auto-compress -cf pngs.tar.7z pngs.tar` to create an archive that's smaller than *one* PNG but I'm curious if it's also possible with gzip.
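
The experiment described in the question can be reproduced without any image files. The following is a minimal sketch, not the asker's actual setup: it assumes random bytes are a fair stand-in for PNG data (which is already compressed), and the names `build_tar` and `compare` are made up for illustration. It packs several identical payloads into one tar stream, then compresses that stream once with gzip and once with LZMA (the algorithm family 7-Zip uses). A likely explanation for the difference is that DEFLATE, which gzip uses, can only refer back about 32 KiB, so a second copy of a file larger than that is out of reach, while LZMA's default dictionary is several megabytes.

```python
import gzip
import io
import lzma
import os
import tarfile


def build_tar(payload: bytes, copies: int) -> bytes:
    """Return an uncompressed tar archive holding `copies` identical files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for i in range(copies):
            # Hypothetical file names; only the payload matters here.
            info = tarfile.TarInfo(name=f"image_{i}.png")
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def compare(payload_size: int = 200_000, copies: int = 5) -> dict:
    """Compress the same tar stream with gzip and LZMA and report sizes."""
    # Assumption: random bytes behave like PNG data, i.e. they are
    # incompressible on their own but identical from file to file.
    payload = os.urandom(payload_size)
    tar_bytes = build_tar(payload, copies)
    return {
        "one_file": payload_size,
        "tar": len(tar_bytes),
        "tar_gz": len(gzip.compress(tar_bytes, compresslevel=9)),
        "tar_xz": len(lzma.compress(tar_bytes)),
    }


if __name__ == "__main__":
    for name, size in compare().items():
        print(f"{name:>8}: {size:>9} bytes")
```

With payloads well above 32 KiB, the gzip result should stay close to the size of the whole tar file, while the LZMA result should stay close to the size of a single file. Shrinking `payload_size` to a few kilobytes should let gzip catch the repeats as well, since the earlier copy then falls inside its window.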