Post History
#2: Post edited
gzip reduce archive size if files are similar
- reduce gzip archive size if files are similar
- I'd like to use the similarities between files to produce a gzip archive that's as small as possible.
- On [gzip's Wikipedia entry](https://en.wikipedia.org/wiki/Gzip) it says that:
- > gzip is not to be confused with the ZIP archive format, which also uses DEFLATE. The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (solid compression).
- I'd like to use that "redundancy between files" and, for testing, tried to gzip multiple *identical* PNG files, assuming that their sizes would not add up: `tar -czf pngs.tar.gz *.png`
- But as it turns out, the resulting `.tar.gz` file is considerably larger than one of the PNGs, indicating that the algorithm didn't pick up on the fact that all files were identical.
- Also, creating a non-compressed tar file with `tar -cvf pngs.tar *.png` and then using `gzip` on it did not reduce the size: `gzip --keep pngs.tar`.
- **Is there a way to create a `tar.gz` archive that uses the redundancies/similarities between its files?**
- Tools and versions used:
- * gzip 1.12
- * GNU tar 1.34
- * bsdtar 3.6.1
- ---
- I know I can use 7-Zip with something like `bsdtar --auto-compress -cf pngs.tar.7z pngs.tar` to create an archive that's smaller than *one* PNG but I'm curious if it's also possible with gzip.
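A minimal Python sketch of one likely explanation for the observation above, under stated assumptions. DEFLATE, the algorithm behind gzip, can only refer back to the previous 32 KiB of input, so a second copy of a file larger than that is out of reach when it appears in the stream. LZMA, the algorithm family 7-Zip uses by default, works with a much larger dictionary. The random payload below is a hypothetical stand-in for a PNG (PNG image data is already DEFLATE-compressed, so it looks nearly random to a second compressor), the 200 KiB size is an assumption, and tar headers are left out for simplicity.

```python
import gzip
import lzma
import os

# Hypothetical stand-in for one PNG: incompressible bytes, assumed to be
# larger than DEFLATE's 32 KiB window.
payload = os.urandom(200 * 1024)

# Stand-in for the tar stream: three identical "files" back to back
# (tar headers and padding are omitted for simplicity).
stream = payload * 3

# gzip uses DEFLATE, which can only match data from the last 32 KiB.
gz_size = len(gzip.compress(stream, compresslevel=9))

# LZMA with the default preset uses a dictionary of several MiB, so the
# repeated copies are within reach.
xz_size = len(lzma.compress(stream))

print(f"one file:      {len(payload)} bytes")
print(f"gzip of three: {gz_size} bytes")
print(f"LZMA of three: {xz_size} bytes")
```

With these assumptions, the gzip result should stay close to the combined size of all three copies, while the LZMA result should stay close to the size of a single copy. If the real PNGs are smaller than 32 KiB, this explanation would not apply.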
#1: Initial revision
gzip reduce archive size if files are similar
I'd like to use the similarities between files to produce a gzip archive that's as small as possible.

On [gzip's Wikipedia entry](https://en.wikipedia.org/wiki/Gzip) it says that:

> gzip is not to be confused with the ZIP archive format, which also uses DEFLATE. The ZIP format can hold collections of files without an external archiver, but is less compact than compressed tarballs holding the same data, because it compresses files individually and cannot take advantage of redundancy between files (solid compression).

I'd like to use that "redundancy between files" and, for testing, tried to gzip multiple *identical* PNG files, assuming that their sizes would not add up: `tar -czf pngs.tar.gz *.png`

But as it turns out, the resulting `.tar.gz` file is considerably larger than one of the PNGs, indicating that the algorithm didn't pick up on the fact that all files were identical.

Also, creating a non-compressed tar file with `tar -cvf pngs.tar *.png` and then using `gzip` on it did not reduce the size: `gzip --keep pngs.tar`.

**Is there a way to create a `tar.gz` archive that uses the redundancies/similarities between its files?**

---

I know I can use 7-Zip with something like `bsdtar --auto-compress -cf pngs.tar.7z pngs.tar` to create an archive that's smaller than *one* PNG but I'm curious if it's also possible with gzip.