Welcome to the Power Users community on Codidact!
Power Users is a Q&A site for questions about the usage of computer software and hardware. We are still a small site and would like to grow, so please consider joining our community. We are looking forward to your questions and answers; they are the building blocks of a repository of knowledge we are building together.
Anti-Twin's Byte by Byte“ vs. ”Compare images (pixels)"
Kindly see the red rectangle below. How does "Byte by Byte" differ from "Compare images (pixels)"? Which is more accurate? What are the pros and cons of each?
Advise me if I got to ask this separately, but how does a byte relate to a pixel? I can't decide which answer on Quora is correct.
2 answers
Images can be compressed to take up less space; most image formats automatically include this feature.
There are two general categories of image compression. A lossless method is guaranteed to produce exactly the same picture elements (pixels) that went into it. A lossy method produces an approximation of the original image data.
If you had a photograph of the Great Wall of China taken on a high-end camera, you might end up with:
a RAW image which is very large and completely uncompressed
a PNG image which is quite large but compressed losslessly
a JPEG image which is medium-sized and looks pretty good but is flawed under magnification
a JPEG image which is quite small and doesn't look very good, to be used as a thumbnail
Of these, the byte-for-byte comparison would not call any of them duplicates, but the pixel-by-pixel comparison would call the RAW and PNG versions the same image.
I never heard of Anti-Twin. So I went to the web site and quickly found the answer:
http://www.joerg-rosenthal.com/en/antitwin/similar.html
As expected, "byte by byte" will look for an exact match. Change a single pixel and it fails. Change something in the header (e.g., strip EXIF details) and it fails.
"Compare image (pixels)" gets more interesting. In addition to the compressed vs. uncompressed issue, there are apparently other potential pitfalls:
As a technical requirement, the image comparison had to be programmed with a detection of only a fuzzy similarity. Due to different image formats and compressions, pixels always differ slightly, even if this difference is not intended (e.g. by color reduction or compression artifacts). As a consequence, Anti-Twin is deliberately inaccurate and even color-blind when comparing pixels. In addition, Anti-Twin ignores the image files' file sizes as well as height and width in pixels. This unfortunately results in a certain lack of reliability of the results displayed.
This can easily result in False Negatives - e.g., a resized or cropped image will always fail. It can also result in False Positives - an image that has been changed in some subtle but visible way, such as adjusting brightness uniformly across the entire image by a small amount, might show as a match due to each pixel being "close enough".
1 comment thread