Comments on ELI5 these 9 methods to compare similar pictures – Hash, MD5, SHA?

Parent

ELI5 these 9 methods to compare similar pictures – Hash, MD5, SHA?

+0
−6

Please explain all 9 methods below like I'm 5, in SIMPLE English, and compare and contrast them.

  1. What are Hash, MD5, and SHA?

  2. Why do some hashes have a, b, d, or p in front?

  3. Why does SHA come in 4 different bit sizes?

AllDup image duplicate finder with list of search comparison methods ahash, bhash, dhash, phash, MD5 (128 bit), SHA-1 (160 bit), SHA-2 (256 bit), SHA-2 (384-bit), SHA-2 (512 bit)


2 comment threads

General comments (2 comments)
x-post https://superuser.com/q/1625385/383391 (1 comment)
Post
+4
−0

A hash is a mathematical function that maps a large number to a much smaller number. Files can be considered numbers; graphics files are certainly amenable to that.

For example, an MD5 hash is always 128 bits long. You can feed any file to an MD5 hash generator, and the same file will always produce the same resulting hash. However, each hash could correspond to many different files.
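As a minimal sketch of that property, the following uses Python's standard `hashlib` module; the helper name `md5_hex` and the sample inputs are made up for illustration. It shows that a tiny input and a large input both produce a 128-bit result, and that hashing the same bytes twice gives the same answer.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Two illustrative inputs of very different sizes (assumed sample data).
small = b"hi"
large = b"x" * 1_000_000

digest_small = md5_hex(small)
digest_large = md5_hex(large)

# Each hex character encodes 4 bits, so 32 hex characters = 128 bits.
print(len(digest_small) * 4, len(digest_large) * 4)

# Hashing the same bytes again yields the same digest.
print(md5_hex(small) == digest_small)
```

The digest length does not depend on the input size, which is why one hash value can correspond to many different files.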

If you are looking for duplicate files, you could construct a hash for each file and then only check the files which have the same hash.
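The idea can be sketched roughly as follows. This is not how AllDup itself works internally (that is not documented here); it is just an illustration in Python with hypothetical helper names (`file_md5`, `find_duplicates`, `demo`) and made-up file contents. Files are grouped by MD5 first, and only files that share a hash are then compared byte for byte.

```python
import hashlib
import tempfile
from collections import defaultdict
from pathlib import Path


def file_md5(path: Path) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    by_hash = defaultdict(list)
    for p in paths:
        by_hash[file_md5(p)].append(p)

    groups = []
    for candidates in by_hash.values():
        if len(candidates) < 2:
            continue  # a unique hash means the file has no duplicate
        # Same hash: confirm by comparing the actual bytes, because
        # different files can in principle share a hash.
        by_content = defaultdict(list)
        for p in candidates:
            by_content[p.read_bytes()].append(p)
        groups.extend(sorted(g) for g in by_content.values() if len(g) > 1)
    return groups


def demo():
    """Create three throwaway files, two of them identical, and scan them."""
    with tempfile.TemporaryDirectory() as d:
        root = Path(d)
        (root / "a.jpg").write_bytes(b"same picture bytes")
        (root / "b.jpg").write_bytes(b"same picture bytes")
        (root / "c.jpg").write_bytes(b"a different picture")
        found = find_duplicates(sorted(root.iterdir()))
        return [[p.name for p in group] for group in found]


print(demo())  # [['a.jpg', 'b.jpg']]
```

Note that this only finds files that are exactly identical; it says nothing about pictures that merely look alike.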

All of the methods described in this program's interface are hash functions.
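For the MD5 and SHA entries in that list, the bit counts are simply the lengths of the hash values. A small sketch using Python's `hashlib` (the sample input `b"example"` and the helper `digest_bits` are assumptions for illustration) prints them; the ahash, bhash, dhash, and phash methods are not part of `hashlib` and are not covered here.

```python
import hashlib

# hashlib names for the MD5 / SHA methods listed in the question.
ALGORITHMS = ["md5", "sha1", "sha256", "sha384", "sha512"]


def digest_bits(name: str, data: bytes = b"example") -> int:
    """Return the output length, in bits, of the named hash function."""
    return hashlib.new(name, data).digest_size * 8


sizes = {name: digest_bits(name) for name in ALGORITHMS}
print(sizes)
# {'md5': 128, 'sha1': 160, 'sha256': 256, 'sha384': 384, 'sha512': 512}
```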

If you have no particular preference, MD5 is a reasonable default because it tends to be fast to compute.

If you have a preference for which kind of hash to generate, you can select it here. Most people won't have much preference.

There are situations in which the kind of hash makes a difference -- this does not appear to be one of them.


1 comment thread

General comments (4 comments)
TextKit‭ wrote almost 3 years ago

Thanks. Can you please clarify which are the best and worst "Comparison methods" here?

dsr‭ wrote almost 3 years ago

There are two very reasonable choices: MD5 and SHA2-512. Use MD5 if speed is the most important factor. Use SHA2-512 if accuracy is the most important factor.

elgonzo‭ wrote almost 3 years ago · edited almost 3 years ago

The first sentence is wrong. A hash function can also map a smaller number to a larger number. You can easily prove this yourself: take a bunch of files that are 10 bytes long, which means each file is equivalent to a number at most 80 bits wide. Now create the MD5 hash sum for each of these small files. The MD5 hash values for most of these files will certainly be larger than the largest number that could fit into 80 bits.
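That experiment can be tried with a short sketch like the one below, written in Python with `hashlib`. The choice of 1000 inputs and the names `LIMIT`, `md5_as_int`, `inputs`, and `larger` are arbitrary assumptions for illustration; byte strings stand in for the 10-byte files.

```python
import hashlib

# Any 10-byte input, read as a number, is below 2**80.
LIMIT = 2 ** 80


def md5_as_int(data: bytes) -> int:
    """Interpret the 128-bit MD5 digest of `data` as an unsigned integer."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


# 1000 distinct 10-byte inputs (assumed sample data).
inputs = [i.to_bytes(10, "big") for i in range(1000)]

# Count how many hash values do not fit into 80 bits.
larger = sum(1 for data in inputs if md5_as_int(data) >= LIMIT)
print(larger, "of", len(inputs), "hashes are too large for 80 bits")
```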

dsr‭ wrote almost 3 years ago

You are technically correct, which is the best kind of correct because if you aren't technically correct, you aren't thoroughly correct.

However, the original poster asked for ELI5, so I thought the small deviation from strict correctness to emphasize why hashing is used was worthwhile.

If you know in advance that all your values are smaller than the output of your hash function, there's no point in hashing.