A hash function is a function that, no matter how long the data you feed it, returns a fixed-length value (a hash value). With SHA-256, whether the input is a single character or a one-gigabyte file, the result is always a 256-bit value (64 hexadecimal characters). What matters is that the same input always produces the same value (deterministic), while you cannot get the original input back from the hash value (one-way). Using SHA-256 as an example, this article lays out how hash functions work, their properties, the difference from encryption, and the correct way to use them for password storage and tamper detection.
1. What a hash function is — any length to fixed length, one-way and deterministic
A hash function takes input (a message) of any length and converts it into a fixed-length bit string (a hash value, or digest). The leading example, SHA-256, always produces a 256-bit output no matter what the input is. It has three basic properties.
- Any length → fixed length: regardless of the input length, the output length is constant (SHA-256 is 256 bits = 64 hexadecimal characters).
- Deterministic: the same input always yields the same hash value, computed anytime, anywhere. That is what makes it usable for matching.
- One-way: computing input → hash value is easy, but there is no practical procedure for going in reverse, from hash value → input. Because arbitrarily long input is compressed to a fixed length and information is lost, a hash cannot be "decrypted" in principle.
Let us confirm that the same input always yields the same 64 characters with a concrete example.
SHA-256("hello") = 2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824
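The same check can be sketched in Python with the standard library's hashlib module. The helper name sha256_hex and the one-million-byte test input are illustrative assumptions, not part of the original example.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as 64 hexadecimal characters."""
    return hashlib.sha256(data).hexdigest()

short = sha256_hex(b"hello")
long_ = sha256_hex(b"x" * 1_000_000)  # one million bytes of input

print(short)                          # the digest of "hello" shown above
print(len(short), len(long_))         # both digests are 64 characters long
print(short == sha256_hex(b"hello"))  # deterministic: same input, same digest
```

Note that hashlib works on bytes, so text has to be encoded first, and a different encoding of the same text gives a different hash.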
2. Properties of hash functions — collision resistance, preimage resistance, the avalanche effect
A cryptographically secure hash function does more than return a fixed length; it is designed to satisfy the following properties.
- Preimage resistance (one-wayness): given a hash value h, it is computationally very hard to find an input x such that hash(x) = h.
- Second-preimage resistance: given an input x, it is hard to find a different input y (y ≠ x) such that hash(x) = hash(y).
- Collision resistance: it is hard to find any pair of distinct inputs (x, y) such that hash(x) = hash(y). Since the output is finite, collisions must mathematically exist, but the requirement is that they cannot be found with any realistic amount of computation.
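To see why the output length matters for collision resistance, here is a small sketch that deliberately weakens SHA-256 by keeping only its first 3 bytes (24 bits). The truncation, the function names, and the choice of counter strings as inputs are all assumptions made for illustration. With so few output bits, a simple birthday search finds a collision after a few thousand attempts, whereas the same search against the full 256 bits is far beyond any realistic amount of computation.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of SHA-256: a deliberately weakened toy hash."""
    return hashlib.sha256(data).digest()[:n_bytes]

def find_collision(n_bytes: int = 3) -> tuple[bytes, bytes]:
    """Birthday search: remember every digest seen until one repeats."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = str(i).encode()
        h = truncated_hash(msg, n_bytes)
        if h in seen:
            return seen[h], msg  # two distinct inputs, same truncated hash
        seen[h] = msg

x, y = find_collision()
print(x, y, truncated_hash(x).hex())
# The full 256-bit digests of x and y still differ.
print(hashlib.sha256(x).hexdigest() == hashlib.sha256(y).hexdigest())
```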
Another important property is the avalanche effect. Even a one-bit change to the input flips about half of the output bits, so the result looks completely different. Compare the example below, where the only change is a single trailing . added to the input.
| Input | SHA-256 (leading part) |
|---|---|
| The quick brown fox | a value such as 5cac4f98… |
| The quick brown fox. | changes to something entirely different like 7d38b56b… |
Because of this, a hash value gives no hint at all about which parts of the original data are similar. Even a one-character tamper changes the hash dramatically, which makes hashing well suited to tamper detection.
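The avalanche effect can be measured directly by XOR-ing two digests and counting the 1 bits. The sketch below is one way to do that; the helper name differing_bits and the second pair of inputs ("hello" and "helln", which differ in exactly one bit) are assumptions added for illustration. For each pair, the count should land somewhere around half of 256.

```python
import hashlib

def differing_bits(a: bytes, b: bytes) -> int:
    """Count how many of the 256 output bits differ between SHA-256(a) and SHA-256(b)."""
    da = int.from_bytes(hashlib.sha256(a).digest(), "big")
    db = int.from_bytes(hashlib.sha256(b).digest(), "big")
    return bin(da ^ db).count("1")

# The pair from the table: one trailing character added.
added_dot = differing_bits(b"The quick brown fox", b"The quick brown fox.")

# 'o' is 0x6f and 'n' is 0x6e, so these inputs differ in a single bit.
one_bit = differing_bits(b"hello", b"helln")

print(f"{added_dot} of 256 bits differ (trailing '.')")
print(f"{one_bit} of 256 bits differ (one input bit flipped)")
```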
3. The difference from encryption — a hash cannot be decrypted
Hashing and encryption are often confused, but their purpose and reversibility are entirely different. The biggest difference is whether you can get the original back.
| Aspect | Hash (SHA-256, etc.) | Encryption (AES, etc.) |
|---|---|---|
| Purpose | Fingerprint / matching / tamper detection | Keeping data secret (making it unreadable) |
| Reversibility | Irreversible (cannot be decrypted) | Reversible (decryptable with the key) |
| Key | Not required | Required (encryption/decryption key) |
| Output length | Fixed length | Roughly proportional to the input |
| Typical example | Password matching / checksums | Encrypting data in transit or at rest |
Encryption is a mechanism that assumes a holder of the correct key will later restore the plaintext. A hash, by contrast, has no "restore" operation defined at all. The phrase "decrypt a hashed password" is incorrect; all you can actually do is hash the entered password again and compare it with the stored value.
4. Representative algorithms — MD5/SHA-1 deprecated, SHA-256/512 recommended
There are many hash functions. Some are still considered secure for cryptographic use, while others have been broken and are deprecated.
| Algorithm | Output length | Recommendation / notes |
|---|---|---|
| MD5 | 128 bits | Deprecated. Collisions are easy to construct. Not for cryptographic use |
| SHA-1 | 160 bits | Deprecated. A practical collision was published in 2017 (SHAttered) |
| SHA-256 | 256 bits | Recommended. SHA-2 family. The current standard choice |
| SHA-512 | 512 bits | Recommended. SHA-2 family. Often faster on 64-bit platforms |
| SHA-3 | 224, 256, 384, or 512 bits | Recommended. A newer generation with a different internal structure from SHA-2 |
For non-cryptographic purposes with no adversary, such as simple deduplication or cache keys, MD5 is not immediately problematic. Still, to avoid confusion and misuse, it is safest to avoid it in new implementations.
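As a quick way to confirm the output lengths of the recommended algorithms, the sketch below asks Python's hashlib for each digest size. The algorithm names are the identifiers hashlib uses; the list itself and the variable bits are illustrative choices.

```python
import hashlib

recommended = ("sha256", "sha512", "sha3_224", "sha3_256", "sha3_384", "sha3_512")

# digest_size is reported in bytes, so multiply by 8 to get bits.
bits = {name: hashlib.new(name).digest_size * 8 for name in recommended}

for name, n in bits.items():
    digest = hashlib.new(name, b"hello").hexdigest()
    print(f"{name:9} {n:4} bits  {len(digest):3} hex characters")
```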
5. Use cases — password storage, tamper detection, data identification
Hash functions are in everyday use all around us. Here are the representative use cases, along with how to use them correctly.
- Password storage: store the hash rather than the plaintext, and at login, hash the entered value with the same procedure to compare. However, raw SHA-256 is not enough; the rule is to add a salt per user and use a dedicated password-hashing function (key derivation function) such as bcrypt, scrypt, or Argon2 (see the next section).
- Tamper detection / checksums: by publishing the hash of a file or message, the recipient can recompute it and confirm a match, noticing corruption or tampering during transfer. A distribution's SHA256SUMS file serves this purpose.
- Data identification (content addressing): you can create a unique ID from the content itself. This is used for Git commits, deduplicating storage, cache keys, and more.
- Inside signatures and HMAC: digital signatures and message authentication (HMAC) also use a hash internally to guarantee the integrity and authenticity of data.
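The checksum use case can be sketched as follows. The function names sha256_file and matches, the chunk size, and the temporary file standing in for a downloaded release are all hypothetical. The file is read in chunks so that large files need not fit in memory.

```python
import hashlib
import hmac
import os
import tempfile

def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file incrementally and return its SHA-256 as hex."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def matches(path: str, published_hex: str) -> bool:
    """Recompute the file's hash and compare it with the published value."""
    return hmac.compare_digest(sha256_file(path), published_hex.lower())

# Stand-in for a downloaded file and the hash its publisher announced.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"release contents v1.0\n")
    path = tmp.name
published = sha256_file(path)

ok_before = matches(path, published)  # untouched file
with open(path, "ab") as f:
    f.write(b"!")                     # simulate a one-byte tamper
ok_after = matches(path, published)   # modified file
os.remove(path)

print(ok_before, ok_after)
```

A plain checksum only helps against tampering when the published hash itself comes from a trusted source. If an attacker can replace both the file and its hash, a keyed construction such as HMAC or a digital signature is needed.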
6. Cautions — do not store passwords with raw SHA, use salt and stretching
Finally, let us cover the easiest mistake to make. Do not hash passwords with raw SHA-256 for storage. The reason is that SHA-256 is built to be fast to compute. Speed is usually an advantage, but for an attacker it means "an enormous number of candidates can be tried per second," which works in their favor.
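To make the risk concrete, here is a toy sketch of a dictionary attack against an unsalted SHA-256 password hash. The leaked hash, the five-word list, and the function name crack_unsalted are invented for illustration. A real attacker would run the same loop over very large candidate lists on much faster hardware.

```python
import hashlib
from typing import Iterable, Optional

def crack_unsalted(target_hex: str, candidates: Iterable[str]) -> Optional[str]:
    """Hash each candidate and return the one whose SHA-256 matches, if any."""
    for word in candidates:
        if hashlib.sha256(word.encode()).hexdigest() == target_hex:
            return word
    return None

# Hypothetical leaked database entry: raw SHA-256 of the user's password.
leaked = hashlib.sha256(b"sunshine").hexdigest()

wordlist = ["123456", "password", "qwerty", "sunshine", "letmein"]
found = crack_unsalted(leaked, wordlist)
print(found)
```

Nothing is "decrypted" here. The attacker simply guesses inputs and compares hashes, which is why the cost of a single guess matters so much.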
Salt
A salt is a unique random value added to the password per user before hashing. It defends against the following attacks.
- Defeats rainbow tables: precomputed hash lookup tables become useless.
- Prevents spotting shared passwords: even identical passwords produce different hashes per user, so an attacker cannot batch-target "people with the same password" after a leak.
Key stretching
Key stretching deliberately makes a single verification expensive by repeating the hash computation thousands to hundreds of thousands of times, or by forcing it to use a large amount of memory. A slowdown imperceptible to legitimate users becomes a major barrier to an attacker attempting brute force. The functions that implement this safely are dedicated password-hashing functions such as bcrypt, scrypt, and Argon2.
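As a minimal sketch of salt plus stretching, the example below uses hashlib.scrypt from Python's standard library, which is available when Python is built against OpenSSL 1.1 or later. The function names and the cost parameters (N, R, P, the 16-byte salt, the 32-byte output) are assumptions chosen for illustration, not tuned recommendations. In production, a maintained password-hashing library is usually the safer choice.

```python
import hashlib
import hmac
import os

# Illustrative cost parameters (assumed values, not a recommendation).
N, R, P = 2**14, 8, 1
MAXMEM = 64 * 1024 * 1024  # allow up to 64 MiB for the computation

def _derive(password: str, salt: bytes) -> bytes:
    """Run scrypt, a deliberately slow and memory-hard function."""
    return hashlib.scrypt(password.encode(), salt=salt,
                          n=N, r=R, p=P, maxmem=MAXMEM, dklen=32)

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest). Store both; the salt is not secret."""
    salt = os.urandom(16)  # unique random salt per user
    return salt, _derive(password, salt)

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive with the stored salt and compare in constant time."""
    return hmac.compare_digest(_derive(password, salt), expected)

salt_a, digest_a = hash_password("correct horse")
salt_b, digest_b = hash_password("correct horse")  # same password, second user

print(digest_a != digest_b)                                 # salts make the hashes differ
print(verify_password("correct horse", salt_a, digest_a))   # right password
print(verify_password("wrong guess", salt_a, digest_a))     # wrong password
```

Verification never recovers the password. It repeats the same slow derivation with the stored salt and compares the results.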
Frequently Asked Questions (FAQ)
What is the difference between hashing and encryption?
The biggest difference is whether you can get the original back. Encryption uses a key to turn plaintext into ciphertext, and with the correct key you can decrypt it back to the original plaintext (reversible). A hash function, by contrast, only computes a fixed-length value from the input, and there is no procedure to recover the original input from the hash value (one-way and irreversible). So while you can "store a password as a hash," you cannot "decrypt hashed data" in principle. Use encryption for data you want to keep secret and read back later, and use hashing as a fingerprint for tamper detection or matching.
Are SHA-1 and MD5 still safe to use?
They must not be used where collision resistance is required, such as signatures, certificates, or tamper detection. MD5 was broken long ago, and practical collisions for SHA-1 (two different inputs producing the same hash value) were published in 2017; both are deprecated. For new uses, use SHA-256 or SHA-512 (the SHA-2 family) or SHA-3. For non-cryptographic purposes with no adversary, such as simple deduplication or cache keys, continuing to use MD5 is not immediately dangerous, but it is safest to avoid it in new implementations to prevent confusion.
Is SHA-256 alone enough for storing passwords?
No. SHA-256 is designed to be fast to compute, and that speed works in the attacker's favor, so hashing passwords with raw SHA-256 for storage is inappropriate. Attackers can try an enormous number of candidates per second on a GPU, and without a salt they can also use rainbow tables or spot users who share the same password. For passwords, add a unique salt per user and use a dedicated password-hashing function (key derivation function) with a tunable cost, such as bcrypt, scrypt, or Argon2. These are intentionally slow and make brute-force attacks hard.