Why not store passwords directly
We have a project set up. We know authentication is about proving identity. The most common way to prove identity on the web is with a password: the user knows a secret, the server checks it, and if it matches, they are in. Easy, right? Well, “check that the password matches” sounds trivial until you realize the server has to store the password somewhere in order to compare it later. And once you start thinking about how to store it, the whole thing falls apart. That is what this lesson is about.
The obvious thing to do
Imagine you are writing the signup endpoint for the very first time. A user sends their email and password. You need to remember them so the user can log in tomorrow. What is the most natural thing to write?
const users = new Map();
users.set("alice@example.com", {
  email: "alice@example.com",
  password: "hunter2", // <- the real password, sitting right there
});
This works. It actually works perfectly. When Alice tries to log in, you pull up her record, compare what she typed against "hunter2", and if they are equal, you let her in. It runs, the tests pass, you can ship it.
And it is a security disaster.
The problem is not whether it works on a happy day. The problem is what happens when anything goes wrong. Because eventually, something does.
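For reference, here is that whole naive flow as a minimal sketch in Node.js. The helper names `signUp` and `logIn` are hypothetical, invented for illustration, and the final loop plays the part of anyone who gets read access to the store. Do not ship this.
```javascript
// INSECURE: a sketch of the naive approach, for illustration only.
// signUp and logIn are hypothetical helper names.
const users = new Map();

function signUp(email, password) {
  // The real password is stored exactly as the user typed it.
  users.set(email, { email, password });
}

function logIn(email, attempt) {
  const user = users.get(email);
  // Login is a plain string comparison against the stored password.
  return user !== undefined && user.password === attempt;
}

signUp("alice@example.com", "hunter2");
console.log(logIn("alice@example.com", "hunter2")); // true
console.log(logIn("alice@example.com", "letmein")); // false

// Anyone who can read the store (a leaked backup, say) sees every password.
for (const { email, password } of users.values()) {
  console.log(`${email}: ${password}`); // alice@example.com: hunter2
}
```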
Database breaches are normal
Let’s talk about something uncomfortable. Databases get exposed. It happens to giant companies. It happens to tiny ones. It happens through misconfigured servers, SQL injection bugs, leaked backups, compromised employee laptops, accidental public S3 buckets, and a hundred other ways. If you have been paying attention to tech news at all in the last decade, you have seen it constantly. The question is not “will my database ever be exposed?” The question is “what happens when it is?”
If you store passwords in plain text, the answer to that question is “it is over.” An attacker with your database immediately has every user’s email and password. They do not need to crack anything. They do not need to run a single cracking tool. They just read it.
It gets worse. Because people reuse passwords across services (they should not, but they do, constantly), that attacker now probably has working credentials for many of your users’ email accounts, bank accounts, and social media. Your breach is not just your problem. It leaks into the rest of your users’ digital lives. That is a real thing that real people deal with after real breaches, and it is the reason we take password storage so seriously.
“OK, I’ll encrypt them”
This is usually the next thought. Encryption sounds like the right tool. It takes readable data and turns it into unreadable gibberish using a key. If an attacker reads your database, all they see is the gibberish.
encrypt("hunter2", key) → "x7f9a2b..."
decrypt("x7f9a2b...", key) → "hunter2"
Here is the catch. The key to decrypt has to live somewhere your server can reach, because your server needs it to check passwords at login time. Whether it sits in an environment variable, a secrets manager, or a config file, it has to be accessible to your running code.
Now think about the scenario again. An attacker has breached your server deeply enough to read the database. How likely is it that the same attacker can also find the encryption key? Very likely. They have already shown they can get inside. They grab the key, run decrypt on every row, and you are right back where you started.
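Here is that round trip as a sketch using Node.js's built-in `crypto` module. AES-256-GCM is an assumed choice of cipher; any symmetric cipher makes the same point, which is that whoever holds `key` turns every stored value straight back into a password.
```javascript
const crypto = require("node:crypto");

// In a real app this key would come from an env var or a secrets manager,
// exactly the kind of place an attacker who is already inside can read.
const key = crypto.randomBytes(32); // 256-bit key for AES-256

function encrypt(plaintext) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce per message
  const cipher = crypto.createCipheriv("aes-256-gcm", key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  // The iv and auth tag are stored alongside the ciphertext; decryption needs them.
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

function decrypt({ iv, ciphertext, tag }) {
  const decipher = crypto.createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString("utf8");
}

const stored = encrypt("hunter2");
console.log(stored.ciphertext.toString("hex")); // unreadable bytes
console.log(decrypt(stored)); // hunter2: the key hands the password right back
```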
Encryption is a great tool when you actually need the original data back later. A classic example is storing a credit card so you can charge it next month. But for passwords, here is the thing: you never need the original password back. Not once. Not ever. Your server only needs to answer one question, over and over:
“Does this password attempt match the one we stored?”
That is a very different question than “what was the original?” And it opens up a completely different solution.
Hashing, a one-way function
A hash function takes some input and produces a fixed-size output. The critical property, the one that makes this whole scheme work, is that you cannot reverse it. Given a hash, there is no practical way to work out what went in, short of guessing inputs and hashing each guess to see if it matches. It is a one-way street.
hash("hunter2") → "a3f5b7c9d2e8..."
You can go forward easily (input to hash), but you cannot go backward (hash to input). This is completely different from encryption. There is no key. There is no decryption function. The original value is, in a real sense, gone.
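Node.js ships ordinary hash functions in its built-in `crypto` module. This sketch uses SHA-256 purely to show the two properties above: the output has a fixed size no matter how long the input is, and the same input always gives the same output. (As the rest of this lesson explains, a single SHA-256 pass is not how you should store passwords.)
```javascript
const crypto = require("node:crypto");

// Hex-encoded SHA-256 digest of a string.
function sha256(input) {
  return crypto.createHash("sha256").update(input, "utf8").digest("hex");
}

const short = sha256("hunter2");
const long = sha256("a much, much longer input ".repeat(1000));

console.log(short.length, long.length); // 64 64: always 256 bits (64 hex chars)
console.log(sha256("hunter2") === short); // true: same input, same hash
// There is no "unsha256": nothing in the API takes a digest back to its input.
```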
So how do we use this to check a login? Like this. When a user tries to log in, you hash their attempt and compare that to the stored hash:
Stored hash: hash("hunter2") → "a3f5b7c9d2e8..."
Login attempt: hash("hunter2") → "a3f5b7c9d2e8..."
Match? Yes → authenticated
The same password produces the same hash every time. If the hashes are equal, the passwords were equal (for all practical purposes: a good hash function makes accidental collisions vanishingly unlikely). That is all we need.
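The same check as a sketch in Node.js, with the user store keeping only a `passwordHash`. The helper names are hypothetical, and this is still plain, unsalted, fast hashing, so treat it as a stepping stone rather than something to deploy.
```javascript
const crypto = require("node:crypto");

const sha256 = (s) => crypto.createHash("sha256").update(s, "utf8").digest("hex");
const users = new Map();

function signUp(email, password) {
  // Only the hash is stored; the password itself is never kept.
  users.set(email, { email, passwordHash: sha256(password) });
}

function logIn(email, attempt) {
  const user = users.get(email);
  if (!user) return false;
  // Hash the attempt and compare it to the stored hash.
  return sha256(attempt) === user.passwordHash;
}

signUp("alice@example.com", "hunter2");
console.log(logIn("alice@example.com", "hunter2")); // true
console.log(logIn("alice@example.com", "hunter3")); // false
console.log(users.get("alice@example.com").passwordHash); // a hash, not "hunter2"
```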
And here is the win: if the database is breached, the attacker sees hashes, not passwords. They cannot reverse the hashes back to the original passwords. The whole “we stole your database, now we have all your passwords” attack evaporates.
Wait, is it really that simple?
Unfortunately, no. Plain hashing has two serious weaknesses that attackers figured out decades ago. Let’s walk through both.
Problem 1: rainbow tables
What happens if an attacker sits down ahead of time and pre-computes the hashes of millions of common passwords? Like, just writes them all down?
hash("password") → "5e884898da..."
hash("123456") → "8d969eef6e..."
hash("hunter2") → "a3f5b7c9d2..."
hash("qwerty") → "d8578edf85..."
... and millions more
This is a giant reverse lookup: hash in, password out. It is commonly called a rainbow table (strictly, a true rainbow table is a more compact, chain-based way of storing the same precomputed work, but the name gets used for both). Now if the attacker breaches your database and sees a hash, they just check it against their table. If it matches any entry, they know the original password. They do not need to reverse the hash mathematically. They just look it up.
Rainbow tables for common hash functions like MD5 and SHA-256 are freely available for download. This is not hypothetical. This is “five minutes of work” territory.
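A toy version of the attack, as a sketch. Strictly speaking, this is a plain precomputed lookup table rather than a chain-based rainbow table, but against unsalted hashes the effect is the same. The wordlist and the "leaked" rows are made up for illustration.
```javascript
const crypto = require("node:crypto");

const sha256 = (s) => crypto.createHash("sha256").update(s, "utf8").digest("hex");

// Done once, ahead of time, by the attacker (real lists hold millions of entries).
const wordlist = ["password", "123456", "hunter2", "qwerty", "letmein"];
const lookup = new Map(wordlist.map((word) => [sha256(word), word]));

// Later: unsalted hashes from a breached database (hypothetical data).
const leaked = {
  "alice@example.com": sha256("hunter2"),
  "bob@example.com": sha256("a-long-unique-passphrase-nobody-uses"),
};

for (const [email, hash] of Object.entries(leaked)) {
  // No math, no reversing: just a Map lookup.
  console.log(email, "->", lookup.get(hash) ?? "(not in table)");
}
// alice@example.com -> hunter2
// bob@example.com -> (not in table)
```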
Problem 2: duplicate passwords show up as duplicates
If two users both pick "password123" as their password, they both get the exact same hash. An attacker who cracks one has cracked both. And worse, even without cracking anything, the attacker can see which users picked popular passwords just by looking for repeated hash values in your database.
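Spotting duplicates needs no cracking at all. This sketch groups a made-up set of leaked, unsalted SHA-256 hashes by value; any hash shared by more than one user points at a shared, and probably popular, password.
```javascript
const crypto = require("node:crypto");

const sha256 = (s) => crypto.createHash("sha256").update(s, "utf8").digest("hex");

// Hypothetical leaked rows: the attacker sees only emails and hashes.
const rows = [
  { email: "ann@example.com", hash: sha256("password123") },
  { email: "ben@example.com", hash: sha256("correct horse battery staple") },
  { email: "cat@example.com", hash: sha256("password123") },
];

// Group users by hash value.
const byHash = new Map();
for (const { email, hash } of rows) {
  if (!byHash.has(hash)) byHash.set(hash, []);
  byHash.get(hash).push(email);
}

for (const emails of byHash.values()) {
  if (emails.length > 1) console.log("Same password:", emails.join(", "));
}
// Same password: ann@example.com, cat@example.com
```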
So plain hashing is not enough. We need something more.
Salt to the rescue
A salt is a random string that you generate uniquely for each user. Before hashing, you mash it together with the password:
User 1: hash("randomsalt1" + "password123") → "9f2a3b..."
User 2: hash("randomsalt2" + "password123") → "7c8d1e..."
Same password. Different salts. Completely different hashes.
This fixes both problems at once. Rainbow tables become useless because the attacker would need a separate pre-computed table for every possible salt (a random 16-byte salt has 2^128 possible values), which is computationally absurd. And two users with the same password no longer produce matching hashes, so the attacker cannot spot duplicates.
One thing that confuses beginners: the salt is not a secret. It gets stored right next to the hash, in plain view. Its job is not to be hidden. Its job is to be unique, so that every hash is unique even when the underlying passwords repeat. Think of it as a random seasoning that you sprinkle on each password to make sure no two end up tasting the same.
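A sketch of salting in Node.js, still using a single SHA-256 pass so that the salt is the only new idea. The 16-byte salt length and the `{ salt, hash }` record shape are assumptions for illustration. Note that the salt is kept in the clear next to the hash and reused at login.
```javascript
const crypto = require("node:crypto");

function saltedHash(salt, password) {
  // The salt is fixed-length hex, so salt + password is unambiguous.
  return crypto.createHash("sha256").update(salt + password, "utf8").digest("hex");
}

function makeRecord(password) {
  const salt = crypto.randomBytes(16).toString("hex"); // unique per user, not secret
  return { salt, hash: saltedHash(salt, password) };
}

function verify(record, attempt) {
  // Reuse the stored salt so the same password reproduces the same hash.
  return saltedHash(record.salt, attempt) === record.hash;
}

const user1 = makeRecord("password123");
const user2 = makeRecord("password123");

console.log(user1.hash === user2.hash); // false: same password, different hashes
console.log(verify(user1, "password123")); // true
console.log(verify(user1, "password124")); // false
```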
One more thing: we want it to be slow
Regular hash functions like SHA-256 are designed to be fast. Fast is great for things like file checksums, where you want to hash gigabytes of data quickly. Fast is terrible for passwords. A fast hash means an attacker, even with only the hashes, can try billions of password guesses per second on a modern GPU. Given that people pick bad passwords, “billions per second” cracks a lot of them.
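To get a rough feel for "fast", this sketch counts how many SHA-256 hashes one CPU core manages in Node.js. The number depends entirely on your machine, and dedicated cracking software on a GPU is far faster than this simple loop, so treat the result as a very conservative lower bound on an attacker's speed.
```javascript
const crypto = require("node:crypto");

// Count how many SHA-256 "guesses" one core can hash in durationMs milliseconds.
function sha256PerSecond(durationMs = 500) {
  let count = 0;
  const start = process.hrtime.bigint();
  const limit = BigInt(durationMs) * 1_000_000n; // ms -> ns
  while (process.hrtime.bigint() - start < limit) {
    crypto.createHash("sha256").update(`guess${count}`).digest();
    count++;
  }
  return Math.round(count / (durationMs / 1000));
}

console.log(`~${sha256PerSecond().toLocaleString()} SHA-256 hashes/second on one core`);
```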
Password-specific hashing algorithms solve this by being intentionally slow. They do not just hash once. They repeat expensive internal work thousands (or hundreds of thousands) of times, and some, like scrypt and Argon2, also deliberately use a lot of memory, so computing even one hash takes something like 100 milliseconds. That is imperceptible to a human logging in, but it is a crushing slowdown for an attacker trying to brute-force a billion guesses. This setting is called a work factor or cost factor, and the beautiful part is that you can turn it up as computers get faster.
This is why we use algorithms like bcrypt, scrypt, and Argon2 for passwords, not SHA-256. They are designed specifically for this job: slow, salted automatically, and tunable.
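Node.js does not ship bcrypt (installing it is the next lesson), but its built-in `crypto.scryptSync` is a slow, salted, tunable password hash, so it can stand in for a sketch here. The `scrypt$N$salt$hash` string format and the helper names are assumptions invented for this example, not a standard. Storing the cost parameter `N` with each hash is what lets you raise the work factor later without breaking existing users.
```javascript
const crypto = require("node:crypto");

const KEY_LEN = 64;

// scrypt needs about 128 * N * r bytes; allow twice that so Node does not refuse.
const optionsFor = (cost) => ({ N: cost, r: 8, p: 1, maxmem: 2 * 128 * cost * 8 });

// cost is scrypt's N parameter (a power of two); higher means slower and more memory.
function hashPassword(password, cost = 2 ** 14) {
  const salt = crypto.randomBytes(16);
  const hash = crypto.scryptSync(password, salt, KEY_LEN, optionsFor(cost));
  // Hypothetical format: algorithm, cost, salt, and hash together in one string.
  return `scrypt$${cost}$${salt.toString("hex")}$${hash.toString("hex")}`;
}

function verifyPassword(password, record) {
  const [, costStr, saltHex, hashHex] = record.split("$");
  const cost = Number(costStr);
  const expected = Buffer.from(hashHex, "hex");
  // Recompute with the *stored* salt and cost, not the current defaults.
  const actual = crypto.scryptSync(password, Buffer.from(saltHex, "hex"), expected.length, optionsFor(cost));
  // Constant-time comparison avoids leaking how many bytes matched.
  return crypto.timingSafeEqual(actual, expected);
}

const stored = hashPassword("hunter2");
console.log(verifyPassword("hunter2", stored)); // true
console.log(verifyPassword("hunter3", stored)); // false

// Rough timing: scrypt's work grows roughly linearly with N.
for (const cost of [2 ** 12, 2 ** 14]) {
  const t0 = Date.now();
  hashPassword("hunter2", cost);
  console.log(`N=${cost}: ${Date.now() - t0} ms`);
}
```
For scale: at 100 milliseconds per hash, a billion guesses is 10^8 seconds of compute, a little over three years on a single core. Attackers can spread that across many machines, but every guess still pays the full cost, and each user's salt means the work starts over for every account.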
Summary
Here is the full picture in one table:
| Approach | Breach impact |
|---|---|
| Plaintext | All passwords immediately exposed |
| Encrypted | All passwords exposed if the attacker finds the key |
| Simple hash (no salt) | Vulnerable to rainbow tables and reveals duplicate passwords |
| Salted slow hash (bcrypt) | Each password must be brute-forced individually, and it is extremely slow to do so |
The last row is what we actually do in production, and specifically, what we are going to do in the next lesson. We are going to install bcrypt, hash a real password, and verify it. Time to stop talking about this in the abstract and actually use it.
Why is encryption not suitable for storing passwords?
What does a salt do?