Hashing vs Encryption vs Encoding
Understand the differences between these terms
In software development, hashing, encryption, and encoding come up all the time, in very different scenarios. Confusing these concepts can lead you straight into security traps. By the end of this post, you should understand the difference between them.
Okay, let’s get into it.
Encoding
An encoding algorithm converts a message from format A to format B in a way that lets the receiver convert it back from B to A. One popular encoding is Base64, which is widely used in software development to represent binary data as plain ASCII text so that it can pass safely through systems that expect text. Encoding cannot be used to secure data: it involves no secret, so anyone can revert the encoded text to the original using the same publicly known algorithm. In JWT, for example, the header and payload of the token are encoded with Base64URL (the URL-safe variant of Base64), which means anyone holding the token can read them. (Check out What is JWT?) In short, the purpose of encoding is to transform the original message into a format that another system can handle.
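To make the point concrete, here is a minimal sketch using Python's standard `base64` module. The sample string is made up for illustration. Notice that no key is involved at any step, which is exactly why encoding offers no protection.

```python
import base64

original = "Hello, world!"  # sample message, chosen only for illustration

# Encode: text -> bytes -> Base64 text
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")

# Decode: anyone can do this, no secret is required
decoded = base64.b64decode(encoded).decode("utf-8")

print(encoded)  # SGVsbG8sIHdvcmxkIQ==
print(decoded)  # Hello, world!
```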
Hashing
A hashing algorithm is a one-way function that converts input of any length into a fixed-length value, and the hash value cannot be reversed to recover the input. Because there are more possible inputs than outputs, two different inputs can produce the same hash value. This is called a hash collision, and for a good hashing algorithm the probability of hitting one is extremely low.
Because it is a one-way function, nobody can compute the original input from the hash value, which makes hashing suitable for protecting data that never needs to be read back. People use a hashing algorithm to hash passwords or other sensitive data that does not need to be revealed as plaintext.
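As a small illustration, the sketch below uses SHA-256 from Python's standard `hashlib` module (the choice of SHA-256 and the sample inputs are assumptions made for this example). It shows that the output length stays the same no matter how long the input is, and that a tiny change in the input gives a different hash.

```python
import hashlib

def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of `text` as a hex string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

short_hash = sha256_hex("hi")
long_hash = sha256_hex("a much longer piece of text " * 100)

# Both hashes are 64 hex characters (256 bits), regardless of input length
print(len(short_hash), len(long_hash))

# The same input always gives the same hash; a one-character change does not
print(sha256_hex("password") == sha256_hex("password"))
print(sha256_hex("password") == sha256_hex("Password"))
```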
When choosing a hashing algorithm for passwords, pick one that is deliberately slow, such as bcrypt, scrypt, Argon2, or PBKDF2. Every guess then costs the attacker more time, so cracking your hash value takes much longer. (For other uses, such as checksums, a fast algorithm like SHA-256 is fine.)
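Here is a rough sketch of slow password hashing using PBKDF2 from Python's standard library. The function names, the iteration count, and the sample password are assumptions for illustration only; a real system should follow current guidance for the work factor, or use a dedicated library for bcrypt or Argon2.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # illustrative work factor (an assumption); tune for your hardware

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, derived hash) for a new password."""
    salt = secrets.token_bytes(16)  # random salt, unique per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-derive the hash from the candidate password and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, expected)

salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))    # False
```

Only the salt and the derived hash are stored, never the password itself. Raising `ITERATIONS` makes each guess more expensive for an attacker, at the cost of slower logins.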
A hashing algorithm can also be used to check data integrity: if the data is altered, its hash value changes, so you can detect the change by comparing hash values. This is why hashing is popular in data signatures. In Designing Authentication for Your Public API Platform, we demonstrated how to use HMAC, which combines a hash function with a secret key, for data signatures and to ensure data integrity.
Remark: MD5 is not recommended here because practical collision attacks against it are known and it is very fast to brute-force.
Side Topic: How does an attacker crack a hash value?
Since hashing is a one-way function, attackers cannot calculate the original value from the hash value. What they can do is precompute the hashes of many likely inputs, in the form of a dictionary or rainbow table, and then look up the hash value to find the original value.
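A minimal sketch of the HMAC idea, using Python's standard `hmac` module. The secret, the message, and the function names are hypothetical and only here for illustration; this is not the exact code from the article mentioned above.

```python
import hashlib
import hmac

SECRET = b"shared-secret-for-demo"  # hypothetical key known to sender and receiver

def sign(message: bytes) -> str:
    """Compute an HMAC-SHA256 signature of the message."""
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()

def is_untampered(message: bytes, signature: str) -> bool:
    """Recompute the signature and compare it in constant time."""
    return hmac.compare_digest(sign(message), signature)

message = b'{"amount": 100}'
signature = sign(message)

print(is_untampered(message, signature))              # True
print(is_untampered(b'{"amount": 900}', signature))   # False, the data was altered
```

Without the secret key, an attacker who changes the message cannot produce a matching signature, which is what a plain hash alone would not guarantee.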
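The lookup approach can be sketched in a few lines. The word list and the salt below are made up for illustration, and unsalted SHA-256 is used only to keep the demo short. The last part hints at why a salt helps: the precomputed table no longer contains the stored hash.

```python
import hashlib

def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# The attacker precomputes hashes for a list of likely passwords
common_passwords = ["123456", "password", "qwerty", "letmein"]
lookup = {sha256_hex(p): p for p in common_passwords}

# A leaked, unsalted hash is cracked with a single dictionary lookup
leaked_hash = sha256_hex("qwerty")
cracked = lookup.get(leaked_hash)
print(cracked)  # qwerty

# With a salt mixed in, the same password no longer matches the table
salt = "9f2c7a"  # hypothetical salt value
salted_hash = sha256_hex(salt + "qwerty")
print(lookup.get(salted_hash))  # None
```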
Encryption
An encryption algorithm converts the original data into an unreadable format that is hard to turn back into the original without the secret key. The main goal of encryption is data confidentiality: it protects your data from being read by unauthorized parties. There are two main categories of encryption:
- Symmetric Encryption: These algorithms use a single pre-shared key, and both parties use that same key for encryption and decryption. Before transferring any data, the two parties therefore need to agree on the secret key and exchange it over a secure channel. As in the diagram above, if Ray wants to send something to Ken, the steps are:
  - Ray encrypts the plaintext using the pre-shared key
  - The encrypted message is sent to Ken
  - Ken decrypts the encrypted message using the pre-shared key
  - Ken gets the plaintext

  With symmetric encryption, you need to take care of key delivery, because if the pre-shared key leaks, your messages are no longer secure. Anyone who has the key can encrypt or decrypt messages, since both parties use the same key.
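The four steps above can be sketched with the standard library only. Note that this is a toy cipher written for this post (a keystream derived from SHA-256, XORed with the data), and every name in it is an assumption. It is not a vetted algorithm and must not be used for real data; in practice you would use an established cipher such as AES-GCM from a well-reviewed library. The only point here is that the same key both encrypts and decrypts.

```python
import hashlib
import secrets

def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` bytes from key + nonce by hashing an increasing counter."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy encryption: XOR the plaintext with the keystream, prefix the nonce."""
    nonce = secrets.token_bytes(16)  # fresh random nonce for every message
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))

def decrypt(key: bytes, message: bytes) -> bytes:
    """Toy decryption: split off the nonce, then XOR with the same keystream."""
    nonce, ciphertext = message[:16], message[16:]
    stream = keystream(key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))

shared_key = secrets.token_bytes(32)          # agreed between Ray and Ken beforehand
message = encrypt(shared_key, b"Hello Ken")   # Ray encrypts and sends this
recovered = decrypt(shared_key, message)      # Ken decrypts with the same key
print(recovered)  # b'Hello Ken'
```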
- Asymmetric Encryption: These algorithms use a key pair, consisting of a private key and a public key. The sender encrypts with one key and the receiver decrypts with the other, so the two parties never need to share a secret. The key pair is generated using mathematical operations that are easy to perform but hard to reverse. RSA is an example of this type of algorithm: recovering the private key from the public key would require factoring a very large number, which is computationally infeasible for properly sized keys. According to the diagram above, if Ray wants to send something to Ken, the steps are:
  - Ray uses Ken’s public key to encrypt the plaintext
  - The encrypted message is sent to Ken
  - Ken uses his private key to decrypt the encrypted message
  - Ken gets the plaintext

  The reason we use Ken’s public key for encryption is that only Ken can then decrypt the message, using his private key. If Ray used his own private key instead, anyone who has Ray’s public key could recover the message, so it would not be confidential. The reverse direction is useful for proving who sent a message: if Ken wants to prove that a message came from him, he uses his private key to sign it (often described as encrypting with the private key), and anyone can check the result with Ken’s public key. Since only Ken holds his private key, a successful check tells us the sender is Ken. This is the idea behind a digital signature. *(Remark: the private key is held by Ken only, unless it gets leaked.)*
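To see both directions at work, here is textbook RSA with tiny primes. This is a sketch for intuition only: the primes, the exponent, and the message number are illustrative values, and real RSA uses keys of 2048 bits or more together with padding and hashing, normally through a vetted library. It requires Python 3.8+ for the modular inverse via `pow`.

```python
# Ken's key pair, built from two tiny primes (far too small for real use)
p, q = 61, 53
n = p * q                  # 3233, part of both keys
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent: (e, n) is Ken's public key
d = pow(e, -1, phi)        # 2753, private exponent: (d, n) is Ken's private key

message = 65               # the message, represented as a number smaller than n

# Confidentiality: Ray encrypts with Ken's public key, Ken decrypts with his private key
ciphertext = pow(message, e, n)
recovered = pow(ciphertext, d, n)
print(recovered == message)  # True

# Authenticity: Ken signs with his private key, anyone verifies with his public key
signature = pow(message, d, n)
verified = pow(signature, e, n) == message
print(verified)  # True
```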
Conclusion
To sum up, encoding converts data into a format that another system can handle, and it is not meant for security. Hashing is a one-way function: because it is irreversible, it is a safe way to store data that never needs to be read back, such as passwords, and it is widely used to ensure data integrity. Encryption ensures that only authorized parties, those who hold the right key, can reveal the plaintext, and it is used to ensure data confidentiality.