,

Introduction to Cryptography

Task 1 Introduction

The purpose of this room is to introduce users to basic cryptography concepts such as:

  • Symmetric encryption, such as AES
  • Asymmetric encryption, such as RSA
  • Diffie-Hellman Key Exchange
  • Hashing
  • PKI

Suppose you want to send a message that no one can understand except the intended recipient. How would you do that?

Top Secret Document with the words WUB KDFN PH

One of the simplest ciphers is the Caesar cipher, used more than 2000 years ago. Caesar Cipher shifts the letter by a fixed number of places to the left or to the right. Consider the case of shifting by 3 to the right to encrypt, as shown in the figure below.

Illustration of Caesar cipher encryption

The recipient needs to know that the text was shifted by 3 to the right to recover the original message.

Illustration of Caesar cipher decryption

Using the same key to encrypt โ€œTRY HACK MEโ€, we get โ€œWUB KDFN PHโ€.

The Caesar Cipher that we have described above can use a key between 1 and 25. With a key of 1, each letter is shifted by one position, where A becomes B, and Z becomes A. With a key of 25, each letter is shifted by 25 positions, where A becomes Z, and B becomes A. A key of 0 means no change; moreover, a key of 26 will also lead to no change as it would lead to a full rotation. Consequently, we conclude that Caesar Cipher has a keyspace of 25; there are 25 different keys that the user can choose from.

Consider the case where you have intercepted a message encrypted using Caesar Cipher: โ€œYMNX NX FQUMF GWFAT HTSYFHYNSL YFSLT MTYJQ RNPJโ€. We are asked to decrypt it without knowledge of the key. We can attempt this by using brute force, i.e., we can try all the possible keys and see which one makes the most sense. In the following figure, we noticed that key being 5 makes the most sense, โ€œTHIS IS ALPHA BRAVO CONTACTING TANGO HOTEL MIKE.โ€

Decrypting a ciphertext by trying all possible keys, i.e., by brute force

Caesar cipher is considered a substitution cipher because each letter in the alphabet is substituted with another.

Another type of cipher is called transposition cipher, which encrypts the message by changing the order of the letters. Letโ€™s consider a simple transposition cipher in the figure below. We start with the message, โ€œTHIS IS ALPHA BRAVO CONTACTING TANGO HOTEL MIKEโ€, and the key 42351. After we write the letters of our message by filling one column after the other, we rearrange the columns based on the key and then read the rows. In other words, we write by columns and we read by rows. Also notice that we ignored all the space in the plaintext in this example.  The resulting ciphertext โ€œNPCOTGHOTHโ€ฆโ€ is read one row after the other. In other words, a transposition cipher simply rearranges the order of the letters, unlike the substitution cipher, which substitutes the letters without changing their order.

Illustration of a transposition cipher

This task introduced simple substitution and transposition ciphers and applied them to messages made of alphabetic characters. For an encryption algorithm to be considered secure, it should be infeasible to recover the original message, i.e., plaintext. (In mathematical terms, we need a hard problem, i.e., a problem that cannot be solved in polynomial time. A problem that we can solve in polynomial time is a problem thatโ€™s feasible to solve even for large input, although it might take the computer quite some time to finish.)

If the encrypted message can be broken in one week, the encryption used would be considered insecure. However, if the encrypted message can be broken in 1 million years, the encryption would be considered practically secure.

Consider the mono-alphabetic substitution cipher, where each letter is mapped to a new letter. For example, in English, you would map โ€œaโ€ to one of the 26 English letters, then you would map โ€œbโ€ to one of the remaining 25 English letters, and then map โ€œcโ€ to one of the remaining 24 English letters, and so on.

For example, we might choose the letters in the alphabet โ€œabcdefghijklmnopqrstuvwxyzโ€ to be mapped to โ€œxpatvrzyjhecsdikbfwunqgmolโ€ respectively. In other words, โ€œaโ€ becomes โ€œxโ€, โ€œbโ€ becomes โ€œpโ€, and so on. The recipient needs to know the key, โ€œxpatvrzyjhecsdikbfwunqgmolโ€, to decrypt the encrypted messages successfully.

This algorithm might look very secure, especially since trying all the possible keys is not feasible. However, different techniques can be used to break a ciphertext using such an encryption algorithm. One weakness of such an algorithm is letter frequency. In English texts, the most common letters are โ€˜eโ€™, โ€˜tโ€™, and โ€˜aโ€™, as they appear at a frequency of 13%, 9.1%, and 8.2%, respectively. Moreover, in English texts, the most common first letters are โ€˜tโ€™, โ€˜aโ€™, and โ€˜oโ€™, as they appear at 16%, 11.7% and 7.6%, respectively. Add to this the fact that most of the message words are dictionary words, and you will be able to break an encrypted text with the alphabetic substitution cipher in no time.

We donโ€™t really need to use the encryption key to decrypt the received ciphertext, โ€œUyv sxd gyi siqvw x sinduxjd pvzjdw po axffojdz xgxo wsxcc wuidvw.โ€ As shown in the figure below, using a website such as quipqiup, it will take a moment to discover that the original text was โ€œThe man who moves a mountain begins by carrying away small stones.โ€ This example clearly indicates that this algorithm is broken and should not be used for confidential communication.

Screenshot of the quipquip website

Task 2ย ย Symmetric Encryption

Letโ€™s review some terminology:

  • Cryptographic Algorithm or Cipher: This algorithm defines the encryption and decryption processes.
  • Key: The cryptographic algorithm needs a key to convert the plaintext into ciphertext and vice versa.
  • plaintext is the original message that we want to encrypt
  • ciphertext is the message in its encrypted form

A symmetric encryption algorithm uses the same key for encryption and decryption. Consequently, the communicating parties need to agree on a secret key before being able to exchange any messages.

In the following figure, the sender provides the encrypt process with the plaintext and the key to get the ciphertext. The ciphertext is usually sent over some communication channel.

General block diagram of encryption using a secret key

On the other end, the recipient provides the decrypt process with the same key used by the sender to recover the original plaintext from the received ciphertext. Without knowledge of the key, the recipient wonโ€™t be able to recover the plaintext.

General block diagram of decryption using a secret key

National Institute of Standard and Technology (NIST) published the Data Encryption Standard (DES) in 1977. DES is a symmetric encryption algorithm that uses a key size of 56 bits. In 1997, a challenge to break a message encrypted using DES was solved. Consequently, it was demonstrated that it had become feasible to use a brute-force search to find the key and break a message encrypted using DES. In 1998, a DES key was broken in 56 hours. These cases indicated that DES could no longer be considered secure.

NIST published the Advanced Encryption Standard (AES) in 2001. Like DES, it is a symmetric encryption algorithm; however, it uses a key size of 128, 192, or 256 bits, and it is still considered secure and in use today. AES repeats the following four transformations multiple times:

  1. SubBytes(state): This transformation looks up each byte in a given substitution table (S-box) and substitutes it with the respective value. The state is 16 bytes, i.e., 128 bits, saved in a 4 by 4 array.
  2. ShiftRows(state): The second row is shifted by one place, the third row is shifted by two places, and the fourth row is shifted by three places. This is shown in the figure below.
  3. MixColumns(state): Each column is multiplied by a fixed matrix (4 by 4 array).
  4. AddRoundKey(state): A round key is added to the state using the XOR operation.
Illustration of the ShiftRows function when applied on a four by four array

The total number of transformation rounds depends on the key size.

Donโ€™t worry if you find this cryptic because it is! Our purpose is not to learn the details of how AES works nor to implement it as a programming library; the purpose is to appreciate the difference in complexity between ancient encryption algorithms and modern ones. If you are curious to dive into details, you can check the AES specifications, including pseudocode and examples in its published standard, FIPS PUB 197.

In addition to AES, many other symmetric encryption algorithms are considered secure. Here is a list of symmetric encryption algorithms supported by GPG (GnuPG) 2.37.7, for example:

Encryption AlgorithmNotes
AES, AES192, and AES256AES with a key size of 128, 192, and 256 bits
IDEAInternational Data Encryption Algorithm (IDEA)
3DESTriple DES (Data Encryption Standard) and is based on DES. We should note that 3DES will be deprecated in 2023 and disallowed in 2024.
CAST5Also known as CAST-128. Some sources state that CASE stands for the names of its authors: Carlisle Adams and Stafford Tavares.
BLOWFISHDesigned by Bruce Schneier
TWOFISHDesigned by Bruce Schneier and derived from Blowfish
CAMELLIA128, CAMELLIA192, and CAMELLIA256Designed by Mitsubishi Electric and NTT in Japan. Its name is derived from the flower camellia japonica.

All the algorithms mentioned so far are block cipher symmetric encryption algorithms. A block cipher algorithm converts the input (plaintext) into blocks and encrypts each block. A block is usually 128 bits. In the figure below, we want to encrypt the plaintext โ€œTANGO HOTEL MIKEโ€, a total of 16 characters. The first step is to represent it in binary. If we use ASCII, โ€œTโ€ is 0x54 in hexadecimal format, โ€œAโ€ is 0x41, and so on. Every two hexadecimal digits constitute 8 bits and represent one byte. A block of 128 bits is practically 16 bytes and is represented in a 4 by 4 array. The 128-bit block is fed as one unit to the encryption method.

Example of a block cipher encryption algorithm applied on a four by four array

The other type of symmetric encryption algorithm is stream ciphers, which encrypt the plaintext byte by byte. Consider the case where we want to encrypt the message โ€œTANGO HOTEL MIKEโ€; each character needs to be converted to its binary representation. If we use ASCII, โ€œTโ€ is 0x54 in hexadecimal, while โ€œAโ€ is 0x41, and so on. The encryption method will process one byte at a time. This is represented in the figure below.

Example of a stream cipher encryption algorithm applied on an array of bytes

Symmetric encryption solves many security problems discussed in the Security Principles room. Letโ€™s say that Alice and Bob met and chose an encryption algorithm and agreed on a specific key. We assume that the selected encryption algorithm is secure and that the secret key is kept safe. Letโ€™s take a look at what we can achieve:

  • Confidentiality: If Eve intercepted the encrypted message, she wouldnโ€™t be able to recover the plaintext. Consequently, all messages exchanged between Alice and Bob are confidential as long as they are sent encrypted.
  • Integrity: When Bob receives an encrypted message and decrypts it successfully using the key he agreed upon with Alice, Bob can be sure that no one could tamper with the message across the channel. When using secure modern encryption algorithms, any minor modification to the ciphertext would prevent successful decryption or would lead to gibberish as plaintext.
  • Authenticity: Being able to decrypt the ciphertext using the secret key also proves the authenticity of the message because only Alice and Bob know the secret key.

We are just getting started, and we know how to maintain confidentiality, check the integrity and ensure the authenticity of the exchanged messages. More practical and efficient approaches will be presented in later tasks. The question, for now, is whether this is scalable.

With Alice and Bob, we needed one key. If we have Alice, Bob, and Charlie, we need three keys: one for Alice and Bob, another for Alice and Charlie, and a third for Bob and Charlie. However, the number of keys grows quickly; communication between 100 users requires almost 5000 different secret keys. (If you are curious about the mathematics behind it, thatโ€™s 99 + 98 + 97 + โ€ฆ + 1 = 4950).

Moreover, if one system gets compromised, they need to create new keys to be used with the other 99 users. Another problem would be finding a secure channel to exchange the keys with all the other users. Obviously, this quickly grows out of hand.

In the next task, we will cover asymmetric encryption. One of the problems solved with asymmetric encryption is when 100 users only need to share a total of 100 keys to communicate securely. (As explained earlier, symmetric encryption would require around 5000 keys to secure the communications for 100 users.)

There are many programs available for symmetric encryption. We will focus on two, which are widely used for asymmetric encryption as well:

  • GNU Privacy Guard
  • OpenSSL Project

GNU Privacy Guard

The GNU Privacy Guard, also known as GnuPG or GPG, implements the OpenPGP standard.

We can encrypt a file using GnuPG (GPG) using the following command:

gpg --symmetric --cipher-algo CIPHER message.txt, where CIPHER is the name of the encryption algorithm. You can check supported ciphers using the command gpg --version. The encrypted file will be saved as message.txt.gpg.

The default output is in the binary OpenPGP format; however, if you prefer to create an ASCII armoured output, which can be opened in any text editor, you should add the option --armor. For example, gpg --armor --symmetric --cipher-algo CIPHER message.txt.

You can decrypt using the following command:

gpg --output original_message.txt --decrypt message.gpg

OpenSSL Project

The OpenSSL Project maintains the OpenSSL software.

We can encrypt a file using OpenSSL using the following command:

openssl aes-256-cbc -e -in message.txt -out encrypted_message

We can decrypt the resulting file using the following command:

openssl aes-256-cbc -d -in encrypted_message -out original_message.txt

To make the encryption more secure and resilient against brute-force attacks, we can add -pbkdf2 to use the Password-Based Key Derivation Function 2 (PBKDF2); moreover, we can specify the number of iterations on the password to derive the encryption key using -iter NUMBER. To iterate 10,000 times, the previous command would become:

openssl aes-256-cbc -pbkdf2 -iter 10000 -e -in message.txt -out encrypted_message

Consequently, the decryption command becomes:

openssl aes-256-cbc -pbkdf2 -iter 10000 -d -in encrypted_message -out original_message.txt

In the following questions, we will use gpg and openssl on the AttackBox to carry out symmetric encryption.

The necessary files for this task are located underย /root/Rooms/cryptographyintro/task02.ย The zip file attached to this task can be used to tackle the questions of tasks 2, 3, 4, 5, and 6.

Task 3ย ย Asymmetric Encryption

Symmetric encryption requires the users to find a secure channel to exchange keys. By secure channel, we are mainly concerned with confidentiality and integrity. In other words, we need a channel where no third party can eavesdrop and read the traffic; moreover, no one can change the sent messages and data.

Asymmetric encryption makes it possible to exchange encrypted messages without a secure channel; we just need a reliable channel. By reliable channel, we mean that we are mainly concerned with the channelโ€™s integrity and not confidentiality.

When using an asymmetric encryption algorithm, we would generate a key pair: a public key and a private key. The public key is shared with the world, or more specifically, with the people who want to communicate with us securely. The private key must be saved securely, and we must never let anyone access it. Moreover, it is not feasible to derive the private key despite the knowledge of the public key.

How does this key pair work?

If a message is encrypted with one key, it can be decrypted with the other. In other words:

  • If Alice encrypts a message using Bobโ€™s public key, it can be decrypted only using Bobโ€™s private key.
  • Reversely, if Bob encrypts a message using his private key, it can only be decrypted using Bobโ€™s public key.

Confidentiality

We can use asymmetric encryption to achieve confidentiality by encrypting the messages using the recipientโ€™s public key. In the following two figures, we can see that:

Alice wants to ensure confidentiality in her communication with Bob. She encrypts the message using Bobโ€™s public key, and Bob decrypts them using his private key. Bobโ€™s public key is expected to be published on a public database or on his website, for instance.

When using asymmetric encryption, Alice encrypts the messages using Bob's public key before sending them to Bob. Bob decrypts the messages using his private key.

When Bob wants to reply to Alice, he encrypts his messages using Aliceโ€™s public key, and Alice can decrypt them using her private key.

When using asymmetric encryption, Bob encrypts the messages using Alice's public key before sending them to Alice. Alice decrypts the messages using her private key.

In other words, it becomes easy to communicate with Alice and Bob while ensuring the confidentiality of the messages. The only requirement is that all parties have their public keys available for interested senders.

Note: In practice, symmetric encryption algorithms allow faster operations than asymmetric encryption; therefore, we will cover later how we can use the best of both worlds.

Integrity, Authenticity, and Nonrepudiation

Beyond confidentiality, asymmetric encryption can solve integrity, authenticity and nonrepudiation. Letโ€™s say that Bob wants to make a statement and wants everyone to be able to confirm that this statement indeed came from him. Bob needs to encrypt the message using his private key; the recipients can decrypt it using Bobโ€™s public key. If the message decrypts successfully with Bobโ€™s public key, it means that the message was encrypted using Bobโ€™s private key. (In practice, he would encrypt a hash of the original message. We will elaborate on this later.)

Being decrypted successfully using Bobโ€™s public key leads to a few interesting conclusions.

  • First, the message was not altered across the way (communication channel); this proves the message integrity.
  • Second, knowing that no one has access to Bobโ€™s private key, we can be sure that this message did indeed come from Bob; this proves the message authenticity.
  • Finally, because no one other than Bob has access to Bobโ€™s private key, Bob cannot deny sending this message; this establishes nonrepudiation.
To prove authenticity using asymmetric encryption, Bob encrypts the message using his private key and the recipients can decrypt it using Bobโ€™s public key.

We have seen how asymmetric encryption can help establish confidentiality, integrity, authenticity, and nonrepudiation. In real-life scenarios, asymmetric encryption can be relatively slow to encrypt large files and vast amounts of data. In another task, we will see how we can use asymmetric encryption in conjunction with symmetric encryption to achieve these security objectives relatively faster.

RSA

RSA got its name from its inventors, Rivest, Shamir, and Adleman. It works as follows:

  1. Choose two random prime numbers, p and q. Calculate Nโ€„=โ€„pโ€…ร—โ€…q.
  2. Choose two integers e and d such that eโ€…ร—โ€…dโ€„=โ€„1 mod ฯ•(N), where ฯ•(N)โ€„=โ€„Nโ€…โˆ’โ€…pโ€…โˆ’โ€…qโ€…+โ€…1. This step will let us generate the public key (N,e) and the private key (N,d).
  3. The sender can encrypt a value x by calculating yโ€„=โ€„xe mod N. (Modulus)
  4. The recipient can decrypt y by calculating xโ€„=โ€„yd mod N. Note that ydโ€„=โ€„xedโ€„=โ€„xkฯ•(N)โ€…+โ€…1โ€„=โ€„(xฯ•(N))kโ€…ร—โ€…xโ€„=โ€„x. This step explains why we put a restriction on the choice of e and d.

Donโ€™t worry if the above mathematical equations looked too complicated; you donโ€™t need mathematics to be able to use RSA, as it is readily available via programs and programming libraries.

RSA security relies on factorization being a hard problem. It is easy to multiply p by q; however, it is time-consuming to find p and q given N. Moreover, for this to be secure, p and q should be pretty large numbers, for example, each being 1024 bits (thatโ€™s a number with more than 300 digits). It is important to note that RSA relies on secure random number generation, as with other asymmetric encryption algorithms. If an adversary can guess p and q, the whole system would be considered insecure.

Letโ€™s consider the following practical example.

  1. Bob chooses two prime numbers: pโ€„=โ€„157 and qโ€„=โ€„199. He calculates Nโ€„=โ€„31243.
  2. With ฯ•(N)โ€„=โ€„Nโ€…โˆ’โ€…pโ€…โˆ’โ€…qโ€…+โ€…1โ€„=โ€„31243โ€…โˆ’โ€…157โ€…โˆ’โ€…199โ€…+โ€…1โ€„=โ€„30888, Bob selects eโ€„=โ€„163 and dโ€„=โ€„379 where eโ€…ร—โ€…dโ€„=โ€„163โ€…ร—โ€…379โ€„=โ€„61777 and 61777 mod 30888โ€„=โ€„1. The public key is (31243,163) and the private key is (31243,379).
  3. Letโ€™s say that the value to encrypt is xโ€„=โ€„13, then Alice would calculate and send yโ€„=โ€„xe mod Nโ€„=โ€„13163 mod 31243โ€„=โ€„16342.
  4. Bob will decrypt the received value by calculating xโ€„=โ€„yd mod Nโ€„=โ€„16341379 mod 31243โ€„=โ€„13.

The previous example was to understand the mathematics behind it better. To see real values forย pย andย q, letโ€™s create a real keypair using a tool such asย openssl.

We executed three commands:

  • openssl genrsa -out private-key.pem 2048: With openssl, we used genrsa to generate an RSA private key. Using -out, we specified that the resulting private key is saved as private-key.pem. We added 2048 to specify a key size of 2048 bits.
  • openssl rsa -in private-key.pem -pubout -out public-key.pem: Using openssl, we specified that we are using the RSA algorithm with the rsa option. We specified that we wanted to get the public key using -pubout. Finally, we set the private key as input using -in private-key.pem and saved the output using -out public-key.pem.
  • openssl rsa -in private-key.pem -text -noout: We are curious to see real RSA variables, so we used -text -noout. The values of pqNe, and d are prime1prime2moduluspublicExponent, and privateExponent, respectively.

If we already have the recipientโ€™s public key, we can encrypt it with the command openssl pkeyutl -encrypt -in plaintext.txt -out ciphertext -inkey public-key.pem -pubin

The recipient can decrypt it using the commandย openssl pkeyutl -decrypt -in ciphertext -inkey private-key.pemย -out decrypted.txt

Task 4ย ย Diffie-Hellman Key Exchange

Alice and Bob can communicate over an insecure channel. By insecure, we mean that there are eavesdroppers who can read the messages exchanged on this channel. How can Alice and Bob agree on a secret key in such a setting? One way would be to use the Diffie-Hellman key exchange.

Diffie-Hellman is an asymmetric encryption algorithm. It allows the exchange of a secret over a public channel. We will skip the modular arithmetic background and provide a simple numeric example. We will need two mathematical operations: power and modulus. xp, i.e., x raised to the power p, is x multiplied by itself p times. Furthermore, x mod m, i.e., x modulus m, is the remainder of the division of x by m.

  1. Alice and Bob agree on q and g. For this to work, q should be a prime number, and g is a number smaller than q that satisfies certain conditions. (In modular arithmetic, g is a generator.) In this example, we take qโ€„=โ€„29 and gโ€„=โ€„3.
  2. Alice chooses a random number a smaller than q. She calculates Aโ€„=โ€„(ga) mod q. The number a must be kept a secret; however, A is sent to Bob. Letโ€™s say that Alice picks the number aโ€„=โ€„13 and calculates Aโ€„=โ€„313%29โ€„=โ€„19 and sends it to Bob.
  3. Bob picks a random number b smaller than q. He calculates Bโ€„=โ€„(gb) mod q. Bob must keep b a secret; however, he sends B to Alice. Letโ€™s consider the case where Bob chooses the number bโ€„=โ€„15 and calculates Bโ€„=โ€„315%29โ€„=โ€„26. He proceeds to send it to Alice.
  4. Alice receives B and calculates keyโ€„=โ€„Ba mod q. Numeric example keyโ€„=โ€„2613 mod 29โ€„=โ€„10.
  5. Bob receives A and calculates keyโ€„=โ€„Ab mod q. Numeric example keyโ€„=โ€„1915 mod 29โ€„=โ€„10.

We can see that Alice and Bob reached the same key.

Although an eavesdropper has learned the values of q,โ€†g,โ€†A, and B, they wonโ€™t be able to calculate the secret key that Alice and Bob have exchanged. The above steps are summarized in the figure below.

Graphical illustration showing a numeric example of the five steps of Diffie-Hellman

Although the numbers we have chosen make it easy to find a and b, even without using a computer, real-world examples would select a q of 256 bits in length. In decimal numbers, thatโ€™s 115 with 75 zeroes to its right (I donโ€™t know how to read that either, but I was told it is read as 115 quattuorvigintillion). Such a large q will make it infeasible to find a or b despite knowledge of q,โ€†g,โ€†A, and B.

Letโ€™s take a look at actual Diffie-Hellman parameters. We can use openssl to generate them; we need to specify the option dhparam to indicate that we want to generate Diffie-Hellman parameters along with the specified size in bits, such as 2048 or 4096.

In the console output below, we can view the prime numberย Pย and the generatorย Gย using the commandย openssl dhparam -in dhparams.pem -text -noout. (This is similar to what we did with the RSA private key.)

Diffie-Hellman key exchange algorithm allows two parties to agree on a secret over an insecure channel. However, the discussed key exchange is prone to a Man-in-the-Middle (MitM) attack; an attacker might reply to Alice pretending to be Bob and reply to Bob pretending to be Alice. We discuss a solution to this problem in Task 6.

Task 5ย ย Hashing

A cryptographic hash function is an algorithm that takes data of arbitrary size as its input and returns a fixed size value, called message digest or checksum, as its output. For example, sha256sum calculates the SHA256 (Secure Hash Algorithm 256) message digest. SHA256, as the name indicates, returns a checksum of size 256 bits (32 bytes). This checksum is usually written using hexadecimal digits. Knowing that a hexadecimal digit represents 4 bits, the 256 bits checksum can be represented as 64 hexadecimal digits.

In the terminal output below, we calculate the SHA256 hash values for three files of varying sizes: 4 bytes, 275 MB, and 5.2 GB. Usingย sha256sumย to calculate the message digest for each of the three files, we get three completely different values that appear random. It is worth stressing that the length of the resulting message digest or checksum is the same, no matter how small or big the file is. In particular, the four-byte fileย abc.txtย and the 5.2 GB file resulted in message digests of equal length independent of the file size.

But why would we need such a function? There are many uses, in particular:

  • Storing passwords: Instead of storing passwords in plaintext, a hash of the password is stored instead. Consequently, if a data breach occurs, the attacker will get a list of password hashes instead of the original passwords. (In practice, passwords are also โ€œsaltedโ€, as discussed in a later task.)
  • Detecting modifications: Any minor modification to the original file would lead to a drastic change in hash value, i.e. checksum.

In the following terminal output, we have two files,ย text1.txtย andย text2.txt, which are almost identical except for (literally) one bit being different; the lettersย Tย andย tย are different in one bit in their ASCII representation. Even though we have flipped only a single bit, it is evident that the SHA256 checksums are entirely different. Consequently, if we use a secure hash function algorithm, we can easily confirm whether any modifications have taken place. This can help protect against both intentional tampering and file transfer errors.

Some of the hashing algorithms in use and still considered secure are:

  • SHA224, SHA256, SHA384, SHA512
  • RIPEMD160

Some older hash functions, such as MD5 (Message Digest 5) and SHA-1, are cryptographically broken. By broken, we mean that it is possible to generate a different file with the same checksum as a given file. This means that we can create a hash collision. In other words, an attacker can create a new message with a given checksum, and detecting file or message tampering wonโ€™t be possible.

HMAC

Hash-based message authentication code (HMAC) is a message authentication code (MAC) that uses a cryptographic key in addition to a hash function.

According to RFC2104, HMAC needs:

  • secret key
  • inner pad (ipad) a constant string. (RFC2104 uses the byte 0x36 repeated B times. The value of B depends on the chosen hash function.)
  • outer pad (opad) a constant string. (RFC2104 uses the byte 0x5C repeated B times.)

Calculating the HMAC follows the following steps as shown in the figure:

  1. Append zeroes to the key to make it of length B, i.e., to make its length match that of the ipad.
  2. Using bitwise exclusive-OR (XOR), represented by โŠ•, calculate keyโ€…โŠ•โ€…ipad.
  3. Append the message to the XOR output from step 2.
  4. Apply the hash function to the resulting stream of bytes (in step 3).
  5. Using XOR, calculate keyโ€…โŠ•โ€…opad.
  6. Append the hash function output from step 4 to the XOR output from step 5.
  7. Apply the hash function to the resulting stream of bytes (in step 6) to get the HMAC.
Graphical illustration showing how an HMAC is calculated

The figure above represents the steps expressed in the following formula: H(KโŠ•opad,H(KโŠ•ipad,text)).

To calculate the HMAC on a Linux system, you can use any of the available tools such asย hmac256ย (orย sha224hmac,ย sha256hmac,ย sha384hmac, andย sha512hmac, where the secret key is added after the optionย --key). Below we show an example of calculating the HMAC usingย hmac256ย andย sha256hmacย with two different keys.

Task 6ย ย PKI and SSL/TLS

Using a key exchange such as the Diffie-Hellman key exchange allows us to agree on a secret key under the eyes and ears of eavesdroppers. This key can be used with a symmetric encryption algorithm to ensure confidential communication. However, the key exchange we described earlier is not immune to Man-in-the-Middle (MITM) attack. The reason is that Alice has no way of ensuring that she is communicating with Bob, and Bob has no way of ensuring that he is communicating with Alice when exchanging the secret key.

Consider the figure below. It is an attack against the key exchange explained in the Diffie-Hellman Key Exchange task. The steps are as follows:

  1. Alice and Bob agree on q and g. Anyone listening on the communication channel can read these two values, including the attacker, Mallory.
  2. As she would normally do, Alice chooses a random variable a, calculates A ( Aโ€„=โ€„(ga) mod q) and sends A to Bob. Mallory has been waiting for this step, and she has selected a random variable m and calculated the respective M. As soon as Mallory receives A, she sends M to Bob, pretending she is Alice.
  3. Bob receives M thinking that Alice sent it. Bob has already picked a random variable b and calculated the respective B; he sends B to Alice. Similarly, Mallory intercepts the message, reads B and sends M to Alice instead.
  4. Alice receives M and calculates keyโ€„=โ€„Ma mod q.
  5. Bob receives M and calculates keyโ€„=โ€„Mb mod q.

Alice and Bob continue to communicate, thinking that they are communicating directly, unaware that they are communicating with Mallory, who can read and modify the messages before sending them to the intended recipient.

Illustration showing the Man-in-the-Middle attack against Diffie-Hellman Key Exchange

This susceptibility necessitates some mechanism that would allow us to confirm the other partyโ€™s identity. This brings us to Public Key Infrastructure (PKI).

Consider the case where you are browsing the website example.org over HTTPS. How can you be confident that you are indeed communicating with the example.org server(s)? In other words, how can you be sure that no man-in-the-middle intercepted the packets and altered them before they reached you? The answer lies in the website certificate.

The figure below shows the page we get when browsing example.org. Most browsers represent the encrypted connection with some kind of a lock icon. This lock icon indicates that the connection is secured over HTTPS with a valid certificate.

Screenshot of a browser showing a lock icon for an encrypted connection

At the time of writing, example.org uses a certificate signed by DigiCert Inc., as shown in the figure below. In other words, DigiCert confirms that this certificate is valid (till a certain date).

Screenshot showing the validity of a website certificate

For a certificate to get signed by a certificate authority, we need to:

  1. Generate Certificate Signing Request (CSR): You create a certificate and send your public key to be signed by a third party.
  2. Send your CSR to a Certificate Authority (CA): The purpose is for the CA to sign your certificate. The alternative and usually insecure solution would be to self-sign your certificate.

For this to work, the recipient should recognize and trust the CA that signed the certificate. And as we would expect, our browser trusts DigiCert Inc as a signing authority; otherwise, it would have issued a security warning instead of proceeding to the requested website.

Screenshot showing the certificate authorities trusted by a web browser

You can use openssl to generate a certificate signing request using the command openssl req -new -nodes -newkey rsa:4096 -keyout key.pem -out cert.csr. We used the following options:

  • req -new create a new certificate signing request
  • -nodes save private key without a passphrase
  • -newkey generate a new private key
  • rsa:4096 generate an RSA key of size 4096 bits
  • -keyout specify where to save the key
  • -out save the certificate signing request

Then you will be asked to answer a series of questions, as shown in the console output below.

Once the CSR file is ready, you can send it to a CA of your choice to get it signed and ready to use on your server.

Once the client, i.e., the browser, receives a signed certificate it trusts, the SSL/TLS handshake takes place. The purpose would be to agree on the ciphers and the secret key.

We have just described how PKI applies to the web and SSL/TLS certificates. A trusted third party is necessary for the system to be scalable.

For testing purposes, we have created a self-signed certificate. For example, the following command will generate a self-signed certificate.

openssl req -x509 -newkey -nodes rsa:4096 -keyout key.pem -out cert.pem -sha256 -days 365

The -x509 indicates that we want to generate a self-signed certificate instead of a certificate request. The -sha256 specifies the use of the SHA-256 digest. It will be valid for one year as we added -days 365.

To answer the questions below, you need to inspect the certificate file cert.pem in the task06 directory. You can use the following command to view your certificate:

openssl x509 -in cert.pem -text

Task 7ย ย Authenticating with Passwords

Letโ€™s see how cryptography can help increase password security. With PKI and SSL/TLS, we can communicate with any server and provide our login credentials while ensuring that no one can read our passwords as they move across the network. This is an example of protecting data in transit. Letโ€™s explore how we can safeguard passwords as they are saved in a database, i.e., data at rest.

The least secure method would be to save the username and the password in a database. This way, any data breach would expose the usersโ€™ passwords. No effort is required beyond reading the database containing the passwords.

Usernamepassword
aliceqwerty
bobdragon
charlieprincess

The improved approach would be to save the username and a hashed version of the password in a database. This way, a data breach will expose the hashed versions of the passwords. Since a hash function is irreversible, the attacker needs to keep trying different passwords to find the one that would result in the same hash. The table below shows the MD5 sum of the passwords. (We chose MD5 just to keep the password field small for the example; otherwise, we would have used SHA256 or something more secure.)

UsernameHash(Password)
aliced8578edf8458ce06fbc5bb76a58c5ca4
bob8621ffdbc5698829397d97767ac13db3
charlie8afa847f50a716e64932d995c8e7435a

The previous approach looks secure; however, the availability of rainbow tables has made this approach insecure. A rainbow table contains a list of passwords along with their hash value. Hence, the attacker only needs to look up the hash to recover the password. For example, it would be easy to look up d8578edf8458ce06fbc5bb76a58c5ca4 to discover the original password of alice. Consequently, we need to find more secure approaches to save passwords securely; we can add salt. A salt is a random value we can append to the password before hashing it. An example is shown below.

UsernameHash(Password + Salt)Salt
alice8a43db01d06107fcad32f0bcfa651f2f12742
bobaab2b680e6a1cb43c79180b3d1a38beb22861
charlie3a40d108a068cdc8e7951b82d312129b16056

The table above used hash(password + salt); another approach would be to use hash(hash(password) + salt). Note that we used a relatively small salt along with the MD5 hash function. We should switch to a (more) secure hash function and a large salt for better security if this were an actual setup.

Another improvement we can make before saving the password is to use a key derivation function such as PBKDF2 (Password-Based Key Derivation Function 2). PBKDF2 takes the password and the salt and submits it through a certain number of iterations, usually hundreds of thousands.

We recommend you check theย Password Storage Cheat Sheetย if you like to learn about other techniques related to password storage.

Task 8ย ย Cryptography and Data – Example

In this task, we would like to explore what happens when we log into a website over HTTPS.

  1. Client requests serverโ€™s SSL/TLS certificate
  2. Server sends SSL/TLS certificate to the client
  3. Client confirms that the certificate is valid

Cryptographyโ€™s role starts with checking the certificate. For a certificate to be considered valid, it means it is signed. Signing means that a hash of the certificate is encrypted with the private key of a trusted third party; the encrypted hash is appended to the certificate.

If the third party is trusted, the client will use the third partyโ€™s public key to decrypt the encrypted hash and compare it with the certificateโ€™s hash. However, if the third party is not recognized, the connection will not proceed automatically.

Once the client confirms that the certificate is valid, an SSL/TLS handshake is started. This handshake allows the client and the server to agree on the secret key and the symmetric encryption algorithm, among other things. From this point onward, all the related session communication will be encrypted using symmetric encryption.

The final step would be to provide login credentials. The client uses the encrypted SSL/TLS session to send them to the server. The server receives the username and password and needs to confirm that they match.

Following security guidelines, we expect the server to save a hashed version of the password after appending a random salt to it. This way, if the database were breached, the passwords would be challenging to recover.

Task 9ย ย Conclusion

Cryptography is a vast topic. In this room, we tried to focus on the core concepts that would help you understand the commonly used terms in cryptography. This knowledge is vital to understanding the configuration options of systems that use encryption and hashing.

Comments

Leave a Reply

Your email address will not be published. Required fields are marked *