RSA algorithm

RSA is a practical public-key cryptographic algorithm, which is widely used on modern computers to communicate securely over large distances.


The acronym of the algorithm stands for Ron Rivest, Adi Shamir and Leonard Adleman, which first published the algorithm in 1978.

# Algorithm overview

  • Choose `p` and `q` as distinct prime numbers
  • Compute `n` as `n = p*q`
  • Compute `\phi(n)` as `\phi(n) = (p-1) * (q-1)`
  • Choose `e` such that `1 < e < \phi(n)` and `e` and `\phi(n)` are coprime
  • Compute the value of `d` as `d ≡ e^(-1) mod \phi(n)`
  • Public key is `(e, n)`
  • Private key is `(d, n)`
  • The encryption of `m` as `c`, is `c ≡ m^e mod n`
  • The decryption of `c` as `m`, is `m ≡ c^d mod n`

# Generating `p` and `q`

In order to generate a public and a private key, the algorithm requires two distinct prime numbers `p` and `q`, which are randomly chosen and should have, roughly, the same number of bits. By today standards, it is recommended that each prime number to have at least 2048 bits.

In Perl, there is a module for this, called ntheory, created by Dana Jacobsen, which can construct such prime numbers:

use 5.010;
use strict;
use warnings;

use ntheory qw(random_strong_prime);

my $p = random_strong_prime(2048);
my $q = random_strong_prime(2048);

# Computing `n`

Once we have the prime numbers `p` and `q`, we can compute the value of `n`, which is simply `p*q`.

my $n = ($p * $q);

# Computing `\phi(n)`

The next step is to find `\phi(n)`. This function is called Euler's totient function, and it counts how many numbers bellow `n` are relatively prime to `n`.

For example: `\phi(6) = 2`, because there are only two numbers bellow `6` which are relatively prime to `6` (i.e. they don't share any common factors with `6`); those numbers are `1` and `5`.

For a prime number `p`, it is easy to see that `\phi(p) = p-1`, as all the numbers bellow `p` are relatively prime to `p`.

One important property of the `\phi(n)` function, is the fact that it is multiplicative, which means that:

`\phi(p*q) = \phi(p) * \phi(q)`

In our case, as `p` and `q` are prime numbers, it reduces to:

`\phi(p*q) = (p-1) * (q-1)`

which in code is:
my $phi = ($p - 1) * ($q - 1);

# Choosing `e`

Now we can choose a value for the public exponent `e`, which must have the following two properties:

  • `1 < e < \phi(n)`
  • `gcd(e, \phi(n)) = 1`

where `\gcd(a, b)` can be implemented as (using Euclid's algorithm):

sub gcd($$) {
    my ($u, $v) = @_;
    while ($v) {
        ($u, $v) = ($v, $u % $v);
    }
    return abs($u);
}

For efficiency reasons, `e` is commonly chosen to have a short bit-length and a small Hamming weight value. Usually, `e = 2^k + 1`, where `k`, in most cases, is `16`.

In general, we can do:
my $e = 0;
for (my $k = 16 ; gcd($e, $phi) != 1 ; ++$k) {
    $e = 2**$k + 1;
}

When efficiency is not important, we can randomly choose `e` in the interval `[3, \phi(n)-1]`, with the property that `e` is coprime to `\phi(n)`.

One important thing to consider, is the fact that there exists values of `e` and `m`, such that `m ≡ m^e mod n`. To ensure that this happens only for a minimum number of messages, the chosen value of `e` should also satisfy the following two properties:

  • `gcd(e-1, p-1) = 2`
  • `gcd(e-1, q-1) = 2`

For every value of `e`, coprime to `\phi(n)`, that passes the above two tests, there exists only `9` messages `m` (where `0 <= m < n`), for which `m ≡ m^e mod n`.

# Finding `d`

The next thing is to find a value for the private decryption exponent, `d`.

From the definition:

`d ≡ e^(-1) mod \phi(n)`

which is the modular multiplicative inverse of `e` modulo `\phi(n)`, and can efficiently be implemented as:

sub invmod($$) {
    my ($a, $n) = @_;
    my ($t, $nt, $r, $nr) = (0, 1, $n, $a % $n);
    while ($nr != 0) {
        my $quot = int(($r - ($r % $nr)) / $nr);
        ($nt, $t) = ($t - $quot * $nt, $nt);
        ($nr, $r) = ($r - $quot * $nr, $nr);
    }
    return if $r > 1;
    $t += $n if $t < 0;
    return $t;
}


This gives us a value for `d`, as:
my $d = invmod($e, $phi);

# Exporting the keys

From `e`, `d` and `n`, we create the private and public key as following:
  • `(e, n)` is the public key
  • `(d, n)` is the private key

# Encryption

If we have a message `m` (which is a number), we can encrypt it with the public key, as:

`c ≡ m^e mod n`

This operation is called a modular exponentiation, and can efficiently be computed with the following algorithm:

sub expmod($$$) {
    my ($a, $b, $n) = @_;
    my $c = 1;
    do {
        ($c *= $a) %= $n if $b & 1;
        ($a *= $a) %= $n;
    } while ($b >>= 1);
    return $c;
}

For security reasons, it is highly recommended `m` to have roughly the same size in bits as `n`, in particular when `e` is small (with the inequality `1 < m < n-1`).

Just for illustration, if our message `m` is `1234`, we encrypt it as:

my $m = 1234;
my $c = expmod($m, $e, $n);

# Decryption

From the encrypted message `c`, we can retrieve the original message `m`, using our private decryption exponent, `d`:

`m ≡ c^d mod n`

In code, this is:
my $M = expmod($c, $d, $n);

# Creating `m`

A given string-message `s`, can efficiently be converted into a number `m`, by converting the string into a sequence of bits, which are then converted into a number:

use Math::AnyNum;

# String to number
my $s = 'RSA';
my $b = '1'.unpack('b*', $s);
my $m = Math::AnyNum->new($b, 2);

# Number to string
my $S = pack('b*', substr($m->as_bin, 1));

This is just one out of many ways for how this can be done.

# Security

The security of the RSA encryption is based on the assumption that nobody knows how to factor a large semiprime into its original two prime factors in a reasonable amount of time. For example, factoring a 4096-bit semiprime into its 2048-bit prime factors, can take hundreds or thousands of years on a classical computer, therefore the algorithm is considered secure in practice.

However, if we will ever find an efficient algorithm to factor such semiprimes, then the algorithm will no longer be of any use, as once we can factor `n`, we can then compute `d`, which would allow us to decrypt any message that was encrypted with the public key `(e,n)`.

An alternative attack strategy, is trying to find the value of `m` in the following equation:

`c ≡ m^e mod n`

which requires computing the `e`-th root of `c` modulo `n`, where all the values are known, except for `m`.

However, finding the value of `m` in such an equation, for example:

`1653 ≡ m^65537 mod 2279`

it is assumed to be at least as hard as factoring `n` (see example).

# Conclusion

The public-key property of the RSA algorithm, is exactly what is makes it so special. This property allows us to communicate securely with somebody from the other side of the world, without exchanging any private information in advance.

# Implementations

Bellow we have two implementations of the RSA algorithm. This first implementation is a basic example, which can be used for reference, while the second implementation is a little bit more advanced and can be used to encrypt and decrypt small files.

Basic example
https://github.com/trizen/perl-scripts/blob/master/Math/RSA_example.pl

General purpose
https://github.com/trizen/perl-scripts/blob/master/Encryption/RSA_encryption.pl

# See also

Comments

  1. Hi Daniel, you are currently reviewing one of my sequences :) One thing that got me is I see you are using Euler's Totient function above; when the Carmichael Lamba is preferred as the CML is a divisor of the Totient. Here is an RSA implementation I wrote in Python that uses AES for the bulk data. I LOVE your website!! LOVE! You are a level above. https://codereview.stackexchange.com/questions/214375/generate-public-private-keys-encrypt-decrypt-sign-verify/214437#214437

    ReplyDelete

Post a Comment