Special-purpose factorization algorithms

In this post, we will take a brief look at a nice collection of special-purpose factorization algorithms. Most of them are old and well-known; some of them are new.

By a "special-purpose factorization algorithm", we mean an algorithm that can find one or more factors of a large integer `n` that satisfies a specific special form, without strongly depending on the size of `n`.

When the special conditions are met, a factor is usually found very fast. On the other hand, when the number does not have the required special form, the algorithm may fail to produce a factor in a reasonable amount of time, or fail entirely.

Special-purpose factorization algorithms are commonly used in general-purpose factorization routines as a pre-factorization step, trying to reduce the size of `n` by finding easy factors before sending the number to the heavy modern general-purpose machinery, such as the Quadratic Sieve (QS) or the Number Field Sieve (NFS), whose running time strongly depends on the size of `n`.

The most commonly used special-purpose factorization algorithms are trial division, Pollard's rho method, Pollard's p-1 method and Lenstra's elliptic-curve factorization method (ECM).

# Perfect power check

If `n = r^k`, for some positive integer `r` and some integer `k >= 2`, then, in order to find the factorization of `n`, we only need to factorize `r` and multiply the exponent of each of its prime factors by `k`.

This can be efficiently checked by iterating over each `k`, starting from `k = floor(log_2(n))`, down to `k=2`, and checking if `n^(1/k)` is an integer.

To check whether `n^(1/k)` is an integer, let `r = floor(n^(1/k))`, which can be computed efficiently using Newton's method; if `r^k = n`, then `n` is a perfect power.

This is not a factorization algorithm per se, but it's an important thing to check when factoring an unknown integer.
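As a rough illustration, here is a minimal Python sketch of this check (the post's reference implementations, linked at the end, are in Sidef and Perl; the helper names below are just for illustration):

```python
def iroot(n, k):
    """Integer k-th root: the largest r with r**k <= n (Newton's method)."""
    if n < 2:
        return n
    x = 1 << -(-n.bit_length() // k)           # initial guess >= true root
    while True:
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:
            return x
        x = y

def perfect_power(n):
    """Return (r, k) with r**k == n and k >= 2, or None if n is not a perfect power."""
    for k in range(n.bit_length(), 1, -1):     # k = floor(log_2(n)) down to 2
        r = iroot(n, k)
        if r ** k == n:
            return r, k
    return None

print(perfect_power(3 ** 10))                  # (3, 10)
```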

# Trial division

The simplest way to find a small factor of a given integer `n` is to set a bound `B` (at most `B = floor(sqrt(n))`), iterate over the prime numbers `p` in the range `[2, B]`, and check whether any such `p` divides `n`.

This algorithm is of practical use only for numbers that have at least one small factor (say, smaller than `10^8`).

If multiple numbers need to be checked for small factors, a simple optimization is to compute the primorial of `B` only once: `P = B#`, the product of all the primes in the range `[2, B]`. The pre-computed value of `P` can then be used to compute `d = gcd(P, n)` for multiple values of `n`, where `d` is a divisor of `n` (not necessarily prime), or `1` if `n` has no prime factors in the range `[2, B]`.
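A rough Python sketch of this primorial-gcd optimization (the bound `10^5` and the helper names are arbitrary illustrative choices):

```python
from math import gcd

def primes_up_to(B):
    """Simple sieve of Eratosthenes."""
    sieve = bytearray([1]) * (B + 1)
    sieve[:2] = b"\x00\x00"
    for p in range(2, int(B ** 0.5) + 1):
        if sieve[p]:
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return [p for p in range(2, B + 1) if sieve[p]]

def primorial(B):
    """B# -- the product of all primes up to B."""
    P = 1
    for p in primes_up_to(B):
        P *= p
    return P

P = primorial(10 ** 5)                    # computed only once

def small_factor(n):
    """A divisor of n composed of primes <= 10^5 (possibly 1)."""
    return gcd(P, n)

print(small_factor(2 * 3 * 5 * 100003))   # 30 (100003 is a prime above the bound)
```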

# Difference of squares

By representing a given integer `n` as a difference of two squares:

`n = a^2 - b^2`

it can then be factorized as:

`n = (a + b) (a - b)`

In Fermat's factorization method, we start by setting `a = floor(sqrt(n))` and checking if `(a + k)^2 - n` is a perfect square `b^2`, for `k = 1, 2, 3, ...`; once such a `k` is found, `n = (a + k - b) (a + k + b)`.

This method is effective when `n` has a factor near `sqrt(n)`.
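A minimal Python sketch of Fermat's method, using `math.isqrt` for exact integer square roots (assuming `n` is odd and not a perfect square):

```python
from math import isqrt

def fermat(n, tries=10**6):
    """Fermat's difference-of-squares method for odd n that is not a perfect square."""
    a = isqrt(n)
    for k in range(1, tries + 1):
        d = (a + k) ** 2 - n
        b = isqrt(d)
        if b * b == d:                       # (a+k)^2 - n is a perfect square b^2
            return (a + k - b, a + k + b)
    return None

print(fermat(5959))                          # (59, 101)
```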

# Congruence of squares

An improvement over Fermat's factorization method, noticed by Maurice Kraitchik, is to try to find a congruence of squares modulo `n`, of the form `x^2 ≡ y^2 mod n`, with the condition that `x ≢ ±y mod n`, which gives the non-trivial factors of `n`:

`gcd(x+y, n)` and `gcd(x-y, n)`

As in Fermat's factorization method, we can try to find a congruence of squares by setting `a = floor(sqrt(n))` and checking if `(a + k)^2 mod n` is a perfect square, for some `k >= 1`.

However, a slightly better approach is to use the continued fraction convergents `A_k/B_k` of `sqrt(n)` and to test if `A_k^2 mod n` is a perfect square, for some `k >= 1`.

When the continued fraction convergents `A_k/B_k` of `sqrt(n)` are used, we call it the "Pell factorization method", because it uses the same technique as in solving the Pell equation `x^2 - n y^2 = 1`, which has solutions of the form `A_k^2 - n B_k^2 = 1`, for some `k >= 1`.

This method is effective when `n` has a factor near `sqrt(n)`.
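A rough Python sketch of this idea, generating the convergent numerators `A_k` of the continued fraction of `sqrt(n)` modulo `n` (assuming `n` is odd, composite and not a perfect square; the function name and iteration bound are illustrative):

```python
from math import gcd, isqrt

def pell_factor(n, max_iter=10**5):
    """Test the convergents A_k of sqrt(n) for A_k^2 ≡ (perfect square) mod n."""
    a0 = isqrt(n)
    m, d, a = 0, 1, a0                   # continued fraction state for sqrt(n)
    A_prev, A = 1, a0 % n                # convergent numerators A_{k-1}, A_k (mod n)
    for _ in range(max_iter):
        r = A * A % n
        b = isqrt(r)
        if b * b == r:                   # A_k^2 ≡ b^2 (mod n)
            for g in (gcd(A - b, n), gcd(A + b, n)):
                if 1 < g < n:
                    return g
        m = d * a - m                    # next partial quotient of sqrt(n)
        d = (n - m * m) // d
        a = (a0 + m) // d
        A_prev, A = A, (a * A + A_prev) % n
    return None

print(pell_factor(1000003 * 1000033))    # hopefully prints one of the two prime factors
```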

The idea of creating a congruence of squares is used in many modern general-purpose factorization algorithms, including the CFRAC method, which we described in an earlier post.

# Difference of powers

A generalization of Fermat's factorization method is to check if `n` has the form `n = a^k ± b^k`, for some positive integers `a` and `b`, with odd `k >= 3`. Then `gcd(a - b, n)` (in the `-` case) or `gcd(a + b, n)` (in the `+` case) is a factor of `n`, usually non-trivial.

This method is based on the observation that, for odd `k >= 3`:

`(a^x - b^y)` divides `(a^(k x) - b^(k y))`
`(a^x + b^y)` divides `(a^(k x) + b^(k y))`

Suppose, then, that `n` can be written as:

`n = r_1^(e_1) ± r_2^(e_2)`

We can try to find such a representation by fixing `B = floor(log_2(n))` and iterating over each `k` in the range `[2, B]`.

For each `k`, we check if any of the following values is a perfect power (let's call it `r_2^(e_2)`):

`{n - k^x, k^(x+1) - n, n - y^k, (1+y)^k - n}`

where `x = floor(log_(k)(n))` and `y = floor(n^(1/k))`.

For each such perfect power found, we can represent `n` as `n = r_1^(e_1) ± r_2^(e_2)`.

Next, we iterate over each divisor `d_1` of `e_1` and over each divisor `d_2` of `e_2`, checking if `gcd(r_1^(d_1) ± r_2^(d_2), n)` is a non-trivial factor of `n`.

This method is effective for numbers of the form `a^x ± b^y`, where `x` and `y` have many common divisors.

# Congruence of powers

In this method, we generalize Kraitchik's idea to find a congruence of powers of the form `a^k ≡ b^k mod n`, for some positive integers `a` and `b` and some exponent `k >= 2`. Then `gcd(a - b, n)` or `gcd(a + b, n)` may be a non-trivial factor of `n`.

As in the previous method, we try to find such a congruence by fixing `B = floor(log_2(n))` and iterating over each `k` in the range `[2, B]`.

For each `k`, we check if either one of the following congruences holds (i.e., if the left-hand side, reduced modulo `n`, is a perfect `k`-th power `b^k`):

`x^k ≡ b^k mod n`

`(x+1)^k ≡ b^k mod n`

where `x = floor(n^(1/k))`.

For each such congruence found, we have `a^k ≡ b^k mod n`, where either `gcd(a + b, n)` or `gcd(a - b, n)` may be a non-trivial factor of `n`.
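A rough Python sketch of this search (reusing the same integer `k`-th root helper as in the perfect-power sketch above):

```python
from math import gcd

def iroot(n, k):
    """Integer k-th root (same helper as in the perfect-power sketch)."""
    if n < 2:
        return n
    x = 1 << -(-n.bit_length() // k)
    while True:
        y = ((k - 1) * x + n // x ** (k - 1)) // k
        if y >= x:
            return x
        x = y

def congruence_of_powers(n):
    """Look for a^k ≡ b^k (mod n) with a = floor(n^(1/k)) or a + 1."""
    for k in range(2, n.bit_length() + 1):
        x = iroot(n, k)
        for a in (x, x + 1):
            r = pow(a, k, n)
            b = iroot(r, k)
            if b ** k == r:                          # a^k ≡ b^k (mod n)
                for g in (gcd(a - b, n), gcd(a + b, n)):
                    if 1 < g < n:
                        return g
    return None

print(congruence_of_powers(10**9 - 3**9))            # 973 = 1000 - 27
```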

# Cyclotomic polynomials

For some numbers, we can also use the cyclotomic polynomials `Φ_k(x)` to find factors of special form.

Given a composite integer `n` and a fixed value of `b >= 2`, we iterate over each `k` in the range `2 <= k <= floor(1 + log_b(n))`, checking if `gcd(Φ_k(b), n)` is a non-trivial factor of `n`.

In practice, we can check each value of `b` in the range `2 <= b <= floor(log_2(n))`.
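A rough Python sketch, computing `Φ_k(b)` from the Möbius product formula `Φ_k(b) = Prod_{d | k} (b^d - 1)^(μ(k/d))` (helper names are illustrative):

```python
from math import gcd

def mobius(m):
    """Möbius function via trial factorization (fine for small m)."""
    mu, p = 1, 2
    while p * p <= m:
        if m % p == 0:
            m //= p
            if m % p == 0:
                return 0              # m is not squarefree
            mu = -mu
        p += 1
    return -mu if m > 1 else mu

def divisors(m):
    return [d for d in range(1, m + 1) if m % d == 0]

def cyclotomic(k, b):
    """Phi_k(b) for an integer b >= 2."""
    num = den = 1
    for d in divisors(k):
        mu = mobius(k // d)
        if mu == 1:
            num *= b ** d - 1
        elif mu == -1:
            den *= b ** d - 1
    return num // den

def cyclotomic_factor(n, b):
    """Check gcd(Phi_k(b), n) for k = 2 .. floor(1 + log_b(n))."""
    k = 2
    while b ** k <= b * n:                       # i.e. k <= 1 + log_b(n)
        g = gcd(cyclotomic(k, b) % n, n)
        if 1 < g < n:
            return g
        k += 1
    return None

print(cyclotomic_factor(23 * 1000003, 2))        # 23, since 23 divides Φ_11(2) = 2047
```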

# HOLF method

In Hart's One-Line Factoring Algorithm, we try to find a congruence of squares of the form:

`a^2 ≡ b^2 mod n`

where `a = 1 + floor(sqrt(n k))`, for some positive integer `k`, and `b^2 = a^2 mod n` is a perfect square.

Then `gcd(a - b, n)` may be a non-trivial factor of `n`.

This method is effective for numbers of the form `n = x * (x k ± z)`, where `x` can be arbitrarily large, but `k` and `z` are small.
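A minimal Python sketch of Hart's method:

```python
from math import gcd, isqrt

def holf(n, tries=10**6):
    """Hart's One-Line Factoring: look for a = 1 + floor(sqrt(n k)) with a^2 ≡ b^2 (mod n)."""
    for k in range(1, tries + 1):
        a = 1 + isqrt(n * k)
        m = a * a % n
        b = isqrt(m)
        if b * b == m:                      # a^2 ≡ b^2 (mod n)
            g = gcd(a - b, n)
            if 1 < g < n:
                return g
    return None

print(holf(100003 * 100019))                # 100003
```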

# Carmichael method

In some sense, this is a generalization of the HOLF method and can factorize numbers of the form:

`n = x * \prod_{k=1}^r ((x±1) a_k ± 1)`

for a given value of `r` and a small range `[L, R]` that contains all the values `a_k`, where the set `{a_k}` is a combination of `r` elements from the range `[L, R]`.

This method can be efficiently implemented using the binary search algorithm to find the value `x` in the range:

`2 <= x <= floor(n^(1/(r+1)))`

Numbers of this form include the Carmichael numbers and the Lucas pseudoprimes, and can be factorized relatively fast by this method when `r` is small and the range `[L,R]` is not very wide.

# Miller-Rabin `n-1` method

The Miller-Rabin primality test can sometimes produce a factor of `n`. By writing `n-1` as:

`n-1 = 2^s d`

with `s >= 1` and `d` odd, we try to find `x ≡ a^d mod n`, for some base `a >= 2`, such that `x ≢ ±1 mod n` and `x^(2^k) ≡ 1 mod n`, for some `k` in the range `1 <= k < s`.

If such a congruence is found, then `y = x^(2^(k-1)) mod n` is a square root of `1` modulo `n`; when `y ≢ ±1 mod n`, it factorizes `n` as:

`n = gcd(y+1, n) * gcd(y-1, n)`

Alternatively, a simpler version is to compute `x = a^(2^k d) mod n` and check if `x ± 1` has a common factor with `n`, for some `k` in the range `0 <= k < s` and some random value of `a`.
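A short Python sketch of this simpler version, with random bases (assuming `n` is an odd composite; the function name is illustrative):

```python
from math import gcd
from random import randrange

def mr_factor(n, tries=100):
    """Try to split odd composite n using the Miller-Rabin decomposition of n - 1."""
    d, s = n - 1, 0
    while d % 2 == 0:
        d //= 2
        s += 1                       # n - 1 = 2^s * d, with d odd
    for _ in range(tries):
        a = randrange(2, n - 1)
        x = pow(a, d, n)             # x = a^d mod n
        for _ in range(s):
            for g in (gcd(x - 1, n), gcd(x + 1, n)):
                if 1 < g < n:
                    return g
            x = x * x % n            # a^(2^k d) mod n, for k = 0 .. s-1
    return None

# Carmichael number 561 = 3 * 11 * 17
print(mr_factor(561))                # prints a non-trivial factor, e.g. 3, 33 or 187
```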

This method works best in factoring Carmichael numbers and some Fermat pseudoprimes.

# Lucas-Miller `n+1` method

In the same spirit as the Miller-Rabin factorization method, we can take advantage of the Lucas sequence `U_n(P,Q)`, writing `n+1` as:

`n+1 = 2^s d`

with `s >= 1` and `d` odd, and checking if `U_(d 2^k)(P,Q)` (computed modulo `n`) has a common factor with `n`, for some `k` in the range `0 <= k < s` and some random values of `P` and `Q`.

This method works best in factoring Lucas-Carmichael numbers and some Lucas pseudoprimes.

Another version of this method is the `n-1` variation, in which we write `n-1 = 2^s d`, with `d` odd, and check whether `V_(d 2^k)(P,Q)` or `V_(d 2^k)(P,Q) - P` (computed modulo `n`) has a common factor with `n`; this variation is able to factorize Carmichael numbers and Fermat pseudoprimes.

# FLT factorization method

Inspired by Fermat's Little Theorem (FLT), we can try to find a non-trivial factor of `n` by checking `gcd((2^n mod n) - (2^k mod n), n)` for some small odd `k >= 1`.

With `n = p*q`, let:

`a = 2^p mod n`
`b = 2^q mod n`

then:

`2^n ≡ a+b-2 mod n`

where:

`gcd((2^n mod n) - (2^(q-p + 1) mod n), n)`

is a non-trivial factor of `n`.
 
The method can be efficiently implemented iteratively, by choosing a base `b >= 2` and setting:
 
`z = b^n mod n`
`w_0 = b`
 
If `z != b`, then for `k >= 1`, we iteratively do:
 
`w_k = w_{k-1} * b^2 mod n`

at each step checking if `gcd(z - w_k, n)` is a non-trivial factor of `n`.

This method is effective, in particular, when `n` has all of its prime factors close to each other.
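A rough Python sketch of this iterative version (the default base and the number of iterations are arbitrary choices):

```python
from math import gcd

def flt_factor(n, tries=10**5, base=2):
    """Check gcd(base^n - base^(2k+1), n) for k = 1, 2, ... (FLT-inspired method)."""
    z = pow(base, n, n)              # base^n mod n
    if z == base:                    # n is prime or a Fermat pseudoprime to this base
        return None
    w = base
    for _ in range(tries):
        w = w * base * base % n      # w runs over base^3, base^5, base^7, ... mod n
        g = gcd(z - w, n)
        if 1 < g < n:
            return g
    return None

# n = p * q with p and q close together (here q - p + 1 = 3):
print(flt_factor(1000037 * 1000039))     # 1000037
```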
 
More generally, we can consider any C-finite sequence `f` (i.e., a sequence satisfying a linear recurrence with constant coefficients), trying to find a factor of `n` by checking:

`gcd((f(n) mod n) - (f(k) mod n), n)`
 
or
 
`gcd((f(n) mod n) - (f(1+floor(n/k)) mod n), n)`
 
for several small `k >= 1`.

# Pollard rho method

Pollard's rho method tries to find a congruence of the form `f(x_i) ≡ f(x_j) mod p`, for some polynomial `f(x)`, where `p` is a prime factor of `n`.

For example, let: `f(x) = x^2 + 1`. If `x_i ≡ x_j mod p`, then `f(x_i) ≡ f(x_j) mod p`.

Since we do not know the value of `p`, we have to work modulo `n` and assume that there exists some `p` that divides `n` which satisfies the wanted congruence.

If `f(x_i) ≡ f(x_j) mod p`, then `p` divides `f(x_i) - f(x_j)`, which means that `gcd(f(x_i) - f(x_j), n)` is a multiple of `p`, hence a non-trivial divisor of `n` (unless it equals `n` itself).

This method can be efficiently implemented by using a cycle detection algorithm, such as Floyd's cycle detection algorithm, starting by initializing the following two sequences modulo `n`, for some value of `a`:

`x_0 = f(a) mod n`
`y_0 = f(f(a)) mod n`

then, for `i >= 0`, we iteratively do:

`x_(i+1) = f(x_i) mod n`
`y_(i+1) = f(f(y_i)) mod n`

at each step checking if `gcd(x_i - y_i, n)` is a non-trivial factor of `n`.

By the birthday paradox, the Pollard rho method is expected to find a prime factor `p` of `n` in `O(sqrt(p))` steps.

The polynomial `f(x)` can be optimized based on what is known about the factors of `n`. In general, the most commonly used polynomial is `f(x) = x^2 + c`, for some constant `c`.
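A compact Python sketch of the rho iteration with Floyd's cycle detection, as described above (the starting value and the constant `c` are arbitrary):

```python
from math import gcd

def pollard_rho(n, c=1, a=2, max_iter=10**6):
    """Pollard's rho with f(x) = x^2 + c and Floyd's cycle detection."""
    f = lambda x: (x * x + c) % n
    x, y = f(a), f(f(a))
    for _ in range(max_iter):
        g = gcd(x - y, n)
        if g == n:
            return None                # cycle found without a factor; retry with another c
        if g > 1:
            return g
        x, y = f(x), f(f(y))
    return None

print(pollard_rho(10007 * 10009))      # typically prints one of the prime factors
```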

A variation of this method, that we call the "rho-exp" method, uses the following interesting polynomial:

`f(x) = x^k + 2 k - 1`

where `k = lcm(1..B)`, for some small bound `B`, which is, in some sense, a fusion between Pollard's rho and Pollard's p-1 methods.

# Pollard `p-1` method

Pollard's p-1 method is based on Fermat's little theorem which states that:

`a^(p-1) ≡ 1 mod p`

for all prime numbers `p` with `gcd(a, p) = 1`.

Based on the identity `(a^b)^c = a^(b c)`, we have:

`(a^(p-1))^k ≡ a^((p-1) k) ≡ 1 mod p`

Therefore, if we can compute a multiple of `p-1`, then `gcd((a^((p-1) k) mod n) - 1, n)` may be a non-trivial factor of `n`, where `p` is a prime factor of `n`.

This method is effective when `p-1` has only small factors (i.e.: `p-1` is `B`-smooth, for some small bound `B`).

Then `gcd((a^(B!) mod n) - 1, n)` is very likely to be a non-trivial factor of `n`, where `a` is any integer coprime to `n`.

A more optimized implementation, which does not require computing `B!` explicitly, is to set:

`x_0 = 2`   (where `2` can be any other integer that is coprime to `n`)

then, for `k >= 1`, performing:

`x_k = x_(k-1)^k mod n`

at each step checking if `gcd(x_k - 1, n)` is a non-trivial factor of `n`.
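A minimal Python sketch of this incremental version:

```python
from math import gcd

def pollard_pm1(n, B=10**4, a=2):
    """Pollard's p-1: x_k = x_{k-1}^k mod n, i.e. x_k = a^(k!) mod n."""
    x = a
    for k in range(2, B + 1):
        x = pow(x, k, n)             # x = a^(k!) mod n
        g = gcd(x - 1, n)
        if 1 < g < n:
            return g
    return None

# 2741 - 1 = 2^2 * 5 * 137 is smooth, so 2741 is found quickly;
# 1000003 - 1 = 2 * 3 * 166667 is not (166667 is prime):
print(pollard_pm1(2741 * 1000003))   # 2741
```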

Technically, we only need to be able to compute a multiple of the smallest divisor `d` of `p-1`, satisfying `a^d ≡ 1 mod p`. This value of `d` is called the multiplicative order of `a` modulo `p`.

Then `gcd((a^(d m) mod n) - 1, n)` is a divisor of `n` for all `m >= 1`.

# Fibonacci `p±1` factorization method

The Fibonacci factorization method is based on the well-known property satisfied by all the prime numbers `p`:

`F_(p-k) ≡ 0 mod p`

where `k = (5/p)` is the Legendre symbol and `F_n` is the `n`-th Fibonacci number.

Therefore, if we can compute a multiple of the smallest divisor `d` of `p - (5/p)`, satisfying `F_d ≡ 0 mod p`, where `p` is a prime factor of `n`, then `gcd(F_(d m) mod n, n)` is a divisor of `n` for all `m >= 1`.

This method is similar in flavor to Pollard's p-1 and Williams' p+1 algorithms, and can efficiently find a prime factor `p` of `n`, if the smallest divisor `d` of `p-(5/p)`, satisfying `F_d ≡ 0 mod p`, is `B`-smooth for some small bound `B`.
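A rough Python sketch, computing Fibonacci numbers modulo `n` with the fast-doubling formulas `F(2m) = F(m) (2 F(m+1) - F(m))` and `F(2m+1) = F(m)^2 + F(m+1)^2`, and using `L = lcm(1..B)` as the smooth multiplier (`math.lcm` needs Python 3.9+; the bound `B` is an arbitrary choice):

```python
from math import gcd, lcm

def fib_mod(k, n):
    """F_k mod n via iterative fast doubling."""
    a, b = 0, 1                                # (F_0, F_1)
    for bit in bin(k)[2:]:
        c = a * (2 * b - a) % n                # F_{2m}
        d = (a * a + b * b) % n                # F_{2m+1}
        a, b = (d, (c + d) % n) if bit == '1' else (c, d)
    return a

def fibonacci_factor(n, B=1000):
    L = lcm(*range(1, B + 1))                  # hopefully a multiple of p - (5/p)
    g = gcd(fib_mod(L, n), n)
    return g if 1 < g < n else None

# 2740 = 2^2 * 5 * 137 is B-smooth, so the factor 2741 should be pulled out:
print(fibonacci_factor(2741 * 1000003))        # 2741 (or None if both factors are found at once)
```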

# Lucas `p±1` factorization method

A generalization of the Fibonacci factorization method, would be to use the Lucas sequence `U_n(P,Q)`, for which the following congruence is satisfied for all the prime numbers `p`:

`U_(p - k)(P,Q) ≡ 0 mod p`

where `k = (D/p)` is the Legendre symbol and `D = P^2 - 4Q`, provided that the odd prime `p` does not divide `2 Q D`.

We can take advantage of this property by computing `L = lcm(1..B)` for a given small bound `B`, and iterating over `P` in a given range `[a,b]`, then checking if `gcd(U_L(P,Q) mod n, n)` is a non-trivial factor of `n`, where `Q` is a random integer value satisfying `-n < Q < n` with the condition that `P^2 - 4Q != 0`.

This method is effective when the smallest divisor `d` of `p - (D/p)`, satisfying `U_d(P,Q) ≡ 0 mod p`, is `B`-smooth for some small bound `B`.

If we can compute a multiple of `d`, then `gcd(U_(d m)(P,Q) mod n, n)` is a divisor of `n` for all `m >= 1`.
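A rough Python sketch; `lucas_UV` computes `(U_k, V_k) mod n` with a binary doubling ladder (assuming `n` is odd), and, as a slight simplification of the description above, both `P` and `Q` are picked at random:

```python
from math import gcd, lcm
from random import randrange

def lucas_UV(P, Q, k, n):
    """Return (U_k(P,Q) mod n, V_k(P,Q) mod n), for odd n."""
    if k == 0:
        return 0, 2 % n
    D = (P * P - 4 * Q) % n
    inv2 = (n + 1) // 2                        # inverse of 2 modulo odd n
    U, V, Qk = 1 % n, P % n, Q % n             # U_1, V_1, Q^1
    for bit in bin(k)[3:]:                     # remaining bits of k, MSB first
        U, V = U * V % n, (V * V - 2 * Qk) % n           # doubling: m -> 2m
        Qk = Qk * Qk % n
        if bit == '1':                                   # increment: 2m -> 2m+1
            U, V = (P * U + V) * inv2 % n, (D * U + P * V) * inv2 % n
            Qk = Qk * Q % n
    return U, V

def lucas_factor(n, B=1000, tries=20):
    L = lcm(*range(1, B + 1))
    for _ in range(tries):
        P, Q = randrange(1, n), randrange(1, n)
        if (P * P - 4 * Q) % n == 0:
            continue                           # skip degenerate discriminants
        U, _ = lucas_UV(P, Q, L, n)
        g = gcd(U, n)
        if 1 < g < n:
            return g
    return None

print(lucas_factor(2741 * 1000003))            # typically prints 2741 (2740 and 2742 are both B-smooth)
```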

# Chebyshev `p±1` factorization method

The Chebyshev factorization method uses the following nesting property of the Chebyshev polynomials `T_n(x)`:

`T_m(T_n(x)) = T_{m n}(x)`

By writing the Chebyshev polynomials `T_n(x)` in terms of the Lucas sequence `V_n(P, Q)`, we have:

`T_n(x) = V_n(2x, 1) / 2`

This method is effective in finding a prime factor, `p` of `n`, when `p-1` or `p+1` is `B`-smooth, for some small value of `B`.

In factoring `n`, we start by setting an initial value for `x_0`, then for each `k >= 1`, we perform the following iteration:

`x_k = T_k(x_{k-1}) mod n`

at each step checking if `gcd(x_k - 1, n)` is a non-trivial factor of `n`.
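A rough Python sketch, computing `T_k(x) mod n` through the Lucas `V`-sequence with `Q = 1` (assuming `n` is odd; the starting value `x_0 = 3` and the bound `B` are arbitrary choices):

```python
from math import gcd

def chebyshev_T(k, x, n):
    """T_k(x) mod n, using T_k(x) = V_k(2x, 1) / 2 and a Lucas V-ladder."""
    P = 2 * x % n
    a, b = 2 % n, P                            # (V_0, V_1)
    for bit in bin(k)[2:]:
        if bit == '1':
            a, b = (a * b - P) % n, (b * b - 2) % n     # (V_{2m+1}, V_{2m+2})
        else:
            a, b = (a * a - 2) % n, (a * b - P) % n     # (V_{2m}, V_{2m+1})
    return a * pow(2, -1, n) % n               # V_k(2x, 1) / 2 (mod odd n)

def chebyshev_factor(n, B=1000, x0=3):
    x = x0
    for k in range(2, B + 1):
        x = chebyshev_T(k, x, n)               # x = T_{k!}(x0), by the nesting property
        g = gcd(x - 1, n)
        if 1 < g < n:
            return g
    return None

print(chebyshev_factor(2741 * 1000003))        # should print 1000003 (1000004 = 2^2 * 53^2 * 89 is very smooth)
```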

# Quadratic `p±1` factorization method

The quadratic `p±1` factorization method is based on quadratic integers of the form `a + b sqrt(w)`, for some `w >= 2`.

By representing a quadratic integer `a + b sqrt(w) = (a,b)`, for some fixed `w >= 2`, we have the following multiplication properties:

`(a,b) * (c,d) = ((a c + b d w), (a d + b c))`

`(a,b)^2 = ((a^2 + b^2 w), (2 a b))`

This allows us to compute `(a, b)^L mod n` using exponentiation by squaring, where `L = lcm(1..B)` for some small bound `B`.

More efficiently, we can start by setting `(a,b)` to some random integers coprime to `n`, then, for each prime `p` in the range `[2,B]`, we perform the following iteration:

`(a,b) := (a,b)^(p^(floor(log_p(B)))) mod n`

at each step checking if either `gcd(a-1, n)` or `gcd(b, n)` is a non-trivial factor of `n`.

This method is effective when `n` has a prime factor `p` such that `p-1` or `p+1` is `B`-smooth.
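A rough Python sketch, exponentiating the quadratic integer `a + b sqrt(w)` modulo `n` with the multiplication and squaring rules above (the choices of `w`, `a`, `b` and `B` are arbitrary illustrative ones):

```python
from math import gcd, lcm

def quad_pow(a, b, w, e, n):
    """Compute (a + b*sqrt(w))^e modulo n, returned as the pair (a, b)."""
    ra, rb = 1, 0                               # result = 1 + 0*sqrt(w)
    while e:
        if e & 1:
            ra, rb = (ra * a + rb * b * w) % n, (ra * b + rb * a) % n
        a, b = (a * a + b * b * w) % n, 2 * a * b % n
        e >>= 1
    return ra, rb

def quadratic_pm1(n, B=1000, w=2, a=3, b=1):
    L = lcm(*range(1, B + 1))
    a, b = quad_pow(a, b, w, L, n)              # (a + b*sqrt(w))^L mod n
    for g in (gcd(a - 1, n), gcd(b, n)):
        if 1 < g < n:
            return g
    return None

print(quadratic_pm1(2741 * 1000003))            # should print 2741 (2740 and 2742 are both B-smooth)
```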

# Sophie Germain's factorization identity

Sophie Germain's factorization identity is given as:

`x^4 + 4y^4 = (x^2 + 2xy + 2y^2) (x^2 - 2xy + 2y^2)`

This identity can also be used in factoring integers of the form `a^4 + 4^(2b+1)`, which can be written as `x^4 + 4y^4`, for `x = a` and `y = 2^b`, as illustrated below:

`a^4 + 4^(2b+1) = a^4 + 4 * 4^(2b) = a^4 + 4 * (2^b)^4`
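For example, with `x = 3` and `y = 2` (a quick check in Python):

```python
x, y = 3, 2
n = x**4 + 4 * y**4                     # 81 + 64 = 145
f1 = x*x + 2*x*y + 2*y*y                # 29
f2 = x*x - 2*x*y + 2*y*y                # 5
assert n == f1 * f2                     # 145 = 29 * 5
```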

The OEIS sequence of integers that can be represented as `x^4 + 4y^4` is A227855.

# Other factorization identities

Another special factorization identity can be created for integers of the form `2^k (2^k - k + 1) + 1`, for special values of `k`,  similar in flavor to Sophie Germain's factorization identity.

For example, let: `a(n) = 2^f(n) (2^f(n) - f(n) + 1) + 1`, where `f(n) = 4 n (n+1)`.

Then `a(n)` can be factorized as:

`a(n) = (x - y - z + 1) (x + y + z + 1)`

where:

`x = 2^f(n)`
`y = 2^(f(n)/2)`
`z = n * 2^(f(n)/2 + 1)`

Based on this identity, we can also conclude that `a(n)` is composite for all `n >= 1`.
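A quick numeric check of this identity in Python (using `m` for the index, to avoid clashing with `n` as the number being factored):

```python
for m in range(1, 6):
    f = 4 * m * (m + 1)
    a = 2**f * (2**f - f + 1) + 1
    x, y, z = 2**f, 2**(f // 2), m * 2**(f // 2 + 1)
    assert a == (x - y - z + 1) * (x + y + z + 1)
```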

Also of interest may be the following identity (see: A231735):

`(2n+1)^2 (2k)^(2k) - 1 = (2^(k+1) k^k n + 2^k k^k - 1) (2^(k+1) k^k n + 2^k k^k + 1)`

# Implementations

This section includes source-code implementations of the presented methods (+ some additional methods), implemented in the Perl and Sidef programming languages.

:: Sidef

:: Perl
