Special-purpose factorization algorithms

In this post, we will take a brief look at a nice collection of special-purpose factorization algorithms. Most of them old and well-known. Some of them new.

By a "special-purpose factorization algorithm", we mean an algorithm that can find one or more factors of a large integer n that satisfies a specific special form, without strongly depending on the size of n.

When the special conditions are met, a factor is usually found very fast. On the other hand, when the number does not have the required special form, the algorithm may fail to produce a factor in a reasonable amount of time, or fail entirely.

Special-purpose factorization algorithms are commonly used in general-purpose factorization routines as a pre-factorization test, trying to reduce the size of n by finding easy factors, before sending the number to the heavy general-purpose modern machineries, such as the Quadratic Sieve (QS) or the Number Field Sieve (NFS) algorithms, which strongly depend on the size of n.

The most commonly used special-purpose factorization algorithms are trial division, Pollard's rho method, Pollard's p-1 method and Lenstra's elliptic-curve factorization method (ECM).

# Perfect power check

If n=rk, for some positive integer r and some integer k2, then, in order to find the factorization of n, we only need to factorize r and duplicate each prime factor accordingly.

This can be efficiently checked by iterating over each k, starting from k=log2(n), down to k=2, and checking if n1k is an integer.

For checking if n1k is an integer, let r=n1k, which can be computed efficiently using Newton's method. If n=rk, then n is a perfect power.

This is not considered a factorization algorithm, but it's an import thing to check when factoring an unknown integer.

# Trial division

The simplest way to try to find a small factor of a given integer n, is to set a bound B, at most B=n, and iterate over the prime numbers p in the range [2,B] and check if any such p divides n.

This algorithm is of practical use only for numbers that have at least a small factor (say, less than 108).

If multiple numbers need to be checked for small factors, a simple optimization would be computing the primorial of B only once, which is the product of all the primes in the range [2,B], say P=B#, and using the pre-computed value of P to compute d=gcd(P,n) for multiple values of n, where d is a divisor of n (not necessarily prime), or 1 if n has no prime factors in the range [2,B].

# Difference of squares

By representing a given integer n as a difference of two squares:

n=a2-b2

it can then be factorized as:

n=(a+b)(a-b)

In Fermat's factorization method, we start by setting a=n and checking if (a+k)2-n is a perfect square, for some k1.

This method is effective when n has a factor near n.

# Congruence of squares

An improvement over Fermat's factorization method, noticed by Maurice Kraitchik, is to try to find a congruence of squares modulo n, of the form x2y2modn, with the condition that x±ymodn, which factorizes n as:

n=gcd(x+y,n)gcd(x-y,n)

Same as in Fermat's factorization method,  we can try to find a congruence of squares by setting a=n and checking if (a+k)2modn is a perfect square, for some k1.

However, a slightly better approach would be to use the continued fraction convergents AkBk of n and testing if A2kmodn is a perfect square, for some k1.

When the continued fraction convergents AkBk of n are used, we call it the "Pell factorization method", because it uses the same technique as in solving the Pell equation x2-ny2=1, which has solutions of the form A2k-nB2k=1, for some k1.

This method is effective when n has a factor near n.

The idea of creating a congruence of squares, is used in many modern general-purpose factorization algorithms, including the CFRAC method, which we described in an earlier post.

# Difference of powers

A generalization of Fermat's factorization method, would be to check if n has the form n=ak± bk, for some positive integers a and b, with odd k3. Then either gcd(a-b,n) or gcd(a+b,n) is a non-trivial factor of n.

This method is based on the observation that, for odd k3:

(ax-by) divides (akx-bky)
(ax+by) divides (akx+bky)

Without loss of generality, let's assume that:

n=re11± re22

We can try to find such a representation by fixing B=log2(n) and iterating over each k in the range [2,B].

For each k, we check if any of the following values is a perfect power (let's call it re22):

{n-kx,kx+1-n,n-yk,(1+y)k-n}

where x=logk(n) and y=n1k.

For each such perfect power found, we can represent n as n=re11± re22.

Next, we iterate over each divisor d1 of e1 and over each divisor d2 of e2, checking if gcd(rd11± rd22,n) is a non-trivial factor of n.

This method is effective for numbers of the form ax± by, where x and y have many common divisors.

# Congruence of powers

In this method, we generalize Kraitchik's idea to find congruence of powers of the form akbkmodn, for some positive integers a and b, with odd k3. Then either gcd(a-b,n) or gcd(a+b,n) is a non-trivial factor of n.

Same as in the previous method, we try to find such a representation by fixing B=log2(n) and iterating over each k in the range [2,B].

For each k, we check if either one of the following congruences is true:

xkbkmodn

(x+1)kbkmodn

where x=n1k.

For each such congruence found, we have akbkmodn, where either gcd(a+b,n) or gcd(a-b,n) may be a non-trivial factor of n.

# Cyclotomic polynomials

For some numbers, we can also use the cyclotomic polynomials Φ for finding factors of special form.

Given a composite integer n and a fixed value of b >= 2, we iterate over each k in the range 2 <= k <= floor(1+\log_b(n)), checking if gcd(Φ_k(b), n) is a non-trivial factor of n.

In practice, we can check each value of b in the range 2 <= b <= floor(log_2(n)).

# HOLF method

In Hart's One-Line Factoring Algorithm, we try to find a congruence of squares of the form:

a^2 ≡ b^2 mod n

where a = 1 + floor(sqrt(n k)), for some positive integer k.

Then gcd(a - b, n) may be a non-trivial factor of n.

This method is effective for numbers of the form n = x * (x k ± z),  where x can be arbitrary large, but k and z are small.

# Carmichael method

In some sense, this is a generalization of the HOLF method and can factorize numbers of the form:

n = x * \prod_{k=1}^r ((x±1) a_k ± 1)

for a given value r and a small range [L,R] that includes all the values a_k, where the set {a_k} is a combination of r elements in the given range [L,R].

This method can be efficiently implemented using the binary search algorithm to find the value x in the range:

2 <= x <= floor(n^(1/(r+1)))

Numbers of this form include the Carmichael numbers and the Lucas pseudoprimes, and can be factorized relatively fast by this method when r is small and the range [L,R] is not very wide.

# Miller-Rabin n-1 method

The Miller-Rabin primality test can sometimes produce a factor of n. By writing n-1 as:

n-1 = 2^s d

with s >= 1 and d odd, we try to find x ≡ a^d mod n, for some base a >= 2, such that x ≢ ±1 mod n, and x^(2^k d) ≡ 1 mod n, for some k in the range 1 <= k < s.

If such congruence is found, then y = x^(2^(k-1) d) mod n factorizes n as:

n = gcd(y+1, n) * gcd(y-1, n)

Alternatively, a simpler version would be to compute x = a^(2^k d)  mod n and check if x ± 1 has a common factor with n, for some k in the range 0 <= k < s and some random value of a.

This method works best in factoring Carmichael numbers and some Fermat pseudoprimes.

# Lucas-Miller n+1 method

In the same spirit as the Miller-Rabin factorization method, we can take advantage of the Lucas sequence U_n(P,Q), writing n+1 as:

n+1 = 2^s d

with s >= 1 and d odd, and checking if U_(d 2^k)(P,Q) (computed modulo n) has a common factor with n, for some k in the range 0 <= k < s and some random values of P and Q.

This method works best in factoring Lucas-Carmichael numbers and some Lucas pseudoprimes.

Another version of this method, is the n-1 variation, in which we write n-1 = 2^s d, with d odd, and we check V_(d 2^k)(P,Q) and V_(d 2^k)(P,Q) - P, which is able to factorize Carmichael numbers and Fermat pseudoprimes.

# FLT factorization method

Inspired by Fermat's Little Theorem (FLT), we can try to find a non-trivial factor of n by checking gcd((2^n mod n) - (2^k mod n), n) for some small odd k >= 1.

With n = p*q, let:

a = 2^p mod n
b = 2^q mod n

then:

2^n ≡ a+b-2 mod n

where:

gcd((2^n mod n) - (2^(q-p + 1) mod n), n)

is a non-trivial factor of n.
 
The method can be efficiently implemented iteratively, by choosing a base b >= 2 and setting:
 
z = b^n mod n
w_0 = b
 
If z != b, then for k >= 1, we iteratively do:
 
w_k = w_{k-1} * b^2 mod n

at each step checking if gcd(z - w_k, n) is a non-trivial factor of n.

This method is effective, in particular, when n has all of its prime factors close to each other.
 
More generally, we can consider any C-finite sequence f(n), trying to find a factor of n by checking:

gcd((f(n) mod n) - (f(k) mod n), n)
 
or
 
gcd((f(n) mod n) - (f(1+floor(n/k)) mod n), n)
 
for several small k >= 1.

# Pollard rho method

Pollard's rho method tries to find a congruence of the form f(x_i) ≡ f(x_j) mod p, for some polynomial f(x), where p is a prime factor of n.

For example, let: f(x) = x^2 + 1. If x_i ≡ x_j mod p, then f(x_i) ≡ f(x_j) mod p.

Since we do not know the value of p, we have to work modulo n and assume that there exists some p that divides n which satisfies the wanted congruence.

If f(x_i) ≡ f(x_j) mod p, this means that gcd(f(x_i) - f(x_j), n) is a divisor of n.

This method can be efficiently implemented by using a cycle detection algorithm, such as Floyd's cycle detection algorithm, starting by initializing the following two sequences modulo n, for some value of a:

x_0 = f(a) mod n
y_0 = f(f(a)) mod n

then, for i >= 0, we iteratively do:

x_(i+1) = f(x_i) mod n
y_(i+1) = f(f(y_i)) mod n

at each step checking if gcd(x_i - y_i, n) is a non-trivial factor of n.

By the birthday paradox, the Pollard rho method is expected to find a prime factor p of n in O(sqrt(p)) steps.

The polynomial f(x) can be optimized based on what is known about the factors of n. In general, the most used polynomial is f(x) = x^2 + c, for some constant c.

A variation of this method, that we call the "rho-exp" method, uses the following interesting polynomial:

f(x) = x^k + 2 k - 1

where k = lcm(1..B), for some small bound B, which is, in some sense, a fusion between Pollard's rho and Pollard's p-1 methods.

# Pollard p-1 method

Pollard's p-1 method is based on Fermat's little theorem which states that:

a^(p-1) ≡ 1 mod p

for all prime numbers p with gcd(a, p) = 1.

Based on the identity (a^b)^c = a^(b c), we have:

(a^(p-1))^k ≡ a^((p-1) k) ≡ 1 mod p

Therefore, if we can compute a multiple of p-1, then gcd((a^((p-1) k) mod n) - 1, n) may be a non-trivial factor of n, where p is a prime factor of n.

This method is effective when p-1 has only small factors (i.e.: p-1 is B-smooth, for some small bound B).

Then (a^(B!) mod n) - 1 is very likely to be a non-trivial factor of n, where a is any integer coprime to n.

A more optimized implementation, which does not require computing B! explicitly, is by setting:

x_0 = 2   (where 2 can be any other integer that is coprime to n)

then, for k >= 1, performing:

x_k = x_(k-1)^k mod n

at each step checking if gcd(x_k - 1, n) is a non-trivial factor of n.

Technically, we only need to be able to compute a multiple of the smallest divisor d of p-1, satisfying a^d ≡ 1 mod p. This value of d is called the multiplicative order of a modulo p.

Then gcd((a^(d m) mod n) - 1, n) is a divisor of n for all m >= 1.

# Fibonacci p±1 factorization method

The Fibonacci factorization method is based on the well-known property satisfied by all the prime numbers p:

F_(p-k) ≡ 0 mod p

where k = (5/p) is the Legendre symbol and F_n is the n-th Fibonacci number.

Therefore, if we can compute a multiple of the smallest divisor d of p - (5/p), satisfying F_d ≡ 0 mod p, where p is a prime factor of n, then gcd(F_(d m) mod n, n) is a divisor of n for all m >= 1.

This method is similar in flavor to Pollard's p-1 and Williams' p+1 algorithms, and can efficiently find a prime factor p of n, if the smallest divisor d of p-(5/p), satisfying F_d ≡ 0 mod p, is B-smooth for some small bound B.

# Lucas p±1 factorization method

A generalization of the Fibonacci factorization method, would be to use the Lucas sequence U_n(P,Q), for which the following congruence is satisfied for all the prime numbers p:

U_(p - k)(P,Q) ≡ 0 mod p

where k = (D/p) is the Legendre symbol and D = P^2 - 4Q, with the condition that 0 < P < p and Q < p and D != 0.

We can take advantage of this property by computing L = lcm(1..B) for a given small bound B, and iterating over P in a given range [a,b], then checking if gcd(U_L(P,Q) mod n, n) is a non-trivial factor of n, where Q is a random integer value satisfying -n < Q < n with the condition that P^2 - 4Q != 0.

This method is effective when the smallest divisor d of p - (D/p), satisfying U_d(P,Q) ≡ 0 mod p, is B-smooth for some small bound B.

If we can compute a multiple of d, then gcd(U_(d m)(P,Q) mod n, n) is a divisor of n for all m >= 1.

# Chebyshev p±1 factorization method

The Chebyshev factorization method uses the following nesting property of the Chebyshev polynomials T_n(x):

T_m(T_n(x)) = T_{m n}(x)

By writing the Chebyshev polynomials T_n(x) in terms of the Lucas sequence V_n(P, Q), we have:

T_n(x) = \frac{V_n(2x, 1)}{2}

This method is effective in finding a prime factor, p of n, when p-1 or p+1 is B-smooth, for some small value of B.

In factoring n, we start by setting an initial value for x_0, then for each k >= 1, we perform the following iteration:

x_k = T_k(x_{k-1}) mod n

at each step checking if gcd(x_k - 1, n) is a non-trivial factor n.

# Quadratic p±1 factorization method

The quadratic p±1 factorization method is based on quadratic integers of the form a + b sqrt(w), for some w >= 2.

By representing a quadratic integer a + b sqrt(w) = (a,b), for some fixed w >= 2, we have the following multiplication properties:

(a,b) * (c,d) = ((a c + b d w), (a d + b c))

(a,b)^2 = ((a^2 + b^2 w), (2 a b))

This allows us to compute (a, b)^L mod n using exponentiation by squaring, where L = lcm(1..B) for some small bound B.

More efficiently, we can start by setting (a,b) to some random integers coprime to n, then, for each prime p in the range [2,B], we perform the following iteration:

(a,b) := (a,b)^(p^(floor(log_p(B)))) mod n

at each step checking if either gcd(a-1, n) or gcd(b, n) is a non-trivial factor of n.

This method is effective when n has a prime factor p such that p-1 or p+1 is B-smooth.

# Sophie Germain's factorization identity

Sophie Germain's factorization identity is given as:

x^4 + 4y^4 = (x^2 + 2xy + 2y^2) (x^2 - 2xy + 2y^2)

This identity can also be used in factoring integers of the form a^4 + 4^(2b+1), which can be written as x^4 + 4y^4, for x = a and y = 2^b, as illustrated bellow:

a^4 + 4^(2b+1) = a^4 + 4 * 4^(2b) = a^4 + 4 * (2^b)^4

The OEIS sequence of integers that can be represented as x^4 + 4y^4 is A227855.

# Other factorization identities

Another special factorization identity can be created for integers of the form 2^k (2^k - k + 1) + 1, for special values of k,  similar in flavor to Sophie Germain's factorization identity.

For example, let: a(n) = 2^f(n) (2^f(n) - f(n) + 1) + 1, where f(n) = 4 n (n+1).

Then a(n) can be factorized as:

a(n) = (x - y - z + 1) (x + y + z + 1)

where:

x = 2^f(n)
y = 2^(f(n)/2)
z = n * 2^(f(n)/2 + 1)

Based on this identity, we can also conclude that a(n) is composite for all n >= 1.

Also of interest may be the following identity (see: A231735):

(2n+1)^2 (2k)^(2k) - 1 = (2^(k+1) k^k n + 2^k k^k - 1) (2^(k+1) k^k n + 2^k k^k + 1)

# Implementations

This section includes source-code implementations of the presented methods (+ some additional methods), implemented in the Perl and Sidef programming languages.

:: Sidef

:: Perl

Comments