Technical details
Most of the software is written in C, with additional parts in 32-bit (Intel) assembly language. Using assembly for the time critical parts rather than plain C accounts for a speed gain of up to 500%.
Since searches for convergence, glides and path records are several orders of magnitude faster than searches for class records two different algorithms are used.
The convergence algorithm
The algorithm that checks for convergence, glides and path records
uses a sieve: only those numbers are checked that can not be shown
to either converge or join the path of a lower number within a few steps.
The gain from this method is substantial. Using a sieve of 65536
(216) for instance leaves only 1720 out of every 65536 numbers
to be checked. Currently a sieve of 232 is used which means
the fraction of numbers that can be skipped is over 99.2 %.
Apart from that the number is checked to see whether it may occur
in the path of a lower number: all numbers that are congruent
The class record algorithm
In the algorithm that searches for high delays several techniques are
combined to speed up calculations. The main reduction is achieved by
using a sieve to eliminate many numbers beforehand.
Different from the one above the sieve in this case checks whether
paths come together rather than for straightforward convergence. If two paths
join then obviously the upper one can never yield a class record and can
be skipped. Currently the sievesize used is 218 elements wide,
which in effect makes calculations of over 73 % of all numbers unnecessary.
Example: any number of the form 8k+4 will reach 6k+4 after 3 steps,
and so will any number of the form 8k+5. Therefore no number of the form
8k+5 can be a class record (5 itself being the only exception).
Numbers of the form 8k+5 therefore do not need to be checked,
and all positions of the form 8k+5 in the sieve contain a zero.
Even numbers and all numbers congruent
A cut-off technique is also applied at several points to either ensure
a number is still a candidate for a high delay or else to terminate
further processing of this number. In most cases this implies the
calculation can be cut short after only a fraction of its trajectory.
To see that this technique works assume that two consecutive Delay Records
(N1 and N2) have delays D1 and D2 respectively. Once a number drops below N2
its delay can not be higher than D1 + the current iteration count. If this sum
is below a certain threshold (e.g. the delay of the lowest class record still
missing) processing this number can be abandoned.
Which delay records are used obviously depends on the exact numbers
and in fact different intervals require different cut-off points to
obtain optimal performance. During the current interval (where starting
numbers are approximately 254) trajectories are checked
against the delay records having delays of 1820, 1549, 1408, 1307,
the 32-bit limit and the 16-bit limit. The latter two limits are used
because they ensure that storage during the further trajectory will
never require more than 64 bits or 32 bits respectively.
The performance gain of this technique depends largely on the magnitude
of the numbers concerned, but is usually roughly a factor 10.
Finally a table is used to prevent having to calculate the delays of small numbers too often. In the table all delays of the numbers 2048-4095 are stored. When a number drops below 4096 the 'tail' of the delay is looked up in this table.
Back to the general 3x+1 page.