Technical details

Most of the software is written in C, with additional parts in 32-bit (Intel) assembly language. Using assembly for the time critical parts rather than plain C accounts for a speed gain of up to 500%.

Since searches for convergence, glides and path records are several orders of magnitude faster than searches for class records two different algorithms are used.

The convergence algorithm

The algorithm that checks for convergence, glides and path records uses a sieve: only those numbers are checked that can not be shown to either converge or join the path of a lower number within a few steps. The gain from this method is substantial. Using a sieve of 65536 (216) for instance leaves only 1720 out of every 65536 numbers to be checked. Currently a sieve of 232 is used which means the fraction of numbers that can be skipped is over 99.2 %. Apart from that the number is checked to see whether it may occur in the path of a lower number: all numbers that are congruent 2, 4, 5 or 8 mod 9 can easily be shown to occur in such a path and can therefore be skipped as well. This accounts for 44.4 % of all remaining numbers. To speed up the search even further the code uses an "interlaced" technique, searching 212 numbers of the same congruence class modulo 232. This means the sieve can be calculated rather than stored and also enables simultaneous calculation of the iteration path up to the 32nd division for the entire congruence class. This results in a further time reduction of approximately 50%.

The class record algorithm

In the algorithm that searches for high delays several techniques are combined to speed up calculations. The main reduction is achieved by using a sieve to eliminate many numbers beforehand. Different from the one above the sieve in this case checks whether paths come together rather than for straightforward convergence. If two paths join then obviously the upper one can never yield a class record and can be skipped. Currently the sievesize used is 218 elements wide, which in effect makes calculations of over 73 % of all numbers unnecessary.
Example: any number of the form 8k+4 will reach 6k+4 after 3 steps, and so will any number of the form 8k+5. Therefore no number of the form 8k+5 can be a class record (5 itself being the only exception). Numbers of the form 8k+5 therefore do not need to be checked, and all positions of the form 8k+5 in the sieve contain a zero.

Even numbers and all numbers congruent 2 modulo 3 are skipped as well, since potential class records of this type can easily be derived from other, earlier class records. Combining these last ideas with the sieve the calculations of over 90% of all numbers can be shown to be redundant, effectively speeding up the calculations by a factor 10.

A cut-off technique is also applied at several points to either ensure a number is still a candidate for a high delay or else to terminate further processing of this number. In most cases this implies the calculation can be cut short after only a fraction of its trajectory. To see that this technique works assume that two consecutive Delay Records (N1 and N2) have delays D1 and D2 respectively. Once a number drops below N2 its delay can not be higher than D1 + the current iteration count. If this sum is below a certain threshold (e.g. the delay of the lowest class record still missing) processing this number can be abandoned.
Which delay records are used obviously depends on the exact numbers and in fact different intervals require different cut-off points to obtain optimal performance. During the current interval (where starting numbers are approximately 254) trajectories are checked against the delay records having delays of 1820, 1549, 1408, 1307, the 32-bit limit and the 16-bit limit. The latter two limits are used because they ensure that storage during the further trajectory will never require more than 64 bits or 32 bits respectively. The performance gain of this technique depends largely on the magnitude of the numbers concerned, but is usually roughly a factor 10.

Finally a table is used to prevent having to calculate the delays of small numbers too often. In the table all delays of the numbers 2048-4095 are stored. When a number drops below 4096 the 'tail' of the delay is looked up in this table.

Back to the general 3x+1 page.