A Tour of NTL: NTL Implementation and Portability


NTL is designed to be portable, fast, and relatively easy to use and extend.

To make NTL portable, no assembly code is used (well, almost none, see below). This is highly desirable, as architectures are constantly changing and evolving, and maintaining assembly code is quite costly. By avoiding assembly code, NTL should remain usable, with virtually no maintenance, for many years.

Minimal platform requirements

When the configuration flags NTL_CLEAN_INT and NTL_CLEAN_PTR are both on (this is not the default, see below), NTL makes two requirements of its platform, neither of which is guaranteed by the C++ language definition, but both of which are essentially universal:
  1. int and long quantities, respectively, are represented using a 2's complement representation whose width is equal to the width of unsigned int and unsigned long, respectively.
  2. Double precision floating point conforms to the IEEE floating point standard.
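The two requirements above can be checked at compile time. The following is a minimal sketch, not code taken from NTL; the function name platform_meets_requirements is hypothetical, and the min/max comparison is only a check that is consistent with 2's complement, not a proof of it.

```cpp
#include <climits>
#include <limits>

// Hypothetical helper (not part of NTL): true if the platform appears to
// satisfy the two minimal requirements listed above.
constexpr bool platform_meets_requirements() {
    return
        // Requirement 1: signed and unsigned types have the same width.
        // For signed types, digits does not count the sign bit.
        std::numeric_limits<int>::digits + 1 ==
            std::numeric_limits<unsigned int>::digits &&
        std::numeric_limits<long>::digits + 1 ==
            std::numeric_limits<unsigned long>::digits &&
        // Consistent with 2's complement: one more negative value than positive.
        INT_MIN == -INT_MAX - 1 &&
        LONG_MIN == -LONG_MAX - 1 &&
        // Requirement 2: double follows the IEEE standard with p = 53.
        std::numeric_limits<double>::is_iec559 &&
        std::numeric_limits<double>::digits == 53;
}

// Refuse to compile on a platform that fails the checks.
static_assert(platform_meets_requirements(),
              "platform does not meet the minimal requirements");
```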

NTL makes very conservative requirements of the C++ compiler:

The NTL_CLEAN_INT flag

The configuration flag NTL_CLEAN_INT is currently off by default.

When this flag is off, NTL makes another requirement of its platform; namely, that conversions from unsigned long to long convert the bit pattern without change to the corresponding 2's complement signed integer. Note that the C++ standard defines the behavior of converting unsigned to signed values as implementation defined when the value cannot be represented in the range of nonnegative signed values. Nevertheless, this behavior is essentially universal, and more importantly, it is not undefined behavior: implementation-defined behavior must be documented and respected by the compiler, while undefined behavior can be exploited by the compiler in some surprising ways.

Actually, with NTL_CLEAN_INT off, it is also assumed that right shifts of signed integers are consistent, in the sense that if it is sometimes an arithmetic shift, then it is always an arithmetic shift (the installation scripts check if right shift appears to be arithmetic, and if so, this assumption is made elsewhere). Arithmetic right shift is also implementation defined behavior that is essentially universal.
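Both assumptions are easy to probe at run time. The sketch below is illustrative only and is not NTL's installation check; the function names are hypothetical. On the platforms where the assumptions hold, both functions return true.

```cpp
#include <climits>

// Hypothetical probe: does unsigned long -> long keep the bit pattern?
inline bool conversion_preserves_bits() {
    unsigned long all_ones = ~0UL;                       // 111...1
    unsigned long top_bit = all_ones - (all_ones >> 1);  // 100...0
    // Under 2's complement, 111...1 is -1 and 100...0 is LONG_MIN.
    return static_cast<long>(all_ones) == -1L &&
           static_cast<long>(top_bit) == LONG_MIN;
}

// Hypothetical probe: is right shift of a negative long an arithmetic shift?
inline bool right_shift_is_arithmetic() {
    long neg = -8;
    // An arithmetic shift replicates the sign bit, so -8 >> 1 is -4.
    return (neg >> 1) == -4 && (-1L >> 1) == -1L;
}
```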

It seems fairly unlikely that one would ever have to turn the NTL_CLEAN_INT flag on, but it seems a good idea to make this possible, and at the very least to identify and isolate the code that relies on these assumptions. Actually, in the most recent versions of NTL (especially since v10.0), there is very little such code remaining, and it is not really all that critical to performance any more. Eventually, all such code may disappear completely.

The NTL_CLEAN_PTR flag

The configuration flag NTL_CLEAN_PTR is currently off by default.

When this flag is off, NTL makes another requirement of its platform; namely, that the address space is "flat", and in particular, that one can test if an object pointed to by a pointer p is located in an array of objects v[0..n-1] by testing if p >= v and p < v + n. The C++ standard does not guarantee that such a test will work; the only way to perform this test in a standard-conforming way is to iteratively test if p == v, p == v+1, etc.
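The two tests can be sketched as follows. This is an illustration of the idea rather than NTL's code, and the function names in_array_flat and in_array_clean are hypothetical.

```cpp
#include <cstddef>

// Fast test that assumes a flat address space. If p does not point into
// (or one past) the array v, the standard does not specify the result of
// the relational comparisons.
template <class T>
bool in_array_flat(const T* p, const T* v, std::size_t n) {
    return p >= v && p < v + n;
}

// Standard-conforming test: only equality comparisons are used, at the
// cost of a linear scan over the array positions.
template <class T>
bool in_array_clean(const T* p, const T* v, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        if (p == v + i) return true;
    }
    return false;
}
```

The flat version costs two comparisons, while the clean version costs up to n, which is the trade-off this flag controls.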

This assumption of a "flat" address space is essentially universally valid, and making this assumption leads to more efficient code. For this reason, the NTL_CLEAN_PTR flag is off by default, but one can always turn it on, and in fact, the overall performance penalty should be negligible for most applications.

Some floating point issues

NTL uses floating point arithmetic in a few places, including a number of exact computations, where one might not expect to see floating point. Relying on floating point may seem prone to errors, but with the guarantees provided by the IEEE standard, one can prove the correctness of the NTL code that uses floating point.

Briefly, the IEEE floating point standard says that basic arithmetic operations on doubles should work as if the operation were performed with infinite precision, and then rounded to p bits, where p is the precision (typically, p = 53).
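To illustrate how floating point can serve an exact computation, here is a sketch of a well-known technique for modular multiplication. It is not claimed to be NTL's actual code: the name mulmod_fp and the bound n < 2^50 are assumptions chosen for this example. The floating point quotient is only approximate, but with the rounding guarantee above it differs from the true quotient by at most 1, so a single correction step recovers the exact remainder.

```cpp
#include <cstdint>

// Hypothetical helper: returns (a * b) mod n, assuming 0 <= a, b < n < 2^50.
inline std::int64_t mulmod_fp(std::int64_t a, std::int64_t b, std::int64_t n) {
    // Approximate quotient. The conversions to double are exact because the
    // operands are below 2^53; the multiply and the divide each round once,
    // so the result is within 1 of the real quotient a*b/n.
    std::int64_t q = static_cast<std::int64_t>(
        static_cast<double>(a) * static_cast<double>(b) /
        static_cast<double>(n));

    // Exact value of a*b - q*n, computed modulo 2^64 in unsigned arithmetic
    // (well defined). The true value lies in [-n, 2n), so it fits in 64 bits.
    std::uint64_t ur =
        static_cast<std::uint64_t>(a) * static_cast<std::uint64_t>(b) -
        static_cast<std::uint64_t>(q) * static_cast<std::uint64_t>(n);

    // Unsigned -> signed conversion that keeps the bit pattern; this is the
    // same platform assumption discussed for NTL_CLEAN_INT.
    std::int64_t r = static_cast<std::int64_t>(ur);

    // The quotient was off by at most one in either direction.
    if (r < 0) r += n;
    else if (r >= n) r -= n;
    return r;
}
```

Because the error bound has some slack, the result stays correct even if a compiler evaluates x / y as x * (1/y) or carries intermediate values in extended precision.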

Throughout most of NTL, correctness follows from weaker assumptions, namely

The most recent versions of NTL in fact make much less use of floating point than older versions. Also, with few exceptions, the current version of NTL lets an aggressive optimizing compiler (such as Intel's icc) get away with quite a lot: computations may be regrouped rather arbitrarily, and x / y may be computed as x * (1/y) . The only exception to this is the quad_float module (see below).

One big problem with the IEEE standard is that it allows intermediate quantities to be computed in a higher precision than the standard double precision. Most platforms today implement the "strict" IEEE standard, with no excess precision. Up until recently, the Intel x86 machine with the GCC compiler was a notable exception to this: on older x86 machines, floating point was performed using the x87 FPU instructions, which operate on 80-bit, extended precision numbers; nowadays, most compilers use the SSE instructions, which operate on the standard, 64-bit numbers.

Historically, NTL went out of its way to ensure that its code is correct with both "strict" and "loose" IEEE floating point. This is achieved in a portable fashion throughout NTL, except for the quad_float module, where some desperate hacks, including assembly code, may be used to try to work around problems created by "loose" IEEE floating point [more details]. But note that even if the quad_float package does not work correctly because of these problems, the only other routines that are affected are the LLL_QP routines in the LLL module -- the rest of NTL should work fine. Hopefully, because of the newer SSE instructions, this whole strict/loose issue is a thing of the past.

Besides the quad_float module, there are a few other places in NTL where doubles are forced to memory to get rid of excess precision. This is done by passing a pointer to the double to an externally defined function. Theoretically, link-time optimizers could mess this up, and this solution would have to be revisited. That said, hopefully this whole strict/loose issue is no longer relevant, and also, the code that relies on this is specialized floating point code that is not really on a critical path. So it seems unlikely that this will ever be an issue.
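The idea of forcing a double to memory can be sketched in a single file. Note that this sketch uses a volatile store as a stand-in, which is a different mechanism from the external function call described above; the name force_to_double is hypothetical.

```cpp
// Hypothetical helper: round x to a genuine 64-bit double by forcing it
// through memory, discarding any excess precision held in a register.
inline double force_to_double(double x) {
    volatile double stored = x;  // the volatile write must reach memory
    return stored;               // the read reloads the 64-bit value
}
```

On a platform with strict IEEE arithmetic this function is an identity; it only changes a value on platforms that keep intermediate results in extended precision.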

Another problem is that some hardware (especially newer Intel chips) supports fused multiply-add (FMA) instructions. Again, this is only a problem for quad_float, and some care is taken to detect the problem and to work around it. The rest of NTL will work fine regardless.
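The following sketch shows why fusing can change a result. It does not reproduce quad_float's code or its workaround; the function names are hypothetical, and the volatile store is simply one way to keep a compiler from contracting the expression into an FMA.

```cpp
#include <cmath>

// a*b - c with the product rounded to double before the subtraction.
inline double mul_sub_rounded(double a, double b, double c) {
    volatile double p = a * b;  // volatile store blocks contraction into an FMA
    return p - c;
}

// a*b - c computed as a single fused operation with one rounding at the end.
inline double mul_sub_fused(double a, double b, double c) {
    return std::fma(a, b, -c);
}
```

For example, with a = 1 + 2^-27, b = 1 - 2^-27 and c = 1, the exact product is 1 - 2^-54, which rounds to 1 as a double. The rounded version therefore returns 0, while the fused version returns -2^-54. Code that relies on the exact sequence of roundings, as extended precision arithmetic does, sees different values depending on whether the compiler fuses the operations.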

Mostly, NTL does not require that the IEEE floating point special quantities "infinity" and "not a number" are implemented correctly. This is certainly the case for core code where floating point arithmetic is used for exact (but fast) computations, as the numbers involved never get too big (or small). However, the behavior of certain explicit floating point computations (e.g., the xdouble and quad_float classes, and the floating point versions of LLL) will be much more predictable and reliable if "infinity" and "not a number" are implemented correctly.

Algorithms

NTL makes fairly consistent use of asymptotically fast algorithms.

Long integer multiplication is implemented using the classical algorithm, crossing over to Karatsuba for very big numbers. Long integer division is currently only implemented using the classical algorithm -- unless you use NTL with GMP (version 3 or later), which employs an algorithm that is about twice as slow as multiplication for very large numbers.

Polynomial multiplication and division are carried out using a combination of the classical algorithm, Karatsuba, the FFT using small primes, and the FFT using the Schoenhage-Strassen approach. The choice of algorithm depends on the coefficient domain.
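The crossover between the classical algorithm and Karatsuba can be sketched on polynomials with small integer coefficients. This is a textbook formulation, not NTL's implementation: the type Poly, the function names, and the default crossover of 16 are assumptions made for the example, and coefficient overflow is ignored.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

using Poly = std::vector<long long>;  // entry i is the coefficient of x^i

// Classical product: one multiplication per pair of coefficients.
Poly mul_classical(const Poly& a, const Poly& b) {
    if (a.empty() || b.empty()) return Poly();
    Poly c(a.size() + b.size() - 1, 0);
    for (std::size_t i = 0; i < a.size(); ++i)
        for (std::size_t j = 0; j < b.size(); ++j)
            c[i + j] += a[i] * b[j];
    return c;
}

// Karatsuba product: three half-size products instead of four, falling
// back to the classical algorithm at or below the crossover length.
Poly mul_karatsuba(Poly a, Poly b, std::size_t crossover = 16) {
    if (a.empty() || b.empty()) return Poly();
    const std::size_t out_len = a.size() + b.size() - 1;
    const std::size_t n = std::max(a.size(), b.size());
    if (n <= std::max<std::size_t>(crossover, 1)) return mul_classical(a, b);

    a.resize(n, 0);  // pad to a common length
    b.resize(n, 0);
    const std::size_t h = n / 2;  // low halves have h terms, high have n - h

    Poly a0(a.begin(), a.begin() + h), a1(a.begin() + h, a.end());
    Poly b0(b.begin(), b.begin() + h), b1(b.begin() + h, b.end());

    Poly z0 = mul_karatsuba(a0, b0, crossover);  // low * low
    Poly z2 = mul_karatsuba(a1, b1, crossover);  // high * high

    Poly sa = a1, sb = b1;  // sa = a0 + a1, sb = b0 + b1
    for (std::size_t i = 0; i < h; ++i) { sa[i] += a0[i]; sb[i] += b0[i]; }
    Poly z1 = mul_karatsuba(sa, sb, crossover);
    for (std::size_t i = 0; i < z0.size(); ++i) z1[i] -= z0[i];
    for (std::size_t i = 0; i < z2.size(); ++i) z1[i] -= z2[i];

    // Result is z0 + z1 * x^h + z2 * x^(2h).
    Poly c(2 * n - 1, 0);
    for (std::size_t i = 0; i < z0.size(); ++i) c[i] += z0[i];
    for (std::size_t i = 0; i < z1.size(); ++i) c[i + h] += z1[i];
    for (std::size_t i = 0; i < z2.size(); ++i) c[i + 2 * h] += z2[i];
    c.resize(out_len);  // drop the zero terms introduced by padding
    return c;
}
```

In practice the best crossover point depends on the machine and on the cost of coefficient arithmetic, so it is normally tuned by experiment.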

Many algorithms employed throughout NTL are inventions of the author (Victor Shoup) and his colleagues Joachim von zur Gathen and Erich Kaltofen, as well as John Abbott and Paul Zimmermann.

Thread safety

As of v7.0, NTL is thread safe. That said, there are several things to be aware of:

To obtain thread safety, I used the following strategies: The overall structure of the code has been modified so that the code base is nearly identical for regular and thread-safe builds: there are just a few ifdef's on the NTL_THREADS flag.

Thread Boosting

As of v9.5.0, NTL provides a thread boosting feature. With this feature, certain code within NTL will use available threads to speed up computations on a multicore machine. This feature is enabled by setting NTL_THREAD_BOOST=on during configuration. See BasicThreadPool.txt for more information.

This feature is a work in progress. Currently, basic ZZ_pX and Mat<zz_p> arithmetic has been thread boosted. More code will be boosted later.

Error Handling and Exceptions

As of v8.0, NTL provides error handling through exceptions. To enable exceptions, you have to configure NTL with the NTL_EXCEPTIONS flag turned on. By default, exceptions are not enabled, and NTL reverts to its old error handling method: abort with an error message.

If exceptions are enabled, then instead of aborting your program, an appropriate exception is thrown. More details on the programming interface of this feature are available here.

If you enable exceptions, you must use a C++11 compiler. Specifically, your compiler will need support for lambdas (which are used to conveniently implement the "scope guard" idiom), and your compiler should implement the new default exception specification semantics (namely, that destructors are "noexcept" by default).
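For readers unfamiliar with the "scope guard" idiom, here is a minimal C++11 sketch built on a lambda. It is a generic illustration and not NTL's own implementation; the names ScopeGuard, make_guard, relax, and cleanup_count are hypothetical.

```cpp
#include <stdexcept>
#include <utility>

// Runs a cleanup action when the guard goes out of scope, unless relaxed.
template <class F>
class ScopeGuard {
    F action_;
    bool active_;
public:
    explicit ScopeGuard(F action)
        : action_(std::move(action)), active_(true) {}
    ScopeGuard(ScopeGuard&& other)
        : action_(std::move(other.action_)), active_(other.active_) {
        other.active_ = false;  // the moved-from guard must not fire
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;
    ~ScopeGuard() { if (active_) action_(); }  // destructors are noexcept by default
    void relax() { active_ = false; }          // cancel the cleanup on success
};

template <class F>
ScopeGuard<F> make_guard(F action) {
    return ScopeGuard<F>(std::move(action));
}

// Returns how many times the cleanup ran: 1 if an exception unwound the
// scope, 0 if the operation succeeded and the guard was relaxed.
inline int cleanup_count(bool fail) {
    int cleanups = 0;
    try {
        auto guard = make_guard([&cleanups] { ++cleanups; });
        if (fail) throw std::runtime_error("simulated failure");
        guard.relax();
    } catch (const std::runtime_error&) {
        // the guard has already run its cleanup during unwinding
    }
    return cleanups;
}
```

The guard lets rollback code sit next to the operation it protects, without a try/catch block around every step that might throw.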

Implementation of this required a top-to-bottom scrub of NTL's code, replacing a lot of old-fashioned code with more modern, RAII-oriented code (RAII = "resource acquisition is initialization").
