A Tour of NTL: Examples: Thread Pools

If you have built NTL with NTL_THREAD_BOOST=on, then not only is NTL thread safe, but certain parts of NTL are designed to use multiple threads to speed things up. To implement this, NTL makes use of a thread pool, which is a collection of threads that are created once and then used over and over again, to avoid the significant overhead of thread creation and destruction. You can also use this same thread pool to speed up NTL client code.

To use this feature, you have to include the header file NTL/BasicThreadPool.h. In your main program, you should also indicate how many threads you want in the pool. If you want, say, 8 threads, you do this by calling the function SetNumThreads(8).

If you do this, then certain parts of NTL will use these threads when possible (this is a work in progress). The easiest way to use these threads in your own code is with a parallel for loop, illustrated in the following example. See BasicThreadPool.txt for more details. Consider the following routine:

   void mul(ZZ *x, const ZZ *a, const ZZ *b, long n)
   {
      for (long i = 0; i < n; i++)
         mul(x[i], a[i], b[i]);
   }

We can parallelize it as follows:

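   void mul(ZZ *x, const ZZ *a, const ZZ *b, long n)
   {
      // first and last are declared by the macro; each participating
      // thread sees its own subinterval [first..last) of [0..n)
      NTL_EXEC_RANGE(n, first, last)

         for (long i = first; i < last; i++)
            mul(x[i], a[i], b[i]);

      NTL_EXEC_RANGE_END
   }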

NTL_EXEC_RANGE and NTL_EXEC_RANGE_END are macros that just do the right thing. If there are nt threads available, the interval [0..n) will be partitioned into (up to) nt subintervals, and a different thread will be used to process each subinterval. You still have to write the for loop yourself: the macro just declares and initializes variables first and last (or whatever you want to call them) of type long that represent the subinterval [first..last) to be processed by one thread.
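
To make the partitioning concrete, here is a minimal sketch in plain C++ of one way to split [0..n) into at most nt subintervals of nearly equal size. The helper split_range is hypothetical and is not part of NTL; the exact sizes NTL chooses for its subintervals may differ from what is shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper, not part of NTL: split [0..n) into at most nt
// contiguous, non-empty subintervals [first..last) of nearly equal size.
std::vector<std::pair<long, long>> split_range(long n, long nt)
{
   assert(nt >= 1);
   std::vector<std::pair<long, long>> parts;
   if (n <= 0) return parts;   // nothing to process

   long k = std::min(n, nt);   // never more subintervals than elements
   long base = n / k;          // every subinterval gets at least this many
   long extra = n % k;         // the first `extra` subintervals get one more

   long first = 0;
   for (long j = 0; j < k; j++) {
      long last = first + base + (j < extra ? 1 : 0);
      parts.emplace_back(first, last);
      first = last;            // the next subinterval starts where this one ends
   }
   return parts;
}
```

With this sketch, split_range(10, 4) yields [0..3), [3..6), [6..8), [8..10), while split_range(3, 8) yields only three subintervals of length one, which is why the text above says "up to" nt.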

Note that the current thread participates as one of the nt available threads, and that the current thread will wait for all participating threads to finish their task before proceeding.

Within the "body" of this construct, you can freely reference any variables that are visible at this point. This is implemented using the C++ lambda feature (capturing all variables by reference).

This construct will still work even if threads are disabled, in which case it runs single-threaded with first=0 and last=n.
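
The following sketch suggests how a pair of macros can turn the text between them into a lambda that captures by reference, and what the single-threaded fallback amounts to. The names SKETCH_EXEC_RANGE, SKETCH_EXEC_RANGE_END, sketch_run_serial, and scale are all hypothetical; these are not NTL's actual definitions, only an illustration of the technique in standard C++.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for NTL_EXEC_RANGE / NTL_EXEC_RANGE_END.
// These are NOT NTL's definitions: they only show how a macro pair can
// wrap the text between them in a lambda that captures by reference.
#define SKETCH_EXEC_RANGE(n, first, last) \
   sketch_run_serial((n), [&](long first, long last) {

#define SKETCH_EXEC_RANGE_END \
   });

// Serial fallback: a single call covering the whole interval [0..n),
// that is, first=0 and last=n.
template <class F>
void sketch_run_serial(long n, const F& body)
{
   body(0, n);
}

// Same shape as the mul example above, but on plain long values.
void scale(std::vector<long>& x, long factor)
{
   long n = static_cast<long>(x.size());

   SKETCH_EXEC_RANGE(n, first, last)

      for (long i = first; i < last; i++)
         x[i] *= factor;   // x and factor are captured by reference

   SKETCH_EXEC_RANGE_END
}
```

Because the body is an ordinary lambda, variables declared inside it are local to each invocation, while variables from the enclosing function are shared by every invocation through the reference capture.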

Note that the code within the EXEC_RANGE body could call other routines that themselves attempt to execute an EXEC_RANGE: if this happens, the latter EXEC_RANGE will detect this and run single-threaded.

You may wish to do other things within the EXEC_RANGE body than just execute a loop. One thing you may want to do is to declare variables. Another thing you may want to do is set up a local context for a ZZ_p modulus (or other type of modulus). Here is an example of doing this:

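   void mul(ZZ_p *x, const ZZ_p *a, const ZZ_p *b, long n)
   {
      // record the ZZ_p modulus that is current in the calling thread
      ZZ_pContext context;
      context.save();

      NTL_EXEC_RANGE(n, first, last)

         // install the saved modulus in the thread running this subinterval
         context.restore();

         for (long i = first; i < last; i++)
            mul(x[i], a[i], b[i]);

      NTL_EXEC_RANGE_END
   }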

A lower-level set of tools is available, which allows for more fine-grained control. See BasicThreadPool.txt for more details.