NTL 10.3.0

An NTL discussion forum

NTL 10.3.0

Postby victorshoup » Mon Nov 21, 2016 4:04 pm

Go to http://shoup.net/ntl for full details.
Below is the latest entry in the change log.

I've also updated the NTL/FLINT comparison document:
http://shoup.net/ntl/benchmarks.pdf
The only significant change is to the ZZ_pX factoring benchmarks
(the new code in NTL speeds up mat_ZZ_p multiplication and
indirectly CompMod and CanZass).

======= Changes in 10.3.0 =======

* Implementation of a multi-modular strategy for matrix multiplication
over ZZ_p. Here are some benchmarks that compare the new strategy to
the old (naive) one. Here, we consider n-by-n matrices modulo an
n-bit prime.
o n=128, speedup=6.8x
o n=256, speedup=8.6x
o n=512, speedup=9.3x
o n=1024, speedup=18x
o n=2048, speedup=37x

I also compared NTL's new mat_ZZ_p multiplication to FLINT's
fmpz_mat multiplication. The latter also uses a multi-modular
strategy. For n=128,256,512,1024 as above, NTL's code is between
2.7 and 3.1 times faster than FLINT's (and we did not count the time
it would take to reduce the entries mod p for the FLINT code).

Part of this may be due to the AVX-enhanced small-prime matrix
multiplication code used by NTL, and part may be due to better CRT
code.

* I also instrumented both the plain and multi-modular matrix
multiplication for ZZ_p so that they are both "thread boosted"
(i.e., will automatcally exploit multiple cores when possible).

* As an initial application of this faster matrix multiplication, I
implemented a new version of Brent/Kung modular composition for
ZZ_pX, which is now between 2 and 5 times faster than the old one
(depending on parameters). This is done with a new class called
ZZ_pXNewArgument, which supersedes ZZ_pXArgument (retained for
compatibility). This also makes the CanZass factoring algorithm for
ZZ_pX faster (sometimes by a factor of almost 2, depending on
parameters). It also makes CanZass more memory intensive (but while
the overall memory usage increases, the memory access pattern is
fairly cache friendly).

* I would like to see if this faster matrix multiplication can be used
to get faster linear algebra (determinants, inverses, more general
Gaussian elimination) via reduction to matrix multiplication. If
anyone wants to volunteer to work on this, please let me know.
Presumably, the FLINT nmod_mat code could be repurposed for this. I
won't have time to work on this for a few months, but would be glad
to discuss ideas with anyone who wanted to do the work. Note that
the "plain" versions of these routines also need some work.

* I also added move methods to the Vec and Mat classes, and made a
slight tweak to the Lazy class.
victorshoup
Site Admin
 
Posts: 32
Joined: Mon Jan 13, 2014 3:18 am

Return to NTL

Who is online

Users browsing this forum: No registered users and 1 guest

cron