Architecture support to boost NTL?

An NTL discussion forum

Architecture support to boost NTL?

Postby Barukh » Sat Jan 30, 2016 5:08 pm

Dear Victor et al.,

I would like to discuss possible extensions of processors' instruction set to boost various NTL algorithms.

Victor, in your latest NTL release there is a "more aggressive" use of Intel's PCLMUL instruction, which is an indication that specific algorithms may benefit from it.
Also, Daniel Bernstein recently published a relevant post on his weblog: http://blog.cr.yp.to/20140517-insns.html.

So, in general, do you see any specific extensions that might be beneficial? In particular, I am interested in extensions for elliptic curve algorithms (which are on Victor's wish list).
Barukh
 
Posts: 3
Joined: Sat Jan 30, 2016 4:54 pm

Re: Architecture support to boost NTL?

Postby victorshoup » Sat Jan 30, 2016 10:09 pm

This is an interesting topic. My nearer term goals are to more fully exploit CPU features
that have recently emerged, or will emerge in the near future.

The latest NTL releases (9.6.3/9.6.4) automatically detect the availability of
AVX and AVX2/FMA. Right now, all I have done with these is to use them
to speed up the CompMod routines for zz_pX, when p is small enough that we
can compute inner products exactly in double precision. I have also been working
on doing the same type of thing for matrix multiplication mod small p. This is
a part of another project, but I hope to fold this code into NTL. For both the
CompMod and matrix multiplication code, cache friendliness is also an issue
that I have been working on.
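
To give a flavor of the trick, here is a rough scalar sketch of the kind of inner product I mean (illustrative only, not NTL's actual code; the function name and the exactness bound are mine). With AVX2/FMA the same loop vectorizes to four double-precision lanes.

Code:
#include <cmath>
#include <cstddef>
#include <vector>

// Inner product of residue vectors mod a small prime p, accumulated in
// double precision.  As long as n*(p-1)^2 < 2^53 every partial sum is an
// exactly representable integer, so there is no rounding error and a single
// reduction at the end suffices.
long inner_prod_mod(const std::vector<long>& a,
                    const std::vector<long>& b, long p)
{
    double acc = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i)
        acc += double(a[i]) * double(b[i]);    // exact under the bound above
    return long(std::fmod(acc, double(p)));    // reduce once at the end
}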

All of this is closely related to the work that the
fflas people are doing. I would like to get some of this stuff implemented
directly into NTL...including an implementation for matrix multiplication
that exploits multicore. This could even be extended (as in fflas) to
matrix multiplication mod big p via Chinese remaindering.
I think having this natively implemented in NTL would be nice, rather than
depending on things like OpenBLAS and OpenMP.
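
As a toy illustration of the Chinese remaindering idea (two word-size primes only, so everything fits in 128-bit arithmetic; FFLAS and a real NTL implementation would use many primes and a proper reconstruction): compute the product mod each small prime, recombine entries with the CRT to get the exact integer product, then reduce mod p. None of these names are NTL's.

Code:
#include <cstddef>
#include <cstdint>
#include <vector>

using u64  = std::uint64_t;
using u128 = unsigned __int128;

// x^e mod q, used below to get an inverse modulo a prime q
static u64 pow_mod(u64 x, u64 e, u64 q) {
    u64 r = 1;
    while (e) {
        if (e & 1) r = (u128)r * x % q;
        x = (u128)x * x % q;
        e >>= 1;
    }
    return r;
}

// plain O(n^3) matrix product mod a single word-size prime q
static std::vector<u64> matmul_mod(const std::vector<u64>& A,
                                   const std::vector<u64>& B, int n, u64 q) {
    std::vector<u64> C(std::size_t(n) * n, 0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k) {
            u64 aik = A[i*n + k] % q;
            for (int j = 0; j < n; ++j)
                C[i*n + j] = (C[i*n + j] + (u128)aik * (B[k*n + j] % q)) % q;
        }
    return C;
}

// CRT: recover the exact integer entry < q1*q2 from its residues r1, r2,
// then reduce mod p.  Valid whenever n*(p-1)^2 < q1*q2.
static u64 crt2_mod_p(u64 r1, u64 q1, u64 r2, u64 q2, u64 p) {
    u64 q1_inv = pow_mod(q1 % q2, q2 - 2, q2);               // q1^{-1} mod q2
    u64 t = (u128)((r2 + q2 - r1 % q2) % q2) * q1_inv % q2;
    u128 x = (u128)t * q1 + r1;                               // exact entry of A*B
    return (u64)(x % p);
}

One would call matmul_mod once per prime and then run crt2_mod_p entry by entry; with more primes the reconstruction needs multiprecision arithmetic, which is where GMP (or NTL's ZZ) comes in.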

Looking forward, Intel's AVX512 instruction set includes IFMA52 instructions: these are
SIMD instructions that allow 52x52 -> 104 bit integer multiplication. I fear that
is all Intel is going to implement any time soon. It would be great if there was
a true 64x64 -> 128 bit multiplication instruction, but that seems unlikely.

The IFMA52 instructions are documented, and should appear soon in Skylake Xeon
machines. It will be interesting to see what we can do with those.
One thing for sure is that if we restrict single-precision moduli to 50 bits,
then all kinds of things can exploit this (like the small-prime FFT), so that
should be interesting.
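
For concreteness, here is roughly what a multiply-accumulate step looks like at the intrinsics level (untested, of course, since the hardware isn't out yet; needs a compiler with AVX512IFMA support, e.g. -mavx512ifma):

Code:
#include <immintrin.h>

// One multiply-accumulate step over eight packed 52-bit unsigned integers.
// Each 64-bit lane of x and y must hold a value < 2^52; the 104-bit products
// are accumulated in two halves.  Each accumulator lane can absorb roughly
// 2^12 such additions before a carry/normalization pass is needed.
static inline void ifma52_step(__m512i x, __m512i y,
                               __m512i& acc_lo, __m512i& acc_hi)
{
    acc_lo = _mm512_madd52lo_epu64(acc_lo, x, y);  // acc_lo += low 52 bits of x*y
    acc_hi = _mm512_madd52hi_epu64(acc_hi, x, y);  // acc_hi += high 52 bits of x*y
}

With moduli capped at 50 bits the products are only 100 bits, so the high halves stay below 2^48, which buys extra headroom on that side before a carry pass is needed.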

One can in principle already use AVX2 with FMA to do this. There is some nice work
by the Mathemagix people who show how to do this with floating-point FMA.
But I am patient and will just wait for IFMA52 to come around: that should
give much more efficient code.
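
As I understand it, the basic floating-point building block is the standard FMA "two-product": the rounded product plus its exact FMA-computed error recovers the full product of two ~50-bit integers stored as doubles. A two-line sketch (illustrative only):

Code:
#include <cmath>

// Error-free product via FMA: hi is the rounded product and lo the exact
// rounding error, so hi + lo == a*b with no error (barring over/underflow).
// Applied to integers of up to ~52 bits stored in doubles, this recovers the
// full double-width product from ordinary floating-point hardware.
inline void two_prod(double a, double b, double& hi, double& lo)
{
    hi = a * b;
    lo = std::fma(a, b, -hi);
}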

I wonder what the impact of IFMA52 will be on other things. I would think that
GMP could exploit it. However, the GMP folks I talked to don't seem to be very
optimistic about it. But I think the jury is still out on that.

If only Intel would implement a true 64x64 -> 128 bit SIMD integer mul....
that would be the best....
victorshoup
Site Admin
 
Posts: 32
Joined: Mon Jan 13, 2014 3:18 am

Re: Architecture support to boost NTL?

Postby Barukh » Wed Feb 03, 2016 5:21 pm

Victor,

Thanks for this insightful reply.

I've talked to the Intel engineers who were in charge of defining this new ISA. They expressed high confidence that the IFMA52 instructions can be very efficient in certain computations (namely, those with many accumulations).

They also pointed out that integer 64x64 -> 128 scalar multiplication was introduced in the Skylake processor family. As for the SIMD version, that is indeed very unlikely.
Barukh
 
Posts: 3
Joined: Sat Jan 30, 2016 4:54 pm

Re: Architecture support to boost NTL?

Postby victorshoup » Wed Feb 03, 2016 5:56 pm

wait... 64x64 -> 128 scalar has always been there, with the mul/imul instructions.
do you mean the newer mulx instructions?
but those have been around for a while, too.
I believe AVX512 is supposed to have 64x64 -> 64 SIMD muls.
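
In C/C++ the portable spelling of the scalar 64x64 -> 128 multiply is just __int128; the compiler lowers it to a single mul (or mulx, with BMI2):

Code:
#include <cstdint>

// Full 64x64 -> 128-bit product.  On x86-64 this compiles to one mul
// instruction; with -mbmi2 the compiler may use mulx instead, which writes
// both halves without clobbering the flags.
inline void mul64x64(std::uint64_t a, std::uint64_t b,
                     std::uint64_t& hi, std::uint64_t& lo)
{
    unsigned __int128 prod = (unsigned __int128)a * b;
    lo = (std::uint64_t)prod;
    hi = (std::uint64_t)(prod >> 64);
}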

I've also been talking to a fellow named Shay Gueron, who is involved with these
things at Intel. ifma52 looks intriguing, but we still have to wait a while for actual hardware.
victorshoup
Site Admin
 
Posts: 32
Joined: Mon Jan 13, 2014 3:18 am

Re: Architecture support to boost NTL?

Postby Barukh » Tue Feb 09, 2016 6:17 pm

Victor,

Sorry, my mistake... This wide IMUL instruction was indeed introduced a long while ago.
Barukh
 
Posts: 3
Joined: Sat Jan 30, 2016 4:54 pm

Re: Architecture support to boost NTL?

Postby past08 » Thu Mar 17, 2016 1:41 pm

Barukh wrote:
"I would like to discuss possible extensions of processors' instruction set to boost various NTL algorithms. [...] In particular, I am interested in extensions for elliptic curve algorithms."

victorshoup wrote:
"This is an interesting topic."
past08
 
Posts: 1
Joined: Thu Mar 17, 2016 1:22 pm

Re: Architecture support to boost NTL?

Postby victorshoup » Thu Mar 17, 2016 2:43 pm

That would be great if you could contribute!
Regarding PCLMUL: I believe that the main application there would
be to the GF2X arithmetic. There are probably many ways that GF2X
arithmetic could be improved. However, one should bear in mind
that the gf2x library already has a lot of improvements and can be
plugged into NTL as a "back end" (sort of like GMP, except that right
now, the gf2x library only does multiplication).
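
For anyone who hasn't used it: PCLMUL is a carry-less (XOR) 64x64 -> 128-bit multiply, i.e. exactly multiplication in GF(2)[x] for operands of degree < 64. A minimal sketch with the intrinsic (not NTL's or gf2x's internals):

Code:
#include <immintrin.h>   // needs -mpclmul
#include <cstdint>

// Multiply two GF(2)[x] polynomials of degree < 64, given as bit masks.
// The selector 0x00 picks the low 64-bit lane of each operand; the result
// holds the full product, of degree < 127, in one 128-bit register.
static inline __m128i gf2x_mul64(std::uint64_t a, std::uint64_t b)
{
    __m128i va = _mm_set_epi64x(0, (long long)a);
    __m128i vb = _mm_set_epi64x(0, (long long)b);
    return _mm_clmulepi64_si128(va, vb, 0x00);
}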

Here is the link to the gf2x project:
https://gforge.inria.fr/projects/gf2x/
You can talk to one of the developers there, Paul Zimmermann for example.

But maybe you have other ideas :-)
victorshoup
Site Admin
 
Posts: 32
Joined: Mon Jan 13, 2014 3:18 am

