Efficiency Improvements in Maple 14
Maple 14 includes numerous efficiency improvements. Some highlights where dramatic speed-ups were achieved are described below.
|
Intel (R) MKL 10.0 (Windows)
|
|
•
|
The Intel (R) Math Kernel Library (Intel (R) MKL) version 10.0 replaces a prior version on 32-bit Windows, and replaces generic BLAS on 64-bit Windows. These highly optimized core routines are used in various places throughout Maple.
|
•
|
The following example highlights how performance in Maple 14 is significantly improved.
|
>
|
M:=LinearAlgebra:-RandomMatrix(2000,datatype=float[8]):
|
On a 2.13GHz Core2 Duo running 32-bit Windows, Maple 14 takes 2.44 seconds, compared to 7.95 seconds in Maple 13.02.
On a 2.00GHz dual quad-core Xeon running 64-bit Windows, Maple 14 takes 2.437 seconds, compared to 25.593 seconds in Maple 13.02.
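The operation being timed is not shown above. A minimal completion, assuming the benchmark times a matrix-matrix product (which dispatches to the optimized BLAS routines for float[8] data), is:
|
>
|
t := time[real]():  M . M:  time[real]() - t;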
|
|
Cyclotomic Polynomials
|
|
•
|
A new implementation of numtheory[cyclotomic] constructs cyclotomic polynomials significantly faster than the previous implementation when the index has large prime factors.
|
•
|
As a comparison, consider the following example:
|
>
|
numtheory[cyclotomic](255255,x):
|
|
This is about 1000 times as fast as in Maple 13. The amount of speedup increases with the size of the problem.
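As a quick sanity check on a smaller input, the cyclotomic polynomials over the divisors of n multiply together to give x^n - 1; the names n and d below are illustrative:
|
>
|
n := 15:  expand(mul(numtheory[cyclotomic](d,x), d in numtheory[divisors](n))) - (x^n - 1);   # evaluates to 0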
|
|
|
Polynomial Arithmetic
|
|
•
|
Polynomial multiplication and division over integers can be performed faster than in Maple 13 because of an improved implementation. Multiple threads may also be used for large polynomial multiplications.
|
•
|
The following example is about twenty times as fast as in Maple 13:
|
>
|
f,g := seq(randpoly([x,y,z],dense,degree=30),i=1..2):
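The multiplication being timed is not shown above; a plausible completion of the benchmark is:
|
>
|
time(expand(f*g));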
|
•
|
Higher-level commands for polynomial arithmetic, such as factor, benefit from the improvements described above. The following example is about 10 times as fast as in Maple 13:
|
>
|
f := expand((1+x+y+z+t)^11)+1:
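The factorization call itself is not shown above; a plausible completion is:
|
>
|
time(factor(f));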
|
•
|
Maple 14 adds improved algorithms for multiplication, division, inversion, and gcd of recursive dense polynomials, which is the representation used to compute with polynomials over algebraic number fields.
|
•
|
The new algorithms preallocate memory and run in place, which makes them significantly faster when multiple extensions are present.
|
•
|
The following example is about twice as fast as in Maple 13:
|
>
|
p := nextprime(2^15):   # prime modulus; the value used originally is not shown
|
>
|
alpha := RootOf(Randprime(2,x) mod p):
|
>
|
beta := RootOf(Randprime(5,x,alpha) mod p):
|
>
|
cofs := ()->randpoly([alpha,beta],degree=4,dense):
|
>
|
f,g,h := seq(Expand(randpoly(x,degree=40,dense,coeffs=cofs)) mod p,i=1..3):
|
>
|
time(assign('a'=Expand(f*g) mod p));
|
>
|
time(assign('b'=Expand(g*h) mod p));
|
>
|
time(assign('g'=Gcd(a,b) mod p));
|
|
|
Fetching Information about Polynomials via has, sort, degree, and type
|
|
•
|
The commands has, sort, degree, and type are now much faster when algebraic numbers or functions are present. A new implementation reduces the time complexity of these commands.
|
•
|
For the examples below, Maple 14's implementation is 3 seconds faster than Maple 13 on the commands involving F:
|
>
|
r := RootOf(numtheory[cyclotomic](997,Z),Z):
|
>
|
c := rand(100):   # random coefficient generator; assumed, not shown in the original
|
>
|
F := [seq(add(c()*Z^c()*x^i, i=0..10^4-1), j=1..10)]:
|
>
|
G := eval(F, Z=r):   # the same polynomials with the algebraic number r substituted for Z; assumed
|
>
|
time(map(degree, eval(F,1), x));
|
>
|
time(map(degree, eval(G,1), x));
|
>
|
time(map(type, eval(F,1), polynom(anything,x)));
|
>
|
time(map(type, eval(G,1), polynom(anything,x)));
|
|
|
Faster GraphTheory Algorithms
|
|
•
|
For the example below, Maple 14 is about 200 times faster and takes 150 MB less memory than Maple 13 to perform the computation.
|
>
|
with(GraphTheory): with(SpecialGraphs):
|
>
|
bt:=CompleteBinaryTree(8):
|
>
|
time(BipartiteMatching(bt));
|
•
|
For the example below, Maple 14 takes 9 fewer seconds and 250 MB less memory than Maple 13 to perform the computation.
|
>
|
cg := CompleteGraph(2000):   # hypothetical input graph; the one used originally is not shown
|
>
|
cgc:=GraphComplement(cg):
|
>
|
time(ConnectedComponents(cgc));
|
•
|
For the example below, Maple 14 is 9 times faster than Maple 13.
|
>
|
# sbw is a weighted graph whose definition is not shown in the original
time(MaxFlow(sbw,1,20));
|
•
|
In addition, basic GraphTheory commands (including CompleteGraph, DisjointUnion, and AddVertex) have been made more efficient for large graphs having many vertices and edges.
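As an illustrative sketch of these commands on larger graphs (the sizes and the vertex name v below are chosen arbitrarily):
|
>
|
G1 := GraphTheory:-CompleteGraph(500):
|
>
|
G2 := GraphTheory:-DisjointUnion(G1, GraphTheory:-CompleteGraph(300)):
|
>
|
G3 := GraphTheory:-AddVertex(G2, v):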
|
|
|
CUDA Acceleration
|
|
•
|
We have added support for accelerating LinearAlgebra routines using NVIDIA's CUDA technology. Matrix multiplication can be accelerated for a variety of datatypes and shapes. See the CUDA package and CUDA,supported_routines for more information. Note: the following example shows a speed-up only if the computer on which it is run supports CUDA; see CUDA,supported_hardware for more information.
|
>
|
m1 := LinearAlgebra:-RandomMatrix( 2000,2000,outputoptions=[datatype=float[4]] ):
|
>
|
m2 := LinearAlgebra:-RandomMatrix( 2000,2000,outputoptions=[datatype=float[4]] ):
|
>
|
CUDA:-Enable(false):   # ensure CUDA is off for the baseline timing
|
>
|
N := 10:  t := time[real]():   # N repetitions; the value is assumed, not shown in the original
|
>
|
to N do mNoCuda := m1.m2: end:
|
>
|
tNoCuda := time[real]()-t;
|
>
|
CUDA:-Enable(true):   # repeat the same product with CUDA acceleration on
|
>
|
t := time[real]():
|
>
|
to N do mCuda := m1.m2: end:
|
>
|
tCuda := time[real]()-t;
|
|
|
. (dot) Operator Improvements
|
|
We reduced the overhead of calling the . operator. This is particularly significant when working with small matrices.
•
|
In Maple 13, the following commands took over 9 seconds to complete.
|
>
|
A := Matrix([[9,5],[6,4]]):
|
>
|
B := Matrix([[6,7],[3,10]]):
|
>
|
n := 1000:   # loop bound; the value is assumed, not shown in the original
|
>
|
to n do to n do A.B; od; od:
|
|
|
Direct Access to Solvers
|
|