Efficiency Improvements in Maple 13
Maple 13 includes numerous efficiency improvements. Some highlights where dramatic speed-ups were achieved are described below.
Linear systems over Q and over cyclotomic fields
More LinearAlgebra efficiency improvements
Modular multivariate polynomial multiplication and division
New sparse modular algorithm for Greatest Common Divisors over algebraic number and function fields
Efficiency improvements to lcoeff and tcoeff
heap[extract] is more efficient
Multithreaded performance
Element-wise operations
Improvements to GetSubImage in the ImageTools package
Decreased garbage collection frequency
A new dense p-adic lift algorithm has improved the efficiency of LinearAlgebra[LinearSolve] and LinearAlgebra[Modular][IntegerLinearSolve]. See Enhancements to LinearAlgebra for details.
Maple 13 includes a fast modular algorithm for solving linear systems over cyclotomic fields. For example, if a linear system involves RootOf⁡f,x where f⁡x is a cyclotomic polynomial of degree > 1, then the fast code will be used. See also numtheory[cyclotomic].
As a comparison, consider the following example:
r := RootOf(x^4+x^3+x^2+x+1);
r≔RootOf⁡_Z4+_Z3+_Z2+_Z+1
cf := ()->randpoly(r, degree = 3):
nv := 50:
vars := seq(x[i],i=1..nv):
sys := {seq(randpoly([vars],degree=1,coeffs=cf),i=1..nv)}:
tt := time():
solve(sys,{vars}):
time()-tt;
0.212
In Maple 12, this same problem took almost 7 times as long.
Copying of a Matrix to a result with a shape (indexing function) is faster. The following Cholesky decomposition operation is approximately twenty times as fast as in Maple 12.
with(LinearAlgebra):
N := 1000: M := RandomMatrix(N,N,generator=-0.5..0.5, outputoptions=[datatype=float[8]]):
L := Matrix(M,shape=triangular[lower]):
M := Matrix(L.Transpose(L)):
for i from 1 to N do M[i,i]:=i; end do:
for i from 1 to N do M[i,i] := i; end do:
st := time():
LUDecomposition(M,method=Cholesky):
time() - st;
0.02
The following LU decomposition operation is approximately ten times as fast and requires only 80% of the memory allocation as compared to Maple 12.
LUDecomposition(M):
time()-st;
0.03
The new thin option to SingularValues gives a result with only min⁡m,n right and left singular vectors. This produces a marked speedup for LeastSquares solving when the number of rows is much larger than the number of columns. The following singular values decomposition operation is approximately thirty times as fast and requires only 4% of the memory allocation as compared to the non-thin computation available in Maple 12.
m := 10000: n := 100: M := RandomMatrix(m,n,generator=-1.0..1.0, outputoptions=[datatype=float[8]]):
SingularValues(M,output=[U,S,Vt],thin=true):
1.78
New heap-based modular multivariate polynomial multiplication and division algorithms have been added to Maple which are faster than the prior multiplication and division implementations.
This expansion has sped up by approximately 50%:
p1 := randpoly([x,y,z],degree=20,terms=50):
p2 := modp(Expand(p1 ^ 3),13):
0.012
This division has sped up by approximately 60%:
modp(Divide(p2,p1),13);
true
For more information about these commands, see Expand and Divide.
A new algorithm has been added for evaluating Greatest Common Divisors over algebraic number and function fields, which is particularly effective for sparse polynomials. For example, this Gcd computation is about 95% faster:
alias(p=RootOf(z^3+t*z^2+s*t,z)):
alias(q=RootOf(z^2+2)):
g := (y+p*t+q)*(t*x+s*y*p+q):
a := t*x+s*y+p*s*t+p^2+q:
b := x+s*x*y-t*p^2+p+t+q*s:
A := evala(Expand(g*a)):
B := evala(Expand(g*b)):
evala(Gcd(A, B));
0.5
The efficiency of lcoeff and tcoeff has been improved. The following computation of the leading and trailing coefficients took more than twice as long and used considerably more memory in Maple 12:
vars := [seq(x[i],i=0..19)];
vars≔x0,x1,x2,x3,x4,x5,x6,x7,x8,x9,x10,x11,x12,x13,x14,x15,x16,x17,x18,x19
f := add(i*vars[i], i=1..20);
f≔x0+2⁢x1+3⁢x2+4⁢x3+5⁢x4+6⁢x5+7⁢x6+8⁢x7+9⁢x8+10⁢x9+11⁢x10+12⁢x11+13⁢x12+14⁢x13+15⁢x14+16⁢x15+17⁢x16+18⁢x17+19⁢x18+20⁢x19
g := expand(f^6):
(nops,length)(g);
177100,10915266
t := time(): (lcoeff,tcoeff)(g, vars); time() - t;
1,64000000
0.004
Other improvements to lcoeff and tcoeff are described in Enhancements to Symbolic Capabilities in Maple 13.
The heap[extract] function now performs fewer comparisons than in prior Maple versions. The number of comparisons performed in extracting all elements from an n element heap for Maple 13 versus Maple 12 is provided in the following table:
Number of
Elements
Maple 12
Maple 13
10
27
26
20
89
68
30
158
116
40
239
170
50
338
242
60
416
295
70
517
345
80
642
90
737
483
100
841
554
Many parts of the Maple evaluation engine have been updated to improve the performance when executing in parallel. Although there is still more work to be done, the performance in parallel is significantly faster.
p := proc(A::Vector,B::Vector,C::Vector,m::integer,n::integer) option hfloat; local i; for i from m to n do C[i] := A[i]^2+B[i]^2; end do; end proc: n := 10^7: m := n/2: A := LinearAlgebra:-RandomVector( n, outputoptions=[datatype=hfloat] ): B := LinearAlgebra:-RandomVector( n, outputoptions=[datatype=hfloat] ): C := Vector( 1..n, datatype=hfloat ):
tt := time[real](): p(A,B,C,1,m): p(A,B,C,m+1,n): time[real]() - tt;
33.089
tt := time[real](): id1 := Threads:-Create( p(A,B,C,1,m) ): p(A,B,C,m+1,n): Threads:-Wait( id1 ): time[real]() - tt;
22.873
In Maple 12, the parallel case takes 35.568 seconds, slightly longer than the single threaded case.
The elementwise operator syntax can be more efficient than zip or map in some cases. Specific operations have been optimized compared to the general code used by zip or map. In other cases, depending on the input, built-in hardware floating-point routines can be used directly.
A := LinearAlgebra:-RandomMatrix(1000):
time(zip(`*`,A,2));
0.608
time( A *~ 2 );
0.096
B := LinearAlgebra:-RandomMatrix(1000,outputoptions=[datatype=float[8]]):
time(zip(`*`,B,2));
1.684
time( B *~ 2 );
0.017
time(map(`sin`,B));
5.456
time( sin~(B));
0.116
The GetSubImage routine of the ImageTools package is faster for the in-place case where the output is specified. The following repeated GetSubImage operation is approximately twice as fast as in Maple 12.
N := 500:
n := 100:
image := Array(1..N,1..N,datatype=float[8],order=C_order):
container := Array(1..n,1..n,datatype=float[8],order=C_order):
for i from 1 to 1000 do ImageTools:-GetSubImage(image,10,10,n,n,output=container); end do:
0.220
In previous versions of Maple, a garbage collection cycle was initiated after allocating approximately 10^6 words of memory. Given the increase in the amount of memory available on current computers the frequency of garbage collection can be reduced without placing a significant burden on the memory subsystem. In Maple 13, the garbage collector is activated after allocating a minimum of 10^7 words. On average, this yields an improvement of 19% on 32-bit processors and about 10% for 64-bit processors with a modest increase in memory usage. The frequency of garbage collection can be tuned to achieve the best performance for your application by adjusting the gcfreq variable using the kernelopts mechanism. For example,
gcTest := proc() local A, n, resultList; for n to 30 do A := RandomMatrix(n,n); resultList:= LUDecomposition(A,output=['P','L','U1','R']); end do; end proc:
kernelopts(gcfreq=10^6);
10000000,0.1
gc(): start := time(): gcTest(): t1 := time() - start;
t1≔0.924
kernelopts(gcfreq=10^7);
1000000
gc(): start := time(): gcTest(): t2:= time() - start;
t2≔0.860
See Also
Index of New Maple 13 Features
Download Help Document