|
Linear systems over Q and over cyclotomic fields
|
|
•
|
Maple 13 includes a fast modular algorithm for solving linear systems over cyclotomic fields. For example, if a linear system involves where is a cyclotomic polynomial of degree > 1, then the fast code will be used. See also numtheory[cyclotomic].
|
|
As a comparison, consider the following example:
|
>
|
r := RootOf(x^4+x^3+x^2+x+1);
|
| (1) |
>
|
cf := ()->randpoly(r, degree = 3):
|
>
|
vars := seq(x[i],i=1..nv):
|
>
|
sys := {seq(randpoly([vars],degree=1,coeffs=cf),i=1..nv)}:
|
|
In Maple 12, this same problem took almost 7 times as long.
|
|
|
More LinearAlgebra efficiency improvements
|
|
Copying of a Matrix to a result with a shape (indexing function) is faster. The following Cholesky decomposition operation is approximately twenty times as fast as in Maple 12.
>
|
N := 1000:
M := RandomMatrix(N,N,generator=-0.5..0.5, outputoptions=[datatype=float[8]]):
|
>
|
L := Matrix(M,shape=triangular[lower]):
|
>
|
M := Matrix(L.Transpose(L)):
|
>
|
for i from 1 to N do M[i,i]:=i; end do:
|
>
|
for i from 1 to N do M[i,i] := i; end do:
|
>
|
LUDecomposition(M,method=Cholesky):
|
The following LU decomposition operation is approximately ten times as fast and requires only 80% of the memory allocation as compared to Maple 12.
>
|
N := 1000:
M := RandomMatrix(N,N,generator=-0.5..0.5, outputoptions=[datatype=float[8]]):
|
The new thin option to SingularValues gives a result with only right and left singular vectors. This produces a marked speedup for LeastSquares solving when the number of rows is much larger than the number of columns. The following singular values decomposition operation is approximately thirty times as fast and requires only 4% of the memory allocation as compared to the non-thin computation available in Maple 12.
>
|
m := 10000:
n := 100:
M := RandomMatrix(m,n,generator=-1.0..1.0, outputoptions=[datatype=float[8]]):
|
>
|
SingularValues(M,output=[U,S,Vt],thin=true):
|
|
|
Modular multivariate polynomial multiplication and division
|
|
New heap-based modular multivariate polynomial multiplication and division algorithms have been added to Maple which are faster than the prior multiplication and division implementations.
This expansion has sped up by approximately 50%:
>
|
p1 := randpoly([x,y,z],degree=20,terms=50):
|
>
|
p2 := modp(Expand(p1 ^ 3),13):
|
This division has sped up by approximately 60%:
>
|
modp(Divide(p2,p1),13);
|
For more information about these commands, see Expand and Divide.
|
|
New sparse modular algorithm for Greatest Common Divisors over algebraic number and function fields
|
|
A new algorithm has been added for evaluating Greatest Common Divisors over algebraic number and function fields, which is particularly effective for sparse polynomials. For example, this Gcd computation is about 95% faster:
>
|
alias(p=RootOf(z^3+t*z^2+s*t,z)):
|
>
|
alias(q=RootOf(z^2+2)):
|
>
|
g := (y+p*t+q)*(t*x+s*y*p+q):
|
>
|
a := t*x+s*y+p*s*t+p^2+q:
|
>
|
b := x+s*x*y-t*p^2+p+t+q*s:
|
>
|
A := evala(Expand(g*a)):
|
>
|
B := evala(Expand(g*b)):
|
|
|
Efficiency improvements to lcoeff and tcoeff
|
|
The efficiency of lcoeff and tcoeff has been improved. The following computation of the leading and trailing coefficients took more than twice as long and used considerably more memory in Maple 12:
>
|
vars := [seq(x[i],i=0..19)];
|
| (10) |
>
|
f := add(i*vars[i], i=1..20);
|
| (11) |
>
|
t := time():
(lcoeff,tcoeff)(g, vars);
time() - t;
|
Other improvements to lcoeff and tcoeff are described in Enhancements to Symbolic Capabilities in Maple 13.
|
|
heap[extract] is more efficient
|
|
The heap[extract] function now performs fewer comparisons than in prior Maple versions. The number of comparisons performed in extracting all elements from an element heap for Maple 13 versus Maple 12 is provided in the following table:
Number of
|
|
|
Elements
|
Maple 12
|
Maple 13
|
10
|
27
|
26
|
20
|
89
|
68
|
30
|
158
|
116
|
40
|
239
|
170
|
50
|
338
|
242
|
60
|
416
|
295
|
70
|
517
|
345
|
80
|
642
|
416
|
90
|
737
|
483
|
100
|
841
|
554
|
|
|
|
|
Multithreaded performance
|
|
Many parts of the Maple evaluation engine have been updated to improve the performance when executing in parallel. Although there is still more work to be done, the performance in parallel is significantly faster.
>
|
p := proc(A::Vector,B::Vector,C::Vector,m::integer,n::integer)
option hfloat;
local i;
for i from m to n
do
C[i] := A[i]^2+B[i]^2;
end do;
end proc:
n := 10^7:
m := n/2:
A := LinearAlgebra:-RandomVector( n, outputoptions=[datatype=hfloat] ):
B := LinearAlgebra:-RandomVector( n, outputoptions=[datatype=hfloat] ):
C := Vector( 1..n, datatype=hfloat ):
|
>
|
tt := time[real]():
p(A,B,C,1,m):
p(A,B,C,m+1,n):
time[real]() - tt;
|
>
|
tt := time[real]():
id1 := Threads:-Create( p(A,B,C,1,m) ):
p(A,B,C,m+1,n):
Threads:-Wait( id1 ):
time[real]() - tt;
|
In Maple 12, the parallel case takes 35.568 seconds, slightly longer than the single threaded case.
|
|
Element-wise operations
|
|
The elementwise operator syntax can be more efficient than zip or map in some cases. Specific operations have been optimized compared to the general code used by zip or map. In other cases, depending on the input, built-in hardware floating-point routines can be used directly.
>
|
A := LinearAlgebra:-RandomMatrix(1000):
|
>
|
B := LinearAlgebra:-RandomMatrix(1000,outputoptions=[datatype=float[8]]):
|
|
|
Improvements to GetSubImage in the ImageTools package
|
|
The GetSubImage routine of the ImageTools package is faster for the in-place case where the output is specified. The following repeated GetSubImage operation is approximately twice as fast as in Maple 12.
>
|
image := Array(1..N,1..N,datatype=float[8],order=C_order):
|
>
|
container := Array(1..n,1..n,datatype=float[8],order=C_order):
|
>
|
for i from 1 to 1000 do
ImageTools:-GetSubImage(image,10,10,n,n,output=container);
end do:
|
|
|
Decreased garbage collection frequency
|
|
In previous versions of Maple, a garbage collection cycle was initiated after allocating approximately 10^6 words of memory. Given the increase in the amount of memory available on current computers the frequency of garbage collection can be reduced without placing a significant burden on the memory subsystem. In Maple 13, the garbage collector is activated after allocating a minimum of 10^7 words. On average, this yields an improvement of 19% on 32-bit processors and about 10% for 64-bit processors with a modest increase in memory usage. The frequency of garbage collection can be tuned to achieve the best performance for your application by adjusting the gcfreq variable using the kernelopts mechanism. For example,
>
|
gcTest := proc()
local A, n, resultList;
for n to 30 do
A := RandomMatrix(n,n);
resultList:= LUDecomposition(A,output=['P','L','U1','R']);
end do;
end proc:
|
>
|
kernelopts(gcfreq=10^6);
|
>
|
gc():
start := time():
gcTest():
t1 := time() - start;
|
>
|
kernelopts(gcfreq=10^7);
|
>
|
gc():
start := time():
gcTest():
t2:= time() - start;
|
|
|
|