Improved Parallel Garbage Collector
Multivariate Modular Gcds
More special functions implemented in evalhf
More consistent evaluation of elementary and special functions in evalf, evalhf, and hardware floats
The performance of several LinearAlgebra commands for floating-point operations at the default double precision has improved in Maple 2015 on 64-bit Windows. The improvement comes from an updated version of the Intel Math Kernel Library (MKL) and from changes in how Maple uses it.
The following plots were computed under 64-bit Windows 7 on a machine with a 3.50 GHz AMD FX-8320 8-core processor. The degree of improvement for some commands may vary with the hardware.
The Maple garbage collector has been improved to perform more of its operations in parallel, which leads to better performance. The following graphs compare the performance of Maple 18 and Maple 2015.
The graph on the left compares the total time for the benchmark (described below) against the number of cores used. The graph on the right compares the performance of the garbage collector against the number of cores. The improved collector in Maple 2015 scales better as cores are added, whereas Maple 18 plateaus quickly.
This benchmark can be rerun using the code in the following Code Edit Region. The graphs above were generated using 10 iterations and an Array size of 10^6.
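The graphs compare raw timings. A common way to quantify the plateau described above is the parallel speedup S(n) = T(1)/T(n) and the efficiency E(n) = S(n)/n. The Python helper below is a hypothetical sketch and is not part of the Maple benchmark code. The function name `scaling_table` is illustrative, and the helper assumes you have a wall-clock timing for each core count, including a single-core run.

```python
def scaling_table(timings):
    """Return {cores: (speedup, efficiency)} from {cores: seconds}.

    speedup(n)    = T(1) / T(n)
    efficiency(n) = speedup(n) / n   (1.0 means perfect linear scaling)
    """
    if 1 not in timings:
        raise ValueError("a single-core timing T(1) is required")
    base = timings[1]
    table = {}
    for cores, seconds in sorted(timings.items()):
        speedup = base / seconds
        table[cores] = (speedup, speedup / cores)
    return table
```

A total time that stops decreasing as cores are added shows up here as a flat speedup and an efficiency that falls roughly like 1/n.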
The performance of computing greatest common divisors of multivariate polynomials modulo a machine-size prime has increased by orders of magnitude. The improvement comes from a new, optimized C-level implementation based on evaluation and interpolation. On 64-bit platforms, the following examples run more than 20 times faster in Maple 2015 than in Maple 18 on the same machine.
f__2 := Expand((x+y+2)^70) mod p:
g__2 := Expand(f__2^2) mod p:
h__2 := Expand(f__2*(f__2+1)) mod p:
F__2 := CodeTools:-Usage(Gcd(g__2, h__2) mod p): evalb(F__2 = f__2);
memory used=109.80KiB, alloc change=0 bytes, cpu time=16.00ms, real time=20.00ms, gc time=0ns
f__3 := Expand((x+y+z+2)^15) mod p:
g__3 := Expand(f__3^2) mod p:
h__3 := Expand(f__3*(f__3+1)) mod p:
F__3 := CodeTools:-Usage(Gcd(g__3, h__3) mod p): evalb(F__3 = f__3);
memory used=16.70KiB, alloc change=0 bytes, cpu time=31.00ms, real time=30.00ms, gc time=0ns
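The evaluation and interpolation approach can be illustrated for two variables. The Python code below is a minimal sketch of the general technique, not Maple's C implementation. It assumes that polynomials are dicts mapping `(i, j)` to the coefficient of x^i y^j, that both inputs are monic in x (so their gcd is monic too and no leading-coefficient correction is needed), and that p is a prime larger than the number of evaluation points. Each image gcd at y = y0 is a univariate Euclidean gcd mod p. Images whose degree is too high come from unlucky points and are discarded, and the x-coefficients of the surviving images are then interpolated in y. All function names are hypothetical.

```python
def trim(a):
    # Drop trailing zero coefficients (lists are low-to-high degree).
    while a and a[-1] == 0:
        a.pop()
    return a

def poly_rem(a, b, p):
    # Remainder of a modulo b in Z_p[x].
    a = a[:]
    inv = pow(b[-1], p - 2, p)
    while len(a) >= len(b):
        q, shift = a[-1] * inv % p, len(a) - len(b)
        for k, c in enumerate(b):
            a[shift + k] = (a[shift + k] - q * c) % p
        trim(a)
    return a

def poly_gcd(a, b, p):
    # Monic gcd in Z_p[x] via the Euclidean algorithm.
    a, b = trim(a[:]), trim(b[:])
    while b:
        a, b = b, poly_rem(a, b, p)
    if not a:
        return []
    inv = pow(a[-1], p - 2, p)
    return [c * inv % p for c in a]

def eval_y(f, y0, p):
    # Image of f under y -> y0, as a univariate polynomial in x.
    out = [0] * (max(i for i, _ in f) + 1)
    for (i, j), c in f.items():
        out[i] = (out[i] + c * pow(y0, j, p)) % p
    return trim(out)

def interpolate(ys, vs, p):
    # Lagrange interpolation: coefficients (low-to-high) of the polynomial
    # of degree < len(ys) that takes the value vs[k] at ys[k].
    n, result = len(ys), [0] * len(ys)
    for k in range(n):
        basis, denom = [1], 1
        for m in range(n):
            if m != k:
                # basis *= (y - ys[m])
                basis = [((basis[t - 1] if t else 0)
                          - ys[m] * (basis[t] if t < len(basis) else 0)) % p
                         for t in range(len(basis) + 1)]
                denom = denom * (ys[k] - ys[m]) % p
        scale = vs[k] * pow(denom, p - 2, p) % p
        for t, c in enumerate(basis):
            result[t] = (result[t] + scale * c) % p
    return result

def bivariate_gcd(A, B, p):
    # deg_y(gcd) <= min(deg_y A, deg_y B), so that many + 1 images suffice.
    bound = min(max(j for _, j in A), max(j for _, j in B))
    ys, images, best = [], [], None
    for y0 in range(p):
        g = poly_gcd(eval_y(A, y0, p), eval_y(B, y0, p), p)
        d = len(g) - 1
        if best is None or d < best:
            # A lower-degree image shows the earlier points were unlucky: restart.
            best, ys, images = d, [], []
        if d == best:
            ys.append(y0)
            images.append(g)
            if len(ys) == bound + 1:
                break
    else:
        raise ValueError("p too small: ran out of evaluation points")
    # Interpolate the coefficient of each x^i. A full implementation would
    # also verify the result by trial division; this sketch omits that step.
    G = {}
    for i in range(best + 1):
        for j, c in enumerate(interpolate(ys, [img[i] for img in images], p)):
            if c:
                G[(i, j)] = c
    return G
```

For example, with A = (x + y)(x + 1) and B = (x + y)(x + y + 1) modulo 101, the point y = 0 is unlucky because both images equal x^2 + x. The sketch discards that point and returns x + y, that is, `{(1, 0): 1, (0, 1): 1}`.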
The improvements are also significant for smaller primes. The following benchmarks were taken on an AMD Opteron 62xx-class CPU running at 2.1 GHz. In all cases, the input polynomials are dense of degree d and their gcd has degree d/2.
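The benchmark code itself is not shown here. The Python sketch below illustrates one way to build inputs of the shape described above, and it rests on stated assumptions. It uses two variables for brevity and takes "dense of degree d" to mean that every monomial x^i y^j with i + j <= d gets a nonzero random coefficient. It builds a random dense G of degree d/2 and multiplies it by two random dense cofactors of degree d/2. The gcd of the two products is G whenever the cofactors are coprime, which is very likely for random cofactors over a large field. The names `dense_random` and `gcd_benchmark_inputs` are hypothetical.

```python
import random

def mul(f, g, p):
    # Product of bivariate polynomials stored as {(i, j): c} for c*x^i*y^j, mod p.
    out = {}
    for (i1, j1), c1 in f.items():
        for (i2, j2), c2 in g.items():
            key = (i1 + i2, j1 + j2)
            out[key] = (out.get(key, 0) + c1 * c2) % p
    return {k: c for k, c in out.items() if c}

def dense_random(deg, p, rng):
    # Every monomial x^i*y^j with i + j <= deg gets a nonzero random coefficient.
    return {(i, j): rng.randrange(1, p)
            for i in range(deg + 1) for j in range(deg + 1 - i)}

def gcd_benchmark_inputs(d, p, seed=0):
    """Return (A, B, G) with A = G*C1 and B = G*C2 of total degree d and deg G = d // 2."""
    rng = random.Random(seed)
    half = d // 2
    G = dense_random(half, p, rng)
    C1 = dense_random(d - half, p, rng)
    C2 = dense_random(d - half, p, rng)
    return mul(G, C1, p), mul(G, C2, p), G
```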
The Airy functions AiryAi and AiryBi and the LambertW function are now implemented in evalhf for fast evaluation at hardware precision. For the Airy functions, this implementation is restricted to real arguments; for complex arguments with a nonzero imaginary part, evalhf falls back to the standard library. For the LambertW function, all branches are implemented in evalhf for both real and complex arguments.
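Evaluating LambertW at hardware precision amounts to solving w e^w = x in double-precision arithmetic. The Python sketch below illustrates this in a generic way and is not Maple's evalhf implementation. The function name `lambertw0` is hypothetical, and the sketch covers only the principal branch for real x >= -1/e. It applies Halley's method, starting from the standard branch-point series near -1/e and from simple logarithmic guesses elsewhere.

```python
import math

def lambertw0(x: float, max_iter: int = 50) -> float:
    """Principal branch W0 of Lambert W for real x >= -1/e (w * exp(w) = x)."""
    if x < -math.exp(-1.0):
        raise ValueError("W0 is real only for x >= -1/e")
    if x < -0.25:
        # Near the branch point: W0 = -1 + p - p^2/3 + 11 p^3/72 - ...,
        # with p = sqrt(2 (e x + 1)).
        p = math.sqrt(max(2.0 * (math.e * x + 1.0), 0.0))
        w = -1.0 + p * (1.0 + p * (-1.0 / 3.0 + p * 11.0 / 72.0))
        if p < 1e-4:
            return w  # truncation error (about 0.08 * p**4) is below rounding here
    elif x < 3.0:
        w = math.log1p(x)
    else:
        lx = math.log(x)
        w = lx - math.log(lx)
    for _ in range(max_iter):
        ew = math.exp(w)
        f = w * ew - x  # residual of w * e^w = x
        # Halley step f / (f' - f f'' / (2 f')), with f' = e^w (w+1), f'' = e^w (w+2)
        dw = f / (ew * (w + 1.0) - (w + 2.0) * f / (2.0 * w + 2.0))
        w -= dw
        if abs(dw) <= 1e-15 * (1.0 + abs(w)):
            break
    return w
```

For example, `lambertw0(1.0)` returns approximately 0.567143, the omega constant that satisfies w e^w = 1.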
Considerable effort has gone into making the results of evaluating expressions that involve floating-point numbers consistent across the three evaluation modes: software float evaluation with evalf, hardware float evaluation with evalhf, and hardware float evaluation of expressions involving HFloat objects (Maple objects that hold hardware floating-point numbers). Correct and consistent evaluation of such expressions near branch cuts is of particular concern. For example, consider the following:
arcsin(1.5), arcsin(1.5+0.*I), arcsin(1.5-0.*I);
evalhf(arcsin(1.5)), evalhf(arcsin(1.5+0.*I)), evalhf(arcsin(1.5-0.*I));
arcsin(HFloat(1.5)), arcsin(HFloat(1.5)+0.*I), arcsin(HFloat(1.5)-0.*I);
In previous versions of Maple, the results in the second and third sets were all identical, because Maple's hardware float environments did not respect the branch cut.
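The same branch-cut behaviour can be observed outside Maple. The Python sketch below relies on the documented behaviour of the standard `cmath` module. On platforms with signed zeros, `cmath` uses the sign of a zero imaginary part to decide which side of a branch cut an argument lies on. The sketch evaluates arcsin at 1.5 + 0.0i and 1.5 - 0.0i. The helper `branch_sides` is hypothetical and exists only for illustration. The values are Python's, and Maple's conventions on the cut itself may differ.

```python
import cmath


def branch_sides(func, x):
    """Evaluate a complex function at x + 0.0j and x - 0.0j.

    The two arguments compare equal, but IEEE 754 keeps the sign of zero,
    and cmath uses that sign to choose a side of a branch cut on the real axis.
    """
    above = func(complex(x, 0.0))   # approached from the upper half-plane
    below = func(complex(x, -0.0))  # approached from the lower half-plane
    return above, below


above, below = branch_sides(cmath.asin, 1.5)
print(above, below)  # real parts ~pi/2, imaginary parts of opposite sign
```

Because 1.5 lies on the cut (1, infinity) of arcsin, the two results differ only in the sign of the imaginary part. An environment that ignores the sign of zero would return the same value for both.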