# Difference between revisions of "Benchmarks"

### From Gambit wiki

(Document how recent gmp beats Gambit in bignum benchmarks.) |
|||

Line 6: | Line 6: | ||

The [http://shootout.alioth.debian.org/ Computer Language Benchmarks Game] compares various computer languages by benchmarking similar algorithms to a number of small problems in a number of language implementations. Here are a number of Gambit implementations of the programs in the [[programming language shootout]]. Please feel free to propose improvements before they're submitted. | The [http://shootout.alioth.debian.org/ Computer Language Benchmarks Game] compares various computer languages by benchmarking similar algorithms to a number of small problems in a number of language implementations. Here are a number of Gambit implementations of the programs in the [[programming language shootout]]. Please feel free to propose improvements before they're submitted. | ||

+ | |||

+ | ==Some simple operations on large bignums== | ||

+ | |||

+ | In a talk Paul Zimmerman defined <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> to be <tt>(expt 3 2095903)</tt>, <tt>(expt 7 1183294)</tt>, and <tt>(expt 11 1920505)</tt>, respectively, (presumably because <tt>a</tt> and <tt>b</tt> have roughly a million decimal digits, and <tt>c</tt> has roughly 2 million digits) and timed <tt>(* a b)</tt>, <tt>(quotient c a)</tt>, and <tt>(integer-sqrt c)</tt>, and <tt>(gcd a b)</tt>. With | ||

+ | <pre> | ||

+ | heine:~/Desktop> gsi -v | ||

+ | v4.4.4 20090618172941 x86_64-unknown-linux-gnu "./configure --enable-single-host --enable-multiple-versions" | ||

+ | </pre> | ||

+ | running on the machine | ||

+ | <pre> | ||

+ | Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz | ||

+ | </pre> | ||

+ | on Ubuntu 9.04 with gcc-4.3.3, we find the times | ||

+ | <pre> | ||

+ | > (define d (time (* a b))) | ||

+ | (time (* a b)) | ||

+ | 183 ms real time | ||

+ | 180 ms cpu time (172 user, 8 system) | ||

+ | 3 collections accounting for 2 ms real time (0 user, 0 system) | ||

+ | 26078656 bytes allocated | ||

+ | 4313 minor faults | ||

+ | no major faults | ||

+ | > (define e (time (quotient c a))) | ||

+ | (time (quotient c a)) | ||

+ | 808 ms real time | ||

+ | 808 ms cpu time (756 user, 52 system) | ||

+ | 9 collections accounting for 11 ms real time (4 user, 8 system) | ||

+ | 130659840 bytes allocated | ||

+ | 19657 minor faults | ||

+ | no major faults | ||

+ | > (define f (time (integer-sqrt c))) | ||

+ | (time (integer-sqrt c)) | ||

+ | 805 ms real time | ||

+ | 804 ms cpu time (796 user, 8 system) | ||

+ | 21 collections accounting for 12 ms real time (12 user, 4 system) | ||

+ | 160630136 bytes allocated | ||

+ | 6131 minor faults | ||

+ | no major faults | ||

+ | > (define g (time (gcd a b))) | ||

+ | (time (gcd a b)) | ||

+ | 11353 ms real time | ||

+ | 11353 ms cpu time (11325 user, 28 system) | ||

+ | 810 collections accounting for 304 ms real time (284 user, 4 system) | ||

+ | 5055362984 bytes allocated | ||

+ | 10674 minor faults | ||

+ | no major faults | ||

+ | </pre> | ||

+ | With gmp-4.3.1 these same operations take 20ms, 88ms, 68ms, and 524ms, respectively. | ||

+ | |||

+ | When <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> are replaced by <tt>(expt 3 20959032)</tt>, <tt>(expt 7 11832946)</tt>, and | ||

+ | <tt>(expt 11 19205051)</tt>, respectively, so <tt>a</tt> and <tt>b</tt> have about 10,000,000 decimal digits and <tt>c</tt> has about 20 million decimal digits, then the Gambit times are | ||

+ | <pre> | ||

+ | > (define d (time (* a b))) | ||

+ | (time (* a b)) | ||

+ | 1579 ms real time | ||

+ | 1572 ms cpu time (1452 user, 120 system) | ||

+ | 3 collections accounting for 18 ms real time (0 user, 16 system) | ||

+ | 209714208 bytes allocated | ||

+ | 51308 minor faults | ||

+ | no major faults | ||

+ | > (define e (time (quotient c a))) | ||

+ | (time (quotient c a)) | ||

+ | 7132 ms real time | ||

+ | 7124 ms cpu time (6772 user, 352 system) | ||

+ | 11 collections accounting for 66 ms real time (8 user, 68 system) | ||

+ | 1061819624 bytes allocated | ||

+ | 161607 minor faults | ||

+ | no major faults | ||

+ | > (define f (time (integer-sqrt c))) | ||

+ | (time (integer-sqrt c)) | ||

+ | 7565 ms real time | ||

+ | 7528 ms cpu time (7420 user, 108 system) | ||

+ | 20 collections accounting for 34 ms real time (4 user, 24 system) | ||

+ | 1291585184 bytes allocated | ||

+ | 44122 minor faults | ||

+ | no major faults | ||

+ | > (define g (time (gcd a b))) | ||

+ | (time (gcd a b)) | ||

+ | 138545 ms real time | ||

+ | 138441 ms cpu time (138149 user, 292 system) | ||

+ | 1152 collections accounting for 542 ms real time (532 user, 20 system) | ||

+ | 53377143992 bytes allocated | ||

+ | 42407 minor faults | ||

+ | no major faults | ||

+ | </pre> | ||

+ | and the GMP 4.3.1 times are 26ms, 1544ms, 1268ms, and 9992ms, respectively. | ||

+ | |||

+ | So, GMP 4.3.1 is decisively faster (and we are no longer living in "Backwards World"!). Some of the changes that went into GMP 4.3.0 are | ||

+ | |||

+ | Speedups: | ||

+ | |||

+ | * Vastly improved assembly code for x86-64 processors from AMD and Intel. | ||

+ | * Major improvements also for many other processor families, such as Alpha, PowerPC, and Itanium. | ||

+ | * New sub-quadratic mpn_gcd and mpn_gcdext, as well as improved basecase gcd code. | ||

+ | * The multiply FFT code has been slightly improved. | ||

+ | * Balanced multiplication now uses 4-way Toom in addition to schoolbook, Karatsuba, 3-way Toom, and FFT. | ||

+ | * Unbalanced multiplication has been vastly improved. | ||

+ | * Improved schoolbook division by means of faster quotient approximation. | ||

+ | * Several new algorithms for division and mod by single limbs, giving many-fold speedups. | ||

+ | * Improved nth root computations. | ||

+ | * The mpz_nextprime function uses sieving and is much faster. | ||

+ | * Countless minor tweaks. | ||

+ | |||

+ | Niels Möller introduced the new gcd algorithm; it was based on an algorithm of Schönhage that was never published but was reverse engineered by Brad Lucier for Gambit (after some hints from Schönhage in e-mail) in January 2004. In 2005 Lucier told Möller (who was working on a Schönhage-type gcd algorithm for GMP) about Schönhage's algorithm, and Möller integrated it into GMP's HGCD framework for gcd calculations (and got two publications out of it, too!). Now that it is actually part of released GMP, GMP's bignum code beats Gambit's bignum code for all relevant bignum operations. |

## Revision as of 21:53, 18 June 2009

## Gambit benchmarks

Marc Feeley has put together a collection of benchmarks, called the Gambit benchmarks. The performance of Gambit is compared to a number of other Scheme implementations on that page.

## The "Programming Language Shootout" benchmarks

The Computer Language Benchmarks Game compares various computer languages by benchmarking similar algorithms to a number of small problems in a number of language implementations. Here are a number of Gambit implementations of the programs in the programming language shootout. Please feel free to propose improvements before they're submitted.

## Some simple operations on large bignums

In a talk Paul Zimmerman defined `a`, `b`, and `c` to be `(expt 3 2095903)`, `(expt 7 1183294)`, and `(expt 11 1920505)`, respectively, (presumably because `a` and `b` have roughly a million decimal digits, and `c` has roughly 2 million digits) and timed `(* a b)`, `(quotient c a)`, and `(integer-sqrt c)`, and `(gcd a b)`. With

heine:~/Desktop> gsi -v v4.4.4 20090618172941 x86_64-unknown-linux-gnu "./configure --enable-single-host --enable-multiple-versions"

running on the machine

Intel(R) Core(TM)2 Quad CPU Q8200 @ 2.33GHz

on Ubuntu 9.04 with gcc-4.3.3, we find the times

> (define d (time (* a b))) (time (* a b)) 183 ms real time 180 ms cpu time (172 user, 8 system) 3 collections accounting for 2 ms real time (0 user, 0 system) 26078656 bytes allocated 4313 minor faults no major faults > (define e (time (quotient c a))) (time (quotient c a)) 808 ms real time 808 ms cpu time (756 user, 52 system) 9 collections accounting for 11 ms real time (4 user, 8 system) 130659840 bytes allocated 19657 minor faults no major faults > (define f (time (integer-sqrt c))) (time (integer-sqrt c)) 805 ms real time 804 ms cpu time (796 user, 8 system) 21 collections accounting for 12 ms real time (12 user, 4 system) 160630136 bytes allocated 6131 minor faults no major faults > (define g (time (gcd a b))) (time (gcd a b)) 11353 ms real time 11353 ms cpu time (11325 user, 28 system) 810 collections accounting for 304 ms real time (284 user, 4 system) 5055362984 bytes allocated 10674 minor faults no major faults

With gmp-4.3.1 these same operations take 20ms, 88ms, 68ms, and 524ms, respectively.

When `a`, `b`, and `c` are replaced by `(expt 3 20959032)`, `(expt 7 11832946)`, and
`(expt 11 19205051)`, respectively, so `a` and `b` have about 10,000,000 decimal digits and `c` has about 20 million decimal digits, then the Gambit times are

> (define d (time (* a b))) (time (* a b)) 1579 ms real time 1572 ms cpu time (1452 user, 120 system) 3 collections accounting for 18 ms real time (0 user, 16 system) 209714208 bytes allocated 51308 minor faults no major faults > (define e (time (quotient c a))) (time (quotient c a)) 7132 ms real time 7124 ms cpu time (6772 user, 352 system) 11 collections accounting for 66 ms real time (8 user, 68 system) 1061819624 bytes allocated 161607 minor faults no major faults > (define f (time (integer-sqrt c))) (time (integer-sqrt c)) 7565 ms real time 7528 ms cpu time (7420 user, 108 system) 20 collections accounting for 34 ms real time (4 user, 24 system) 1291585184 bytes allocated 44122 minor faults no major faults > (define g (time (gcd a b))) (time (gcd a b)) 138545 ms real time 138441 ms cpu time (138149 user, 292 system) 1152 collections accounting for 542 ms real time (532 user, 20 system) 53377143992 bytes allocated 42407 minor faults no major faults

and the GMP 4.3.1 times are 26ms, 1544ms, 1268ms, and 9992ms, respectively.

So, GMP 4.3.1 is decisively faster (and we are no longer living in "Backwards World"!). Some of the changes that went into GMP 4.3.0 are

Speedups:

- Vastly improved assembly code for x86-64 processors from AMD and Intel.
- Major improvements also for many other processor families, such as Alpha, PowerPC, and Itanium.
- New sub-quadratic mpn_gcd and mpn_gcdext, as well as improved basecase gcd code.
- The multiply FFT code has been slightly improved.
- Balanced multiplication now uses 4-way Toom in addition to schoolbook, Karatsuba, 3-way Toom, and FFT.
- Unbalanced multiplication has been vastly improved.
- Improved schoolbook division by means of faster quotient approximation.
- Several new algorithms for division and mod by single limbs, giving many-fold speedups.
- Improved nth root computations.
- The mpz_nextprime function uses sieving and is much faster.
- Countless minor tweaks.

Niels Möller introduced the new gcd algorithm; it was based on an algorithm of Schönhage that was never published but was reverse engineered by Brad Lucier for Gambit (after some hints from Schönhage in e-mail) in January 2004. In 2005 Lucier told Möller (who was working on a Schönhage-type gcd algorithm for GMP) about Schönhage's algorithm, and Möller integrated it into GMP's HGCD framework for gcd calculations (and got two publications out of it, too!). Now that it is actually part of released GMP, GMP's bignum code beats Gambit's bignum code for all relevant bignum operations.