Actions

Difference between revisions of "Benchmarks"

From Gambit wiki

m (→‎Some simple operations on large bignums: Spell Paul Zimmermann's name correctly (sorry!))
(→‎Some simple operations on large bignums: If you want to benchmark Gambit, it helps not to use a version of gcc that is known not to optimize Gambit-generated code well)
 
Line 18: Line 18:
 
Intel(R) Core(TM)2 Quad  CPU  Q8200  @ 2.33GHz
 
Intel(R) Core(TM)2 Quad  CPU  Q8200  @ 2.33GHz
 
</pre>
 
</pre>
on Ubuntu 9.04 with gcc-4.3.3, we find the times
+
on Ubuntu 9.04 with FSF gcc-4.2.4 (and adding back -fmove-loop-invariants), we find the times
 
<pre>
 
<pre>
 
> (define d (time (* a b)))
 
> (define d (time (* a b)))
 
(time (* a b))
 
(time (* a b))
     183 ms real time
+
     157 ms real time
     180 ms cpu time (172 user, 8 system)
+
     156 ms cpu time (144 user, 12 system)
 
     3 collections accounting for 2 ms real time (0 user, 0 system)
 
     3 collections accounting for 2 ms real time (0 user, 0 system)
 
     26078656 bytes allocated
 
     26078656 bytes allocated
     4313 minor faults
+
     4314 minor faults
 
     no major faults
 
     no major faults
 
> (define e (time (quotient c a)))
 
> (define e (time (quotient c a)))
 
(time (quotient c a))
 
(time (quotient c a))
     808 ms real time
+
     710 ms real time
     808 ms cpu time (756 user, 52 system)
+
     704 ms cpu time (668 user, 36 system)
     9 collections accounting for 11 ms real time (4 user, 8 system)
+
     9 collections accounting for 11 ms real time (8 user, 0 system)
 
     130659840 bytes allocated
 
     130659840 bytes allocated
     19657 minor faults
+
     19660 minor faults
 
     no major faults
 
     no major faults
 
> (define f (time (integer-sqrt c)))
 
> (define f (time (integer-sqrt c)))
 
(time (integer-sqrt c))
 
(time (integer-sqrt c))
     805 ms real time
+
     698 ms real time
     804 ms cpu time (796 user, 8 system)
+
     692 ms cpu time (656 user, 36 system)
     21 collections accounting for 12 ms real time (12 user, 4 system)
+
     21 collections accounting for 12 ms real time (4 user, 8 system)
     160630136 bytes allocated
+
     160630392 bytes allocated
     6131 minor faults
+
     6332 minor faults
 
     no major faults
 
     no major faults
 
> (define g (time (gcd a b)))
 
> (define g (time (gcd a b)))
 
(time (gcd a b))
 
(time (gcd a b))
     11353 ms real time
+
     10167 ms real time
     11353 ms cpu time (11325 user, 28 system)
+
     10137 ms cpu time (10105 user, 32 system)
     810 collections accounting for 304 ms real time (284 user, 4 system)
+
     810 collections accounting for 302 ms real time (320 user, 8 system)
 
     5055362984 bytes allocated
 
     5055362984 bytes allocated
     10674 minor faults
+
     10450 minor faults
 
     no major faults
 
     no major faults
 
</pre>
 
</pre>
With GMP 4.2.4, these same operations take 44ms, 220ms, 168ms, and 6052ms, respectively; with GMP 4.3.1 they take 20ms, 88ms, 68ms, and 524ms, respectively.
+
With GMP 4.2.4 (compiled with the default system compiler gcc-4.3.3), these same operations take 44ms, 220ms, 168ms, and 6052ms, respectively; with GMP 4.3.1 they take 20ms, 88ms, 68ms, and 524ms, respectively.
  
 
When <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> are replaced by <tt>(expt 3 20959032)</tt>, <tt>(expt 7 11832946)</tt>, and
 
When <tt>a</tt>, <tt>b</tt>, and <tt>c</tt> are replaced by <tt>(expt 3 20959032)</tt>, <tt>(expt 7 11832946)</tt>, and
 
<tt>(expt 11 19205051)</tt>, respectively, so <tt>a</tt> and <tt>b</tt> have about 10,000,000 decimal digits and <tt>c</tt> has about 20 million decimal digits, then the Gambit times are
 
<tt>(expt 11 19205051)</tt>, respectively, so <tt>a</tt> and <tt>b</tt> have about 10,000,000 decimal digits and <tt>c</tt> has about 20 million decimal digits, then the Gambit times are
 
<pre>
 
<pre>
> (define d (time (* a b)))                
+
> (define d (time (* a b)))
 
(time (* a b))
 
(time (* a b))
     1579 ms real time
+
     1421 ms real time
     1572 ms cpu time (1452 user, 120 system)
+
     1416 ms cpu time (1288 user, 128 system)
 
     3 collections accounting for 18 ms real time (0 user, 16 system)
 
     3 collections accounting for 18 ms real time (0 user, 16 system)
 
     209714208 bytes allocated
 
     209714208 bytes allocated
     51308 minor faults
+
     51312 minor faults
 
     no major faults
 
     no major faults
> (define e (time (quotient c a)))          
+
> (define e (time (quotient c a)))
 
(time (quotient c a))
 
(time (quotient c a))
     7132 ms real time
+
     6300 ms real time
     7124 ms cpu time (6772 user, 352 system)
+
     6292 ms cpu time (5988 user, 304 system)
     11 collections accounting for 66 ms real time (8 user, 68 system)
+
     8 collections accounting for 48 ms real time (8 user, 48 system)
     1061819624 bytes allocated
+
     1061819368 bytes allocated
     161607 minor faults
+
     121357 minor faults
 
     no major faults
 
     no major faults
> (define f (time (integer-sqrt c)))        
+
> (define f (time (integer-sqrt c)))
 
(time (integer-sqrt c))
 
(time (integer-sqrt c))
     7565 ms real time
+
     6596 ms real time
     7528 ms cpu time (7420 user, 108 system)
+
     6596 ms cpu time (6536 user, 60 system)
     20 collections accounting for 34 ms real time (4 user, 24 system)
+
     22 collections accounting for 23 ms real time (0 user, 16 system)
     1291585184 bytes allocated
+
     1291581376 bytes allocated
     44122 minor faults
+
     22536 minor faults
 
     no major faults
 
     no major faults
> (define g (time (gcd a b)))              
+
> (define g (time (gcd a b)))
 
(time (gcd a b))
 
(time (gcd a b))
     138545 ms real time
+
     123296 ms real time
     138441 ms cpu time (138149 user, 292 system)
+
     123256 ms cpu time (123156 user, 100 system)
     1152 collections accounting for 542 ms real time (532 user, 20 system)
+
     1152 collections accounting for 529 ms real time (492 user, 16 system)
     53377143992 bytes allocated
+
     53377146424 bytes allocated
     42407 minor faults
+
     37776 minor faults
 
     no major faults
 
     no major faults
 
</pre>
 
</pre>

Latest revision as of 18:03, 20 June 2009

Gambit benchmarks

Marc Feeley has put together a collection of benchmarks, called the Gambit benchmarks. The performance of Gambit is compared to a number of other Scheme implementations on that page.

The "Programming Language Shootout" benchmarks

The Computer Language Benchmarks Game compares various computer languages by benchmarking similar algorithms to a number of small problems in a number of language implementations. Here are a number of Gambit implementations of the programs in the programming language shootout. Please feel free to propose improvements before they're submitted.

Some simple operations on large bignums

In a talk Paul Zimmermann defined a, b, and c to be (expt 3 2095903), (expt 7 1183294), and (expt 11 1920505), respectively, (presumably because a and b have roughly a million decimal digits, and c has roughly 2 million digits) and timed (* a b), (quotient c a), and (integer-sqrt c), and (gcd a b). With

heine:~/Desktop> gsi -v
v4.4.4 20090618172941 x86_64-unknown-linux-gnu "./configure --enable-single-host --enable-multiple-versions"

running on the machine

Intel(R) Core(TM)2 Quad  CPU   Q8200  @ 2.33GHz

on Ubuntu 9.04 with FSF gcc-4.2.4 (and adding back -fmove-loop-invariants), we find the times

> (define d (time (* a b)))
(time (* a b))
    157 ms real time
    156 ms cpu time (144 user, 12 system)
    3 collections accounting for 2 ms real time (0 user, 0 system)
    26078656 bytes allocated
    4314 minor faults
    no major faults
> (define e (time (quotient c a)))
(time (quotient c a))
    710 ms real time
    704 ms cpu time (668 user, 36 system)
    9 collections accounting for 11 ms real time (8 user, 0 system)
    130659840 bytes allocated
    19660 minor faults
    no major faults
> (define f (time (integer-sqrt c)))
(time (integer-sqrt c))
    698 ms real time
    692 ms cpu time (656 user, 36 system)
    21 collections accounting for 12 ms real time (4 user, 8 system)
    160630392 bytes allocated
    6332 minor faults
    no major faults
> (define g (time (gcd a b)))
(time (gcd a b))
    10167 ms real time
    10137 ms cpu time (10105 user, 32 system)
    810 collections accounting for 302 ms real time (320 user, 8 system)
    5055362984 bytes allocated
    10450 minor faults
    no major faults

With GMP 4.2.4 (compiled with the default system compiler gcc-4.3.3), these same operations take 44ms, 220ms, 168ms, and 6052ms, respectively; with GMP 4.3.1 they take 20ms, 88ms, 68ms, and 524ms, respectively.

When a, b, and c are replaced by (expt 3 20959032), (expt 7 11832946), and (expt 11 19205051), respectively, so a and b have about 10,000,000 decimal digits and c has about 20 million decimal digits, then the Gambit times are

> (define d (time (* a b)))
(time (* a b))
    1421 ms real time
    1416 ms cpu time (1288 user, 128 system)
    3 collections accounting for 18 ms real time (0 user, 16 system)
    209714208 bytes allocated
    51312 minor faults
    no major faults
> (define e (time (quotient c a)))
(time (quotient c a))
    6300 ms real time
    6292 ms cpu time (5988 user, 304 system)
    8 collections accounting for 48 ms real time (8 user, 48 system)
    1061819368 bytes allocated
    121357 minor faults
    no major faults
> (define f (time (integer-sqrt c)))
(time (integer-sqrt c))
    6596 ms real time
    6596 ms cpu time (6536 user, 60 system)
    22 collections accounting for 23 ms real time (0 user, 16 system)
    1291581376 bytes allocated
    22536 minor faults
    no major faults
> (define g (time (gcd a b)))
(time (gcd a b))
    123296 ms real time
    123256 ms cpu time (123156 user, 100 system)
    1152 collections accounting for 529 ms real time (492 user, 16 system)
    53377146424 bytes allocated
    37776 minor faults
    no major faults

The GMP 4.2.4 times are 680ms, 3852ms, 3180ms, and 1008696ms, respectively; the GMP 4.3.1 times are 260ms, 1544ms, 1268ms, and 9992ms, respectively.

So, after the release of GMP 4.3.0 in April, 2009, GMP is decisively faster than Gambit (and Gambit is no longer a reproof to the permanent claim on GMP's home page that GMP is the "fastest bignum library on the planet!"). Some of the changes that went into GMP 4.3.0 are

Speedups:

  • Vastly improved assembly code for x86-64 processors from AMD and Intel.
  • Major improvements also for many other processor families, such as Alpha, PowerPC, and Itanium.
  • New sub-quadratic mpn_gcd and mpn_gcdext, as well as improved basecase gcd code.
  • The multiply FFT code has been slightly improved.
  • Balanced multiplication now uses 4-way Toom in addition to schoolbook, Karatsuba, 3-way Toom, and FFT.
  • Unbalanced multiplication has been vastly improved.
  • Improved schoolbook division by means of faster quotient approximation.
  • Several new algorithms for division and mod by single limbs, giving many-fold speedups.
  • Improved nth root computations.
  • The mpz_nextprime function uses sieving and is much faster.
  • Countless minor tweaks.

Niels Möller introduced the new gcd algorithm; it was based on an algorithm of Schönhage that was never published but was reverse engineered by Brad Lucier for Gambit (after some hints from Schönhage in e-mail) in January 2004. In 2005 Lucier told Möller (who was working on a Schönhage-type gcd algorithm for GMP) about Schönhage's algorithm, and Möller integrated it into GMP's HGCD framework for gcd calculations (and got two publications out of it, too!). Now that it is actually part of released GMP, GMP's bignum code beats Gambit's bignum code for all relevant bignum operations.