Arduino generating prime numbers

Another run (I'm looking for an optimal number)

100000000003 is prime. Took 4765 mS. Found 5 primes.
100000000019 is prime. Took 4909 mS. Found 6 primes.
100000000057 is prime. Took 5988 mS. Found 7 primes.
100000000063 is prime. Took 4776 mS. Found 8 primes.
100000000069 is prime. Took 4775 mS. Found 9 primes.
100000000073 is prime. Took 4775 mS. Found 10 primes.
100000000091 is prime. Took 4777 mS. Found 11 primes.
100000000103 is prime. Took 4786 mS. Found 12 primes.
100000000129 is prime. Took 4832 mS. Found 13 primes.
100000000171 is prime. Took 14435 mS. Found 14 primes.
100000000183 is prime. Took 4874 mS. Found 15 primes.
100000000193 is prime. Took 4780 mS. Found 16 primes.
100000000211 is prime. Took 4780 mS. Found 17 primes.
100000000223 is prime. Took 4776 mS. Found 18 primes.
100000000237 is prime. Took 4778 mS. Found 19 primes.
100000000253 is prime. Took 4803 mS. Found 20 primes.

Think this really starts to look what Nick is searching for.
TBC

next run

100000000003 is prime. Took 4412 mS. Found 5 primes.
100000000019 is prime. Took 4556 mS. Found 6 primes.
100000000057 is prime. Took 5635 mS. Found 7 primes.
100000000063 is prime. Took 4422 mS. Found 8 primes.
100000000069 is prime. Took 4423 mS. Found 9 primes.
100000000073 is prime. Took 4423 mS. Found 10 primes.
100000000091 is prime. Took 4424 mS. Found 11 primes.
100000000103 is prime. Took 4432 mS. Found 12 primes.
100000000129 is prime. Took 4480 mS. Found 13 primes.
100000000171 is prime. Took 13372 mS. Found 14 primes.
100000000183 is prime. Took 4521 mS. Found 15 primes.
100000000193 is prime. Took 4427 mS. Found 16 primes.
100000000211 is prime. Took 4428 mS. Found 17 primes.
100000000223 is prime. Took 4425 mS. Found 18 primes.
100000000237 is prime. Took 4425 mS. Found 19 primes.
100000000253 is prime. Took 4448 mS. Found 20 primes.

OK the steps become smaller

automatic search of the optimum is slightly slower

100000000003 is prime. Took 4423 mS. Found 5 primes.
100000000019 is prime. Took 4589 mS. Found 6 primes.
100000000057 is prime. Took 5689 mS. Found 7 primes.
100000000063 is prime. Took 4434 mS. Found 8 primes.
100000000069 is prime. Took 4434 mS. Found 9 primes.
100000000073 is prime. Took 4433 mS. Found 10 primes.
100000000091 is prime. Took 4435 mS. Found 11 primes.
100000000103 is prime. Took 4454 mS. Found 12 primes.
100000000129 is prime. Took 4536 mS. Found 13 primes.
100000000171 is prime. Took 13449 mS. Found 14 primes.
100000000183 is prime. Took 4543 mS. Found 15 primes.
100000000193 is prime. Took 4462 mS. Found 16 primes.
100000000211 is prime. Took 4449 mS. Found 17 primes.
100000000223 is prime. Took 4446 mS. Found 18 primes.
100000000237 is prime. Took 4448 mS. Found 19 primes.
100000000253 is prime. Took 4482 mS. Found 20 primes.

looking for other tweaks...

Sorry odometer, I had to reintroduce some divisions again :wink:

100000000003 is prime. Took 3879 mS. Found 5 primes.
100000000019 is prime. Took 4046 mS. Found 6 primes.
100000000057 is prime. Took 5147 mS. Found 7 primes.
100000000063 is prime. Took 3891 mS. Found 8 primes.
100000000069 is prime. Took 3890 mS. Found 9 primes.
100000000073 is prime. Took 3890 mS. Found 10 primes.
100000000091 is prime. Took 3893 mS. Found 11 primes.
100000000103 is prime. Took 3910 mS. Found 12 primes.
100000000129 is prime. Took 3992 mS. Found 13 primes.
100000000171 is prime. Took 12067 mS. Found 14 primes.
100000000183 is prime. Took 3999 mS. Found 15 primes.
100000000193 is prime. Took 3917 mS. Found 16 primes.
100000000211 is prime. Took 3906 mS. Found 17 primes.
100000000223 is prime. Took 3903 mS. Found 18 primes.
100000000237 is prime. Took 3904 mS. Found 19 primes.
100000000253 is prime. Took 3939 mS. Found 20 primes.

Average below 5 seconds.

final tweak for now, then backup and clean the code

100000000003 is prime. Took 3617 mS. Found 5 primes.
100000000019 is prime. Took 3783 mS. Found 6 primes.
100000000057 is prime. Took 4884 mS. Found 7 primes.
100000000063 is prime. Took 3628 mS. Found 8 primes.
100000000069 is prime. Took 3628 mS. Found 9 primes.
100000000073 is prime. Took 3629 mS. Found 10 primes.
100000000091 is prime. Took 3629 mS. Found 11 primes.
100000000103 is prime. Took 3647 mS. Found 12 primes.
100000000129 is prime. Took 3729 mS. Found 13 primes.
100000000171 is prime. Took 11431 mS. Found 14 primes.
100000000183 is prime. Took 3737 mS. Found 15 primes.
100000000193 is prime. Took 3655 mS. Found 16 primes.
100000000211 is prime. Took 3643 mS. Found 17 primes.
100000000223 is prime. Took 3640 mS. Found 18 primes.
100000000237 is prime. Took 3641 mS. Found 19 primes.
100000000253 is prime. Took 3676 mS. Found 20 primes.

not bad :wink:
the trick applied is to remove 64bit math and divisions for a range where 32bit math is sufficient.
(both quotient and remainder are 32 bit)

There is still room for optimizations in the code but first a refactor is needed.
I'll post asap.

Neat project Nick. Here is a photo of my version. You can't see the numbers too clearly in the picture. This is a version with an SCDQ5884 8x5x5 display.

primes.jpg

Got a speed improvement of an average factor of about 2.3 over the Nick's BigNumber-based code shown in reply #43, starting at 1000000000000001, 16 digits, by making these changes:

  • Added a trial division test using byte-sized primes, before other testing.
  • Modified the Miller-Rabin test to use 2, 3, 5, 7, 11, 13, 17, 19, and 23, rather than random numbers. This Wikipedia link - Strong pseudoprime - Wikipedia - says that there's no number less than 3825123056546413051, 19 digits, about 1.6 x 261, that will fool every single-pass Miller-Rabin tests with these bases.
  • [Edit: Add this] Modified loop parameters to wield only integers = 6k + 1 and 6k - 1, for integer k.

Most of the speed improvement appears to come from eliminating Miller-Rabin tests for numbers that divide by small primes. That suggests that there may be a sweet spot for the number of prime divisors in the trial division test that depends on the speed of the Miller-Rabin test, and it may be different from the one I chose. Lesser improvement comes from eliminating selection of random BigNumbers, and reducing the number of Miller-Rabin tests from ten to nine. [Edit: Add this] The 6K +- 1 trick doesn't do much, since it only eliminates multiples of 3, and those fail the trial division test almost immediately.

Average displayed time was about 11.5 mS for the primes shown in replies #44 and #45. It also ran about 3.5% slower than the values shown in reply #64 - calculation times were sometimes wildly different, but on average, they were almost as good.

Here's the code, based on the sketch in reply #43, minimally changed, auto-formatted:

#include <BigNumber.h>
PROGMEM prog_uchar primes[] = {
  2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 
  37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 
  83, 89, 97,101,103,107,109,113,127,131,137,
  139,149,151,157,163,167,173,179,181,191,193,
  197,199,211,223,227,229,233,239,241,251
};
const byte nPrimes = sizeof(primes)/sizeof(primes[0]);

const char FIRST_PRIME_TO_USE [] =  "1000000000000001";

BigNumber candidate;
BigNumber one (1);
BigNumber two (2);

byte bump = 2;

void setup() 
{
  BigNumber five (5);
  BigNumber six (6);

  Serial.begin(115200); 
  while (!Serial) { 
  }

  BigNumber::begin ();  

  Serial.println ();
  Serial.println ();
  Serial.println ("Starting.");

  candidate = BigNumber (FIRST_PRIME_TO_USE);

  BigNumber rem = candidate % six;
  while ((rem != one) && (rem != five)) {
    rem = rem + one;
    rem = rem % six;
    candidate = candidate + one;
  }
  if (rem == one) {
    bump = 4;
  }
}  // end of setup

void loop() 
{

  for ( ; candidate < BigNumber ("9999999999999999"); candidate += bump, bump = 6 - bump)
    showIfPrime (candidate);

  candidate = 3;  // back to start!
}  // end of loop

void rng (BigNumber & result)
{
  result = BigNumber (random ());  
}  // end of rng

bool Miller(BigNumber source)
{

  if(source == 2 || source == 3)
    return true;
  if(source < two || source % two == 0)
    return false;

  BigNumber d = source - one;
  int s = 0;

  while(d % two == 0)
  {
    d /= two;
    s += one;
  }

  const byte bases[] = { 2, 3, 5, 7, 11, 13, 17, 19, 23 };
  const byte nbases = sizeof(bases)/sizeof(bases[0]);
  for(int i = 0; i < nbases; i++)
  {
    //    do
    //    {
    //      rng (a);
    //    } 
    //    while(a < two || a >= source - two);

    BigNumber a = BigNumber (bases[i]);
    BigNumber x = a.powMod (d, source);
    if(x == one || x == source - one)
      continue;

    for(int r = 1; r < s; r++)
    {
      x = x.powMod(two, source);
      if(x == one)
        return false;
      if(x == source - one)
        break;
    }

    if(x != source - one)
      return false;
  }

  return true;
}  // end of Miller

const byte DIGITS = 16;

unsigned long start;
unsigned long lastShown;
long found = 1; // Number we found

void showIfPrime (BigNumber num) 
{
  if (!check(num))
    return;
  if (!Miller (num))
    return;

  char buf [20] = { 0 };
  long long temp = num;
  byte pos = 0;

  // make printable
  for (pos = 0; pos < DIGITS; pos++)
  {
    if (temp == 0)
      break;
    char digit = temp % 10;
    buf [DIGITS - 1 - pos] = digit | '0';
    temp /= 10;
  }

  // insert leading spaces
  for (; pos < DIGITS; pos++)
    buf [DIGITS - 1 - pos] = ' '; //  = 15;


  Serial.print(num);
  Serial.print(" is prime. Took ");
  Serial.print(millis () - start);
  Serial.print (" mS. Found ");
  Serial.print (found);
  Serial.println (" primes.");
  start = millis ();

  found++;

  lastShown = millis ();

}  // end of showIfPrime

bool check(BigNumber n) {
  for (byte i = 0;i < nPrimes; i++) {
    BigNumber f = BigNumber(pgm_read_byte_near(primes + i));
    if ((n % f) == 0) {
      return false;
    }
  }
  return true;
}

After some cleanup and refactoring, here the code - output see post #64

The new added trick works as follows.
If you know x/n and x%n you can derive x/(n+2) and x%(n+2) // or x/(n+1) if you want to.

First
find the range [n.. sqrt(x)+1] for which the following is true
-- x = 64 bit
-- sqrt(x) = 32 bit (by definition)
-- n = 32 bit
-- x/n = 32 bit.
For all odd numbers x/n and x%n are 32 bit, so for this range we can remove 64 bit math completely.

Second
given quotient = x/n and mod = x%n
then we know that x/(n+2) is smaller thanquotient
and that x%(n+2) is smaller than (n+2) AND smaller than quotient*2
with a simple while loop the next quotient and mod can be found
with one addition, one subtractions, one division by 2 (which is a shift) and a compare.

Third, to initialize the loop one division and one % is needed

(it is easier explained in a drawing I think)

        cnum--; // initial value should be ODD!!!
        uint32_t quotient = num / cnum;
        uint32_t mod = num - quotient*cnum;  //  %
        uint32_t prev = cnum;
        
        // test next cnum
        cnum += 2;
        while(cnum < upper)
        {
          // find next quot & mod
          while (mod/2 < quotient)
          {
            quotient--;
            mod += prev;
          }
          mod -= (quotient*2);
          // check
          if (mod == 0) return;

          // next cnum
          prev = cnum;
          cnum += 2;
        }

This trick tests all odd numbers as dividers and the nice part is that it becomes faster and faster when n approaches sqrt(n).

an example in numbers
suppose x= 29
start with n = 1
x/1 x%1 [q=29, m =0]
x/3 x%3 [q=9, m = 2 ] // while loops 20 times!
x/5 x%5 [q=5, m = 4] // while loops 4 times
x/7 x%7 [q =4, m = 1] // while loops once
!!beyond sqrt(x) so stop!!
it is clear that the # iterations of the while loop decreases fast, so every next mod is found faster than the previous.
one need to find the point where the number of iterations is as expensive as the original %.
this point is around sqrt(x)/5;

complete code is attached

primeLongLong.ino (7.78 KB)

@tmd3
Interesting variation and simpler code than mine (as I use 3 techniques),
wondering if my trick works well with bigNum . (maybe next week:)

One last tweak. This one includes a list of numbers that fool the Miller-Rabin test for bases 2, 3, 5, and 7. It compares each odd number to the next item in the list, and skips it if it finds them equal. The previous version, in reply #66, uses nine bases, as required to eliminate pseudoprimes below 1016. This one uses four and a table lookup, so it's a lot faster. It kind of feels like cheating, though - Nick's versions knew that 2 an 3 are prime, while this program has a lot of arcane knowledge about special numbers.

I was surprised to see BigNumber work so quickly compared to native number formats, as I'd expect it to carry substantial overhead. I suspect that its progenitors were developed, at least in part, for dealing with calculations of primes, and that may account for its fitness for this use.

My thanks to Nick for posting this project, and for coming up with the BigNumber library. Hunting for primes is easy; speeding up the process is really tricky. And, I learned how to spell, "pseudo-," I think, so my ability to eruditely describe things that aren't quite what they seem is much improved.

A list of strong psuedo fake primes to bases 2 and 3 is here - http://www.bell-labs.com/user/bleichen/diss/thesis.html. The CPAN module Math::Primality provided the engine for extracting the pseudoprimes that penetrate tests for bases 5 and 7. The included file containing the byte-wide bits of the list is attached - "Pseudoprimes.h".

The code has its oddities. Because there seems to be no way to store or access uint64_t values with PROGMEM, and because I couldn't get BigNumber to play nice with variables bigger than one byte, pseudoprimes are stored as 8 bytes, big-endian, and decoded into BigNumbers. Here's the sketch:

#include <BigNumber.h>
PROGMEM prog_uchar primes[] = {
    2,  3,  5,  7, 11, 13, 17, 19, 23, 29, 31, 
   37, 41, 43, 47, 53, 59, 61, 67, 71, 73, 79, 
   83, 89, 97,101,103,107,109,113,127,131,137,
  139,149,151,157,163,167,173,179,181,191,193,
  197,199,211,223,227,229,233,239,241,251
};
const byte nPrimes = sizeof(primes)/sizeof(primes[0]);

#include "Pseudoprimes.h";
const uint16_t nPSP = (sizeof(psps)/sizeof(psps[0])) >> 3;
  
BigNumber candidate;
BigNumber one (1);
BigNumber two (2);

unsigned long start;
long found = 0; // Number of primes found

void setup() 
{
  Serial.begin(115200); 
  Serial.println ("Starting.");

  BigNumber::begin ();  

  candidate = BigNumber ("100000000003");
}  // end of setup

void loop() 
{
  BigNumber nextPSP;
  int nextPSPindex;
  for (int i=0;i<nPSP;i++) {
    nextPSP = fetchPSP(i);
    if ((nextPSP == candidate) || (nextPSP > candidate)) {
      nextPSPindex = i;
      break;
    }
    nextPSP = 0;
    nextPSPindex = 0;
  }

  start = millis();
  
  for ( ; candidate < BigNumber("9999999999999999"); candidate += 2) {
    if (nextPSP != candidate) {
      showIfPrime (candidate);
    }
    else {
      nextPSPindex++;
      if (nextPSPindex < nPSP) {
        nextPSP = fetchPSP(nextPSPindex);
      }
    }
  }
  candidate = 3;  // back to start!
}  // end of loop

bool Miller(BigNumber source)
{

  if(source == 2 || source == 3)
    return true;
  if(source < two || source % two == 0)
    return false;

  BigNumber d = source - one;
  int s = 0;

  while(d % two == 0)
  {
    d /= two;
    s += one;
  }

  BigNumber a;
  const byte bases[] = {2, 3, 5, 7,};
  const byte nbases = sizeof(bases)/sizeof(bases[0]);
  for(int i = 0; i < nbases; i++)
  {
    a = BigNumber (bases[i]);
    BigNumber x = a.powMod (d, source);
    if(x == one || x == source - one)
      continue;

    bool stillOK = false;
    for(int r = 1; r < s; r++)
    {
      x = x.powMod(two, source);
      if(x == one)
        return false;
      if(x == source - one) {
        stillOK = true;
        break;
      }
    }

    if(!stillOK)
      return false;
  }

  return true;
}  // end of Miller

void showIfPrime (BigNumber n) 
{
  if (!check(n))
    return;
  if (!Miller (n))
    return;
  found++;
  Serial.print(n);
  Serial.print(" is prime. Took ");
  Serial.print(millis () - start);
  Serial.print (" mS. Found ");
  Serial.print (found);
  Serial.println (" primes.");
  start = millis ();

}  // end of showIfPrime

bool check(BigNumber n) {
  BigNumber f;
  for (byte i = 0;i < nPrimes; i++) {
    f = BigNumber(pgm_read_byte_near(primes+i));
    if ((n % f) == 0) {
      return false;
    }
  }
  return true;
}

BigNumber fetchPSP(int i) {
  // Progmem array psps contains strong pseudoprimes
  // 8-byte numbers, arranged as bytes, big-endian
  i <<= 3;
  BigNumber x256 ("256");
  BigNumber n = 0;
  for (byte k=0;k<8;k++) {
    n = n * x256;
    uint8_t b = pgm_read_byte_near(psps+i+k);
    n = n + BigNumber(b);
  }
  return n;
}

Here's some output:

100000000003 is prime. Took 1676 mS. Found 1 primes.
100000000019 is prime. Took 2798 mS. Found 2 primes.
100000000057 is prime. Took 3539 mS. Found 3 primes.
100000000063 is prime. Took 1770 mS. Found 4 primes.

Here's the output for a Due running the same code:

100000000003 is prime. Took 86 mS. Found 1 primes.
100000000019 is prime. Took 145 mS. Found 2 primes.
100000000057 is prime. Took 189 mS. Found 3 primes.
100000000063 is prime. Took 89 mS. Found 4 primes.

[Edit: Add this] And, the Due in the 16-digit zone:

1000000000000037 is prime. Took 416 mS. Found 1 primes.
1000000000000091 is prime. Took 236 mS. Found 2 primes.
1000000000000159 is prime. Took 529 mS. Found 3 primes.
1000000000000187 is prime. Took 261 mS. Found 4 primes.

Pseudoprimes.h (28.4 KB)

My apologies for not responding to some of the recent posts. I rely on the forum email notification which can be ... unreliable.

This looks pretty cool, I should fire up my Due and see how it goes!

Here are my test results. The "my sketch" below refers to:

#include <BigNumber.h>

const char FIRST_PRIME_TO_USE [] =  "1000000000054141";

BigNumber candidate;
BigNumber one (1);
BigNumber two (2);

void setup() {
  Serial.begin(115200); 

  while (!Serial) { 
  }

  BigNumber::begin ();  

  Serial.println ();
  Serial.println ();
  Serial.println ("Starting.");

  candidate = BigNumber (FIRST_PRIME_TO_USE);
  if (candidate % two == 0)
    candidate++;
}

void loop() 
{

  for ( ; candidate < BigNumber ("9999999999999999"); candidate += 2)
    showIfPrime (candidate);

  candidate = 3;  // back to start!
}  // end of loop

void rng (BigNumber & result)
{
  result = BigNumber (random ());  
}

bool Miller(BigNumber source, int certainty)
{

  // these tests not needed if we start > 3 and use odd numbers only
  /*
  if(source == 2 || source == 3)
    return true;
  if(source < two || source % two == 0)
    return false;
  */
  
  BigNumber d = source - one;
  int s = 0;

  while(d % two == 0)
  {
    d /= two;
    s += one;
  }

  BigNumber a;

  for(int i = 0; i < certainty; i++)
  {
    do
    {
      rng (a);
    } 
    while(a < two || a >= source - two);

    BigNumber x = a.powMod (d, source);
    if(x == one || x == source - one)
      continue;

    for(int r = 1; r < s; r++)
    {
      x = x.powMod(two, source);
      if(x == one)
        return false;
      if(x == source - one)
        break;
    }

    if(x != source - one)
      return false;
  }

  return true;
}  // end of Miller

unsigned long start;
unsigned long lastShown;
long found = 1; // Number we found

void showIfPrime (BigNumber num) 
{

  if (!Miller (num, 10))
    return;

  Serial.print(num);
  Serial.print(" is prime. Took ");
  Serial.print(millis () - start);
  Serial.print (" mS. Found ");
  Serial.print (found);
  Serial.println (" primes.");
  start = millis ();

  found++;
  lastShown = millis ();

}  // end of showIfPrime

I was fooling around with faster clock rates, so I had things set up to try with 25 MHz on an Atmega328P. That gave a bit of a boost to the original algorithm.

My sketch at 16 MHz

1000000000054141 is prime. Took 9463 mS. Found 1 primes.
1000000000054147 is prime. Took 11085 mS. Found 2 primes.
1000000000054259 is prime. Took 61677 mS. Found 3 primes.

My sketch at 25 MHz

1000000000054141 is prime. Took 5988 mS. Found 1 primes.
1000000000054147 is prime. Took 7014 mS. Found 2 primes.
1000000000054259 is prime. Took 39025 mS. Found 3 primes.

tmd3 sketch at 16 MHz

1000000000054141 is prime. Took 3692 mS. Found 1 primes.
1000000000054147 is prime. Took 3697 mS. Found 2 primes.
1000000000054259 is prime. Took 8454 mS. Found 3 primes.

tmd3 sketch at 25 MHz

1000000000054141 is prime. Took 2336 mS. Found 1 primes.
1000000000054147 is prime. Took 2339 mS. Found 2 primes.
1000000000054259 is prime. Took 5349 mS. Found 3 primes.

My sketch on a Due:

1000000000054141 is prime. Took 456 mS. Found 1 primes.
1000000000054147 is prime. Took 509 mS. Found 2 primes.
1000000000054259 is prime. Took 2702 mS. Found 3 primes.

tmd3 sketch on a Due:

1000000000054141 is prime. Took 191 mS. Found 1 primes.
1000000000054147 is prime. Took 191 mS. Found 2 primes.
1000000000054259 is prime. Took 433 mS. Found 3 primes.

The starting prime was chosen because I was wanting to try it out on a 16-digit display.

can you try my modified version on the Due?

tmd3:
I was surprised to see BigNumber work so quickly compared to native number formats, as I'd expect it to carry substantial overhead.

I don't understand why you seem to think that BigNumber is so quick.
I made some minor modifications to it which increased the speed significantly.
See: Arbitrary precision (big number) library port for Arduino - #69 by odometer - Programming Questions - Arduino Forum

robtillaart:
can you try my modified version on the Due?

Your sketch above modified to remove the LED stuff:

// Prime number generator using long long
//
// Author: Nick Gammon
// Date:   22nd October 2013

//
// Optimized by: Rob Tillaart
// .. 5th November 2013
//

#include <limits.h>

const unsigned long SHOW_EVERY = 1;  // how often to echo a prime to the serial port (and update EEPROM)
const unsigned long TIME_TO_WAIT = 500;  // mS
const long long FIRST_PRIME_TO_USE = 1000000000054141LL;   // was 99999999999LL

long long candidate = 3;
long found = 0; // Number we found

void setup() {
  Serial.begin(115200); 

  while (!Serial) { 
  }

  Serial.println ('.');
  Serial.println ();
  candidate = FIRST_PRIME_TO_USE;
 }

unsigned long start;
unsigned long lastShown;

void loop() 
{
  for ( ; candidate < 9999999999999999LL; candidate += 2)
  {
    showIfPrime(candidate);
  }

  candidate = 3;  // back to start!
}  // end of loop

void showIfPrime(uint64_t num) 
{
  if (fastModU64(num, 3) == 0) return;
  if (fastModU64(num, 5) == 0) return;
  if (fastModU64(num, 7) == 0) return;
  if (fastModU64(num, 11) == 0) return;
  if (fastModU64(num, 13) == 0) return;
  if (fastModU64(num, 17) == 0) return;
  if (fastModU64(num, 19) == 0) return;
  if (fastModU64(num, 23) == 0) return;
  if (fastModU64(num, 29) == 0) return;

  uint32_t upper = sqrt(num) + 6;

  // determine 32bit only math range;
  uint32_t mid = 1; 
  while (upper * mid < 500000000) mid++;  // 5E8 works bit better than 2E9
  mid = upper/mid;

  // 30k + i method
  // starting cnum is critical!!
  for (uint32_t cnum = 30; cnum <= upper; cnum += 30)
  {
    // this part is optimized % 
    // ~5 times faster, but only for a small range
    if (cnum < 0xFFE0) 
    {
      if (fastModU64(num, cnum+1) == 0) return;
      if (fastModU64(num, cnum+7) == 0) return;
      if (fastModU64(num, cnum+11) == 0) return;
      if (fastModU64(num, cnum+13) == 0) return;
      if (fastModU64(num, cnum+17) == 0) return;
      if (fastModU64(num, cnum+19) == 0) return;
      if (fastModU64(num, cnum+23) == 0) return;
      if (fastModU64(num, cnum+29) == 0) return;
    }
    else
    {
      // this part needs 64 bit math
      // TODO investigate optimizations possible
      // similar to below.
      if (cnum < mid) 
      {
        if (num % (cnum + 1) == 0) return; 
        if (num % (cnum + 7) == 0) return; 
        if (num % (cnum + 11) == 0) return; 
        if (num % (cnum + 13) == 0) return; 
        if (num % (cnum + 17) == 0) return; 
        if (num % (cnum + 19) == 0) return; 
        if (num % (cnum + 23) == 0) return; 
        if (num % (cnum + 29) == 0) return; 
      }
      else 
      {
        // new algorithm.
        // from here only 32bit math is needed. // no 30k+i algo!
        // as cnum and quotient are both 32 bit.
        // while loop uses the previous value, quotient and mod
        // to calculate the next quotient and mod.

        // initialize vars before loop
        cnum--; // initial value should be ODD!!!
        uint32_t quotient = num / cnum;
        uint32_t mod = num - quotient*cnum;
        uint32_t prev = cnum;
        
        // test next cnum
        cnum += 2;
        while(cnum < upper)
        {
          // find next quot & mod
          while (mod/2 < quotient)
          {
            quotient--;
            mod += prev;
          }
          mod -= (quotient*2);
          // check
          if (mod == 0) return;

          // next cnum
          prev = cnum;
          cnum += 2;
        } 
      }

    } // if (cnum < mid)
  }

  // end of checking

  char buf [20] = { 0 };
  long long temp = candidate;
  byte pos = 0;

  // make printable
  for (pos = 0; pos < 16; pos++)
  {
    if (temp == 0)
      break;
    char digit = temp % 10;
    buf [15 - pos] = digit | '0';
    temp /= 10;
  }

  // insert leading spaces
  for (; pos < 16; pos++)
    buf [15 - pos] = 15;

  found++;
  Serial.print(buf);
  Serial.print(" is prime. Took ");
  Serial.print(millis () - start);
  Serial.print (" mS. Found ");
  Serial.print (found);
  Serial.println (" primes.");
  start = millis ();
}  // end of showIfPrime

//
// fastest version so far
//
uint32_t fastModU64(uint64_t num, uint16_t cnum)
{
  //if (cnum < 0xFF) return fastModU64_8(num, cnum);

  union 
  {
    uint64_t ll;
    struct
    {
      uint32_t lb;
      uint32_t lh;
    } 
    a;
    struct
    {
      uint16_t ib;
      uint32_t lm;
      uint16_t ih;
    } 
    b;
  } 
  tt;

  tt.ll = num;
  uint16_t t = cnum;
  if (tt.b.ih != 0) tt.a.lh %= t;
  if (tt.a.lh != 0) tt.b.lm %= t;
  if (tt.a.lb >= t) tt.a.lb %= t;
  return tt.a.lb;
}

And changed to start with the same numbers above:

1000000000054141 is prime. Took 38308 mS. Found 1 primes.
1000000000054283 is prime. Took 57396 mS. Found 2 primes.
1000000000054307 is prime. Took 43718 mS. Found 3 primes.

Slower, and as you can see it is skipping some primes.

Starting from the number you had in the sketch:

100000000039 is prime. Took 257 mS. Found 1 primes.
100000000069 is prime. Took 126 mS. Found 2 primes.
100000000077 is prime. Took 122 mS. Found 3 primes.
100000000089 is prime. Took 125 mS. Found 4 primes.
100000000111 is prime. Took 125 mS. Found 5 primes.
100000000117 is prime. Took 123 mS. Found 6 primes.
100000000161 is prime. Took 259 mS. Found 7 primes.
100000000165 is prime. Took 126 mS. Found 8 primes.
100000000389 is prime. Took 252 mS. Found 9 primes.
100000000399 is prime. Took 123 mS. Found 10 primes.
100000000407 is prime. Took 129 mS. Found 11 primes.
100000000425 is prime. Took 131 mS. Found 12 primes.
100000000429 is prime. Took 123 mS. Found 13 primes.
100000000441 is prime. Took 123 mS. Found 14 primes.
100000000471 is prime. Took 155 mS. Found 15 primes.
100000000519 is prime. Took 228 mS. Found 16 primes.
100000000557 is prime. Took 194 mS. Found 17 primes.
100000000641 is prime. Took 211 mS. Found 18 primes.
100000000659 is prime. Took 124 mS. Found 19 primes.
100000000699 is prime. Took 125 mS. Found 20 primes.
100000000761 is prime. Took 242 mS. Found 21 primes.
100000000767 is prime. Took 126 mS. Found 22 primes.
100000000807 is prime. Took 139 mS. Found 23 primes.
100000000839 is prime. Took 157 mS. Found 24 primes.
100000000845 is prime. Took 122 mS. Found 25 primes.
100000000855 is prime. Took 126 mS. Found 26 primes.
100000000881 is prime. Took 124 mS. Found 27 primes.

yes it can loose primes if the starting number does not meet some criteria, the loops are optimized for specific starting numbers. like in the 6k+-1 method.
I'll check if there is an easy fix.

update: on 2nd thought, that is only true for cnum.
I'll check if I can replicate this loosing of primes. Weird ...

The larger the number the less it can use the last optimization (use 32 bit only).
Still thinking (in the background) of how to optimize the % for 64 bit numerator and 32 bit denominator,

@Nick Gammon:
I see you're overclocking.
Did you optimize the % 10 and / 10 out of the BigNumber library?
If not, try it and see what that does to your times.

No I didn't, the library basically uses the standard number.c file which I found somewhere.

I thought you said you modified that library for Arduino.
Like I said, if you kill all the % 10 and / 10 (which also appear as % BASE and / BASE), it will run much faster.

odometer:
I don't understand why you seem to think that BigNumber is so quick.

No offense intended. BigNumber is quick compared to my expectations. I'm surprised to see how well it does compared to operating with native integers. I had expected it to have significant overhead.

OK, here are results with BigNumber, and with Numbler4 as posted at Arbitrary precision (big number) library port for Arduino - #69 by odometer - Programming Questions - Arduino Forum. I transplanted powMod() from BigNumber into Numbler4. Tests ran on an Uno and a Due, and, just for fun, with a TI Stellaris Launchpad, compiled under the open-source Arduino-like IDE, "Energia. " Calculated primes agreed for all three devices. Results are total time in milliseconds for calculation of the first 100 primes after 10N-1, where N is 12 and 16:

                         12 digit                16 digit
           Uno:      102002    263207        254004    688223
           Due:       14431     13790         34571     32051
     Stellaris:       10349     10281         24643     24058

It ran significantly faster with Numbler4 on the Uno, with execution time less than 40% of the BigNumber version. It ran a little slower than BigNumber on the ARMs. On the Due, Numbler4 ran about 5-8% slower. There was less difference with the Stellaris, but it was still bit slower. I can qualitatively understand how eliminating a lot of modulo operations can speed things up on the Uno; my understanding of the internals of the ARMs is too thin to help me explain the results with the Due and Stellaris. I'm fool enough to speculate, though: TI says that the Stellaris' Cortex-M3 has hardware divide, and Atmel says the same thing about the SAM3X. My guess is that division on those processors is fast enough to be competitive with the alternative.

Uno and Due programs were identical, generally the code from reply 66, maybe with a minor change or two. The unused function rng(), for generating random bases for the Miller-Rabin test, was deleted. The IDE v1.5.4 wouldn't compile it for the Due - it didn't like the random() function without arguments, preferring rand() instead. v1.5.2 was OK with random(). I don't even want to think about about the machinations I had to go through to get the Stellaris to run the program; just as well, since that kind of talk doesn't belong on an Arduino forum.