Julia is a high level programming language designed for technical computing.

Random interesting features (language + ecosystem):

  • designed to be fast
  • syntax similar to python/ruby
  • dynamic typing
  • parametric types
  • multiple dispatch and syntactic macros (Cf. Common Lisp)
  • designed for parallelism
  • built in package manager
  • it has a backend for ipython/jupyter
  • great interoperability to C/Fortran/Python/…
  • great lib for vector graphics
  • you can create/run your notebooks online on JuliaBox. It is able to sync your directories with Google Drive or github repos. When you sign on it, you get some interactive tutorials.

This post contains the implementation of different prime number sieves, which can be used to solve problem 7 of project Euler.

Hello, Julia!

There are three ways to start experimenting with Julia:

  1. install command line tools
  2. Juno, a complete IDE
  3. JuliaBox.

For this post I chose the third option. You can clone my workspace in local, or let JuliaBox sync a local folder to it.

Euler 7

https://projecteuler.net/problem=7

By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13.

What is the 10 001st prime number?

The fastest solution is by using a sieve.

Upper bound

Before running a sieve, we need to determine its size. To do this, we can apply https://en.wikipedia.org/wiki/Prime-counting_function#Inequalities to determine an upper bound for the nth prime.

In [1]:

function get_upper_bound(nth_prime)
    ub::Integer = ceil(nth_prime * log(nth_prime * log(nth_prime)))
    max(ub, 7)
end
get_upper_bound (generic function with 1 method)

Julia has a dynamic type system, but we can add type annotations to variables, like ub, to generate more efficient code.

Moreover, get_upper_bound is a generic function. We can write several implementations of it, by adding different type specialization of its arguments (multimethods). Then, Julia chooses the concrete implementation to run by implementing multiple dispatch.

In [2]:

target_nth_prime = 10001
ub = get_upper_bound(target_nth_prime)
114320

The Sieve data structure

The sieve algorithms work on arrays of size , s.t. is true if is prime.

Julia provides a very convenient bit array for this purpose.

Julia arrays are indexed starting from 1, not 0.

In [3]:

v = trues(ub)
v[0]
BoundsError()
while loading In[3], in expression starting on line 2



 in getindex at ./bitarray.jl:350

Sieve of Eratosthenes

First of all I try the Sieve of Eratosthenes.

In [4]:

function eratosthenes_sieve(target_nth)
    n = get_upper_bound(target_nth)
    root_n::Integer = floor(sqrt(n))
    sieve = trues(n)
    sieve[1] = false

    for i = 2:root_n
        if sieve[i]
            # this is faster than v[i^2:i:n] = false
            # we start from i*i since i*(i-1), i*(i-2) etc
            # have been already marked as false when working
            # on divisors of (i-1), (i-2) etc
            for j=i^2:i:n
                sieve[j] = false
            end
            target_nth -= 1
            if 0 == target_nth
                return i
            end
        end
    end

    # I keep going past sqrt(n) until the target_nth
    for i = root_n+1:n
        if sieve[i]
            target_nth -= 1
            if 0 == target_nth
                return i
            end
        end
    end
end

@time eratosthenes_sieve(target_nth_prime)
elapsed time: 0.014452114 seconds (430144 bytes allocated)





104743

To measure the elapsed time, I use the macro time.

Macros are not functions, and this is made explicit with the @ prefix.

Sieve of Sundaram

The sieve of Sundaram is slightly more complex. Compared to Eratosthenes’s, it skips even numbers.

In [5]:

function sundaram_sieve(target_nth)
    ub = get_upper_bound(target_nth)
    sieve = trues(ub)
    n = div(ub, 2)
    target_nth -= 1 # it skips 2
    for i = 1:n
        step = i * 2 + 1
        for j=i+i*step:step:n
            sieve[j] = false
        end
        if sieve[i]
            target_nth -= 1
            if 0 == target_nth
                return i*2+1
            end
        end
    end
end


@time sundaram_sieve(target_nth_prime)
elapsed time: 0.010705828 seconds (293692 bytes allocated)





104743

Sieve of Atkin

The sieve of Atkin is a fast and modern algorithm for prime generation. It can be faster than the others, depending on how smart is your implementation.

For the following code, I applied ideas from this page, using the full mod 60 filters from wikipedia.

In [6]:

function get_bitmask(n, idx)
    f = falses(n)
    for i in idx
        f[i] = true
    end
    f
end

function atkin_sieve(target_nth)

    if target_nth < 4
        return eratosthenes_sieve(target_nth)
    end

    ub = get_upper_bound(target_nth)
    root_ub::Integer = floor(sqrt(ub))
    target_nth -= 3
    is_prime = get_bitmask(ub, [2, 3, 5])

    step31_filter = get_bitmask(60, [1, 13, 17, 29, 37, 41, 49, 53])
    step32_filter = get_bitmask(60, [7, 19, 31, 43])
    step33_filter = get_bitmask(60, [11, 23, 47, 59])

    x, x_square, x_step = 1, 1, 3

    while x <= root_ub

        y, y_square, y_step = 1, 1, 3
        while y <= root_ub
            # 4x^2 + y^2
            n = (x_square << 2) + y_square
            n_mod_60 = n % 60
            if n <= ub && n_mod_60 > 0 && step31_filter[n_mod_60]
                is_prime[n] = !is_prime[n]
            end

            # 3x^2 + y^2
            n -= x_square
            n_mod_60 = n % 60
            if n <= ub && n_mod_60 > 0 && step32_filter[n_mod_60]
                is_prime[n] = !is_prime[n]
            end

            if x > y
                n -= (y_square << 1)
                n_mod_60 = n % 60
                if n <= ub && n_mod_60 > 0 && step33_filter[n_mod_60]
                    is_prime[n] = !is_prime[n]
                end
            end

            y += 1
            y_square += y_step
            y_step += 2
        end
        x += 1
        x_square += x_step
        x_step += 2
    end

    for n = 7:root_ub
        if is_prime[n]
            target_nth -= 1
            if 0 == target_nth
                return n
            end

            n_squared = n^2
            for c = n_squared:n_squared:ub
                is_prime[c] = false
            end
        end
    end

    for n = root_ub+1:ub
        if is_prime[n]
            target_nth -= 1
            if 0 == target_nth
                return n
            end
        end
    end
end

@time atkin_sieve(target_nth_prime)
elapsed time: 0.031512008 seconds (1497144 bytes allocated)





104743

For the Euler 7 assignment, this implementation is the slowest of the three, but we want a benchmark on larger primes!

Benchmark

The function bench_sieve takes an ordinal identifying the prime to generate, the sieve to use, and the number of runs.

It runs the sieve a first time, just to be sure it is compiled by the Julia interpreter, and then it measure the elapsed time as a mean over the specified number of runs.

In [7]:

function bench_sieve(nth_prime, generator, runs=5)
    times = zeros(runs)
    # run the first time just to ensure it is compiled
    generator(nth_prime)
    for i = 1:runs
        times[i] = @elapsed generator(nth_prime)
    end
    mean(times)
end
bench_sieve (generic function with 2 methods)

Julia modules are imported with using. Here, I create a dictionary whose keys are the name of the algorithms (I used symbols, i.e. interned strings), and whose values are the functions implementing the corresponding algorithm.

Then I run the benchmark, arranging the results in a DataFrame.

In [8]:

using DataFrames

sieves = [:Erathostenes => eratosthenes_sieve,
          :Sundaram => sundaram_sieve,
          :Atkin => atkin_sieve]

df = DataFrame(nths = map(round, logspace(1, 7, 7)))
for (name, sieve) in sieves
    df[name] = map(nth -> bench_sieve(nth, sieve), df[:nths])
end

In [9]:

# to install Gadfly
# Pkg.add("Gadfly")
using Gadfly

set_default_plot_size(24cm, 16cm)
p = plot(melt(df, :nths), x = :nths,
         y = :value, color=:variable,
         Guide.xlabel("nth prime"),
         Guide.ylabel("elapsed time [s]"),
         Geom.point,
         Geom.line)
display(p)

julia bench

For larger primes, Atkin is faster than Sundaram.