Parallel Computing¶

Julia has multiple ways of doing parallel computations: built-in multithreading and support for distributed computing across processes. We'll touch upon the basics here to give you an idea of what's possible.

Threading¶

Threading is built in nowadays. We'll skip the lower-level task interface here and go straight to speeding up computations. You can check whether this notebook actually has multiple threads available (the thread count is fixed at startup, via the `--threads` flag or the `JULIA_NUM_THREADS` environment variable):

In [1]:
Threads.nthreads()
Out[1]:
1

Each thread has its own id, which `Threads.threadid()` returns. Let's fill an array in parallel and record which thread handled each element.

In [22]:
a = zeros(Threads.nthreads()*2)
Threads.@threads for i = 1:length(a)
   a[i] = Threads.threadid()
end
a
Out[22]:
16-element Vector{Float64}:
 1.0
 1.0
 6.0
 6.0
 3.0
 3.0
 7.0
 7.0
 4.0
 4.0
 2.0
 2.0
 8.0
 8.0
 5.0
 5.0

However, threads are not simple to use correctly, because shared mutable state introduces so-called race conditions. Each thread does its work without synchronizing with the others, so they can all modify the same value, or read it out of order, leading to unpredictable results. In the loop below, `total += 1` is a read-modify-write: two threads can read the same old value, and one of the increments is lost.

In [32]:
total = 0
Threads.@threads for i in 1:1000
  global total += 1
end
total
Out[32]:
917

You can prevent this by making the counter an Atomic value, which guarantees it is updated by only one thread at a time. Another option is explicit synchronization using locks, but that introduces more overhead; a sketch follows below.

In [14]:
total = Threads.Atomic{Int}(0)
Threads.@threads for i in 1:10000
    Threads.atomic_add!(total, 1)
end
total
Out[14]:
Base.Threads.Atomic{Int64}(10000)
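
For comparison, here is a minimal sketch of the lock-based alternative mentioned above, guarding the counter with a `ReentrantLock` from Base:

total = Ref(0)           # a Ref avoids reassigning a global inside the loop
lk = ReentrantLock()
Threads.@threads for i in 1:10000
    lock(lk) do
        total[] += 1     # only one thread at a time executes this block
    end
end
total[]                  # 10000, but with more overhead than the atomic version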

Distributed¶

Instead of running threads, you can also run multiple Julia processes and let them communicate (or combine both approaches). Threads share the memory of a single process, but a separate process has its own memory and sees none of yours.

https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1

Let's add two new worker processes, which can be used for computations.

In [3]:
using Distributed
In [4]:
addprocs(2)
Out[4]:
2-element Vector{Int64}:
 2
 3
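
As a quick sanity check, Distributed provides queries for the current set of processes; a small sketch:

nprocs()     # 3: the master process plus the two workers
workers()    # [2, 3]: the ids returned by addprocs above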

We can use the @distributed macro to distribute a for loop over all worker processes. Each worker gets its own copy of the variables used in the loop, so if we want them to write into the same Array on the master process, we need the package SharedArrays.

In [20]:
using SharedArrays

a = SharedArray(zeros(10))
@info a  # still all zeros at this point

@distributed for i = 1:10
    a[i] = i
end
┌ Info: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
└ @ Main In[20]:4
Out[20]:
Task (runnable) @0x000000015e72f6b0
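Note that the cell returned a Task: without a reduction, @distributed runs asynchronously and returns immediately. If you need the results before moving on, wrap the loop in @sync to block until all workers are done:

# @sync makes the loop block until every worker has finished its chunk
@sync @distributed for i = 1:10
    a[i] = i
end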
In [21]:
a
Out[21]:
10-element SharedVector{Float64}:
  1.0
  2.0
  3.0
  4.0
  5.0
  6.0
  7.0
  8.0
  9.0
 10.0
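
@distributed also accepts a reducer function, in which case it does block and combines the partial results from the workers; for example, summing squares:

# with a reducer, @distributed folds each worker's partial result with (+)
total = @distributed (+) for i = 1:10
    i^2    # the value of the loop body is what gets reduced
end
total      # 385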

For longer-running tasks, we can use pmap, which takes a function and an iterable and maps the function over it using the workers. Functions defined on the master process don't exist on the workers, so we define them with @everywhere, which evaluates the definition on all processes. Since every call below sleeps for 5 seconds and there are at least as many workers as tasks, the whole map finishes in barely more than 5 seconds.

In [19]:
addprocs(100)  # don't repeat this cell too much!

@everywhere function slowtask(_)   # @everywhere defines this on every process
    sleep(5)                       # pretend to do 5 seconds of work
    getpid()                       # report which OS process ran this task
end

A = rand(100)  # 100 dummy inputs; slowtask ignores its argument

@time pmap(slowtask, A)
  5.501511 seconds (7.83 k allocations: 339.500 KiB)
Out[19]:
100-element Vector{Int32}:
 50431
 50430
 50436
 50439
 50437
 50438
 50440
 50441
 50444
 50445
 50443
 50454
 50458
     ⋮
 50532
 50486
 50483
 50536
 50525
 50500
 50537
 50535
 50521
 50496
 50524
 50518
In [41]:
rmprocs(workers())
┌ Warning: rmprocs: process 1 not removed
└ @ Distributed /Users/administrator/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-1.0/build/default-macmini-aarch64-1-0/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:1048
Out[41]:
Task (done) @0x0000000283042e10