Parallel Computing¶
Julia has multiple ways of doing parallel computation: built-in multi-threading and support for distributed computing. We'll touch on the basics here to give you an idea of what's possible.
Threading¶
Threading is built in nowadays. We'll skip the task-based interface here and go straight to speeding up computations. You can check whether this notebook actually supports multiple threads:
Threads.nthreads()
1
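If this prints 1, Julia was started single-threaded. The thread count is fixed at startup, so a sketch of how to request more (assuming Julia 1.5 or later for the `--threads` flag):

```shell
# Ask for 4 threads on the command line (Julia >= 1.5):
julia --threads 4

# Or set an environment variable, which Julia reads at startup:
export JULIA_NUM_THREADS=4
julia
```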
Each thread has its own id, which we can read with `Threads.threadid()`. Let's do that from a parallel loop:
a = zeros(Threads.nthreads()*2)
Threads.@threads for i in eachindex(a)
    a[i] = Threads.threadid()
end
a
16-element Vector{Float64}: 1.0 1.0 6.0 6.0 3.0 3.0 7.0 7.0 4.0 4.0 2.0 2.0 8.0 8.0 5.0 5.0
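To actually speed something up, a common pattern is to split the work into one chunk per thread and combine the partial results afterwards. A minimal sketch (the function `sum_of_squares_threaded` is our own invention for illustration; with a single thread it gives the same answer, just without the speedup):

```julia
# Split the indices into one chunk per thread, let each loop
# iteration reduce its own chunk, then combine the partial sums.
function sum_of_squares_threaded(xs)
    n = Threads.nthreads()
    ranges = collect(Iterators.partition(eachindex(xs), cld(length(xs), n)))
    partial = zeros(length(ranges))
    Threads.@threads for c in 1:length(ranges)
        # no sharing: each c writes only its own slot in `partial`
        partial[c] = sum(i -> xs[i]^2, ranges[c])
    end
    return sum(partial)
end

sum_of_squares_threaded(1.0:100.0)  # 338350.0
```

Because every iteration owns its own slot of `partial`, there is no shared mutable state and therefore no race condition, which is the subject of the next section.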
However, threads are not simple to use correctly, because you can introduce so-called race conditions. Each thread does its own thing without synchronizing with the others. Threads can all modify the same value, or read values out of order, leading to unpredictable results.
total = 0
Threads.@threads for i in 1:1000
    global total += 1
end
total
917
You can prevent this by making the sum an Atomic entity, which can be accessed by only one thread at a time. Another way would be synchronization (using locks), but that introduces more overhead.
total = Threads.Atomic{Int}(0)
Threads.@threads for i in 1:10000
    Threads.atomic_add!(total, 1)
end
total
Base.Threads.Atomic{Int64}(10000)
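The lock-based alternative mentioned above can be sketched like this. It is just as correct, but every increment now pays the cost of acquiring a `ReentrantLock`:

```julia
total = 0
lk = ReentrantLock()
Threads.@threads for i in 1:10000
    # only one thread at a time gets past the lock,
    # so the read-modify-write of `total` is safe
    lock(lk) do
        global total += 1
    end
end
total  # 10000
```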
Distributed¶
Instead of running threads, you can also run multiple Julia processes and let them communicate (or combine both approaches). Threads share the same memory; separate processes do not.
https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1
Let's add two new worker processes, which can be used for computations.
using Distributed
addprocs(2)
2-element Vector{Int64}: 2 3
We can use the `@distributed` macro to distribute this for loop over all worker processes. Workers make copies of the variables used in the loop, so if we want to write to the same array from all workers, we need the SharedArrays package. Note that a plain `@distributed` loop returns a `Task` immediately without waiting for the workers; prefix it with `@sync` to block until they are all done.
using SharedArrays
a = SharedArray(zeros(10))
@info a # empty array
@distributed for i = 1:10
    a[i] = i
end
┌ Info: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] └ @ Main In[20]:4
Task (runnable) @0x000000015e72f6b0
a
10-element SharedVector{Float64}: 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0
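`@distributed` can also take a reduction operator as its first argument. In that form it blocks until all workers finish and folds their partial results together, so no shared array is needed:

```julia
using Distributed

# With a reducer, @distributed waits for the workers and combines
# each worker's partial sum with (+). This also runs fine without
# any extra worker processes.
total = @distributed (+) for i in 1:10000
    i
end
total  # 50005000
```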
For longer-running tasks, we can use `pmap`. It takes a function and an iterable. To make our own functions available on the worker processes, we define them with `@everywhere`, which evaluates the definition on all processes.
addprocs(100) # don't repeat this cell too much!
@everywhere function slowtask(_)
    sleep(5)
    getpid()
end
A = rand(100)
@time pmap(slowtask, A)
5.501511 seconds (7.83 k allocations: 339.500 KiB)
100-element Vector{Int32}: 50431 50430 50436 50439 50437 50438 50440 50441 50444 50445 50443 50454 50458 ⋮ 50532 50486 50483 50536 50525 50500 50537 50535 50521 50496 50524 50518
rmprocs(workers())
┌ Warning: rmprocs: process 1 not removed └ @ Distributed /Users/administrator/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-1.0/build/default-macmini-aarch64-1-0/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:1048
Task (done) @0x0000000283042e10