Parallel Computing¶
Julia has multiple ways of doing parallel computations: built-in multi-threading and support for distributed computing. We'll touch upon the basics here to give you an idea of what's possible.
Threading¶
Threading is built-in nowadays. We'll skip the task-based interface here and go straight to speeding up computations. You can check how many threads this notebook actually has available:
Threads.nthreads()
1
Each thread has its own ID, which `Threads.threadid()` returns. Let's fill an array in parallel and record which thread wrote each element.
a = zeros(Threads.nthreads()*2)
Threads.@threads for i in eachindex(a)
    a[i] = Threads.threadid()
end
a
16-element Vector{Float64}:
1.0
1.0
6.0
6.0
3.0
3.0
7.0
7.0
4.0
4.0
2.0
2.0
8.0
8.0
5.0
5.0
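The same per-thread indexing can be used to accumulate results safely. A minimal sketch (not from the notebook): give each thread its own slot and combine the slots afterwards. This relies on the loop body not yielding, so `threadid()` stays fixed within each iteration.

```julia
# Each thread increments only its own slot, so there is no data race.
partials = zeros(Int, Threads.nthreads())
Threads.@threads for i in 1:1000
    partials[Threads.threadid()] += 1
end
sum(partials)  # 1000, regardless of how many threads ran
```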
However, threads are not simple, because they introduce so-called race conditions. Each thread does its own thing without synchronizing with the others. They can all modify the same value, or read values out of order, leading to unpredictable results.
total = 0
Threads.@threads for i in 1:1000
    global total += 1
end
total
917
You can prevent this by making the sum an Atomic value, which can only be updated by one thread at a time. Another option is synchronization (using locks), but that introduces more overhead.
total = Threads.Atomic{Int}(0)
Threads.@threads for i in 1:10000
    Threads.atomic_add!(total, 1)
end
total
Base.Threads.Atomic{Int64}(10000)
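For comparison, here is a sketch of the lock-based alternative mentioned above, using a `ReentrantLock` to guard the counter. The result is the same, but every iteration now pays for acquiring and releasing the lock.

```julia
total = Ref(0)            # boxed, so the closure can mutate it without `global`
lk = ReentrantLock()
Threads.@threads for i in 1:10000
    lock(lk) do           # only one thread holds the lock at a time
        total[] += 1
    end
end
total[]  # 10000
```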
Distributed¶
Instead of running threads, you can also run multiple Julia processes and let them communicate (or combine the two). Threads share the same memory, but a separate process does not see the memory of another.
https://docs.julialang.org/en/v1/manual/parallel-computing/#Multi-Core-or-Distributed-Processing-1
Let's add two new worker processes, which can be used for computations.
using Distributed
addprocs(2)
2-element Vector{Int64}:
2
3
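The lowest-level building block is calling a function on a specific worker. A small sketch, assuming the two workers added above: `remotecall_fetch` runs a function remotely and waits for the result.

```julia
using Distributed

# Run myid() on the first worker and fetch its result.
# The master process itself is always id 1, so this returns a worker id.
r = remotecall_fetch(myid, workers()[1])
```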
We can use the @distributed macro to distribute a for loop over all worker processes. Workers get their own copies of the variables used in the loop, so if we want them to write to the same Array on the master process, we need the SharedArrays package.
using SharedArrays
a = SharedArray(zeros(10))
@info a # still all zeros
@distributed for i = 1:10
    a[i] = i
end
┌ Info: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
└ @ Main In[20]:4
Task (runnable) @0x000000015e72f6b0
a
10-element SharedVector{Float64}:
1.0
2.0
3.0
4.0
5.0
6.0
7.0
8.0
9.0
10.0
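When the loop produces one combined value, @distributed also accepts a reducer function; in that form it blocks until all workers are done and returns the reduced result, so no SharedArray is needed. A minimal sketch:

```julia
using Distributed

# Each worker sums its chunk of 1:10; (+) combines the partial sums.
total = @distributed (+) for i in 1:10
    i
end
total  # 55
```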
For longer-running tasks, we can use pmap. It takes a function and an iterable. To make a function available on all worker processes, we define it with @everywhere.
addprocs(100) # don't repeat this cell too much!
@everywhere function slowtask(_)
    sleep(5)
    getpid()
end
A = rand(100)
@time pmap(slowtask, A)
5.501511 seconds (7.83 k allocations: 339.500 KiB)
100-element Vector{Int32}:
50431
50430
50436
50439
50437
50438
50440
50441
50444
50445
50443
50454
50458
⋮
50532
50486
50483
50536
50525
50500
50537
50535
50521
50496
50524
50518
rmprocs(workers())
┌ Warning: rmprocs: process 1 not removed
└ @ Distributed /Users/administrator/.julia/scratchspaces/a66863c6-20e8-4ff4-8a62-49f30b1f605e/agent-cache/default-macmini-aarch64-1.0/build/default-macmini-aarch64-1-0/julialang/julia-release-1-dot-8/usr/share/julia/stdlib/v1.8/Distributed/src/cluster.jl:1048
Task (done) @0x0000000283042e10