While this is not a self-contained example, I would like to summarize an issue I had so that others may find it (and my workaround) when they search for it.
Loading FastTransforms.jl (via import FastTransforms, using FastTransforms, or loading a package that depends on FastTransforms.jl) made some FFTs computed via FFTW.jl (of small real vectors with 256 to 4096 entries) roughly 100x slower on my system when running Julia with one thread. Profiling showed that a huge amount of time was spent in __psynch_cvwait and __psynch_cvsignal. The reason appears to be the following __init__ function in FastTransforms.jl/src/libfasttransforms.jl (lines 15 to 20 in 14a3118):
function __init__()
    n = ceil(Int, Sys.CPU_THREADS/2)
    ft_set_num_threads(n)
    ccall((:ft_fftw_init_threads, libfasttransforms), Cint, ())
    ft_fftw_plan_with_nthreads(n)
end
Reversing these settings by executing
FastTransforms.ft_set_num_threads(1)
ccall((:ft_fftw_init_threads, FastTransforms.libfasttransforms), Cint, ())
FastTransforms.ft_fftw_plan_with_nthreads(1)
after loading FastTransforms fixed the performance issue.
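For anyone who wants to check whether they are affected, the comparison can be sketched roughly as follows. This is a minimal sketch rather than the exact code I ran: the helper name time_rffts, the vector length 1024, and the repetition count are assumptions chosen for illustration, and it assumes FFTW.jl and FastTransforms.jl are both installed. It times unplanned rfft calls before loading FastTransforms, after loading it, and after applying the workaround above.

```julia
using FFTW

# Hypothetical helper: time `reps` unplanned real FFTs of `x`.
# Each rfft call creates a fresh plan, so it uses whatever thread
# count is configured at the time of the call.
function time_rffts(x; reps = 1_000)
    rfft(x)  # warm-up so that compilation is not measured
    return @elapsed for _ in 1:reps
        rfft(x)
    end
end

x = rand(1024)  # small real vector (length is an illustrative choice)

t_baseline = time_rffts(x)

using FastTransforms  # its __init__ sets the thread counts as shown above
t_loaded = time_rffts(x)

# Workaround from this issue: reset the thread counts to 1.
FastTransforms.ft_set_num_threads(1)
ccall((:ft_fftw_init_threads, FastTransforms.libfasttransforms), Cint, ())
FastTransforms.ft_fftw_plan_with_nthreads(1)
t_fixed = time_rffts(x)

println("baseline:               ", t_baseline, " s")
println("FastTransforms loaded:  ", t_loaded, " s")
println("after workaround:       ", t_fixed, " s")
```

On an affected system I would expect the second timing to be much larger than the other two; the exact ratio will depend on the machine and the number of CPU threads.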