Thread allocation for large-scale problems

Hi, thanks for the very nice open-source code. I am working on large-scale acoustic problems with Bempp-cl. Yesterday I ran the first benchmark case, with about 300,000 elements, and was puzzled by the result. The following picture shows that many threads (~128) were active, although I did not set the number of threads anywhere in my Python code (see below). Does Bempp-cl have an automatic mechanism that decides how many threads to use for large-scale problems? I would be very grateful if you could point me to a tutorial on this. (My Linux cluster: ARMv8, CentOS.)

My Python code:

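import bempp.api

# Select the Numba backend for operator assembly
bempp.api.DEFAULT_DEVICE_INTERFACE = "numba"
bempp.api.VECTORIZATION_MODE = "vec4"

# Unit sphere with mesh size h = 0.01 (roughly 300,000 elements)
grid = bempp.api.shapes.sphere(h=0.01)
space = bempp.api.function_space(grid, "DP", 0)
slp = bempp.api.operators.boundary.laplace.single_layer(space, space, space)

# Bempp-cl expects Python callables for grid functions to be decorated
@bempp.api.real_callable
def f(x, n, domain_index, result):
    result[0] = x[0] + 1

rhs = bempp.api.GridFunction(space, fun=f)
sol, info = bempp.api.linalg.gmres(slp, rhs)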


The number of threads cannot be controlled from within Bempp. If you use Numba, it is controlled by the Numba runtime. You can find more information in the Numba documentation, section "The Threading Layers" (Numba 0.50.1).

One comment though: the Numba code is quite inefficient. Ideally, you would install an OpenCL runtime for your ARM CPU cores so that you can enable Bempp's OpenCL functionality. Depending on the kernel, this is typically 3 to 5 times faster than Numba (sometimes more).
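One way to make that switch conditional is sketched below. It assumes `pyopencl` is installed and that an OpenCL runtime for the ARM CPU (PoCL, for example) exposes a CPU device; `choose_device_interface` is a hypothetical helper, not part of Bempp. It returns `"opencl"` only if such a device is visible and otherwise falls back to `"numba"`, so the result can be assigned to `bempp.api.DEFAULT_DEVICE_INTERFACE` before any operator is assembled.

```python
def choose_device_interface():
    """Hypothetical helper: "opencl" if an OpenCL CPU device is visible, else "numba"."""
    try:
        import pyopencl as cl
    except ImportError:
        # pyopencl (or the OpenCL ICD loader it links against) is missing
        return "numba"
    try:
        for platform in cl.get_platforms():
            try:
                if platform.get_devices(device_type=cl.device_type.CPU):
                    return "opencl"
            except cl.Error:
                # This platform has no CPU devices; try the next one
                continue
    except cl.Error:
        # No OpenCL platform is installed at all
        pass
    return "numba"


interface = choose_device_interface()
print("selected device interface:", interface)

# In the script from the question (not executed here):
#   import bempp.api
#   bempp.api.DEFAULT_DEVICE_INTERFACE = interface
```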

Best wishes


Dear Dr. Betcke,
Hi, thanks for your reply. As you said, I have also found Bempp to be quite slow with the Numba backend. For instance, with a 60000 × 60000 matrix and 192 threads, one GMRES iteration takes about 0.6 s and one ILU-preconditioned GMRES iteration about 5.33 s. (I am studying a new mixed-precision algorithm for BEM, so FMM is currently not enabled.) I will follow your suggestion and switch to OpenCL.

Best wishes,