Thread allocation issue for large-scale problems

Hi, thanks for the very nice open-source code. I am dealing with large-scale acoustic problems using Bempp-cl. Yesterday, when I ran my first benchmark case with about 300,000 elements, I was puzzled by the results. The attached picture shows that multiple threads were active (~128 threads), even though I did not set a thread count anywhere in my Python code (please see below). Is there an automatic allocation mechanism in Bempp-cl that determines how many threads to use for large-scale problems? I would be very grateful if you could point me to the relevant tutorial. (My Linux cluster: ARMv8, CentOS.)

My Python code:

import bempp.api
bempp.api.DEFAULT_DEVICE_INTERFACE = 'numba'    # assemble operators with the Numba backend
bempp.api.VECTORIZATION_MODE = "vec4"
grid = bempp.api.shapes.sphere(h=0.01)           # sphere mesh with element size h
space = bempp.api.function_space(grid, "DP", 0)  # piecewise-constant function space
slp = bempp.api.operators.boundary.laplace.single_layer(space, space, space)
@bempp.api.real_callable
def f(x, n, domain_index, result):
    result[0] = x[0] + 1                         # boundary data

rhs = bempp.api.GridFunction(space, fun=f)       # right-hand side grid function
sol, info = bempp.api.linalg.gmres(slp, rhs)     # solve with GMRES

Hi,

The number of threads cannot be controlled from within Bempp. If you use Numba, they are controlled through the Numba runtime system. You can find more information in the Numba documentation: The Threading Layers — Numba 0.50.1 documentation
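For example, you can cap the thread count through Numba itself before any Bempp assembly runs. A minimal sketch (the value 32 is just an illustration; numba.set_num_threads cannot exceed the value of NUMBA_NUM_THREADS):

import numba
import bempp.api

numba.set_num_threads(32)        # use at most 32 threads in Numba parallel regions
print(numba.get_num_threads())   # check which value is actually active

bempp.api.DEFAULT_DEVICE_INTERFACE = 'numba'
# ... the Numba assembly now runs on at most 32 threads

Alternatively, set the NUMBA_NUM_THREADS environment variable before starting Python.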

One comment, though: the Numba code is quite inefficient. Ideally, you would have an OpenCL runtime for your ARM CPU cores so that you can enable the OpenCL functionality. Depending on the kernels, this is typically 3 to 5 times faster (sometimes more).
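As a minimal sketch (assuming you have installed pyopencl and a CPU OpenCL runtime such as PoCL on the ARM nodes), switching the backend looks like this:

import pyopencl
import bempp.api

# List the OpenCL platforms and devices available on the node
for platform in pyopencl.get_platforms():
    print(platform.name, [device.name for device in platform.get_devices()])

bempp.api.DEFAULT_DEVICE_INTERFACE = 'opencl'   # use OpenCL instead of Numba
bempp.api.VECTORIZATION_MODE = "vec4"           # same vectorization setting as in your script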

Best wishes

Timo

Dear Dr. Betcke,
Hi, thanks for your reply. As you said, I also found that Bempp running on Numba is quite slow. For instance, with a 60000x60000 matrix and 192 threads, one GMRES iteration takes about 0.6 s and one ILU-GMRES iteration about 5.33 s (I am studying a new mixed-precision algorithm for BEM, so FMM is not activated at the moment). I will follow your suggestion and switch to the more suitable option, OpenCL.
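In case it is useful, here is a rough sketch of how one could measure the average per-iteration cost with SciPy's gmres on the weak form assembled by the script above (not exactly my setup; the callback name is just illustrative):

import time
from scipy.sparse.linalg import LinearOperator, gmres

A = slp.weak_form()                    # assembled discrete operator from the script above
b = rhs.projections(space)             # right-hand side coefficient vector

iteration_count = [0]
def count_iteration(residual_norm):    # called by gmres once per inner iteration
    iteration_count[0] += 1

op = LinearOperator(A.shape, matvec=lambda v: A @ v)
start = time.time()
x, info = gmres(op, b, callback=count_iteration, restart=200, maxiter=200)
print(iteration_count[0], "iterations,",
      (time.time() - start) / max(iteration_count[0], 1), "s per iteration")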

Best wishes,

Long