Slow Maxwell FMM

I'm using bempp-cl 0.2.2 and trying to use the FMM for electromagnetic scattering problems. I created a test based on the maxwell_dielectric tutorial example (I changed the two spheres to one and adjusted some parameters). I observed that FMM + gmres was a lot slower than dense assembly with Numba + gmres (3.8 h vs 90 s). I am aware of the other post asking at what scale we should choose the FMM over dense assembly, and I think my problem is too small for the FMM to have an advantage. But I was still surprised that the FMM was that slow, and I thought maybe I didn't use it correctly. I have copied my code below and would appreciate it if someone could point out any mistakes in how I used the FMM. Thank you!

import bempp.api
import numpy as np
from bempp.api.operators.boundary.maxwell import multitrace_operator
from bempp.api.operators.boundary.sparse import multitrace_identity
from bempp.api.linalg import gmres

radius = 0.5
sphere0 = bempp.api.shapes.sphere(r=radius, origin=(0., 0, 0), h=.2)

frequency = 300E6  # 300 MHz

vacuum_permittivity = 8.854187817E-12
vacuum_permeability = 4 * np.pi * 1E-7

eps_r = 1.55 * 1.55
mu_r = 1.0

k_ext = 2 * np.pi * frequency * np.sqrt(vacuum_permittivity * vacuum_permeability)
k_int = k_ext * np.sqrt(eps_r * mu_r)

theta = 0 # Incident wave travelling at a 0 degree angle
direction = np.array([np.cos(theta), np.sin(theta), 0])
polarization = np.array([0, 0, 1.0])

def plane_wave(point):
    return polarization * np.exp(1j * k_ext * np.dot(point, direction))

@bempp.api.complex_callable
def tangential_trace(point, n, domain_index, result):
    value = polarization * np.exp(1j * k_ext * np.dot(point, direction))
    result[:] = np.cross(value, n)

@bempp.api.complex_callable
def neumann_trace(point, n, domain_index, result):
    value = np.cross(direction, polarization) * 1j * k_ext * np.exp(1j * k_ext * np.dot(point, direction))
    result[:] = 1.0 / (1j * k_ext) * np.cross(value, n)


#A0_int = multitrace_operator(sphere0, k_int, epsilon_r=eps_r, mu_r=mu_r, space_type='all_rwg', assembler='dense', device_interface="numba")
#A0_ext = multitrace_operator(sphere0, k_ext, space_type='all_rwg', assembler='dense', device_interface="numba")
A0_int = multitrace_operator(sphere0, k_int, epsilon_r=eps_r, mu_r=mu_r, space_type='all_rwg', assembler='fmm')
A0_ext = multitrace_operator(sphere0, k_ext, space_type='all_rwg', assembler='fmm')

A = bempp.api.GeneralizedBlockedOperator([[A0_int + A0_ext]])

rhs = [bempp.api.GridFunction(space=A.range_spaces[0], dual_space=A.dual_to_range_spaces[0], fun=tangential_trace),
       bempp.api.GridFunction(space=A.range_spaces[1], dual_space=A.dual_to_range_spaces[1], fun=neumann_trace)]

sol, info = gmres(A, rhs, tol=1E-5)

Hi, we already talked directly, but for reference I want to provide a public response. The FMM is currently not optimised for Maxwell. It works, but a single product with an electric field operator takes 4 FMM passes and a product with a magnetic field operator takes 3 FMM passes. Hence, as long as dense assembly fits into memory, it usually gives better overall performance. For larger problems the FMM obviously becomes necessary.
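
For anyone who wants to check this on their own problem, here is a rough sketch (not from the original exchange) that estimates the memory needed for a dense multitrace weak form and times a single FMM matrix-vector product. It reuses the sphere and the exterior wavenumber from the code above; the "RWG" space kind, the 16 bytes per complex128 entry, and the timing approach are my assumptions, and the first product may still include some FMM setup cost.

import time
import numpy as np
import bempp.api
from bempp.api.operators.boundary.maxwell import multitrace_operator

grid = bempp.api.shapes.sphere(r=0.5, h=0.2)

# Rough memory estimate for one dense multitrace operator: its weak form is a
# 2x2 block system on RWG spaces, i.e. a (2N, 2N) complex128 matrix.
rwg = bempp.api.function_space(grid, "RWG", 0)
n = rwg.global_dof_count
dense_gb = (2 * n) ** 2 * 16 / 1e9
print(f"{n} RWG dofs, roughly {dense_gb:.2f} GB per dense multitrace operator")

# Time one matrix-vector product with the FMM-assembled exterior operator.
# One such product internally needs several FMM passes (4 for the electric
# field part, 3 for the magnetic field part), which is why each GMRES
# iteration is comparatively expensive.
k_ext = 2 * np.pi * 300e6 * np.sqrt(8.854187817e-12 * 4 * np.pi * 1e-7)
A_ext = multitrace_operator(grid, k_ext, space_type='all_rwg', assembler='fmm')
weak = A_ext.weak_form()  # assembly happens here (FMM setup included)
x = np.random.rand(weak.shape[1]) + 1j * np.random.rand(weak.shape[1])
start = time.time()
_ = weak @ x
print(f"one FMM product took {time.time() - start:.2f} s")

If the estimated dense size fits comfortably into RAM, switching back to assembler='dense' with device_interface="numba" (the commented-out lines in the question) is the faster option at this problem size.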