ScaleME

What is ScaleME?

In one sentence: ScaleME is a programmable, portable, scalable, application-independent implementation of the electrodynamic/acoustic Multilevel Fast Multipole Algorithm (MLFMA) for distributed memory computers, using the Message Passing Interface (MPI).

The ScaleME project began in 1997 as an attempt to parallelize the MLFMA on distributed memory computers. The original plan was to parallelize FISC, the fast solver developed at the University of Illinois. However, we realized that this was "easier said than done" and decided instead to write a new MLFMA code from scratch, designed from the outset for the message-passing paradigm.

Since the fast multipole method is really an abstract prescription of an algorithm, not tied to any specific application, we realized that it could be implemented as a library of user-callable functions in the spirit of LAPACK. Such an implementation could then be used as a plug-in in various applications. Within the framework of the method of moments (MoM), this abstraction turned out to be remarkably simple.

Thus, ScaleME is an implementation of the abstract MLFMA. It is a library that can be retrofitted into existing MoM codes with minimal modifications to them. ScaleME provides a carefully designed interface, in the form of subroutines, for its initialization and operation. One of its strengths is that the user does not need the source code to use it; it is almost a black box. In other words, to use ScaleME, a developer needs only a minimal understanding of the MLFMA and of parallel computing with MPI.


Who needs it?

ScaleME is useful for two categories of engineers:

  1. Electrical engineers who already have a MoM code for electromagnetic scattering/radiation problems, but one that cannot handle a large number of unknowns. For such codes, ScaleME provides a sequential host interface, which offers a way to exploit the power of the MLFMA and of parallel computers with minor modifications to the original code. In this situation, the iterative solver of the MoM code runs on one computer, but the matrix-vector products are computed using all the computers available. Essentially, the user replaces the calls to his own matrix-vector product routines with those provided by ScaleME, and supplies ScaleME with four subroutines that encapsulate the physics of the problem. Detailed techniques for writing these subroutines are described in the ScaleME User's Guide. We have demonstrated that this can be done in a few days' time. (A schematic sketch of this pattern follows this list.)

  2. Engineers who are developing new electromagnetic solvers for parallel computers. The scenario we have in mind is a MoM code that uses iterative methods on parallel computers, with the iterative solver itself parallelized. For this situation ScaleME provides the parallel host interface, which, interestingly, is somewhat simpler than the sequential host version. This interface is currently in beta testing and has not yet been released for public use. Naturally, to exploit this interface, which provides better performance, the MoM code has to be modified appropriately.
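
To make the pattern in item 1 concrete, here is a minimal sketch (in C with MPI) of a sequential driver whose matrix-vector products are computed cooperatively by all processes: rank 0 runs the iteration, and every rank contributes to each product. Everything in it is invented for illustration; the routine names, the stand-in matrix, and the simple Richardson iteration are not part of the ScaleME interface, and a real fast solver would apply the operator with the MLFMA rather than from a stored matrix.

    /* Sketch of the sequential host pattern (not ScaleME code): the
     * iterative solver runs on rank 0 alone, while every matrix-vector
     * product is a collective operation over all MPI ranks.
     *
     * Build and run, for example:
     *     mpicc seq_host.c -o seq_host -lm
     *     mpirun -np 4 ./seq_host
     */
    #include <mpi.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N     64     /* number of unknowns (tiny, for illustration)  */
    #define NITER 200    /* fixed number of Richardson iterations        */
    #define ALPHA 0.2    /* relaxation parameter                         */

    /* Stand-in for the MoM impedance matrix: symmetric and diagonally
     * dominant, so the simple iteration below converges.  A fast solver
     * never stores this matrix; it applies it with the MLFMA instead.   */
    static double entry(int i, int j)
    {
        double d = 1.0 + abs(i - j);
        return (i == j) ? 4.0 : 1.0 / (d * d);
    }

    /* Distributed matvec: rank 0 supplies x and receives y = A*x; every
     * rank computes a share of the rows.  In the host-interface picture,
     * this single call replaces the host code's own matvec routine.     */
    static void dist_matvec(const double *x_in, double *y_out, int rank, int size)
    {
        double x[N], y[N];
        int i, j;

        if (rank == 0)
            for (i = 0; i < N; i++) x[i] = x_in[i];
        MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        for (i = 0; i < N; i++) y[i] = 0.0;
        for (i = rank; i < N; i += size)          /* round-robin row ownership */
            for (j = 0; j < N; j++)
                y[i] += entry(i, j) * x[j];

        MPI_Reduce(y, y_out, N, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    }

    int main(int argc, char **argv)
    {
        int rank, size, i, it;
        double b[N], x[N], ax[N], res;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (i = 0; i < N; i++) { b[i] = 1.0; x[i] = 0.0; }

        /* The solver arithmetic lives on rank 0; the other ranks only
         * take part in the collective matrix-vector products.           */
        for (it = 0; it < NITER; it++) {
            dist_matvec(x, ax, rank, size);
            if (rank == 0)
                for (i = 0; i < N; i++)
                    x[i] += ALPHA * (b[i] - ax[i]);   /* Richardson update */
        }

        dist_matvec(x, ax, rank, size);               /* final residual check */
        if (rank == 0) {
            res = 0.0;
            for (i = 0; i < N; i++)
                res += (b[i] - ax[i]) * (b[i] - ax[i]);
            printf("residual norm after %d iterations: %g\n", NITER, sqrt(res));
        }

        MPI_Finalize();
        return 0;
    }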


ScaleME: What it is not

There are several misconceptions about ScaleME. Here I shall try to clear them up.

  1. It is not an electromagnetic MoM code: ScaleME is not a solver. It does not have an EM code built into it, and it does not know anything about your integral equation formulation or the discretisation used. Its philosophy is very similar to that of LAPACK or BLAS.

  2. It is not a parallelized iterative matrix solver: ScaleME does not contain a parallelized version of iterative solvers such as CGNR or GMRES; the user has to supply one himself. The reason for not building one into ScaleME is that we do not want it to be tied to any specific solver. (A generic sketch of how such a solver couples to the engine only through the matrix-vector product follows this list.)

  3. ScaleME is not FISC: Although the Fast Illinois Solver Code (FISC) has been around for a while and is a very well tested code, ScaleME is not a parallelized version of FISC. It is a completely new code written from scratch. All they have in common is the underlying abstract fast multipole theory. Nothing more and nothing less.

  4. ScaleME is not under the GPL: Although ScaleME is publicly available (with restrictions), it is not distributed under the GNU General Public Licence, nor is it free software.
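
To illustrate the point in item 2: an iterative solver couples to the engine only through the matrix-vector product, so any routine the user already has can be plugged in. Below is a generic sketch of that decoupling, using a plain conjugate-gradient loop (which assumes a symmetric positive definite operator; CGNR or GMRES would play the same role for general MoM matrices). The callback type and the toy operator are invented names for this example, not ScaleME code.

    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* The only coupling point between solver and engine: a matvec callback.
     * (Invented name; in practice the callback could be a sequential
     * routine, a parallel MLFMA product, or anything with this contract.) */
    typedef void (*matvec_fn)(const double *x, double *y, int n);

    /* Plain conjugate gradients for a symmetric positive definite operator
     * that is available only through the callback.                         */
    static void cg(matvec_fn matvec, const double *b, double *x, int n,
                   int maxit, double tol)
    {
        double *r = malloc(n * sizeof *r);
        double *p = malloc(n * sizeof *p);
        double *ap = malloc(n * sizeof *ap);
        double rr = 0.0, rr_new, pap, alpha, beta;
        int i, it;

        memset(x, 0, n * sizeof *x);              /* start from x = 0       */
        memcpy(r, b, n * sizeof *r);              /* r = b - A*x = b        */
        memcpy(p, r, n * sizeof *p);
        for (i = 0; i < n; i++) rr += r[i] * r[i];

        for (it = 0; it < maxit && sqrt(rr) > tol; it++) {
            matvec(p, ap, n);                     /* the only use of A      */
            pap = 0.0;
            for (i = 0; i < n; i++) pap += p[i] * ap[i];
            alpha = rr / pap;
            for (i = 0; i < n; i++) { x[i] += alpha * p[i]; r[i] -= alpha * ap[i]; }
            rr_new = 0.0;
            for (i = 0; i < n; i++) rr_new += r[i] * r[i];
            beta = rr_new / rr;
            rr = rr_new;
            for (i = 0; i < n; i++) p[i] = r[i] + beta * p[i];
        }
        free(r); free(p); free(ap);
    }

    /* A trivial stand-in operator (tridiagonal, SPD) to make the sketch run. */
    static void toy_matvec(const double *x, double *y, int n)
    {
        for (int i = 0; i < n; i++)
            y[i] = 4.0 * x[i] - (i > 0 ? x[i - 1] : 0.0)
                              - (i < n - 1 ? x[i + 1] : 0.0);
    }

    int main(void)
    {
        enum { N = 100 };
        double b[N], x[N];
        for (int i = 0; i < N; i++) b[i] = 1.0;
        cg(toy_matvec, b, x, N, 1000, 1e-10);
        printf("x[0] = %g, x[N/2] = %g\n", x[0], x[N / 2]);
        return 0;
    }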

Technical Features

  1. Kernels available: two-dimensional Helmholtz, three-dimensional Helmholtz, and three-dimensional Laplace, all through a unified interface. (The corresponding free-space kernels are written out after this list.)

  2. Preconditioners: a built-in block-diagonal preconditioner, based on the self-interaction matrices of the cells, is available.

  3. A powerful compression algorithm for reducing the memory requirements of the translation matrices of the three-dimensional Helmholtz MLFMA.

  4. Interpolation/anterpolation algorithms of arbitrary order.

  5. Support for scalar and vector Helmholtz equations. For vector problems, both rectangular and spherical components are supported.

  6. A powerful two-stage initialization solves the geometric data-structure replication problem of host programs.
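
For reference, these are the standard free-space kernels corresponding to the three options in item 1, written out as a small C program. The function names are illustrative only and are not part of the ScaleME interface, and the snippet assumes the POSIX math routines j0 and y0 for the two-dimensional Hankel function.

    #include <complex.h>
    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Free-space Green's functions (time convention exp(-i*omega*t)).
     * k is the wavenumber, r > 0 the source-observer distance.        */

    /* Three-dimensional Helmholtz:  G(r) = exp(i k r) / (4 pi r)      */
    static double complex green_helmholtz_3d(double k, double r)
    {
        return cexp(I * k * r) / (4.0 * M_PI * r);
    }

    /* Three-dimensional Laplace (static limit):  G(r) = 1 / (4 pi r)  */
    static double green_laplace_3d(double r)
    {
        return 1.0 / (4.0 * M_PI * r);
    }

    /* Two-dimensional Helmholtz:  G(r) = (i/4) H0^(1)(k r), with the
     * Hankel function H0^(1) = J0 + i*Y0 (POSIX j0/y0 routines).      */
    static double complex green_helmholtz_2d(double k, double r)
    {
        return 0.25 * I * (j0(k * r) + I * y0(k * r));
    }

    int main(void)
    {
        double k = 2.0 * M_PI;   /* wavenumber for a unit wavelength */
        double complex g3 = green_helmholtz_3d(k, 1.0);
        double complex g2 = green_helmholtz_2d(k, 1.0);
        printf("3-D Helmholtz, kr = 2*pi: %g + %gi\n", creal(g3), cimag(g3));
        printf("2-D Helmholtz, kr = 2*pi: %g + %gi\n", creal(g2), cimag(g2));
        printf("3-D Laplace,   r  = 1   : %g\n", green_laplace_3d(1.0));
        return 0;
    }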

Portability

ScaleME has been tested over the following platforms:

  1. Network of Sun Ultra workstations running Solaris.

  2. SGI Origin 2000 running IRIX 6.3.

  3. Network of DEC Alpha workstations running OSF 3.2.

  4. Linux clusters using AMD K6-2 processors and Slackware Linux.

Although we have not tested on other platforms, we do not anticipate any serious problems in porting to other systems.


Achievements so far

  1. In a series of simulations conducted in May 2001, ScaleME broke every known record in frequency-domain computational electromagnetics: it solved the largest problem ever attempted, and did so in record time. The simulation involved about 10.2 million unknowns modeling a full-scale aircraft 400 wavelengths long, and was run on 128 processors of an SGI Origin 2000 supercomputer at NCSA. The total solution time was about seven and a half hours, beating the previous best of about 15 hours set by FISC (on a slightly smaller problem) in 2000. In computing matrix-vector products, ScaleME was more than eight times faster than FISC, making it the fastest implementation of the MLFMA ever reported. See the technical report.

  2. Further simulations also dispelled another myth about the dynamic MLFMA: that it is not scalable beyond 32 processors. ScaleME demonstrated that, with proper parallelization methods, the algorithm can be made scalable. Indeed, we have results showing nearly 100% parallel efficiency for processor counts as large as 64. More results are currently being studied.

  3. In November-December 1999, dpTRIMOM, a highly sophisticated MoM code based on TRIMOM, solved a sequence of increasingly large-scale problems involving realistic targets such as full-scale aircraft. The largest was a full-scale aircraft at 4 GHz, with nearly 4 million unknowns, solved on an SGI Origin 2000.

  4. On December 31, 1998, a ScaleME-accelerated MoM code based on TRIMOM (developed by Lu, Song, and Chew) solved an electromagnetic scattering problem involving 3.1 million unknowns, breaking the previous record of 2.4 million set by FISC in June 1997.

  5. We have also been able to routinely solve electromagnetic scattering problems involving hundreds of thousands of unknowns on a sub-$25,000 Linux cluster.

Offshoots

There are several efforts to extend the original MLFMA implementation in ScaleME to more general cases. These are still at the research stage. Here are some directions:

  1. Bin Hu and Weng Chew are developing a fast inhomogeneous plane wave algorithm for multilayered structures. This work is not yet complete.

  2. K. Donepudi and J. Jin have been able to use ScaleME in their higher-order MoM code, which uses parametric elements. They are also working to extend it to multiple dielectric structures (see the footnote under Related Research).

  3. Lijun Jiang and Jose Schutt-Aine are using the Laplace kernel of ScaleME for their parameter extraction programs.

Papers and Research Reports

  1. Sanjay Velamparambil and Weng Cho Chew. Parallelization of multilevel fast multipole algorithm on distributed memory computers. Technical Report 13-01, Center for Computational Electromagnetics, University of Illinois at Urbana-Champaign, 2001.

  2. Sanjay Velamparambil and Weng Cho Chew. 10 million unknowns: Is it that large? Technical report, Center for Computational Electromagnetics, 2001. (In preparation)

  3. Sanjay Velamparambil and Weng Cho Chew. Parallelization of MLFMA on distributed memory computers. In Roberto D. Graglia, editor, Proceedings of the International Conference on Electromagnetics in Advanced Applications (ICEAA01), pages 141-144, September 10-14, 2001. (An expanded version is under review at Parallel Computing.)

  4. Sanjay Velamparambil and Weng Cho Chew. ScaleME: Application interface; a programmer's guide and reference. Technical Report CCEM-27-99, Center for Computational Electromagnetics, 1999.

  5. S.V. Velamparambil, J. Song, W.C. Chew, and K. Gallivan. ScaleME: A portable, scalable multipole engine for electromagnetic and acoustic integral equation solvers. IEEE Antennas Propagat. Symp., 3:1774-1777, 1998.

  6. S. V. Velamparambil, J. E. Schutt-Aine, J. G. Nickel, J. M. Song, and W. C. Chew. Solving large scale electromagnetic problems using a linux cluster and parallel MLFMA. IEEE Antennas Propagat. Symp., 1:636-639, July 1999.

  7. S. V. Velamparambil, J. M. Song, and W. C. Chew. A Portable Parallel Multilevel Fast Multipole Solver for Scattering from Perfectly Conducting Bodies. IEEE Antennas Propagat. Symp., 1:648-651, July 1999.

  8. Sanjay Velamparambil, Jiming Song, and Weng Cho Chew. On the parallelization of dynamic multilevel fast multipole method on distributed memory computers. Proc. International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems, Maui, Hawaii, November 1999.

  9. Sanjay Velamparambil and Weng Cho Chew. An Improved Parallelization Strategy for MLFMA on Distributed Memory Computers. IEEE Antennas Propagat. Symp., 2000.

  10. Sanjay Velamparambil, Jiming Song, and Weng Cho Chew. ScaleME: A Portable, Distributed Memory Multilevel Fast Multipole Kernel for Electromagnetic and Acoustic Integral Equation Solvers. Technical Report CCEM-23-99, Center for Computational Electromagnetics, Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 1999.

  11. Sanjay Velamparambil and Weng Cho Chew. Fast Polynomial Representations for the Diagonal Translation Operators of the Three Dimensional Helmholtz Equation. Technical Report CCEM-17-99, Center for Computational Electromagnetics, Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, 1999.

Related Research

  1. Bin Hu, Weng Cho Chew, and Sanjay Velamparambil. Fast inhomogeneous plane wave algorithm (FIPWA) for the analysis of electromagnetic scattering problems. (Submitted to Radio Science), 2000.

  2. K. C. Donepudi, J. M. Jin, S. Velamparambil, J. M. Song, and W. C. Chew. A higher order parallelized multilevel fast multipole algorithm for 3D scattering. Submitted to IEEE Trans. Antennas and Propagation. (More recent versions of this work can be found in the IEEE APS 2000 Digest.)*

  * Footnote: S. Velamparambil is no longer associated with this work and its various incarnations; contact the first and second authors for more details on this work.