Multicore, manycore, and heterogeneous microarchitectures play a major role in today's hardware landscape. Specialized hardware, such as commodity graphics processing units, has proven to be an effective compute accelerator, capable of solving specific scientific problems orders of magnitude faster than conventional CPUs.
This chapter studies optimizations of a computational kernel that appears in a biomedical application, hyperthermia cancer treatment, on some of the latest microarchitectures, including Intel's Xeon Nehalem and AMD's Opteron Barcelona multicore processors, the Cell Broadband Engine Architecture (Cell BE), NVIDIA's graphics processing units (GPUs), and two cluster computers: a "traditional" CPU cluster and a Cell BE cluster.
Hyperthermia is a relatively new treatment modality that is used as a complementary therapy to radiotherapy or chemotherapy. Clinical studies have shown that the effect of both radiotherapy and chemotherapy can be substantially enhanced by combining them with hyperthermia. The computationally demanding part of the treatment consists of solving a large-scale nonlinear, nonconvex partial differential equation (PDE)-constrained optimization problem (the inverse problem), as well as the forward problem, which is discussed in this chapter and which can be used to solve the inverse problem.
The optimizations discussed in the chapter concern bandwidth-saving algorithmic transformations and implementations on the architectures mentioned above.