TFCC - Technical Area: Software Engineering
IEEE Computer Society Task Force on Cluster Computing

Technical Area Software Engineering


INTRODUCTION

Clusters of workstations and/or PCs have gained increasing attention as low-cost parallel computing platforms in recent years. High-performance CPUs and interconnect technology make clusters an attractive hardware platform for many industrially relevant applications that have long been the domain of supercomputing. However, to make use of these cost-effective platforms, application software has to be parallelized.

Simulation-based design is one domain of particular interest to industry, especially to small and medium-sized companies that cannot afford access to traditional supercomputers or MPPs. Since a number of mature software packages already exist in fluid dynamics, finite element analysis, and digital mockup, software projects that aim at exploiting cluster power will usually be parallelizations of existing software. Parallelizing large-scale industrial software packages, however, requires appropriate software engineering methods.

In the past, quite a few projects have parallelized industrial simulation codes. One example is the EUROPORT initiative, in which a number of industrial codes were ported to parallel computers. Very few software vendors, however, actually offer parallel execution as an option in their products. The goal of TFCC's Technical Area Software Engineering is to overcome the obstacles that prevent parallel software from coming into widespread use on clusters. In particular, TFCC-SWE seeks to promote research and facilitate the exchange of experience related to the issues listed below.

RESEARCH ISSUES

Hardware Platforms

SCI Clusters

Programming Models and Environments

PVM and MPI are currently the standards in distributed-memory computing. However, symmetric multiprocessors (SMPs) have become increasingly popular. In fact, most high-end workstations and PCs are SMPs with two to eight processors. Future clusters will therefore exhibit two levels of parallelism: message passing between nodes and shared memory within a node. OpenMP and POSIX threads are two standards for programming SMPs. Future research will have to address this hierarchical parallelism. In distributed computing, CORBA has become established as a standard for interfacing software packages in a heterogeneous environment. In particular, legacy applications can be integrated as components into new distributed environments. To promote the exchange of experience, we provide a list of projects that use any of the programming environments mentioned above.
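The two levels of parallelism described in this section can be illustrated with a small sketch. It is not taken from any TFCC project: the function names `split_range`, `node_sum`, and `hierarchical_sum` are hypothetical, and Python threads only stand in for the structure. In a real cluster code the outer level would typically be PVM or MPI processes and the inner level OpenMP or POSIX threads.

```python
from concurrent.futures import ThreadPoolExecutor


def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous blocks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, parts)
    blocks = []
    lo = start
    for i in range(parts):
        hi = lo + base + (1 if i < extra else 0)
        blocks.append((lo, hi))
        lo = hi
    return blocks


def node_sum(values, lo, hi, threads):
    """Inner level: the threads of one (simulated) SMP node share the node's block."""
    chunks = split_range(lo, hi, threads)
    with ThreadPoolExecutor(max_workers=threads) as pool:
        partials = pool.map(lambda c: sum(values[c[0]:c[1]]), chunks)
        return sum(partials)


def hierarchical_sum(values, nodes, threads_per_node):
    """Outer level: one block per node.

    The final sum over nodes stands in for a message-passing reduction;
    the nodes are processed one after another here to keep the sketch simple.
    """
    blocks = split_range(0, len(values), nodes)
    return sum(node_sum(values, lo, hi, threads_per_node) for lo, hi in blocks)


data = list(range(1000))
result = hierarchical_sum(data, nodes=4, threads_per_node=2)
```

The point of the sketch is the shape of the decomposition: the data is first divided among nodes and each node's share is then divided among its processors, so the two levels can be tuned independently.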

Software Engineering Methods for Porting existing Software to Clusters

In scientific computing, most parallel software projects are parallelizations of existing software. Due to the limitations of automatic parallelization and data-parallel languages, many software packages have to be parallelized manually. This process usually requires interdisciplinary cooperation. To achieve high efficiency and scalability, a well-defined software engineering process is required that is applied consistently throughout the project. By using standardized programming models, parallel software can be developed for a whole range of platforms. However, achieving good efficiency on clusters is a particular challenge for a number of reasons. Cluster interconnection networks (WANs, Ethernet-based LANs) exhibit much higher latency than MPP interconnects. SCI-based clusters are a notable exception. Clusters are usually heterogeneous, i.e., nodes differ in the type and number of CPUs, in CPU power, and in memory capacity. In addition, multiple users compete for resources. This makes resource management and dynamic load balancing particularly important issues. Distributed object-oriented computing based on standards like CORBA allows existing applications (including legacy systems) to interoperate in a cluster environment.
List of projects related to the software engineering issues mentioned above.
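To illustrate why heterogeneity makes load balancing an issue, the following minimal sketch contrasts a static, speed-weighted partition with dynamic self-scheduling from a shared task queue. It is a simulation under stated assumptions, not a method prescribed by TFCC-SWE: the function names, the integer relative speeds, and the task costs are all hypothetical, and communication cost is ignored.

```python
import heapq


def weighted_partition(n_items, speeds):
    """Static split: give each node a share of n_items proportional to its speed.

    `speeds` are assumed to be positive integers (relative node speeds).
    """
    total = sum(speeds)
    counts = [n_items * s // total for s in speeds]
    # Hand out the items lost to integer rounding, fastest nodes first.
    leftover = n_items - sum(counts)
    fastest_first = sorted(range(len(speeds)), key=lambda i: -speeds[i])
    for i in fastest_first[:leftover]:
        counts[i] += 1
    return counts


def self_schedule(task_costs, speeds):
    """Dynamic split: whichever node becomes idle first takes the next task.

    A task of cost c is assumed to take c / speed time units on a node.
    Returns (makespan, number of tasks executed by each node).
    """
    # Heap of (time at which the node becomes idle, node index).
    idle = [(0.0, i) for i in range(len(speeds))]
    heapq.heapify(idle)
    done = [0] * len(speeds)
    makespan = 0.0
    for cost in task_costs:
        t, i = heapq.heappop(idle)
        t += cost / speeds[i]
        done[i] += 1
        makespan = max(makespan, t)
        heapq.heappush(idle, (t, i))
    return makespan, done


speeds = [1, 2, 4]                      # hypothetical relative node speeds
static_counts = weighted_partition(10, speeds)
makespan, dynamic_counts = self_schedule([1] * 7, speeds)
```

The static split needs the node speeds in advance and cannot react when other users take CPU time away. Self-scheduling adapts at run time, because a slowed-down node simply fetches fewer tasks, at the price of one queue access per task.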

Program Development Tools

Appropriate tools for program development and analysis are a key prerequisite for productivity in engineering parallel and distributed software. EuroTools and PTools are two consortia that coordinate and promote research on tools for parallel programming. They provide information on available tools and ongoing research projects. At LRR-TUM, a standard for on-line monitoring, OMIS, has been defined that provides a basis for an integrated tool environment. A reference implementation, OCM, is currently under development. On top of OCM, the parallel debugger DETOP will be made available on a series of common parallel platforms as the first part of the integrated tool environment THE TOOL-SET.
Program analysis tools are a Technical Area of their own in the TFCC.

Links

Contact:

Peter Luksch
LRR-TUM, Institut für Informatik
Technische Universität München
D-80290 München
Germany
e-mail: Peter.Luksch@computer.org
phone: ++49-89-289-28164 or ++49-89-289-22382
fax: ++49-89-289-28232


This page is still under construction. Please send comments, suggestions, or contributions to Peter.Luksch@in.tum.de



Peter Luksch
$Id: index.html,v 1.5 1999/10/12 13:16:58 luksch Exp $