LRR-TUM
Department of Informatics
Technische Universität München
Informatik X: Computer Organisation; Parallel Computer Architecture
Prof. Dr. Arndt Bode, Prof. Dr. Hans Michael Gerndt

BALANCE

Balanced High Availability in Layered and Distributed Computing Elements

Introduction to BALANCE

BALANCE is an academic-industrial cooperation project between Technische Universität München and Force Computers in Germany.

The project aims to investigate how High Availability middleware can improve both the overall system performance and the availability of telecommunication computing systems.

The goal is to integrate the middleware's functionality into a layered architecture in order to fully exploit the underlying telecommunication hardware's High Availability core functions. This is achieved by utilizing special communication platforms. Thus, the entire system provides High Availability mechanisms throughout its architectural layers. By adapting each layer's mechanisms, we obtain a so-called balanced high availability system.

Objectives

With the increasingly common use of the Internet and a constantly growing number of computers in modern telecommunication networks, there is an increasing demand for reliable and highly available computing systems. In contrast to so-called "classical" high availability systems (e.g. fault tolerant switching stations) and "classical" safety-critical systems (e.g. nuclear power station control), the systems used in modern telecommunication networks require a number of modifications and/or additional features.

Traditionally, a "classical" fault tolerant system has been designed with the following properties:

However, the Internet and telecommunication networks consist of modular architectures which are based on open standards in order to obtain lower costs and higher flexibility. These systems consist of independent modules communicating via standardized interfaces. This trend towards greater modularization is expected to continue in the near future.

In a modern telecommunications environment, components have to fulfill a number of tasks apart from the "classical" telephone services such as connection setup, voice transmission and billing: the communication network, for example, stores voice mails, forwards calls and informs users about missed calls. Furthermore, a number of new applications can be expected in the near future due to improved mobile phone features and the use of voice over IP. Examples are the integration of SMS, MMS, e-mail or Internet applications. Apart from this, the growing complexity of telecommunication applications increases application development costs relative to hardware development costs. Thus, hardware vendors are forced to provide a standardized API to application programmers. Well-known examples in computing history are the IBM 360 family, the Intel x86 microprocessors and the Windows operating system.

Therefore, requirements for systems being used in modern tele- and data communication environments can be summarized as follows:

Traditionally, fault tolerance mechanisms are integrated into the application so that it can tolerate all possible sources of errors in this layer. These mechanisms use special services which are provided by the underlying hardware and are adapted to the application. However, in modern telecommunication systems, this approach is no longer feasible for the following reasons: First, it would be too costly to provide each application with specific, completely new fault tolerance mechanisms.

Second, the fault tolerance mechanisms implemented in the underlying hardware cannot be adapted to arbitrary changes in the software configuration. Conversely, a hardware upgrade should not result in lower reliability at the application level, and the fault tolerance mechanisms available in the new hardware should be usable with minimal porting effort.

Therefore, the BALANCE project (Balanced High Availability in Layered and Distributed Computing Elements) aims to investigate possible solutions to the problems that arise with the traditional approach.

The basic idea of BALANCE is a middleware-oriented system view. The middleware, which is based on fault tolerant and highly available systems, will provide the applications with an abstract and platform-independent view of the hardware, without hiding the fault tolerance mechanisms provided by the hardware.
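
The middleware-oriented view described above can be illustrated with a small sketch. It is not taken from the BALANCE project: every name in it (`Platform`, `BasicBoard`, `RedundantBoard`, `Middleware`, the mechanism names "watchdog" and "failover") is a hypothetical assumption chosen for illustration. The sketch only shows the general idea that an application can be written once against an abstract interface while the fault tolerance mechanisms of the concrete hardware remain visible and usable.

```python
from abc import ABC, abstractmethod
from typing import FrozenSet, Sequence


class Platform(ABC):
    """Hypothetical hardware abstraction (an assumption, not a BALANCE API)."""

    @abstractmethod
    def capabilities(self) -> FrozenSet[str]:
        """Names of the fault tolerance mechanisms this hardware offers."""

    @abstractmethod
    def invoke(self, mechanism: str) -> str:
        """Trigger one hardware mechanism and report which one ran."""


class BasicBoard(Platform):
    """Illustrative hardware that offers only a watchdog."""

    def capabilities(self) -> FrozenSet[str]:
        return frozenset({"watchdog"})

    def invoke(self, mechanism: str) -> str:
        return f"basic:{mechanism}"


class RedundantBoard(Platform):
    """Illustrative upgraded hardware that additionally offers failover."""

    def capabilities(self) -> FrozenSet[str]:
        return frozenset({"watchdog", "failover"})

    def invoke(self, mechanism: str) -> str:
        return f"redundant:{mechanism}"


class Middleware:
    """Platform-independent view that does not hide hardware mechanisms."""

    def __init__(self, platform: Platform) -> None:
        self._platform = platform

    def capabilities(self) -> FrozenSet[str]:
        # Pass the hardware's mechanisms through instead of hiding them.
        return self._platform.capabilities()

    def recover(self, preferred: Sequence[str] = ("failover", "watchdog")) -> str:
        # Use the first preferred mechanism that the hardware supports.
        available = self._platform.capabilities()
        for mechanism in preferred:
            if mechanism in available:
                return self._platform.invoke(mechanism)
        # No hardware support: fall back to a software-level action.
        return "software-restart"


def application_recovery(middleware: Middleware) -> str:
    """Application code: identical for every platform."""
    return middleware.recover()
```

In this sketch, replacing `BasicBoard` with `RedundantBoard` requires no change to `application_recovery`, yet the recovery path starts using the failover mechanism of the new hardware. That mirrors the requirement that a hardware upgrade should make new mechanisms usable with minimal porting effort.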