Department of Informatics
Technische Universität München
Informatik X: Rechnertechnik und Rechnerorganisation / Parallelrechnerarchitektur
Prof. Dr. Arndt Bode , Prof. Dr. Hans Michael Gerndt

HotSwap - Enabling Redundancy at the Peripheral Bus Level 

Dr. W. Karl

Test & Maintenance
Student's Information
Operating Systems
Related Sites
System Modelling
Project Partners

The usual way to increase the availability of a system is redundancy at the system level: spare computers are added that can replace each other in case of a failure. This solution, however, suffers from two severe drawbacks:
  • Due to the coarse level of redundancy, many components that are not "live-critical" have to be duplicated, resulting in very high costs.
  • The interconnection network between the redundant nodes is relatively slow, leading to long fail-over times.
A solution to both problems is structural redundancy at the peripheral bus level. This requires that adapter cards can be removed from and inserted into the bus at runtime, a process widely known as hot-swapping.
Project Partners and Funding
The project is a cooperation between the LRR and FORCE Computers Inc., a leading designer and supplier of open, scalable system- and board-level computer platforms for the embedded market.
Both project partners are funded by the "Bayerische Forschungsstiftung".
Standardisation Efforts
We are a member of PICMG, the PCI Industrial Computer Manufacturers Group, a consortium of over 450 companies that collaboratively develop specifications adapting PCI technology for use in industrial and telecommunications computing applications. PICMG specifications include the CompactPCI standard, a HotSwap-capable variant of the PCI bus.

To deal with the problems that arise when more than one system host board is used on the same bus, a subcommittee is being founded to specify slot redundancy systems. We will participate in it.

The goal of the HotSwap project is to develop a technique for replacing generic hardware components of open industrial computers at runtime. Replacing a hardware component may be necessary for the following reasons:
  • Exchange of a failed component with an identical component.
  • Upgrade of a component, i.e. removing a component and replacing it with a different one.
  • Adding a new component.
With conventional industrial computers, a system shutdown is inevitable in these cases, which makes the system unavailable during the repair or upgrade. This downtime can be avoided by employing HotSwap technology. Furthermore, structural redundancy at the peripheral bus level becomes possible, e.g. by using two redundant Ethernet cards in the same system. In addition, multiple system boards could be used on the same bus, which would lead to an architecture similar to so-called server clusters. These aggregations of redundant off-the-shelf computers provide a high-availability solution at relatively low cost. The only difference between today's server clusters and a HotSwap-based redundant system would be the interconnection network: a LAN in the former case, a direct connection via the local bus in the latter. Using HotSwap technology would yield the following advantages over conventional server clusters:
  • Only "live-critical" components have to be redundant, which reduces production costs.
  • The fail-over time will be shorter, because no data has to be transferred over the relatively slow LAN. The fail-over time has a major impact on the system's mean downtime.
  • Exchanging a failed component will be very easy. This will lead to shorter repair times and lower maintenance costs.
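The influence of the fail-over time on availability can be illustrated with the standard steady-state relation availability = uptime / (uptime + downtime). The following is a minimal sketch only: the failure interval and both fail-over times are hypothetical figures chosen for illustration, not measurements from this project, and the function names are our own.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Steady-state availability: uptime / (uptime + downtime)."""
    return uptime_hours / (uptime_hours + downtime_hours)


def yearly_downtime_minutes(avail: float) -> float:
    """Expected downtime per year (365 days), in minutes."""
    return (1.0 - avail) * 365 * 24 * 60


# Hypothetical figures, assumed purely for illustration.
MEAN_UPTIME_HOURS = 1000.0         # assumed mean time between failures
LAN_FAILOVER_HOURS = 60.0 / 3600   # assumed 60 s fail-over across a LAN
BUS_FAILOVER_HOURS = 1.0 / 3600    # assumed 1 s fail-over across the local bus

a_lan = availability(MEAN_UPTIME_HOURS, LAN_FAILOVER_HOURS)
a_bus = availability(MEAN_UPTIME_HOURS, BUS_FAILOVER_HOURS)

print(f"LAN cluster:   {yearly_downtime_minutes(a_lan):.2f} min/year down")
print(f"Bus-level HA:  {yearly_downtime_minutes(a_bus):.2f} min/year down")
```

With the failure interval held fixed, the expected yearly downtime scales almost linearly with the fail-over time, which is why shortening it matters so much for the mean downtime.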
Introducing HotSwap capability affects the system at all levels of its architecture, ranging from disturbing electrical effects on the signals of the interconnecting bus during insertion and removal of boards to influences on the system's highest level, the application layer. The goals of our project are to evaluate these impacts, to propose generic solutions, and finally to implement a prototype of a complete HotSwap system based on the CompactPCI local bus.

At the moment, our efforts are concentrated on three main topics:


Max Walter, 1.6.99