Open Theses

The following list is not exhaustive. There is always student work available in our various research projects; feel free to send an email asking for currently available topics.

Abbreviations:

  • BA = Bachelorarbeit, Bachelor's Thesis
  • MA = Masterarbeit, Master's Thesis
  • IDP = Interdisciplinary Project
  • GR = Guided Research

BA: Performance Evaluation of a Raspberry Pi Cluster

Cluster of 40 Raspberry Pi 3

What is more energy efficient: a single Skylake processor or a cluster of 40 Raspberry Pi 3?

Our chair operates a cluster of 40 Raspberry Pi 3 (the HimMUC cluster), each with 4 ARM Cortex-A53 cores. The goal of this thesis is (1) to tune the LINPACK benchmark (HPL) to maximize the reported performance and energy efficiency on the cluster and (2) to compare the results with the performance and energy consumption of servers with x86 processors, e.g. a server powered by two recent Intel Skylake processors.
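
A hedged sketch of the arithmetic behind the comparison (the problem size, run time, and power values are made-up placeholders, not measurements): HPL reports its performance from a fixed operation count, and the energy-efficiency figure is simply performance per watt, as used by the Green500 list.

    /* Figures of merit for the HPL comparison; all inputs are assumptions. */
    #include <stdio.h>

    int main(void) {
        double N = 28000.0;          /* assumed HPL problem size */
        double t = 3500.0;           /* assumed wall-clock time of the run, seconds */
        double avg_power_w = 150.0;  /* assumed average power of the whole cluster, watts */

        /* HPL's fixed operation count for LU factorization plus solve. */
        double flops = (2.0 / 3.0) * N * N * N + 2.0 * N * N;

        double gflops = flops / t / 1e9;
        double gflops_per_watt = gflops / avg_power_w;

        printf("Performance: %.2f GFLOP/s\n", gflops);
        printf("Efficiency:  %.3f GFLOP/s per watt\n", gflops_per_watt);
        return 0;
    }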

Prerequisites: Knowledge of computer architecture, linear algebra, and C; familiarity with assembly, the LINPACK benchmark, and parallel programming is beneficial.

Contact: If you are interested, contact Alexis Engelke with a grade report of relevant courses (e.g. ERA, ASP, ACA, Parallel Programming) and any other experience you have in this area.

BA/MA/GR: Modeling and Characterizing HPC Cluster Availability

In a current project at our chair, we are analyzing modern High Performance Computing (HPC) systems with heterogeneous architectures on the path towards exascale computing. Major challenges in exascale computing include an increasing number of nodes, dynamic resource allocation and organization, and fault resilience. In this thesis/project, a failure model for HPC systems is to be developed and the availability of HPC systems is to be quantified.

Work Packages

• Literature Research on Availability Modelling in Cluster/Grid/Cloud/HPC Systems
• Modelling of Availability for large-scale LRZ systems, such as Linux Cluster and SuperMUC
• Model verification 
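
As a hedged illustration of what quantifying availability can mean in the simplest case (the MTBF/MTTR values below are made-up placeholders, not LRZ data), the steady-state model treats each node as a repairable component and a job as requiring all of its nodes:

    /* Steady-state availability A = MTBF / (MTBF + MTTR); inputs are assumptions. */
    #include <stdio.h>

    int main(void) {
        double mtbf_h = 720.0;  /* assumed mean time between failures per node, hours */
        double mttr_h = 4.0;    /* assumed mean time to repair, hours */

        double a_node = mtbf_h / (mtbf_h + mttr_h);

        /* A job that needs all of n nodes is only available if every node is. */
        int n = 1000;
        double a_system = 1.0;
        for (int i = 0; i < n; i++)
            a_system *= a_node;

        printf("A(single node)            = %.6f\n", a_node);
        printf("A(%d nodes, all required) = %.6f\n", n, a_system);
        return 0;
    }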

Full Description

Contact: Dai Yang

BA/MA: Design and Implementation of a Benchmark for Predicting System Health in High Performance Computing

In a current project at our chair, we are analyzing modern High Performance Computing (HPC) systems with heterogeneous architectures on the path towards exascale computing. A central element of this project is finding a reliable prediction method that can determine the current health state of a given HPC system. Our research group is currently evaluating different methods; one of them is to run an efficient, fast benchmark to detect abnormalities in system performance. Using such a benchmark together with the corresponding historical data, upcoming system faults that may lead to a failure can be predicted.
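
A hedged sketch of the idea (an illustration only, not the benchmark to be designed in this thesis): run a small, fixed, memory-bound kernel, time it, and flag the node when the measurement deviates too far from a historical baseline.

    /* Toy health probe: time a STREAM-triad-like loop and compare it against an
     * assumed baseline; the baseline and threshold values are placeholders. */
    #define _POSIX_C_SOURCE 199309L
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)

    static double now_s(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec * 1e-9;
    }

    int main(void) {
        double *a = malloc(N * sizeof *a), *b = malloc(N * sizeof *b), *c = malloc(N * sizeof *c);
        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = now_s();
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];          /* memory-bound triad */
        double elapsed = now_s() - t0;

        double baseline_s = 0.05;   /* assumed value learned from historical runs */
        double threshold  = 1.25;   /* assumed tolerance before flagging an anomaly */

        printf("triad: %.4f s (baseline %.4f s, checksum %.1f)\n",
               elapsed, baseline_s, a[0] + a[N - 1]);
        if (elapsed > threshold * baseline_s)
            printf("WARNING: node slower than expected -- possible degradation\n");

        free(a); free(b); free(c);
        return 0;
    }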

Contact: Dai Yang

Description: here

MA: Runtime Prediction for OpenCL Kernels in Heterogeneous Systems Using Machine Learning

Motivation: Modern computer systems consist of a large number of heterogeneous processing units (PUs). To use such systems efficiently, the programmer must know different programming models for different hardware architectures. To relieve users from this complexity, many research institutions are developing library-based runtime systems. Compute kernels, e.g. from BLAS, cuBLAS, or Intel MKL, are developed by architecture experts to achieve maximum performance. In addition, a runtime scheduling system enables dynamic kernel selection. To achieve the best result, information about each implementation variant, such as execution time and the cost of data transfers, must be collected and known beforehand. Typically, this data is collected by a library, which produces extra overhead at runtime. In this thesis, we target this overhead and try to predict runtimes using machine learning techniques. This project is a cooperation between TUM and KIT in Karlsruhe, Germany.

For the runtime system HALadapt, a machine-learning-based runtime prediction was already developed in earlier work. First, static code analysis is performed and metrics such as the number of operations, the number of memory accesses, etc. are collected. This information, combined with measured application runtimes, is used as training data for the machine learning model. In the previous work we found that the variance of the predictions is relatively high, making them less useful for standard CPU applications. In this thesis, similar methods and measurements are to be collected and analysed for OpenCL-based applications.
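
A hedged sketch of the general approach (not HALadapt's actual model): the static metrics form a feature vector, and a model - here a trivial linear one with placeholder weights - maps it to a predicted runtime.

    /* Toy runtime predictor from static code metrics; weights are placeholders
     * that would be fitted on measured OpenCL kernel runs. */
    #include <stdio.h>

    struct kernel_features {
        double flop_count;        /* number of arithmetic operations */
        double mem_accesses;      /* number of loads and stores      */
        double bytes_transferred; /* host <-> device transfer volume */
    };

    static double predict_runtime_s(const struct kernel_features *f) {
        const double w_flop = 1.0e-10, w_mem = 8.0e-10, w_xfer = 1.0e-10, bias = 5.0e-4;
        return bias + w_flop * f->flop_count
                    + w_mem  * f->mem_accesses
                    + w_xfer * f->bytes_transferred;
    }

    int main(void) {
        struct kernel_features example = { 2.0e9, 3.0e8, 2.4e7 };  /* made-up kernel */
        printf("predicted runtime: %.4f s\n", predict_runtime_s(&example));
        return 0;
    }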

Full Description

Contact: Dai Yang

MA/GR: Porting HPCG Benchmark for Application Integrated Load Balancing and Fault Tolerance

In a current project at our chair, we are analyzing modern High Performance Computing (HPC) systems with heterogeneous architectures on the path towards exascale computing. Major challenges in exascale computing include an increasing number of nodes, dynamic resource allocation and organization, and fault resilience. So far, we have developed an extensible yet lightweight library (LAIK) to dynamically manage the application workload for better load balancing and proactive fault tolerance. This way, an upcoming failure can be avoided by proactively migrating application data to another physical location. Furthermore, our library can trigger a global rebalancing to restore application load balance.

To assess and improve the performance of our library, runtime results from suitable high performance benchmarks are required. One of the most common benchmarks is HPCG (High Performance Conjugate Gradient). It is intended to model the data access patterns of real-world applications such as sparse matrix calculations and is written in C/C++. In this master's thesis, a selected subset of HPCG is to be ported to our LAIK library, and performance tests and analyses are to be conducted on the ported benchmark.

More Information

Contact: Dai Yang

MA: Scalable Clustering of Large-Scale Sensor Data

Description:

Data mining is an important tool for extracting useful information from the huge data sets produced by complex systems. Industrial systems in particular are equipped with many sensors that record the conditions and states of the machines as time series. This data can be used to protect the assets from failure and also helps in finding better operating points. In this project we will look into the Matrix Profile (MP) and the algorithms to compute it, which allegedly "has the potential to revolutionize time series data mining", and implement these algorithms for an HPC cluster to study time series data from gas turbines.
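
For orientation, a brute-force sketch of what the Matrix Profile is (the thesis would implement the published scalable algorithms such as STAMP/STOMP in parallel, not this quadratic version): for every subsequence of length m, it stores the z-normalized distance to that subsequence's nearest neighbour, so large values point to anomalies and small values to recurring motifs.

    /* Naive Matrix Profile for illustration only; compile with -lm. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* z-normalized Euclidean distance between two subsequences of length m */
    static double znorm_dist(const double *a, const double *b, int m) {
        double ma = 0, mb = 0, sa = 0, sb = 0;
        for (int i = 0; i < m; i++) { ma += a[i]; mb += b[i]; }
        ma /= m; mb /= m;
        for (int i = 0; i < m; i++) { sa += (a[i] - ma) * (a[i] - ma); sb += (b[i] - mb) * (b[i] - mb); }
        sa = sqrt(sa / m); sb = sqrt(sb / m);
        double d = 0;
        for (int i = 0; i < m; i++) {
            double x = (a[i] - ma) / sa - (b[i] - mb) / sb;
            d += x * x;
        }
        return sqrt(d);
    }

    /* For each window of length m, distance to its nearest non-trivial match. */
    static void matrix_profile(const double *t, int n, int m, double *profile) {
        int num = n - m + 1;
        for (int i = 0; i < num; i++) {
            double best = INFINITY;
            for (int j = 0; j < num; j++) {
                if (abs(i - j) < m / 2) continue;   /* exclusion zone: skip trivial matches */
                double d = znorm_dist(t + i, t + j, m);
                if (d < best) best = d;
            }
            profile[i] = best;   /* large value => subsequence i has no close match */
        }
    }

    int main(void) {
        int n = 512, m = 32;
        double *t = malloc(n * sizeof *t), *mp = malloc((n - m + 1) * sizeof *mp);
        for (int i = 0; i < n; i++) t[i] = sin(0.1 * i);  /* synthetic series */
        t[300] += 3.0;                                    /* injected anomaly */
        matrix_profile(t, n, m, mp);
        for (int i = 0; i < n - m + 1; i += 32) printf("mp[%3d] = %.3f\n", i, mp[i]);
        free(t); free(mp);
        return 0;
    }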

Recommended knowledge:

  1. Experience in parallel programming and High Performance Computing, e.g., MPI
  2. Familiarity with the Hadoop ecosystem and Apache Spark
  3. Knowledge in Machine Learning, data mining and time series analysis

Work packages:

  1.  Study Matrix Profile and published algorithms to compute it
  2. Implementation of the algorithms and deployment on SuperMUC
  3. Analysis of gas turbine sensor data using the developed tool
  4. Performance analysis and optimization

What you will gain:

  1. Experience in working with large-scale systems at one of the world's top supercomputing centers, the LRZ, e.g. on SuperMUC
  2. Experience with one of the most common large-scale data analytics scenarios
  3. Collaboration with our industry partner IfTA GmbH, which works at the frontier of analysis and monitoring technology for industrial systems
  4. Insights into a real-world problem of supporting the energy grid, in collaboration with a gas turbine plant operator

Various MPI-Related Topics

Please Note:

MPI is a high-performance programming model and communication library designed for HPC applications. It is designed and standardised by the members of the MPI Forum, which include various research, academic, and industrial institutions. The current chair of the MPI Forum is Prof. Dr. Martin Schulz.

The following topics are all available as Master's Thesis or Guided Research. They will be advised and supervised by Prof. Dr. Martin Schulz himself, with the help of researchers from the chair. If you are very familiar with MPI and parallel programming, please don't hesitate to drop a mail to either Dai Yang or Prof. Dr. Martin Schulz.

These topics are mostly related to current research and active discussions in the MPI Forum, which are subject to standardisation in the coming years. Your contributions to these topics may make you a contributor to the MPI standard, and your implementation may become part of the Open MPI code base.

Many of these topics require collaboration with other MPI research bodies, such as Lawrence Livermore National Laboratory and the Innovative Computing Laboratory. Some of these topics may require you to attend MPI Forum meetings, which take place in the late afternoon (to accommodate participants worldwide). Generally, these advanced topics may require more effort to understand and may be more time-consuming - but they are also more prestigious.

MA/GR: Porting LAIK to Elastic MPI & ULFM

LAIK is a new programming abstraction developed at LRR-TUM

  • Decouples data decomposition and computation, while hiding communication
  • Applications work on index spaces
  • The mapping of index spaces to nodes can be adapted at runtime (see the sketch after this list)
  • Goal: dynamic process management and fault tolerance
  • Current status: works on standard MPI, but no dynamic support
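
The sketch below is not the LAIK API; it only illustrates, in plain C, the idea behind the index-space abstraction: the application names an index range, and the mapping of indices to processes can be recomputed whenever the set of usable processes changes (e.g. after a shrink), with LAIK handling the resulting data migration.

    /* Plain-C illustration of an adaptive block partition of an index space. */
    #include <stdio.h>

    /* Range of [0, n) owned by worker `rank` out of `nworkers`. */
    static void my_range(long n, int rank, int nworkers, long *begin, long *end) {
        long chunk = n / nworkers, rest = n % nworkers;
        *begin = rank * chunk + (rank < rest ? rank : rest);
        *end   = *begin + chunk + (rank < rest ? 1 : 0);
    }

    int main(void) {
        long n = 1000, b, e;

        my_range(n, 3, 8, &b, &e);   /* initial mapping: 8 workers */
        printf("8 workers: rank 3 owns [%ld, %ld)\n", b, e);

        /* After a node is lost, the same index space is re-partitioned over the
         * remaining 7 workers; data in the changed ranges must be migrated. */
        my_range(n, 3, 7, &b, &e);
        printf("7 workers: rank 3 owns [%ld, %ld)\n", b, e);
        return 0;
    }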

Task 1: Port LAIK to Elastic MPI

  • New model developed locally that allows process additions and removal
  • Should be very straightforward

Task 2: Port LAIK to ULFM

  • Proposed MPI FT Standard for “shrinking” recovery, prototype available
  • Requires refactoring of code and evaluation of ULFM

Task 3: Compare performance with direct implementations of same models on MLEM

  • Medical image reconstruction code
  • Requires porting MLEM to both Elastic MPI and ULFM

Task 4: Comprehensive Evaluation

MA/GR: Lazy Non-Collective Shrinking in ULFM

ULFM (User-Level Failure Mitigation) is the current proposal for MPI fault tolerance

  • Failures make communicators unusable
  • Once detected, communicators can be “shrunk” (illustrated in the sketch below)
  • Detection is active and synchronous by capturing error codes
  • Shrinking is collective, typically after a global agreement
  • Problem: can lead to deadlocks
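
A minimal sketch of this collective recovery pattern, assuming the MPIX_ prototype interface shipped with the ULFM branch of Open MPI; the lazy, non-collective variant investigated in this topic would avoid the global MPIX_Comm_shrink step.

    /* Error-driven, collective shrink-based recovery (illustration only). */
    #include <mpi.h>
    #include <mpi-ext.h>   /* MPIX_ ULFM extensions in the Open MPI prototype */

    static void do_step(MPI_Comm comm) { (void)comm; /* application work on comm */ }

    int main(int argc, char **argv) {
        MPI_Comm world;
        MPI_Init(&argc, &argv);
        MPI_Comm_dup(MPI_COMM_WORLD, &world);
        /* Report failures as error codes instead of aborting the job. */
        MPI_Comm_set_errhandler(world, MPI_ERRORS_RETURN);

        for (int step = 0; step < 100; step++) {
            do_step(world);
            int rc = MPI_Barrier(world);       /* stands in for any MPI call */
            if (rc != MPI_SUCCESS) {
                int eclass;
                MPI_Error_class(rc, &eclass);
                if (eclass == MPIX_ERR_PROC_FAILED || eclass == MPIX_ERR_REVOKED) {
                    MPIX_Comm_revoke(world);           /* interrupt pending operations */
                    MPI_Comm shrunk;
                    MPIX_Comm_shrink(world, &shrunk);  /* collective, after global agreement */
                    MPI_Comm_free(&world);
                    world = shrunk;
                    MPI_Comm_set_errhandler(world, MPI_ERRORS_RETURN);
                    /* Redistribute data / rebalance on the smaller communicator here. */
                }
            }
        }
        MPI_Comm_free(&world);
        MPI_Finalize();
        return 0;
    }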

Alternative idea

  • Make shrinking lazy and, with that, non-collective
  • New, smaller communicators are created on the fly

Tasks:

  • Formalize non-collective shrinking idea
  • Propose API modifications to ULFM
  • Implement prototype in Open MPI
  • Evaluate performance
  • Create proposal that can be discussed in the MPI forum

MA/GR: A New FT Model with “Hole-Y” Shrinking

ULFM works on the classic MPI assumptions

  • Complete communicator must be working
  • No holes in the rank space are allowed
  • Collectives always work on all processes

Alternative: break these assumptions

  • A failure creates communicator with a hole
  • Point to point operations work as usual
  • Collectives work (after acknowledgement) on reduced process set

Tasks:

  • Formalize “hole-y” shrinking
  • Propose new API
  • Implement prototype in Open MPI
  • Evaluate performance
  • Create proposal that can be discussed in the MPI Forum

MA/GR: Prototype for MPI_T_Events

With MPI 3.0, MPI added a second tools interface: MPI_T

  • Access to internal variables 
  • Query, read, write
  • Performance and configuration information
  • Missing: event information using callbacks
  • New proposal in the MPI Forum (driven by RWTH Aachen)
  • Add event support to MPI_T
  • Proposal is rather complete
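
As a starting point, a small sketch of the query side of the existing MPI_T interface (the proposed events interface would add callback registration on top of this); the program merely lists the control variables an implementation exposes.

    /* List the MPI_T control variables exposed by the MPI implementation. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int provided, num_cvar;

        MPI_Init(&argc, &argv);
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);

        MPI_T_cvar_get_num(&num_cvar);
        printf("implementation exposes %d control variables\n", num_cvar);

        for (int i = 0; i < num_cvar; i++) {
            char name[256], desc[1024];
            int name_len = sizeof name, desc_len = sizeof desc;
            int verbosity, bind, scope;
            MPI_Datatype datatype;
            MPI_T_enum enumtype;

            MPI_T_cvar_get_info(i, name, &name_len, &verbosity, &datatype,
                                &enumtype, desc, &desc_len, &bind, &scope);
            printf("cvar %3d: %s\n", i, name);
        }

        MPI_T_finalize();
        MPI_Finalize();
        return 0;
    }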

Tasks:

  • Implement prototype in either Open MPI or MVAPICH
  • Identify a series of events that are of interest
  • Message queuing, memory allocation, transient faults, …
  • Implement events for these through MPI_T
  • Develop tool using MPI_T to write events into a common trace format
  • Performance evaluation

Possible collaboration with RWTH Aachen

 

MA/GR: Prototype Local MPI Sessions

New concept discussed in the MPI forum: MPI Sessions

  • Avoid global initialization if not necessary
  • Enable runtime system to manage smaller groups of processes
  • Provide groups for containment and resource isolation

Currently two modes of thinking

  • The main proposal builds on local operations and only at the end switches to global
  • Alternative: treat sessions as global objects
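
A minimal sketch of the local-first flow, assuming the function names currently used in the Sessions proposal (these may still change in the Forum): there is no global MPI_Init, and communicators are derived from named process sets.

    /* Session-based program start (names follow the Sessions proposal). */
    #include <mpi.h>
    #include <stdio.h>

    int main(void) {
        MPI_Session session;
        MPI_Group group;
        MPI_Comm comm;
        int rank, size;

        /* Initialization is local to this session; no global MPI_Init. */
        MPI_Session_init(MPI_INFO_NULL, MPI_ERRORS_RETURN, &session);

        /* Build a communicator from a named process set instead of MPI_COMM_WORLD. */
        MPI_Group_from_session_pset(session, "mpi://WORLD", &group);
        MPI_Comm_create_from_group(group, "edu.tum.caps.sessions-demo",
                                   MPI_INFO_NULL, MPI_ERRORS_RETURN, &comm);

        MPI_Comm_rank(comm, &rank);
        MPI_Comm_size(comm, &size);
        printf("rank %d of %d (session-based)\n", rank, size);

        MPI_Comm_free(&comm);
        MPI_Group_free(&group);
        MPI_Session_finalize(&session);
        return 0;
    }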

Tasks:

  • Formalize MPI Sessions using local operations
  • Complete API proposal
  • Implement prototype in Open MPI or MVAPICH
  • Evaluate performance
  • Create proposal that can be discussed in the MPI Forum

Possible collaboration with EPCC (Edinburgh)

MA/GR: Prototype Global MPI Sessions

New concept discussed in the MPI forum: MPI Sessions

  • Avoid global initialization if not necessary
  • Enable runtime system to manage smaller groups of processes
  • Provide groups for containment and resource isolation

Currently two modes of thinking

  • The main proposal builds on local operations and only at the end switches to global
  • Alternative: treat sessions as global objects

Tasks:

  • Formalize MPI Sessions using global operations
  • Complete API proposal
  • Implement prototype in Open MPI or MPICH
  • Evaluate performance
  • Create proposal that can be discussed in the MPI Forum

Bonus:

  • Work with “local sessions” topic on a clean comparison

MA/GR: Evaluation of PMIx on MPICH and SLURM

PMIx is a proposed resource-management layer for runtimes (for exascale)

  • Enables the MPI runtime to communicate with resource managers
  • Came out of previous PMI efforts as well as the Open MPI community
  • Under active development / prototype available in Open MPI
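
A hedged sketch of the PMIx client side (following the public PMIx examples; exact keys and signatures depend on the PMIx version), showing how a runtime asks the resource manager for job-level information:

    /* Query the resource manager for the job size via PMIx (illustration). */
    #include <pmix.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        pmix_proc_t myproc, wildcard;
        pmix_value_t *val;

        if (PMIX_SUCCESS != PMIx_Init(&myproc, NULL, 0))
            return 1;

        /* Address the whole namespace (job) rather than a single rank. */
        PMIX_PROC_CONSTRUCT(&wildcard);
        strncpy(wildcard.nspace, myproc.nspace, PMIX_MAX_NSLEN);
        wildcard.rank = PMIX_RANK_WILDCARD;

        if (PMIX_SUCCESS == PMIx_Get(&wildcard, PMIX_JOB_SIZE, NULL, 0, &val)) {
            printf("rank %u of %u processes\n", myproc.rank, val->data.uint32);
            PMIX_VALUE_RELEASE(val);
        }

        /* Synchronize all processes of the job through the resource manager. */
        PMIx_Fence(NULL, 0, NULL, 0);

        PMIx_Finalize(NULL, 0);
        return 0;
    }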

Tasks: 

  • Implement PMIx on top of MPICH or MVAPICH
  • Integrate PMIx into SLURM
  • Evaluate implementation and compare to Open MPI implementation
  • Assess and possibly extend interfaces for tools
  • Query process sets

MA/GR: Active Messaging for Charm++ or Legion

MPI was originally intended as runtime support, not as an end-user API

  • Several other programming models use it that way
  • However, often not first choice due to performance reasons
  • Especially task/actor based models require more asynchrony

Question: can more asynchronous models be added to MPI?

  • Example: active messages

Tasks:

  • Understand communication modes in an asynchronous model
  • Charm++: actor-based (UIUC)
  • Legion: task-based (Stanford, LANL)
  • Propose extensions to MPI that capture this model better
  • Implement prototype in Open MPI or MVAPICH
  • Evaluation and Documentation

Possible collaboration with LLNL and/or BSC

MA/GR: Crazy Idea: Multi-MPI Support

MPI can and should be used for more than Compute

  • Could be runtime system for any communication
  • Example: traffic to visualization / desktops

Problem:

  • Different network requirements and layers
  • May require different MPI implementations
  • Common protocol is unlikely to be accepted

Idea: can we use a bridge node with two MPIs linked to it

  • User should see only two communicators, but same API

Tasks:

  • Implement this concept coupling two MPIs
  • Open MPI on compute cluster and TCP MPICH to desktop
  • Demonstrate using on-line visualization streaming to front-end
  • Document and provide evaluation
  • Warning: likely requires good understanding of linkers and loaders