Introduction to Power-Aware HPC
Lecture in the winter-term 2020/21
Prof. Dr. D. Kranzlmüller,
Dr. Hayk Shoukourian
This course will be held in English!
Welcome to the webpage of the course Introduction to Power-Aware HPC (winter-term 2020/21) at LMU Munich. Here you will find details on the lecture and the accompanying practical project.
- Registration opens on 3 September 2020 via UNI2WORK and closes on 11.10.2020 at 18:00.
- Lectures take place on Wednesdays from 10:00 to 12:00 in room 220 (Amalienstr. 73A). The first lecture will take place on 04.11.2020 at 10:00.
Contents of the lecture
Some current High Performance Computing (HPC) systems already consume more than 15 MW of power - enough to supply a small city.
Energy consumption is becoming a dominant factor in the Total Cost of Ownership (TCO) of many HPC systems, making high-performance design and energy-efficient design synonymous in many ways.
Beyond high electricity bills, power consumption of this magnitude is a limiting factor in building and operating Exascale systems, i.e. the next generation of HPC systems capable of performing 10^18 floating-point operations per second. Power draw at this scale could push a data center's power delivery and cooling infrastructure beyond its safety limits, and the resulting carbon footprint threatens environmental sustainability.
It is therefore important to act proactively in improving the energy and power efficiency of HPC data centers.
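A quick back-of-the-envelope sketch helps put these numbers in perspective. It converts a constant 15 MW draw into annual energy and cost, and computes the energy efficiency an Exascale machine (10^18 FLOP/s) would need to stay within that same 15 MW. It assumes a constant draw over a full non-leap year (8,760 hours) and uses a purely hypothetical electricity price of 0.20 per kWh; real prices and load profiles vary.

```python
HOURS_PER_YEAR = 8760  # non-leap year, constant draw assumed


def annual_energy_mwh(power_mw: float) -> float:
    """Energy drawn in one year by a system running at constant power (MW -> MWh)."""
    return power_mw * HOURS_PER_YEAR


def annual_cost(power_mw: float, price_per_kwh: float) -> float:
    """Electricity cost of one year of constant draw; price is in currency units per kWh."""
    return annual_energy_mwh(power_mw) * 1000 * price_per_kwh  # MWh -> kWh


def required_gflops_per_watt(target_flops: float, power_budget_w: float) -> float:
    """Energy efficiency (GFLOPS/W) needed to reach target_flops within the power budget."""
    return target_flops / power_budget_w / 1e9


if __name__ == "__main__":
    power_mw = 15.0  # figure quoted in the text
    price = 0.20  # HYPOTHETICAL price per kWh, for illustration only
    print(f"Annual energy: {annual_energy_mwh(power_mw):,.0f} MWh")
    print(f"Annual cost at {price}/kWh: {annual_cost(power_mw, price):,.0f}")
    print(f"Exascale within {power_mw} MW needs "
          f"{required_gflops_per_watt(1e18, power_mw * 1e6):.1f} GFLOPS/W")
```

Running the script reports 131,400 MWh per year and a required efficiency of about 66.7 GFLOPS/W; the cost line scales linearly with whatever price is assumed.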
This course explores the energy consumption issues of modern HPC data centers, discusses their impact on the design of new computing systems, and presents strategies for reducing overall power consumption.
The lecture will cover the main concepts of energy consumption paradigms that should remain valid despite the continuous technological changes in the area.
Upon completing this course, participants should have acquired knowledge of:
- the importance of power/energy-efficiency for modern data centers
- the theory of power dissipation in CMOS chips and its various impacts on HPC data centers
- contemporary tools for monitoring different power consumption related metrics
- various techniques for energy-efficiency tuning
- power-related challenges for next generation HPC systems
- contemporary resource management and scheduling techniques that are tuned for energy-efficiency
- power variation in homogeneous HPC systems and the resulting potential for cost savings
- Intel's Model Specific Registers (MSRs) used for power management support
- principles of various machine learning techniques and their applications for intelligent power management
- high-frequency data collection techniques
- data center basics (the building blocks of modern data centers and possible architectures)
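Several of the topics above (power-monitoring tools, Intel's MSRs) revolve around hardware energy counters. Intel's RAPL interface reports cumulative energy per domain; on Linux the powercap driver exposes it in microjoules, e.g. in /sys/class/powercap/intel-rapl:0/energy_uj, with the wrap-around limit in max_energy_range_uj. The sketch below is a minimal illustration, not a production tool: it turns two counter readings into average power. The helper names are hypothetical, the domain path differs between machines, and reading the counters may require root privileges on recent kernels.

```python
import time
from pathlib import Path

# Package-0 RAPL domain as exposed by the Linux powercap driver; may differ per system.
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_counter_uj(name: str, base: Path = RAPL_DIR) -> int:
    """Read an integer counter (in microjoules) from a powercap sysfs file."""
    return int((base / name).read_text().strip())


def average_power_w(e_start_uj: int, e_end_uj: int, elapsed_s: float, max_range_uj: int) -> float:
    """Average power in watts from two cumulative energy readings.

    The counter wraps around after max_range_uj, so an end value smaller than the
    start value means it overflowed; this assumes at most one wrap per interval.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    delta_uj = e_end_uj - e_start_uj
    if delta_uj < 0:
        delta_uj += max_range_uj  # undo a single wrap-around
    return delta_uj / 1e6 / elapsed_s  # microjoules -> joules, then J/s = W


def sample_power_w(interval_s: float = 1.0, base: Path = RAPL_DIR) -> float:
    """Measure the average power of one RAPL domain over interval_s seconds (Linux only)."""
    max_range_uj = read_counter_uj("max_energy_range_uj", base)
    start_uj = read_counter_uj("energy_uj", base)
    t0 = time.monotonic()
    time.sleep(interval_s)
    end_uj = read_counter_uj("energy_uj", base)
    elapsed_s = time.monotonic() - t0
    return average_power_w(start_uj, end_uj, elapsed_s, max_range_uj)
```

On a machine with RAPL support, sample_power_w(1.0) returns the average power of package 0 over one second. Assuming at most one wrap per interval is reasonable for short sampling intervals, since the counter usually takes far longer than that to wrap.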
The course is intended for Master's students of computer science and related fields. The lecture and the project work together carry 6 ECTS credits.
Formally (official wording, translated from German): the lecture is intended for Master's students of computer science, and 6 ECTS credits are awarded for the lecture and the project work.
The number of participants is limited to 20. Registration opens on 03.09.2020 at 08:00 via UNI2WORK and closes on 11.10.2020 at 18:00.
Prerequisites
- Python knowledge
- Interest in energy-efficient supercomputing
- Interest in developing machine-learning frameworks
- Lecture: Wednesdays, 10:15 to 11:45 in room 220 in Amalienstr. 73A. The first lecture will be held on 4th of November 2020.
- Guided Tour at Leibniz Supercomputing Centre (LRZ): TBA. Meeting point: LRZ, 85748 Garching bei Muenchen.
- Exam: TBA (see the exam section for more details)
- Repeat Exam: TBA (see the exam section for more details)
Project: "Increasing Cooling Efficiency of a Data Center"
This project aims to build Machine-Learning (ML) based models that predict the power consumption of an HPC data center's cooling loop.
Participants will form groups, and each group will be assigned one year of operational data from the Leibniz Supercomputing Centre (LRZ).
The provided data will contain various sensor measurements from LRZ's building infrastructure.
Each group will analyze the data and design and develop an ML-based model capable of predicting the power consumption of LRZ's warm-water cooling loop.
During this project, students will gain experience that applies not only to HPC data centers but also to other domains involving ML-based modeling.
The detailed description of the project assignment will follow during the lecture.
The training data can be found in the Project Section.
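As a possible first step in the project, a group might fit a simple linear baseline before moving on to more sophisticated models. The sketch below does this with NumPy's least-squares solver. Everything in it is an assumption for illustration: the two features (outdoor temperature and IT load) and the synthetic data are invented, not taken from LRZ's data set, and a linear model is only a baseline, not the method prescribed by the assignment.

```python
import numpy as np


def fit_linear_baseline(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least squares with an intercept; returns [intercept, w_1, ..., w_k]."""
    A = np.column_stack([np.ones(len(X)), X])  # prepend a column of ones for the intercept
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict(coef: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply the fitted linear model to new feature rows."""
    return coef[0] + X @ coef[1:]


def mean_absolute_error(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))


if __name__ == "__main__":
    rng = np.random.default_rng(seed=0)
    n = 1000
    # SYNTHETIC, HYPOTHETICAL features standing in for real sensor readings.
    outdoor_temp_c = rng.uniform(-5.0, 30.0, n)
    it_load_kw = rng.uniform(500.0, 2000.0, n)
    X = np.column_stack([outdoor_temp_c, it_load_kw])
    # Invented relationship: cooling power rises with temperature and IT load, plus noise.
    cooling_kw = 20.0 + 1.5 * outdoor_temp_c + 0.05 * it_load_kw + rng.normal(0.0, 2.0, n)

    # Hold out the last 20% of samples; with real time-series data, splitting by
    # time like this avoids training on measurements from the "future".
    split = int(0.8 * n)
    coef = fit_linear_baseline(X[:split], cooling_kw[:split])
    mae = mean_absolute_error(cooling_kw[split:], predict(coef, X[split:]))
    print("intercept and weights:", np.round(coef, 3))
    print(f"test MAE: {mae:.2f} kW")
```

Replacing the synthetic arrays with columns from the provided sensor data, and comparing this baseline's test error with that of more complex models, gives a simple reference point.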
Exam
There will be a closed-book written examination in February 2021. The exact time and room will be published as soon as possible.
The retake of the exam is scheduled for:
The lecture notes are available in the Download Section.
Literature
CMOS VLSI Design: A Circuits and Systems Perspective (4th Edition) by Neil Weste, David Harris
Computer Organization and Design RISC-V Edition: The Hardware Software Interface by David A. Patterson, John L. Hennessy
Energy-Efficient Distributed Computing Systems by Albert Y. Zomaya, Young Choon Lee
Machine Learning: A Probabilistic Perspective by Kevin P. Murphy
Machine Learning: An Algorithmic Perspective (2nd Edition) by Stephen Marsland
Introduction to Apache Flink: Stream Processing for Real Time and Beyond by Ellen Friedman, Kostas Tzoumas
The Data Center as a Computer by Luiz André Barroso, Jimmy Clidaras, Urs Hölzle
Additional scholarly articles: sources will be indicated in the course slides
, or per appointment, or after lectures.