Tutorials

The following tutorials have been accepted and will be held during the conference:


Hands-On Tutorial for Mastering Linux Performance Analysis Tools

Time

Mon 4 May, 09:00-12:30

Authors

Adel Belkhiri, Heng Li - Polytechnique Montreal

Abstract

Modern software systems operate within increasingly complex execution environments, making performance diagnosis a critical and challenging task for developers, engineers, and researchers. Effectively identifying performance bottlenecks requires powerful tools and a clear methodology for selecting, configuring, and combining complementary analysis techniques. This tutorial presents a comprehensive, hands-on introduction to three widely used and production-ready Linux performance analysis tools: perf, LTTng, and Trace Compass. These tools complement one another; used together, they provide a full view of the target system's performance.

The tutorial guides participants through a complete performance analysis workflow, combining statistical profiling, low-overhead tracing, and advanced trace visualization to diagnose performance issues in Linux systems. Attendees will learn how to install and configure each tool, understand their respective strengths and trade-offs, and apply them effectively to analyze application- and system-level behavior. Through guided hands-on exercises, participants will gain practical experience in profiling execution hotspots, tracing system and application events, correlating performance data across layers, and identifying the root causes of performance bottlenecks.

By the end of the session, participants will be equipped with the practical skills and methodological knowledge needed to perform reproducible and efficient performance analyses using state-of-the-art Linux tools, enabling them to diagnose and resolve performance issues in both research and production environments.

Preparation and Teaching Method

First, the presenters will give a theoretical overview of the performance diagnostics tools using slides and a projector. Then, they will illustrate the use of these tools in diagnosing and solving performance problems through live demonstrations. Finally, they will ask attendees to perform guided tasks on their laptops and to use the tools in real-world scenarios.

Therefore, attendees are expected to bring a Linux-capable laptop for software installation and exercises. To ensure that participants can fully benefit from the hands-on components of this tutorial, a minimal level of preparation is required.

The recommended setup is as follows:

  • Operating System: A recent Linux distribution, preferably Ubuntu 20.04 LTS or newer. Other modern Linux distributions may be used, provided they support perf, LTTng, and Trace Compass.
  • Execution Environment: Participants may use either a native Linux installation or a Linux virtual machine (e.g., VirtualBox, VMware), with appropriate permissions to access performance counters and kernel tracing features.
  • System Privileges: Administrative (root or sudo) access is required for installing tools and enabling tracing and profiling features.
  • Hardware Resources: At least 8 GB of RAM and sufficient disk space (a few gigabytes) to store trace data are recommended.
  • Network Connectivity: A stable internet connection is required to download and install the tools and supporting packages during the tutorial.
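As a quick sanity check ahead of the session, attendees can verify that the main command-line tools are on the PATH and see whether unprivileged profiling is permitted. The following is a minimal, unofficial sketch (the organizers may distribute their own setup instructions):

```shell
#!/bin/sh
# Minimal pre-tutorial environment check (a sketch, not an official script):
# reports whether the command-line tools discussed in the tutorial are found.
for tool in perf lttng; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done

# On Linux, perf_event_paranoid controls unprivileged access to perf events;
# values above 2 typically require sudo for profiling.
if [ -r /proc/sys/kernel/perf_event_paranoid ]; then
    echo "perf_event_paranoid: $(cat /proc/sys/kernel/perf_event_paranoid)"
fi
```

If perf is reported missing on Ubuntu, it is typically provided by the linux-tools packages matching the running kernel.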

Short Bio

Adel Belkhiri received his Ph.D. in computer engineering from Polytechnique Montréal in 2021, where he is currently a research associate. His research interests focus on software support for performance analysis, fault diagnosis, and runtime observability in complex and distributed systems.

Heng Li is an associate professor in the Department of Computer and Software Engineering at Polytechnique Montreal. He holds a Ph.D. in Computing from Queen’s University (Canada), an M.Sc. from Fudan University (China), and a B.Eng. from Sun Yat-sen University (China). Before his academic career, he worked in industry for years as a software engineer at Synopsys and a software performance engineer at BlackBerry.


Top-Down Performance Profiling and Bottleneck Identification for AI/ML Workloads using the Ampere Performance Toolkit

Time

Mon 4 May, 14:00-17:30

Authors

Bhakti Hinduja, Zach Meyers, Tito Reinhart, Simran Gambani - Ampere Computing

Abstract

Identifying performance bottlenecks in complex AI/ML workloads, which span intricate application software and diverse hardware architectures, presents a significant challenge. Pinpointing performance inhibitors across this vast stack is difficult, making an understanding of a workload’s unique execution signature crucial for developing targeted and effective performance optimizations. The Ampere Performance Toolkit addresses this by providing a hierarchical, top-down methodology for workload observability and profiling. This tutorial demonstrates how to apply this methodology to locate bottlenecks across the combined application software and hardware stack.

The approach is a two-stage process. First, the Ampere-System-Profiler offers a holistic platform view by monitoring CPU, network, disk, and kernel activity. This identifies and rules out high-level system bottlenecks, which is vital before advancing to more targeted, lower-level profiling. If no apparent system-level issues are found, the analysis proceeds: the Ampere-PMU-Profiler then takes a deeper dive, collecting Performance Monitoring Unit (PMU) event data, visualized through sunburst plots that illustrate how instructions are distributed across the CPU. This methodical application of the Ampere Performance Toolkit enables precise bottleneck identification, driving significant performance optimizations for demanding workloads.
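For readers unfamiliar with top-down analysis, the core idea behind this kind of PMU-based breakdown can be sketched generically: each pipeline slot is attributed to one of a few top-level categories, and the dominant category directs where to profile next. The counter names and values below are hypothetical and are not taken from the Ampere Performance Toolkit:

```python
# Generic top-down (level-1) attribution sketch. Counter names and values are
# hypothetical; real analyses read them from the CPU's PMU (e.g., via perf).
def top_down_level1(slots, retired_uops, issued_uops, recovery_slots,
                    frontend_stall_slots):
    """Attribute each pipeline slot to one of four top-level categories."""
    retiring = retired_uops / slots
    bad_speculation = (issued_uops - retired_uops + recovery_slots) / slots
    frontend_bound = frontend_stall_slots / slots
    backend_bound = 1.0 - retiring - bad_speculation - frontend_bound
    return {
        "retiring": retiring,
        "bad_speculation": bad_speculation,
        "frontend_bound": frontend_bound,
        "backend_bound": backend_bound,
    }

# Hypothetical sample: a workload whose slots turn out to be mostly backend
# bound, suggesting the next profiling step should target memory/core stalls.
breakdown = top_down_level1(slots=1_000_000, retired_uops=350_000,
                            issued_uops=400_000, recovery_slots=10_000,
                            frontend_stall_slots=150_000)
dominant = max(breakdown, key=breakdown.get)
print(dominant, round(breakdown[dominant], 2))
```

In this invented sample the backend-bound fraction dominates, so a deeper dive would drill into memory- or core-stall events next; this is the hierarchical narrowing that the tutorial's sunburst plots visualize.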

Preparation and Teaching Method

The tutorial will combine theoretical explanations with practical use cases and live benchmarking runs, and will encourage interactive discussion and Q&A. It will be supplemented with a comprehensive slide deck covering the agenda.

Intended Audience Skill Level: Intermediate
Prerequisite Knowledge: Familiarity with common Linux utilities and CLI-based applications.

Short Bio

Zach Meyers is a Senior Performance Engineer at Ampere Computing and the code owner/maintainer of the ampere-performance-toolkit and ampere-perfkit-benchmarker projects. He uses open-source tooling to solve real-world, customer-facing problems and to advance the growing aarch64-based server compute space.

Tito Reinhart is an experienced Performance Engineer at Ampere Computing who primarily works on performance application development for aarch64/ARM64-based server platforms, including workload characterization and optimization. Tito is a key developer of the Ampere Performance Toolkit and code owner of the Ampere System Profiler (ASP), one of its key system-profiling modules. His work aims to enable other software and performance engineers to identify and root-cause performance bottlenecks for cloud-native and AI/ML workloads more effectively.

Simran Gambani is an experienced Performance Engineer at Ampere Computing and a key developer of the Ampere PerfKit Benchmarker, developing benchmarks and frameworks to evaluate and scale performance of LLM and ML workloads on current and future aarch64-based compute platforms. She works on identifying bottlenecks, workload characterization, and driving performance improvements in software stacks.

Bhakti Hinduja is the Director of Application Engineering at Ampere. She manages software applications and tools for Ampere aarch64 systems and works with key customers to support their applications on Ampere systems.


Energy Efficiency Benchmarking of GPU Servers - Why It’s Challenging, What Matters, and How to Do It Right

Time

Tue 5 May, 09:00-12:30

Authors

Maximilian Meissner (University of Wuerzburg), Klaus-Dieter Lange (Hewlett Packard Enterprise), Sanjay Sharma (NVIDIA), Aaron Cragin (Microsoft), David Reiner (AMD), Samuel Kounev (University of Wuerzburg)

Abstract

The rapid rise of GPU-accelerated workloads in AI and high-performance computing has driven a steep increase in global data center power consumption. Improving the energy efficiency of GPU servers depends on trusted benchmarks built from rigorous measurement methods, representative workloads, and robust metrics. Energy efficiency benchmarking differs from performance benchmarking in that it introduces additional variables (power measurement, sampling strategy, thermal and power-management settings, and workload determinism) that must be carefully controlled. Without disciplined setup and documentation, results can be misleading, inconclusive, or irreproducible.

Reproducibility is a fundamental concern for the scientific community, and understanding the subtleties of power and efficiency measurement is essential for meaningful experiments. This is particularly important for GPU systems, where workload specialization, tight hardware-software co-optimization, and vendor-specific behavior mean small changes can have disproportionate effects. In this tutorial, industry experts from the SPECpower and SPEC ISG Server Efficiency Committees present best practices and lessons learned from implementing GPU workloads for the latest SPEC efficiency benchmarks, providing practical guidance for reliable, reproducible energy-efficiency experiments on GPU servers.
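To illustrate one of these subtleties: even a basic efficiency score depends on how discrete power samples are turned into energy before work done is divided by energy consumed. A minimal sketch follows (sample values are hypothetical; standardized benchmarks prescribe calibrated power analyzers and fixed sampling intervals):

```python
# Energy-efficiency sketch: integrate discrete power samples into energy
# via the trapezoidal rule, then divide completed work by that energy.
# Sample values below are hypothetical.
def energy_joules(timestamps_s, power_watts):
    """Trapezoidal integration of power samples (W) over time (s)."""
    energy = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        energy += 0.5 * (power_watts[i] + power_watts[i - 1]) * dt
    return energy

def efficiency(work_units, timestamps_s, power_watts):
    """Work completed per joule of energy consumed."""
    return work_units / energy_joules(timestamps_s, power_watts)

# One-second samples over a hypothetical 4-second run at roughly 500 W.
ts = [0.0, 1.0, 2.0, 3.0, 4.0]
pw = [480.0, 510.0, 520.0, 505.0, 495.0]
print(round(energy_joules(ts, pw), 1), "J")
print(round(efficiency(100_000, ts, pw), 2), "work units/J")
```

Even in this toy example, changing the sampling interval or the integration method changes the reported score, which is why disciplined sampling strategy and documentation are emphasized above.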

Preparation and Teaching Method

  • Proposed duration: 3 hours
  • Intended audience skill level: Beginner to intermediate
  • Prerequisite knowledge required: No special prerequisite knowledge required
  • Audio-visual and technical requirements: No special technical requirements for the participants

Short Bio

Klaus-Dieter Lange is a Distinguished Technologist at Hewlett Packard Enterprise, Houston, TX, where he has worked since 1998. He began his professional career after receiving his bachelor's degrees from the University of Applied Sciences, Lübeck, Germany, and the Milwaukee School of Engineering, WI, USA, in 1997. Klaus has spent his career working on performance evaluation, workload characterization, benchmark development, and energy efficiency of computer systems. He has been active in several of the Standard Performance Evaluation Corporation (SPEC) committees since 2005 and serves as a Director of its Board. Klaus is the founding chair of the SPECpower Committee that developed the SPEC Power and Performance Methodology, the SPECpower_ssj2008 benchmark, the SPEC PTDaemon Interface, and the SPEC SERT suite. Klaus received several SPECtacular Awards over the years, and the SPEC Presidential Award in 2013 and 2021. He has served on the program committees of many conferences and workshops and as the General Co-Chair for ICPE 2014 in Dublin, and the SPEC Symposium 2016 Asia Summit in Beijing. He co-founded the SPECpower Research Working Group in 2016 and co-presented the tutorials “Measuring and Benchmarking Power Consumption and Energy Efficiency” at ICPE 2018 and “SPEC Efficiency Benchmark Development: How to Contribute to the Future of Energy Conservation” at ICPE 2022.

Maximilian Meißner is a researcher at the University of Wuerzburg in Germany. He is currently working on his dissertation on benchmarking and improving energy efficiency in the context of servers, data centers, and the cloud. Maximilian has been a member of the SPECpower Committee since October 2021, a member of the SPEC ISG Server Efficiency Committee, and chair of the SPECpower Research Working Group since early 2022. He received his master’s degree in Computer Science from the University of Wuerzburg in September 2021. He is an organizer and co-lecturer of the Master-level lecture “Systems Benchmarking” at the University of Wuerzburg. He organized and co-presented the tutorial “SPEC Efficiency Benchmark Development: How to Contribute to the Future of Energy Conservation” at ICPE 2022.

Sanjay Sharma is a Principal Energy Efficiency Engineer at NVIDIA Corporation. Sanjay has been with NVIDIA since February 2025, and one of his primary functional areas is performance and power efficiency benchmarking and analysis of desktop and server systems. Sanjay began his career in 1990, after acquiring his bachelor’s degree in Computer Engineering from the University of Bombay, India.

Aaron Cragin has worked in the IT industry for over two decades, spanning systems benchmarking, technical marketing, and end-to-end system development. Working for HPE from 2004 to 2012 and Microsoft since then, Aaron joined Microsoft as part of the WCS program that developed Microsoft’s first in-house servers. Currently, as a Principal Ecodesign Engineer, Aaron engages externally with industry and regulatory groups to impact regulations and tooling for compliance requirements. Internally, he works across organizations to enable compliance tool support for custom silicon, gather stakeholder feedback, and forecast process changes. Within industry, Aaron serves on the Board of Directors at SPEC and engages with various committees across Digital Europe, ITI, The Green Grid, and ETSI.