Improving the Performance of DGEMM with MoA and Cache-Blocking (ARRAY 2021)

Sun 20 - Sat 26 June 2021 PLDI

Who

Stephen Thomas, Lenore Mullin, Kasia Swirydowicz

Track

ARRAY 2021

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 21 Jun 2021 18:00 - 18:25 at ARRAY - Session 4 (short talks) Chair(s): Jonathan Ragan-Kelley

Abstract

This paper demonstrates performance improvements to DGEMM, the dense double-precision matrix-matrix multiply kernel that vendors widely implement in their Basic Linear Algebra Subprograms (BLAS) libraries. The Mathematics of Arrays (MoA) paradigm due to Mullin (1988) yields contiguous memory accesses in combination with Church-Rosser complete language constructs optimized for the target processor architecture. Our performance studies show that the MoA implementation of DGEMM, combined with optimal cache-blocking strategies, achieves at least a 25% performance gain over the vendor-supplied Intel MKL and IBM ESSL libraries on Intel Xeon Skylake and IBM POWER9 processors, respectively.
Results are presented for the NREL Eagle and ORNL Summit supercomputers.
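
For readers unfamiliar with cache-blocking, the following is a minimal sketch of the general idea only. It is not the authors' MoA-derived implementation, and it says nothing about how their kernel is structured. It assumes matrices stored as flat row-major Python lists, and the function names (`dgemm_naive`, `dgemm_blocked`) and the default `block` size are hypothetical choices made for illustration. The blocked version computes C = alpha*A*B + beta*C tile by tile, and its innermost loop walks contiguous elements of a row of B and a row of C.

```python
def dgemm_naive(m, n, k, alpha, a, b, beta, c):
    """Reference C = alpha*A*B + beta*C on flat row-major lists.

    a is m-by-k, b is k-by-n, c is m-by-n. Returns a new list.
    """
    out = [beta * x for x in c]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i * k + p] * b[p * n + j]
            out[i * n + j] += alpha * acc
    return out


def dgemm_blocked(m, n, k, alpha, a, b, beta, c, block=64):
    """Cache-blocked variant of the same operation (illustrative sketch).

    The three outer loops step over tiles of at most block-by-block
    elements; the three inner loops update one tile of C. The block
    size is an assumed tuning parameter, not a value from the paper.
    """
    out = [beta * x for x in c]
    for ii in range(0, m, block):
        i_end = min(ii + block, m)
        for pp in range(0, k, block):
            p_end = min(pp + block, k)
            for jj in range(0, n, block):
                j_end = min(jj + block, n)
                for i in range(ii, i_end):
                    row_c = i * n
                    for p in range(pp, p_end):
                        aip = alpha * a[i * k + p]
                        row_b = p * n
                        # Contiguous accesses along row p of B and row i of C.
                        for j in range(jj, j_end):
                            out[row_c + j] += aip * b[row_b + j]
    return out
```

In a compiled language the block size would typically be tuned so that the working tiles fit in a given cache level. In pure Python the sketch only illustrates the loop structure and should not be used to measure performance.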

File attachments

Extended abstract (ARRAY_2021_paper_4 (revised).pdf), 547 KiB

Stephen Thomas

National Renewable Energy Laboratory

United States

Lenore Mullin

SUNY Albany, USA

Kasia Swirydowicz

Pacific Northwest National Laboratory