Unleashing the Hidden Power of Compiler Optimization on Binary Code Difference: An Empirical Study (PLDI 2021 - PLDI Research Papers)

Who

Xiaolei Ren, Michael Ho, Jiang Ming, Jeff Yu Lei, Li Li

Track

PLDI 2021

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Thu 24 Jun 2021 09:10 - 09:15 at PLDI-A - Talks 3A: Analysis and Synthesis
Thu 24 Jun 2021 21:10 - 21:15 at PLDI-A - Talks 3A: Analysis and Synthesis

Abstract

Hunting for binary code differences without source code (i.e., binary diffing) has compelling applications in software security. Because binary code is highly variable, existing solutions have been driven toward measuring semantic similarity between syntactically different code. Since compiler optimization is the most common source of syntactic differences in binary code, testing resilience against the changes caused by different compiler optimization settings has become a standard evaluation step for most binary diffing approaches. For example, 47 top-venue papers in the last 12 years compared different program versions compiled at default optimization levels (e.g., -Ox in GCC and LLVM). Although many of them claim to be immune to compiler transformations, their resistance to non-default optimization settings remains unclear. In particular, we have observed that adversaries have explored non-default compiler settings to amplify malware differences.

This paper takes the first step toward systematically studying the effectiveness of compiler optimization at producing binary code differences. We tailor search-based iterative compilation to the auto-tuning of binary code differences. We develop BinTuner to search for near-optimal optimization sequences that maximize the amount of binary code difference. We run BinTuner with GCC 10.2 and LLVM 11.0 on SPEC benchmarks (CPU2006 & CPU2017), Coreutils, and OpenSSL. Our experiments show that at the cost of 279 to 1,881 compilation iterations, BinTuner can find custom optimization sequences that are substantially better than the general -Ox settings. BinTuner's outputs seriously undermine the comparisons of prominent binary diffing tools. In addition, the detection rate of the IoT malware variants tuned by BinTuner falls by more than 50%. Our findings paint a cautionary tale for security analysts: attackers have a new way to mutate malware code cost-effectively, and the research community needs to step back and reassess optimization-resistance evaluations.
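To make the idea of search-based iterative compilation concrete, here is a minimal Python sketch. It is not BinTuner's implementation: the hill-climbing search, the use of normalized compression distance (NCD) as the difference score, and the `toy_compile` stand-in for a real compiler invocation are all assumptions chosen so the example runs without a toolchain. The flag names are ordinary GCC options used only as labels.

```python
import random
import zlib
from typing import Callable, FrozenSet, Sequence, Tuple

# Ordinary GCC option names, used here only as labels for the toy model.
FLAGS = ["-funroll-loops", "-finline-functions",
         "-fomit-frame-pointer", "-ftree-vectorize"]


def ncd(a: bytes, b: bytes) -> float:
    """Normalized compression distance (assumed metric, not from the paper).

    Small for similar inputs, close to 1 for unrelated ones.
    """
    ca, cb = len(zlib.compress(a)), len(zlib.compress(b))
    cab = len(zlib.compress(a + b))
    return (cab - min(ca, cb)) / max(ca, cb)


def toy_compile(flags: FrozenSet[str]) -> bytes:
    """Hypothetical stand-in for compiling a program with the given flags.

    Each enabled flag deterministically rewrites one 256-byte region of a
    repetitive 1024-byte "binary". A real setup would invoke the compiler.
    """
    code = bytearray(b"\x55\x48\x89\xe5" * 256)
    for i, flag in enumerate(FLAGS):
        if flag in flags:
            rng = random.Random(flag)  # string seed: deterministic bytes
            chunk = bytes(rng.randrange(256) for _ in range(256))
            code[i * 256:(i + 1) * 256] = chunk
    return bytes(code)


def tune_flags(compile_fn: Callable[[FrozenSet[str]], bytes],
               flags: Sequence[str],
               baseline: bytes,
               iterations: int = 50,
               seed: int = 0) -> Tuple[FrozenSet[str], float]:
    """Hill-climb over flag subsets to maximize difference from baseline."""
    rng = random.Random(seed)
    best: FrozenSet[str] = frozenset()
    best_score = ncd(baseline, compile_fn(best))
    for _ in range(iterations):
        # Toggle one randomly chosen flag on or off.
        candidate = best ^ {rng.choice(flags)}
        score = ncd(baseline, compile_fn(candidate))
        if score > best_score:  # keep only strict improvements
            best, best_score = candidate, score
    return best, best_score


baseline = toy_compile(frozenset())  # plays the role of the default build
best, score = tune_flags(toy_compile, FLAGS, baseline)
print(sorted(best), round(score, 3))
```

Each loop iteration corresponds to one compilation, which is why the abstract reports search cost in compilation iterations. A more realistic search would likely need a stronger strategy than single-flag hill climbing, since real optimization flags interact with one another.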

DOI

https://doi.org/10.1145/3453483.3454035

Xiaolei Ren

University of Texas at Arlington

United States

Michael Ho

University of Texas at Arlington

United States

Jiang Ming

University of Texas at Arlington

United States

Jeff Yu Lei

University of Texas at Arlington

United States

Li Li

Monash University

Australia

Session Program

Thu 24 Jun
Displayed time zone: Eastern Time (US & Canada)

09:00 - 09:40	Talks 3A: Analysis and Synthesis (PLDI) at PLDI-A +12h

09:00 5m Talk		Trace-Based Control-Flow Analysis (PLDI). Benoît Montagu (Inria), Thomas P. Jensen (Inria)
09:05 5m Talk		Demanded Abstract Interpretation (PLDI). Benno Stein (University of Colorado at Boulder), Bor-Yuh Evan Chang (University of Colorado at Boulder; Amazon), Manu Sridharan (University of California at Riverside)
09:10 5m Talk		Unleashing the Hidden Power of Compiler Optimization on Binary Code Difference: An Empirical Study (PLDI). Xiaolei Ren (University of Texas at Arlington), Michael Ho (University of Texas at Arlington), Jiang Ming (University of Texas at Arlington), Jeff Yu Lei (University of Texas at Arlington), Li Li (Monash University)
09:15 5m Talk		Chianina: An Evolving Graph System for Flow- and Context-Sensitive Analyses of Million Lines of C Code (PLDI). Zhiqiang Zuo (Nanjing University), Yiyu Zhang (Nanjing University), Qiuhong Pan (Nanjing University), Shenming Lu (Nanjing University), Yue Li (Nanjing University), Linzhang Wang (Nanjing University), Xuandong Li (Nanjing University), Guoqing Harry Xu (University of California at Los Angeles)
09:20 5m Talk		Termination Analysis without the Tears (PLDI). Shaowei Zhu (Princeton University), Zachary Kincaid (Princeton University)
09:25 5m Talk		Reverse Engineering for Reduction Parallelization via Semiring Polynomials (PLDI). Akimasa Morihata (University of Tokyo), Shigeyuki Sato (University of Tokyo)
09:30 5m Talk		RbSyn: Type- and Effect-Guided Program Synthesis (PLDI). Sankha Narayan Guria (University of Maryland), Jeffrey S. Foster (Tufts University), David Van Horn (University of Maryland)
09:35 5m Talk		Central Moment Analysis for Cost Accumulators in Probabilistic Programs (PLDI). Di Wang (Carnegie Mellon University), Jan Hoffmann (Carnegie Mellon University), Thomas Reps (University of Wisconsin)

21:00 - 21:40	Talks 3A: Analysis and Synthesis (PLDI) at PLDI-A

21:00 5m Talk		Trace-Based Control-Flow Analysis (PLDI). Benoît Montagu (Inria), Thomas P. Jensen (Inria)
21:05 5m Talk		Demanded Abstract Interpretation (PLDI). Benno Stein (University of Colorado at Boulder), Bor-Yuh Evan Chang (University of Colorado at Boulder; Amazon), Manu Sridharan (University of California at Riverside)
21:10 5m Talk		Unleashing the Hidden Power of Compiler Optimization on Binary Code Difference: An Empirical Study (PLDI). Xiaolei Ren (University of Texas at Arlington), Michael Ho (University of Texas at Arlington), Jiang Ming (University of Texas at Arlington), Jeff Yu Lei (University of Texas at Arlington), Li Li (Monash University)
21:15 5m Talk		Chianina: An Evolving Graph System for Flow- and Context-Sensitive Analyses of Million Lines of C Code (PLDI). Zhiqiang Zuo (Nanjing University), Yiyu Zhang (Nanjing University), Qiuhong Pan (Nanjing University), Shenming Lu (Nanjing University), Yue Li (Nanjing University), Linzhang Wang (Nanjing University), Xuandong Li (Nanjing University), Guoqing Harry Xu (University of California at Los Angeles)
21:20 5m Talk		Termination Analysis without the Tears (PLDI). Shaowei Zhu (Princeton University), Zachary Kincaid (Princeton University)
21:25 5m Talk		Reverse Engineering for Reduction Parallelization via Semiring Polynomials (PLDI). Akimasa Morihata (University of Tokyo), Shigeyuki Sato (University of Tokyo)
21:30 5m Talk		RbSyn: Type- and Effect-Guided Program Synthesis (PLDI). Sankha Narayan Guria (University of Maryland), Jeffrey S. Foster (Tufts University), David Van Horn (University of Maryland)
21:35 5m Talk		Central Moment Analysis for Cost Accumulators in Probabilistic Programs (PLDI). Di Wang (Carnegie Mellon University), Jan Hoffmann (Carnegie Mellon University), Thomas Reps (University of Wisconsin)