Optimize Binary for Size
I'm working on a very small (less than 1k LoC, including tests) command-line application. I'm satisfied with the implemented feature set, and I'm preparing to ship it to potential users.
As an exercise, I wanted to play with binary size optimization techniques in Rust. As a result, I was able to go from a huge 75 MB binary in debug mode to 3.9 MB after all optimizations, without changing a single line of Rust code.
I won't go into the details of the application (maybe in a separate post), but it's intended to work as an External Tool for IntelliJ IDEA, allowing quick content synchronization between an AEM (Adobe Experience Manager) instance and the local file system. It will make my, and others', day-to-day work easier.
Because the target audience is technical people, I decided to ship the app as a binary which can be downloaded from GitHub.
The system I'm working on is 64-bit Ubuntu 20.04 with a quad-core Intel Core i7-7700HQ at 3.8 GHz. I'm using rustc version 1.53.0.
Tricks
By default, Rust optimizes for execution speed. In order to optimize for size, we need to apply a few tricks.
Do note that we already save some KB just by using Rust version >= 1.32. Since Rust 1.32, jemalloc, a custom memory allocator, is no longer shipped by default. This allocator often outperforms the system allocator, but it's not a must-have for every binary, and in the end we can still switch the allocator if we need it.
Release mode
First, we need to compile in release mode. It's not really a trick, but a must-have when you plan to release an app. By default, cargo compiles our code in dev mode, and the resulting binary is huge. On my system it's 75 MB in size! That's crazy, but there is a reason: in dev mode, the resulting binary contains a lot of debug information and metadata useful while debugging the program. Additionally, in release mode, rustc performs a lot of optimizations which are not applied in dev mode.
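Switching to release mode is just a flag on cargo. The path below assumes the default target directory; the crate name aemsync is only an example:

```bash
# build with the release profile instead of the dev profile
cargo build --release

# check the resulting size (aemsync is a placeholder crate name)
ls -lh target/release/aemsync
```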
Using release mode, we go down from 75 MB to 9.9 MB.
Changing opt-level
This option controls how many optimizations are performed on our code. There is a compromise between optimization level and compile time. A higher optimization level results in faster code, but costs us more time while compiling. That's why, by default in dev mode, it's set to 0, which means no optimizations are done. These are the levels we can set:
- 0 — no optimizations — default for dev mode
- 1 — basic optimization
- 2 — more optimizations
- 3 — all optimizations
- "s" — optimize for size
- "z" — optimize for size but also disables loop vectorization
Loop vectorization is a technique which leverages SIMD instructions on modern CPUs. SIMD stands for Single Instruction, Multiple Data: it allows one instruction to be executed on multiple pieces of data at the same time. To make this possible, the compiler unrolls loops so that SIMD instructions can be used, which also means the resulting binary is bigger. The surprising result here was that option "s" was actually better than "z" on my machine, so let's use that.
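The option goes into the release profile in Cargo.toml:

```toml
[profile.release]
opt-level = "s"  # optimize for size instead of speed
```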
After using opt-level = "s", we go down from 9.9 MB to 9.8 MB.
Changing LTO
LTO stands for Link Time Optimization. The linker is able to optimize the code during linking, and this setting tells it how large a part of the code it should take into account while performing those optimizations. Possible settings:
- false — only the local crate is taken into account
- true — search for optimizations across all crates within the dependency graph
- "thin" — similar to above but takes less time
- "off" — disables LTO
After using lto = true, we go down from 9.8 MB to 6.7 MB.
Just out of curiosity, lto = "thin" resulted in a 7.9 MB binary.
Changing codegen-units
This setting controls how many units the code is split into; the units are then compiled in parallel. More units make compilation faster, but the performed optimizations are worse. The option takes an integer greater than 0 as its value. By setting it to 1, we make compilation slower, but the resulting code is smaller.
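In Cargo.toml:

```toml
[profile.release]
codegen-units = 1  # compile as a single unit: slower build, better optimization
```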
After setting codegen-units = 1, we go down from 6.7 MB to 6.1 MB.
Changing panic
This setting controls how panics should behave. There are two options:
- "unwind" — the default behavior for both dev and release mode
- "abort" — tells rustc to stop execution without unwinding the stack
Thanks to the "abort"
value, we are getting rid of additional code which unwinds
the stack, thus making our binary smaller.
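In Cargo.toml:

```toml
[profile.release]
panic = "abort"  # abort on panic instead of unwinding the stack
```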
After setting panic = "abort", we go from 6.1 MB to 5.6 MB.
Using strip
The last trick we'll use is the strip command line tool, which comes with binutils. It turns out that our binary still contains debug symbols which are not needed for it to run. We can run strip <binary> to get rid of those symbols.
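For example, assuming the binary is called aemsync (a placeholder name) and was built with the release profile:

```bash
# remove symbols that are not needed at run time
strip target/release/aemsync
```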
After using strip, we go down from 5.6 MB to 3.9 MB.
Summary
Using a few tricks, I was able to bring the binary size from 75 MB down to 3.9 MB. None of those tricks required a code change. Tricks used:
- compile in release mode to get rid of debug symbols and other debug-useful metadata
- change the optimization level, using the opt-level setting, to "s", which optimizes the binary for size at the cost of compilation time
- the lto option allows optimizing the binary during linking; we set it to true to optimize across all crates within the dependency graph
- codegen-units allows us to prevent splitting the code into smaller parts, which allows smarter optimizations
- panic set to "abort" removes the code responsible for stack unwinding, which makes our binary smaller
- the strip binary allows removing debug symbols which are not required to correctly run our application, so removing them makes our binary smaller
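For reference, putting all the Cargo.toml settings together, the release profile ends up looking like this:

```toml
[profile.release]
opt-level = "s"    # optimize for size
lto = true         # link-time optimization across all crates
codegen-units = 1  # single codegen unit: slower build, smaller binary
panic = "abort"    # no stack-unwinding code
```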
And that's all. I just want to note that there is more we could do to optimize our binary size. By default, Rust ships its standard library (stdlib) with every program. It's statically linked with our binary, and we don't have control over how this library is compiled: for example, it's optimized for speed instead of size, we can't control its panic behavior, etc. To fix that, we can recompile stdlib ourselves using Xargo, or we can get rid of stdlib entirely using #![no_std].
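As a rough sketch of the first approach (not something I measured for this post): on a nightly toolchain, cargo's build-std feature, the successor to what Xargo does, can rebuild the standard library with our release profile:

```bash
# nightly-only: rebuild std (and its panic runtime) with our own release profile
rustup component add rust-src
cargo +nightly build --release \
    -Z build-std=std,panic_abort \
    --target x86_64-unknown-linux-gnu
```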