Learning MLIR and HLO by Building a Tiny StableHLO-to-LLVM IR Compiler
Summary
This project explored how to lower machine-learning-oriented IR into low-level compiler IR by building a small MLIR-based compiler. The tool reads StableHLO MLIR, applies a sequence of lowering and cleanup passes, lowers the result to the MLIR LLVM dialect, translates that module to standard textual LLVM IR, and prints the final .ll-style output.
The work had three major outcomes:
- A validated local toolchain setup establishing that StableHLO `v1.13.9` is compatible with the local `LLVM-22.1.3-Linux-X64` installation, with a small patch to support a library-only StableHLO build.
- A working end-to-end prototype that accepts StableHLO input and produces textual LLVM IR.
- A practical workflow for agent-assisted compiler development, where coding agents helped accelerate documentation lookup, pass discovery, incremental code edits, rebuild-and-run cycles, and failure-driven replanning.
This mini project was successful because it followed an incremental implementation strategy, verified IR at each stage, used realistic sample input to validate the lowering pipeline, and made effective use of coding agents to reduce the implementation-detail burden of working with MLIR and StableHLO.
Project Goal
The goal of the project was to become familiar with:
- MLIR as a multi-level compiler infrastructure,
- HLO-family IRs, especially StableHLO, and
- the path from a machine-learning frontend representation to low-level code generation.
The intended deliverable was a compiler executable that:
- reads StableHLO source code from a file,
- parses it into MLIR,
- runs a sequence of lowering and canonicalization passes,
- lowers the program through progressively lower-level MLIR dialects,
- reaches the MLIR LLVM dialect,
- translates that module into textual LLVM IR, and
- emits the result as a `.ll` file.
A representative CLI shape takes the StableHLO file as a positional argument, e.g. `./simple_stablehlo_app input.mlir`, with each stage printed to stdout.
Background
Lowering in MLIR
A useful way to understand MLIR lowering is as dialect elimination. Given a module containing a set of dialects, the task is to drive the program toward a target dialect set.
For example, a module may contain `func`, `affine`, `scf`, `arith`, and `memref` operations while the target set is just the `llvm` dialect.
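As a concrete (schematic) instance, here is a tiny module whose body mixes several of these dialects, all of which must be eliminated before translation to LLVM IR:

```mlir
// Mixes func, affine, memref, and arith. (Schematic example;
// not taken from the project's actual inputs.)
func.func @sum(%buf: memref<4xf32>) -> f32 {
  %zero = arith.constant 0.0 : f32
  %r = affine.for %i = 0 to 4 iter_args(%acc = %zero) -> (f32) {
    %v = affine.load %buf[%i] : memref<4xf32>
    %s = arith.addf %acc, %v : f32
    affine.yield %s : f32
  }
  return %r : f32
}
```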
This naturally leads to questions such as:
- How do we eliminate `affine`?
- How do we eliminate `scf`?
- How do we eliminate `func`?
- How do we eliminate `arith`?
- How do we eliminate `memref`?
- How do we eliminate `cf` if it is introduced along the way?
A typical candidate pass sequence might include:
- `lower-affine`
- `scf-to-cf`
- `func-to-llvm`
- `arith-to-llvm`
- `memref-to-llvm`
- `cf-to-llvm`
- `reconcile-unrealized-casts`
MLIR does not provide a fully general automatic planner for this process. In practice, lowering is driven by knowledge of:
- Which dialects are currently present?
- Which dialects are desired?
- Which passes are available?
- The preconditions and postconditions of those passes.
That makes the work highly iterative and well-suited to a Karpathy loop of planning, editing, building, running, and revising.
Where StableHLO Comes From
Machine learning frameworks such as JAX do not normally expose LLVM IR directly. Instead, they lower from frontend programs through higher-level compiler representations. For this project, the most useful interchange representation was StableHLO because it:
- is an MLIR dialect,
- preserves tensor-level semantics,
- supports portability and interchange, and
- remains much closer to the source computation than LLVM IR.
A practical way to obtain StableHLO is to lower a `jax.jit`-compiled function and request compiler IR in the `stablehlo` dialect.
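As a sketch of that workflow (function name and shapes are illustrative; this assumes JAX is installed and uses its public `lower(...).compiler_ir(...)` API):

```python
import jax
import jax.numpy as jnp

def mlp_layer(x, w, b):
    # matmul + broadcasted bias + ReLU
    return jnp.maximum(jnp.dot(x, w) + b, 0.0)

x = jnp.ones((2, 4))
w = jnp.ones((4, 3))
b = jnp.ones((3,))

# Lower the jitted function and ask for StableHLO MLIR text.
lowered = jax.jit(mlp_layer).lower(x, w, b)
print(lowered.compiler_ir(dialect="stablehlo"))
```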
Conceptually, this kind of program lowers to operations such as:
- `stablehlo.dot_general` for matrix multiplication,
- `stablehlo.broadcast_in_dim` for bias expansion,
- `stablehlo.add` for bias addition,
- `stablehlo.constant` for constants, and
- `stablehlo.maximum` for ReLU.
Local Environment and Compatibility Findings
The project used a hub-and-spoke dependency model:
- The fixed local LLVM installation served as the hub;
- Different StableHLO tags acted as spokes;
- Compatibility was determined empirically by configuring and building candidate versions.
Observed Version Boundary
The newest StableHLO tag verified to build successfully against the installed LLVM package was `v1.13.9`.
Observed results:
- `v1.13.7`, `v1.13.8`, and `v1.13.9` build successfully;
- `v1.14.0` and newer tested tags fail against this LLVM installation due to MLIR API drift.
The local checkout was moved to `v1.13.9`, patched for a library-only build, and installed successfully.
Packaging Caveat
The LLVM install package includes MLIR headers, libraries, and CMake packages, but it does not provide the full test/tool integration expected by StableHLO’s default build. In particular, the stock StableHLO build expects targets or tools such as:
- a `FileCheck` binary (not shipped in the install package), and
- a usable `LLVM_EXTERNAL_LIT`.
This matters for building the default tests, tools, and integration targets. It does not matter for a library-only build, provided those subdirectories are excluded from the build.
Minimal Patch for a Library-Only StableHLO Build
For this project, a library-only StableHLO build was sufficient. The minimal patch touched four files:
- `CMakeLists.txt`
- `stablehlo/CMakeLists.txt`
- `stablehlo/conversions/linalg/CMakeLists.txt`
- `stablehlo/conversions/tosa/CMakeLists.txt`
1. Top-Level CMakeLists.txt
Disable the test- and tool-oriented configuration that assumes a full LLVM build tree, so that configuring no longer requires `FileCheck` or `LLVM_EXTERNAL_LIT`.
2. stablehlo/CMakeLists.txt
Exclude the test-dependent subdirectories in the same way, so that only the dialect and transform libraries are built.
To install StableHLO headers as part of the library-only install, append `install(DIRECTORY ...)` rules covering both the source headers and the generated `.h.inc` headers.
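A sketch of such install rules, using the standard CMake `install(DIRECTORY ...)` form (illustrative, not the verbatim patch; destinations match the layout described under "What the Install Produces"):

```cmake
# Install source headers from the source tree ...
install(DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}/
        DESTINATION include/stablehlo
        FILES_MATCHING PATTERN "*.h")
# ... and generated .h.inc headers from the build tree.
install(DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/
        DESTINATION include/stablehlo
        FILES_MATCHING PATTERN "*.h.inc")
```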
3. stablehlo/conversions/linalg/CMakeLists.txt
Remove the inclusion of the lit-based conversion tests, which depend on `FileCheck` and `LLVM_EXTERNAL_LIT`.
4. stablehlo/conversions/tosa/CMakeLists.txt
Remove the corresponding test inclusion here as well.
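Both removals follow the same shape. Schematically, the kind of block involved looks like this (the option and directory names are illustrative, not the verbatim StableHLO CMake code):

```cmake
# Library-only build: only descend into lit-test directories when the
# surrounding LLVM actually provides FileCheck and a usable lit.
if(STABLEHLO_ENABLE_TESTS)  # hypothetical guard option
  add_subdirectory(tests)   # requires FileCheck / LLVM_EXTERNAL_LIT
endif()
```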
Building and Installing StableHLO
With the library-only patch applied, StableHLO is configured with CMake against the installed LLVM package, built, and installed into a local prefix.
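A sketch of that flow (paths are placeholders for the local installation; the `LLVM_DIR`/`MLIR_DIR` variables follow standard LLVM/MLIR CMake-package conventions):

```shell
export LLVM_PREFIX=/opt/llvm-22.1.3            # hypothetical install location
export STABLEHLO_PREFIX=$HOME/stablehlo-install

cmake -S . -B build -G Ninja \
  -DCMAKE_BUILD_TYPE=Release \
  -DLLVM_DIR=$LLVM_PREFIX/lib/cmake/llvm \
  -DMLIR_DIR=$LLVM_PREFIX/lib/cmake/mlir \
  -DCMAKE_INSTALL_PREFIX=$STABLEHLO_PREFIX
cmake --build build
cmake --install build
```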
What the Install Produces
The install step provides:
- StableHLO static libraries in `$STABLEHLO_PREFIX/lib`,
- StableHLO source headers in `$STABLEHLO_PREFIX/include/stablehlo/...`, and
- generated `.h.inc` headers in `$STABLEHLO_PREFIX/include/stablehlo/...`.
The install step does not produce:
- a StableHLO CMake package config
As a result, the install root is a usable headers + libraries prefix, but not a complete `find_package(StableHLO)` SDK.
Include Paths for Downstream Compilation
A downstream application can compile against:
- `-I$LLVM_PREFIX/include`
- `-I$STABLEHLO_PREFIX/include`
An example compile command passes both include roots with `-I` and compiles as C++17 or newer.
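A sketch of such a command (compiler choice and flags are illustrative; `simple_stablehlo_app.cpp` is the driver source named later in this report, and `-fno-rtti` is commonly needed because LLVM release builds default to no RTTI):

```shell
clang++ -std=c++17 -fno-exceptions -fno-rtti \
  -I$LLVM_PREFIX/include \
  -I$STABLEHLO_PREFIX/include \
  -c simple_stablehlo_app.cpp -o simple_stablehlo_app.o
```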
A minimal source file can simply load the StableHLO dialect and parse a module, which is enough to confirm that the headers and generated includes resolve.
Linking Strategy
For this setup, using raw compiler and linker commands was simpler than using downstream CMake because the installed StableHLO tree is not a full packaged SDK.
A straightforward working link strategy is:
- include all StableHLO static archives,
- include all MLIR static archives,
- include all LLVM static archives,
- wrap them in a linker group so archive order does not matter.
With GNU ld, the group is expressed by wrapping the archives in `-Wl,--start-group` and `-Wl,--end-group`.
Intended Compiler Pipeline
The project was never intended to translate StableHLO directly into LLVM IR in a single step. The educational value comes from observing the intermediate representations and their transformations.
A reasonable conceptual pipeline is: StableHLO -> Linalg/Tensor/Arith -> MemRef (bufferization) -> SCF -> LLVM dialect -> textual LLVM IR.
Stage-by-Stage Interpretation
StableHLO
StableHLO is the input language. It represents tensor-level computation and ML semantics, such as:
- broadcasts,
- reductions,
- dot products,
- reshapes, and
- elementwise operations.
This form is useful for interchange but is still far from machine code.
Linalg / Tensor / Arith
At this level, computation becomes more explicit in MLIR terms.
Typical mappings include:
- `stablehlo.add` -> `linalg.generic` or an equivalent structured op
- `stablehlo.dot_general` -> `linalg.matmul` or `linalg.generic`
- `stablehlo.maximum` -> elementwise `linalg.generic` with `arith` inside
- shape-preserving tensor manipulations -> `tensor` operations
This stage expresses high-level tensor semantics as structured computations over tensors.
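For instance, the ReLU step can be expressed as an elementwise `linalg.generic` with the scalar max in `arith`. The following is a hand-written sketch (`%biased`, `%zeros`, and `%init` are assumed tensor values; attribute syntax varies slightly across MLIR versions):

```mlir
// Elementwise max over 2x3 tensors: out[i,j] = max(biased[i,j], 0).
#map = affine_map<(d0, d1) -> (d0, d1)>
%relu = linalg.generic
    {indexing_maps = [#map, #map, #map],
     iterator_types = ["parallel", "parallel"]}
    ins(%biased, %zeros : tensor<2x3xf32>, tensor<2x3xf32>)
    outs(%init : tensor<2x3xf32>) {
  ^bb0(%a: f32, %z: f32, %out: f32):
    %m = arith.maximumf %a, %z : f32
    linalg.yield %m : f32
} -> tensor<2x3xf32>
```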
MemRef / Bufferization
Bufferization converts tensor values into explicit storage objects.
Before bufferization, tensors are SSA values. After bufferization, data is represented in memory through memref objects. This stage answers questions such as:
- where outputs live,
- what gets allocated,
- what can be updated in place, and
- how temporaries are stored.
SCF
Structured ops are lowered into explicit loop-based control flow.
For example, elementwise tensor addition eventually becomes a two-level loop nest that loads a pair of elements, adds them, and stores the result.
Matrix multiplication similarly becomes nested loops over indices.
LLVM Dialect
This is the last MLIR stage before translation to standard LLVM IR. At this point:
- functions become `llvm.func`,
- memory is converted to LLVM-compatible representations,
- structured control flow is lowered to lower-level control flow, and
- arithmetic is converted into LLVM-compatible forms.
LLVM IR Source (.ll)
The final product is textual LLVM IR. This representation is easy to inspect, diff, save, and pass to downstream LLVM tools.
Implementation Status
The current executable milestone was implemented in `simple_stablehlo_app.cpp`.
The program now:
- accepts an input MLIR file path on the command line,
- registers core MLIR dialects,
- registers StableHLO dialects,
- registers the external `BufferizableOpInterface` models required by one-shot bufferization,
- registers the translation interfaces required to export MLIR to LLVM IR,
- parses the input into an `mlir::ModuleOp`,
- verifies the parsed module,
- runs `canonicalizer` and `cse`,
- runs `stablehlo-legalize-to-linalg`,
- runs `canonicalizer` and `cse` again,
- verifies and prints the intermediate Linalg/Tensor/Arith form,
- runs `one-shot-bufferize` with function-boundary bufferization enabled,
- runs `convert-bufferization-to-memref`,
- verifies and prints the MemRef-oriented form,
- runs `convert-linalg-to-loops`,
- verifies and prints the SCF-based form,
- runs `scf-to-cf`, `lower-affine`, `index-to-llvm`, `arith-to-llvm`, `memref-to-llvm`, `func-to-llvm`, `cf-to-llvm`, and `reconcile-unrealized-casts`,
- runs cleanup again,
- verifies and prints the LLVM dialect result, and
- translates that LLVM dialect module to standard textual LLVM IR and prints it.
In short, the prototype now functions as a working:
- StableHLO reader,
- parser and verifier,
- StableHLO-to-Linalg lowering driver,
- one-shot bufferization driver,
- bufferization-to-MemRef conversion prototype,
- Linalg-to-loops prototype,
- LLVM-dialect lowering prototype,
- LLVM IR translation prototype, and
- multi-stage printer.
It also performs basic CLI validation and reports errors if parsing, verification, or lowering fails.
Validation and Testing
The prototype was tested with `jit_matmul_bias_relu.mlir`.
Test Outcome
The test succeeded end-to-end:
- the file parsed successfully,
- the input module verified successfully,
- the StableHLO-to-Linalg cleanup and lowering pipeline ran successfully,
- the intermediate Linalg/Tensor/Arith module verified successfully,
- the one-shot bufferization plus bufferization-to-MemRef pipeline ran successfully,
- the MemRef-oriented module verified successfully,
- the Linalg-to-loops pipeline ran successfully,
- the SCF-based module verified successfully,
- the LLVM-dialect lowering pipeline ran successfully,
- the final LLVM dialect module verified successfully,
- the LLVM dialect module translated successfully to standard LLVM IR, and
- all printed stages were produced successfully.
On `jit_matmul_bias_relu.mlir`, the tool completed four meaningful MLIR dialect transitions followed by a final LLVM IR export.
The sample input was a useful smoke test because it exercised nontrivial tensor behavior:
- matrix multiplication,
- broadcast,
- elementwise addition,
- elementwise maximum, and
- constants.
Remaining Work
Although the end-to-end LLVM IR path now works, several follow-on improvements remain:
- add output-file support such as `-o output.ll`,
- design a more polished CLI for stage selection and output control,
- separate debug-stage dumps from final `.ll` output,
- improve memory management beyond the current educational prototype if needed.
Recommended Next Step
The next milestone should turn the current stdout-based prototype into a more compiler-like tool, with an explicit output file and separate control over per-stage debug dumps.
Why This Mini Project Succeeded
This mini project succeeded because it combined a clear objective, incremental implementation, continuous validation, and effective use of coding agents.
1. Clear, Concrete Scope
The goal was not vaguely to “learn MLIR.” It was to build a small compiler executable that:
- reads StableHLO MLIR,
- lowers it through intermediate MLIR stages,
- reaches the MLIR LLVM dialect, and
- emits textual LLVM IR.
That concrete scope kept the work focused and measurable.
2. Incremental Milestones
The implementation progressed in small, testable steps:
- parse and verify StableHLO,
- canonicalize and CSE,
- lower StableHLO to Linalg/Tensor/Arith,
- bufferize to MemRef,
- lower Linalg to SCF loops,
- lower to the MLIR LLVM dialect,
- translate to textual LLVM IR.
This reduced risk and made failures easier to diagnose.
3. Verification at Every Stage
The program consistently:
- parsed input,
- verified IR before transformations,
- ran passes,
- verified IR after transformations, and
- printed intermediate results.
That discipline turned each milestone into a stable foundation for the next one.
4. Meaningful Test Input
Using `jit_matmul_bias_relu.mlir` as a smoke test ensured the pipeline was exercised on a realistic tensor computation rather than a trivial toy example.
5. Effective Delegation to Coding Agents
Coding agents were especially useful for the repetitive and implementation-heavy parts of the work, including:
- reading MLIR and StableHLO documentation,
- identifying relevant passes and headers,
- mapping lowering ideas to concrete APIs,
- editing the C++ driver incrementally,
- rebuilding and rerunning after each change,
- inspecting failures and adjusting the pass pipeline, and
- updating the accompanying markdown documentation.
This was valuable because MLIR is conceptually elegant but operationally detailed. Much of the difficulty lies not in the broad lowering idea, but in the implementation burden:
- which headers to include,
- which passes exist in the installed version,
- which interfaces must be registered,
- what order passes should run in, and
- what each stage expects as input.
Conclusion
This project achieved its central goal: building a small but functional MLIR-based compiler driver that lowers StableHLO to textual LLVM IR.
Beyond the executable itself, the work also produced a practical understanding of:
- MLIR’s staged lowering model,
- StableHLO’s role as a portable tensor-level IR,
- one-shot bufferization and lower-level dialect conversion,
- how coding agents can accelerate iterative compiler engineering by handling documentation lookup, code edits, rebuilds, and failure-driven replanning, and
- the realities of version compatibility and packaging when working with LLVM/MLIR-based projects.
The prototype is already a useful educational compiler driver. With output-file support and a cleaner CLI, it can be turned into a more polished standalone tool.