Schedules in Machine Learning Computation: What They Are and Who Needs to Know About Them
What is a schedule?
A schedule is the plan for how a computation is actually executed on hardware, as opposed to what the computation means mathematically. Typical schedule decisions include:
- loop order
- tiling / blocking
- fusion
- vectorization
- parallelization
- memory placement
- GPU thread/block mapping
This matters because the same mathematical computation can have many implementations with very different performance, as the sketch below illustrates.
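A minimal NumPy illustration: both loops compute the same total, but the traversal order, a pure schedule decision, changes the memory access pattern and therefore the runtime.

```python
import time
import numpy as np

x = np.random.rand(8192, 8192).astype(np.float32)

# Two "schedules" for the same reduction: traverse rows (unit-stride,
# cache-friendly reads) or columns (large-stride reads).
t0 = time.perf_counter()
by_rows = sum(float(x[i, :].sum()) for i in range(x.shape[0]))
t1 = time.perf_counter()
by_cols = sum(float(x[:, j].sum()) for j in range(x.shape[1]))
t2 = time.perf_counter()

assert np.isclose(by_rows, by_cols, rtol=1e-3)  # same math...
print(f"rows: {t1 - t0:.3f}s  cols: {t2 - t1:.3f}s")  # ...different speed
```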
How much schedule specification different ML frameworks require
Below is a practical spectrum, ordered roughly from least to most explicit schedule control.
High-level tensor frameworks (eager PyTorch, TensorFlow, Keras)
The programmer usually specifies no schedule directly, and mostly writes:
- tensor ops
- module definitions
- training/inference logic
Scheduling decisions are left to:
- the framework
- backend compilers
- optimized libraries like cuBLAS and cuDNN
Schedule burden on the programmer: very low.
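A minimal eager-PyTorch sketch of this level (layer sizes are arbitrary, chosen for illustration): nothing in the code says how the convolution should be tiled or which kernel should run; that is decided by the framework and the cuDNN-style libraries underneath.

```python
import torch
import torch.nn as nn

# The user describes *what* to compute; loop order, tiling, and kernel
# selection happen inside the framework and its libraries.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(16 * 32 * 32, 10),  # assumes 32x32 inputs
)
x = torch.randn(8, 3, 32, 32)
logits = model(x)  # no schedule appears anywhere in this code
```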
ML compilers (JAX+XLA, TensorFlow+XLA, PyTorch compile paths)
The programmer still usually does not explicitly write the schedule, but they do need to write code that compiles well. That means understanding things like:
- fusion-friendly expression style
- avoiding patterns that trigger recompilation
- avoiding graph breaks
- being aware of shape and layout effects
The schedule is still mostly chosen by the compiler.
Schedule burden on the programmer: low to moderate, mostly indirect.
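A hedged sketch of the difference, assuming a recent PyTorch with torch.compile (exact fusion and graph-break behavior varies across versions):

```python
import torch

def compiler_friendly(x):
    # A pure chain of tensor ops: the compiler can typically fuse
    # these elementwise steps into a single kernel.
    return torch.relu(x * 2.0 + 1.0).sum()

def graph_breaking(x):
    # .item() pulls a value back into Python mid-function, which
    # usually forces a graph break and splits the compiled region.
    threshold = x.mean().item()
    return torch.relu(x - threshold).sum()

fast = torch.compile(compiler_friendly)
slow = torch.compile(graph_breaking)
print(fast(torch.randn(1024, 1024)))
print(slow(torch.randn(1024, 1024)))  # still correct, just compiled in pieces
```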
Halide / TVM-style systems
These systems separate the algorithm (what to compute) from the schedule (how to compute it), and let the programmer control the schedule explicitly.
The programmer can specify or tune:
- tiling
- loop order
- placement of intermediate computations
- vectorization / parallelization
- GPU mapping
This is a much more direct engagement with scheduling.
Schedule burden on the programmer: moderate to high, depending on use.
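A hedged sketch using TVM's classic tensor-expression schedule API (newer TVM releases have moved toward TensorIR, so details vary by version). The algorithm line stays fixed while the schedule lines below it can be swapped and tuned; that separation is the core idea.

```python
import tvm
from tvm import te

# Algorithm: what to compute.
n = 1024
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

# Schedule: how to compute it. Split the loop, run the outer part
# in parallel, and vectorize the inner part.
s = te.create_schedule(B.op)
outer, inner = s[B].split(B.op.axis[0], factor=64)
s[B].parallel(outer)
s[B].vectorize(inner)

# Inspect the scheduled loop nest.
print(tvm.lower(s, [A, B], simple_mode=True))
```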
Triton
Triton does not generally use the word “schedule,” but the programmer is effectively writing many schedule decisions directly into the kernel.
The programmer often controls:
- tile/block sizes
- memory access pattern
- reduction structure
- mapping of work to program instances
- autotuning search space
Schedule burden on the programmer: high, but only for those who write custom kernels.
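As a concrete sketch (assuming Triton and a CUDA GPU are available), here is essentially the canonical vector-add kernel from the Triton tutorials. The tile size, the masking of the ragged tail, and the mapping of work to program instances are all written out by hand:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                             # which program instance am I?
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)   # this instance's tile
    mask = offsets < n_elements                             # guard the ragged tail
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)                      # work-to-instance mapping
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)     # tile size chosen by hand
```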
CUDA C++
CUDA gives the programmer the most explicit control.
The programmer directly manages:
- blocks and threads
- shared memory usage
- synchronization
- memory access strategy
- occupancy and register pressure tradeoffs
This is lower-level than Triton.
Schedule burden on the programmer: very high.
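CUDA C++ itself is beyond a short Python snippet, but the same block/thread/shared-memory model is visible through Numba's CUDA bindings; a minimal block-level sum reduction, assuming numba and a CUDA GPU are available:

```python
import numpy as np
from numba import cuda, float32

TPB = 128  # threads per block: an explicit schedule decision

@cuda.jit
def block_sum(x, partial):
    # Stage one element per thread into shared memory.
    tile = cuda.shared.array(TPB, float32)
    tid = cuda.threadIdx.x
    i = cuda.blockIdx.x * cuda.blockDim.x + tid
    tile[tid] = x[i] if i < x.shape[0] else 0.0
    cuda.syncthreads()                      # explicit synchronization
    stride = TPB // 2
    while stride > 0:                       # tree reduction within the block
        if tid < stride:
            tile[tid] += tile[tid + stride]
        cuda.syncthreads()
        stride //= 2
    if tid == 0:
        partial[cuda.blockIdx.x] = tile[0]

x = np.random.rand(1 << 20).astype(np.float32)
blocks = (x.size + TPB - 1) // TPB
partial = np.zeros(blocks, dtype=np.float32)
block_sum[blocks, TPB](x, partial)          # grid/block mapping chosen by hand
assert np.isclose(partial.sum(), x.sum(), rtol=1e-3)
```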
How important is learning about schedules?
The answer depends on what kind of ML programmer you are.
Ordinary ML user / researcher
Usually, deep knowledge of schedules is not necessary.
What matters more is:
- understanding the framework
- knowing how to express computations in compiler-friendly ways
- using optimized libraries and built-in kernels
- profiling bottlenecks
This is analogous to ordinary CPU programming, where most programmers care more about compiler flags and coding style than about assembly.
Importance of schedule knowledge: low.
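A minimal sketch of what this workflow looks like with torch.profiler: find the hot ops first, and only then decide whether any schedule-level thinking is needed.

```python
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(4096, 4096)
x = torch.randn(64, 4096)

with profile(activities=[ProfilerActivity.CPU]) as prof:
    model(x)

# Schedule-level detail stays hidden; the user just looks for hot ops.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```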
Performance-conscious ML engineer
Some schedule intuition is very useful here, even for engineers who never write a schedule directly.
Helpful concepts include:
- fusion reduces memory traffic
- layouts and shapes affect performance
- materializing intermediates can be expensive
- memory bandwidth often dominates
- not all mathematically equivalent code compiles equally well
This is analogous to knowing about cache locality and vectorization in CPU programming.
Importance of schedule knowledge: moderate, mainly conceptual.
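For example, memory layout is a schedule-adjacent choice this kind of engineer can make without writing any kernel. A hedged PyTorch sketch (whether channels_last actually wins depends on the backend and hardware):

```python
import torch

conv = torch.nn.Conv2d(64, 64, kernel_size=3, padding=1)
x = torch.randn(32, 64, 56, 56)

# Same math, different memory layout. On backends with NHWC-optimized
# kernels, the channels_last version can run noticeably faster.
conv_cl = conv.to(memory_format=torch.channels_last)
x_cl = x.to(memory_format=torch.channels_last)
out = conv_cl(x_cl)
```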
Compiler engineer / kernel author / Triton user
Deep schedule knowledge becomes very important.
They may need to reason directly about:
- loop transformations
- tiling
- memory hierarchy
- register pressure
- occupancy
- autotuning
- code generation tradeoffs
This is analogous to compiler engineering, SIMD kernel writing, or assembly-level optimization in CPU land.
Importance of schedule knowledge: high to essential.
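A minimal pure-NumPy sketch of the kind of loop structure involved: a blocked matmul, where the tile size and the loop nest are explicit schedule choices (the inner block product stands in for the register-level microkernel a real kernel would hand-tune):

```python
import numpy as np

def matmul_tiled(A, B, tile=64):
    """Blocked matmul: identical math to A @ B, different schedule."""
    n, k = A.shape
    _, m = B.shape
    C = np.zeros((n, m), dtype=A.dtype)
    # Loop order and tile size control which sub-blocks stay hot in
    # cache; changing them changes the speed, never the result.
    for i0 in range(0, n, tile):
        for k0 in range(0, k, tile):
            for j0 in range(0, m, tile):
                C[i0:i0 + tile, j0:j0 + tile] += (
                    A[i0:i0 + tile, k0:k0 + tile] @ B[k0:k0 + tile, j0:j0 + tile]
                )
    return C

A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)
assert np.allclose(matmul_tiled(A, B), A @ B, rtol=1e-3, atol=1e-3)
```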
Why ML sometimes goes lower-level than mainstream software
A natural question is: if most ordinary software developers do not write assembly, why is low-level kernel work more visible in ML?
The main reason is that ML is a domain where a small number of numeric kernels often dominate total runtime, while the abstractions, hardware mappings, and model patterns are still evolving quickly.
Why most software engineers do not write assembly
In mainstream software:
- common workloads are relatively stable
- compilers are mature
- standard libraries are strong
- performance bottlenecks are often spread across I/O, networking, databases, and application logic
- portability and maintainability usually matter more than squeezing the last bit of performance from one loop
Even when low-level optimization matters, it is usually hidden inside mature libraries such as:
- BLAS
- compression libraries
- crypto libraries
- media codecs
- database engines
So ordinary programmers can stay at a high level.
Why ML more often exposes low-level concerns
In ML:
- a small number of kernels may dominate runtime
- model architectures evolve quickly
- hardware features evolve quickly
- compiler support is improving but not always sufficient
- optimized libraries do not always cover new model variants
As a result, ML developers more often fall off the standard-library happy path and need custom kernels.
Important clarification
This does not mean that most ML practitioners write Triton or CUDA.
In fact, most do not.
Most ML users stay at the framework level, just as most software engineers stay far above assembly.
The lower-level work is done by specialists:
- kernel authors
- compiler engineers
- inference/runtime teams
- performance-focused infrastructure teams
Better analogy
The right analogy is usually:
- ordinary software engineer ↔ ordinary ML practitioner
- compiler/runtime/library engineer ↔ ML compiler/kernel systems engineer
So the ecosystems are not completely different. The difference is that ML currently has more visible pressure on the performance-critical substrate because:
- the hot loops are more dominant
- the workloads are more algebraic and hardware-sensitive
- the abstractions are less fully settled
Practical consequence
ML today is in a phase where:
- high-level use is the norm for most users
- low-level optimization still delivers unusually large gains in some important cases
- the need for escape hatches like Triton remains higher than the need for assembly in ordinary software