Hugging Face Model Repositories: Organization, Semantics, and Portability
What a Hugging Face model repository is
A Hugging Face model repository does not necessarily contain a complete, formal description of the model computation. It is best understood as a versioned artifact repository. A typical repository contains a config.json file, one or more safetensors weight files, and tokenizer files.
- config.json contains metadata and hyperparameters, but it does not fully define arbitrary model semantics.
- safetensors contains named tensors, but tensor names are only conventions unless interpreted by code.
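To make the point about named tensors concrete, here is a minimal sketch that writes and then reads a safetensors header using only the standard library. It assumes the published layout (an 8-byte little-endian header length, a JSON header, then raw tensor bytes). The helper names and the tensor name are hypothetical, and the tiny writer is for illustration only; real files are better produced with the safetensors library.

```python
import json
import struct


def build_safetensors(tensors):
    """Serialize {name: (dtype, shape, raw_bytes)} into safetensors-style bytes."""
    header, payload = {}, b""
    for name, (dtype, shape, raw) in tensors.items():
        start = len(payload)
        payload += raw
        # Offsets are relative to the start of the data section, not the file.
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [start, len(payload)]}
    header_bytes = json.dumps(header).encode("utf-8")
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_header(blob):
    """Return the tensor index: names, dtypes, shapes, and byte offsets."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len].decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form string metadata
    return header


# Hypothetical tensor name and values, chosen only for this example.
raw = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
blob = build_safetensors({"model.embed_tokens.weight": ("F32", [2, 2], raw)})
header = read_header(blob)
for name, info in header.items():
    print(name, info["dtype"], info["shape"], info["data_offsets"])
```

Nothing in the header says how the tensor is used. The name only becomes an embedding table when some code decides to treat it as one.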
Where the model definition lives
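There is usually no separate file called a “model definition.” Instead, model structure is determined by a combination of the fields in config.json, the Python code that interprets them, and the names and shapes of the tensors in the weight files.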
Standard Transformers-supported models
For known architectures, config.json contains fields such as model_type and architectures.
In this case, the model definition is not stored in the repository; it is supplied by the installed transformers library, which reads model_type and maps it to an internal Python implementation.
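The dispatch described above can be pictured as a lookup from model_type to a class. The sketch below is a toy registry, not the transformers internals: ToyLlamaModel, ToyGPT2Model, MODEL_REGISTRY, and build_model are hypothetical names. It also shows that the same concept (hidden width) can live under different config keys, which only the implementing code knows.

```python
import json


class ToyLlamaModel:
    """Hypothetical stand-in for a library-provided implementation."""
    def __init__(self, config):
        self.hidden_size = config["hidden_size"]


class ToyGPT2Model:
    """Hypothetical stand-in; this family stores the width under a different key."""
    def __init__(self, config):
        self.hidden_size = config["n_embd"]


# The registry lives in the library, not in the model repository.
MODEL_REGISTRY = {"llama": ToyLlamaModel, "gpt2": ToyGPT2Model}


def build_model(config_text):
    """Pick an implementation based solely on the model_type field."""
    config = json.loads(config_text)
    model_type = config["model_type"]
    try:
        cls = MODEL_REGISTRY[model_type]
    except KeyError:
        raise ValueError(f"unsupported model_type: {model_type!r}") from None
    return cls(config)


model = build_model('{"model_type": "llama", "hidden_size": 4096}')
print(type(model).__name__, model.hidden_size)
```

A config with an unregistered model_type fails here with a ValueError, which mirrors why an older library version cannot load a newer architecture from config.json alone.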
Custom models
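If the architecture is not built into transformers, the repository may include Python source files that define it, conventionally named along the lines of configuration_<name>.py and modeling_<name>.py.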
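The config.json may contain an auto_map field that maps Auto classes, such as AutoConfig and AutoModelForCausalLM, to classes defined in the repository's Python files.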
Users then load the model by passing trust_remote_code=True to from_pretrained.
This downloads and executes Python code from the repository.
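As a rough illustration of the custom-model path, the sketch below parses a hypothetical config.json with an auto_map entry and works out which repository file and class each entry points to. The file names, class names, and the resolve_auto_map helper are assumptions made for this example, and only the plain module.Class reference form is handled. The load_custom_model function shows the usual loading call; it is defined but not executed here, since it needs the transformers package and network access.

```python
import json

# Hypothetical config.json content for a custom architecture.
CONFIG_TEXT = """
{
  "model_type": "my_model",
  "auto_map": {
    "AutoConfig": "configuration_my_model.MyModelConfig",
    "AutoModelForCausalLM": "modeling_my_model.MyModelForCausalLM"
  }
}
"""


def resolve_auto_map(config, auto_class):
    """Return (python_file, class_name) for an Auto class, or None if unmapped."""
    ref = config.get("auto_map", {}).get(auto_class)
    if ref is None:
        return None
    module, _, class_name = ref.rpartition(".")
    return module + ".py", class_name


def load_custom_model(repo_id):
    """Load a model whose code ships in the repository (not called in this sketch)."""
    from transformers import AutoModelForCausalLM  # requires transformers installed
    # trust_remote_code=True allows the repository's Python files to run locally.
    return AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)


config = json.loads(CONFIG_TEXT)
print(resolve_auto_map(config, "AutoModelForCausalLM"))
```

Because the referenced file is executed on the loading machine, this path is best reserved for repositories whose code has been reviewed or whose publisher is trusted.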
Hugging Face as an artifact host for a separate runtime
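A separate optimized runtime can use Hugging Face only as a host for artifacts such as weight files, config.json, and tokenizer files, while supplying its own implementation of the model.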
Most runtimes do not support arbitrary HF repositories. They support a known list of architectures.
Internally, they have architecture-specific loaders.
Each loader knows:
- expected config fields
- tensor naming patterns
- shape expectations
- attention layout
- RoPE conventions
- MLP/MoE structure
- quantization format
- how to pack weights
- how to construct the runtime execution plan
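The first few items in this list can be sketched as a small validation step. The code below is a simplified illustration, not any particular runtime's loader: LoaderSpec, LOADERS, and validate are hypothetical names, and the tensor name templates are an abbreviated sample of the Llama-style naming used in transformers checkpoints. A real loader would also check shapes, dtypes, and quantization metadata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoaderSpec:
    """What one architecture-specific loader expects to find."""
    required_config: tuple
    per_layer_tensors: tuple  # name templates, with {i} as the layer index


# Hypothetical, abbreviated spec; a real one lists every tensor per layer.
LOADERS = {
    "llama": LoaderSpec(
        required_config=("hidden_size", "num_hidden_layers", "num_attention_heads"),
        per_layer_tensors=(
            "model.layers.{i}.self_attn.q_proj.weight",
            "model.layers.{i}.mlp.down_proj.weight",
        ),
    ),
}


def validate(config, tensor_names):
    """Return a list of problems; an empty list means the checkpoint looks loadable."""
    model_type = config.get("model_type")
    spec = LOADERS.get(model_type)
    if spec is None:
        return [f"unsupported architecture: {model_type!r}"]
    problems = [f"missing config field: {key}"
                for key in spec.required_config if key not in config]
    for i in range(config.get("num_hidden_layers", 0)):
        for template in spec.per_layer_tensors:
            name = template.format(i=i)
            if name not in tensor_names:
                problems.append(f"missing tensor: {name}")
    return problems


config = {"model_type": "llama", "hidden_size": 64,
          "num_hidden_layers": 1, "num_attention_heads": 4}
names = {"model.layers.0.self_attn.q_proj.weight"}
print(validate(config, names))
```

The design choice worth noting is that unsupported architectures are rejected up front rather than guessed at, which matches the "known list" behavior described above.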
Static framework conversion
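There is no generally reliable path from an arbitrary Hugging Face repository (config, weights, and Python modeling code) to an equivalent implementation in a static framework or runtime.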
Tracing and export tools may work for simple cases.
But they can fail or produce poor results when the model has:
- dynamic control flow
- shape-dependent behavior
- MoE routing
- custom kernels
- cache abstractions
- unsupported ops
- complex quantization
- backend-specific memory layouts
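The first item, dynamic control flow, is the easiest to demonstrate. The sketch below is a toy tracer written for this example, not the API of any real export tool: Proxy, trace, and replay are hypothetical names. It records operations while running the model on one example input, which is the basic idea behind trace-based capture, and shows that the branch taken for that input is silently baked into the recorded graph.

```python
class Proxy:
    """Stands in for a tensor during tracing and records each operation."""
    def __init__(self, name, value, graph):
        self.name, self.value, self.graph = name, value, graph

    def _record(self, op, constant, result):
        out = Proxy(f"v{len(self.graph)}", result, self.graph)
        self.graph.append((out.name, op, self.name, constant))
        return out

    def __mul__(self, k):
        return self._record("mul", k, self.value * k)

    def __add__(self, k):
        return self._record("add", k, self.value + k)

    def __gt__(self, k):
        # Returns a plain bool, so the comparison never appears in the graph.
        return self.value > k


def model(x):
    """A model with data-dependent control flow."""
    if x > 0:
        return x * 2
    return x + 10


def trace(fn, example):
    """Run fn once on an example input and return the recorded operations."""
    graph = []
    fn(Proxy("x", example, graph))
    return graph


def replay(graph, x):
    """Execute the recorded graph on a new input."""
    env = {"x": x}
    for out, op, src, k in graph:
        env[out] = env[src] * k if op == "mul" else env[src] + k
    return env[graph[-1][0]]


graph = trace(model, 3)  # traced with a positive example
print(graph)
print("eager:", model(-1), "replayed:", replay(graph, -1))
```

The replayed graph agrees with the original function for positive inputs and disagrees for negative ones, without raising any error. Real exporters have ways to detect or encode some of these cases, but the failure mode is the same in kind.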
A correct and efficient port often requires manual, architecture-specific work rather than automatic conversion.