Building Custom Models Using FlagGems modules#
In some scenarios, users may want to build their own models from scratch or adapt existing ones to better suit their specific use cases. To support this, FlagGems provides a growing collection of high-performance modules commonly used in large language models (LLMs).
These components are implemented with FlagGems-accelerated operators
and can be used in the same way as any standard `torch.nn.Module`.
You can integrate them seamlessly into your model to benefit from kernel-level acceleration
without writing any custom CUDA or Triton code.
The modules can be found in `flag_gems/modules`.
Modules Available#
| Module | Description | Supported Features |
|---|---|---|
| `GemsRMSNorm` | Root mean square normalization (RMSNorm) | Fused residual add; in-place and out-of-place |
| `GemsRope` | Standard rotary position embedding (RoPE) | In-place and out-of-place |
| `GemsDeepseekYarnRoPE` | RoPE with YaRN extrapolation for DeepSeek-style LLMs | In-place and out-of-place |
| `GemsSiluAndMul` | Fused SiLU activation with elementwise multiplication | Out-of-place only |
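For reference, the fused SiLU-and-mul pattern computes `silu(gate) * up` over the two halves of a gated MLP projection. The sketch below shows these semantics in plain PyTorch; whether `GemsSiluAndMul` takes one packed tensor or two separate tensors is an assumption here, not the confirmed FlagGems API.

```python
import torch
import torch.nn.functional as F


def silu_and_mul(x: torch.Tensor) -> torch.Tensor:
    """Reference semantics of a fused SiLU-and-mul.

    The last dimension is split in half: SiLU is applied to the first
    half (the gate) and multiplied elementwise by the second half.
    """
    gate, up = x.chunk(2, dim=-1)
    return F.silu(gate) * up


x = torch.randn(2, 8)
y = silu_and_mul(x)
print(y.shape)  # torch.Size([2, 4])
```

A fused kernel avoids materializing the intermediate `silu(gate)` tensor, which is why this pattern is worth accelerating in LLM feed-forward blocks.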
We encourage users to adopt these modules as drop-in replacements for the equivalent PyTorch layers. More components, such as fused attention, MoE layers, and transformer blocks, are under development.
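As a minimal sketch of the drop-in pattern, the snippet below defines a plain-PyTorch RMSNorm reference and swaps in the FlagGems-accelerated module when the package is installed. The `GemsRMSNorm` constructor signature shown here (hidden size, epsilon) is an assumption, not the confirmed API.

```python
import torch
import torch.nn as nn


class RefRMSNorm(nn.Module):
    """Plain-PyTorch RMSNorm reference: x / sqrt(mean(x^2) + eps) * weight."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        variance = x.pow(2).mean(dim=-1, keepdim=True)
        return x * torch.rsqrt(variance + self.eps) * self.weight


try:
    # Use the FlagGems-accelerated module when available
    # (constructor arguments assumed to match the reference above).
    from flag_gems.modules import GemsRMSNorm as RMSNorm
except ImportError:
    RMSNorm = RefRMSNorm

norm = RMSNorm(64)
x = torch.randn(2, 8, 64)
y = norm(x)
print(y.shape)
```

Because both classes expose the same forward contract, the rest of the model is unchanged whichever branch is taken; this is the drop-in replacement style the modules are designed for.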