Building Custom Models Using FlagGems modules#

In some scenarios, users may want to build their own models from scratch or adapt existing ones to better suit their specific use cases. To support this, FlagGems provides a growing collection of high-performance modules commonly used in large language models (LLMs).

These components are implemented with FlagGems-accelerated operators and can be used in the same way as any standard `torch.nn.Module`. You can seamlessly integrate them into your system to benefit from kernel-level acceleration without writing custom CUDA or Triton code.

Modules can be found in `flag_gems/modules`.

Modules Available#

| Module | Description | Supported Features |
| --- | --- | --- |
| `GemsRMSNorm` | RMS layer normalization | Fused residual add, inplace and outplace |
| `GemsRope` | Standard rotary position embedding | Inplace and outplace |
| `GemsDeepseekYarnRoPE` | RoPE with YaRN extrapolation for DeepSeek-style LLMs | Inplace and outplace |
| `GemsSiluAndMul` | Fused SiLU activation with elementwise multiplication | Outplace only |
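For reference, the computation that `GemsRMSNorm` accelerates is plain RMS normalization: each element is scaled by the reciprocal root-mean-square of its vector, then multiplied by a learned weight. The sketch below is a minimal pure-Python illustration of that semantics; the `eps` default is an illustrative assumption, and the actual module operates on GPU tensors via fused Triton kernels rather than Python lists.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    # Reciprocal root-mean-square of the input vector; eps avoids
    # division by zero (the 1e-6 default is illustrative).
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    # Scale each element by 1/rms, then apply the learned weight.
    return [v / rms * w for v, w in zip(x, weight)]
```

The "fused residual add" feature in the table corresponds to computing `rms_norm(x + residual)` in a single kernel instead of materializing the intermediate sum.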

We encourage users to use these as drop-in replacements for the equivalent PyTorch layers. More components such as fused attention, MoE layers, and transformer blocks are under development.
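As another reference point, `GemsSiluAndMul` fuses the SwiGLU-style gating pattern `silu(x) * y` that appears in many LLM feed-forward blocks. A minimal pure-Python sketch of that semantics (the accelerated module performs this elementwise on tensors in one kernel):

```python
import math

def silu_and_mul(x, y):
    # SiLU gate, silu(v) = v * sigmoid(v), multiplied elementwise by y.
    return [xi / (1.0 + math.exp(-xi)) * yi for xi, yi in zip(x, y)]
```

Fusing the activation with the multiply avoids writing the intermediate `silu(x)` back to memory, which is where the kernel-level speedup comes from.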