Adding a New Backend#
1. Introduction#
The flag_gems accelerated operator library from the FlagGems project can be used on multiple backends.
If you are a chip vendor and wish to contribute backend-specific optimizations for your hardware,
this document describes how to integrate those optimizations into FlagGems.
2. Create a backend directory#
All vendor-specific optimization code resides in the src/flag_gems/runtime/backend directory.
You can start by creating a folder under this directory to identify your backend,
following the naming pattern <_vendor-name>. As an example, all NVIDIA-specific customizations
can be found in src/flag_gems/runtime/backend/_nvidia.
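For instance, assuming a hypothetical vendor name `mycorp` (illustrative only; substitute your own), the directory could be created like this:

```shell
# "_mycorp" is a placeholder backend directory name; replace it with
# the <_vendor-name> pattern for your own hardware.
mkdir -p src/flag_gems/runtime/backend/_mycorp
```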
3. Initialize the directory#
Create the necessary files, including but not limited to the __init__.py file,
the heuristics_config_utils.py file, and the tune_configs.yaml file, as well as a folder named ops.
The expected directory layout is shown in the following example:
```
├── __init__.py
├── heuristics_config_utils.py
├── ops
│   ├── __init__.py
│   ├── add.py
│   ├── gelu.py
│   └── (other operators ...)
└── tune_configs.yaml
```

3.1 About __init__.py file#
An easy way to create this file is to copy one from an existing vendor
(say src/flag_gems/runtime/backend/_nvidia/__init__.py).
After creating your __init__.py file, the only change you need to make is
to configure the properties of the VendorInfoBase class:
```python
vendor_info = VendorInfoBase(
    vendor_name="<your vendor name>",
    device_name="<the device name>",
    device_query_cmd="<command for querying hardware info>"
)
```

The important properties for VendorInfoBase are:
- `vendor_name`: the vendor name of your choice, e.g. `nvidia`;
- `device_name`: the name of your accelerator device, e.g. `cuda`;
- `device_query_cmd`: the command line used to check the hardware devices on the node, e.g. `nvidia-smi`;
- `dispatch_key`: an optional property for registering operators to `torch.library.Library` in PyTorch, e.g. `PrivateUse1`.
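To make the fields concrete, here is a sketch with hypothetical values for a vendor named `mycorp`. The stand-in dataclass below only illustrates the fields; in your real __init__.py, import the actual VendorInfoBase class the same way the NVIDIA backend does.

```python
from dataclasses import dataclass

# Stand-in for FlagGems' VendorInfoBase, defined here only so the
# example is self-contained; do not redefine it in a real backend.
@dataclass
class VendorInfoBase:
    vendor_name: str
    device_name: str
    device_query_cmd: str

# Hypothetical values for a vendor named "mycorp"; replace with your own.
vendor_info = VendorInfoBase(
    vendor_name="mycorp",
    device_name="cuda",
    device_query_cmd="mycorp-smi",
)
```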
3.2 The heuristics_config_utils.py file#
In the heuristics_config_utils.py file, you will configure the triton.heuristics parameters.
You can refer to src/flag_gems/runtime/backend/_nvidia/heuristics_config_utils.py
for an example.
3.3 The tune_configs.yaml file#
In the tune_configs.yaml file, you can customize triton.autotune parameters.
Similarly, you can refer to src/flag_gems/runtime/backend/_nvidia/tune_configs.yaml
for an example.
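The schema is vendor-defined; as a rough sketch (the operator name and field layout here mirror the general shape of the NVIDIA file, but treat every name as an assumption and check the real file), an entry might look like:

```yaml
# Illustrative sketch: a per-operator list of candidate
# triton.autotune configurations to benchmark at runtime.
add:
  - META:
      BLOCK_SIZE: 1024
    num_warps: 4
  - META:
      BLOCK_SIZE: 2048
    num_warps: 8
```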
3.4 The ops directory#
The ops directory is where vendor-customized operator implementations are stored.
For instance, if you want to create a custom add operator, you will place the implementation
in ops/add.py. Following that, you should update ops/__init__.py accordingly,
as shown in the following example. The __all__ list in the ops/__init__.py file
ensures that your implementations of add and gelu are accessible from external
packages.
```python
from .add import add
from .gelu import gelu

__all__ = ["add", "gelu"]
```
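As a hedged illustration of what such a file might contain, here is a minimal, self-contained ops/add.py sketch built around a plain Triton kernel. Real FlagGems implementations additionally wire in the autotune and heuristics configuration described above; the kernel below is a simplified assumption, not the library's actual add.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x, y):
    # Contiguous inputs keep the flat pointer arithmetic valid.
    x, y = x.contiguous(), y.contiguous()
    out = torch.empty_like(x)
    n_elements = out.numel()
    # 1-D launch grid sized to cover all elements.
    grid = lambda meta: (triton.cdiv(n_elements, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n_elements, BLOCK_SIZE=1024)
    return out
```

In a real backend you would typically decorate the kernel with the vendor's autotune and heuristics configurations rather than hard-coding BLOCK_SIZE.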