FlagGems Experimental Operators

This document lists all experimental operators in FlagGems that have achieved an average speedup of 0.8x or higher compared to PyTorch implementations.

Performance Overview

  • Total Operators: 142
  • Average Speedup Range: 0.81x - 7.23x
  • Test Environment: Hopper GPU
  • Filtering Criteria: Average speedup ≥ 0.8x
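
The filtering criterion above can be sketched as a simple threshold filter over benchmark records. This is an illustrative sketch only; the record values below are placeholders (including the hypothetical `some_slow_op`), not measured data.

```python
# Minimal sketch of the reporting criterion: keep operators whose
# average speedup over PyTorch is at least 0.8x.
# The records are illustrative placeholders, not measured results.
records = [
    ("_safe_softmax", 7.23),
    ("relu", 1.79),
    ("some_slow_op", 0.42),  # hypothetical: falls below the threshold
]

THRESHOLD = 0.8

kept = [(name, speedup) for name, speedup in records if speedup >= THRESHOLD]
print(kept)  # some_slow_op is excluded from the report
```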

Operators by Performance

| Rank | Operator | Avg Speedup | Category |
|------|----------|-------------|----------|
| 1 | _safe_softmax | 7.23x 🏆 | Internal |
| 2 | digamma_ | 2.41x 🏆 | Math |
| 3 | zero | 1.85x ✅ | Other |
| 4 | relu | 1.79x ✅ | Activation |
| 5 | mse_loss | 1.64x ✅ | Loss |
| 6 | masked_select | 1.47x ✅ | Other |
| 7 | masked_scatter | 1.44x ✅ | Other |
| 8 | eye | 1.43x ✅ | Other |
| 9 | t_copy | 1.41x ✅ | Shape |
| 10 | trace | 1.40x ✅ | Math |
| 11 | i0_ | 1.37x ✅ | Math |
| 12 | zeros_like | 1.32x ✅ | Other |
| 13 | diag | 1.27x ✅ | Other |
| 14 | lift_fresh_copy | 1.24x ✅ | Other |
| 15 | alias_copy | 1.23x ✅ | Other |
| 16 | pixel_unshuffle | 1.20x 📈 | Vision |
| 17 | triu | 1.18x 📈 | Shape |
| 18 | rrelu_with_noise_backward | 1.17x 📈 | Activation |
| 19 | glu | 1.17x 📈 | Activation |
| 20 | tril | 1.16x 📈 | Shape |
| 21 | silu_ | 1.16x 📈 | Activation |
| 22 | asinh_ | 1.14x 📈 | Math |
| 23 | mv | 1.14x 📈 | Linear Algebra |
| 24 | arcsinh_ | 1.13x 📈 | Math |
| 25 | pixel_shuffle | 1.12x 📈 | Vision |
| 26 | replication_pad3d | 1.11x 📈 | Padding |
| 27 | _upsample_nearest_exact1d | 1.11x 📈 | Vision |
| 28 | i0 | 1.11x 📈 | Math |
| 29 | softplus | 1.10x 📈 | Activation |
| 30 | selu_ | 1.10x 📈 | Activation |
| 31 | upsample_nearest1d | 1.10x 📈 | Vision |
| 32 | special_i1 | 1.09x 📈 | Math |
| 33 | selu | 1.09x 📈 | Activation |
| 34 | amin | 1.09x 📈 | Math |
| 35 | sinh_ | 1.09x 📈 | Math |
| 36 | logit_ | 1.08x 📈 | Math |
| 37 | upsample_nearest3d | 1.07x 📈 | Vision |
| 38 | im2col | 1.06x 📈 | Vision |
| 39 | reflection_pad1d | 1.06x 📈 | Padding |
| 40 | elu | 1.06x 📈 | Activation |
| 41 | arctanh_ | 1.05x 📈 | Math |
| 42 | sigmoid | 1.05x 📈 | Activation |
| 43 | replication_pad1d | 1.04x 📈 | Padding |
| 44 | silu | 1.04x 📈 | Activation |
| 45 | sigmoid_ | 1.04x 📈 | Activation |
| 46 | addcdiv | 1.04x 📈 | Arithmetic |
| 47 | sinc_ | 1.03x 📈 | Math |
| 48 | relu6 | 1.03x 📈 | Activation |
| 49 | hardtanh | 1.03x 📈 | Activation |
| 50 | hardtanh_ | 1.03x 📈 | Activation |
| 51 | hardswish_ | 1.03x 📈 | Activation |
| 52 | reciprocal_ | 1.03x 📈 | Math |
| 53 | sinc | 1.03x 📈 | Math |
| 54 | hardsigmoid | 1.03x 📈 | Activation |
| 55 | logaddexp2 | 1.02x 📈 | Math |
| 56 | logit | 1.02x 📈 | Math |
| 57 | arctanh | 1.02x 📈 | Math |
| 58 | logaddexp | 1.02x 📈 | Math |
| 59 | cosh_ | 1.02x 📈 | Math |
| 60 | special_xlog1py | 1.02x 📈 | Math |
| 61 | celu | 1.02x 📈 | Activation |
| 62 | hardsigmoid_ | 1.02x 📈 | Activation |
| 63 | arcsinh | 1.02x 📈 | Math |
| 64 | sign | 1.02x 📈 | Math |
| 65 | absolute_ | 1.01x 📈 | Math |
| 66 | _adaptive_avg_pool3d | 1.01x 📈 | Vision |
| 67 | special_i0e | 1.01x 📈 | Math |
| 68 | cos_ | 1.01x 📈 | Math |
| 69 | deg2rad_ | 1.01x 📈 | Math |
| 70 | floor_ | 1.01x 📈 | Math |
| 71 | negative | 1.01x 📈 | Math |
| 72 | xlogy | 1.01x 📈 | Math |
| 73 | exp2 | 1.01x 📈 | Math |
| 74 | exp_ | 1.00x 📈 | Math |
| 75 | fix | 1.00x 📈 | Math |
| 76 | xlogy_ | 1.00x 📈 | Math |
| 77 | absolute | 1.00x 📈 | Math |
| 78 | prelu | 1.00x 📈 | Activation |
| 79 | hypot | 1.00x 📈 | Math |
| 80 | rad2deg_ | 1.00x 📈 | Math |
| 81 | smooth_l1_loss | 1.00x 📈 | Loss |
| 82 | deg2rad | 1.00x 📈 | Math |
| 83 | log_ | 1.00x 📈 | Math |
| 84 | sgn_ | 1.00x 📈 | Math |
| 85 | sin_ | 1.00x 📈 | Math |
| 86 | heaviside | 1.00x 📈 | Math |
| 87 | logical_xor_ | 1.00x 📈 | Other |
| 88 | trunc | 1.00x 📈 | Math |
| 89 | heaviside_ | 1.00x 📈 | Math |
| 90 | hardshrink | 1.00x 📈 | Activation |
| 91 | huber_loss | 1.00x 📈 | Loss |
| 92 | threshold_ | 1.00x 📈 | Activation |
| 93 | addcmul_ | 1.00x 📈 | Arithmetic |
| 94 | neg_ | 1.00x 📈 | Math |
| 95 | hypot_ | 1.00x 📈 | Math |
| 96 | leaky_relu | 1.00x 📈 | Activation |
| 97 | fmin | 1.00x 📈 | Math |
| 98 | erfinv | 1.00x 📈 | Math |
| 99 | log1p_ | 1.00x 📈 | Math |
| 100 | frac | 1.00x ⚡ | Math |
| 101 | _functional_sym_constrain_range_for_size | 1.00x ⚡ | Internal |
| 102 | expand | 1.00x ⚡ | Shape |
| 103 | lift | 1.00x ⚡ | Other |
| 104 | unsqueeze | 1.00x ⚡ | Shape |
| 105 | _unsafe_view | 1.00x ⚡ | Internal |
| 106 | softshrink | 1.00x ⚡ | Activation |
| 107 | log2_ | 1.00x ⚡ | Math |
| 108 | permute | 1.00x ⚡ | Shape |
| 109 | leaky_relu_ | 1.00x ⚡ | Activation |
| 110 | atanh_ | 1.00x ⚡ | Math |
| 111 | permute_copy | 1.00x ⚡ | Shape |
| 112 | fft_ifftshift | 1.00x ⚡ | Other |
| 113 | copy_ | 1.00x ⚡ | Other |
| 114 | fix_ | 1.00x ⚡ | Math |
| 115 | slice_scatter | 0.99x ⚡ | Other |
| 116 | exp2_ | 0.99x ⚡ | Math |
| 117 | rsqrt_ | 0.99x ⚡ | Math |
| 118 | threshold | 0.98x ⚡ | Activation |
| 119 | reciprocal | 0.97x ⚡ | Math |
| 120 | maximum | 0.97x ⚡ | Arithmetic |
| 121 | abs | 0.96x ⚡ | Math |
| 122 | arccosh | 0.96x ⚡ | Math |
| 123 | multiply | 0.95x ⚡ | Arithmetic |
| 124 | margin_ranking_loss | 0.95x ⚡ | Loss |
| 125 | celu_ | 0.92x ⚡ | Activation |
| 126 | hardswish | 0.91x ⚡ | Activation |
| 127 | soft_margin_loss | 0.90x ⚡ | Loss |
| 128 | replication_pad2d | 0.90x ⚡ | Padding |
| 129 | unsqueeze_copy | 0.89x ⚡ | Shape |
| 130 | native_dropout_backward | 0.89x ⚡ | Other |
| 131 | slice_backward | 0.88x ⚡ | Other |
| 132 | relu_ | 0.86x ⚡ | Activation |
| 133 | negative_ | 0.86x ⚡ | Math |
| 134 | abs_ | 0.86x ⚡ | Math |
| 135 | take | 0.86x ⚡ | Other |
| 136 | sgn | 0.86x ⚡ | Math |
| 137 | erf_ | 0.82x ⚡ | Math |
| 138 | gelu_ | 0.82x ⚡ | Activation |
| 139 | erfinv_ | 0.82x ⚡ | Math |
| 140 | _log_softmax_backward_data | 0.82x ⚡ | Internal |
| 141 | rmsnorm | special ⚡ | Normalization |
| 142 | log10_ | 0.81x ⚡ | Math |

Legend:

  • 🏆 Outstanding: Speedup ≥ 2.0x
  • ✅ Excellent: Speedup ≥ 1.5x
  • 📈 Good: Speedup ≥ 1.0x
  • ⚡ Decent: Speedup ≥ 0.8x
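
The legend can be expressed as a small classifier over the thresholds listed above. Note that the table's ratings appear to be applied to unrounded measurements, so an entry displayed as 1.00x may land in either the 📈 or ⚡ tier depending on its exact value; this sketch only implements the legend as written.

```python
def tier(speedup: float) -> str:
    """Map an average speedup to the legend's rating, using the
    stated thresholds (2.0x / 1.5x / 1.0x / 0.8x)."""
    if speedup >= 2.0:
        return "🏆 Outstanding"
    if speedup >= 1.5:
        return "✅ Excellent"
    if speedup >= 1.0:
        return "📈 Good"
    if speedup >= 0.8:
        return "⚡ Decent"
    return "below reporting threshold"

print(tier(7.23))  # 🏆 Outstanding
print(tier(1.04))  # 📈 Good
```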

Categories

  • Activation: Activation functions (ReLU, GELU, Sigmoid, etc.)
  • Arithmetic: Basic arithmetic operations (add, mul, div, etc.)
  • Comparison: Comparison operations (eq, ne, gt, lt, etc.)
  • Internal: Internal/utility operations
  • Linear Algebra: Matrix operations (matmul, mv, etc.)
  • Loss: Loss functions (MSE, Cross-Entropy, etc.)
  • Math: Mathematical functions (sin, cos, exp, log, etc.)
  • NLP: Natural language processing operations
  • Other: Miscellaneous operations
  • Padding: Padding operations (reflection_pad, replication_pad, etc.)
  • Shape: Shape manipulation operations
  • Vision: Computer vision operations

Notes

  • All operators have passed accuracy tests
  • Performance measured on Hopper GPU with various input shapes
  • Speedup calculated as: PyTorch_time / FlagGems_time
  • Higher values indicate better performance
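
The speedup formula in the notes can be illustrated with a minimal timing harness. This is a stdlib-only sketch with stand-in CPU workloads; a real GPU benchmark would run the PyTorch op and its FlagGems counterpart on identical inputs and synchronize the device (e.g. torch.cuda.synchronize()) before reading the clock.

```python
import time

def avg_time(fn, warmup=3, iters=10):
    """Average wall-clock time of fn() after a few warmup calls."""
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters

# Stand-in workloads; in practice these would be the PyTorch
# implementation and the FlagGems implementation of the same op.
def baseline():
    return sum(i * i for i in range(10_000))

def candidate():
    return sum(i * i for i in range(10_000))

# Speedup as defined in the notes: PyTorch_time / FlagGems_time.
speedup = avg_time(baseline) / avg_time(candidate)
print(f"speedup: {speedup:.2f}x")
```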