| 1 | abs | Math | Stable | 1.0 | aten, pointwise | Computes the absolute value of each element in input.
This is a simple wrapper of the existing torch abs operator. |
| 2 | abs_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of abs(), which is a simple wrapper of the Torch abs operator. |
| 3 | absolute | Math | Beta | 5.0 | aten, KernelGen | This is an alias for abs() with the low-level operations implemented
by invoking low-level Torch operators. |
| 4 | acos | Math | Stable | 5.0 | aten, pointwise | Returns a new tensor with the arccosine (in radians) of each element in input. |
| 5 | act_quant_triton | Quantization | Beta | 5.1 | fused | This is a fused operator. |
| 6 | adaptive_avg_pool3d | NeuralNetwork | Alpha | 5.0 | aten, nn.functional, KernelGen | Apply a 3D adaptive average pooling over an input signal composed of several input planes. |
| 7 | adaptive_avg_pool3d_out | NeuralNetwork | Alpha | 5.0 | aten, nn.functional, KernelGen | A variant of _adaptive_avg_pool3d that assigns the output to the out tensor. |
| 8 | add | Math | Stable | 1.0 | aten, pointwise | Add a scalar or tensor to self tensor. If both alpha and other are specified,
each element of other is scaled by alpha before being used. |
| 9 | add_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of add(). |
| 10 | add_rms_norm | NeuralNetwork | Alpha | 5.1 | aten, KernelGen, Normalization | Add two inputs element-wise and apply Root Mean Square Layer Normalization. |
| 11 | addcdiv | LinearAlg | Stable | 4.0 | aten, pointwise | Performs the element-wise division of tensor1 by tensor2, multiplies the result
by the scalar value and adds it to input. |
| 12 | addcdiv_out | LinearAlg | Stable | 5.1 | aten, pointwise, KernelGen | A variant of addcdiv() that assigns the output to the given out parameter. |
| 13 | addcmul | LinearAlg | Stable | 4.0 | aten, pointwise | Performs the element-wise multiplication of tensor1 by tensor2,
multiplies the result by the scalar value and adds it to input. |
| 14 | addcmul_out | LinearAlg | Alpha | 5.0 | aten, pointwise, KernelGen | A variant of addcmul that allows the output to be assigned to out. |
| 15 | addmm | BLAS | Stable | 1.0 | aten | Performs a matrix multiplication of the matrices mat1 and mat2.
The matrix input is added to the final result. |
| 16 | addmm_dtype | BLAS | Beta | 5.1 | aten | A variant of addmm that allows the dtype of the output tensor to be specified.
This is supported only on CUDA and for torch.float32 given torch.float16 or torch.bfloat16 input dtypes. |
| 17 | addmm_dtype_out | BLAS | Beta | 5.1 | aten | A variant of addmm_dtype() that allows the output to be saved to the provided out parameter. |
| 18 | addmm_out | BLAS | Stable | 4.0 | aten | A variant of addmm that assigns the output to the provided out parameter. |
| 19 | addmv | BLAS | Stable | 4.0 | aten | Performs a matrix-vector product of the matrix mat and the vector vec.
The vector input is added to the final result. |
| 20 | addmv_out | BLAS | Stable | 4.0 | aten | Performs a matrix-vector product of the matrix mat and the vector vec.
The vector input is added to the final result. This is a variant of addmv with out specified. |
| 21 | addr | BLAS | Stable | 4.0 | aten | Performs the outer-product of vectors vec1 and vec2
and adds it to the matrix input. |
| 22 | affine_grid_generator | Tensor | Alpha | 5.1 | aten, KernelGen, pointwise | Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta. |
| 23 | alias_copy | Tensor | Beta | 5.0 | aten, KernelGen | Creates a new tensor that shares the same storage data as the original tensor,
but without preserving the original tensor's metadata (like shape or strides)
in a way that links future mutations. |
| 24 | alias_copy_out | Tensor | Beta | 5.0 | aten, KernelGen | A variant of alias_copy() that assigns the output to the out tensor. |
| 25 | all | Math | Stable | 2.0 | aten, Reduction | Tests if all elements in input evaluate to True. |
| 26 | all_dim | Math | Stable | 2.0 | aten, Reduction | For each row of input in the given dimension dim, returns True if all elements
in the row evaluate to True and False otherwise. |
| 27 | all_dims | Math | Stable | 2.0 | aten, Reduction | A variant of all. |
| 28 | allclose | Math | Stable | 2.1 | aten | This function checks if input and other satisfy a condition specified via
atol and rtol elementwise, for all elements of input and other. |
| 29 | amax | LinearAlg | Stable | 2.0 | aten, Reduction | Returns the maximum value of each slice of the input tensor in the given dimension(s) dim. |
| 30 | aminmax | Tensor | Beta | 5.1 | aten | Computes the minimum and maximum values of the input tensor. |
| 31 | angle | Math | Stable | 3.0 | aten, pointwise | Computes the element-wise angle (in radians) of the given input tensor. |
| 32 | any | Math | Stable | 2.0 | aten, Reduction | Tests if any element in input evaluates to True. |
| 33 | any_dim | Math | Stable | 2.0 | aten, Reduction | For each row of input in the given dimension dim, returns True if any element in the row evaluates to True and False otherwise. |
| 34 | any_dims | Math | Stable | 2.0 | aten, Reduction | For each row of input in the given dimensions in dims, returns True if any element in the row evaluates to True and False otherwise.
dims is a tuple of ints indicating the dimensions to reduce. |
| 35 | apply_repetition_penalties | NeuralNetwork | Stable | 5.0 | fused, vLLM | Modifies logit tensors in place to penalize tokens that have already appeared in the generated sequence. |
| 36 | apply_rotary_pos_emb | NeuralNetwork | Stable | 2.0 | fused | A method to incorporate positional information into the Transformer architecture.
Rotary Positional Embedding (RoPE) applies position-dependent rotation to the query (Q)
and key (K) vectors before computing the attention score. |
| 37 | arange | Tensor | Stable | 2.1 | aten | Returns a 1-D tensor of size ceiling((end−start)/step) with values from the interval [start, end)
taken with common difference step beginning from start. |
| 38 | arange_start | Tensor | Stable | 2.1 | aten | A variant of arange, with start and/or step specified. |
| 39 | arange_start_step | Tensor | Stable | 2.1 | aten | A variant of arange, with start and/or step specified. |
| 40 | arcsinh | Math | Beta | 5.0 | aten, KernelGen | Performs an element-wise inverse hyperbolic sine computation on the given tensor. |
| 41 | arcsinh_ | Math | Beta | 5.0 | aten, KernelGen | The in-place version of arcsinh(). |
| 42 | arcsinh_out | Math | Beta | 5.0 | aten, KernelGen | A variant of arcsinh that allows the output to be assigned to the out tensor. |
| 43 | arctanh_ | Math | Beta | 5.0 | aten, KernelGen | Computes the element-wise inverse hyperbolic tangent of a given input tensor.
This is an in-place version. |
| 44 | argmax | LinearAlg | Stable | 2.0 | aten, Reduction | Returns the indices of the maximum value of all elements in the input tensor. |
| 45 | argmin | LinearAlg | Stable | 2.2 | aten, Reduction | Returns the indices of the minimum value(s) of the flattened tensor or along a dimension. |
| 46 | asinh | Math | Beta | 5.1 | aten, KernelGen | Returns a new tensor with the inverse hyperbolic sine of the elements of input. |
| 47 | asinh_ | Math | Beta | 5.0 | aten, KernelGen | Computes the inverse hyperbolic sine for each element of a tensor in-place. |
| 48 | as_strided_copy | Tensor | Beta | 5.1 | aten, KernelGen | Creates a contiguous copy of an as_strided view of the input tensor. |
| 49 | as_strided_copy_out | Tensor | Beta | 5.1 | aten, KernelGen | A variant of as_strided_copy() that assigns the output to the out tensor. |
| 50 | assert_async | Tensor | Stable | 5.1 | utility | A utility used to perform data-dependent assertions on GPU tensors
without triggering an immediate, performance-heavy GPU-to-CPU synchronization. |
| 51 | atan | Math | Stable | 4.0 | aten, pointwise | Returns a new tensor with the arctangent of the elements (in radians) in the input tensor. |
| 52 | atan_ | Math | Stable | 4.0 | aten, pointwise | The in-place version of atan(). |
| 53 | atan2 | Math | Stable | 5.1 | aten, pointwise | Computes the element-wise arc tangent of input/other(y/x),
returning angles in radians between -PI and PI. |
| 54 | atan2_out | Math | Beta | 5.1 | aten, pointwise | A variant of atan2 that allows the output to be saved into out. |
| 55 | avg_pool2d | NeuralNetwork | Stable | 4.1 | nn.functional | Applies a 2D average-pooling operation over kH \times kW regions with step size sH \times sW.
The number of output features is equal to the number of input planes.
This is for the forward case. |
| 56 | avg_pool2d_backward | NeuralNetwork | Stable | 4.1 | aten | The backward version of avg_pool2d(). |
| 57 | avg_pool3d | NeuralNetwork | Beta | 5.1 | aten | Applies a 3D average-pooling operation over kD \times kH \times kW regions with step size
sD \times sH \times sW. |
| 58 | avg_pool3d_backward | NeuralNetwork | Alpha | 5.1 | aten | This is the backward version of avg_pool3d(). |
| 59 | baddbmm | BLAS | Stable | 4.1 | aten | Performs a batch matrix-matrix product of matrices in batch1 and batch2.
input is added to the final result. batch1 and batch2 must be 3-D tensors
each containing the same number of matrices. |
| 60 | baddbmm.out | BLAS | Beta | 5.1 | aten | This is a variant of baddbmm(). |
| 61 | batch_norm | NeuralNetwork | Stable | 3.0 | aten | An internal operator used for implementing the BatchNorm functionality. |
| 62 | batch_norm_backward | NeuralNetwork | Stable | 3.0 | aten | The backward version of batch_norm(). |
| 63 | bernoulli_ | Tensor | Beta | 5.1 | aten, skip_precision_check, KernelGen | Draws binary random numbers (0 or 1) from a Bernoulli distribution. |
| 64 | bincount | Reduction | Stable | 5.0 | aten, pointwise, KernelGen | Count the frequency of each value in an array of non-negative integers. |
| 65 | bitwise_and_scalar | Math | Stable | 2.0 | aten, pointwise | Computes the bitwise AND of input and other scalar. |
| 66 | bitwise_and_scalar_ | Math | Stable | 2.2 | aten, pointwise | The in-place, scalar version of bitwise_and(). |
| 67 | bitwise_and_scalar_tensor | Math | Stable | 2.0 | aten, pointwise | A variant of bitwise_and(). |
| 68 | bitwise_and_tensor | Math | Stable | 2.0 | aten, pointwise | The Tensor method version of bitwise_and(). |
| 69 | bitwise_and_tensor_ | Math | Stable | 2.2 | aten, pointwise | The in-place, Tensor method version of bitwise_and(). |
| 70 | bitwise_left_shift | Math | Stable | 4.0 | aten, pointwise | Computes the left arithmetic shift of input by other bits. |
| 71 | bitwise_not | Math | Stable | 2.0 | aten, pointwise | Computes the bitwise NOT of the given input tensor. |
| 72 | bitwise_not_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of bitwise_not(). |
| 73 | bitwise_or_scalar | Math | Stable | 2.0 | aten, pointwise | Computes the bitwise OR of input and other scalar. |
| 74 | bitwise_or_scalar_ | Math | Stable | 2.2 | aten | The in-place version of bitwise_or_scalar. |
| 75 | bitwise_or_scalar_tensor | Math | Stable | 2.0 | aten, pointwise | Computes the bitwise OR of input and other. |
| 76 | bitwise_or_tensor | Math | Stable | 2.0 | aten, pointwise | Computes the bitwise OR of input and other. This is the Tensor method variant. |
| 77 | bitwise_or_tensor_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of bitwise_or_tensor(). |
| 78 | bitwise_right_shift | Math | Stable | 4.0 | aten, pointwise | Computes the right arithmetic shift of input by other bits. |
| 79 | bmm | BLAS | Stable | 1.0 | aten | Performs a batch matrix-matrix product of matrices stored in input and mat2. |
| 80 | bmm_out | BLAS | Stable | 5.0 | aten | Performs a batch matrix-matrix product of matrices stored in input and mat2.
This is a variant of bmm with out specified. |
| 81 | bucket_sort_topk | NeuralNetwork | Beta | 5.1 | fused, DSA | A wrapper around the TLE and Triton versions of the bucket-sort top-k operation. |
| 82 | cat | Tensor | Stable | 2.2 | aten | Concatenates the given sequence of tensors in tensors in the given dimension. |
| 83 | cat_out | Tensor | Stable | 2.2 | aten | A variant of cat that assigns the result to the provided out parameter. |
| 84 | cauchy | Distribution | Beta | 5.1 | aten | Draws random numbers from a Cauchy distribution. |
| 85 | cauchy_ | Distribution | Beta | 5.1 | aten | Fills the tensor with numbers drawn from the Cauchy distribution. |
| 86 | ceil | Math | Stable | 5.0 | aten, pointwise | Returns a new tensor with the ceil of the elements of input, the smallest integer greater than
or equal to each element. |
| 87 | ceil_ | Math | Stable | 5.0 | aten, pointwise | The in-place version of ceil(). |
| 88 | ceil_out | Math | Stable | 5.0 | aten, pointwise | A variant of ceil() with out specified. |
| 89 | celu | NeuralNetwork | Stable | 4.0 | aten, nn.functional, pointwise | Applies the CELU (Continuously Differentiable Exponential Linear Unit)
activation function element-wise. |
| 90 | celu_ | NeuralNetwork | Stable | 4.0 | aten, nn.functional, pointwise | The in-place version of celu(). |
| 91 | chunk_gated_delta_rule_fwd | Attention | Alpha | 5.0 | fused, FLA | The forward case for ChunkGatedDeltaRuleFunction with Flash Linear Attention (FLA). |
| 92 | clamp | Math | Stable | 2.0 | aten, pointwise | Clamps all elements in input into the range [min, max]. |
| 93 | clamp_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of clamp(). |
| 94 | clamp_max | Math | Alpha | 5.1 | aten, KernelGen, pointwise | Clamps all elements in input to be smaller than or equal to max. |
| 95 | clamp_max_ | Math | Alpha | 5.1 | aten, KernelGen, pointwise | The in-place version of clamp_max(). |
| 96 | clamp_min | Math | Stable | 4.0 | aten, pointwise | A variant of clamp() with only the lower bound min specified. |
| 97 | clamp_min_ | Math | Stable | 4.0 | aten, pointwise | The in-place version of clamp_min(). |
| 98 | clamp_tensor | Math | Stable | 2.0 | aten, pointwise | The tensor version of clamp(). |
| 99 | clamp_tensor_ | Math | Stable | 2.2 | aten, pointwise | The in-place, tensor version of clamp(). |
| 100 | clip | Math | Beta | 5.1 | aten, KernelGen | This is identical to clamp(). |
| 101 | clip_ | Math | Beta | 5.1 | aten, KernelGen | This is identical to clamp_(). |
| 102 | col2im | Math | Alpha | 5.1 | aten, KernelGen | Rearranges column blocks back into a multidimensional tensor (inverse of im2col). |
| 103 | combine_topk_swa_indices | NeuralNetwork | Beta | 5.1 | fused, Attention, vLLM, DeepSeekV4 | Combines compressed top-k sparse attention indices with sliding-window attention indices
for DeepSeekV4 attention. |
| 104 | concat_and_cache_mla | Attention | Beta | 3.0 | fused, MLA | Writes the latent and RoPE values into the KV cache for the Multi-head Latent Attention forward case. |
| 105 | compute_global_topk_indices_and_lens | NeuralNetwork | Beta | 5.1 | fused, Attention, vLLM, DeepSeekV4 | Converts local top-k sparse attention indices to global KV-cache indices and computes
valid top-k lengths for DeepSeekV4 attention. |
| 106 | concatenate | Tensor | Alpha | 5.1 | aten, KernelGen | An alias of cat(). |
| 107 | conj_physical | LinearAlg | Beta | 5.1 | aten | Computes the element-wise conjugate of the given input tensor.
If input has a non-complex dtype, this function just returns input. |
| 108 | constant_pad_nd | NeuralNetwork | Stable | 2.2 | aten, IR | Pads the input tensor boundaries with a constant value.
This is an IR representation, not a public API. |
| 109 | contiguous | Tensor | Removed | 4.1 | aten, skip_precision_check | Returns a contiguous in memory tensor containing the same data as self tensor. |
| 110 | conv1d | Convolution | Stable | 4.2 | aten | Applies a 1D convolution over a 1D input composed of several input planes. |
| 111 | conv1d_padding | Convolution | Stable | 4.2 | aten | Applies a 1D convolution over a 1D input composed of several input planes. |
| 112 | conv2d | Convolution | Stable | 4.2 | aten | Applies a 2D convolution over a 2D input composed of several input planes. |
| 113 | conv2d_padding | Convolution | Stable | 4.2 | aten | Applies a 2D convolution over a 2D input composed of several input planes. |
| 114 | conv3d | Convolution | Stable | 4.2 | aten | Applies a 3D convolution over a 3D input composed of several input planes. |
| 115 | conv3d_padding | Convolution | Stable | 4.2 | aten | Applies a 3D convolution over a 3D input composed of several input planes. |
| 116 | conv_depthwise2d | NeuralNetwork | Beta | 2.2 | aten, Convolution, NoCPU | A depthwise convolution for the conv2d neural network function. |
| 117 | conv_transpose1d | Convolution | Beta | 5.1 | aten, KernelGen | Applies a 1D transposed convolution operator over an input image composed of several input planes. |
| 118 | conv_transpose2d | Convolution | Alpha | 5.1 | aten, KernelGen | Applies a 2D transposed convolution operator over an input image composed of several input planes. |
| 119 | copy | Tensor | Beta | 4.2 | aten, pointwise | As a wrapper of copy_, this operator copies elements from src to out
using the given template for shapes. |
| 120 | copy_ | Tensor | Stable | 4.1 | aten, pointwise, skip_precision_check | Copies the elements from src into self tensor and returns self. |
| 121 | copysign | Tensor | Beta | 5.1 | aten, pointwise | Create a new floating-point tensor with the magnitude of input and the sign of other, elementwise. |
| 122 | copysign_out | Tensor | Beta | 5.1 | aten, pointwise | A variant of copysign that allows the output to be saved into out. |
| 123 | cos | Math | Stable | 2.0 | aten, pointwise | Returns a new tensor with the cosine of the elements of input given in radians. |
| 124 | cos_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of cos(). |
| 125 | cosh | Math | Stable | 5.1 | aten, pointwise | Returns a new tensor with the hyperbolic cosine of the elements of input. |
| 126 | cosh_ | Math | Stable | 5.1 | aten, pointwise | This is the in-place version of cosh(). |
| 127 | cosh_out | Math | Stable | 5.1 | aten, pointwise | This is a variant of cosh() that assigns the output to the provided out. |
| 128 | count_nonzero | Tensor | Stable | 2.2 | aten, Reduction | Counts the number of non-zero values in the tensor input along the given dim.
If no dim is specified then all non-zeros in the tensor are counted. |
| 129 | cp_gather_indexer_k_quant_cache | Quantization | Beta | 5.1 | fused, vLLM | This is a fused operator that gathers FP8 K cache values and scales. |
| 130 | cross_entropy_loss | NeuralNetwork | Removed | 3.0 | fused, Reduction | Computes the cross entropy loss between input logits and target. |
| 131 | cudnn_convolution | NeuralNetwork | Beta | 5.1 | aten, KernelGen | A wrapper for cuDNN convolution backend. |
| 132 | cummax | Math | Stable | 3.0 | aten, Reduction | Returns a named tuple (values, indices) where values is the cumulative maximum of elements
of input in the dimension dim. And indices is the index location of each maximum value
found in the dimension dim. |
| 133 | cummin | Math | Stable | 2.2 | aten, Reduction | Returns a named tuple (values, indices) where values is the cumulative minimum of elements
of input in the dimension dim. And indices is the index location of each minimum value
found in the dimension dim. |
| 134 | cumprod | Math | Beta | 5.1 | aten, Reduction | Returns the cumulative product of elements of input in the dimension dim. |
| 135 | cumprod_ | Math | Beta | 5.1 | aten, Reduction | This is the in-place version of cumprod(). |
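| 136 | cumsum | LinearAlg | Stable | 1.0 | aten | Returns the cumulative sum of elements of input in the dimension dim. |
| 137 | cumsum_out | Reduction | Stable | 3.0 | aten | A variant of cumsum() that assigns the output to the provided out. |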
| 138 | cutlass_scaled_mm | LinearAlg | Beta | 5.0 | fused, vLLM | |
| 139 | dequantize_and_gather_k_cache | NeuralNetwork | Beta | 5.1 | fused, Attention, vLLM, DeepSeekV4 | Dequantizes FP8 K-cache entries and gathers them into a BF16 tensor for DeepSeekV4 attention. |
| 140 | dgeglu | NeuralNetwork | Stable | 5.0 | fused, Transformer | Gaussian Error Gated Linear Unit with GELU activation instead of sigmoid function.
This is for the backward case. |
| 141 | diag | Tensor | Stable | 2.2 | aten | If input is a vector (1-D tensor), returns a 2-D square tensor
with the elements of input as the diagonal.
If input is a matrix (2-D tensor), returns a 1-D tensor
with the diagonal elements of input. |
| 142 | diag_embed | Tensor | Stable | 2.2 | aten, pointwise | Creates a tensor whose diagonals of certain 2D planes (specified by dim1 and dim2) are filled by input.
To facilitate creating batched diagonal matrices, the 2D planes formed by the last two dimensions
of the returned tensor are chosen by default. |
| 143 | diagonal_backward | LinearAlg | Stable | 2.2 | aten, pointwise | A diagonal operation returns a partial view of input with its diagonal elements
with respect to dim1 and dim2 appended as a dimension at the end of the shape.
This is the backward case for diagonal(). |
| 144 | diff | Math | Beta | 5.1 | aten, KernelGen | Computes the n-th forward difference along the given dimension. |
| 145 | digamma_ | Math | Beta | 5.0 | aten, KernelGen | Computes the in-place digamma function, which is the logarithmic derivative of the Gamma function. |
| 146 | dispatch_fused_moe_kernel | MoE | Beta | 5.0 | fused, Activation, vLLM | Accelerates neural network training by combining token routing (dispatch/all-to-all communication),
expert computation (GEMM), and result aggregation into a single GPU kernel. |
| 147 | div_scalar | Math | Stable | 2.1 | aten | This is the scalar version of div(). |
| 148 | div_scalar_ | Math | Stable | 2.1 | aten | This is the in-place version of div_scalar(). |
| 149 | div_tensor | Math | Stable | 2.1 | aten, pointwise | Divides each element of input by the corresponding element of other.
Note that torch.divide() is an alias of torch.div() and torch.true_divide()
is an alias of torch.div() with rounding_mode=None. |
| 150 | div_tensor_ | Math | Stable | 2.1 | aten | This is the in-place version of div_tensor(). |
| 151 | div_out | Math | Stable | 4.2 | aten | This is a variant of div() with an out argument. |
| 152 | div_scalar_mode | Math | Stable | 1.0 | aten, pointwise | Divides each element of the input by the corresponding element of other.
An optional rounding_mode can be specified. |
| 153 | div_scalar_mode_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of div_scalar_mode(). |
| 154 | div_tensor_mode | Math | Stable | 1.0 | aten, pointwise | Divides each element of the input by the corresponding element of other.
An optional rounding_mode can be specified. |
| 155 | div_tensor_mode_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of div_tensor_mode(). |
| 156 | dot | BLAS | Stable | 3.0 | aten | Computes the dot product of two 1D tensors. |
| 157 | dreglu | NeuralNetwork | Alpha | 4.2 | fused, Transformer | Rectified Gated Linear Unit is a variant of GLU that uses ReLU instead of
the sigmoid function for gating. This is the backward case. |
| 158 | dropout | NeuralNetwork | Stable | 1.0 | aten, nn.functional | An internal IR for implementing torch.nn.functional.dropout. |
| 159 | dropout_backward | NeuralNetwork | Stable | 3.0 | aten, nn.functional | The backward case of dropout(). |
| 160 | dswiglu | NeuralNetwork | Alpha | 5.0 | fused, Transformer | Swish-Gated Linear Unit, a variant of GLU with the Swish activation function.
This is for the backward case. |
| 161 | dunder_ior_scalar | Math | Beta | 5.1 | aten, KernelGen | The scalar version of dunder_ior_tensor. |
| 162 | dunder_ior_tensor | Math | Beta | 5.1 | aten, KernelGen | The in-place version of bitwise or operation for tensor and scalar. |
| 163 | dunder_or_scalar | Math | Beta | 5.1 | aten, KernelGen | The scalar version of dunder_or_tensor. |
| 164 | dunder_or_tensor | Math | Beta | 5.1 | aten, KernelGen | The out-of-place version of the bitwise OR operation for tensors. |
| 165 | einsum | Reduction | Alpha | 5.1 | aten, KernelGen | Sums the product of the elements of the input operands along dimensions specified using a notation
based on the Einstein summation convention. |
| 166 | elu | NeuralNetwork | Stable | 2.2 | aten, nn.functional, pointwise | Apply the Exponential Linear Unit (ELU) function element-wise. |
| 167 | elu_ | NeuralNetwork | Stable | 4.0 | aten, pointwise | The in-place version of elu(). |
| 168 | elu_backward | NeuralNetwork | Stable | 4.0 | aten, pointwise | The backward version of elu(). |
| 169 | embedding | NeuralNetwork | Stable | 2.1 | aten, nn.functional | Generate a simple lookup table that looks up embeddings in a fixed dictionary and size.
Note that the parameter sequence differs from torch.nn.functional.embedding. |
| 170 | embedding_backward | NeuralNetwork | Stable | 3.0 | aten, NoCPU | The backward version of embedding(). |
| 171 | embedding_dense_backward | NeuralNetwork | Stable | 5.0 | aten | Calculates the gradient of the weight matrix for a dense embedding layer during backpropagation. |
| 172 | eq | Math | Stable | 2.0 | aten, pointwise | Computes element-wise equality. |
| 173 | eq_scalar | Math | Stable | 2.0 | aten, pointwise | Computes element-wise equality between input and a scalar. |
| 174 | equal | Math | Stable | 5.0 | aten, Reduction | Returns True if two tensors have the same size and elements, False otherwise. |
| 175 | euclidean_dist | Math | Alpha | 5.1 | aten, KernelGen, pointwise | Computes pairwise Euclidean distances between rows of two 2D tensors. |
| 176 | erf | Science | Stable | 2.1 | aten | Computes the error function of input. |
| 177 | erf_ | Science | Stable | 2.2 | aten, pointwise | The in-place version of erf(). |
| 178 | exp | Math | Stable | 1.0 | aten, pointwise | Returns a new tensor with the exponential of the elements of the input tensor input. |
| 179 | exp_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of exp(). |
| 180 | exp_out | Math | Stable | 4.1 | aten, pointwise | A variant of exp(), with out specified. |
| 181 | exp2 | Math | Stable | 4.0 | aten, pointwise | Computes the base two exponential function of input. |
| 182 | exp2_ | Math | Stable | 4.0 | aten, pointwise | The in-place version of exp2(). |
| 183 | expm1 | Math | Beta | 5.1 | aten | Computes the exponential of the elements minus 1 of input. |
| 184 | expm1_ | Math | Beta | 5.1 | aten | The in-place version of expm1(). |
| 185 | expm1_out | Math | Beta | 5.1 | aten | A variant of expm1 that saves the output to the specified out. |
| 186 | exponential_ | Distribution | Stable | 2.1 | aten, skip_precision_check | Fills self tensor with elements drawn from a PDF (probability density function). |
| 187 | eye | LinearAlg | Stable | 3.0 | aten, Reduction | Returns a 2-D tensor with ones on the diagonal and zeros elsewhere. |
| 188 | eye_m | LinearAlg | Stable | 3.0 | aten, Reduction | Triton-based implementation of torch.eye(n, m), using 2D tiles to split the matrix into blocks. |
| 189 | feature_dropout | NeuralNetwork | Alpha | 5.1 | aten, KernelGen | Applies feature dropout to the input tensor. Randomly zeroes out entire channels of the input tensor with probability p.
Each batch element has its own independent channel mask. |
| 190 | feature_dropout_ | NeuralNetwork | Alpha | 5.1 | aten, KernelGen | The in-place version of feature_dropout(). |
| 191 | fill_scalar | Tensor | Stable | 2.2 | aten, pointwise | Fills a tensor with the specified scalar value. |
| 192 | fill_scalar_ | Tensor | Stable | 2.2 | aten, pointwise | The in-place version of fill_scalar(). |
| 193 | fill_scalar_out | Tensor | Stable | 5.0 | aten, pointwise, KernelGen | A variant of fill_scalar() that assigns the output to an out tensor. |
| 194 | fill_tensor | Tensor | Stable | 2.2 | aten, pointwise | Fills a tensor with the specified value. |
| 195 | fill_tensor_ | Tensor | Stable | 2.2 | aten, pointwise | The in-place version of fill_tensor(). |
| 196 | fill_tensor_out | Tensor | Stable | 5.0 | aten, pointwise, KernelGen | A variant of fill_tensor() that assigns the output to an out tensor. |
| 197 | flash_attention_forward | NeuralNetwork | Stable | 3.0 | aten, NoCPU | |
| 198 | flash_attn_varlen_func | NeuralNetwork | Stable | 3.1 | aten, Attention, FlashAttention | Computes attention for sequences of variable lengths within a single batch,
eliminating the need for padding. |
| 199 | flash_attn_varlen_opt_func | NeuralNetwork | Beta | 5.1 | aten, Attention, FlashAttention | A variant of flash_attn_varlen_func that has lse as an optional parameter. |
| 200 | flash_mla | NeuralNetwork | Stable | 3.0 | fused, Attention, vLLM | A variant of Multi-head Latent Attention (MLA). |
| 201 | flash_mla_sparse_fwd | NeuralNetwork | Alpha | 5.1 | fused, Attention, vLLM | Part of FlashMLA. |
| 202 | flip | Tensor | Stable | 2.1 | aten, pointwise | Reverse the order of an n-D tensor along given axis in dims. |
| 203 | floor | Math | Beta | 5.0 | aten, KernelGen | Performs an element-wise floor operation, rounding each element of a tensor
down to the nearest integer less than or equal to itself. |
| 204 | floor_out | Math | Beta | 5.0 | aten, KernelGen | Performs an element-wise floor operation with output tensor, rounding each element
down to the nearest integer less than or equal to itself. |
| 205 | floor_ | Math | Beta | 5.0 | aten, KernelGen | Performs an in-place element-wise floor operation, rounding each element of a tensor
down to the nearest integer less than or equal to itself. |
| 206 | floor_divide_scalar | Math | Stable | 2.1 | aten | Computes input divided by other, elementwise, and floors the result. |
| 207 | floor_divide_scalar_ | Math | Stable | 2.2 | aten | Computes input divided by other, elementwise, and floors the result. |
| 208 | floor_divide_tensor | Math | Stable | 2.1 | aten | Computes input divided by other, elementwise, and floors the result. |
| 209 | floor_divide_tensor_ | Math | Stable | 2.2 | aten | Computes input divided by other, elementwise, and floors the result. |
| 210 | fmin | Math | Beta | 5.0 | aten, KernelGen | Computes the element-wise minimum of two tensors, specially handling NaN values
by prioritizing the numerical value. Unlike minimum(), if one input is NaN and the other is a number,
fmin() returns the number. It supports broadcasting, type promotion, and operates on both CPU and GPU. |
| 211 | fmin_out | Math | Beta | 5.0 | aten, KernelGen | A variant of fmin() that assigns the output to the out tensor. |
| 212 | fmod_scalar | Math | Alpha | 5.1 | aten, KernelGen | Computes the element-wise remainder of division of input by a scalar divisor. |
| 213 | fmod_scalar_ | Math | Alpha | 5.1 | aten, KernelGen | In-place version of fmod with a scalar divisor. |
| 214 | fmod_tensor | Math | Alpha | 5.1 | aten, KernelGen | Computes the element-wise remainder of division of input by a tensor divisor. |
| 215 | fmod_tensor_ | Math | Alpha | 5.1 | aten, KernelGen | In-place version of fmod with a tensor divisor. |
| 216 | fp8_mqa_logits | NeuralNetwork | Beta | 5.1 | fused, vLLM | For each token in the given E4M3 tensor, iterates over all tokens from two other given tensors
and calculates the logits. |
| 217 | full | Tensor | Stable | 2.1 | aten, pointwise, skip_precision_check | Creates a tensor of size size filled with fill_value.
The tensor's dtype is inferred from fill_value. |
| 218 | full_like | Tensor | Stable | 2.1 | aten, pointwise | Returns a tensor with the same size as input filled with fill_value. |
| 219 | functional_sym_constrain_range_for_size | Tensor | Beta | 5.0 | aten, KernelGen | A low-level function used in symbolic shape analysis to restrict the possible numerical range
(min/max) of an unbacked symbolic integer. |
| 220 | fused_add_rms_norm | NeuralNetwork | Stable | 2.0 | fused, Normalization | |
| 221 | fused_deepseek_v4_qnorm_rope_kv_rope_quant_insert | NeuralNetwork | Beta | 5.1 | fused, vLLM, DeepSeekV4 | Horizontally fused DeepSeekV4 MLA kernel:
per-head RMSNorm + GPT-J RoPE for Q, and GPT-J RoPE + UE8M0 FP8
quant + paged cache insert for KV, all in one kernel launch. |
| 222 | fused_experts_impl | NeuralNetwork | Beta | 5.1 | fused, vLLM, MoE | An implementation of fused MoE. |
| 223 | fused_moe | NeuralNetwork | Beta | 5.1 | fused, vLLM, MoE | The generic interface for fused MoE. |
| 224 | fused_q_kv_rmsnorm | NeuralNetwork | Beta | 5.1 | fused, Attention, vLLM, DeepSeekV4 | Applies RMSNorm to Q and KV tensors in a single fused kernel for DeepSeekV4 attention. |
| 225 | fused_recurrent_gated_delta_rule_fwd | Attention | Alpha | 5.0 | fused, FLA | The forward case for fused_recurrent_gated_delta_rule used in Flash Linear Attention (FLA). |
| 226 | gather | Tensor | Stable | 2.2 | aten, Reduction | Gathers values along an axis specified by dim. |
| 227 | gather_backward | Tensor | Stable | 2.2 | aten, Reduction | The backward version of gather(). |
| 228 | gcd | Math | Beta | 5.1 | aten | Computes the element-wise greatest common divisor (GCD) of input and other. |
| 229 | gcd_out | Math | Beta | 5.1 | aten | A variant of gcd() that allows the output to be assigned to the specified out. |
| 230 | ge | Math | Stable | 2.0 | aten, pointwise | Computes whether input is greater than or equal to other element-wise. |
| 231 | ge_scalar | Math | Stable | 2.0 | aten, pointwise | The scalar version of ge(). |
| 232 | geglu | NeuralNetwork | Alpha | 4.2 | fused, Activation, Transformer | Gaussian Error Gated Linear Unit with GELU activation instead of sigmoid function. |
| 233 | gelu | NeuralNetwork | Stable | 1.0 | aten, pointwise, Activation, nn.functional | Applies the Gaussian Error Linear Unit (GELU) function element-wise, weighting each input by the cumulative distribution function of the Gaussian distribution. |
| 234 | gelu_ | NeuralNetwork | Stable | 2.2 | aten, Activation, pointwise | The in-place version of gelu(). |
| 235 | gelu_and_mul | NeuralNetwork | Stable | 2.0 | fused, pointwise, Activation | An activation function for GeGLU. |
| 236 | gelu_backward | NeuralNetwork | Stable | 3.0 | aten, Activation, pointwise | The backward version of gelu(). |
| 237 | get_paged_mqa_logits_metadata | NeuralNetwork | Beta | 5.1 | vLLM | Build scheduling metadata for paged MQA logits. |
| 238 | get_scheduler_metadata | Attention | Stable | 4.0 | NoCPU, vLLM | Computes scheduling metadata for attention work partitioning so that
CPU computations can be routed to ISA-specific kernel implementations.
The metadata is stored in a tensor. |
| 239 | glu | NeuralNetwork | Stable | 3.0 | aten, Activation, pointwise | Gated Linear Unit activation for modulating the output of a linear transformation with a gate. |
| 240 | glu_backward | NeuralNetwork | Stable | 4.0 | aten, Activation, pointwise | The backward version of glu(). |
| 241 | greater | Math | Stable | 5.1 | aten | Test if input is greater than other elementwise. |
| 242 | greater_out | Math | Stable | 5.1 | aten | A variant of greater that saves the output to the specified out. |
| 243 | greater_scalar | Math | Stable | 5.1 | aten | A variant of greater for scalar variables. |
| 244 | greater_scalar_out | Math | Stable | 5.1 | aten | A variant of greater_out that saves the output to the specified out. |
| 245 | grid_sample | NeuralNetwork | Alpha | 5.1 | aten, nn.functional | Given an input and a flow-field grid, computes the output using input values and
pixel locations from grid. |
| 246 | group_norm | NeuralNetwork | Stable | 2.0 | aten, Reduction | An internal IR for applying Group Normalization for last certain number of dimensions. |
| 247 | group_norm_backward | NeuralNetwork | Stable | 3.0 | aten, Reduction | The backward case for group_norm(). |
| 248 | grouped_mm | BLAS | Beta | 5.1 | aten | Grouped matrix multiply is a functional operator designed to accelerate Mixture-of-Experts (MoE) models
by computing multiple matrix multiplications in a single kernel launch. |
| 249 | grouped_topk | MoE | Stable | 5.0 | fused, NoCPU, vLLM | A specialized routing mechanism used in Mixture-of-Experts (MoE) models (like DeepSeek-V3/R1)
to select top-k experts by first grouping them, rather than selecting globally. |
| 250 | gt | Math | Stable | 2.0 | aten, pointwise | Computes whether input is greater than other element-wise. |
| 251 | gt_scalar | Math | Stable | 2.0 | aten, pointwise | The scalar version of gt(). |
| 252 | hardsigmoid | NeuralNetwork | Beta | 5.0 | aten, pointwise, nn.functional, Activation, KernelGen | An activation function that provides a piecewise linear approximation of the standard sigmoid function,
mapping inputs to a range between 0 and 1. |
| 253 | hardsigmoid_out | NeuralNetwork | Beta | 5.0 | aten, pointwise, nn.functional, Activation, KernelGen | A variant of hardsigmoid that supports an output tensor to receive the result. |
| 254 | hardswish_ | NeuralNetwork | Beta | 5.0 | aten, pointwise, KernelGen, Activation | Applies the Hard Swish activation function, commonly used in models like MobileNetV3
to improve accuracy while reducing computational cost compared to traditional Swish.
This is an in-place version. |
| 255 | hc_split_sinkhorn_forward | NeuralNetwork | Beta | 5.1 | fused | Computes a differentiable approximation of the Wasserstein distance (Optimal Transport)
between two probability distributions or point clouds. |
| 256 | histc | Math | Alpha | 5.1 | aten, KernelGen | Computes the histogram of a tensor, binning each element into equal-width bins. |
| 257 | hstack | Tensor | Stable | 2.2 | aten | Stack tensors in sequence horizontally (column wise). This is equivalent to concatenation
along the first axis for 1-D tensors, and along the second axis for all other tensors. |
| 258 | hypot | Math | Beta | 5.0 | aten, KernelGen | Given the legs of a right triangle, return its hypotenuse.
The shapes of both input tensors must be broadcastable. |
| 259 | hypot_out | Math | Beta | 5.0 | aten, KernelGen | Given the legs of a right triangle, return its hypotenuse.
The shapes of both input tensors must be broadcastable.
This is a variant of hypot that allows the output to be a different tensor. |
| 260 | i0 | Math | Beta | 5.0 | aten, KernelGen | Computes the modified Bessel function of the first kind of order zero element-wise for a given input tensor. |
| 261 | i0_ | Math | Beta | 5.0 | aten, KernelGen | The in-place version of i0. |
| 262 | i0_out | Math | Beta | 5.0 | aten, KernelGen | A variant of i0 that assigns the output to the out tensor. |
| 263 | index | Reduction | Stable | 4.2 | aten | Extract, access or modify specific elements, slices, or subsets of data within a tensor.
The location of data is specified for each dimension, starting from index 0. |
| 264 | index_add | Tensor | Stable | 2.2 | aten | Accumulate the elements of alpha times source into the input tensor
by adding to the indices in the order given in index. |
| 265 | index_add_ | Tensor | Stable | 4.0 | aten | The in-place version of index_add(). |
| 266 | index_copy | Tensor | Beta | 5.1 | aten, KernelGen | Copies the elements from source into input at the positions specified by
index along the given dim. |
| 267 | index_copy_ | Tensor | Beta | 5.1 | aten, KernelGen | The in-place version of index_copy(). |
| 268 | index_put | Tensor | Stable | 2.2 | aten | Puts values from the tensor values into the tensor input using the indices specified
in indices (which is a tuple of Tensors). |
| 269 | index_put_ | Tensor | Stable | 3.0 | aten | The in-place version of index_put(). |
| 270 | index_put_impl | Tensor | Beta | 5.1 | aten | An internal C++ function that handles the heavy lifting for placing values into a tensor at specific indices. |
| 271 | index_select | Tensor | Stable | 2.1 | aten | Returns a new tensor which indexes the input tensor along dimension dim
using the entries in index. |
| 272 | indexer_k_quant_and_cache | Quantization | Beta | 5.1 | fused, vLLM | This is a fused operator that quantizes K tensors and writes them into the FP8 KV cache. |
| 273 | inplace_fused_experts | MoE | Beta | 5.0 | fused, Activation, vLLM | This operator writes output directly to hidden_states. |
| 274 | instance_norm | NeuralNetwork | Beta | 5.1 | fused | Apply Instance Normalization independently for each channel in every data sample within a batch. |
| 275 | is_all_true | Tensor | Beta | 5.1 | aten, pointwise | The low-level implementation for checking if all elements in a tensor are True. |
| 276 | isclose | Math | Stable | 2.1 | aten, pointwise | Returns a new tensor with boolean elements representing if each element of input
is "close" to the corresponding element of other.
The closeness is defined with rtol and atol. |
| 277 | isfinite | Math | Stable | 2.1 | aten, pointwise | Returns a new tensor with boolean elements representing if each element is finite or not. |
| 278 | isin | Tensor | Stable | 2.2 | aten | Tests if each element of elements is in test_elements.
Returns a boolean tensor of the same shape as elements that is True
for elements in test_elements and False otherwise. |
| 279 | isin_scalar_tensor | Tensor | Stable | 2.2 | aten | A variant of isin(). |
| 280 | isin_tensor_scalar | Tensor | Stable | 2.2 | aten | A variant of isin(). |
| 281 | isinf | Math | Stable | 2.0 | aten, pointwise | Tests if each element of input is infinite (positive or negative infinity) or not. |
| 282 | isnan | Math | Stable | 2.0 | aten, pointwise | Returns a new tensor with boolean elements representing if each element of input is NaN or not. |
| 283 | isneginf | Math | Stable | 5.1 | aten, KernelGen, pointwise | Tests if each element of input is negative infinity or not. |
| 284 | isneginf_out | Math | Stable | 5.1 | aten, KernelGen, pointwise | A variant of isneginf that saves the output to the specified out. |
| 285 | kron | LinearAlg | Stable | 2.2 | aten | Computes the Kronecker product of input and other. |
| 286 | layer_norm | NeuralNetwork | Stable | 1.0 | aten | An internal IR for applying Layer Normalization for last certain number of dimensions. |
| 287 | layer_norm_backward | Reduction | Stable | 3.0 | aten | The backward case for layer_norm(). |
| 288 | le | Math | Stable | 2.0 | aten, pointwise | Computes whether input is less than or equal to other element-wise. |
| 289 | le_scalar | Math | Stable | 2.0 | aten | The scalar version of le(). |
| 290 | leaky_relu | NeuralNetwork | Beta | 5.1 | aten | Applies the LeakyReLU function element-wise. |
| 291 | leaky_relu_ | NeuralNetwork | Beta | 5.1 | aten | The in-place version of leaky_relu(). |
| 292 | leaky_relu_out | NeuralNetwork | Beta | 5.1 | aten | A variant of leaky_relu(). |
| 293 | lerp_scalar | LinearAlg | Stable | 3.0 | aten, pointwise | The scalar version of lerp(). |
| 294 | lerp_scalar_ | LinearAlg | Stable | 3.0 | aten, pointwise | The in-place, scalar version of lerp(). |
| 295 | lerp_tensor | LinearAlg | Stable | 3.0 | aten, pointwise | Performs a linear interpolation of two tensors start (given by input) and end
based on a scalar or tensor weight and returns the resulting out tensor. |
| 296 | lerp_tensor_ | LinearAlg | Stable | 3.0 | aten, pointwise | The in-place version of lerp(). |
| 297 | lift_fresh_copy | Tensor | Beta | 5.0 | aten, KernelGen | Creates a new, independent copy of a tensor within a compiled graph. |
| 298 | linspace | Tensor | Stable | 2.2 | aten | Creates a one-dimensional tensor of size steps whose values are evenly spaced from start to end, inclusive. |
| 299 | log | Math | Stable | 2.2 | aten, pointwise | Returns a new tensor with the natural logarithm of the elements of input. |
| 300 | log10 | Math | Beta | 5.1 | aten, pointwise | Returns a new tensor with the logarithm to the base 10 of the elements of input. |
| 301 | log10_ | Math | Beta | 5.1 | aten, pointwise | The in-place version of log10(). |
| 302 | log10_out | Math | Beta | 5.1 | aten, pointwise | A variant of log10() that assigns the output to the provided out. |
| 303 | log1p | Math | Beta | 5.0 | aten, KernelGen | Computes the natural logarithm of 1 + x (y_i = log_e(x_i + 1)) for each element in the input tensor. |
| 304 | log1p_ | Math | Beta | 5.0 | aten, KernelGen | Computes the natural logarithm of 1 + x (y_i = log_e(x_i + 1)) for each element in the input tensor in-place. |
| 305 | log_sigmoid | NeuralNetwork | Stable | 2.2 | aten, pointwise, nn.functional | Applies the Logsigmoid function element-wise. |
| 306 | log_softmax | NeuralNetwork | Stable | 3.0 | aten, Reduction | An internal IR for applying a softmax followed by a logarithm. |
| 307 | log_softmax_backward_data | NeuralNetwork | Alpha | 5.0 | aten, KernelGen | Computes the gradient of the input tensor with respect to a log_softmax operation
during backpropagation. |
| 308 | log_softmax_backward_data_out | NeuralNetwork | Alpha | 5.0 | aten, KernelGen | A variant of _log_softmax_backward_data that assigns the output to the out tensor. |
| 309 | log_softmax_out | NeuralNetwork | Stable | 3.0 | aten, Reduction | An internal IR for applying a softmax followed by a logarithm. |
| 310 | logaddexp | Math | Beta | 5.0 | aten, pointwise, KernelGen | Computes the element-wise logarithm of the sum of the exponentials of two input tensors. |
| 311 | logaddexp_out | Math | Beta | 5.0 | aten, pointwise, KernelGen | A variant of logaddexp that allows the output to be assigned to an out tensor. |
| 312 | logical_and | Math | Stable | 2.2 | aten, pointwise | Computes the element-wise logical AND of the given input tensors.
Zeros are treated as False and nonzeros are treated as True. |
| 313 | logical_and_ | Math | Stable | 5.0 | aten, pointwise | The in-place version of logical_and(). |
| 314 | logical_not | Math | Stable | 2.2 | aten, pointwise | Computes the element-wise logical NOT of the given input tensor. |
| 315 | logical_or | Math | Stable | 2.2 | aten, pointwise | Computes the element-wise logical OR of the given input tensors. |
| 316 | logical_or_ | Math | Stable | 5.0 | aten, pointwise | The in-place version of logical_or(). |
| 317 | logical_xor | Math | Stable | 2.2 | aten, pointwise | Computes the element-wise logical XOR of the given input tensors. |
| 318 | logit | LinearAlg | Beta | 5.0 | aten, pointwise, KernelGen | Returns a new tensor with the logit of the elements of input.
input is clamped to [eps, 1-eps] when eps is not None.
When eps is None and input<0 or input>1, the function will yield NaN. |
| 319 | logit_ | LinearAlg | Beta | 5.0 | aten, pointwise, KernelGen | The in-place version of logit(). |
| 320 | logit_out | LinearAlg | Beta | 5.0 | aten, pointwise, KernelGen | A variant of logit that allows the output to be assigned to another tensor. |
| 321 | logspace | Tensor | Stable | 4.0 | aten | Creates a one-dimensional tensor of size steps whose values are evenly spaced
from base^start to base^end, inclusive, on a logarithmic scale with base base. |
| 322 | logsumexp | Math | Alpha | 5.1 | aten, KernelGen | Computes the log of the sum of exponentials of elements in the input tensor along given dimensions. |
| 323 | lt | Math | Stable | 2.0 | aten, pointwise | Computes whether input is less than other element-wise. |
| 324 | lt_scalar | Math | Stable | 2.0 | aten, pointwise | The scalar version of lt(). |
| 325 | margin_ranking_loss | NeuralNetwork | Beta | 5.1 | aten, nn.functional | Compute the margin ranking loss. |
| 326 | masked_fill | Tensor | Stable | 2.2 | aten, pointwise | Fills elements of given tensor with value where mask is True. |
| 327 | masked_fill_ | Tensor | Stable | 2.2 | aten, pointwise, skip_precision_check | The in-place version of masked_fill(). |
| 328 | masked_fill_scalar | Tensor | Stable | 2.2 | aten, pointwise | Fills elements of given tensor with value where mask is True. |
| 329 | masked_fill_scalar_ | Tensor | Stable | 2.2 | aten, pointwise, skip_precision_check | The in-place version of masked_fill(). |
| 330 | masked_scatter | Tensor | Stable | 4.2 | aten | Copies elements from source into the given tensor at positions where the mask is True. |
| 331 | masked_scatter_ | Tensor | Stable | 4.2 | aten | The in-place version of masked_scatter(). |
| 332 | masked_select | Tensor | Stable | 2.1 | aten | Returns a new 1-D tensor which indexes the input tensor according to
the boolean mask mask which is a BoolTensor. |
| 333 | max | LinearAlg | Stable | 2.0 | aten, Reduction | Returns the maximum value of all elements in the input tensor. |
| 334 | max_dim | LinearAlg | Stable | 2.0 | aten, Reduction | Returns a namedtuple (values, indices) where values is the maximum value
of each row of the input tensor in the given dimension dim.
And indices is the index location of each maximum value found (argmax). |
| 335 | max_pool2d_backward | IR | Stable | 4.0 | aten | Applies a 2D max pooling over an input signal composed of several input planes.
This is an IR representation rather than a public API and it is for the backward step. |
| 336 | max_pool2d_with_indices | IR | Stable | 4.0 | aten | Applies a 2D max pooling over an input signal composed of several input planes.
This is an IR representation rather than a public API. |
| 337 | max_pool3d_backward | NeuralNetwork | Beta | 5.1 | aten, nn.functional | The backward version of max_pool3d_with_indices(). |
| 338 | max_pool3d_with_indices | NeuralNetwork | Beta | 5.1 | aten, nn.functional | Applies a 3D max pooling over an input signal composed of several input planes. |
| 339 | maximum | Math | Stable | 2.1 | aten, pointwise | Computes the element-wise maximum of input and other. |
| 340 | mean | LinearAlg | Stable | 1.0 | aten, Reduction | Returns the mean value of all elements in the input tensor. Input must be floating point or complex. |
| 341 | mean_dim | Reduction | Stable | 2.0 | aten | Returns the mean value of each row of the input tensor in the given dimension dim.
If dim is a list of dimensions, reduce over all of them. |
| 342 | hc_head_fused_kernel | NeuralNetwork | Beta | 5.2 | fused, vLLM, DSA | The head fusion kernel for MHC (Manifold-Constrained Hyper-Connections).
This fused implementation computes RMS-normalized hidden states and applies
per-head weighted mixing to produce the output activations. |
| 343 | mhc_bwd | NeuralNetwork | Beta | 5.1 | fused, vLLM, DSA | The backward case for MHC (Manifold-Constrained Hyper-Connections).
This is the Triton implementation for Sinkhorn implicit CG differentiation.
It computes the gradient of the Sinkhorn normalization using implicit differentiation via the conjugate gradient method. |
| 344 | mhc_post | NeuralNetwork | Beta | 5.1 | fused, vLLM, DSA | Triton implementation of mHC Post operator (optimized v3). |
| 345 | mhc_pre | NeuralNetwork | Beta | 5.1 | fused, vLLM, DSA | Triton implementation of mHC Pre operator (optimized v2). |
| 346 | min | Tensor | Stable | 2.0 | aten, Reduction | Returns the minimum value of all elements in the input tensor. |
| 347 | min_dim | LinearAlg | Stable | 2.0 | aten, Reduction | Returns a namedtuple (values, indices) where values is the minimum value of
each row of the input tensor in the given dimension dim.
And indices is the index location of each minimum value found (argmin). |
| 348 | minimum | Math | Stable | 2.1 | aten, pointwise | Computes the element-wise minimum of input and other. |
| 349 | mm | BLAS | Stable | 1.0 | aten | Performs a matrix multiplication of the two input matrices. |
| 350 | mm_out | BLAS | Stable | 3.0 | aten | A variant of mm() with out specified. |
| 351 | moe_align_block_size_triton | MoE | Stable | 4.2 | fused, Reduction, vLLM | Aligns the token distribution across experts to be compatible with block size
for matrix multiplication. |
| 352 | moe_sum | MoE | Stable | 4.2 | fused, Reduction, vLLM | An implementation of Mixture of Experts (MoE) with sum-based aggregation
instead of the more common weighted average. |
| 353 | mse_loss | NeuralNetwork | Stable | 2.2 | aten, pointwise, nn.functional | Compute the element-wise mean squared error, with optional weighting. |
| 354 | reflection_pad1d_backward | Math | Alpha | 5.1 | aten, KernelGen | Computes the gradient of reflection_pad1d with respect to the input tensor. |
| 355 | smooth_l1_loss | NeuralNetwork | Alpha | 5.1 | aten, pointwise, nn.functional | Compute the smooth L1 loss between input and target tensors. |
| 356 | smooth_l1_loss_backward | NeuralNetwork | Alpha | 5.1 | aten, pointwise, nn.functional | Compute the gradient of smooth L1 loss with respect to the input tensor. |
| 357 | mul | Math | Stable | 1.0 | aten, pointwise | Multiplies input by other. |
| 358 | mul_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of mul(). |
| 359 | multinomial | Distribution | Stable | 2.1 | aten, skip_precision_check | Returns a tensor where each row contains num_samples indices sampled
from the multinomial probability distribution located in the corresponding row
of tensor input. |
| 360 | mv | BLAS | Stable | 2.0 | aten | Performs a matrix-vector product of the matrix input and the vector vec. |
| 361 | nan_to_num | Math | Stable | 3.0 | aten, pointwise | Replaces NaN, positive infinity, and negative infinity values in input
with the values specified by nan, posinf, and neginf, respectively. |
| 362 | ne | Math | Stable | 2.0 | aten, pointwise | Computes whether input is not equal to other element-wise. |
| 363 | ne_scalar | Math | Stable | 2.0 | aten, pointwise | The scalar version of ne(). |
| 364 | neg | Math | Stable | 2.0 | aten, pointwise | Returns a new tensor with the negative of the elements of input. |
| 365 | neg_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of neg(). |
| 366 | new_full | Tensor | Beta | 5.1 | aten, pointwise | Returns a Tensor of size size filled with fill_value.
By default, the returned Tensor has the same torch.dtype and torch.device
as this tensor. |
| 367 | nll_loss_backward | NeuralNetwork | Stable | 2.2 | aten, IR | Compute the negative log likelihood loss. This is the backward case. |
| 368 | nll_loss_forward | NeuralNetwork | Stable | 2.2 | aten, IR | Compute the negative log likelihood loss. This is the forward case. |
| 369 | nll_loss2d_backward | NeuralNetwork | Stable | 2.2 | aten, IR | An internal IR for supporting torch.nn.NLLLoss2d, which has been deprecated
and is now integrated into the standard torch.nn.NLLLoss.
This is the backward case. |
| 370 | nll_loss2d_forward | NeuralNetwork | Stable | 2.2 | aten, IR | An internal IR for supporting torch.nn.NLLLoss2d, which has been deprecated
and is now integrated into the standard torch.nn.NLLLoss. This is the forward case. |
| 371 | nll_loss_nd_backward | NeuralNetwork | Stable | 5.0 | aten | Measures the performance of a classification model by penalizing low probabilities for correct classes.
This computes the gradients of this loss with respect to model parameters using automatic differentiation. |
| 372 | nll_loss_nd_forward | NeuralNetwork | Stable | 5.0 | aten | Measures the performance of a classification model by calculating the negative log probability
of the true class. This defines the computation flow, transforming input data through layers
to produce output predictions. |
| 373 | nonzero | Tensor | Stable | 2.1 | aten | Returns a 2-D tensor where each row is the index for a nonzero value.
When as_tuple is explicitly set to True, this returns a tuple of 1-D index tensors,
allowing for advanced indexing of all nonzero values. |
| 374 | nonzero_numpy | Tensor | Alpha | 5.1 | aten, KernelGen | Returns a tuple of 1-D tensors, one for each dimension, containing the indices of
the nonzero elements in the input tensor (NumPy-style). |
| 375 | normal_float_float_ | Distribution | Stable | 5.0 | aten, pointwise, skip_precision_check | Returns a tensor of random numbers drawn from separate normal distributions
whose mean and standard deviation are given.
This is one of the variants that takes a float mean and a float std. |
| 376 | normal_float_tensor | Distribution | Stable | 2.1 | aten, pointwise | Returns a tensor of random numbers drawn from separate normal distributions
whose mean and standard deviation are given.
This is one of the variants that takes a float mean and a tensor std. |
| 377 | normal_tensor_float | Distribution | Stable | 2.1 | aten, pointwise | Returns a tensor of random numbers drawn from separate normal distributions
whose mean and standard deviation are given.
This is one of the variants that takes a tensor mean and a float std. |
| 378 | normal_tensor_tensor | Distribution | Stable | 2.1 | aten, pointwise | Returns a tensor of random numbers drawn from separate normal distributions
whose mean and standard deviation are given.
This is one of the variants that takes a tensor mean and a tensor std. |
| 379 | normed_cumsum | Reduction | Stable | 2.1 | aten | Get the normalized cumulative sum where each step is divided by the total sum
of the dataset, resulting in values ranging from 0 to 1.
Internally used by the multinomial operator. |
| 380 | one_hot | NeuralNetwork | Stable | 5.0 | aten, nn.functional, KernelGen | Takes a LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes)
that has zeros everywhere except where the index of the last dimension matches the corresponding value
of the input tensor, in which case it will be 1. |
| 381 | ones | Tensor | Stable | 2.1 | aten, skip_precision_check | Returns a tensor filled with the scalar value 1, with the shape defined
by the variable argument size. |
| 382 | ones_like | Tensor | Stable | 2.1 | aten | Returns a tensor filled with the scalar value 1, with the same size as input. |
| 383 | outer | BLAS | Stable | 2.0 | fused | Computes outer product of self and the input vector.
If the self tensor is a vector of size n and the input tensor is a vector of size m,
the out tensor (if specified) must be a matrix of size n * m. |
| 384 | outplace_fused_experts | MoE | Beta | 5.0 | fused, Activation, vLLM | This operator allocates and returns a new output tensor. |
| 385 | pack_seq_triton | NeuralNetwork | Beta | 5.1 | fused, vLLM, DeepSeekV4 | Pack variable-length token sequences into a padded batched tensor. |
| 386 | pad | NeuralNetwork | Stable | 2.1 | aten, pointwise, nn.functional | Pads a tensor using the specified mode. |
| 387 | per_token_group_quant_fp8 | Quantization | Alpha | 4.2 | NoCPU, vLLM | Function to perform per-token-group quantization on an input tensor x.
It converts the tensor values into signed float8 values and returns the
quantized tensor along with the scaling factor used for quantization. |
| 388 | pixel_shuffle | NeuralNetwork | Beta | 5.0 | aten, nn.functional | Rearranges elements in a tensor to a new tensor of different shape. |
| 389 | pixel_unshuffle | NeuralNetwork | Beta | 5.0 | aten, KernelGen | Reverses pixel_shuffle by rearranging elements from a high-resolution feature map with fewer channels
into a lower-resolution feature map with more channels. |
| 390 | pixel_unshuffle_out | NeuralNetwork | Beta | 5.0 | aten, KernelGen | A variant of pixel_unshuffle that assigns the output to the out tensor. |
| 391 | poisson | Math | Beta | 5.1 | aten, KernelGen | Returns a tensor of the same size as input with each element sampled
from a Poisson distribution with rate given by the corresponding element in input. |
| 392 | polar | Math | Stable | 3.0 | aten, pointwise | Constructs a complex tensor whose elements are Cartesian coordinates corresponding to
the polar coordinates with absolute value abs and angle angle. |
| 393 | pow_scalar | Math | Stable | 1.0 | aten | Takes the power of each element in input with exponent and returns a tensor with the result.
The input is a single float, while the exponent is a tensor. |
| 394 | pow_tensor_scalar | Math | Stable | 1.0 | aten, pointwise | Takes the power of each element in input with exponent and returns a tensor with the result.
The input is a tensor, while the exponent is a float. |
| 395 | pow_tensor_scalar_ | Math | Stable | 2.2 | aten, pointwise | This is the in-place version of pow_tensor_scalar(). |
| 396 | pow_tensor_tensor | Math | Stable | 1.0 | aten, pointwise | Takes the power of each element in input with exponent and returns a tensor with the result.
The input is a tensor, while the exponent is also a tensor. |
| 397 | pow_tensor_tensor_ | Math | Stable | 2.2 | aten, pointwise | This is the in-place version of pow_tensor_tensor(). |
| 398 | prelu | NeuralNetwork | Beta | 5.0 | aten, Activation, pointwise, nn.functional, KernelGen | An activation function used in neural networks that improves upon ReLU (Rectified Linear Unit)
by allowing the network to learn the slope of negative inputs.
It performs an element-wise operation that keeps positive values and scales negative values
by a learnable parameter. |
| 399 | prod | LinearAlg | Stable | 2.0 | aten, Reduction | Returns the product of all elements in the input tensor. |
| 400 | prod_dim_int | Reduction | Stable | 2.0 | aten | Returns the product of each row of the input tensor in the given dimension dim. |
| 401 | quantile | Tensor | Stable | 2.2 | aten | Computes the q-th quantiles of each row of the input tensor along the dimension dim. |
| 402 | rad2deg | Math | Alpha | 5.1 | aten, KernelGen, pointwise | Converts each element from angles in radians to degrees. |
| 403 | rad2deg_ | Math | Alpha | 5.1 | aten, KernelGen, pointwise | In-place version of rad2deg. |
| 404 | rand | Distribution | Stable | 2.1 | aten | Returns a tensor filled with random numbers from a uniform distribution on the interval [0,1). |
| 405 | rand_like | Distribution | Stable | 2.1 | aten | Returns a tensor with the same size as input that is filled with random numbers
from a uniform distribution on the interval [0,1). |
| 406 | randn | Distribution | Stable | 2.1 | aten | Returns a tensor filled with random numbers from a normal distribution with mean 0
and variance 1 (also called the standard normal distribution). |
| 407 | randn_like | Distribution | Stable | 2.1 | aten | Returns a tensor with the same size as input that is filled with random numbers
from a normal distribution with mean 0 and variance 1. |
| 408 | randint | Distribution | Alpha | 5.1 | aten, KernelGen | Returns a tensor filled with random integers generated uniformly between low (inclusive) and high (exclusive). |
| 409 | randperm | Distribution | Stable | 2.2 | aten, skip_precision_check | Returns a random permutation of integers from 0 to n - 1. |
| 410 | reciprocal | Math | Stable | 1.0 | aten, pointwise | Returns a new tensor with the reciprocal of the elements of input. |
| 411 | reciprocal_ | Math | Stable | 2.2 | aten, pointwise | This is the in-place version of reciprocal(). |
| 412 | reflection_pad1d | NeuralNetwork | Beta | 5.0 | aten, pointwise, KernelGen | Pads the input 3D or 2D tensor (typically representing signals or sequences)
by reflecting the boundary values at the edges. |
| 413 | reflection_pad1d_out | NeuralNetwork | Beta | 5.0 | aten, pointwise, KernelGen | A variant of reflection_pad1d that assigns the output to out tensor. |
| 414 | reflection_pad2d | NeuralNetwork | Beta | 5.0 | aten, pointwise, KernelGen | Pads the input 4D or 3D tensor (typically representing images or feature maps)
by reflecting the boundary values at the edges. |
| 415 | reflection_pad2d_out | NeuralNetwork | Beta | 5.0 | aten, pointwise, KernelGen | A variant of reflection_pad2d that assigns the output to out tensor. |
| 416 | reglu | NeuralNetwork | Alpha | 4.2 | fused, Transformer | Rectified Gated Linear Unit is a variant of GLU that uses ReLU instead of the sigmoid function for gating. |
| 417 | relu | NeuralNetwork | Stable | 1.0 | aten, Activation, pointwise, nn.functional | Applies the ReLU (Rectified Linear Unit) activation function element-wise. |
| 418 | relu_ | NeuralNetwork | Stable | 2.2 | aten, pointwise, Activation | This is the in-place version of relu(). |
| 419 | relu6 | NeuralNetwork | Beta | 5.0 | aten, pointwise, Activation, KernelGen | Applies the element-wise function f(x)=min(max(0,x),6).
This is a variation of the standard ReLU activation function that "caps" its output
at a maximum value of 6. |
| 420 | remainder_scalar | Math | Stable | 2.2 | aten | Computes Python's modulus operation entrywise. The result has the same sign
as the divisor other and its absolute value is less than that of other. |
| 421 | remainder_scalar_ | Math | Stable | 2.2 | aten | This is the in-place version of remainder(). |
| 422 | remainder_scalar_tensor | Math | Stable | 2.2 | aten | Computes Python's modulus operation entrywise. The result has the same sign
as the divisor other and its absolute value is less than that of other. |
| 423 | remainder_tensor | Math | Stable | 2.2 | aten | Computes Python's modulus operation entrywise. The result has the same sign
as the divisor other and its absolute value is less than that of other. |
| 424 | remainder_tensor_ | Math | Stable | 2.2 | aten | This is the in-place version of remainder(). |
| 425 | repeat | Tensor | Stable | 2.1 | aten | Repeats this tensor along the specified dimensions. |
| 426 | repeat_interleave_self_int | Tensor | Stable | 2.2 | aten, pointwise | Repeats elements of a tensor. The number of repetitions is specified as an integer repeats. |
| 427 | repeat_interleave_self_tensor | Tensor | Stable | 2.2 | aten, pointwise | Repeats elements of a tensor. The number of repetitions is specified as a tensor repeats.
repeats is broadcast to fit the shape of the given axis. |
| 428 | repeat_interleave_tensor | Tensor | Stable | 2.2 | aten, pointwise | Repeats 0 repeats[0] times, 1 repeats[1] times, 2 repeats[2] times, etc. |
| 429 | replication_pad1d | Tensor | Beta | 5.0 | aten, KernelGen | Pads the edge of a 1D input tensor by repeating the boundary values. |
| 430 | replication_pad1d_out | Tensor | Beta | 5.0 | aten, KernelGen | A variant of replication_pad1d that assigns the output to the out tensor. |
| 431 | replication_pad3d | NeuralNetwork | Alpha | 5.0 | aten | Pads the edge of a 3D input tensor by repeating the boundary values. |
| 432 | reshape_and_cache | Attention | Stable | 3.0 | fused, vLLM | Store the key/value token states into the pre-allocated kv_cache buffers of paged attention. |
| 433 | reshape_and_cache_flash | Attention | Stable | 3.0 | fused | Store the key/value token states into the pre-allocated kv_cache buffers of paged attention. |
| 434 | resolve_conj | Science | Stable | 2.1 | aten | Returns a new tensor with materialized conjugation if input's conjugate bit is set to True,
else returns input. The output tensor will always have its conjugate bit set to False. |
| 435 | resolve_neg | Science | Stable | 2.1 | aten | Returns a new tensor with materialized negation if input's negative bit is set to True,
else returns input. The output tensor will always have its negative bit set to False. |
| 436 | rms_norm | NeuralNetwork | Stable | 2.0 | aten, nn.functional, Reduction | Apply Root Mean Square Layer Normalization over a mini-batch of inputs. |
| 437 | roll | BLAS | Beta | 5.1 | aten, KernelGen | Roll the tensor input along the given dimension(s). Elements that are shifted beyond the last position
are re-introduced at the first position. |
| 438 | round | Math | Beta | 5.1 | aten, pointwise | Rounds elements of input to the nearest integer. |
| 439 | round_ | Math | Beta | 5.1 | aten, pointwise | The in-place version of round(). |
| 440 | round_out | Math | Beta | 5.1 | aten, pointwise | A variant of round() that assigns the output to the specified out tensor. |
| 441 | rrelu_with_noise_backward | NeuralNetwork | Beta | 5.0 | aten, KernelGen | Computes the gradient of the Randomized Leaky ReLU (RReLU) activation function with respect to
its input during backpropagation. It uses the noise tensor generated in the forward pass
to correctly apply the slope to negative input values. |
| 442 | rsqrt | Math | Stable | 1.0 | aten, pointwise | Returns a new tensor with the reciprocal of the square-root of each of the elements of input. |
| 443 | rsqrt_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of rsqrt(). |
| 444 | rsub_scalar | Math | Alpha | 5.1 | aten, KernelGen | Reverse subtraction: subtracts input, scaled by alpha, from other. This is the scalar version. |
| 445 | rsub_tensor | Math | Alpha | 5.1 | aten, KernelGen | Reverse subtraction: subtracts input, scaled by alpha, from other. This is the tensor version. |
| 446 | rwkv_ka_fusion | RWKV | Stable | 4.1 | fused | Merges, aligns, and enhances features from different data sources or spatial directions
using the efficient, linear-time RWKV framework. |
| 447 | rwkv_mm_sparsity | RWKV | Stable | 4.1 | fused | Optimized, lossless sparse matrix multiplication in RWKV-7 models. |
| 448 | safe_softmax | NeuralNetwork | Alpha | 5.1 | aten, IR, KernelGen | Apply a softmax function. Note this version may not be functional. |
| 449 | scaled_dot_product_attention | NeuralNetwork | Stable | 2.2 | nn.functional, Attention | Computes scaled dot product attention on query, key and value tensors,
using an optional attention mask if passed and applying dropout
if a probability greater than 0.0 is specified.
The optional scale argument can only be specified as a keyword argument. |
| 450 | scaled_dot_product_attention_backward | NeuralNetwork | Stable | 2.2 | nn.functional, Attention | The backward case for scaled_dot_product_attention. |
| 451 | scaled_dot_product_attention_forward | NeuralNetwork | Stable | 2.2 | nn.functional, Attention | The forward case for scaled_dot_product_attention. |
| 452 | scaled_mm | BLAS | Beta | 5.1 | aten | Performs a scaled matrix multiplication. The result of self @ mat2 is
multiplied by scale_a and scale_b, then an optional bias is added. |
| 453 | scaled_mm_out | BLAS | Beta | 5.1 | aten | A variant of _scaled_mm that writes the result into out. |
| 454 | scaled_softmax_backward | Reduction | Stable | 4.2 | aten | The backward pass for a scaled softmax function, commonly used in Scaled Dot-Product Attention (SDPA)
within Transformer models, computes the gradient of the loss with respect to the input logits,
incorporating a scaling factor to stabilize training. |
| 455 | scaled_softmax_forward | Reduction | Stable | 4.2 | aten | The forward pass for a scaled softmax function, commonly used in Scaled Dot-Product Attention (SDPA)
within Transformer models. It applies a scaling factor to the input logits
before computing the softmax. |
| 456 | scatter_add_ | Tensor | Stable | 4.2 | aten | Adds all values from the tensor src into self at the indices specified
in the index tensor in a similar fashion as scatter_().
For each value in src, it is added to an index in self which is specified
by its index in src for dimension != dim and by the corresponding value
in index for dimension = dim. |
| 457 | scatter_reduce | Tensor | Stable | 2.2 | aten | Writes all values from the tensor src into provided tensor at the indices
specified in the index tensor. For each value in src, its output index
is specified by its index in src for dimension != dim and by the corresponding value
in index for dimension = dim.
The optional reduce argument specifies a reduction operation that is applied
when the values from src are written into the tensor at the indices
specified in index. |
| 458 | scatter_reduce_ | Tensor | Stable | 3.0 | aten, KernelGen | This is the in-place version of scatter_reduce(). |
| 459 | scatter_reduce_two_ | Reduction | Alpha | 5.1 | aten, KernelGen | A specific low-level ATen operator primarily encountered during model compilation
or when using advanced backends like TensorRT or MPS. |
| 460 | scatter_src | Tensor | Stable | 2.2 | aten | Writes all values from the tensor src into provided tensor at the indices
specified in the index tensor. For each value in src, its output index
is specified by its index in src for dimension != dim and by the corresponding value
in index for dimension = dim.
The optional reduce argument specifies a reduction operation that is applied
when the values from src are written into the tensor at the indices
specified in index. |
| 461 | scatter_src_ | Tensor | Stable | 3.0 | aten | This is the in-place version of scatter_src(). |
| 462 | select_backward | NeuralNetwork | Beta | 5.1 | aten | Calculate the gradient during the backward pass in the neural network. |
| 463 | select_scatter | Tensor | Stable | 2.2 | aten | Embeds the values of the src tensor into input at the given index.
This function returns a tensor with fresh storage; it does not create a view. |
| 464 | selu | NeuralNetwork | Beta | 5.0 | aten, pointwise, nn.functional, Activation, KernelGen | Applies an element-wise activation function that induces self-normalizing properties in neural networks.
It scales the Exponential Linear Unit (ELU) to ensure activations remain close to zero mean and unit variance. |
| 465 | selu_ | NeuralNetwork | Beta | 5.0 | aten, pointwise, Activation, KernelGen | This is the in-place version of selu. |
| 466 | sgn_ | Math | Beta | 5.0 | aten, KernelGen | Computes the sign of each element in the self tensor, element-wise.
This function is an extension of sign() designed to handle complex tensors
in addition to real-valued ones. |
| 467 | sigmoid | NeuralNetwork | Stable | 2.0 | aten, pointwise | Computes the expit (also known as the logistic sigmoid function) of the elements of input. |
| 468 | sigmoid_ | NeuralNetwork | Stable | 2.2 | aten, pointwise | The in-place version of sigmoid(). |
| 469 | sigmoid_backward | NeuralNetwork | Stable | 3.0 | aten, pointwise | The backward version of sigmoid(). |
| 470 | signbit | Tensor | Beta | 5.1 | aten, pointwise | Tests if each element of input has its sign bit set or not. |
| 471 | signbit_out | Tensor | Beta | 5.1 | aten, pointwise | A variant of signbit that assigns the output to out. |
| 472 | silu | NeuralNetwork | Stable | 1.0 | aten, pointwise, nn.functional | SiLU (Sigmoid Linear Unit), a simple approximation of ReLU but
without any discontinuity of the first derivative. |
| 473 | silu_ | NeuralNetwork | Stable | 2.2 | aten, nn.functional, pointwise | The in-place version of silu(). |
| 474 | silu_and_mul | Activation | Stable | 2.0 | fused, pointwise, vLLM | A custom operator in vLLM as activation function for SwiGLU. |
| 475 | silu_and_mul_out | Activation | Stable | 2.0 | fused, pointwise, vLLM | A variant of silu_and_mul with an extra out argument. |
| 476 | silu_and_mul_with_clamp | Activation | Stable | 5.1 | fused, pointwise, vLLM | A custom operator in vLLM as activation function for SwiGLU. |
| 477 | silu_and_mul_with_clamp_out | Activation | Stable | 5.1 | fused, pointwise, vLLM | A variant of silu_and_mul_with_clamp with an extra out argument. |
| 478 | silu_backward | NeuralNetwork | Stable | 3.0 | aten, pointwise | A variant of silu() for backward case. |
| 479 | sin | Math | Stable | 2.0 | aten, pointwise | Returns a new tensor with the sine of the elements in the input tensor,
where each value in this input tensor is in radians. |
| 480 | sin_ | Math | Stable | 2.2 | aten, pointwise | The in-place version of sin(). |
| 481 | sinh_ | Math | Beta | 5.0 | aten, KernelGen | Computes the hyperbolic sine (e^x-e^{-x})/2 of each element in a tensor.
This is an in-place version. |
| 482 | skip_layer_norm | NeuralNetwork | Stable | 2.0 | fused, Transformer | An optimized operation used in Transformer models to improve performance
by combining residual connection (skip connection) addition and Layer Normalization
(LayerNorm) into a single kernel. |
| 483 | slice_backward | NeuralNetwork | Stable | 5.0 | aten | An automatic differentiation (autograd) function that computes the gradient of a tensor slicing operation
(tensor[start:end]) during backpropagation. |
| 484 | slice_scatter | Tensor | Stable | 2.2 | aten | Embeds the values of the src tensor into input at the given dimension.
This function returns a tensor with fresh storage; it does not create a view. |
| 485 | soft_margin_loss | NeuralNetwork | Beta | 5.1 | nn.functional, KernelGen | Compute the soft margin loss. |
| 486 | softmax | NeuralNetwork | Stable | 1.0 | aten, nn.functional | Apply a softmax function. |
| 487 | softmax_backward | Reduction | Stable | 3.0 | aten, nn.functional | The backward version of softmax(). |
| 488 | softmax_backward_out | Reduction | Stable | 3.0 | aten, nn.functional | A variant of softmax_backward(). |
| 489 | softmax_out | NeuralNetwork | Stable | 1.0 | aten, nn.functional | Apply a softmax function, with given out. |
| 490 | softplus | NeuralNetwork | Stable | 4.0 | aten, nn.functional, pointwise | Applies element-wise, the function Softplus. |
| 491 | softshrink | NeuralNetwork | Beta | 5.0 | aten, nn.functional, Activation, KernelGen | Applies the soft shrinkage function element-wise to an input tensor.
It is an activation function often used in signal processing and sparse representation,
such as image denoising. |
| 492 | softshrink_out | NeuralNetwork | Beta | 5.0 | aten, nn.functional, Activation, KernelGen | This is a variant of softshrink that supports an output tensor. |
| 493 | sort | Tensor | Stable | 2.2 | aten, skip_precision_check | Sorts the elements of the input tensor along a given dimension in ascending order by value. |
| 494 | sort_stable | Tensor | Stable | 3.0 | aten, skip_precision_check | Sorts the elements of the input tensor along a given dimension in ascending order by value.
This is a variant of sort() where stable is set to True to preserve the order of equivalent elements. |
| 495 | sparse_attn_triton | NeuralNetwork | Beta | 5.1 | fused | Sparse attention with attention-sink. |
| 496 | sparse_mla_fwd_interface | DSA | Beta | 5.0 | fused | A generic interface for sparse MLA (Multi-head Latent Attention) for DeepSeek v3/v3.2.
It is currently not exposed as a standalone operator for use. |
| 497 | special_i0e | Math | Beta | 5.0 | aten, pointwise, KernelGen | Computes the exponentially scaled zeroth order modified Bessel function
of the first kind for each element of input. |
| 498 | special_i0e_out | Math | Beta | 5.0 | aten, pointwise, KernelGen | A variant of special_i0e() that saves the output to the provided out tensor. |
| 499 | special_i1 | Math | Beta | 5.0 | aten, pointwise, KernelGen | Computes the modified Bessel function of the first kind of order 1 (I_1(x)) for each element
in the input tensor, designed for special mathematical functions. |
| 500 | special_i1_out | Math | Beta | 5.0 | aten, pointwise, KernelGen | A variant of special_i1 that allows the output to be assigned to another tensor. |
| 501 | sqrt | Math | Stable | 4.0 | aten, pointwise | Returns a new tensor with the square-root of the elements of input. |
| 502 | sqrt_ | Math | Stable | 4.0 | aten, pointwise | This is the in-place version of sqrt(). |
| 503 | square | Math | Beta | 5.1 | aten, pointwise | Returns a new tensor with the square of the elements of input. |
| 504 | square_ | Math | Beta | 5.1 | aten, pointwise | The in-place version of square(). |
| 505 | square_out | Math | Beta | 5.1 | aten, pointwise | A variant of square that assigns the output to the provided out. |
| 506 | stack | Tensor | Stable | 2.2 | aten | Concatenates a sequence of tensors along a new dimension. |
| 507 | std | Reduction | Stable | 4.0 | aten | Calculates the standard deviation over the dimensions specified by dim.
dim can be a single dimension, list of dimensions, or None
to reduce over all dimensions. |
| 508 | sub | Math | Stable | 1.0 | aten, pointwise | Subtracts other, scaled by alpha, from the input tensor. |
| 509 | sub_ | Math | Stable | 2.2 | aten, pointwise | Subtracts other, scaled by alpha, from the input tensor.
This is the in-place version. |
| 510 | sum | LinearAlg | Stable | 2.0 | aten, Reduction | Returns the sum of all elements in the input tensor. |
| 511 | sum_dim | LinearAlg | Stable | 2.0 | aten, Reduction | Returns the sum of each row of the input tensor in the given dimension dim.
dim is a list of dimensions; the reduction is performed over all of them. |
| 512 | sum_dim_out | LinearAlg | Stable | 3.0 | aten, Reduction | A variant of sum_dim() with the out argument. |
| 513 | sum_out | LinearAlg | Stable | 3.0 | aten, Reduction | A variant of sum() with the out argument. |
| 514 | swiglu | NeuralNetwork | Stable | 5.0 | fused, Transformer | Swish-Gated Linear Unit, a variant of GLU with the Swish activation function. |
| 515 | t_copy | Tensor | Beta | 5.0 | aten, KernelGen | Transpose a 2D tensor into a new tensor with contiguous memory layout. |
| 516 | t_copy_out | Tensor | Beta | 5.0 | aten, KernelGen | A variant of t_copy() that allows the output to be assigned to the out tensor. |
| 517 | tan | NeuralNetwork | Stable | 4.1 | aten, pointwise | Returns a new tensor with the tangent of the elements in the input tensor,
where each value in this input tensor is in radians. |
| 518 | tan_ | | Stable | 4.1 | aten, pointwise | This is the in-place version of tan(). |
| 519 | tanh | Math | Stable | 2.0 | aten, pointwise | Returns a new tensor with the hyperbolic tangent of the elements of input. |
| 520 | tanh_ | Math | Stable | 2.2 | aten, pointwise | This is the in-place version of tanh(). |
| 521 | tanh_backward | Math | Stable | 3.0 | aten, pointwise | This is the backward case for tanh(). |
| 522 | threshold | NeuralNetwork | Stable | 3.0 | aten, nn.functional, pointwise | Apply a threshold to each element of the input Tensor. |
| 523 | threshold_backward | NeuralNetwork | Stable | 3.0 | aten, nn.functional, pointwise | This is the backward version for threshold. |
| 524 | tile | Tensor | Stable | 2.1 | aten | Constructs a tensor by repeating the elements of input.
The dims argument specifies the number of repetitions in each dimension. |
| 525 | to_copy | Tensor | Beta | 4.1 | aten, pointwise, skip_precision_check | |
| 526 | top_k_per_row_prefill | NeuralNetwork | Beta | 5.1 | fused | Triton top-K per row for DeepSeek V4 sparse attention prefill phase.
Replaces vLLM persistent_topk CUDA kernel with in-place masking + adaptive topk selection. |
| 527 | top_k_per_row_decode | NeuralNetwork | Beta | 5.1 | fused, vLLM, DeepSeekV4, KernelGen | Triton top-K per row for DeepSeek V4 decode-phase token selection.
Radix-select based approach with three dispatch tiers for different vocab sizes. |
| 528 | topk | Tensor | Stable | 2.1 | aten, skip_precision_check | Returns the k largest elements of the given input tensor along a given dimension.
If dim is not given, the last dimension of the input is chosen.
If largest is False then the k smallest elements are returned. |
| 529 | topk_softmax | MoE | Stable | 4.0 | fused, vLLM | Selects the k most likely next-token candidates, sets all others to zero,
and renormalizes the probabilities of these top candidates. |
| 530 | topk_softplus_sqrt | MoE | Beta | 5.1 | fused, KernelGen, vLLM | Fused softplus + sqrt + top-k selection and optional renormalization
for MoE gating in models like DeepSeek-V3/V4. |
| 531 | trace | Reduction | Stable | 4.0 | aten | Returns the sum of the elements of the diagonal of the input 2-D matrix. |
| 532 | tril | BLAS | Beta | 5.0 | aten, KernelGen | Returns the lower triangular part of an input matrix (or a batch of matrices) and
sets all other elements to zero. |
| 533 | tril_ | BLAS | Beta | 5.1 | aten | The in-place version of tril(). |
| 534 | tril_out | BLAS | Beta | 5.1 | aten | A variant of tril() that explicitly assigns the output to the out parameter. |
| 535 | triton_lighting_indexer_k_tiled_interface | NeuralNetwork | Alpha | 5.1 | fused, DSA | Part of FP8 MQA framework. It is currently not exposed as an operator for use. |
| 536 | triu | BLAS | Stable | 1.0 | aten | Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices input,
the other elements of the result tensor out are set to 0. |
| 537 | triu_ | NeuralNetwork | Stable | 5.0 | aten | The in-place version of triu(). |
| 538 | trunc_divide | Math | Stable | 2.1 | aten | The div function with rounding_mode set to trunc. |
| 539 | trunc_divide_ | Math | Stable | 2.1 | aten | The in-place version of trunc_divide. |
| 540 | unfold_backward | NeuralNetwork | Stable | 5.0 | aten, nn.functional | An operator for calculating the gradient of the unfold operation during backpropagation.
It takes the gradient of the unfolded output and accumulates it back into
the original input shape, reversing sliding local block extraction and resolving overlaps. |
| 541 | uniform_ | Distribution | Stable | 2.1 | aten, skip_precision_check | Fills self tensor with numbers sampled from the continuous uniform distribution. |
| 542 | unique2 | Tensor | Stable | 2.1 | aten | Returns the unique elements of the input tensor. This is an internal PyTorch function. |
| 543 | unique_consecutive | Distribution | Beta | 5.1 | aten, KernelGen | Eliminates all but the first element from every consecutive group of equivalent elements. |
| 544 | unpack_seq_triton | NeuralNetwork | Beta | 5.1 | fused, vLLM, DeepSeekV4 | Unpack a packed sequence tensor back to its original variable-length form. |
| 545 | upsample_bicubic2d | NeuralNetwork | Stable | 5.0 | aten, Reduction | A variant of upsample() that has mode set to bicubic. |
| 546 | upsample_bicubic2d_aa | NeuralNetwork | Stable | 2.2 | aten, Reduction | A variant of upsample() that has mode set to bicubic and anti-aliasing enabled. |
| 547 | upsample_bicubic2d_aa_backward | NeuralNetwork | Stable | 5.0 | aten, Reduction | A backward case for _upsample_bicubic2d_aa(). |
| 548 | upsample_linear1d | NeuralNetwork | Alpha | 5.0 | aten | Upsamples the input, using linear mode.
The input has to be 3 dimensional, and the output_size is an optional tuple of ints. |
| 549 | upsample_nearest1d | NeuralNetwork | Stable | 5.0 | aten | Upsamples the input, using nearest neighbours' pixel values.
The input has to be 3 dimensional, and the output_size is an optional tuple of ints. |
| 550 | upsample_nearest2d | NeuralNetwork | Stable | 2.2 | aten | Upsamples the input, using nearest neighbours' pixel values. The input has to be 4 dimensional.
The scales can be provided with scales_h and scales_w. |
| 551 | upsample_nearest3d | NeuralNetwork | Stable | 5.0 | aten | Performs 3D nearest-neighbor interpolation to increase the spatial size of volumetric data,
such as 5D tensors. It scales up inputs by copying values from the nearest pixel/voxel,
without calculating new values through linear interpolation. |
| 552 | upsample_nearest_exact1d | NeuralNetwork | Beta | 5.0 | aten, Reduction | Increases the length of a 1D tensor using nearest-neighbor interpolation,
ensuring the output aligns with library-standard algorithms like PIL. |
| 553 | var | Tensor | Beta | 5.1 | aten, KernelGen | Calculates the variance over all dimensions. |
| 554 | var_correction | Tensor | Beta | 5.1 | aten | A variant of the var() operator, with an optional correction for specifying
difference between the sample size and sample degrees of freedom. |
| 555 | var_dim | Tensor | Beta | 5.1 | aten | Calculates the variance over the dimensions specified by dim. |
| 556 | var_mean | LinearAlg | Stable | 2.0 | aten, Reduction | Calculates the variance and mean over the dimensions specified by dim. dim can be a single dimension,
list of dimensions, or None to reduce over all dimensions. |
| 557 | vdot | BLAS | Stable | 2.2 | aten | Computes the dot product of two 1D vectors along a dimension. |
| 558 | vector_norm | LinearAlg, NeuralNetwork | Stable | 2.0 | aten, Reduction | Computes a vector norm. |
| 559 | vstack | Tensor | Stable | 2.2 | aten | Stack tensors in sequence vertically (row wise). |
| 560 | w8a8_block_fp8_matmul | BLAS | Alpha | 5.1 | vLLM | Performs matrix multiplication with block-wise quantization. |
| 561 | weight_norm | NeuralNetwork | Stable | 3.0 | fused | Reparameterizes a module's weight tensor by decoupling its magnitude (g)
from its direction (v). It is a hook that computes the actual weight before
each forward pass. |
| 562 | weight_norm_interface | NeuralNetwork | Stable | 2.2 | aten, fused | Apply weight normalization to neural network layers, decoupling the magnitude
of a weight tensor from its direction. It is used to stabilize training, particularly
for models with small batch sizes. |
| 563 | weight_norm_interface_backward | NeuralNetwork | Stable | 3.0 | aten, fused | Computes the gradients for weight normalization during the backward pass.
It calculates the necessary derivatives for updating both the magnitude (g)
and direction (v) parameters of a weight-normalized layer, based on gradients
received from the previous operation. |
| 564 | where_self | Tensor | Stable | 2.1 | aten, pointwise | Returns a tensor of elements selected from either input or other, depending on condition. |
| 565 | where_self_out | Tensor | Stable | 2.2 | aten, pointwise | This is a variant of where_self() with an argument out. |
| 566 | zero | Tensor | Beta | 5.0 | aten, KernelGen | Fills tensor with zeros. |
| 567 | zero_ | Tensor | Stable | 5.0 | aten | Fills self tensor with zeros. |
| 568 | zero_out | Tensor | Beta | 5.0 | aten, KernelGen | Fills tensor with zeros but assigns the output to the out tensor. |
| 569 | zeros | Tensor | Stable | 2.1 | aten, skip_precision_check | Returns a tensor filled with the scalar value 0, with the shape defined by
the variable argument size. |
| 570 | zeros_like | Tensor | Stable | 2.1 | aten | Returns a tensor filled with the scalar value 0, with the same size as input. |
| 571 | _to_copy | Tensor | Alpha | 5.1 | skip_precision_check | Layout/memory operation (to_copy). |
| 572 | view | Tensor | Alpha | 5.1 | skip_precision_check | Pure layout operation (view). |
| 573 | reshape | Tensor | Alpha | 5.1 | skip_precision_check | Pure layout operation (reshape). |
| 574 | expand | Tensor | Alpha | 5.1 | skip_precision_check | Pure layout operation (expand). |
| 575 | permute | Tensor | Alpha | 5.1 | skip_precision_check | Pure layout operation (permute). |
| 576 | transpose | Tensor | Alpha | 5.1 | skip_precision_check | Pure layout operation (transpose). |
| 577 | clone | Tensor | Alpha | 5.1 | skip_precision_check | Pure layout/memory operation (clone). |
| 578 | to | Tensor | Alpha | 5.1 | skip_precision_check | Device/dtype cast operation (to). |
| 579 | empty | Tensor | Alpha | 5.1 | skip_precision_check | Tensor factory operation (empty). |
| 580 | normal_ | Tensor | Alpha | 5.1 | skip_precision_check | Random sampling operator (normal_). |
| 581 | random_ | Tensor | Alpha | 5.1 | skip_precision_check | Random sampling operator (random_). |
| 582 | argsort | Tensor | Alpha | 5.1 | skip_precision_check, KernelGen | Sorting/selection operator (argsort). |
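
Several of the descriptions above state their semantics as short formulas or rules, for example relu6 (`f(x)=min(max(0,x),6)`), the remainder family (result has the sign of the divisor), and trunc_divide (division rounded toward zero). The following is a minimal scalar sketch of those rules in plain Python. The helper names `relu6_ref`, `remainder_ref`, and `trunc_divide_ref` are hypothetical and are not part of this library; they only illustrate the element-wise behavior described in the table, not how the kernels are implemented.

```python
import math


def relu6_ref(x: float) -> float:
    """Scalar reference for relu6: f(x) = min(max(0, x), 6)."""
    return min(max(0.0, x), 6.0)


def remainder_ref(a: float, b: float) -> float:
    """Python-style modulus: the result takes the sign of the divisor b
    and its absolute value is less than that of b."""
    return a - b * math.floor(a / b)


def trunc_divide_ref(a: float, b: float) -> float:
    """Division with the quotient rounded toward zero (rounding_mode='trunc')."""
    return float(math.trunc(a / b))


if __name__ == "__main__":
    print(relu6_ref(-1.0), relu6_ref(3.0), relu6_ref(7.5))
    print(remainder_ref(-3.0, 2.0), remainder_ref(3.0, -2.0))
    print(trunc_divide_ref(-7.0, 2.0))
```

A scalar reference like this can be handy when checking an operator's output on a handful of values, since the sign conventions of remainder and trunc_divide differ for negative operands.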