Operator List

No.	Name	Kind	Stage	Since	Labels	Description
1	`abs`	Math	Stable	1.0	aten, pointwise	Computes the absolute value of each element in `input`. This is a simple wrapper of the existing torch `abs` operator.
2	`abs_`	Math	Stable	2.2	aten, pointwise	The in-place version of `abs()`, which is a simple wrapper of the Torch `abs` operator.
3	`absolute`	Math	Beta	5.0	aten, KernelGen	This is an alias for `abs()`, implemented by invoking low-level Torch operators.
4	`acos`	Math	Stable	5.0	aten, pointwise	Returns a new tensor with the arccosine (in radians) of each element in `input`.
5	`act_quant_triton`	Quantization	Beta	5.1	fused	This is a fused operator.
6	`adaptive_avg_pool3d`	NeuralNetwork	Alpha	5.0	aten, nn.functional, KernelGen	Apply a 3D adaptive average pooling over an input signal composed of several input planes.
7	`adaptive_avg_pool3d_out`	NeuralNetwork	Alpha	5.0	aten, nn.functional, KernelGen	A variant of `_adaptive_avg_pool3d` that assigns the output to the `out` tensor.
8	`add`	Math	Stable	1.0	aten, pointwise	Add a scalar or tensor to `self` tensor. If both `alpha` and `other` are specified, each element of `other` is scaled by `alpha` before being used.
9	`add_`	Math	Stable	2.2	aten, pointwise	The in-place version of `add()`.
10	`add_rms_norm`	NeuralNetwork	Alpha	5.1	aten, KernelGen, Normalization	Add two inputs element-wise and apply Root Mean Square Layer Normalization.
11	`addcdiv`	LinearAlg	Stable	4.0	aten, pointwise	Performs the element-wise division of `tensor1` by `tensor2`, multiplies the result by the scalar `value` and adds it to `input`.
12	`addcdiv_out`	LinearAlg	Stable	5.1	aten, pointwise, KernelGen	A variant of `addcdiv()` that assigns the output to the given `out` parameter.
13	`addcmul`	LinearAlg	Stable	4.0	aten, pointwise	Performs the element-wise multiplication of `tensor1` by `tensor2`, multiplies the result by the scalar `value` and adds it to `input`.
14	`addcmul_out`	LinearAlg	Alpha	5.0	aten, pointwise, KernelGen	A variant of `addcmul` that allows the output to be assigned to `out`.
15	`addmm`	BLAS	Stable	1.0	aten	Performs a matrix multiplication of the matrices `mat1` and `mat2`. The matrix `input` is added to the final result.
16	`addmm_dtype`	BLAS	Beta	5.1	aten	A variant of `addmm` that allows the dtype of the output tensor to be specified. This is supported only on CUDA and for `torch.float32` given `torch.float16` or `torch.bfloat16` input dtypes.
17	`addmm_dtype_out`	BLAS	Beta	5.1	aten	A variant of `addmm_dtype()` that allows the output to be saved to the provided `out` parameter.
18	`addmm_out`	BLAS	Stable	4.0	aten	A variant of `addmm` that assigns the output to the provided `out` parameter.
19	`addmv`	BLAS	Stable	4.0	aten	Performs a matrix-vector product of the matrix `mat` and the vector `vec`. The vector `input` is added to the final result.
20	`addmv_out`	BLAS	Stable	4.0	aten	A variant of `addmv()` that assigns the output to the provided `out` parameter.
21	`addr`	BLAS	Stable	4.0	aten	Performs the outer-product of vectors `vec1` and `vec2` and adds it to the matrix `input`.
22	`affine_grid_generator`	Tensor	Alpha	5.1	aten, KernelGen, pointwise	Generates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.
23	`alias_copy`	Tensor	Beta	5.0	aten, KernelGen	Creates a new tensor that shares the same storage data as the original tensor, but without preserving the original tensor's metadata (like shape or strides) in a way that links future mutations.
24	`alias_copy_out`	Tensor	Beta	5.0	aten, KernelGen	A variant of `alias_copy()` that assigns the output to the `out` tensor.
25	`all`	Math	Stable	2.0	aten, Reduction	Tests if all elements in input evaluate to True.
26	`all_dim`	Math	Stable	2.0	aten, Reduction	For each row of `input` in the given dimension `dim`, returns True if all elements in the row evaluate to True and False otherwise.
27	`all_dims`	Math	Stable	2.0	aten, Reduction	A variant of `all()` that reduces over the dimensions given in `dims`.
28	`allclose`	Math	Stable	2.1	aten	This function checks if `input` and `other` satisfy a condition specified via `atol` and `rtol` elementwise, for all elements of `input` and `other`.
29	`amax`	LinearAlg	Stable	2.0	aten, Reduction	Returns the maximum value of each slice of the `input` tensor in the given dimension(s) `dim`.
30	`aminmax`	Tensor	Beta	5.1	aten	Computes the minimum and maximum values of the `input` tensor.
31	`angle`	Math	Stable	3.0	aten, pointwise	Computes the element-wise angle (in radians) of the given `input` tensor.
32	`any`	Math	Stable	2.0	aten, Reduction	Tests if any element in `input` evaluates to True.
33	`any_dim`	Math	Stable	2.0	aten, Reduction	For each row of `input` in the given dimension `dim`, returns True if any element in the row evaluates to True and False otherwise.
34	`any_dims`	Math	Stable	2.0	aten, Reduction	For each row of `input` in the given dimensions in `dims`, returns True if any element in the row evaluates to True and False otherwise. The `dims` argument is a tuple of ints indicating the dimensions to reduce.
35	`apply_repetition_penalties`	NeuralNetwork	Stable	5.0	fused, vLLM	Modifies logit tensors in place to penalize tokens that have already appeared in the generated sequence.
36	`apply_rotary_pos_emb`	NeuralNetwork	Stable	2.0	fused	A method to incorporate positional information into the Transformer architecture. Rotary Positional Embedding (RoPE) applies position-dependent rotation to the query (Q) and key (K) vectors before computing the attention score.
37	`arange`	Tensor	Stable	2.1	aten	Returns a 1-D tensor of size `ceiling((end−start)/step)` with values from the interval `[start, end)` taken with common difference `step` beginning from start.
38	`arange_start`	Tensor	Stable	2.1	aten	A variant of `arange`, with `start` and/or `step` specified.
39	`arange_start_step`	Tensor	Stable	2.1	aten	A variant of `arange`, with `start` and/or `step` specified.
40	`arcsinh`	Math	Beta	5.0	aten, KernelGen	Performs an element-wise inverse hyperbolic sine computation on the given tensor.
41	`arcsinh_`	Math	Beta	5.0	aten, KernelGen	The in-place version of `arcsinh()`.
42	`arcsinh_out`	Math	Beta	5.0	aten, KernelGen	A variant of `arcsinh` that allows the output to be assigned to the `out` tensor.
43	`arctanh_`	Math	Beta	5.0	aten, KernelGen	Computes the element-wise inverse hyperbolic tangent of a given input tensor. This is an in-place version.
44	`argmax`	LinearAlg	Stable	2.0	aten, Reduction	Returns the indices of the maximum value of all elements in the `input` tensor.
45	`argmin`	LinearAlg	Stable	2.2	aten, Reduction	Returns the indices of the minimum value(s) of the flattened tensor or along a dimension.
46	`asinh`	Math	Beta	5.1	aten, KernelGen	Returns a new tensor with the inverse hyperbolic sine of the elements of input.
47	`asinh_`	Math	Beta	5.0	aten, KernelGen	Computes the inverse hyperbolic sine for each element of a tensor in-place.
48	`as_strided_copy`	Tensor	Beta	5.1	aten, KernelGen	Creates a contiguous copy of an `as_strided` view of the input tensor.
49	`as_strided_copy_out`	Tensor	Beta	5.1	aten, KernelGen	A variant of `as_strided_copy()` that assigns the output to the `out` tensor.
50	`assert_async`	Tensor	Stable	5.1	utility	A utility used to perform data-dependent assertions on GPU tensors without triggering an immediate, performance-heavy GPU-to-CPU synchronization.
51	`atan`	Math	Stable	4.0	aten, pointwise	Returns a new tensor with the arctangent of the elements (in radians) in the `input` tensor.
52	`atan_`	Math	Stable	4.0	aten, pointwise	The in-place version of `atan()`.
53	`atan2`	Math	Stable	5.1	aten, pointwise	Computes the element-wise arc tangent of `input/other(y/x)`, returning angles in radians between `-PI` and `PI`.
54	`atan2_out`	Math	Beta	5.1	aten, pointwise	A variant of `atan2` that allows the output to be saved into `out`.
55	`avg_pool2d`	NeuralNetwork	Stable	4.1	nn.functional	Applies a 2D average-pooling operation in `kH \times kW` regions with step size `sH \times sW`. The number of output features is equal to the number of input planes. This is for the forward case.
56	`avg_pool2d_backward`	NeuralNetwork	Stable	4.1	aten	The backward version of `avg_pool2d()`.
57	`avg_pool3d`	NeuralNetwork	Beta	5.1	aten	Applies a 3D average-pooling operation in `kD \times kH \times kW` regions with step size `sD \times sH \times sW`.
58	`avg_pool3d_backward`	NeuralNetwork	Alpha	5.1	aten	This is the backward version of `avg_pool3d()`.
59	`baddbmm`	BLAS	Stable	4.1	aten	Performs a batch matrix-matrix product of matrices in `batch1` and `batch2`. `input` is added to the final result. `batch1` and `batch2` must be 3-D tensors each containing the same number of matrices.
60	`baddbmm.out`	BLAS	Beta	5.1	aten	A variant of `baddbmm()` that assigns the output to the provided `out` tensor.
61	`batch_norm`	NeuralNetwork	Stable	3.0	aten	An internal operator used for implementing the `BatchNorm` functionality.
62	`batch_norm_backward`	NeuralNetwork	Stable	3.0	aten	The backward version of `batch_norm()`.
63	`bernoulli_`	Tensor	Beta	5.1	aten, skip_precision_check, KernelGen	Draws binary random numbers (0 or 1) from a Bernoulli distribution.
64	`bincount`	Reduction	Stable	5.0	aten, pointwise, KernelGen	Count the frequency of each value in an array of non-negative integers.
65	`bitwise_and_scalar`	Math	Stable	2.0	aten, pointwise	Computes the bitwise AND of `input` and `other` scalar.
66	`bitwise_and_scalar_`	Math	Stable	2.2	aten, pointwise	The in-place, scalar version of `bitwise_and()`.
67	`bitwise_and_scalar_tensor`	Math	Stable	2.0	aten, pointwise	A variant of `bitwise_and()`.
68	`bitwise_and_tensor`	Math	Stable	2.0	aten, pointwise	The Tensor method version of `bitwise_and()`.
69	`bitwise_and_tensor_`	Math	Stable	2.2	aten, pointwise	The in-place, Tensor method version of `bitwise_and()`.
70	`bitwise_left_shift`	Math	Stable	4.0	aten, pointwise	Computes the left arithmetic shift of `input` by `other` bits.
71	`bitwise_not`	Math	Stable	2.0	aten, pointwise	Computes the bitwise NOT of the given `input` tensor.
72	`bitwise_not_`	Math	Stable	2.2	aten, pointwise	The in-place version of `bitwise_not()`.
73	`bitwise_or_scalar`	Math	Stable	2.0	aten, pointwise	Computes the bitwise OR of `input` and the scalar `other`.
74	`bitwise_or_scalar_`	Math	Stable	2.2	aten	The in-place version of `bitwise_or_scalar`.
75	`bitwise_or_scalar_tensor`	Math	Stable	2.0	aten, pointwise	Computes the bitwise OR of `input` and `other`.
76	`bitwise_or_tensor`	Math	Stable	2.0	aten, pointwise	Computes the bitwise OR of `input` and `other`. This is the Tensor method variant.
77	`bitwise_or_tensor_`	Math	Stable	2.2	aten, pointwise	The in-place version of `bitwise_or_tensor()`.
78	`bitwise_right_shift`	Math	Stable	4.0	aten, pointwise	Computes the right arithmetic shift of `input` by `other` bits.
79	`bmm`	BLAS	Stable	1.0	aten	Performs a batch matrix-matrix product of matrices stored in `input` and `mat2`.
80	`bmm_out`	BLAS	Stable	5.0	aten	Performs a batch matrix-matrix product of matrices stored in `input` and `mat2`. This is a variant of `bmm` with `out` specified.
81	`bucket_sort_topk`	NeuralNetwork	Beta	5.1	fused, DSA	A wrapper for the TLE and Triton versions of the bucket-sort top-k operation.
82	`cat`	Tensor	Stable	2.2	aten	Concatenates the given sequence of tensors in `tensors` along the given dimension.
83	`cat_out`	Tensor	Stable	2.2	aten	A variant of `cat` that assigns the result to the provided `out` parameter.
84	`cauchy`	Distribution	Beta	5.1	aten	Draws random numbers from a Cauchy distribution.
85	`cauchy_`	Distribution	Beta	5.1	aten	Fills the tensor with numbers drawn from the Cauchy distribution.
86	`ceil`	Math	Stable	5.0	aten, pointwise	Returns a new tensor with the ceil of the elements of `input`, the smallest integer greater than or equal to each element.
87	`ceil_`	Math	Stable	5.0	aten, pointwise	The in-place version of `ceil()`.
88	`ceil_out`	Math	Stable	5.0	aten, pointwise	A variant of `ceil()` with `out` specified.
89	`celu`	NeuralNetwork	Stable	4.0	aten, nn.functional, pointwise	Applies the CELU (Continuously Differentiable Exponential Linear Unit) activation function element-wise.
90	`celu_`	NeuralNetwork	Stable	4.0	aten, nn.functional, pointwise	The in-place version of `celu()`.
91	`chunk_gated_delta_rule_fwd`	Attention	Alpha	5.0	fused, FLA	The forward case for `ChunkGatedDeltaRuleFunction` with Flash Linear Attention (FLA).
92	`clamp`	Math	Stable	2.0	aten, pointwise	Clamps all elements in `input` into the range `[min, max]`.
93	`clamp_`	Math	Stable	2.2	aten, pointwise	The in-place version of `clamp()`.
94	`clamp_max`	Math	Alpha	5.1	aten, KernelGen, pointwise	Clamps all elements in `input` to be less than or equal to `max`.
95	`clamp_max_`	Math	Alpha	5.1	aten, KernelGen, pointwise	The in-place version of `clamp_max()`.
96	`clamp_min`	Math	Stable	4.0	aten, pointwise	Clamps all elements in `input` to be greater than or equal to `min`.
97	`clamp_min_`	Math	Stable	4.0	aten, pointwise	The in-place version of `clamp_min()`.
98	`clamp_tensor`	Math	Stable	2.0	aten, pointwise	The tensor version of `clamp()`.
99	`clamp_tensor_`	Math	Stable	2.2	aten, pointwise	The in-place, tensor version of `clamp()`.
100	`clip`	Math	Beta	5.1	aten, KernelGen	This is identical to `clamp()`.
101	`clip_`	Math	Beta	5.1	aten, KernelGen	This is identical to `clamp_()`.
102	`col2im`	Math	Alpha	5.1	aten, KernelGen	Rearranges column blocks back into a multidimensional tensor (inverse of im2col).
103	`combine_topk_swa_indices`	NeuralNetwork	Beta	5.1	fused, Attention, vLLM, DeepSeekV4	Combines compressed top-k sparse attention indices with sliding-window attention indices for DeepSeekV4 attention.
104	`concat_and_cache_mla`	Attention	Beta	3.0	fused, MLA	Writes the latent and RoPE values into the KV cache for the Multi-head Latent Attention (MLA) forward case.
105	`compute_global_topk_indices_and_lens`	NeuralNetwork	Beta	5.1	fused, Attention, vLLM, DeepSeekV4	Converts local top-k sparse attention indices to global KV-cache indices and computes valid top-k lengths for DeepSeekV4 attention.
106	`concatenate`	Tensor	Alpha	5.1	aten, KernelGen	An alias of `cat()`.
107	`conj_physical`	LinearAlg	Beta	5.1	aten	Computes the element-wise conjugate of the given `input` tensor. If `input` has a non-complex dtype, this function just returns `input`.
108	`constant_pad_nd`	NeuralNetwork	Stable	2.2	aten, IR	Pads the input tensor boundaries with a constant value. This is an IR representation, not a public API.
109	`contiguous`	Tensor	Removed	4.1	aten, skip_precision_check	Returns a contiguous in memory tensor containing the same data as `self` tensor.
110	`conv1d`	Convolution	Stable	4.2	aten	Applies a 1D convolution over a 1D input composed of several input planes.
111	`conv1d_padding`	Convolution	Stable	4.2	aten	Applies a 1D convolution over a 1D input composed of several input planes.
112	`conv2d`	Convolution	Stable	4.2	aten	Applies a 2D convolution over a 2D input composed of several input planes.
113	`conv2d_padding`	Convolution	Stable	4.2	aten	Applies a 2D convolution over a 2D input composed of several input planes.
114	`conv3d`	Convolution	Stable	4.2	aten	Applies a 3D convolution over a 3D input composed of several input planes.
115	`conv3d_padding`	Convolution	Stable	4.2	aten	Applies a 3D convolution over a 3D input composed of several input planes.
116	`conv_depthwise2d`	NeuralNetwork	Beta	2.2	aten, Convolution, NoCPU	A depthwise convolution for the conv2d neural network function.
117	`conv_transpose1d`	Convolution	Beta	5.1	aten, KernelGen	Applies a 1D transposed convolution operator over an input image composed of several input planes.
118	`conv_transpose2d`	Convolution	Alpha	5.1	aten, KernelGen	Applies a 2D transposed convolution operator over an input image composed of several input planes.
119	`copy`	Tensor	Beta	4.2	aten, pointwise	As a wrapper of `copy_`, this operator copies elements from `src` to `out` using given template for shapes.
120	`copy_`	Tensor	Stable	4.1	aten, pointwise, skip_precision_check	Copies the elements from `src` into `self` tensor and returns `self`.
121	`copysign`	Tensor	Beta	5.1	aten, pointwise	Creates a new floating-point tensor with the magnitude of `input` and the sign of `other`, element-wise.
122	`copysign_out`	Tensor	Beta	5.1	aten, pointwise	A variant of `copysign` that allows the output to be saved into `out`.
123	`cos`	Math	Stable	2.0	aten, pointwise	Returns a new tensor with the cosine of the elements of `input` given in radians.
124	`cos_`	Math	Stable	2.2	aten, pointwise	The in-place version of `cos()`.
125	`cosh`	Math	Stable	5.1	aten, pointwise	Returns a new tensor with the hyperbolic cosine of the elements of `input`.
126	`cosh_`	Math	Stable	5.1	aten, pointwise	This is the in-place version of `cosh()`.
127	`cosh_out`	Math	Stable	5.1	aten, pointwise	This is a variant of `cosh()` that assigns the output to the provided `out`.
128	`count_nonzero`	Tensor	Stable	2.2	aten, Reduction	Counts the number of non-zero values in the tensor `input` along the given `dim`. If no `dim` is specified then all non-zeros in the tensor are counted.
129	`cp_gather_indexer_k_quant_cache`	Quantization	Beta	5.1	fused, vLLM	This is a fused operator that gathers FP8 K cache values and scales.
130	`cross_entropy_loss`	NeuralNetwork	Removed	3.0	fused, Reduction	Computes the cross entropy loss between input logits and target.
131	`cudnn_convolution`	NeuralNetwork	Beta	5.1	aten, KernelGen	A wrapper for cuDNN convolution backend.
132	`cummax`	Math	Stable	3.0	aten, Reduction	Returns a named tuple `(values, indices)` where `values` is the cumulative maximum of elements of `input` in the dimension `dim`, and `indices` is the index location of each maximum value found in the dimension `dim`.
133	`cummin`	Math	Stable	2.2	aten, Reduction	Returns a named tuple `(values, indices)` where `values` is the cumulative minimum of elements of `input` in the dimension `dim`, and `indices` is the index location of each minimum value found in the dimension `dim`.
134	`cumprod`	Math	Beta	5.1	aten, Reduction	Returns the cumulative product of elements of `input` in the dimension `dim`.
135	`cumprod_`	Math	Beta	5.1	aten, Reduction	This is the in-place version of `cumprod()`.
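136	`cumsum`	LinearAlg	Stable	1.0	aten	Returns the cumulative sum of elements of `input` in the dimension `dim`.
137	`cumsum_out`	Reduction	Stable	3.0	aten	A variant of `cumsum()` that assigns the output to the provided `out` tensor.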
138	`cutlass_scaled_mm`	LinearAlg	Beta	5.0	fused, vLLM
139	`dequantize_and_gather_k_cache`	NeuralNetwork	Beta	5.1	fused, Attention, vLLM, DeepSeekV4	Dequantizes FP8 K-cache entries and gathers them into a BF16 tensor for DeepSeekV4 attention.
140	`dgeglu`	NeuralNetwork	Stable	5.0	fused, Transformer	Gaussian Error Gated Linear Unit with GELU activation instead of sigmoid function. This is for the backward case.
141	`diag`	Tensor	Stable	2.2	aten	If `input` is a vector (1-D tensor), then returns a 2-D square tensor with the elements of `input` as the diagonal. If `input` is a matrix (2-D tensor), then returns a 1-D tensor with the diagonal elements of `input`.
142	`diag_embed`	Tensor	Stable	2.2	aten, pointwise	Creates a tensor whose diagonals of certain 2D planes (specified by `dim1` and `dim2`) are filled by `input`. To facilitate creating batched diagonal matrices, the 2D planes formed by the last two dimensions of the returned tensor are chosen by default.
143	`diagonal_backward`	LinearAlg	Stable	2.2	aten, pointwise	A diagonal operation returns a partial view of `input` with its diagonal elements with respect to `dim1` and `dim2` appended as a dimension at the end of the shape. This is the backward case for `diagonal()`.
144	`diff`	Math	Beta	5.1	aten, KernelGen	Computes the n-th forward difference along the given dimension.
145	`digamma_`	Math	Beta	5.0	aten, KernelGen	Computes the in-place digamma function, which is the logarithmic derivative of the Gamma function.
146	`dispatch_fused_moe_kernel`	MoE	Beta	5.0	fused, Activation, vLLM	Accelerates neural network training by combining token routing (dispatch/all-to-all communication), expert computation (GEMM), and result aggregation into a single GPU kernel.
147	`div_scalar`	Math	Stable	2.1	aten	This is the scalar version of `div()`.
148	`div_scalar_`	Math	Stable	2.1	aten	This is the in-place version of `div_scalar()`.
149	`div_tensor`	Math	Stable	2.1	aten, pointwise	Divides each element of the input `input` by the corresponding element of `other`. Note that `torch.divide()` is an alias of `torch.div()` and `torch.true_divide()` is an alias of `torch.div()` with `rounding_mode=None`.
150	`div_tensor_`	Math	Stable	2.1	aten	This is the in-place version of `div_tensor()`.
151	`div_out`	Math	Stable	4.2	aten	This is a variant of `div()` with an `out` argument.
152	`div_scalar_mode`	Math	Stable	1.0	aten, pointwise	Divides each element of the `input` by the corresponding element of `other`. An optional `rounding_mode` can be specified.
153	`div_scalar_mode_`	Math	Stable	2.2	aten, pointwise	The in-place version of `div_scalar_mode()`.
154	`div_tensor_mode`	Math	Stable	1.0	aten, pointwise	Divides each element of the `input` by the corresponding element of `other`. An optional `rounding_mode` can be specified.
155	`div_tensor_mode_`	Math	Stable	2.2	aten, pointwise	The in-place version of `div_tensor_mode()`.
156	`dot`	BLAS	Stable	3.0	aten	Computes the dot product of two 1D tensors.
157	`dreglu`	NeuralNetwork	Alpha	4.2	fused, Transformer	Rectified Gated Linear Unit is a variant of GLU that uses ReLU instead of the sigmoid function for gating. This is the backward case.
158	`dropout`	NeuralNetwork	Stable	1.0	aten, nn.functional	An internal IR for implementing `torch.nn.functional.dropout`.
159	`dropout_backward`	NeuralNetwork	Stable	3.0	aten, nn.functional	The backward case of `dropout()`.
160	`dswiglu`	NeuralNetwork	Alpha	5.0	fused, Transformer	Swish-Gated Linear Unit, a variant of GLU with the Swish activation function. This is for the backward case.
161	`dunder_ior_scalar`	Math	Beta	5.1	aten, KernelGen	The scalar version of `dunder_ior_tensor`.
162	`dunder_ior_tensor`	Math	Beta	5.1	aten, KernelGen	The in-place version of bitwise or operation for tensor and scalar.
163	`dunder_or_scalar`	Math	Beta	5.1	aten, KernelGen	The scalar version of `dunder_or_tensor`.
164	`dunder_or_tensor`	Math	Beta	5.1	aten, KernelGen	The bitwise OR operation for tensor and scalar. This is the out-of-place counterpart of `dunder_ior_tensor`.
165	`einsum`	Reduction	Alpha	5.1	aten, KernelGen	Sums the product of the elements of the input `operands` along dimensions specified using a notation based on the Einstein summation convention.
166	`elu`	NeuralNetwork	Stable	2.2	aten, nn.functional, pointwise	Apply the Exponential Linear Unit (ELU) function element-wise.
167	`elu_`	NeuralNetwork	Stable	4.0	aten, pointwise	The in-place version of `elu()`.
168	`elu_backward`	NeuralNetwork	Stable	4.0	aten, pointwise	The backward version of `elu()`.
169	`embedding`	NeuralNetwork	Stable	2.1	aten, nn.functional	Generate a simple lookup table that looks up embeddings in a fixed dictionary and size. Note that the parameter sequence differs from `torch.nn.functional.embedding`.
170	`embedding_backward`	NeuralNetwork	Stable	3.0	aten, NoCPU	The backward version of `embedding()`.
171	`embedding_dense_backward`	NeuralNetwork	Stable	5.0	aten	Calculates the gradient of the weight matrix for a dense embedding layer during backpropagation.
172	`eq`	Math	Stable	2.0	aten, pointwise	Computes element-wise equality.
173	`eq_scalar`	Math	Stable	2.0	aten, pointwise	The scalar version of `eq()`.
174	`equal`	Math	Stable	5.0	aten, Reduction	Returns True if two tensors have the same size and elements, False otherwise.
175	`euclidean_dist`	Math	Alpha	5.1	aten, KernelGen, pointwise	Computes pairwise Euclidean distances between rows of two 2D tensors.
176	`erf`	Science	Stable	2.1	aten	Computes the error function of `input`.
177	`erf_`	Science	Stable	2.2	aten, pointwise	The in-place version of `erf()`.
178	`exp`	Math	Stable	1.0	aten, pointwise	Returns a new tensor with the exponential of the elements of the input tensor `input`.
179	`exp_`	Math	Stable	2.2	aten, pointwise	The in-place version of `exp()`.
180	`exp_out`	Math	Stable	4.1	aten, pointwise	A variant of `exp()`, with `out` specified.
181	`exp2`	Math	Stable	4.0	aten, pointwise	Computes the base two exponential function of `input`.
182	`exp2_`	Math	Stable	4.0	aten, pointwise	The in-place version of `exp2()`.
183	`expm1`	Math	Beta	5.1	aten	Computes the exponential of the elements minus 1 of `input`.
184	`expm1_`	Math	Beta	5.1	aten	The in-place version of `expm1`.
185	`expm1_out`	Math	Beta	5.1	aten	A variant of `expm1` that saves the output to the specified `out`.
186	`exponential_`	Distribution	Stable	2.1	aten, skip_precision_check	Fills `self` tensor with elements drawn from the exponential distribution.
187	`eye`	LinearAlg	Stable	3.0	aten, Reduction	Returns a 2-D tensor with ones on the diagonal and zeros elsewhere.
188	`eye_m`	LinearAlg	Stable	3.0	aten, Reduction	Triton-based implementation of `torch.eye(n, m)`, using 2D tiles to split the matrix into blocks.
189	`feature_dropout`	NeuralNetwork	Alpha	5.1	aten, KernelGen	Applies feature dropout to the `input` tensor. Randomly zeroes out entire channels of the `input` tensor with probability `p`. Each batch element has its own independent channel mask.
190	`feature_dropout_`	NeuralNetwork	Alpha	5.1	aten, KernelGen	The in-place version of `feature_dropout()`.
191	`fill_scalar`	Tensor	Stable	2.2	aten, pointwise	Fills a tensor with the specified scalar value.
192	`fill_scalar_`	Tensor	Stable	2.2	aten, pointwise	The in-place version of `fill_scalar()`.
193	`fill_scalar_out`	Tensor	Stable	5.0	aten, pointwise, KernelGen	A variant of `fill_scalar()` that assigns the output to an `out` tensor.
194	`fill_tensor`	Tensor	Stable	2.2	aten, pointwise	Fills a tensor with the specified value.
195	`fill_tensor_`	Tensor	Stable	2.2	aten, pointwise	The in-place version of `fill_tensor()`.
196	`fill_tensor_out`	Tensor	Stable	5.0	aten, pointwise, KernelGen	A variant of `fill_tensor()` that assigns the output to an `out` tensor.
197	`flash_attention_forward`	NeuralNetwork	Stable	3.0	aten, NoCPU
198	`flash_attn_varlen_func`	NeuralNetwork	Stable	3.1	aten, Attention, FlashAttention	Computes attention for sequences of variable lengths within a single batch, eliminating the need for padding.
199	`flash_attn_varlen_opt_func`	NeuralNetwork	Beta	5.1	aten, Attention, FlashAttention	A variant of `flash_attn_varlen_func` that has `lse` as an optional parameter.
200	`flash_mla`	NeuralNetwork	Stable	3.0	fused, Attention, vLLM	A variant of Multi-head Latent Attention (MLA).
201	`flash_mla_sparse_fwd`	NeuralNetwork	Alpha	5.1	fused, Attention, vLLM	The sparse forward case of FlashMLA.
202	`flip`	Tensor	Stable	2.1	aten, pointwise	Reverses the order of an n-D tensor along the given axes in `dims`.
203	`floor`	Math	Beta	5.0	aten, KernelGen	Performs an element-wise floor operation, rounding each element of a tensor down to the nearest integer less than or equal to itself.
204	`floor_out`	Math	Beta	5.0	aten, KernelGen	Performs an element-wise floor operation with output tensor, rounding each element down to the nearest integer less than or equal to itself.
205	`floor_`	Math	Beta	5.0	aten, KernelGen	Performs an in-place element-wise floor operation, rounding each element of a tensor down to the nearest integer less than or equal to itself.
206	`floor_divide_scalar`	Math	Stable	2.1	aten	Computes `input` divided by `other`, elementwise, and floors the result.
207	`floor_divide_scalar_`	Math	Stable	2.2	aten	The in-place version of `floor_divide_scalar()`.
208	`floor_divide_tensor`	Math	Stable	2.1	aten	Computes `input` divided by `other`, elementwise, and floors the result.
209	`floor_divide_tensor_`	Math	Stable	2.2	aten	The in-place version of `floor_divide_tensor()`.
210	`fmin`	Math	Beta	5.0	aten, KernelGen	Computes the element-wise minimum of two tensors, specially handling NaN values by prioritizing the numerical value. Unlike `minimum()`, if one input is NaN and the other is a number, `fmin()` returns the number. It supports broadcasting, type promotion, and operates on both CPU and GPU.
211	`fmin_out`	Math	Beta	5.0	aten, KernelGen	A variant of `fmin()` that assigns the output to the `out` tensor.
212	`fmod_scalar`	Math	Alpha	5.1	aten, KernelGen	Computes the element-wise remainder of division of input by a scalar divisor.
213	`fmod_scalar_`	Math	Alpha	5.1	aten, KernelGen	In-place version of fmod with a scalar divisor.
214	`fmod_tensor`	Math	Alpha	5.1	aten, KernelGen	Computes the element-wise remainder of division of input by a tensor divisor.
215	`fmod_tensor_`	Math	Alpha	5.1	aten, KernelGen	In-place version of fmod with a tensor divisor.
216	`fp8_mqa_logits`	NeuralNetwork	Beta	5.1	fused, vLLM	For each token in the given E4M3 tensor, iterates over all tokens from two other given tensors and calculates the logit.
217	`full`	Tensor	Stable	2.1	aten, pointwise, skip_precision_check	Creates a tensor of size `size` filled with `fill_value`. The tensor's dtype is inferred from `fill_value`.
218	`full_like`	Tensor	Stable	2.1	aten, pointwise	Returns a tensor with the same size as `input` filled with `fill_value`.
219	`functional_sym_constrain_range_for_size`	Tensor	Beta	5.0	aten, KernelGen	A low-level function used in symbolic shape analysis to restrict the possible numerical range (min/max) of an unbacked symbolic integer.
220	`fused_add_rms_norm`	NeuralNetwork	Stable	2.0	fused, Normalization
221	`fused_deepseek_v4_qnorm_rope_kv_rope_quant_insert`	NeuralNetwork	Beta	5.1	fused, vLLM, DeepSeekV4	Horizontally fused DeepSeekV4 MLA: per-head RMSNorm + GPT-J RoPE for Q, and GPT-J RoPE + UE8M0 FP8 quant + paged cache insert for KV, all in one kernel launch.
222	`fused_experts_impl`	NeuralNetwork	Beta	5.1	fused, vLLM, MoE	An implementation of fused MoE.
223	`fused_moe`	NeuralNetwork	Beta	5.1	fused, vLLM, MoE	The generic interface for fused MoE.
224	`fused_q_kv_rmsnorm`	NeuralNetwork	Beta	5.1	fused, Attention, vLLM, DeepSeekV4	Applies RMSNorm to Q and KV tensors in a single fused kernel for DeepSeekV4 attention.
225	`fused_recurrent_gated_delta_rule_fwd`	Attention	Alpha	5.0	fused, FLA	The forward case for `fused_recurrent_gated_delta_rule` used in Flash Linear Attention (FLA).
226	`gather`	Tensor	Stable	2.2	aten, Reduction	Gathers values along an axis specified by `dim`.
227	`gather_backward`	Tensor	Stable	2.2	aten, Reduction	The backward version of `gather()`.
228	`gcd`	Math	Beta	5.1	aten	Computes the element-wise greatest common divisor (GCD) of `input` and `other`.
229	`gcd_out`	Math	Beta	5.1	aten	A variant of `gcd()` that allows the output to be assigned to the specified `out`.
230	`ge`	Math	Stable	2.0	aten, pointwise	Computes whether `input` is greater than or equal to `other` element-wise.
231	`ge_scalar`	Math	Stable	2.0	aten, pointwise	The scalar version of `ge()`.
232	`geglu`	NeuralNetwork	Alpha	4.2	fused, Activation, Transformer	Gaussian Error Gated Linear Unit with GELU activation instead of sigmoid function.
233	`gelu`	NeuralNetwork	Stable	1.0	aten, pointwise, Activation, nn.functional	Applies the Gaussian Error Linear Unit (GELU) function element-wise, which weights each input by the cumulative distribution function of the Gaussian distribution.
234	`gelu_`	NeuralNetwork	Stable	2.2	aten, Activation, pointwise	The in-place version of `gelu()`.
235	`gelu_and_mul`	NeuralNetwork	Stable	2.0	fused, pointwise, Activation	An activation function for GeGLU.
236	`gelu_backward`	NeuralNetwork	Stable	3.0	aten, Activation, pointwise	The backward version of `gelu()`.
237	`get_paged_mqa_logits_metadata`	NeuralNetwork	Beta	5.1	vLLM	Build scheduling metadata for paged MQA logits.
238	`get_scheduler_metadata`	Attention	Stable	4.0	NoCPU, vLLM	Computes scheduling metadata for attention work partitioning so that CPU computations can be routed to ISA-specific kernel implementations. The metadata is stored in a tensor.
239	`glu`	NeuralNetwork	Stable	3.0	aten, Activation, pointwise	Gated Linear Unit activation for modulating the output of a linear transformation with a gate.
240	`glu_backward`	NeuralNetwork	Stable	4.0	aten, Activation, pointwise	The backward version of `glu()`.
241	`greater`	Math	Stable	5.1	aten	Tests if `input` is greater than `other` element-wise.
242	`greater_out`	Math	Stable	5.1	aten	A variant of `greater` that saves the output to the specified `out`.
243	`greater_scalar`	Math	Stable	5.1	aten	A variant of `greater` for scalar variables.
244	`greater_scalar_out`	Math	Stable	5.1	aten	A variant of `greater_scalar` that saves the output to the specified `out`.
245	`grid_sample`	NeuralNetwork	Alpha	5.1	aten, nn.functional	Given an `input` and a flow-field `grid`, computes the `output` using `input` values and pixel locations from `grid`.
246	`group_norm`	NeuralNetwork	Stable	2.0	aten, Reduction	An internal IR for applying Group Normalization for last certain number of dimensions.
247	`group_norm_backward`	NeuralNetwork	Stable	3.0	aten, Reduction	The backward case for `group_norm()`.
248	`grouped_mm`	BLAS	Beta	5.1	aten	Grouped matrix multiply is a functional operator designed to accelerate Mixture-of-Experts (MoE) models by computing multiple matrix multiplications in a single kernel launch.
249	`grouped_topk`	MoE	Stable	5.0	fused, NoCPU, vLLM	A specialized routing mechanism used in Mixture-of-Experts (MoE) models (like DeepSeek-V3/R1) to select top-k experts by first grouping them, rather than selecting globally.
250	`gt`	Math	Stable	2.0	aten, pointwise	Computes whether `input` is greater than `other` element-wise.
251	`gt_scalar`	Math	Stable	2.0	aten, pointwise	The scalar version of `gt()`.
252	`hardsigmoid`	NeuralNetwork	Beta	5.0	aten, pointwise, nn.functional, Activation, KernelGen	An activation function that provides a piecewise linear approximation of the standard sigmoid function, mapping inputs to a range between 0 and 1.
253	`hardsigmoid_out`	NeuralNetwork	Beta	5.0	aten, pointwise, nn.functional, Activation, KernelGen	A variant of `hardsigmoid` that supports an output tensor to receive the result.
254	`hardswish_`	NeuralNetwork	Beta	5.0	aten, pointwise, KernelGen, Activation	Applies the Hard Swish activation function, commonly used in models like MobileNetV3 to improve accuracy while reducing computational cost compared to traditional Swish. This is an in-place version.
255	`hc_split_sinkhorn_forward`	NeuralNetwork	Beta	5.1	fused	Computes a differentiable approximation of the Wasserstein distance (Optimal Transport) between two probability distributions or point clouds.
256	`histc`	Math	Alpha	5.1	aten, KernelGen	Computes the histogram of a tensor, binning each element into equal-width bins.
257	`hstack`	Tensor	Stable	2.2	aten	Stack tensors in sequence horizontally (column wise). This is equivalent to concatenation along the first axis for 1-D tensors, and along the second axis for all other tensors.
258	`hypot`	Math	Beta	5.0	aten, KernelGen	Given the legs of a right triangle, return its hypotenuse. The shapes of both input tensors must be broadcastable.
259	`hypot_out`	Math	Beta	5.0	aten, KernelGen	Given the legs of a right triangle, return its hypotenuse. The shapes of both input tensors must be broadcastable. This is a variant of `hypot` that allows the output to be a different tensor.
260	`i0`	Math	Beta	5.0	aten, KernelGen	Computes the modified Bessel function of the first kind of order zero element-wise for a given input tensor.
261	`i0_`	Math	Beta	5.0	aten, KernelGen	The in-place version of `i0`.
262	`i0_out`	Math	Beta	5.0	aten, KernelGen	A variant of `i0` that assigns the output to the `out` tensor.
263	`index`	Reduction	Stable	4.2	aten	Extract, access or modify specific elements, slices, or subsets of data within a tensor. The location of data is specified for each dimension, starting from index 0.
264	`index_add`	Tensor	Stable	2.2	aten	Accumulate the elements of `alpha` times `source` into the `input` tensor by adding to the indices in the order given in `index`.
265	`index_add_`	Tensor	Stable	4.0	aten	The in-place version of `index_add()`.
266	`index_copy`	Tensor	Beta	5.1	aten, KernelGen	Copies the elements from `source` into `input` at the positions specified by `index` along the given `dim`.
267	`index_copy_`	Tensor	Beta	5.1	aten, KernelGen	The in-place version of `index_copy()`.
268	`index_put`	Tensor	Stable	2.2	aten	Puts values from the tensor `values` into the tensor `input` using the indices specified in `indices` (which is a tuple of Tensors).
269	`index_put_`	Tensor	Stable	3.0	aten	The in-place version of `index_put()`.
270	`index_put_impl`	Tensor	Beta	5.1	aten	An internal C++ function that handles the heavy lifting for placing values into a tensor at specific indices.
271	`index_select`	Tensor	Stable	2.1	aten	Returns a new tensor which indexes the `input` tensor along dimension `dim` using the entries in `index`.
272	`indexer_k_quant_and_cache`	Quantization	Beta	5.1	fused, vLLM	This is a fused operator that quantizes K tensors and writes them into the FP8 KV cache.
273	`inplace_fused_experts`	MoE	Beta	5.0	fused, Activation, vLLM	This operator writes output directly to `hidden_states`.
274	`instance_norm`	NeuralNetwork	Beta	5.1	fused	Apply Instance Normalization independently for each channel in every data sample within a batch.
275	`is_all_true`	Tensor	Beta	5.1	aten, pointwise	The low-level implementation for checking if all elements in a tensor are True.
276	`isclose`	Math	Stable	2.1	aten, pointwise	Returns a new tensor with boolean elements representing if each element of `input` is "close" to the corresponding element of `other`. The closeness is defined with `rtol` and `atol`.
277	`isfinite`	Math	Stable	2.1	aten, pointwise	Returns a new tensor with boolean elements representing if each element is finite or not.
278	`isin`	Tensor	Stable	2.2	aten	Tests if each element of `elements` is in `test_elements`. Returns a boolean tensor of the same shape as `elements` that is True for elements in `test_elements` and False otherwise.
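279	`isin_scalar_tensor`	Tensor	Stable	2.2	aten	A variant of `isin()` that tests a scalar element against a tensor `test_elements`.
280	`isin_tensor_scalar`	Tensor	Stable	2.2	aten	A variant of `isin()` that tests a tensor `elements` against a scalar test element.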
281	`isinf`	Math	Stable	2.0	aten, pointwise	Tests if each element of `input` is infinite (positive or negative infinity) or not.
282	`isnan`	Math	Stable	2.0	aten, pointwise	Returns a new tensor with boolean elements representing if each element of `input` is NaN or not.
283	`isneginf`	Math	Stable	5.1	aten, KernelGen, pointwise	Tests if each element of `input` is negative infinity or not.
284	`isneginf_out`	Math	Stable	5.1	aten, KernelGen, pointwise	A variant of `isneginf` that saves the output to the specified `out`.
285	`kron`	LinearAlg	Stable	2.2	aten	Computes the Kronecker product of `input` and `other`.
286	`layer_norm`	NeuralNetwork	Stable	1.0	aten	An internal IR for applying Layer Normalization for last certain number of dimensions.
287	`layer_norm_backward`	Reduction	Stable	3.0	aten	The backward case for `layer_norm()`.
288	`le`	Math	Stable	2.0	aten, pointwise	Computes whether `input` is less than or equal to `other` element-wise.
289	`le_scalar`	Math	Stable	2.0	aten	The scalar version of `le()`.
290	`leaky_relu`	NeuralNetwork	Beta	5.1	aten	Applies the LeakyReLU function element-wise.
291	`leaky_relu_`	NeuralNetwork	Beta	5.1	aten	The in-place version of `leaky_relu()`.
292	`leaky_relu_out`	NeuralNetwork	Beta	5.1	aten	A variant of `leaky_relu()` that assigns the output to the `out` tensor.
293	`lerp_scalar`	LinearAlg	Stable	3.0	aten, pointwise	The scalar version of `lerp()`.
294	`lerp_scalar_`	LinearAlg	Stable	3.0	aten, pointwise	The in-place, scalar version of `lerp()`.
295	`lerp_tensor`	LinearAlg	Stable	3.0	aten, pointwise	Performs a linear interpolation of two tensors `start` (given by `input`) and `end` based on a scalar or tensor `weight` and returns the resulting `out` tensor.
296	`lerp_tensor_`	LinearAlg	Stable	3.0	aten, pointwise	The in-place version of `lerp()`.
297	`lift_fresh_copy`	Tensor	Beta	5.0	aten, KernelGen	Creates a new, independent copy of a tensor within a compiled graph.
298	`linspace`	Tensor	Stable	2.2	aten	Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from `start` to `end`, inclusive.
299	`log`	Math	Stable	2.2	aten, pointwise	Returns a new tensor with the natural logarithm of the elements of `input`.
300	`log10`	Math	Beta	5.1	aten, pointwise	Returns a new tensor with the logarithm to the base 10 of the elements of `input`.
301	`log10_`	Math	Beta	5.1	aten, pointwise	The in-place version of `log10()`.
302	`log10_out`	Math	Beta	5.1	aten, pointwise	A variant of `log10()` that assigns the output to the provided `out`.
303	`log1p`	Math	Beta	5.0	aten, KernelGen	Computes the natural logarithm of `1 + x`, that is `y_i = log_e(x_i + 1)`, for each element in the input tensor.
304	`log1p_`	Math	Beta	5.0	aten, KernelGen	The in-place version of `log1p`. Computes the natural logarithm of `1 + x`, that is `y_i = log_e(x_i + 1)`, for each element in the input tensor.
305	`log_sigmoid`	NeuralNetwork	Stable	2.2	aten, pointwise, nn.functional	Applies the Logsigmoid function element-wise.
306	`log_softmax`	NeuralNetwork	Stable	3.0	aten, Reduction	An internal IR for applying a softmax followed by a logarithm.
307	`log_softmax_backward_data`	NeuralNetwork	Alpha	5.0	aten, KernelGen	Computes the gradient of the input tensor with respect to a `log_softmax` operation during backpropagation.
308	`log_softmax_backward_data_out`	NeuralNetwork	Alpha	5.0	aten, KernelGen	A variant of `_log_softmax_backward_data` that assigns the output to the `out` tensor.
309	`log_softmax_out`	NeuralNetwork	Stable	3.0	aten, Reduction	An internal IR for applying a softmax followed by a logarithm. This variant assigns the output to the `out` tensor.
310	`logaddexp`	Math	Beta	5.0	aten, pointwise, KernelGen	Computes the element-wise logarithm of the sum of the exponentials of two input tensors.
311	`logaddexp_out`	Math	Beta	5.0	aten, pointwise, KernelGen	A variant of `logaddexp` that allows the output to be assigned to an `out` tensor.
312	`logical_and`	Math	Stable	2.2	aten, pointwise	Computes the element-wise logical AND of the given `input` tensors. Zeros are treated as False and nonzeros are treated as True.
313	`logical_and_`	Math	Stable	5.0	aten, pointwise	The in-place version of `logical_and()`.
314	`logical_not`	Math	Stable	2.2	aten, pointwise	Computes the element-wise logical NOT of the given `input` tensor.
315	`logical_or`	Math	Stable	2.2	aten, pointwise	Computes the element-wise logical OR of the given input tensors.
316	`logical_or_`	Math	Stable	5.0	aten, pointwise	The in-place version of `logical_or()`.
317	`logical_xor`	Math	Stable	2.2	aten, pointwise	Computes the element-wise logical XOR of the given input tensors.
318	`logit`	LinearAlg	Beta	5.0	aten, pointwise, KernelGen	Returns a new tensor with the logit of the elements of `input`. `input` is clamped to `[eps, 1 - eps]` when `eps` is not None. When `eps` is None and `input < 0` or `input > 1`, the function yields NaN.
319	`logit_`	LinearAlg	Beta	5.0	aten, pointwise, KernelGen	The in-place version of `logit()`.
320	`logit_out`	LinearAlg	Beta	5.0	aten, pointwise, KernelGen	A variant of `logit` that allows the output to be assigned to another tensor.
321	`logspace`	Tensor	Stable	4.0	aten	Creates a one-dimensional tensor of size `steps` whose values are evenly spaced from `base^start` to `base^end`, inclusive, on a logarithmic scale with base `base`.
322	`logsumexp`	Math	Alpha	5.1	aten, KernelGen	Computes the log of the sum of exponentials of elements in the input tensor along given dimensions.
323	`lt`	Math	Stable	2.0	aten, pointwise	Computes whether `input` is less than `other` element-wise.
324	`lt_scalar`	Math	Stable	2.0	aten, pointwise	The scalar version of `lt`.
325	`margin_ranking_loss`	NeuralNetwork	Beta	5.1	aten, nn.functional	Compute the margin ranking loss.
326	`masked_fill`	Tensor	Stable	2.2	aten, pointwise	Fills elements of given tensor with `value` where `mask` is True.
327	`masked_fill_`	Tensor	Stable	2.2	aten, pointwise, skip_precision_check	The in-place version of `masked_fill()`.
328	`masked_fill_scalar`	Tensor	Stable	2.2	aten, pointwise	Fills elements of given tensor with `value` where `mask` is True.
329	`masked_fill_scalar_`	Tensor	Stable	2.2	aten, pointwise, skip_precision_check	The in-place version of `masked_fill_scalar()`.
330	`masked_scatter`	Tensor	Stable	4.2	aten	Copies elements from `source` into the given tensor at positions where the `mask` is True.
331	`masked_scatter_`	Tensor	Stable	4.2	aten	The in-place version of `masked_scatter()`.
332	`masked_select`	Tensor	Stable	2.1	aten	Returns a new 1-D tensor which indexes the `input` tensor according to the boolean mask `mask` which is a BoolTensor.
333	`max`	LinearAlg	Stable	2.0	aten, Reduction	Returns the maximum value of all elements in the `input` tensor.
334	`max_dim`	LinearAlg	Stable	2.0	aten, Reduction	Returns a namedtuple `(values, indices)` where `values` is the maximum value of each row of the `input` tensor in the given dimension `dim`. And `indices` is the index location of each maximum value found (argmax).
335	`max_pool2d_backward`	IR	Stable	4.0	aten	Applies a 2D max pooling over an input signal composed of several input planes. This is an IR representation rather than a public API and it is for the backward step.
336	`max_pool2d_with_indices`	IR	Stable	4.0	aten	Applies a 2D max pooling over an input signal composed of several input planes. This is an IR representation rather than a public API.
337	`max_pool3d_backward`	NeuralNetwork	Beta	5.1	aten, nn.functional	The backward version of `max_pool3d_with_indices()`.
338	`max_pool3d_with_indices`	NeuralNetwork	Beta	5.1	aten, nn.functional	Applies a 3D max pooling over an input signal composed of several input planes.
339	`maximum`	Math	Stable	2.1	aten, pointwise	Computes the element-wise maximum of `input` and `other`.
340	`mean`	LinearAlg	Stable	1.0	aten, Reduction	Returns the mean value of all elements in the `input` tensor. Input must be floating point or complex.
341	`mean_dim`	Reduction	Stable	2.0	aten	Returns the mean value of each row of the `input` tensor in the given dimension `dim`. If `dim` is a list of dimensions, reduce over all of them.
342	`hc_head_fused_kernel`	NeuralNetwork	Beta	5.2	fused, vLLM, DSA	The head fusion kernel for MHC (Manifold-Constrained Hyper-Connections). This fused implementation computes RMS-normalized hidden states and applies per-head weighted mixing to produce the output activations.
343	`mhc_bwd`	NeuralNetwork	Beta	5.1	fused, vLLM, DSA	The backward case for MHC (Manifold-Constrained Hyper-Connections). This is the Triton implementation for Sinkhorn implicit CG differentiation. It computes the gradient of the Sinkhorn normalization using implicit differentiation via the conjugate gradient method.
344	`mhc_post`	NeuralNetwork	Beta	5.1	fused, vLLM, DSA	Triton implementation of mHC Post operator (optimized v3).
345	`mhc_pre`	NeuralNetwork	Beta	5.1	fused, vLLM, DSA	Triton implementation of mHC Pre operator (optimized v2).
346	`min`	Tensor	Stable	2.0	aten, Reduction	Returns the minimum value of all elements in the `input` tensor.
347	`min_dim`	LinearAlg	Stable	2.0	aten, Reduction	Returns a namedtuple `(values, indices)` where `values` is the minimum value of each row of the `input` tensor in the given dimension `dim`. And `indices` is the index location of each minimum value found (argmin).
348	`minimum`	Math	Stable	2.1	aten, pointwise	Computes the element-wise minimum of `input` and `other`.
349	`mm`	BLAS	Stable	1.0	aten	Performs a matrix multiplication of the two input matrices.
350	`mm_out`	BLAS	Stable	3.0	aten	A variant of `mm()` with `out` specified.
351	`moe_align_block_size_triton`	MoE	Stable	4.2	fused, Reduction, vLLM	Aligns the token distribution across experts to be compatible with block size for matrix multiplication.
352	`moe_sum`	MoE	Stable	4.2	fused, Reduction, vLLM	An implementation of Mixture of Experts (MoE) with sum-based aggregation instead of the more common weighted average.
353	`mse_loss`	NeuralNetwork	Stable	2.2	aten, pointwise, nn.functional	Compute the element-wise mean squared error, with optional weighting.
354	`reflection_pad1d_backward`	Math	Alpha	5.1	aten, KernelGen	Computes the gradient of `reflection_pad1d` with respect to the input tensor.
355	`smooth_l1_loss`	NeuralNetwork	Alpha	5.1	aten, pointwise, nn.functional	Compute the smooth L1 loss between input and target tensors.
356	`smooth_l1_loss_backward`	NeuralNetwork	Alpha	5.1	aten, pointwise, nn.functional	Compute the gradient of smooth L1 loss with respect to the input tensor.
357	`mul`	Math	Stable	1.0	aten, pointwise	Multiplies `input` by `other`.
358	`mul_`	Math	Stable	2.2	aten, pointwise	The in-place version of `mul()`.
359	`multinomial`	Distribution	Stable	2.1	aten, skip_precision_check	Returns a tensor where each row contains `num_samples` indices sampled from the multinomial probability distribution located in the corresponding row of tensor `input`.
360	`mv`	BLAS	Stable	2.0	aten	Performs a matrix-vector product of the matrix `input` and the vector `vec`.
361	`nan_to_num`	Math	Stable	3.0	aten, pointwise	Replaces `NaN`, positive infinity, and negative infinity values in `input` with the values specified by `nan`, `posinf`, and `neginf`, respectively.
362	`ne`	Math	Stable	2.0	aten, pointwise	Computes whether `input` is not equal to `other` element-wise.
363	`ne_scalar`	Math	Stable	2.0	aten, pointwise	The scalar version of `ne()`.
364	`neg`	Math	Stable	2.0	aten, pointwise	Returns a new tensor with the negative of the elements of `input`.
365	`neg_`	Math	Stable	2.2	aten, pointwise	The in-place version of `neg()`.
366	`new_full`	Tensor	Beta	5.1	aten, pointwise	Returns a Tensor of size `size` filled with `fill_value`. By default, the returned Tensor has the same `torch.dtype` and `torch.device` as this tensor.
367	`nll_loss_backward`	NeuralNetwork	Stable	2.2	aten, IR	Compute the negative log likelihood loss. This is the backward case.
368	`nll_loss_forward`	NeuralNetwork	Stable	2.2	aten, IR	Compute the negative log likelihood loss. This is the forward case.
369	`nll_loss2d_backward`	NeuralNetwork	Stable	2.2	aten, IR	An internal IR for supporting `torch.nn.NLLLoss2d`, which has been deprecated and is now integrated into the standard `torch.nn.NLLLoss`. This is the backward case.
370	`nll_loss2d_forward`	NeuralNetwork	Stable	2.2	aten, IR	An internal IR for supporting `torch.nn.NLLLoss2d`, which has been deprecated and is now integrated into the standard `torch.nn.NLLLoss`. This is the forward case.
371	`nll_loss_nd_backward`	NeuralNetwork	Stable	5.0	aten	Measures the performance of a classification model by penalizing low probabilities for correct classes. This computes the gradients of this loss with respect to model parameters using automatic differentiation.
372	`nll_loss_nd_forward`	NeuralNetwork	Stable	5.0	aten	Measures the performance of a classification model by calculating the negative log probability of the true class. This defines the computation flow, transforming input data through layers to produce output predictions.
373	`nonzero`	Tensor	Stable	2.1	aten	Returns a 2-D tensor where each row is the index for a nonzero value. When `as_tuple` is explicitly set to True, this returns a tuple of 1-D index tensors, allowing for advanced indexing of all nonzero values.
374	`nonzero_numpy`	Tensor	Alpha	5.1	aten, KernelGen	Returns a tuple of 1-D tensors, one for each dimension, containing the indices of the nonzero elements in the input tensor (NumPy-style).
375	`normal_float_float_`	Distribution	Stable	5.0	aten, pointwise, skip_precision_check	Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a float `mean` and a float `std`.
376	`normal_float_tensor`	Distribution	Stable	2.1	aten, pointwise	Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a float `mean` and a tensor `std`.
377	`normal_tensor_float`	Distribution	Stable	2.1	aten, pointwise	Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a tensor `mean` and a float `std`.
378	`normal_tensor_tensor`	Distribution	Stable	2.1	aten, pointwise	Returns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a tensor `mean` and a tensor `std`.
379	`normed_cumsum`	Reduction	Stable	2.1	aten	Get the normalized cumulative sum where each step is divided by the total sum of the dataset, resulting in values ranging from 0 to 1. Internally used by the `multinomial` operator.
380	`one_hot`	NeuralNetwork	Stable	5.0	aten, nn.functional, KernelGen	Takes a LongTensor with index values of shape `(*)` and returns a tensor of shape `(*, num_classes)` that has zeros everywhere except where the index of the last dimension matches the corresponding value of the input tensor, in which case it will be 1.
381	`ones`	Tensor	Stable	2.1	aten, skip_precision_check	Returns a tensor filled with the scalar value 1, with the shape defined by the variable argument `size`.
382	`ones_like`	Tensor	Stable	2.1	aten	Returns a tensor filled with the scalar value 1, with the same size as `input`.
383	`outer`	BLAS	Stable	2.0	fused	Computes the outer product of self and the input vector. If the self tensor is a vector of size `n` and the input tensor is a vector of size `m`, the `out` tensor (if specified) must be a matrix of size `n × m`.
384	`outplace_fused_experts`	MoE	Beta	5.0	fused, Activation, vLLM	This operator allocates and returns a new output tensor.
385	`pack_seq_triton`	NeuralNetwork	Beta	5.1	fused, vLLM, DeepSeekV4	Pack variable-length token sequences into a padded batched tensor.
386	`pad`	NeuralNetwork	Stable	2.1	aten, pointwise, nn.functional	Pads a tensor using the specified mode.
387	`per_token_group_quant_fp8`	Quantization	Alpha	4.2	NoCPU, vLLM	Function to perform per-token-group quantization on an input tensor `x`. It converts the tensor values into signed float8 values and returns the quantized tensor along with the scaling factor used for quantization.
388	`pixel_shuffle`	NeuralNetwork	Beta	5.0	aten, nn.functional	Rearranges elements in a tensor to a new tensor of different shape.
389	`pixel_unshuffle`	NeuralNetwork	Beta	5.0	aten, KernelGen	Reverses the `pixel_shuffle` operation by rearranging elements from a higher-resolution feature map with fewer channels into a lower-resolution feature map with more channels.
390	`pixel_unshuffle_out`	NeuralNetwork	Beta	5.0	aten, KernelGen	A variant of `pixel_unshuffle` that assigns the output to the `out` tensor.
391	`poisson`	Math	Beta	5.1	aten, KernelGen	Returns a tensor of the same size as `input` with each element sampled from a Poisson distribution with rate given by the corresponding element in `input`.
392	`polar`	Math	Stable	3.0	aten, pointwise	Constructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value `abs` and angle `angle`.
393	`pow_scalar`	Math	Stable	1.0	aten	Takes the power of each element in `input` with `exponent` and returns a tensor with the result. The input is a single float, while the `exponent` is a tensor.
394	`pow_tensor_scalar`	Math	Stable	1.0	aten, pointwise	Takes the power of each element in `input` with `exponent` and returns a tensor with the result. The input is a tensor, while the `exponent` is a float.
395	`pow_tensor_scalar_`	Math	Stable	2.2	aten, pointwise	This is the in-place version of `pow_tensor_scalar()`.
396	`pow_tensor_tensor`	Math	Stable	1.0	aten, pointwise	Takes the power of each element in `input` with `exponent` and returns a tensor with the result. The input is a tensor, while the `exponent` is also a tensor.
397	`pow_tensor_tensor_`	Math	Stable	2.2	aten, pointwise	This is the in-place version of `pow_tensor_tensor()`.
398	`prelu`	NeuralNetwork	Beta	5.0	aten, Activation, pointwise, nn.functional, KernelGen	An activation function used in neural networks that improves upon ReLU (Rectified Linear Unit) by allowing the network to learn the slope of negative inputs. It performs an element-wise operation that keeps positive values and scales negative values by a learnable parameter.
399	`prod`	LinearAlg	Stable	2.0	aten, Reduction	Returns the product of all elements in the `input` tensor.
400	`prod_dim_int`	Reduction	Stable	2.0	aten	Returns the product of each row of the `input` tensor in the given dimension `dim`.
401	`quantile`	Tensor	Stable	2.2	aten	Computes the q-th quantiles of each row of the `input` tensor along the dimension `dim`.
402	`rad2deg`	Math	Alpha	5.1	aten, KernelGen, pointwise	Converts each element from angles in radians to degrees.
403	`rad2deg_`	Math	Alpha	5.1	aten, KernelGen, pointwise	The in-place version of `rad2deg`.
404	`rand`	Distribution	Stable	2.1	aten	Returns a tensor filled with random numbers from a uniform distribution on the interval `[0,1)`.
405	`rand_like`	Distribution	Stable	2.1	aten	Returns a tensor with the same size as `input` that is filled with random numbers from a uniform distribution on the interval `[0,1)`.
406	`randn`	Distribution	Stable	2.1	aten	Returns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).
407	`randn_like`	Distribution	Stable	2.1	aten	Returns a tensor with the same size as `input` that is filled with random numbers from a normal distribution with mean 0 and variance 1.
408	`randint`	Distribution	Alpha	5.1	aten, KernelGen	Returns a tensor filled with random integers generated uniformly between `low` (inclusive) and `high` (exclusive).
409	`randperm`	Distribution	Stable	2.2	aten, skip_precision_check	Returns a random permutation of integers from `0` to `n - 1`.
410	`reciprocal`	Math	Stable	1.0	aten, pointwise	Returns a new tensor with the reciprocal of the elements of `input`.
411	`reciprocal_`	Math	Stable	2.2	aten, pointwise	This is the in-place version of `reciprocal()`.
412	`reflection_pad1d`	NeuralNetwork	Beta	5.0	aten, pointwise, KernelGen	Pads the input 3D or 2D tensor (typically representing signals or sequences) by reflecting the boundary values at the edges.
413	`reflection_pad1d_out`	NeuralNetwork	Beta	5.0	aten, pointwise, KernelGen	A variant of `reflection_pad1d` that assigns the output to the `out` tensor.
414	`reflection_pad2d`	NeuralNetwork	Beta	5.0	aten, pointwise, KernelGen	Pads the input 4D or 3D tensor (typically representing images) by reflecting the boundary values at the edges.
415	`reflection_pad2d_out`	NeuralNetwork	Beta	5.0	aten, pointwise, KernelGen	A variant of `reflection_pad2d` that assigns the output to the `out` tensor.
416	`reglu`	NeuralNetwork	Alpha	4.2	fused, Transformer	Rectified Gated Linear Unit is a variant of GLU that uses ReLU instead of the sigmoid function for gating.
417	`relu`	NeuralNetwork	Stable	1.0	aten, Activation, pointwise, nn.functional	Applies the ReLU (Rectified Linear Unit) activation function element-wise.
418	`relu_`	NeuralNetwork	Stable	2.2	aten, pointwise, Activation	This is the in-place version of `relu()`.
419	`relu6`	NeuralNetwork	Beta	5.0	aten, pointwise, Activation, KernelGen	Applies the element-wise function `f(x)=min(max(0,x),6)`. This is a variation of the standard ReLU activation function that "caps" its output at a maximum value of 6.
420	`remainder_scalar`	Math	Stable	2.2	aten	Computes Python's modulus operation entrywise. The result has the same sign as the divisor `other` and its absolute value is less than that of `other`.
421	`remainder_scalar_`	Math	Stable	2.2	aten	This is the in-place version of `remainder()`.
422	`remainder_scalar_tensor`	Math	Stable	2.2	aten	Computes Python's modulus operation entrywise. The result has the same sign as the divisor `other` and its absolute value is less than that of `other`.
423	`remainder_tensor`	Math	Stable	2.2	aten	Computes Python's modulus operation entrywise. The result has the same sign as the divisor `other` and its absolute value is less than that of `other`.
424	`remainder_tensor_`	Math	Stable	2.2	aten	This is the in-place version of `remainder()`.
425	`repeat`	Tensor	Stable	2.1	aten	Repeats this tensor along the specified dimensions.
426	`repeat_interleave_self_int`	Tensor	Stable	2.2	aten, pointwise	Repeats elements of a tensor. The number of repetitions is specified as an integer `repeats`.
427	`repeat_interleave_self_tensor`	Tensor	Stable	2.2	aten, pointwise	Repeats elements of a tensor. The number of repetitions is specified as a tensor `repeats`. `repeats` is broadcasted to fit the shape of the given axis.
428	`repeat_interleave_tensor`	Tensor	Stable	2.2	aten, pointwise	Repeats 0 `repeats[0]` times, 1 `repeats[1]` times, 2 `repeats[2]` times, etc.
429	`replication_pad1d`	Tensor	Beta	5.0	aten, KernelGen	Pads the edge of a 1D input tensor by repeating the boundary values.
430	`replication_pad1d_out`	Tensor	Beta	5.0	aten, KernelGen	A variant of `replication_pad1d` that assigns the output to the `out` tensor.
431	`replication_pad3d`	NeuralNetwork	Alpha	5.0	aten	Pads the edge of a 3D input tensor by repeating the boundary values.
432	`reshape_and_cache`	Attention	Stable	3.0	fused, vLLM	Stores the key/value token states into the pre-allocated kv_cache buffers of paged attention.
433	`reshape_and_cache_flash`	Attention	Stable	3.0	fused	Stores the key/value token states into the pre-allocated kv_cache buffers of paged attention.
434	`resolve_conj`	Science	Stable	2.1	aten	Returns a new tensor with materialized conjugation if `input`'s conjugate bit is set to True, else returns `input`. The output tensor will always have its conjugate bit set to False.
435	`resolve_neg`	Science	Stable	2.1	aten	Returns a new tensor with materialized negation if `input`'s negative bit is set to True, else returns `input`. The output tensor will always have its negative bit set to False.
436	`rms_norm`	NeuralNetwork	Stable	2.0	aten, nn.functional, Reduction	Apply Root Mean Square Layer Normalization over a mini-batch of inputs.
437	`roll`	BLAS	Beta	5.1	aten, KernelGen	Roll the tensor input along the given dimension(s). Elements that are shifted beyond the last position are re-introduced at the first position.
438	`round`	Math	Beta	5.1	aten, pointwise	Rounds elements of input to the nearest integer.
439	`round_`	Math	Beta	5.1	aten, pointwise	The in-place version of `round`.
440	`round_out`	Math	Beta	5.1	aten, pointwise	A variant of `round` that assigns the output to the specified `out`.
441	`rrelu_with_noise_backward`	NeuralNetwork	Beta	5.0	aten, KernelGen	Computes the gradient of the Randomized Leaky ReLU (RReLU) activation function with respect to its input during backpropagation. It uses the noise tensor generated in the forward pass to correctly apply the slope to negative input values.
442	`rsqrt`	Math	Stable	1.0	aten, pointwise	Returns a new tensor with the reciprocal of the square-root of each of the elements of `input`.
443	`rsqrt_`	Math	Stable	2.2	aten, pointwise	The in-place version of `rsqrt()`.
444	`rsub_scalar`	Math	Alpha	5.1	aten, KernelGen	Subtracts `input`, scaled by `alpha`, from `other` (the reverse of `sub`). This is the scalar version.
445	`rsub_tensor`	Math	Alpha	5.1	aten, KernelGen	Subtracts `input`, scaled by `alpha`, from `other` (the reverse of `sub`). This is the tensor version.
446	`rwkv_ka_fusion`	RWKV	Stable	4.1	fused	Merges, aligns, and enhances features from different data sources or spatial directions using the efficient, linear-time RWKV framework.
447	`rwkv_mm_sparsity`	RWKV	Stable	4.1	fused	Optimized, lossless sparse matrix multiplication in RWKV-7 models.
448	`safe_softmax`	NeuralNetwork	Alpha	5.1	aten, IR, KernelGen	Apply a softmax function. Note this version may not be functional.
449	`scaled_dot_product_attention`	NeuralNetwork	Stable	2.2	nn.functional, Attention	Computes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed and applying dropout if a probability greater than 0.0 is specified. The optional scale argument can only be specified as a keyword argument.
450	`scaled_dot_product_attention_backward`	NeuralNetwork	Stable	2.2	nn.functional, Attention	The backward case for `scaled_dot_product_attention`.
451	`scaled_dot_product_attention_forward`	NeuralNetwork	Stable	2.2	nn.functional, Attention	The forward case for `scaled_dot_product_attention`.
452	`scaled_mm`	BLAS	Beta	5.1	aten	Performs a scaled matrix multiplication. The result of `self @ mat2` is multiplied by `scale_a` and `scale_b`, then an optional bias is added.
453	`scaled_mm_out`	BLAS	Beta	5.1	aten	A variant of `_scaled_mm` that writes the result into `out`.
454	`scaled_softmax_backward`	Reduction	Stable	4.2	aten	The backward pass for a scaled softmax function, commonly used in Scaled Dot-Product Attention (SDPA) within Transformer models, computes the gradient of the loss with respect to the input logits, incorporating a scaling factor to stabilize training.
455	`scaled_softmax_forward`	Reduction	Stable	4.2	aten	The forward pass for a scaled softmax function, commonly used in Scaled Dot-Product Attention (SDPA) within Transformer models. It applies a scaling factor to the input logits before computing the softmax.
456	`scatter_add_`	Tensor	Stable	4.2	aten	Adds all values from the tensor `src` into `self` at the indices specified in the `index` tensor in a similar fashion as `scatter_()`. For each value in `src`, it is added to an index in `self` which is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`.
457	`scatter_reduce`	Tensor	Stable	2.2	aten	Writes all values from the tensor `src` into provided tensor at the indices specified in the `index` tensor. For each value in `src`, its output index is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. The optional `reduce` argument allows specification of an optional reduction operation, which is applied to all values in the tensor `src` into the tensor at the indices specified in the `index`.
458	`scatter_reduce_`	Tensor	Stable	3.0	aten, KernelGen	This is the in-place version of `scatter_reduce()`.
459	`scatter_reduce_two_`	Reduction	Alpha	5.1	aten, KernelGen	A specific low-level ATen operator primarily encountered during model compilation or when using advanced backends like TensorRT or MPS.
460	`scatter_src`	Tensor	Stable	2.2	aten	Writes all values from the tensor `src` into provided tensor at the indices specified in the `index` tensor. For each value in `src`, its output index is specified by its index in `src` for `dimension != dim` and by the corresponding value in `index` for `dimension = dim`. The optional `reduce` argument allows specification of an optional reduction operation, which is applied to all values in the tensor `src` into the tensor at the indices specified in the `index`.
461	`scatter_src_`	Tensor	Stable	3.0	aten	This is the in-place version of `scatter_src()`.
462	`select_backward`	NeuralNetwork	Beta	5.1	aten	Computes the gradient of the `select` operation during the backward pass.
463	`select_scatter`	Tensor	Stable	2.2	aten	Embeds the values of the `src` tensor into `input` at the given index. This function returns a tensor with fresh storage; it does not create a view.
464	`selu`	NeuralNetwork	Beta	5.0	aten, pointwise, nn.functional, Activation, KernelGen	Applies an element-wise activation function that induces self-normalizing properties in neural networks. It scales the Exponential Linear Unit (ELU) to ensure activations remain close to zero mean and unit variance.
465	`selu_`	NeuralNetwork	Beta	5.0	aten, pointwise, Activation, KernelGen	This is the in-place version of `selu`.
466	`sgn_`	Math	Beta	5.0	aten, KernelGen	Computes the sign of each element in the `self` tensor, element-wise. This function is an extension of `sign()` designed to handle complex tensors in addition to real-valued ones.
467	`sigmoid`	NeuralNetwork	Stable	2.0	aten, pointwise	Computes the expit (also known as the logistic sigmoid function) of the elements of `input`.
468	`sigmoid_`	NeuralNetwork	Stable	2.2	aten, pointwise	The in-place version of `sigmoid()`.
469	`sigmoid_backward`	NeuralNetwork	Stable	3.0	aten, pointwise	The backward version of `sigmoid()`.
470	`signbit`	Tensor	Beta	5.1	aten, pointwise	Tests if each element of `input` has its sign bit set or not.
471	`signbit_out`	Tensor	Beta	5.1	aten, pointwise	A variant of `signbit` that assigns the output to `out`.
472	`silu`	NeuralNetwork	Stable	1.0	aten, pointwise, nn.functional	SiLU (Sigmoid Linear Unit), a simple approximation of ReLU but without any discontinuity of the first derivative.
473	`silu_`	NeuralNetwork	Stable	2.2	aten, nn.functional, pointwise	The in-place version of `silu()`.
474	`silu_and_mul`	Activation	Stable	2.0	fused, pointwise, vLLM	A custom operator in vLLM as activation function for SwiGLU.
475	`silu_and_mul_out`	Activation	Stable	2.0	fused, pointwise, vLLM	A variant of `silu_and_mul` with an extra `out` argument.
476	`silu_and_mul_with_clamp`	Activation	Stable	5.1	fused, pointwise, vLLM	A custom operator in vLLM as activation function for SwiGLU.
477	`silu_and_mul_with_clamp_out`	Activation	Stable	5.1	fused, pointwise, vLLM	A variant of `silu_and_mul_with_clamp` with an extra `out` argument.
478	`silu_backward`	NeuralNetwork	Stable	3.0	aten, pointwise	The backward version of `silu()`.
479	`sin`	Math	Stable	2.0	aten, pointwise	Returns a new tensor with the sine of the elements in the `input` tensor, where each value in this input tensor is in radians.
480	`sin_`	Math	Stable	2.2	aten, pointwise	The in-place version of `sin()`.
481	`sinh_`	Math	Beta	5.0	aten, KernelGen	Computes the hyperbolic sine `(e^x-e^{-x})/2` of each element in a tensor. This is an in-place version.
482	`skip_layer_norm`	NeuralNetwork	Stable	2.0	fused, Transformer	An optimized operation used in Transformer models to improve performance by combining residual connection (skip connection) addition and Layer Normalization (LayerNorm) into a single kernel.
483	`slice_backward`	NeuralNetwork	Stable	5.0	aten	An automatic differentiation (autograd) function that computes the gradient of a tensor slicing operation (`tensor[start:end]`) during backpropagation.
484	`slice_scatter`	Tensor	Stable	2.2	aten	Embeds the values of the `src` tensor into `input` at the given dimension. This function returns a tensor with fresh storage; it does not create a view.
485	`soft_margin_loss`	NeuralNetwork	Beta	5.1	nn.functional, KernelGen	Compute the soft margin loss.
486	`softmax`	NeuralNetwork	Stable	1.0	aten, nn.functional	Apply a softmax function.
487	`softmax_backward`	Reduction	Stable	3.0	aten, nn.functional	The backward version of `softmax()`.
488	`softmax_backward_out`	Reduction	Stable	3.0	aten, nn.functional	A variant of `softmax_backward()`.
489	`softmax_out`	NeuralNetwork	Stable	1.0	aten, nn.functional	Apply a softmax function, with given `out`.
490	`softplus`	NeuralNetwork	Stable	4.0	aten, nn.functional, pointwise	Applies the Softplus function element-wise.
491	`softshrink`	NeuralNetwork	Beta	5.0	aten, nn.functional, Activation, KernelGen	Applies the soft shrinkage function element-wise to an input tensor. It is an activation function often used in signal processing and sparse representation, such as image denoising.
492	`softshrink_out`	NeuralNetwork	Beta	5.0	aten, nn.functional, Activation, KernelGen	This is a variant of `softshrink` that supports an output tensor.
493	`sort`	Tensor	Stable	2.2	aten, skip_precision_check	Sorts the elements of the `input` tensor along a given dimension in ascending order by value.
494	`sort_stable`	Tensor	Stable	3.0	aten, skip_precision_check	Sorts the elements of the `input` tensor along a given dimension in ascending order by value. This is a variant of `sort()` where `stable` is set to True to preserve the order of equivalent elements.
495	`sparse_attn_triton`	NeuralNetwork	Beta	5.1	fused	Sparse attention with attention-sink.
496	`sparse_mla_fwd_interface`	DSA	Beta	5.0	fused	A generic interface for sparse MLA (Multi-head Latent Attention) for DeepSeek v3/v3.2. It is currently not exposed as a standalone operator for use.
497	`special_i0e`	Math	Beta	5.0	aten, pointwise, KernelGen	Computes the exponentially scaled zeroth order modified Bessel function of the first kind for each element of `input`.
498	`special_i0e_out`	Math	Beta	5.0	aten, pointwise, KernelGen	A variant of `special_i0e()` with the output saved to the provided `out`.
499	`special_i1`	Math	Beta	5.0	aten, pointwise, KernelGen	Computes the modified Bessel function of the first kind of order 1 (I_1(x)) for each element in the input tensor, designed for special mathematical functions.
500	`special_i1_out`	Math	Beta	5.0	aten, pointwise, KernelGen	A variant of `special_i1` that allows the output to be assigned to another tensor.
501	`sqrt`	Math	Stable	4.0	aten, pointwise	Returns a new tensor with the square-root of the elements of `input`.
502	`sqrt_`	Math	Stable	4.0	aten, pointwise	This is the in-place version of `sqrt()`.
503	`square`	Math	Beta	5.1	aten, pointwise	Returns a new tensor with the square of the elements of `input`.
504	`square_`	Math	Beta	5.1	aten, pointwise	The in-place version of `square`.
505	`square_out`	Math	Beta	5.1	aten, pointwise	A variant of `square` that assigns the output to the provided `out`.
506	`stack`	Tensor	Stable	2.2	aten	Concatenates a sequence of tensors along a new dimension.
507	`std`	Reduction	Stable	4.0	aten	Calculates the standard deviation over the dimensions specified by `dim`. `dim` can be a single dimension, list of dimensions, or `None` to reduce over all dimensions.
508	`sub`	Math	Stable	1.0	aten, pointwise	Subtracts `other`, scaled by `alpha`, from the `input` tensor.
509	`sub_`	Math	Stable	2.2	aten, pointwise	Subtracts `other`, scaled by `alpha`, from the `input` tensor. This is the in-place version.
510	`sum`	LinearAlg	Stable	2.0	aten, Reduction	Returns the sum of all elements in the `input` tensor.
511	`sum_dim`	LinearAlg	Stable	2.0	aten, Reduction	Returns the sum of each row of the `input` tensor in the given dimension `dim`. If `dim` is a list of dimensions, reduces over all of them.
512	`sum_dim_out`	LinearAlg	Stable	3.0	aten, Reduction	A variant of `sum_dim()` with the `out` argument.
513	`sum_out`	LinearAlg	Stable	3.0	aten, Reduction	A variant of `sum()` with the `out` argument.
514	`swiglu`	NeuralNetwork	Stable	5.0	fused, Transformer	Swish-Gated Linear Unit, a variant of GLU with the Swish activation function.
515	`t_copy`	Tensor	Beta	5.0	aten, KernelGen	Transpose a 2D tensor into a new tensor with contiguous memory layout.
516	`t_copy_out`	Tensor	Beta	5.0	aten, KernelGen	A variant of `t_copy()` that allows the output to be assigned to the `out` tensor.
517	`tan`	Math	Stable	4.1	aten, pointwise	Returns a new tensor with the tangent of the elements in the `input` tensor, where each value in this `input` tensor is in radians.
518	`tan_`	Math	Stable	4.1	aten, pointwise	This is the in-place version of `tan()`.
519	`tanh`	Math	Stable	2.0	aten, pointwise	Returns a new tensor with the hyperbolic tangent of the elements of `input`.
520	`tanh_`	Math	Stable	2.2	aten, pointwise	This is the in-place version of `tanh()`.
521	`tanh_backward`	Math	Stable	3.0	aten, pointwise	This is the backward case for `tanh()`.
522	`threshold`	NeuralNetwork	Stable	3.0	aten, nn.functional, pointwise	Apply a threshold to each element of the input Tensor.
523	`threshold_backward`	NeuralNetwork	Stable	3.0	aten, nn.functional, pointwise	This is the backward version for `threshold`.
524	`tile`	Tensor	Stable	2.1	aten	Constructs a tensor by repeating the elements of `input`. The `dims` argument specifies the number of repetitions in each dimension.
525	`to_copy`	Tensor	Beta	4.1	aten, pointwise, skip_precision_check
526	`top_k_per_row_prefill`	NeuralNetwork	Beta	5.1	fused	Triton top-K per row for DeepSeek V4 sparse attention prefill phase. Replaces vLLM persistent_topk CUDA kernel with in-place masking + adaptive topk selection.
527	`top_k_per_row_decode`	NeuralNetwork	Beta	5.1	fused, vLLM, DeepSeekV4, KernelGen	Triton top-K per row for DeepSeek V4 decode-phase token selection. Radix-select based approach with three dispatch tiers for different vocab sizes.
528	`topk`	Tensor	Stable	2.1	aten, skip_precision_check	Returns the `k` largest elements of the given `input` tensor along a given dimension. If `dim` is not given, the last dimension of the `input` is chosen. If `largest` is False then the `k` smallest elements are returned.
529	`topk_softmax`	MoE	Stable	4.0	fused, vLLM	Selects the k most likely next-token candidates, sets all others to zero, and renormalizes the probabilities of these top candidates.
530	`topk_softplus_sqrt`	MoE	Beta	5.1	fused, KernelGen, vLLM	Fused softplus + sqrt + top-k selection and optional renormalization for MoE gating in models like DeepSeek-V3/V4.
531	`trace`	Reduction	Stable	4.0	aten	Returns the sum of the elements of the diagonal of the input 2-D matrix.
532	`tril`	BLAS	Beta	5.0	aten, KernelGen	Returns the lower triangular part of an input matrix (or a batch of matrices) and sets all other elements to zero.
533	`tril_`	BLAS	Beta	5.1	aten	The in-place version of `tril()`.
534	`tril_out`	BLAS	Beta	5.1	aten	A variant of `tril()` that explicitly assigns the output to the `out` parameter.
535	`triton_lighting_indexer_k_tiled_interface`	NeuralNetwork	Alpha	5.1	fused, DSA	Part of FP8 MQA framework. It is currently not exposed as an operator for use.
536	`triu`	BLAS	Stable	1.0	aten	Returns the upper triangular part of a matrix (2-D tensor) or batch of matrices `input`, the other elements of the result tensor `out` are set to 0.
537	`triu_`	BLAS	Stable	5.0	aten	The in-place version of `triu()`.
538	`trunc_divide`	Math	Stable	2.1	aten	The `div` function with rounding_mode set to `trunc`.
539	`trunc_divide_`	Math	Stable	2.1	aten	The in-place version of `trunc_divide`.
540	`unfold_backward`	NeuralNetwork	Stable	5.0	aten, nn.functional	An operator for calculating the gradient of the `unfold` operation during backpropagation. It takes the gradient of the unfolded output and accumulates it back into the original input shape, reversing sliding local block extraction and resolving overlaps.
541	`uniform_`	Distribution	Stable	2.1	aten, skip_precision_check	Fills `self` tensor with numbers sampled from the continuous uniform distribution.
542	`unique2`	Tensor	Stable	2.1	aten	Returns the unique elements of the input tensor. This is an internal PyTorch function.
543	`unique_consecutive`	Distribution	Beta	5.1	aten, KernelGen	Eliminates all but the first element from every consecutive group of equivalent elements.
544	`unpack_seq_triton`	NeuralNetwork	Beta	5.1	fused, vLLM, DeepSeekV4	Unpack a packed sequence tensor back to its original variable-length form.
545	`upsample_bicubic2d`	NeuralNetwork	Stable	5.0	aten, Reduction	A variant of `upsample()` that has `mode` set to `bicubic`.
546	`upsample_bicubic2d_aa`	NeuralNetwork	Stable	2.2	aten, Reduction	A variant of `upsample()` that has `mode` set to `bicubic` and anti-aliasing enabled.
547	`upsample_bicubic2d_aa_backward`	NeuralNetwork	Stable	5.0	aten, Reduction	A backward case for `_upsample_bicubic2d_aa()`.
548	`upsample_linear1d`	NeuralNetwork	Alpha	5.0	aten	Upsamples the `input`, using linear mode. The input has to be 3 dimensional, and the `output_size` is an optional tuple of ints.
549	`upsample_nearest1d`	NeuralNetwork	Stable	5.0	aten	Upsamples the `input`, using nearest neighbours' pixel values. The input has to be 3 dimensional, and the `output_size` is an optional tuple of ints.
550	`upsample_nearest2d`	NeuralNetwork	Stable	2.2	aten	Upsamples the `input`, using nearest neighbours' pixel values. The input has to be 4 dimensional. The scales can be provided with `scales_h` and `scales_w`.
551	`upsample_nearest3d`	NeuralNetwork	Stable	5.0	aten	Performs 3D nearest-neighbor interpolation to increase the spatial size of volumetric data, such as 5D tensors. It scales up inputs by copying values from the nearest pixel/voxel, without calculating new values through linear interpolation.
552	`upsample_nearest_exact1d`	NeuralNetwork	Beta	5.0	aten, Reduction	Increases the length of a 1D tensor using nearest-neighbor interpolation, ensuring the output aligns with library-standard algorithms like PIL.
553	`var`	Tensor	Beta	5.1	aten, KernelGen	Calculates the variance over all dimensions.
554	`var_correction`	Tensor	Beta	5.1	aten	A variant of the `var()` operator, with an optional `correction` for specifying difference between the sample size and sample degrees of freedom.
555	`var_dim`	Tensor	Beta	5.1	aten	Calculates the variance over the dimensions specified by dim.
556	`var_mean`	LinearAlg	Stable	2.0	aten, Reduction	Calculates the variance and mean over the dimensions specified by `dim`. `dim` can be a single dimension, list of dimensions, or `None` to reduce over all dimensions.
557	`vdot`	BLAS	Stable	2.2	aten	Computes the dot product of two 1D vectors along a dimension.
558	`vector_norm`	LinearAlg NeuralNetwork	Stable	2.0	aten, Reduction	Computes a vector norm.
559	`vstack`	Tensor	Stable	2.2	aten	Stack tensors in sequence vertically (row wise).
560	`w8a8_block_fp8_matmul`	BLAS	Alpha	5.1	vLLM	Performs matrix multiplication with block-wise quantization.
561	`weight_norm`	NeuralNetwork	Stable	3.0	fused	Reparameterizes a module's weight tensor by decoupling its magnitude (g) from its direction (v). It is a hook that computes the actual weight before each forward pass.
562	`weight_norm_interface`	NeuralNetwork	Stable	2.2	aten, fused	Applies weight normalization to neural network layers, decoupling the magnitude of a weight tensor from its direction. It is used to stabilize training, particularly for models with small batch sizes.
563	`weight_norm_interface_backward`	NeuralNetwork	Stable	3.0	aten, fused	Computes the gradients for weight normalization during the backward pass. It calculates the necessary derivatives for updating both the magnitude (g) and direction (v) parameters of a weight-normalized layer, based on gradients received from the previous operation.
564	`where_self`	Tensor	Stable	2.1	aten, pointwise	Returns a tensor of elements selected from either `input` or `other`, depending on `condition`.
565	`where_self_out`	Tensor	Stable	2.2	aten, pointwise	This is a variant of `where_self()` with an argument `out`.
566	`zero`	Tensor	Beta	5.0	aten, KernelGen	Fills tensor with zeros.
567	`zero_`	Tensor	Stable	5.0	aten	Fills `self` tensor with zeros.
568	`zero_out`	Tensor	Beta	5.0	aten, KernelGen	Fills a tensor with zeros, assigning the output to the `out` tensor.
569	`zeros`	Tensor	Stable	2.1	aten, skip_precision_check	Returns a tensor filled with the scalar value 0, with the shape defined by the variable argument `size`.
570	`zeros_like`	Tensor	Stable	2.1	aten	Returns a tensor filled with the scalar value 0, with the same size as `input`.
571	`_to_copy`	Tensor	Alpha	5.1	skip_precision_check	Layout/memory operation (to_copy).
572	`view`	Tensor	Alpha	5.1	skip_precision_check	Pure layout operation (view).
573	`reshape`	Tensor	Alpha	5.1	skip_precision_check	Pure layout operation (reshape).
574	`expand`	Tensor	Alpha	5.1	skip_precision_check	Pure layout operation (expand).
575	`permute`	Tensor	Alpha	5.1	skip_precision_check	Pure layout operation (permute).
576	`transpose`	Tensor	Alpha	5.1	skip_precision_check	Pure layout operation (transpose).
577	`clone`	Tensor	Alpha	5.1	skip_precision_check	Pure layout/memory operation (clone).
578	`to`	Tensor	Alpha	5.1	skip_precision_check	Device/dtype cast operation (to).
579	`empty`	Tensor	Alpha	5.1	skip_precision_check	Tensor factory operation (empty).
580	`normal_`	Tensor	Alpha	5.1	skip_precision_check	Random sampling operator (normal_).
581	`random_`	Tensor	Alpha	5.1	skip_precision_check	Random sampling operator (random_).
582	`argsort`	Tensor	Alpha	5.1	skip_precision_check, KernelGen	Sorting/selection operator (argsort).
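
The indexing rule described for `scatter_add_` (row 456) is easier to follow with a small worked example. The sketch below is a plain-Python illustration for 2-D nested lists only; the function name `scatter_add_2d` is hypothetical, it is not this library's implementation, and unlike the in-place operator it returns a copy instead of modifying its first argument.

```python
def scatter_add_2d(dest, dim, index, src):
    """Reference semantics of a 2-D scatter-add on nested lists (illustrative only)."""
    if dim not in (0, 1):
        raise ValueError("dim must be 0 or 1 for a 2-D input")
    # Work on a copy so the caller's data is left untouched.
    out = [row[:] for row in dest]
    for i, index_row in enumerate(index):
        for j, target in enumerate(index_row):
            if dim == 0:
                # Along dim 0 the row comes from `index`; the column is kept from `src`.
                out[target][j] += src[i][j]
            else:
                # Along dim 1 the column comes from `index`; the row is kept from `src`.
                out[i][target] += src[i][j]
    return out


dest = [[0, 0, 0], [0, 0, 0]]
src = [[1, 2, 3], [4, 5, 6]]

rows = scatter_add_2d(dest, 0, [[0, 1, 0], [1, 0, 1]], src)
cols = scatter_add_2d(dest, 1, [[0, 0, 2], [1, 1, 1]], src)
print(rows)  # [[1, 5, 3], [4, 2, 6]]
print(cols)  # [[3, 0, 3], [0, 15, 0]]
```

In the second call several source values map to the same destination slot, so they accumulate. That accumulation is what distinguishes scatter-add from a plain scatter, where one of the colliding writes would win.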