Operator List

No.NameKindStageSinceLabelsDescription
1absMathStable1.0aten, pointwiseComputes the absolute value of each element in input. This is a simple wrapper of the existing torch abs operator.
2abs_MathStable2.2aten, pointwiseThe in-place version of abs(), which is a simple wrapper of the Torch abs operator.
3absoluteMathBeta5.0aaten, KernelGenThis is an alias for abs() with the low-level operations implemented by invoking low-level Torch operators.
4acosMathStable5.0aten, pointwiseReturns a new tensor with the arccosine (in radians) of each element in input.
5act_quant_tritonQuantizationBeta5.1fusedThis is a fused operator.
6adaptive_avg_pool3dNeuralNetworkAlpha5.0aten, nn.functional, KernelGenApply a 3D adaptive average pooling over an input signal composed of several input planes.
7adaptive_avg_pool3d_outNeuralNetworkAlpha5.0aten, nn.functional, KernelGenA variant of _adaptive_avg_pool3d that assigns the output to the out tensor.
8addMathStable1.0aten, pointwiseAdd a scalar or tensor to self tensor. If both alpha and other are specified, each element of other is scaled by alpha before being used.
9add_MathStable2.2aten, pointwiseThe in-place version of add().
10add_rms_normNeuralNetworkAlpha5.1aten, KernelGen, NormalizationAdd two inputs element-wise and apply Root Mean Square Layer Normalization.
11addcdivLinearAlgStable4.0aten, pointwisePerforms the element-wise division of tensor1 by tensor2, multiplies the result by the scalar value and adds it to input.
12addcdiv_outLinearAlgStable5.1aten, pointwise, KernelGenA variant of addcdiv() that assigns the output to the given out parameter..
13addcmulLinearAlgStable4.0aten, pointwisePerforms the element-wise multiplication of tensor1 by tensor2, multiplies the result by the scalar value and adds it to input.
14addcmul_outLinearAlgAlpha5.0aten, pointwise, KernelGenA variant of addcmul that allows the output to be assigned to out.
15addmmBLASStable1.0atenPerforms a matrix multiplication of the matrices mat1 and mat2. The matrix input is added to the final result.
16addmm_dtypeBLASBeta5.1atenA variant of addmm that allows the dtype of the output tensor to be specified. This is supported only on CUDA and for torch.float32 given torch.float16 or torch.bfloat16 input dtypes.
17addmm_dtype_outBLASBeta5.1atenA variant of addmm_dtype() that allows the output to be saved to the provided out parameter.
18addmm_outBLASStable4.0atenA variant of addmm that assigns to the output to the provided out parameter.
19addmvBLASStable4.0atenPerforms a matrix-vector product of the matrix mat and the vector vec. The vector input is added to the final result.
20addmv_outBLASStable4.0atenPerforms a matrix-vector product of the matrix mat and the vector vec. The vector input is added to the final result.
21addrBLASStable4.0atenPerforms the outer-product of vectors vec1 and vec2 and adds it to the matrix input.
22affine_grid_generatorTensorAlpha5.1aten, KernelGen, pointwiseGenerates a 2D or 3D flow field (sampling grid), given a batch of affine matrices theta.
23alias_copyTensorBeta5.0aten, KernelGenCreates a new tensor that shares the same storage data as the original tensor, but without preserving the original tensor's metadata (like shape or strides) in a way that links future mutations.
24alias_copy_outTensorBeta5.0aten, KernelGenA variant of alias_copy() that assigns the output to the out tensor.
25allMathStable2.0aten, ReductionTests if all elements in input evaluate to True.
26all_dimMathStable2.0aten, ReductionFor each row of input in the given dimension dim, returns True if all elements in the row evaluate to True and False otherwise.
27all_dimsMathStable2.0aten, ReductionA variant of all.
28allcloseMathStable2.1atenThis function checks if input and other satisfy a condition specified via atol and rtol elementwise, for all elements of input and other.
29amaxLinearAlgStable2.0aten, ReductionReturns the maximum value of each slice of the input tensor in the given dimension(s) dim.
30aminmaxTensorBeta5.1atenComputes the minimum and maximum values of the input tensor.
31angleMathStable3.0aten, pointwiseComputes the element-wise angle (in radians) of the given input tensor.
32anyMathStable2.0aten, ReductionTests if any element in input evaluates to True.
33any_dimMathStable2.0aten, ReductionFor each row of input in the given dimension dim, returns True if any element in the row evaluate to True and False otherwise.
34any_dimsMathStable2.0aten, ReductionFor each row of input in the given dimensions in dims, returns True if any element in the row evaluate to True and False otherwise. The dims contains tuple of ints indicating the dimensions to reduce.
35apply_repetition_penaltiesNeuralNetworkStable5.0fused, vLLMModifies logit tensors in place to penalize tokens that have already appeared in the generated sequence.
36apply_rotary_pos_embNeuralNetworkStable2.0fusedA method to incorporate positional information into the Transformer architecture. Rotary Positional Embedding (RoPE) applies position-dependent rotation to the query (Q) and key (K) vectors before computing the attention score.
37arangeTensorStable2.1atenReturns a 1-D tensor of size ceiling((end−start)/step) with values from the interval [start, end) taken with common difference step beginning from start.
38arange_starttensorStable2.1atenA variant of arange, with start and/or step specified.
39arange_start_steptensorStable2.1atenA variant of arange, with start and/or step specified.
40arcsinhMathBeta5.0aten, KernelGenPerforms an element-wise inverse hyperbolic sine computation on the given tensor.
41arcsinh_MathBeta5.0aten, KernelGenThe in-place version of arcsinh().
42arcsinh_outMathBeta5.0aten, KernelGenA variant of arcsinh that allows the output to be assigned to the out tensor.
43arctanh_MathBeta5.0aten, KernelGenComputes the element-wise inverse hyperbolic tangent of a given input tensor. This is an in-place version.
44argmaxLinearAlgStable2.0aten, ReductionReturns the indices of the maximum value of all elements in the input tensor.
45argminLinearAlgStable2.2aten, ReductionReturns the indices of the minimum value(s) of the flattened tensor or along a dimension.
46asinhMathBeta5.1aten, KernelGenReturns a new tensor with the inverse hyperbolic sine of the elements of input.
47asinh_MathBeta5.0aten, KernelGenComputes the inverse hyperbolic sine for each element of a tensor in-place.
48as_strided_copyTensorBeta5.1aten, KernelGenCreates a contiguous copy of an as_strided view of the input tensor.
49as_strided_copy_outTensorBeta5.1aten, KernelGenA variant of as_strided_copy() that assigns the output to the out tensor.
50assert_asyncTensorStable5.1utilityA utility used to perform data-dependent assertions on GPU tensors without triggering an immediate, performance-heavy GPU-to-CPU synchronization.
51atanMathStable4.0aten, pointwiseReturns a new tensor with the arctangent of the elements (in radians) in the input tensor.
52atan_MathStable4.0aten, pointwiseThe in-place version of atan().
53atan2MathStable5.1aten, pointwiseComputes the element-wise arc tangent of input/other(y/x), returning angles in radians between -PI and PI.
54atan2_outMathBeta5.1aten, pointwiseA variant of atan2 that allows the output to be saved into out.
55avg_pool2dNeuralNetworkStable4.1nn.functionalApplies 2D average-pooling operation in kH \mul kW regions by step size sH \mul sW steps. The number of output features is equal to the number of input planes. This is for the forward case.
56avg_pool2d_backwardNeuralNetworkStable4.1atenThe backward version of avg_pool2d().
57avg_pool3dNeuralNetworkBeta5.1atenApplies 3D average-pooling operation in kD \times kH \times kW regions by step size sD \times sH \times sW steps.
58avg_pool3d_backwardNeuralNetworkAlpha5.1atenThis is the backward version of avg_pool3d().
59baddbmmBLASStable4.1atenPerforms a batch matrix-matrix product of matrices in batch1 and batch2. input is added to the final result. batch1 and batch2 must be 3-D tensors each containing the same number of matrices.
60baddbmm.outBLASBeta5.1atenThis is a variant of baddbmm().
61batch_normNeuralNetworkStable3.0atenAn internal operator used for implementing the BatchNorm functionality.
62batch_norm_backwardNeuralNetworkStable3.0atenThe backward version of batch_norm().
63bernoulli_TensorBeta5.1aten, skip_precision_check, KernelGenDraws binary random numbers (0 or 1) from a Bernoulli distribution.
64bincountReductionStable5.0aten, pointwise, KernelGenCount the frequency of each value in an array of non-negative integers.
65bitwise_and_scalarMathStable2.0aten, pointwiseComputes the bitwise AND of input and other scalar.
66bitwise_and_scalar_MathStable2.2aten, pointwiseThe in-place, scalar version of bitwise_and().
67bitwise_and_scalar_tensorMathStable2.0aten, pointwiseA variant of bitwise_and().
68bitwise_and_tensorMathStable2.0aten, pointwiseThe Tensor method version of bitwise_and().
69bitwise_and_tensor_MathStable2.2aten, pointwiseThe in-place, Tensor method version of bitwise_and().
70bitwise_left_shiftMathStable4.0aten, pointwiseComputes the left arithmetic shift of input by other bits.
71bitwise_notMathStable2.0aten, pointwiseComputes the bitwise NOT of the given input tensor.
72bitwise_not_MathStable2.2aten, pointwiseThe in-place version of bitwise_not().
73bitwise_or_scalarMathStable2.0aten, pointwiseComputes the bitwise OR of scalars input and other.
74bitwise_or_scalar_MathStable2.2atenThe in-place version of bitwise_or_scalar.
75bitwise_or_scalar_tensorMathStable2.0aten, pointwiseComputes the bitwise OR of input and other.
76bitwise_or_tensorMathStable2.0aten, pointwiseComputes the bitwise OR of input and other, this is the Tensor method variant.
77bitwise_or_tensor_MathStable2.2aten, pointwiseThe in-place version of bitwise_or_tensor().
78bitwise_right_shiftMathStable4.0aten, pointwiseComputes the right arithmetic shift of input by other bits.
79bmmBLASStable1.0atenPerforms a batch matrix-matrix product of matrices stored in input and mat2.
80bmm_outBLASStable5.0atenPerforms a batch matrix-matrix product of matrices stored in input and mat2. This is a variant of bmm with out specified.
81bucket_sort_topkNeuralNetworkBeta5.1fused, DSAA wrapper of the TLE version and the Triton version bucket-sort topk operation.
82catTensorStable2.2atenConcatenates the given sequence of tensors in tensors in the given dimension.
83cat_outTensorStable2.2atenA variant of cat that assigns the result to the provided out parameter.
84cauchyDistributionBeta5.1atenDraws random numbers from a Cauchy distribution.
85cauchy_DistributionBeta5.1atenFills the tensor with numbers drawn from the Cauchy distribution.
86ceilMathStable5.0aten, pointwiseReturns a new tensor with the ceil of the elements of input, the smallest integer greater than or equal to each element.
87ceil_MathStable5.0aten, pointwiseThe in-place version of ceil().
88ceil_outMathStable5.0aten, pointwiseA variant of ceil() with out specified.
89celuNeuralNetworkStable4.0aten, nn.functional, pointwiseApplies the quantized CELU (Continuously Differentiable Exponential Linear Unit) activation function element-wise.
90celu_NeuralNetworkStable4.0aten, nn.functional, pointwiseThe in-place version of celu().
91chunk_gated_delta_rule_fwdAttentionAlpha5.0fused, FLAThe forward case for ChunkGatedDeltaRuleFunction with Flash Linear Attention (FLA).
92clampMathStable2.0aten, pointwiseClamps all elements in input into the range [min, max].
93clamp_MathStable2.2aten, pointwiseThe in-place version of clamp().
94clamp_maxMathAlpha5.1aten, KernelGen, pointwiseClamps all elements in input to be smaller or equal max.
95clamp_max_MathAlpha5.1aten, KernelGen, pointwiseThe in-place version of clamp_max().
96clamp_minMathStable4.0aten, pointwiseA variant of clamp() with min set to min.
97clamp_min_MathStable4.0aten, pointwiseThe in-place version of clamp_().
98clamp_tensorMathStable2.0aten, pointwiseThe tensor version of clamp().
99clamp_tensor_MathStable2.2aten, pointwiseThe in-place, tensor version of clamp().
100clipMathBeta5.1aten, KernelGenThis is identical to clamp().
101clip_MathBeta5.1aten, KernelGenThis is identical to clamp_().
102col2imMathAlpha5.1aten, KernelGenRearranges column blocks back into a multidimensional tensor (inverse of im2col).
103combine_topk_swa_indicesNeuralNetworkBeta5.1fused, Attention, vLLM, DeepSeekV4Combines compressed top-k sparse attention indices with sliding-window attention indices for DeepSeekV4 attention.
104concat_and_cache_mlaAttentionBeta3.0fused, MLAWrites the latent and RoPE value into KV cache for Multi-head Latent Attention forward case.
105compute_global_topk_indices_and_lensNeuralNetworkBeta5.1fused, Attention, vLLM, DeepSeekV4Converts local top-k sparse attention indices to global KV-cache indices and computes valid top-k lengths for DeepSeekV4 attention.
106concatenateTensorAlpha5.1aten, KernelGenAn alias of cat().
107conj_physicalLinearAlgBeta5.1atenComputes the element-wise conjugate of the given input tensor. If input has a non-complex dtype, this function just returns input.
108constant_pad_ndNeuralNetworkStable2.2aten, IRPads the input tensor boundaries with a constant value. This is an IR representation, not a public API.
109contiguousTensorRemoved4.1aten, skip_precision_checkReturns a contiguous in memory tensor containing the same data as self tensor.
110conv1dConvolutionStable4.2atenApplies a 1D convolution over a quantized 1D input composed of several input planes.
111conv1d_paddingConvolutionStable4.2atenApplies a 1D convolution over a quantized 1D input composed of several input planes.
112conv2dConvolutionStable4.2atenApplies a 2D convolution over a quantized 2D input composed of several input planes.
113conv2d_paddingConvolutionStable4.2atenApplies a 2D convolution over a quantized 2D input composed of several input planes.
114conv3dConvolutionStable4.2atenApplies a 3D convolution over a quantized 3D input composed of several input planes.
115conv3d_paddingConvolutionStable4.2atenApplies a 3D convolution over a quantized 3D input composed of several input planes.
116conv_depthwise2dNeuralNetworkBeta2.2aten, Convolution, NoCPUA depthwise convolution for the conv2d neural network function.
117conv_transpose1dConvolutionBeta5.1aten, KernelGenApplies a 1D transposed convolution operator over an input image composed of several input planes.
118conv_transpose2dConvolutionAlpha5.1aten, KernelGenApplies a 2D transposed convolution operator over an input image composed of several input planes.
119copyTensorBeta4.2aten, pointwiseAs a wrapper of copy_, this operator copies elements from src to out using given template for shapes.
120copy_TensorStable4.1aten, pointwise, skip_precision_checkCopies the elements from src into self tensor and returns self.
121copysignTensorBeta5.1aten, pointwiseCreate a new floating-point tensor with the magnitude of input and the sign of other, elementwise.
122copysign_outTensorBeta5.1aten, pointwiseA variant of copysign that allows the output to be saved into out.
123cosMathStable2.0aten, pointwiseReturns a new tensor with the cosine of the elements of input given in radians.
124cos_MathStable2.2aten, pointwiseThe in-place version of cos().
125coshMathStable5.1aten, pointwiseReturns a new tensor with the hyperbolic cosine of the elements of input.
126cosh_MathStable5.1aten, pointwiseThis is the in-place version of cosh().
127cosh_outMathStable5.1aten, pointwiseThis is an variant of cosh() that assigns the output to the provided out.
128count_nonzeroTensorStable2.2aten, ReductionCounts the number of non-zero values in the tensor input along the given dim. If no dim is specified then all non-zeros in the tensor are counted.
129cp_gather_indexer_k_quant_cacheQuantizationBeta5.1fused, vLLMThis is a fused operator that gathers FP8 K cache values and scales.
130cross_entropy_lossNeuralNetworkRemoved3.0fused, ReductionComputes the cross entropy loss between input logits and target.
131cudnn_convolutionNeuralNetworkBeta5.1aten, KernelGenA wrapper for cuDNN convolution backend.
132cummaxMathStable3.0aten, ReductionReturns a named tuple (values, indices) where values is the cumulative maximum of elements of input in the dimension dim. And indices is the index location of each maximum value found in the dimension dim.
133cumminMathStable2.2aten, ReductionReturns a named tuple (values, indices) where values is the cumulative minimum of elements of input in the dimension dim. And indices is the index location of each minimum value found in the dimension dim.
134cumprodMathBeta5.1aten, ReductionReturns the cumulative product of elements of input in the dimension dim.
135cumprod_MathBeta5.1aten, ReductionThis is the in-place version of cumprod().
136cumsumLinearAlgStable1.0aten
137cumsum_outReductionStable3.0aten
138cutlass_scaled_mmLinearAlgBeta5.0fused, vLLM
139dequantize_and_gather_k_cacheNeuralNetworkBeta5.1fused, Attention, vLLM, DeepSeekV4Dequantizes FP8 K-cache entries and gathers them into a BF16 tensor for DeepSeekV4 attention.
140dgegluNeuralNetworkStable5.0fused, TransformerGaussian Error Gated Linear Unit with GELU activation instead of sigmoid function. This is for the backward case.
141diagTensorStable2.2aten
  • If input is a vector (1-D tensor), then returns a 2-D square tensor with the elements of input as the diagonal.
  • If input is a matrix (2-D tensor), then returns a 1-D tensor with the diagonal elements of input.
142diag_embedTensorStable2.2aten, pointwiseCreates a tensor whose diagonals of certain 2D planes (specified by dim1 and dim2) are filled by input. To facilitate creating batched diagonal matrices, the 2D planes formed by the last two dimensions of the returned tensor are chosen by default.
143diagonal_backwardLinearAlgStable2.2aten, pointwiseA diagonal operation returns a partial view of input with the its diagonal elements with respect to dim1 and dim2 appended as a dimension at the end of the shape. This is the backward case for diagonal().
144diffMathBeta5.1aten, KernelGenComputes the n-th forward difference along the given dimension.
145digamma_MathBeta5.0aten, KernelGenComputes the in-place digamma function, which is the logarithmic derivative of the Gamma function.
146dispatch_fused_moe_kernelMoEBeta5.0fused, Activation, vLLMAccelerates neural network training by combining token routing (dispatch/all-to-all communication), expert computation (GEMM), and result aggregation into a single GPU kernel.
147div_scalarMathStable2.1atenThis is the scalar version of div().
148div_scalar_MathStable2.1atenThis is the in-place version of div_scalar().
149div_tensorMathStable2.1aten, pointwiseDivides each element of the input input by the corresponding element of other. Note that torch.divide() is an alias of torch.div() and torch.true_divide() is an alias of torch.div() with rounding_mode=None.
150div_tensor_MathStable2.1atenThis is the in-place version of div_tensor().
151div_outMathStable4.2atenThis is an variant of div() with an out argument.
152div_scalar_modeMathStable1.0aten, pointwiseDivides each element of the input by the corresponding element of other. An optional rounding_mode can be specified.
153div_scalar_mode_MathStable2.2aten, pointwiseThe in-place version of div_mode().
154div_tensor_modeMathStable1.0aten, pointwiseDivides each element of the input by the corresponding element of other. An optional rounding_mode can be specified.
155div_tensor_mode_MathStable2.2aten, pointwiseThe in-place version of div_mode().
156dotBLASStable3.0atenComputes the dot product of two 1D tensors.
157dregluNeuralNetworkAlpha4.2fused, TransformerRectified Gated Linear Unit is a variant of GLU that uses ReLU instead of the sigmoid function for gating. This is the backward case.
158dropoutNeuralNetworkStable1.0aten, nn.functionalAn internal IR for implementing torch.nn.functional.dropout.
159dropout_backwardNeuralNetworkStable3.0aten, nn.functionalThe backward case of dropout().
160dswigluNeuralNetworkAlpha5.0fused, TransformerSwish-Gated Linear Unit, a variant of GLU with the Swish activation function. This is for the backward case.
161dunder_ior_scalarMathBeta5.1aten, KernelGenThe scalar version of dunder_ior_tensor.
162dunder_ior_tensorMathBeta5.1aten, KernelGenThe in-place version of bitwise or operation for tensor and scalar.
163dunder_or_scalarMathBeta5.1aten, KernelGenThe scalar version of dunder_or_tensor.
164dunder_or_tensorMathBeta5.1aten, KernelGenThe in-place version of bitwise or operation for tensor and scalar.
165einsumReductionAlpha5.1aten, KernelGenSums the product of the elements of the input operands along dimensions specified using a notation based on the Einstein summation convention.
166eluNeuralNetworkStable2.2aten, nn.functional, pointwiseApply the Exponential Linear Unit (ELU) function element-wise.
167elu_NeuralNetworkStable4.0aten, pointwiseThe in-place version of elu().
168elu_backwardNeuralNetworkStable4.0aten, pointwiseThe backward version of elu().
169embeddingNeuralNetworkStable2.1aten, nn.functionalGenerate a simple lookup table that looks up embeddings in a fixed dictionary and size. Note that the parameter sequence differs from torch.nn.functional.embedding.
170embedding_backwardNeuralNetworkStable3.0aten, NoCPUThe backward version of embedding().
171embedding_dense_backwardNeuralNetworkStable5.0atenCalculates the gradient of the weight matrix for a dense embedding layer during backpropagation.
172eqMathStable2.0aten, pointwiseComputes element-wise equality.
173eq_scalarMathStable2.0aten, pointwiseComputes equality between scalars.
174equalMathStable5.0aten, ReductionReturns True if two tensors have the same size and elements, False otherwise.
175euclidean_distMathAlpha5.1aten, KernelGen, pointwiseComputes pairwise Euclidean distances between rows of two 2D tensors.
176erfScienceStable2.1atenComputes the error function of input.
177erf_ScienceStable2.2aten, pointwiseThe in-place version of erf().
178expMathStable1.0aten, pointwiseReturns a new tensor with the exponential of the elements of the input tensor input.
179exp_MathStable2.2aten, pointwiseThe in-place version of exp().
180exp_outMathStable4.1aten, pointwiseA variant of exp2(), with out specified.
181exp2MathStable4.0aten, pointwiseComputes the base two exponential function of input.
182exp2_MathStable4.0aten, pointwiseThe in-place version of exp2().
183expm1MathBeta5.1atenComputes the exponential of the elements minus 1 of input.
184expm1_MathBeta5.1atenThe inplace version of expm1.
185expm1_outMathBeta5.1atenA variant of expm1 that saves the output to the specified out.
186exponential_DistributionStable2.1aten, skip_precision_checkFills self tensor with elements drawn from a PDF (probability density function).
187eyeLinearAlgStable3.0aten, ReductionReturns a 2-D tensor with ones on the diagonal and zeros elsewhere.
188eye_mLinearAlgStable3.0aten, ReductionTriton-based implementation of torch.eye_m(n, m), using 2D tiles to split the matrix into blocks.
189feature_dropoutNeuralNetworkAlpha5.1aten, KernelGenApplies feature dropout to the input tensor. Randomly zeroes out entire channels of the input tensor with probability p. Each batch element has its own independent channel mask.
190feature_dropout_NeuralNetworkAlpha5.1aten, KernelGenThe in-place version of feature_dropout().
191fill_scalarTensorStable2.2aten, pointwiseFills a scalar with the specified value.
192fill_scalar_TensorStable2.2aten, pointwiseThe in-place version of fill_scalar().
193fill_scalar_outTensorStable5.0aten, pointwise, KernelGenA variant of fill_scalar() that assigns the output to an out tensor.
194fill_tensorTensorStable2.2aten, pointwiseFills a tensor with the specified value.
195fill_tensor_TensorStable2.2aten, pointwiseThe in-place version of fill_tensor().
196fill_tensor_outTensorStable5.0aten, pointwise, KernelGenA variant of fill_tensor() that assigns the output to an out tensor.
197flash_attention_forwardNeuralNetworkStable3.0aten, NoCPU
198flash_attn_varlen_funcNeuralNetworkStable3.1aten, Attention, FlashAttentionCompute attention for sequences of variable lengths within a single batch. Eliminating the need for padding.
199flash_attn_varlen_opt_funcNeuralNetworkBeta5.1aten, Attention, FlashAttentionA variant of flash_attn_varlen_func that has lse as an optional parameter.
200flash_mlaNeuralNetworkStable3.0fused, Attention, vLLMA variant of Multi-head Latent Attention (MLA).
201flash_mla_sparse_fwdNeuralNetworkAlpha5.1fused, Attention, vLLMPart of the FlashMLA.
202flipTensorStable2.1aten, pointwiseReverse the order of an n-D tensor along given axis in dims.
203floorMathBeta5.0aten, KernelGenPerforms an element-wise floor operation, rounding each element of a tensor down to the nearest integer less than or equal to itself.
204floor_outMathBeta5.0aten, KernelGenPerforms an element-wise floor operation with output tensor, rounding each element down to the nearest integer less than or equal to itself.
205floor_MathBeta5.0aten, KernelGenPerforms an in-place element-wise floor operation, rounding each element of a tensor down to the nearest integer less than or equal to itself.
206floor_divide_scalarMathStable2.1atenComputes input divided by other, elementwise, and floors the result.
207floor_divide_scalar_MathStable2.2atenComputes input divided by other, elementwise, and floors the result.
208floor_divide_tensorMathStable2.1atenComputes input divided by other, elementwise, and floors the result.
209floor_divide_tensor_MathStable2.2atenComputes input divided by other, elementwise, and floors the result.
210fminMathBeta5.0aten, KernelGenComputes the element-wise minimum of two tensors, specially handling NaN values by prioritizing the numerical value. Unlike minimum(), if one input is NaN and the other is a number, fmin() returns the number. It supports broadcasting, type promotion, and operates on both CPU and GPU.
211fmin_outMathBeta5.0aten, KernelGenA variant of fmin() that assigns the output to the out tensor.
212fmod_scalarMathAlpha5.1aten, KernelGenComputes the element-wise remainder of division of input by a scalar divisor.
213fmod_scalar_MathAlpha5.1aten, KernelGenIn-place version of fmod with a scalar divisor.
214fmod_tensorMathAlpha5.1aten, KernelGenComputes the element-wise remainder of division of input by a tensor divisor.
215fmod_tensor_MathAlpha5.1aten, KernelGenIn-place version of fmod with a tensor divisor.
216fp8_mqa_logitsNeuralNetworkBeta5.1fused, vLLMFor each token in the given E4M3 tensor, iterate all tokens from two other given tensors, calculate the logit.
217fullTensorStable2.1aten, pointwise, skip_precision_checkCreates a tensor of size size filled with fill_value. The tensor's dtype is inferred from fill_value.
218full_likeTensorStable2.1aten, pointwiseReturns a tensor with the same size as input filled with fill_value.
219functional_sym_constrain_range_for_sizeTensorBeta5.0aten, KernelGenA low-level function used in symbolic shape analysis to restrict the possible numerical range (min/max) of an unbacked symbolic integer.
220fused_add_rms_normNeuralNetworkStable2.0fused, Normalization
221fused_deepseek_v4_qnorm_rope_kv_rope_quant_insertNeuralNetworkBeta5.1fused, vLLM, DeepSeekV4Horizontally-fused DeepseekV4-MLA. per-head RMSNorm + GPT-J RoPE for Q, and GPT-J RoPE + UE8M0 FP8 quant + paged cache insert for KV, all in one kernel launch.
222fused_experts_implNeuralNetworkBeta5.1fused, vLLM, MoEAn implementation of fused MoE.
223fused_moeNeuralNetworkBeta5.1fused, vLLM, MoEThe generic interface for fused MoE.
224fused_q_kv_rmsnormNeuralNetworkBeta5.1fused, Attention, vLLM, DeepSeekV4Applies RMSNorm to Q and KV tensors in a single fused kernel for DeepSeekV4 attention.
225fused_recurrent_gated_delta_rule_fwdAttentionAlpha5.0fused, FLAThe forward case for fused_recurrent_gated_delta_rule used in Flash Linear Attention (FLA).
226gatherTensorStable2.2aten, ReductionGathers values along an axis specified by dim.
227gather_backwardTensorStable2.2aten, ReductionThe backward version of gather().
228gcdMathBeta5.1atenComputes the element-wise greatest common divisor (GCD) of input and other.
229gcd_outMathBeta5.1atenA variant of gcd() that allows the output to be assigned to the specified out.
230geMathStable2.0aten, pointwiseComputes input is greater or equal to other element-wise.
231ge_scalarMathStable2.0aten, pointwiseThe scalar version of ge().
232gegluNeuralNetworkAlpha4.2fused, Activation, TransformerGaussian Error Gated Linear Unit with GELU activation instead of sigmoid function.
233geluNeuralNetworkStable1.0aten, pointwise, Activation, nn.functionalApply Cumulative Distribution Function for Gaussian Distribution function element-wise.
234gelu_NeuralNetworkStable2.2aten, Activation, pointwiseThe in-place version of gelu().
235gelu_and_mulNeuralNetworkStable2.0fused, pointwise, ActivationAn activation function for GeGLU.
236gelu_backwardNeuralNetworkStable3.0aten, Activation, pointwiseThe backward version of gelu().
237get_paged_mqa_logits_metadataNeuralNetworkBeta5.1vLLMBuild scheduling metadata for paged MQA logits.
238get_scheduler_metadataAttentionStable4.0NoCPU, vLLMComputes scheduling metadata for attention work partitioning so that CPU computations can be routed to ISA-specific kernel implmentations. The metadata is stored in a tensor.
239gluNeuralNetworkStable3.0aten, Activation, pointwiseGated Linear Unit activation for modulating the output of a linear transformation with a gate.
240glu_backwardNeuralNetworkStable4.0aten, Activation, pointwiseThe backward version of glu().
241greaterMathStable5.1atenTest if input is greater than other elementwise.
242greater_outMathStable5.1atenA variant of greater that saves the output to the specified out.
243greater_scalarMathStable5.1atenA variant of greater for scalar variables.
244greater_scalar_outMathStable5.1atenA variant of greater_out that saves the output to the specified out.
245grid_sampleNeuralNetworkAlpha5.1aten, nn.functionalGiven an input and a flow-field grid, computes the output using input values and pixel locations from grid.
246group_normNeuralNetworkStable2.0aten, ReductionAn internal IR for applying Group Normalization for last certain number of dimensions.
247group_norm_backwardNeuralNetworkStable3.0aten, ReductionThe backward case for group_norm().
248grouped_mmBLASBeta5.1atenGrouped matrix multiply is a functional operator designed to accelerate Mixture-of-Experts (MoE) models by computing multiple matrix multiplications in a single kernel launch.
249grouped_topkMoEStable5.0fused, NoCPU, vLLMA specialized routing mechanism used in Mixture-of-Experts (MoE) models (like DeepSeek-V3/R1) to select top-k experts by first grouping them, rather than selecting globally.
250gtMathStable2.0aten, pointwiseComputes that input is greater than other element-wise.
251gt_scalarMathStable2.0aten, pointwiseThe scalar version of gt().
252hardsigmoidNeuralNetworkBeta5.0aten, pointwise, nn.functional, Activation, KernelGenAn activation function that provides a piecewise linear approximation of the standard sigmoid function, mapping inputs to a range between 0 and 1.
253hardsigmoid_outNeuralNetworkBeta5.0aten, pointwise, nn.functional, Activation, KernelGenA variant of hardsigmoid that supports an output tensor to receive the result.
254hardswish_NeuralNetworkBeta5.0aten, pointwise, KernelGen, ActivationApplies the Hard Swish activation function, commonly used in models like MobileNetV3 to improve accuracy while reducing computational cost compared to traditional Swish. This is an in-place version.
255hc_split_sinkhorn_forwardNeuralNetworkBeta5.1fusedComputes a differentiable approximation of the Wasserstein distance (Optimal Transport) between two probability distributions or point clouds.
256histcMathAlpha5.1aten, KernelGenComputes the histogram of a tensor, binning each element into equal-width bins.
257hstackTensorStable2.2atenStack tensors in sequence horizontally (column wise). This is equivalent to concatenation along the first axis for 1-D tensors, and along the second axis for all other tensors.
258hypotMathBeta5.0aten, KernelGenGiven the legs of a right triangle, return its hypotenuse. The shapes of both input tensors must be broadcastable.
259hypot_outMathBeta5.0aten, KernelGenGiven the legs of a right triangle, return its hypotenuse. The shapes of both input tensors must be broadcastable. This is a variant of hypot that allows the output to be a different tensor.
260i0MathBeta5.0aten, KernelGenComputes the modified Bessel function of the first kind of order zero element-wise for a given input tensor.
261i0_MathBeta5.0aten, KernelGenThe in-place version of i0.
262i0_outMathBeta5.0aten, KernelGenA variant of i0 that assigns the output to the out tensor.
263indexReductionStable4.2atenExtract, access or modify specific elements, slices, or subsets of data within a tensor. The location of data is specified for each dimension, starting from index 0.
264index_addTensorStable2.2atenAccumulate the elements of alpha times source into the input tensor by adding to the indices in the order given in index.
265index_add_TensorStable4.0atenThe in-place version of index_add().
266index_copyTensorBeta5.1aten, KernelGenCopies the elements from source into input at the positions specified by index along the given dim.
267index_copy_TensorBeta5.1aten, KernelGenThe in-place version of index_copy().
268index_putTensorStable2.2atenPuts values from the tensor values into the tensor input using the indices specified in indices (which is a tuple of Tensors).
269index_put_TensorStable3.0atenThe in-place version of index_put().
270index_put_implTensorBeta5.1atenAn internal C++ function that handles the heavy lifting for placing values into a tensor at specific indices.
271index_selectTensorStable2.1atenReturns a new tensor which indexes the input tensor along dimension dim using the entries in index.
272indexer_k_quant_and_cacheQuantizationBeta5.1fused, vLLMThis is a fused operator that quantizes K tensors and writes them into the FP8 KV cache.
273inplace_fused_expertsMoEBeta5.0fused, Activation, vLLMThis operator writes output directly to hidden_states.
274instance_normNeuralNetworkBeta5.1fusedApply Instance Normalization independently for each channel in every data sample within a batch.
275is_all_trueTensorBeta5.1aten, pointwiseThe low-level implementation for checking if all elements in a tensor are True.
276iscloseMathStable2.1aten, pointwiseReturns a new tensor with boolean elements representing if each element of input is "close" to the corresponding element of other. The closeness is defined with rtol and atol.
277isfiniteMathStable2.1aten, pointwiseReturns a new tensor with boolean elements representing if each element is finite or not.
278isinTensorStable2.2atenTests if each element of elements is in test_elements. Returns a boolean tensor of the same shape as elements that is True for elements in test_elements and False otherwise.
279isin_scalar_tensorTensorStable2.2atenA variant of isin().
280isin_tensor_scalarTensorStable2.2atenA variant of isin().
281isinfMathStable2.0aten, pointwiseTests if each element of input is infinite (positive or negative infinity) or not.
282isnanMathStable2.0aten, pointwiseReturns a new tensor with boolean elements representing if each element of input is NaN or not.
283isneginfMathStable5.1aten, KernelGen, pointwiseTests if each element of input is negative infinity or not.
284isneginf_outMathStable5.1aten, KernelGen, pointwiseA variant of isneginf that saves the output to the specified out.
285kronLinearAlgStable2.2atenComputes the Kronecker product of input and other.
286layer_normNeuralNetworkStable1.0atenAn internal IR for applying Layer Normalization for last certain number of dimensions.
287layer_norm_backwardReductionStable3.0atenThe backward case for layer_norm().
288leMathStable2.0aten, pointwiseComputes whether input is less than or equal to other element-wise.
289le_scalarMathStable2.0atenThe scalar version of le().
290leaky_reluNeuralNetworkBeta5.1atenApplies the LeakyReLU function element-wise.
291leaky_relu_NeuralNetworkBeta5.1atenThe in-place version of leaky_relu().
292leaky_relu_outNeuralNetworkBeta5.1atenA variant of leaky_relu().
293lerp_scalarLinearAlgStable3.0aten, pointwiseThe scalar version of lerp().
294lerp_scalar_LinearAlgStable3.0aten, pointwiseThe in-place, scalar version of lerp().
295lerp_tensorLinearAlgStable3.0aten, pointwisePerforms a linear interpolation of two tensors start (given by input) and end based on a scalar or tensor weight and returns the resulting out tensor.
296lerp_tensor_LinearAlgStable3.0aten, pointwiseThe in-place version of lerp().
297lift_fresh_copyTensorBeta5.0aten, KernelGenCreates a new, independent copy of a tensor within a compiled graph.
298linspaceTensorStable2.2atenCreates a one-dimensional tensor of size steps whose values are evenly spaced from start to end, inclusive.
299logMathStable2.2aten, pointwiseReturns a new tensor with the natural logarithm of the elements of input.
300log10MathBeta5.1aten, pointwiseReturns a new tensor with the logarithm to the base 10 of the elements of input.
301log10_MathBeta5.1aten, pointwiseThe in-place version of log10().
302log10_outMathBeta5.1aten, pointwiseA variant of log10() that assigns the output to the provided out.
303log1pMathBeta5.0aten, KernelGenComputes the natural logarithm of 1 + x, that is y_i = log_e(x_i + 1), for each element in the input tensor.
304log1p_MathBeta5.0aten, KernelGenComputes the natural logarithm of 1 + x, that is y_i = log_e(x_i + 1), for each element in the input tensor in-place.
305log_sigmoidNeuralNetworkStable2.2aten, pointwise, nn.functionalApplies the Logsigmoid function element-wise.
306log_softmaxNeuralNetworkStable3.0aten, ReductionAn internal IR for applying a softmax followed by a logarithm.
307log_softmax_backward_dataNeuralNetworkAlpha5.0aten, KernelGenComputes the gradient of the input tensor with respect to a log_softmax operation during backpropagation.
308log_softmax_backward_data_outNeuralNetworkAlpha5.0aten, KernelGenA variant of _log_softmax_backward_data that assigns the output to the out tensor.
309log_softmax_outNeuralNetworkStable3.0aten, ReductionAn internal IR for applying a softmax followed by a logarithm.
310logaddexpMathBeta5.0aten, pointwise, KernelGenComputes the element-wise logarithm of the sum of the exponentials of two input tensors.
311logaddexp_outMathBeta5.0aten, pointwise, KernelGenA variant of logaddexp that allows the output to be assigned to an out tensor.
312logical_andMathStable2.2aten, pointwiseComputes the element-wise logical AND of the given input tensors. Zeros are treated as False and nonzeros are treated as True.
313logical_and_MathStable5.0aten, pointwiseThe in-place version of logical_and().
314logical_notMathStable2.2aten, pointwiseComputes the element-wise logical NOT of the given input tensor.
315logical_orMathStable2.2aten, pointwiseComputes the element-wise logical OR of the given input tensors.
316logical_or_MathStable5.0aten, pointwiseThe in-place version of logical_or().
317logical_xorMathStable2.2aten, pointwiseComputes the element-wise logical XOR of the given input tensors.
318logitLinearAlgBeta5.0aten, pointwise, KernelGenReturns a new tensor with the logit of the elements of input. input is clamped to [eps, 1-eps] when eps is not None. When eps is None and input<0 or input>1, the function will yield NaN.
319logit_LinearAlgBeta5.0aten, pointwise, KernelGenThe in-place version of logit().
320logit_outLinearAlgBeta5.0aten, pointwise, KernelGenA variant of logit that allows the output to be assigned to another tensor.
321logspaceTensorStable4.0atenCreates a one-dimensional tensor of size steps whose values are evenly spaced from base^start to base^end, inclusive, on a logarithmic scale with base base.
322logsumexpMathAlpha5.1aten, KernelGenComputes the log of the sum of exponentials of elements in the input tensor along given dimensions.
323ltMathStable2.0aten, pointwiseComputes whether input is less than other element-wise.
324lt_scalarMathStable2.0aten, pointwiseThe scalar version of lt.
325margin_ranking_lossNeuralNetworkBeta5.1aten, nn.functionalCompute the margin ranking loss.
326masked_fillTensorStable2.2aten, pointwiseFills elements of given tensor with value where mask is True.
327masked_fill_TensorStable2.2aten, pointwise, skip_precision_checkThe in-place version of masked_fill().
328masked_fill_scalarTensorStable2.2aten, pointwiseFills elements of given tensor with value where mask is True.
329masked_fill_scalar_TensorStable2.2aten, pointwise, skip_precision_checkThe in-place version of masked_fill().
330masked_scatterTensorStable4.2atenCopies elements from source into the given tensor at positions where the mask is True.
331masked_scatter_TensorStable4.2atenThe in-place version of masked_scatter().
332masked_selectTensorStable2.1atenReturns a new 1-D tensor which indexes the input tensor according to the boolean mask mask which is a BoolTensor.
333maxLinearAlgStable2.0aten, ReductionReturns the maximum value of all elements in the input tensor.
334max_dimLinearAlgStable2.0aten, ReductionReturns a namedtuple (values, indices) where values is the maximum value of each row of the input tensor in the given dimension dim, and indices is the index location of each maximum value found (argmax).
335max_pool2d_backwardIRStable4.0atenApplies a 2D max pooling over an input signal composed of several input planes. This is an IR representation rather than a public API and it is for the backward step.
336max_pool2d_with_indicesIRStable4.0atenApplies a 2D max pooling over an input signal composed of several input planes. This is an IR representation rather than a public API.
337max_pool3d_backwardNeuralNetworkBeta5.1aten, nn.functionalThe backward version of max_pool3d_with_indices().
338max_pool3d_with_indicesNeuralNetworkBeta5.1aten, nn.functionalApplies a 3D max pooling over an input signal composed of several input planes.
339maximumMathStable2.1aten, pointwiseComputes the element-wise maximum of input and other.
340meanLinearAlgStable1.0aten, ReductionReturns the mean value of all elements in the input tensor. Input must be floating point or complex.
341mean_dimReductionStable2.0atenReturns the mean value of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, reduce over all of them.
342hc_head_fused_kernelNeuralNetworkBeta5.2fused, vLLM, DSAThe head fusion kernel for MHC (Manifold-Constrained Hyper-Connections). This fused implementation computes RMS-normalized hidden states and applies per-head weighted mixing to produce the output activations.
343mhc_bwdNeuralNetworkBeta5.1fused, vLLM, DSAThe backward case for MHC (Manifold-Constrained Hyper-Connections). This is the Triton implementation for Sinkhorn implicit CG differentiation. It computes the gradient of the Sinkhorn normalization using implicit differentiation via the conjugate gradient method.
344mhc_postNeuralNetworkBeta5.1fused, vLLM, DSATriton implementation of mHC Post operator (optimized v3).
345mhc_preNeuralNetworkBeta5.1fused, vLLM, DSATriton implementation of mHC Pre operator (optimized v2).
346minTensorStable2.0aten, ReductionReturns the minimum value of all elements in the input tensor.
347min_dimLinearAlgStable2.0aten, ReductionReturns a namedtuple (values, indices) where values is the minimum value of each row of the input tensor in the given dimension dim, and indices is the index location of each minimum value found (argmin).
348minimumMathStable2.1aten, pointwiseComputes the element-wise minimum of input and other.
349mmBLASStable1.0atenPerforms a matrix multiplication of the two input matrices.
350mm_outBLASStable3.0atenA variant of mm() with out specified.
351moe_align_block_size_tritonMoEStable4.2fused, Reduction, vLLMAligns the token distribution across experts to be compatible with block size for matrix multiplication.
352moe_sumMoEStable4.2fused, Reduction, vLLMAn implementation of Mixture of Experts (MoE) with sum-based aggregation instead of the more common weighted average.
353mse_lossNeuralNetworkStable2.2aten, pointwise, nn.functionalCompute the element-wise mean squared error, with optional weighting.
354reflection_pad1d_backwardMathAlpha5.1aten, KernelGenComputes the gradient of reflection_pad1d with respect to the input tensor.
355smooth_l1_lossNeuralNetworkAlpha5.1aten, pointwise, nn.functionalCompute the smooth L1 loss between input and target tensors.
356smooth_l1_loss_backwardNeuralNetworkAlpha5.1aten, pointwise, nn.functionalCompute the gradient of smooth L1 loss with respect to the input tensor.
357mulMathStable1.0aten, pointwiseMultiplies input by other.
358mul_MathStable2.2aten, pointwiseThe in-place version of mul().
359multinomialDistributionStable2.1aten, skip_precision_checkReturns a tensor where each row contains num_samples indices sampled from the multinomial probability distribution located in the corresponding row of tensor input.
360mvBLASStable2.0atenPerforms a matrix-vector product of the matrix input and the vector vec.
361nan_to_numMathStable3.0aten, pointwiseReplaces NaN, positive infinity, and negative infinity values in input with the values specified by nan, posinf, and neginf, respectively.
362neMathStable2.0aten, pointwiseComputes whether input is not equal to other element-wise.
363ne_scalarMathStable2.0aten, pointwiseThe scalar version of ne().
364negMathStable2.0aten, pointwiseReturns a new tensor with the negative of the elements of input.
365neg_MathStable2.2aten, pointwiseThe in-place version of neg().
366new_fullTensorBeta5.1aten, pointwiseReturns a Tensor of size size filled with fill_value. By default, the returned Tensor has the same torch.dtype and torch.device as this tensor.
367nll_loss_backwardNeuralNetworkStable2.2aten, IRCompute the negative log likelihood loss. This is the backward case.
368nll_loss_forwardNeuralNetworkStable2.2aten, IRCompute the negative log likelihood loss. This is the forward case.
369nll_loss2d_backwardNeuralNetworkStable2.2aten, IRAn internal IR for supporting torch.nn.NLLLoss2d, which has been deprecated and is now integrated into the standard torch.nn.NLLLoss. This is the backward case.
370nll_loss2d_forwardNeuralNetworkStable2.2aten, IRAn internal IR for supporting torch.nn.NLLLoss2d, which has been deprecated and is now integrated into the standard torch.nn.NLLLoss. This is the forward case.
371nll_loss_nd_backwardNeuralNetworkStable5.0atenMeasures the performance of a classification model by penalizing low probabilities for correct classes. This computes the gradients of this loss with respect to model parameters using automatic differentiation.
372nll_loss_nd_forwardNeuralNetworkStable5.0atenMeasures the performance of a classification model by calculating the negative log probability of the true class. This defines the computation flow, transforming input data through layers to produce output predictions.
373nonzeroTensorStable2.1atenReturns a 2-D tensor where each row is the index for a nonzero value. When as_tuple is explicitly set to True, this returns a tuple of 1-D index tensors, allowing for advanced indexing of all nonzero values.
374nonzero_numpyTensorAlpha5.1aten, KernelGenReturns a tuple of 1-D tensors, one for each dimension, containing the indices of the nonzero elements in the input tensor (NumPy-style).
375normal_float_float_DistributionStable5.0aten, pointwise, skip_precision_checkReturns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a float mean and a float std.
376normal_float_tensorDistributionStable2.1aten, pointwiseReturns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a float mean and a tensor std.
377normal_tensor_floatDistributionStable2.1aten, pointwiseReturns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a tensor mean and a float std.
378normal_tensor_tensorDistributionStable2.1aten, pointwiseReturns a tensor of random numbers drawn from separate normal distributions whose mean and standard deviation are given. This is one of the variants that takes a tensor mean and a tensor std.
379normed_cumsumReductionStable2.1atenGet the normalized cumulative sum where each step is divided by the total sum of the dataset, resulting in values ranging from 0 to 1. Internally used by the multinomial operator.
380one_hotNeuralNetworkStable5.0aten, nn.functional, KernelGenTakes LongTensor with index values of shape (*) and returns a tensor of shape (*, num_classes) that has zeros everywhere except where the index of last dimension matches the corresponding value of the input tensor, in which case it will be 1.
381onesTensorStable2.1aten, skip_precision_checkReturns a tensor filled with the scalar value 1, with the shape defined by the variable argument size.
382ones_likeTensorStable2.1atenReturns a tensor filled with the scalar value 1, with the same size as input.
383outerBLASStable2.0fusedComputes outer product of self and the input vector. If the self tensor is a vector of size n and the input tensor is a vector of size m, the out tensor (if specified) must be a matrix of size n * m.
384outplace_fused_expertsMoEBeta5.0fused, Activation, vLLMThis operator allocates and returns a new output tensor.
385pack_seq_tritonNeuralNetworkBeta5.1fused, vLLM, DeepSeekV4Pack variable-length token sequences into a padded batched tensor.
386padNeuralNetworkStable2.1aten, pointwise, nn.functionalPads a tensor using the specified mode.
387per_token_group_quant_fp8QuantizationAlpha4.2NoCPU, vLLMFunction to perform per-token-group quantization on an input tensor x. It converts the tensor values into signed float8 values and returns the quantized tensor along with the scaling factor used for quantization.
388pixel_shuffleNeuralNetworkBeta5.0aten, nn.functionalRearranges elements in a tensor to a new tensor of different shape.
389pixel_unshuffleNeuralNetworkBeta5.0aten, KernelGenRearranges elements from a low-resolution feature map with many channels into a higher-resolution feature map with fewer channels.
390pixel_unshuffle_outNeuralNetworkBeta5.0aten, KernelGenA variant of pixel_unshuffle that assigns the output to the out tensor.
391poissonMathBeta5.1aten, KernelGenReturns a tensor of the same size as input with each element sampled from a Poisson distribution with rate given by the corresponding element in input.
392polarMathStable3.0aten, pointwiseConstructs a complex tensor whose elements are Cartesian coordinates corresponding to the polar coordinates with absolute value abs and angle angle.
393pow_scalarMathStable1.0atenTakes the power of each element in input with exponent and returns a tensor with the result. The input is a single float, while the exponent is a tensor.
394pow_tensor_scalarMathStable1.0aten, pointwiseTakes the power of each element in input with exponent and returns a tensor with the result. The input is a tensor, while the exponent is a float.
395pow_tensor_scalar_MathStable2.2aten, pointwiseThis is the in-place version of pow_tensor_scalar().
396pow_tensor_tensorMathStable1.0aten, pointwiseTakes the power of each element in input with exponent and returns a tensor with the result. The input is a tensor, while the exponent is also a tensor.
397pow_tensor_tensor_MathStable2.2aten, pointwiseThis is the in-place version of pow_tensor_tensor().
398preluNeuralNetworkBeta5.0aten, Activation, pointwise, nn.functional, KernelGenAn activation function used in neural networks that improves upon ReLU (Rectified Linear Unit) by allowing the network to learn the slope of negative inputs. It performs an element-wise operation that keeps positive values and scales negative values by a learnable parameter.
399prodLinearAlgStable2.0aten, ReductionReturns the product of all elements in the input tensor.
400prod_dim_intReductionStable2.0atenReturns the product of each row of the input tensor in the given dimension dim.
401quantileTensorStable2.2atenComputes the q-th quantiles of each row of the input tensor along the dimension dim.
402rad2degMathAlpha5.1aten, KernelGen, pointwiseConverts each element from angles in radians to degrees.
403rad2deg_MathAlpha5.1aten, KernelGen, pointwiseIn-place version of rad2deg.
404randDistributionStable2.1atenReturns a tensor filled with random numbers from a uniform distribution on the interval [0,1).
405rand_likeDistributionStable2.1atenReturns a tensor with the same size as input that is filled with random numbers from a uniform distribution on the interval [0,1).
406randnDistributionStable2.1atenReturns a tensor filled with random numbers from a normal distribution with mean 0 and variance 1 (also called the standard normal distribution).
407randn_likeDistributionStable2.1atenReturns a tensor with the same size as input that is filled with random numbers from a normal distribution with mean 0 and variance 1.
408randintDistributionAlpha5.1aten, KernelGenReturns a tensor filled with random integers generated uniformly between low (inclusive) and high (exclusive).
409randpermDistributionStable2.2aten, skip_precision_checkReturns a random permutation of integers from 0 to n - 1.
410reciprocalMathStable1.0aten, pointwiseReturns a new tensor with the reciprocal of the elements of input.
411reciprocal_MathStable2.2aten, pointwiseThis is the in-place version of reciprocal().
412reflection_pad1dNeuralNetworkBeta5.0aten, pointwise, KernelGenPads the input 3D or 2D tensor (typically representing signals or sequences) by reflecting the boundary values at the edges.
413reflection_pad1d_outNeuralNetworkBeta5.0aten, pointwise, KernelGenA variant of reflection_pad1d that assigns the output to the out tensor.
414reflection_pad2dNeuralNetworkBeta5.0aten, pointwise, KernelGenPads the input 3D or 4D tensor (typically representing images or feature maps) by reflecting the boundary values at the edges.
415reflection_pad2d_outNeuralNetworkBeta5.0aten, pointwise, KernelGenA variant of reflection_pad2d that assigns the output to the out tensor.
416regluNeuralNetworkAlpha4.2fused, TransformerRectified Gated Linear Unit is a variant of GLU that uses ReLU instead of the sigmoid function for gating.
417reluNeuralNetworkStable1.0aten, Activation, pointwise, nn.functionalApply the ReLU (Rectified Linear Unit) activation function element-wise.
418relu_NeuralNetworkStable2.2aten, pointwise, ActivationThis is the in-place version of relu().
419relu6NeuralNetworkBeta5.0aten, pointwise, Activation, KernelGenApplies the element-wise function f(x)=min(max(0,x),6). This is a variation of the standard ReLU activation function that "caps" its output at a maximum value of 6.
420remainder_scalarMathStable2.2atenComputes Python's modulus operation entrywise. The result has the same sign as the divisor other and its absolute value is less than that of other.
421remainder_scalar_MathStable2.2atenThis is the in-place version of remainder().
422remainder_scalar_tensorMathStable2.2atenComputes Python's modulus operation entrywise. The result has the same sign as the divisor other and its absolute value is less than that of other.
423remainder_tensorMathStable2.2atenComputes Python's modulus operation entrywise. The result has the same sign as the divisor other and its absolute value is less than that of other.
424remainder_tensor_MathStable2.2atenThis is the in-place version of remainder().
425repeatTensorStable2.1atenRepeats this tensor along the specified dimensions.
426repeat_interleave_self_intTensorStable2.2aten, pointwiseRepeats elements of a tensor. The number of repetitions is specified as an integer repeats.
427repeat_interleave_self_tensorTensorStable2.2aten, pointwiseRepeats elements of a tensor. The number of repetitions is specified as a tensor repeats. repeats is broadcasted to fit the shape of the given axis.
428repeat_interleave_tensorTensorStable2.2aten, pointwiseRepeats 0 repeats[0] times, 1 repeats[1] times, 2 repeats[2] times, etc.
429replication_pad1dTensorBeta5.0aten, KernelGenPads the edge of a 1D input tensor by repeating the boundary values.
430replication_pad1d_outTensorBeta5.0aten, KernelGenA variant of replication_pad1d that assigns the output to the out tensor.
431replication_pad3dNeuralNetworkAlpha5.0atenPads the edge of a 3D input tensor by repeating the boundary values.
432reshape_and_cacheAttentionStable3.0fused, vLLMStore the key/value token states into the pre-allocated kv_cache buffers of paged attention.
433reshape_and_cache_flashAttentionStable3.0fusedStore the key/value token states into the pre-allocated kv_cache buffers of paged attention.
434resolve_conjScienceStable2.1atenReturns a new tensor with materialized conjugation if input's conjugate bit is set to True, else returns input. The output tensor will always have its conjugate bit set to False.
435resolve_negScienceStable2.1atenReturns a new tensor with materialized negation if input's negative bit is set to True, else returns input. The output tensor will always have its negative bit set to False.
436rms_normNeuralNetworkStable2.0aten, nn.functional, ReductionApply Root Mean Square Layer Normalization over a mini-batch of inputs.
437rollBLASBeta5.1aten, KernelGenRoll the tensor input along the given dimension(s). Elements that are shifted beyond the last position are re-introduced at the first position.
438roundMathBeta5.1aten, pointwiseRounds elements of input to the nearest integer.
439round_MathBeta5.1aten, pointwiseThe in-place version of round.
440round_outMathBeta5.1aten, pointwiseA variant of round that assigns the output to the specified out.
441rrelu_with_noise_backwardNeuralNetworkBeta5.0aten, KernelGenComputes the gradient of the Randomized Leaky ReLU (RReLU) activation function with respect to its input during backpropagation. It uses the noise tensor generated in the forward pass to correctly apply the slope to negative input values.
442rsqrtMathStable1.0aten, pointwiseReturns a new tensor with the reciprocal of the square-root of each of the elements of input.
443rsqrt_MathStable2.2aten, pointwiseThe in-place version of rsqrt().
444rsub_scalarMathAlpha5.1aten, KernelGenSubtracts input, scaled by alpha, from other (the reversed form of sub()). This is the scalar version.
445rsub_tensorMathAlpha5.1aten, KernelGenSubtracts input, scaled by alpha, from other (the reversed form of sub()). This is the tensor version.
446rwkv_ka_fusionRWKVStable4.1fusedMerges, aligns, and enhances features from different data sources or spatial directions using the efficient, linear-time RWKV framework.
447rwkv_mm_sparsityRWKVStable4.1fusedOptimized, lossless sparse matrix multiplication in RWKV-7 models.
448safe_softmaxNeuralNetworkAlpha5.1aten, IR, KernelGenApply a softmax function. Note this version may not be functional.
449scaled_dot_product_attentionNeuralNetworkStable2.2nn.functional, AttentionComputes scaled dot product attention on query, key and value tensors, using an optional attention mask if passed and applying dropout if a probability greater than 0.0 is specified. The optional scale argument can only be specified as a keyword argument.
450scaled_dot_product_attention_backwardNeuralNetworkStable2.2nn.functional, AttentionThe backward case for scaled_dot_product_attention.
451scaled_dot_product_attention_forwardNeuralNetworkStable2.2nn.functional, AttentionThe forward case for scaled_dot_product_attention.
452scaled_mmBLASBeta5.1atenPerforms a scaled matrix multiplication. The result of self @ mat2 is multiplied by scale_a and scale_b, then an optional bias is added.
453scaled_mm_outBLASBeta5.1atenA variant of _scaled_mm that writes the result into out.
454scaled_softmax_backwardReductionStable4.2atenThe backward pass for a scaled softmax function, commonly used in Scaled Dot-Product Attention (SDPA) within Transformer models, computes the gradient of the loss with respect to the input logits, incorporating a scaling factor to stabilize training.
455scaled_softmax_forwardReductionStable4.2atenThe forward pass for a scaled softmax function, commonly used in Scaled Dot-Product Attention (SDPA) within Transformer models. It applies a scaling factor to the input logits before computing the softmax.
456scatter_add_TensorStable4.2atenAdds all values from the tensor src into self at the indices specified in the index tensor in a similar fashion as scatter_(). For each value in src, it is added to an index in self which is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim.
457scatter_reduceTensorStable2.2atenWrites all values from the tensor src into provided tensor at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim. The optional reduce argument allows specification of an optional reduction operation, which is applied to all values in the tensor src into the tensor at the indices specified in the index.
458scatter_reduce_TensorStable3.0aten, KernelGenThis is the in-place version of scatter_reduce().
459scatter_reduce_two_ReductionAlpha5.1aten, KernelGenA specific low-level ATen operator primarily encountered during model compilation or when using advanced backends like TensorRT or MPS.
460scatter_srcTensorStable2.2atenWrites all values from the tensor src into provided tensor at the indices specified in the index tensor. For each value in src, its output index is specified by its index in src for dimension != dim and by the corresponding value in index for dimension = dim. The optional reduce argument allows specification of an optional reduction operation, which is applied to all values in the tensor src into the tensor at the indices specified in the index.
461scatter_src_TensorStable3.0atenThis is the in-place version of scatter_src().
462select_backwardNeuralNetworkBeta5.1atenCalculate the gradient during the backward pass in the neural network.
463select_scatterTensorStable2.2atenEmbeds the values of the src tensor into input at the given index. This function returns a tensor with fresh storage; it does not create a view.
464seluNeuralNetworkBeta5.0aten, pointwise, nn.functional, Activation, KernelGenApplies an element-wise activation function that induces self-normalizing properties in neural networks. It scales the Exponential Linear Unit (ELU) to ensure activations remain close to zero mean and unit variance.
465selu_NeuralNetworkBeta5.0aten, pointwise, Activation, KernelGenThis is the in-place version of selu.
466sgn_MathBeta5.0aten, KernelGenComputes the sign of each element in the self tensor, element-wise. This function is an extension of sign() designed to handle complex tensors in addition to real-valued ones.
467sigmoidNeuralNetworkStable2.0aten, pointwiseComputes the expit (also known as the logistic sigmoid function) of the elements of input.
468sigmoid_NeuralNetworkStable2.2aten, pointwiseThe in-place version of sigmoid().
469sigmoid_backwardNeuralNetworkStable3.0aten, pointwiseThe backward version of sigmoid().
470signbitTensorBeta5.1aten, pointwiseTests if each element of input has its sign bit set or not.
471signbit_outTensorBeta5.1aten, pointwiseA variant of signbit that assigns the output to out.
472siluNeuralNetworkStable1.0aten, pointwise, nn.functionalSiLU (Sigmoid Linear Unit), a simple approximation of ReLU but without any discontinuity of the first derivative.
473silu_NeuralNetworkStable2.2aten, nn.functional, pointwiseThe in-place version of silu().
474silu_and_mulActivationStable2.0fused, pointwise, vLLMA custom operator in vLLM as activation function for SwiGLU.
475silu_and_mul_outActivationStable2.0fused, pointwise, vLLMA variant of silu_and_mul with an extra out argument.
476silu_and_mul_with_clampActivationStable5.1fused, pointwise, vLLMA custom operator in vLLM as activation function for SwiGLU.
477silu_and_mul_with_clamp_outActivationStable5.1fused, pointwise, vLLMA variant of silu_and_mul_with_clamp with an extra out argument.
478silu_backwardNeuralNetworkStable3.0aten, pointwiseThe backward version of silu().
479sinMathStable2.0aten, pointwiseReturns a new tensor with the sine of the elements in the input tensor, where each value in this input tensor is in radians.
480sin_MathStable2.2aten, pointwiseThe in-place version of sin().
481sinh_MathBeta5.0aten, KernelGenComputes the hyperbolic sine (e^x-e^{-x})/2 of each element in a tensor. This is an in-place version.
482skip_layer_normNeuralNetworkStable2.0fused, TransformerAn optimized operation used in Transformer models to improve performance by combining residual connection (skip connection) addition and Layer Normalization (LayerNorm) into a single kernel.
483slice_backwardNeuralNetworkStable5.0atenAn automatic differentiation (autograd) function that computes the gradient of a tensor slicing operation (tensor[start:end]) during backpropagation.
484slice_scatterTensorStable2.2atenEmbeds the values of the src tensor into input at the given dimension. This function returns a tensor with fresh storage; it does not create a view.
485soft_margin_lossNeuralNetworkBeta5.1nn.functional, KernelGenCompute the soft margin loss.
486softmaxNeuralNetworkStable1.0aten, nn.functionalApply a softmax function.
487softmax_backwardReductionStable3.0aten, nn.functionalThe backward version of softmax().
488softmax_backward_outReductionStable3.0aten, nn.functionalA variant of softmax_backward() that assigns the output to out.
489softmax_outNeuralNetworkStable1.0aten, nn.functionalApply a softmax function and assign the output to the given out tensor.
490softplusNeuralNetworkStable4.0aten, nn.functional, pointwiseApplies the Softplus function element-wise.
491softshrinkNeuralNetworkBeta5.0aten, nn.functional, Activation, KernelGenApplies the soft shrinkage function element-wise to an input tensor. It is an activation function often used in signal processing and sparse representation, such as image denoising.
492softshrink_outNeuralNetworkBeta5.0aten, nn.functional, Activation, KernelGenThis is a variant of softshrink that supports an output tensor.
493sortTensorStable2.2aten, skip_precision_checkSorts the elements of the input tensor along a given dimension in ascending order by value.
494sort_stableTensorStable3.0aten, skip_precision_checkSorts the elements of the input tensor along a given dimension in ascending order by value. This is a variant of sort() where stable is set to True to preserve the order of equivalent elements.
495sparse_attn_tritonNeuralNetworkBeta5.1fusedSparse attention with attention-sink.
496sparse_mla_fwd_interfaceDSABeta5.0fusedA generic interface for sparse MLA (Multi-head Latent Attention) for DeepSeek v3/v3.2. It is currently not exposed as a standalone operator for use.
497special_i0eMathBeta5.0aten, pointwise, KernelGenComputes the exponentially scaled zeroth order modified Bessel function of the first kind for each element of input.
498special_i0e_outMathBeta5.0aten, pointwise, KernelGenA variant of special_i0e() with the output saved to the provided out.
499special_i1MathBeta5.0aten, pointwise, KernelGenComputes the modified Bessel function of the first kind of order 1 (I_1(x)) for each element in the input tensor, designed for special mathematical functions.
500special_i1_outMathBeta5.0aten, pointwise, KernelGenA variant of special_i1 that allows the output to be assigned to another tensor.
501sqrtMathStable4.0aten, pointwiseReturns a new tensor with the square-root of the elements of input.
502sqrt_MathStable4.0aten, pointwiseThis is the in-place version of sqrt().
503squareMathBeta5.1aten, pointwiseReturns a new tensor with the square of the elements of input.
504square_MathBeta5.1aten, pointwiseThe in-place version of square().
505square_outMathBeta5.1aten, pointwiseA variant of square that assigns the output to the provided out.
506stackTensorStable2.2atenConcatenates a sequence of tensors along a new dimension.
507stdReductionStable4.0atenCalculates the standard deviation over the dimensions specified by dim. dim can be a single dimension, list of dimensions, or None to reduce over all dimensions.
508subMathStable1.0aten, pointwiseSubtracts other, scaled by alpha, from the input tensor.
509sub_MathStable2.2aten, pointwiseSubtracts other, scaled by alpha, from the input tensor. This is the in-place version.
510sumLinearAlgStable2.0aten, ReductionReturns the sum of all elements in the input tensor.
511sum_dimLinearAlgStable2.0aten, ReductionReturns the sum of each row of the input tensor in the given dimension dim. If dim is a list of dimensions, the sum is reduced over all of them.
512sum_dim_outLinearAlgStable3.0aten, ReductionA variant of sum_dim() with the out argument.
513sum_outLinearAlgStable3.0aten, ReductionA variant of sum() with the out argument.
514swigluNeuralNetworkStable5.0fused, TransformerSwish-Gated Linear Unit, a variant of GLU with the Swish activation function.
515t_copyTensorBeta5.0aten, KernelGenTranspose a 2D tensor into a new tensor with contiguous memory layout.
516t_copy_outTensorBeta5.0aten, KernelGenA variant of t_copy() that allows the output to be assigned to the out tensor.
517tanNeuralNetworkStable4.1aten, pointwiseReturns a new tensor with the tangent of the elements in the input tensor, where each value in this input tensor is in radians.
518tan_Stable4.1aten, pointwiseThis is the in-place version of tan().
519tanhMathStable2.0aten, pointwiseReturns a new tensor with the hyperbolic tangent of the elements of input.
520tanh_MathStable2.2aten, pointwiseThis is the in-place version of tanh().
521tanh_backwardMathStable3.0aten, pointwiseThis is the backward case for tanh().
522thresholdNeuralNetworkStable3.0aten, nn.functional, pointwiseApply a threshold to each element of the input Tensor.
523threshold_backwardNeuralNetworkStable3.0aten, nn.functional, pointwiseThis is the backward version for threshold.
524tileTensorStable2.1atenConstructs a tensor by repeating the elements of input. The dims argument specifies the number of repetitions in each dimension.
525to_copyTensorBeta4.1aten, pointwise, skip_precision_check
526top_k_per_row_prefillNeuralNetworkBeta5.1fusedTriton top-K per row for DeepSeek V4 sparse attention prefill phase. Replaces vLLM persistent_topk CUDA kernel with in-place masking + adaptive topk selection.
527top_k_per_row_decodeNeuralNetworkBeta5.1fused, vLLM, DeepSeekV4, KernelGenTriton top-K per row for DeepSeek V4 decode-phase token selection. Radix-select based approach with three dispatch tiers for different vocab sizes.
528topkTensorStable2.1aten, skip_precision_checkReturns the k largest elements of the given input tensor along a given dimension. If dim is not given, the last dimension of the input is chosen. If largest is False then the k smallest elements are returned.
529topk_softmaxMoEStable4.0fused, vLLMSelects the k most likely next-token candidates, sets all others to zero, and renormalizes the probabilities of these top candidates.
530topk_softplus_sqrtMoEBeta5.1fused, KernelGen, vLLMFused softplus + sqrt + top-k selection and optional renormalization for MoE gating in models like DeepSeek-V3/V4.
531traceReductionStable4.0atenReturns the sum of the elements of the diagonal of the input 2-D matrix.
532trilBLASBeta5.0aten, KernelGenReturns the lower triangular part of an input matrix (or a batch of matrices) and sets all other elements to zero.
533tril_BLASBeta5.1atenThe in-place version of tril().
534tril_outBLASBeta5.1atenA variant of tril() that explicitly assigns the output to the out parameter.
535triton_lighting_indexer_k_tiled_interfaceNeuralNetworkAlpha5.1fused, DSAPart of FP8 MQA framework. It is currently not exposed as an operator for use.
536triuBLASStable1.0atenReturns the upper triangular part of a matrix (2-D tensor) or batch of matrices input; the other elements of the result tensor out are set to 0.
537triu_NeuralNetworkStable5.0atenThe in-place version of triu().
538trunc_divideMathStable2.1atenThe div function with rounding_mode set to trunc.
539trunc_divide_MathStable2.1atenThe in-place version of trunc_divide.
540unfold_backwardNeuralNetworkStable5.0aten, nn.functionalAn operator for calculating the gradient of the unfold operation during backpropagation. It takes the gradient of the unfolded output and accumulates it back into the original input shape, reversing sliding local block extraction and resolving overlaps.
541uniform_DistributionStable2.1aten, skip_precision_checkFills self tensor with numbers sampled from the continuous uniform distribution.
542unique2TensorStable2.1atenReturns the unique elements of the input tensor. This is an internal PyTorch function.
543unique_consecutiveDistributionBeta5.1aten, KernelGenEliminates all but the first element from every consecutive group of equivalent elements.
544unpack_seq_tritonNeuralNetworkBeta5.1fused, vLLM, DeepSeekV4Unpack a packed sequence tensor back to its original variable-length form.
545upsample_bicubic2dNeuralNetworkStable5.0aten, ReductionA variant of upsample() that has mode set to bicubic.
546upsample_bicubic2d_aaNeuralNetworkStable2.2aten, ReductionA variant of upsample() that has mode set to bicubic and anti-aliasing enabled.
547upsample_bicubic2d_aa_backwardNeuralNetworkStable5.0aten, ReductionA backward case for _upsample_bicubic2d_aa().
548upsample_linear1dNeuralNetworkAlpha5.0atenUpsamples the input, using linear mode. The input has to be 3 dimensional, and the output_size is an optional tuple of ints.
549upsample_nearest1dNeuralNetworkStable5.0atenUpsamples the input, using nearest neighbours' pixel values. The input has to be 3 dimensional, and the output_size is an optional tuple of ints.
550upsample_nearest2dNeuralNetworkStable2.2atenUpsamples the input, using nearest neighbours' pixel values. The input has to be 4 dimensional. The scales can be provided with scales_h and scales_w.
551upsample_nearest3dNeuralNetworkStable5.0atenPerforms 3D nearest-neighbor interpolation to increase the spatial size of volumetric data, such as 5D tensors. It scales up inputs by copying values from the nearest pixel/voxel, without calculating new values through linear interpolation.
552upsample_nearest_exact1dNeuralNetworkBeta5.0aten, ReductionIncreases the length of a 1D tensor using nearest-neighbor interpolation, ensuring the output aligns with library-standard algorithms like PIL.
553varTensorBeta5.1aten, KernelGenCalculates the variance over all dimensions.
554var_correctionTensorBeta5.1atenA variant of the var() operator, with an optional correction argument that specifies the difference between the sample size and the sample degrees of freedom.
555var_dimTensorBeta5.1atenCalculates the variance over the dimensions specified by dim.
556var_meanLinearAlgStable2.0aten, ReductionCalculates the variance and mean over the dimensions specified by dim. dim can be a single dimension, list of dimensions, or None to reduce over all dimensions.
557vdotBLASStable2.2atenComputes the dot product of two 1D vectors along a dimension.
558vector_normLinearAlg NeuralNetworkStable2.0aten, ReductionComputes a vector norm.
559vstackTensorStable2.2atenStack tensors in sequence vertically (row wise).
560w8a8_block_fp8_matmulBLASAlpha5.1vLLMPerforms matrix multiplication with block-wise quantization.
561weight_normNeuralNetworkStable3.0fusedReparameterizes a module's weight tensor by decoupling its magnitude (g) from its direction (v). It is a hook that computes the actual weight before each forward pass.
562weight_norm_interfaceNeuralNetworkStable2.2aten, fusedApply weight normalization to neural network layers, decoupling the magnitude of a weight tensor from its direction. It is used to stabilize training, particularly for models with small batch sizes.
563weight_norm_interface_backwardNeuralNetworkStable3.0aten, fusedComputes the gradients for weight normalization during the backward pass. It calculates the necessary derivatives for updating both the magnitude (g) and direction (v) parameters of a weight-normalized layer, based on gradients received from the previous operation.
564where_selfTensorStable2.1aten, pointwiseReturns a tensor of elements selected from either self or other, depending on condition.
565where_self_outTensorStable2.2aten, pointwiseThis is a variant of where_self() with an argument out.
566zeroTensorBeta5.0aten, KernelGenFills tensor with zeros.
567zero_TensorStable5.0atenFills self tensor with zeros.
568zero_outTensorBeta5.0aten, KernelGenA variant of zero() that assigns the zero-filled output to the out tensor.
569zerosTensorStable2.1aten, skip_precision_checkReturns a tensor filled with the scalar value 0, with the shape defined by the variable argument size.
570zeros_likeTensorStable2.1atenReturns a tensor filled with the scalar value 0, with the same size as input.
571_to_copyTensorAlpha5.1skip_precision_checkLayout/memory operation (to_copy).
572viewTensorAlpha5.1skip_precision_checkPure layout operation (view).
573reshapeTensorAlpha5.1skip_precision_checkPure layout operation (reshape).
574expandTensorAlpha5.1skip_precision_checkPure layout operation (expand).
575permuteTensorAlpha5.1skip_precision_checkPure layout operation (permute).
576transposeTensorAlpha5.1skip_precision_checkPure layout operation (transpose).
577cloneTensorAlpha5.1skip_precision_checkPure layout/memory operation (clone).
578toTensorAlpha5.1skip_precision_checkDevice/dtype cast operation (to).
579emptyTensorAlpha5.1skip_precision_checkTensor factory operation (empty).
580normal_TensorAlpha5.1skip_precision_checkRandom sampling operator (normal_).
581random_TensorAlpha5.1skip_precision_checkRandom sampling operator (random_).
582argsortTensorAlpha5.1skip_precision_check, KernelGenSorting/selection operator (argsort).
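
A few of the element-wise and sequence semantics described in the table above can be sketched in plain Python. This is only an illustrative reference on Python scalars and lists, not the library's Triton implementation. The helper names simply mirror the operator names, and the `lambd=0.5` default and the "first half is the gate" layout for `silu_and_mul` are assumptions that should be checked against the PyTorch and vLLM documentation.

```python
import math


def softshrink(x, lambd=0.5):
    """Soft shrinkage on one scalar: values inside [-lambd, lambd] become 0,
    values outside are moved toward zero by lambd. The default is an assumption."""
    if x > lambd:
        return x - lambd
    if x < -lambd:
        return x + lambd
    return 0.0


def trunc_divide(a, b):
    """Division with the result rounded toward zero (rounding_mode 'trunc').
    This scalar sketch returns a Python int."""
    return math.trunc(a / b)


def unique_consecutive(values):
    """Keep only the first element of every run of equal neighbours."""
    out = []
    for v in values:
        if not out or out[-1] != v:
            out.append(v)
    return out


def silu(x):
    """SiLU: x * sigmoid(x), written as x / (1 + exp(-x))."""
    return x / (1.0 + math.exp(-x))


def silu_and_mul(row):
    """SwiGLU-style activation on one row (assumed layout): the first half of
    the row is passed through SiLU and multiplied by the second half."""
    d = len(row) // 2
    return [silu(gate) * up for gate, up in zip(row[:d], row[d:])]


if __name__ == "__main__":
    print(softshrink(1.0), softshrink(-0.2))
    print(trunc_divide(-7, 2))
    print(unique_consecutive([1, 1, 2, 2, 3, 1, 1]))
    print(silu_and_mul([0.0, 1.0, 2.0, 3.0]))
```

The sketch makes two distinctions easy to see: `unique_consecutive` only collapses adjacent duplicates, so a value can appear again later in the output, and `trunc_divide` rounds toward zero rather than toward negative infinity.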