Name
ARM_tensors
Name Strings
GL_ARM_tensors
GL_ARM_tensors_bfloat16
GL_ARM_tensors_float_e5m2
GL_ARM_tensors_float_e4m3
Contact
Kevin Petit (kevin.petit 'at' arm.com), Arm
Sven van Haastregt (sven.vanhaastregt 'at' arm.com), Arm
Contributors
Erik Hogeman, Arm
Jacob Bohlin, Arm
Jan-Harald Fredriksen, Arm
Kevin Petit, Arm
Neil Hickey, Arm
Sven van Haastregt, Arm
Status
Complete
Version
Last Modified Date: Feb 27, 2026
Revision: 2
Number
TBD
Dependencies
This extension can be applied to OpenGL GLSL versions 4.60
(#version 460) and higher.
This extension can be applied to OpenGL ES ESSL versions 3.20
(#version 320) and higher.
This extension interacts with the
GL_EXT_bfloat16,
GL_EXT_float_e5m2,
GL_EXT_float_e4m3,
GL_EXT_shader_explicit_arithmetic_types_int8,
GL_EXT_shader_explicit_arithmetic_types_int16,
GL_EXT_shader_explicit_arithmetic_types_int32,
GL_EXT_shader_explicit_arithmetic_types_int64,
GL_EXT_shader_explicit_arithmetic_types_float16,
GL_EXT_shader_explicit_arithmetic_types_float32, and
GL_EXT_shader_explicit_arithmetic_types_float64 extensions.
Overview
This extension adds support for tensor objects. Tensor objects are an opaque
abstraction for multidimensional arrays. Elements in a tensor are read or
written using accessor functions.
Mapping to SPIR-V
For informational purposes (non-specification), the following is an
expected way for an implementation to map GLSL constructs from this
extension to SPIR-V constructs:
tensorARM -> OpTypeTensorARM
tensorSizeARM -> OpTensorQuerySizeARM
tensorReadARM -> OpTensorReadARM
tensorWriteARM -> OpTensorWriteARM
Modifications to the OpenGL Shading Language Specification, Version 4.60
Including the following lines in a shader can be used to control the
language features described in this extension:
#extension GL_ARM_tensors : <behaviour>
#extension GL_ARM_tensors_bfloat16 : <behaviour>
#extension GL_ARM_tensors_float_e5m2 : <behaviour>
#extension GL_ARM_tensors_float_e4m3 : <behaviour>
Where <behaviour> is as specified in Section 3.3.
GL_EXT_bfloat16 and GL_ARM_tensors_bfloat16 must be enabled to use overloads
with bfloat16_t type.
GL_EXT_float_e5m2 and GL_ARM_tensors_float_e5m2 must be enabled to use
overloads with floate5m2_t type.
GL_EXT_float_e4m3 and GL_ARM_tensors_float_e4m3 must be enabled to use
overloads with floate4m3_t type.
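For illustration only, a shader that uses the bfloat16_t overloads might begin as sketched below. The sketch assumes a Vulkan-style compute shader and an implementation that supports both extensions; the binding, the tensor name "weights", and the workgroup size are hypothetical.

```glsl
#version 460
#extension GL_ARM_tensors : require
// Both of the following are needed before bfloat16_t overloads can be used.
#extension GL_EXT_bfloat16 : require
#extension GL_ARM_tensors_bfloat16 : require

layout(local_size_x = 1) in;

// Hypothetical binding and name, chosen for illustration only.
layout(set=0, binding=0) uniform readonly tensorARM<bfloat16_t, 2> weights;

void main() {
    // One coordinate per tensor dimension, outermost dimension first.
    uint coords[2] = uint[](0u, gl_GlobalInvocationID.x);
    bfloat16_t w;
    tensorReadARM(weights, coords, w);
}
```

The same pattern would apply to floate5m2_t and floate4m3_t with their respective extension pairs.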
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_ARM_tensors 1
#define GL_ARM_tensors_bfloat16 1
#define GL_ARM_tensors_float_e5m2 1
#define GL_ARM_tensors_float_e4m3 1
Modify Section 3.6, Keywords:
(add to list of keywords)
tensorARM
Add a new Section 4.1.X, Tensor Types:
Tensor types are opaque types, declared and behaving as described in
Section 4.1.7 "Opaque Types". They can be further qualified with memory
qualifiers. When aggregated into arrays within a shader, tensor arrays
can only be indexed with a dynamically uniform integral expression.
Tensor types are parameterized by two type parameters: data type and rank.
The parameters are specified in order between angle brackets ('<' and '>')
and comma separated. The data type parameter must be a scalar basic type.
The rank parameter must be an integral constant expression.
There are no implicit conversions between tensor types.
Examples of tensor declarations:
// float16 tensor of rank 1 that can only be read.
layout(set=0, binding=0) uniform readonly tensorARM<float16_t, 1> t0;
// int8 tensor of rank 4 that can only be written.
layout(set=0, binding=1) uniform writeonly tensorARM<int8_t, 4> t1;
// int tensor of rank 1 that can be read and written.
layout(set=0, binding=2) uniform tensorARM<int, 1> t2;
// Function with a float32 tensor of rank 5 parameter that can be read
// and written.
void func(tensorARM<float32_t, 5> tp) {}
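The indexing rule for tensor arrays can be sketched as follows. This is a non-normative example that assumes a Vulkan-style compute shader; the array name "inputs", the push-constant block "Params", and its member "which" are hypothetical. A push-constant value is the same for every invocation in the dispatch, so it is one way to obtain a dynamically uniform index.

```glsl
#version 460
#extension GL_ARM_tensors : require

layout(local_size_x = 1) in;

// Hypothetical array of four rank-1 int tensors.
layout(set=0, binding=0) uniform readonly tensorARM<int, 1> inputs[4];

// Hypothetical push-constant block that selects which tensor to read.
layout(push_constant) uniform Params {
    uint which;
} params;

void main() {
    uint coords[1] = uint[](gl_GlobalInvocationID.x);
    int value;
    // params.which has the same value in all invocations, so the index is
    // dynamically uniform.
    tensorReadARM(inputs[params.which], coords, value);
}
```

Indexing the array with a per-invocation value such as gl_GlobalInvocationID.x would not satisfy the rule.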
Modify Section 4.1.7 Opaque Types:
Add tensor types to the list of types that take memory qualifiers.
Add to Section 4.10 Memory Qualifiers:
Add variables declared as tensor types to the types that can be qualified with
a memory access qualifier.
Add a new Section 7.3.X, Tensor Built-In Constants:
The following constants are used to construct a bitmask to be passed as the
tensorOperands argument of tensorReadARM and tensorWriteARM calls.
// Specifies that the elements accessed by this call are not likely to be
// accessed again in the near future.
const uint gl_TensorOperandsNonTemporalARM = 0x1U;
// Specifies that the following argument is a value returned when reading
// elements outside of the bounds of a tensor. The type of the following
// argument must match the tensor element type. Applies to tensorReadARM
// calls only.
const uint gl_TensorOperandsOutOfBoundsValueARM = 0x2U;
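A possible use of these constants is sketched below (non-normative). The two flags are combined with a bitwise OR, and the out-of-bounds value is passed as the argument following tensorOperands. The binding and the tensor name "src" are hypothetical.

```glsl
#version 460
#extension GL_ARM_tensors : require

layout(local_size_x = 1) in;

// Hypothetical binding and name, for illustration only.
layout(set=0, binding=0) uniform readonly tensorARM<int, 1> src;

void main() {
    uint coords[1] = uint[](gl_GlobalInvocationID.x);
    int value;
    // The trailing 0 is the value returned for out-of-bounds elements; its
    // type matches the tensor element type (int).
    tensorReadARM(src, coords, value,
                  gl_TensorOperandsNonTemporalARM |
                  gl_TensorOperandsOutOfBoundsValueARM,
                  0);
}
```

Because gl_TensorOperandsOutOfBoundsValueARM applies to reads only, a tensorWriteARM call would pass at most gl_TensorOperandsNonTemporalARM and no extra argument.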
Add a new Section 8.X, Tensor Functions:
The following functions are used to read/write one or more elements from/to
a tensor.
void tensorReadARM(tensorARM t, uint coords[], out bool data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int8_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int16_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int32_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int64_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint8_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint16_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint32_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint64_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out float16_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out float32_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out float64_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out bfloat16_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out floate5m2_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out floate4m3_t data, uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out bool data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int8_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int16_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int32_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out int64_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint8_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint16_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint32_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out uint64_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out float16_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out float32_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out float64_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out bfloat16_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out floate5m2_t data[], uint tensorOperands = 0U, ...)
void tensorReadARM(tensorARM t, uint coords[], out floate4m3_t data[], uint tensorOperands = 0U, ...)
Description: Read one or more elements from Tensor *t*. The number of
elements read is specified by the type of the *data* parameter.
If *data* is a scalar type, a single element is read.
If *data* is an array, the number of elements read equals the array size.
The type of element(s) read is given by the tensorARM type.
*coords* specifies the coordinates of the first element to read.
The first element of the *coords* array corresponds to the outermost
dimension of the tensor.
It is a compile-time error if the number of supplied coordinates does not
equal the rank of the tensor.
*tensorOperands* is a constant bitmask of zero or more gl_TensorOperands*
values.
Depending on the specified tensorOperands, additional arguments may have to
be specified.
If one or more of the tensor elements read are outside of the tensor bounds
and gl_TensorOperandsOutOfBoundsValueARM is not provided, the behavior is
specified by the client API.
It is a compile-time error to read from a tensor that is decorated with the
writeonly memory access qualifier.
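The scalar and array forms can be sketched as follows (non-normative). The example assumes a rank-2 int tensor; the binding and the name "img" are hypothetical. How the elements of an array read are laid out relative to *coords* is not stated here, so the sketch only relies on the element count.

```glsl
#version 460
#extension GL_ARM_tensors : require

layout(local_size_x = 1) in;

// Hypothetical binding and name, for illustration only.
layout(set=0, binding=0) uniform readonly tensorARM<int, 2> img;

void main() {
    // Two coordinates because the tensor has rank 2; coords[0] addresses
    // the outermost dimension.
    uint coords[2] = uint[](gl_GlobalInvocationID.x, 0u);

    // Scalar data: a single element is read.
    int first;
    tensorReadARM(img, coords, first);

    // Array data: four elements are read, starting at coords.
    int block[4];
    tensorReadARM(img, coords, block);
}
```

Supplying one or three coordinates for this tensor would be a compile-time error, as would reading from it if it were declared writeonly.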
void tensorWriteARM(tensorARM t, uint coords[], bool data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int8_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int16_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int32_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int64_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint8_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint16_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint32_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint64_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], float16_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], float32_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], float64_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], bfloat16_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], floate5m2_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], floate4m3_t data, uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], bool data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int8_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int16_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int32_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], int64_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint8_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint16_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint32_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], uint64_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], float16_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], float32_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], float64_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], bfloat16_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], floate5m2_t data[], uint tensorOperands = 0U, ...)
void tensorWriteARM(tensorARM t, uint coords[], floate4m3_t data[], uint tensorOperands = 0U, ...)
Description: Write one or more elements to Tensor *t*. The number and type of
the elements written are given by the parameter *data*.
If *data* is a scalar type, a single element is written.
If *data* is an array, the number of elements written equals the array size.
*coords* specifies the coordinates of where to write the first element.
The first element of the *coords* array corresponds to the outermost
dimension of the tensor.
It is a compile-time error if the number of supplied coordinates does not
equal the rank of the tensor.
*tensorOperands* is a constant bitmask of zero or more gl_TensorOperands*
values.
Depending on the specified tensorOperands, additional arguments may have to
be specified.
The behavior when one or more tensor elements written to are outside of the
tensor bounds is specified by the client API.
It is a compile-time error to write to a tensor that is decorated with the
readonly memory access qualifier.
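A write of several elements might look like the sketch below (non-normative). The binding, the tensor name "dst", and the written values are hypothetical.

```glsl
#version 460
#extension GL_ARM_tensors : require

layout(local_size_x = 1) in;

// Hypothetical binding and name, for illustration only.
layout(set=0, binding=0) uniform writeonly tensorARM<int, 2> dst;

void main() {
    uint coords[2] = uint[](gl_GlobalInvocationID.x, 0u);

    // Array data: four elements are written, starting at coords.
    int values[4] = int[](1, 2, 3, 4);
    tensorWriteARM(dst, coords, values);

    // Scalar data with a hint that the element is unlikely to be accessed
    // again soon.
    tensorWriteARM(dst, coords, 7, gl_TensorOperandsNonTemporalARM);
}
```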
uint tensorSizeARM(tensorARM t, uint dim)
Description: Return the size of the dimension specified in *dim* of Tensor *t*.
Dimension 0 corresponds to the outermost dimension of the tensor.
*dim* must be less than the rank of tensor *t*.
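One way to use tensorSizeARM is to keep accesses within bounds, as in the following non-normative sketch of an element-wise copy. It assumes both tensors have the same shape; the bindings, the names "src" and "dst", and the workgroup size are hypothetical.

```glsl
#version 460
#extension GL_ARM_tensors : require

layout(local_size_x = 64) in;

// Hypothetical bindings and names, for illustration only.
layout(set=0, binding=0) uniform readonly  tensorARM<int, 2> src;
layout(set=0, binding=1) uniform writeonly tensorARM<int, 2> dst;

void main() {
    uint rows = tensorSizeARM(src, 0u); // outermost dimension
    uint cols = tensorSizeARM(src, 1u); // innermost dimension

    // Skip invocations that fall outside the tensor. This also covers the
    // case of an empty tensor, so the division below never divides by zero.
    uint idx = gl_GlobalInvocationID.x;
    if (idx >= rows * cols) {
        return;
    }

    uint coords[2] = uint[](idx / cols, idx % cols);
    int v;
    tensorReadARM(src, coords, v);
    tensorWriteARM(dst, coords, v);
}
```

Passing 2u as *dim* for these rank-2 tensors would violate the requirement that *dim* be less than the rank.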
Modify Section 9, Shading Language Grammar for Core Profile:
(Add to token list)
TENSOR
(modify type_specifier to add type_parameter_specifier_opt)
type_specifier:
type_specifier_nonarray type_parameter_specifier_opt
type_specifier_nonarray type_parameter_specifier_opt array_specifier
(new rules)
type_parameter_specifier_opt:
type_parameter_specifier
/*empty*/
type_parameter_specifier:
LEFT_ANGLE type_parameter_specifier_list RIGHT_ANGLE
type_parameter_specifier_element:
type_specifier
unary_expression
type_parameter_specifier_list:
type_parameter_specifier_element
type_parameter_specifier_list COMMA type_parameter_specifier_element
Interactions with GL_EXT_shader_explicit_arithmetic_types_float16
If GL_EXT_shader_explicit_arithmetic_types_float16 is not supported,
remove the tensorReadARM/tensorWriteARM overloads that use float16_t.
Interactions with GL_EXT_shader_explicit_arithmetic_types_float32
If GL_EXT_shader_explicit_arithmetic_types_float32 is not supported,
remove the tensorReadARM/tensorWriteARM overloads that use float32_t.
Interactions with GL_EXT_shader_explicit_arithmetic_types_float64
If GL_EXT_shader_explicit_arithmetic_types_float64 is not supported,
remove the tensorReadARM/tensorWriteARM overloads that use float64_t.
Interactions with GL_EXT_shader_explicit_arithmetic_types_int8
If GL_EXT_shader_explicit_arithmetic_types_int8 is not supported,
remove the tensorReadARM/tensorWriteARM overloads that use int8_t and
uint8_t.
Interactions with GL_EXT_shader_explicit_arithmetic_types_int16
If GL_EXT_shader_explicit_arithmetic_types_int16 is not supported,
remove the tensorReadARM/tensorWriteARM overloads that use int16_t and
uint16_t.
Interactions with GL_EXT_shader_explicit_arithmetic_types_int32
If GL_EXT_shader_explicit_arithmetic_types_int32 is not supported,
remove the tensorReadARM/tensorWriteARM overloads that use int32_t and
uint32_t.
Interactions with GL_EXT_shader_explicit_arithmetic_types_int64
If GL_EXT_shader_explicit_arithmetic_types_int64 is not supported,
remove the tensorReadARM/tensorWriteARM overloads that use int64_t and
uint64_t.
Interactions with GL_EXT_bfloat16
If GL_EXT_bfloat16 is not supported, remove the
tensorReadARM/tensorWriteARM overloads that use bfloat16_t and
the GL_ARM_tensors_bfloat16 extension.
Interactions with GL_EXT_float_e5m2
If GL_EXT_float_e5m2 is not supported, remove the
tensorReadARM/tensorWriteARM overloads that use floate5m2_t and
the GL_ARM_tensors_float_e5m2 extension.
Interactions with GL_EXT_float_e4m3
If GL_EXT_float_e4m3 is not supported, remove the
tensorReadARM/tensorWriteARM overloads that use floate4m3_t and
the GL_ARM_tensors_float_e4m3 extension.
Issues
1) Converting between vectors and arrays is cumbersome for shader writers.
Resolution:
This extension will not change or extend the language rules for conversions
between vector and array types.
Revision History
Rev.  Date        Author              Changes
----  ----------  ------------------  -----------------------------------------
2     2026-02-27  Kevin Petit         Add support for bfloat16_t, floate5m2_t,
                                      and floate4m3_t.
1     2025-06-19  Sven van Haastregt  Initial revision.