Name
EXT_mesh_shader
Name String
GL_EXT_mesh_shader
Contact
Christoph Kubisch, NVIDIA (ckubisch 'at' nvidia.com)
Pat Brown, NVIDIA (pbrown 'at' nvidia.com)
Contributors
Yury Uralsky, NVIDIA
Daniel Koch, NVIDIA
Sahil Parmar, NVIDIA
Patrick Mours, NVIDIA
Slawomir Grajewski, Intel
Timur Kristóf, Valve
Pankaj Mistry, NVIDIA
Graeme Leese, Broadcom
Caio Oliveira, Intel
Mariusz Merecki, Intel
Ricardo Garcia, Igalia
Status
Complete
Version
Last Modified Date: August 31, 2022
Revision: 7
Dependencies
This extension can be applied to OpenGL GLSL versions 4.50
(#version 450) and higher.
This extension is written against the OpenGL Shading Language
Specification, version 4.50.6, dated April 14, 2016.
This extension interacts with GLSL 4.60 and GL_KHR_vulkan_glsl.
This extension interacts with GL_ARB_shader_draw_parameters.
This extension interacts with GL_EXT_clip_cull_distance.
This extension interacts with GL_KHR_shader_subgroup.
This extension interacts with GL_EXT_multiview.
This extension interacts with GL_EXT_fragment_shading_rate.
Overview
This extension provides a new mechanism allowing applications to use two
new programmable shader types -- the task and mesh shader -- to generate
collections of geometric primitives to be processed by fixed-function
primitive assembly and rasterization logic. When the task and mesh
shaders are dispatched, they replace the standard programmable vertex
processing pipeline, including vertex array attribute fetching, vertex
shader processing, tessellation, and geometry shader processing.
Both new shader types have execution environments similar to that of
compute shaders, where a collection of shader invocations form a work
group and cooperate to produce a set of outputs. Unlike traditional
vertex, tessellation, and geometry shaders that typically process a vertex
or primitive at a time, the mesh and task shaders process and generate a
batch of primitives at once. The optional task shader pre-processes
geometry and generates a variable number of mesh shader tasks. The mesh
shader evaluates the geometry corresponding to its task and emits a mesh
-- a collection of vertices arranged into point, line, or triangle
primitives. The primitives emitted by the mesh shader are then processed
by fixed-function primitive assembly and rasterization logic and generate
fragments that will be processed by the fragment shader.
Work is submitted to the mesh pipeline by launching it from the API,
which spawns a three-dimensional array of tasks, similar to the dispatch
API for compute that spawns a three-dimensional array of compute
shader workgroups. If a task shader is present, each task generated by
this launch spawns a task shader workgroup. If no task shader is
present, each task generated by the launch spawns a mesh shader
workgroup.
When a task shader workgroup is executed, its invocations execute in
parallel and evaluate geometry associated with the task. The task shader
has no built-in or user-defined input variables other than the built-ins
identifying the workgroup and invocation being executed. The task shader
can use that information to read properties of the geometry associated
with the task from memory, using shader storage buffers, textures, or
other resources. The task shader determines the number of mesh shader
tasks that should be spawned for the task it is processing and emits the
task count via the EmitMeshTasksEXT function. Additionally, the
task shader can compute and write additional properties of the geometry it
processes to a user-defined variable qualified with "taskPayloadSharedEXT",
which can be read from a user-defined variable of the same type qualified
with "taskPayloadSharedEXT" by all of the mesh shaders that it spawns.
The task shader can be used to drive level-of-detail
calculations for procedurally generated geometry, to perform coarse-level
culling for batches of static or dynamic geometry, and for other forms of
work reduction or amplification.
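
As an illustration of the coarse-level culling use case described above, the
following is a minimal sketch of a task shader, not a definitive
implementation. The buffer "MeshletBounds", the struct "TaskPayload", the
helper "isVisible", and the workgroup size of 32 are all hypothetical names
and values chosen for this example. The sketch conservatively has every
invocation call EmitMeshTasksEXT with the same arguments after a barrier.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

// Hypothetical workgroup size: one invocation per candidate meshlet.
layout(local_size_x = 32) in;

// Hypothetical input buffer: xyz = sphere center, w = sphere radius.
layout(std430, binding = 0) readonly buffer MeshletBounds {
    vec4 boundingSphere[];
};

// Hypothetical payload: indices of the meshlets that survived culling.
struct TaskPayload {
    uint meshletIndex[32];
};
taskPayloadSharedEXT TaskPayload payload;

// Number of surviving meshlets in this workgroup.
shared uint survivingCount;

// Placeholder visibility test (assumption): keep any sphere that is not
// entirely behind the z = 0 plane.
bool isVisible(vec4 sphere) {
    return sphere.z + sphere.w > 0.0;
}

void main() {
    if (gl_LocalInvocationIndex == 0u) {
        survivingCount = 0u;
    }
    barrier();

    uint meshlet = gl_GlobalInvocationID.x;
    if (meshlet < uint(boundingSphere.length()) &&
        isVisible(boundingSphere[meshlet])) {
        // Compact the survivors into the payload.
        uint slot = atomicAdd(survivingCount, 1u);
        payload.meshletIndex[slot] = meshlet;
    }
    barrier();

    // Spawn one mesh shader workgroup per surviving meshlet.
    EmitMeshTasksEXT(survivingCount, 1u, 1u);
}
```

A mesh shader spawned by this sketch would declare a variable of the same
type with "taskPayloadSharedEXT" and read its meshlet index from it.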
When a mesh shader workgroup is executed, its invocations execute in
parallel to evaluate geometry corresponding to its task and emit a mesh
for further processing by subsequent pipeline stages. As with task
shaders, mesh shaders have no built-in inputs other than those identifying
the workgroup and invocation being executed, and must fetch their inputs
explicitly from memory. The mesh shader invocations collectively must
produce a mesh, which consists of:
* a primitive and vertex count, set by calling the SetMeshOutputsEXT
function;
* a collection of vertex attributes, where each vertex in the mesh has a
set of built-in and user-defined per-vertex output variables and blocks;
* a collection of primitive attributes, where each primitive in the mesh
has a set of built-in and user-defined per-primitive output variables
and blocks; and
* an array of vertex index values written to one of the appropriate
built-in output arrays (gl_PrimitivePointIndicesEXT,
gl_PrimitiveLineIndicesEXT or gl_PrimitiveTriangleIndicesEXT), where each
output element contains one, two, or three indices that
identify the output vertices in the mesh used to form the primitive.
The number of primitives and vertices emitted by the mesh shader can be
variable, but the mesh shader must specify maximum vertex and primitive
counts. There are implementation-dependent limits on the number of
vertices and primitives emitted by the mesh shader, and there are also
implementation-dependent limits on the total amount of memory consumed by
a mesh. In the initial implementation of this extension, implementation
limits are sufficiently low that complex geometry will need to be
decomposed into multiple tasks.
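
To make the components of a mesh concrete, here is a minimal sketch of a
mesh shader that emits a single triangle. The vertex positions and the
workgroup size of one are arbitrary choices for illustration only.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;

// Maximum counts; the actual counts are set in main().
layout(triangles, max_vertices = 3, max_primitives = 1) out;

void main() {
    // Primitive and vertex count: 3 vertices, 1 primitive.
    SetMeshOutputsEXT(3u, 1u);

    // Per-vertex attributes (built-in position only in this sketch).
    gl_MeshVerticesEXT[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);

    // Vertex indices forming the one triangle.
    gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
}
```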
A typical mesh shader used to render static triangle data might operate in
three phases. The first phase fetches vertex position data and local
index data of the primitives that the mesh represents. The index data
would have been prepared offline to leverage vertex re-use within the
mesh. In the second phase, triangles would be culled and output primitive
indices written. Finally, other vertex attributes of the surviving subset
of vertices would be loaded and computed. During this process, the
invocations would sometimes work on a per-vertex and sometimes on a
per-primitive level.
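
The three phases above might be sketched as follows. This is an
illustration under stated assumptions rather than a reference
implementation: the buffers "Positions", "Colors", "LocalIndices", and
"Meshlets", the "Meshlet" struct, and the limits of 64 vertices and 126
triangles are all hypothetical. Positions are assumed to be stored in clip
space with w > 0, and counter-clockwise triangles are assumed to be front
facing. For brevity the sketch marks culled triangles with
gl_CullPrimitiveEXT instead of compacting the surviving vertices.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 32) in;
layout(triangles, max_vertices = 64, max_primitives = 126) out;

// Hypothetical meshlet data, prepared offline.
struct Meshlet {
    uint vertexOffset;
    uint vertexCount;    // assumed <= 64
    uint triangleOffset;
    uint triangleCount;  // assumed <= 126
};
layout(std430, binding = 0) readonly buffer Positions    { vec4 position[]; };
layout(std430, binding = 1) readonly buffer Colors       { vec4 color[]; };
layout(std430, binding = 2) readonly buffer LocalIndices { uvec4 triangle[]; };
layout(std430, binding = 3) readonly buffer Meshlets     { Meshlet meshlet[]; };

layout(location = 0) out vec4 vertexColor[];

void main() {
    Meshlet m = meshlet[gl_WorkGroupID.x];
    SetMeshOutputsEXT(m.vertexCount, m.triangleCount);

    // Phase 1 (per vertex): fetch positions.
    for (uint v = gl_LocalInvocationIndex; v < m.vertexCount; v += gl_WorkGroupSize.x) {
        gl_MeshVerticesEXT[v].gl_Position = position[m.vertexOffset + v];
    }

    // Phase 2 (per primitive): write local indices and cull back faces.
    for (uint p = gl_LocalInvocationIndex; p < m.triangleCount; p += gl_WorkGroupSize.x) {
        uvec3 idx = triangle[m.triangleOffset + p].xyz;
        gl_PrimitiveTriangleIndicesEXT[p] = idx;

        // Positions are re-read from the buffer, not from the outputs.
        vec4 pa = position[m.vertexOffset + idx.x];
        vec4 pb = position[m.vertexOffset + idx.y];
        vec4 pc = position[m.vertexOffset + idx.z];
        vec2 a = pa.xy / pa.w;
        vec2 b = pb.xy / pb.w;
        vec2 c = pc.xy / pc.w;
        float signedArea = (b.x - a.x) * (c.y - a.y) - (c.x - a.x) * (b.y - a.y);
        gl_MeshPrimitivesEXT[p].gl_CullPrimitiveEXT = (signedArea <= 0.0);
    }

    // Phase 3 (per vertex): load the remaining attributes.
    for (uint v = gl_LocalInvocationIndex; v < m.vertexCount; v += gl_WorkGroupSize.x) {
        vertexColor[v] = color[m.vertexOffset + v];
    }
}
```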
Additionally, mesh shaders include infrastructure to allow a single mesh
shader workgroup to compute a mesh with multiple "views" (e.g., left and
right eye views for stereoscopic rendering), using a "view index" from the
VK_KHR_multiview (Vulkan) extension.
 Conventional  From Application
    Vertex             |
   Pipeline            v
               Launch Mesh Tasks
   (Fig 3.1)           |
       |           +---+-----+
       |           |         |
       |           |         |
       |           |    Task Shader ---+
       |           |         |         |
       |           |         v         |
       |           |  Task Generation  |     Image Load/Store
       |           |         |         |     Atomic Counter
       |           +---+-----+         |<--> Shader Storage
       |               |               |     Texture Fetch
       |               v               |     Uniform Block
       |         Mesh Shader ----------+
       |               |               |
       +-------------> +               |
                       |               |
                       v               |
                 Rasterization         |
                       |               |
                       v               |
                 Fragment Shader ------+
                       |
                       v
            Per-Fragment Operations
                       |
                       v
                  Framebuffer

Mesh Processing Pipeline
Mapping to SPIR-V
-----------------
For informational purposes (non-normative), the following is an
expected way for an implementation to map GLSL constructs to SPIR-V
constructs:
task shader -> TaskEXT Execution model
mesh shader -> MeshEXT Execution model
shared qualifier -> Workgroup storage class (existing)
taskPayloadSharedEXT storage qualifier -> TaskPayloadWorkgroupEXT storage class
points layout qualifier -> OutputPoints Execution Mode (existing)
lines layout qualifier -> OutputLinesEXT Execution Mode
triangles layout qualifier -> OutputTrianglesEXT Execution Mode
max_vertices layout qualifier -> OutputVertices Execution Mode (existing)
max_primitives layout qualifier -> OutputPrimitivesEXT Execution Mode
local_size_(xyz) layout qualifiers -> LocalSize Execution Mode (existing)
local_size_(xyz)_id layout qualifiers -> LocalSizeId Execution Mode (existing)
perprimitiveEXT auxiliary storage qualifier -> PerPrimitiveEXT Decoration
gl_NumWorkGroups -> NumWorkgroups decorated OpVariable (existing)
gl_WorkGroupSize -> WorkgroupSize decorated OpVariable (existing)
gl_WorkGroupID -> WorkgroupId decorated OpVariable (existing)
gl_LocalInvocationID -> LocalInvocationId decorated OpVariable (existing)
gl_GlobalInvocationID -> GlobalInvocationId decorated OpVariable (existing)
gl_LocalInvocationIndex -> LocalInvocationIndex decorated OpVariable (existing)
gl_PrimitivePointIndicesEXT -> PrimitivePointIndicesEXT decorated OpVariable
gl_PrimitiveLineIndicesEXT -> PrimitiveLineIndicesEXT decorated OpVariable
gl_PrimitiveTriangleIndicesEXT -> PrimitiveTriangleIndicesEXT decorated OpVariable
gl_Position -> Position decorated OpVariable (existing)
gl_PointSize -> PointSize decorated OpVariable (existing)
gl_ClipDistance -> ClipDistance decorated OpVariable (existing)
gl_CullDistance -> CullDistance decorated OpVariable (existing)
gl_PrimitiveID -> PrimitiveId decorated OpVariable (existing)
gl_Layer -> Layer decorated OpVariable (existing)
gl_ViewportIndex -> ViewportIndex decorated OpVariable (existing)
gl_CullPrimitiveEXT -> CullPrimitiveEXT decorated OpVariable
gl_DrawID -> DrawIndex decorated OpVariable (existing 1.3, extension)
gl_MeshPerVertexEXT -> block name, not needed
gl_MeshPerPrimitiveEXT -> block name, not needed
EmitMeshTasksEXT -> OpEmitMeshTasksEXT()
SetMeshOutputsEXT -> OpSetMeshOutputsEXT()
Modifications to the OpenGL Shading Language Specification, Version 4.50.6
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_EXT_mesh_shader : <behavior>
where <behavior> is as specified in section 3.3.
A new preprocessor #define is added to the OpenGL Shading Language:
#define GL_EXT_mesh_shader 1
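
For illustration, the directive and the preprocessor define might be used
together as in the sketch below. The behavior "require" is one of the
behaviors from section 3.3; the shader body is an arbitrary single-point
mesh shader chosen only so that the example is complete.

```glsl
#version 450
// "require" makes compilation fail if the extension is unsupported.
#extension GL_EXT_mesh_shader : require

#ifdef GL_EXT_mesh_shader
// Declarations that depend on the extension can be guarded by the macro.
layout(local_size_x = 1) in;
layout(points, max_vertices = 1, max_primitives = 1) out;
#endif

void main() {
    SetMeshOutputsEXT(1u, 1u);
    gl_MeshVerticesEXT[0].gl_Position  = vec4(0.0, 0.0, 0.0, 1.0);
    gl_MeshVerticesEXT[0].gl_PointSize = 1.0;
    gl_PrimitivePointIndicesEXT[0] = 0u;
}
```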
Modify the introduction to Chapter 2, Overview of OpenGL Shading (p. 7)
(modify first paragraph) ... Currently, these processors are the vertex,
tessellation control, tessellation evaluation, geometry, fragment,
compute, task, and mesh processors.
(modify second paragraph) ... The specific languages will be referred to
by the name of the processor they target: vertex, tessellation control,
tessellation evaluation, geometry, fragment, compute, task, or mesh.
Insert new sections at the end of Chapter 2 (p. 9)
Section 2.7, Task Processor
The task processor is a programmable unit that operates in conjunction
with the mesh processor to produce a collection of primitives that will be
processed by subsequent stages of the graphics pipeline. The task and
mesh processors form a primitive processing pipeline that can be used
instead of the conventional primitive processing pipeline that includes
the vertex, tessellation control, tessellation evaluation, and geometry
processors. Compilation units written in the OpenGL Shading Language to
run on this processor are called task shaders. When a set of task shaders
is successfully compiled and linked, they result in a task shader
executable that runs on the task processor.
A task shader has access to many of the same resources as fragment and
other shader processors, including textures, buffers, image variables, and
atomic counters. The task shader has no fixed-function inputs other than
variables identifying the specific workgroup and invocation; any vertex
attributes or other data required by the task shader must be fetched from
memory. The only fixed output of the task shader is a task count,
identifying the number of mesh shader workgroups to spawn. The task
shader can write additional outputs to task memory, which can be read by
all of the mesh shader workgroups it spawns.
A task shader operates on a group of work items called a workgroup. A
workgroup is a collection of shader invocations that execute the same
code, potentially in parallel. An invocation within a workgroup may share
data with other members of the same workgroup through shared variables
and issue memory and control barriers to synchronize with other members of
the same workgroup.
Section 2.8, Mesh Processor
The mesh processor is a programmable unit that operates in conjunction
with the task processor to produce a collection of primitives that will be
processed by subsequent stages of the graphics pipeline. The task and
mesh processors form a primitive processing pipeline that can be used
instead of the conventional primitive processing pipeline that includes
the vertex, tessellation control, tessellation evaluation, and geometry
processors. Compilation units written in the OpenGL Shading Language to
run on this processor are called mesh shaders. When a set of mesh shaders
is successfully compiled and linked, they result in a mesh shader
executable that runs on the mesh processor.
A mesh shader has access to many of the same resources as fragment and
other shader processors, including textures, buffers, image variables, and
atomic counters. The only inputs available to the mesh shader are
variables identifying the specific workgroup and invocation and any
outputs written to task memory by the task shader that spawned the mesh
shader's workgroup. Any vertex attributes or other data required by the
mesh shader must be fetched from memory. The invocations of the mesh
shader workgroup write an output mesh, comprising a set of primitives
with per-primitive attributes, a set of vertices with per-vertex
attributes, and an array of indices identifying the mesh vertices that
belong to each primitive. The primitives of this mesh are then processed
by subsequent graphics pipeline stages, where the outputs of the mesh
shader form an interface with the fragment shader.
A mesh shader operates on a group of work items called a workgroup. A
workgroup is a collection of shader invocations that execute the same
code, potentially in parallel. An invocation within a workgroup may share
data with other members of the same workgroup through shared variables
and issue memory and control barriers to synchronize with other members of
the same workgroup.
Modify Section 3.6, Keywords (p. 18)
(add to the end of the list of keywords, p. 19)
perprimitiveEXT
taskPayloadSharedEXT
Modify Section 3.8.2, Dynamically Uniform Expressions and Uniform Control
Flow (p. 21)
(modify third paragraph of this section)
An invocation group is the complete set of invocations collectively
processing a particular compute, task, or mesh shader workgroup, or a
graphical operation, where the scope ...
Modify Section 4.3, Storage Qualifiers (p. 43)
(modify table of base storage qualifiers, p. 43)
Qualifier              Meaning
------------------     -----------------------------------------------
shared                 compute, task, and mesh shader only; variable
                       storage is shared across all work items in a
                       local workgroup
(add to table of base storage qualifiers, p. 43)
Qualifier              Meaning
------------------     -----------------------------------------------
taskPayloadSharedEXT   task and mesh shader only; storage that is
                       visible to task shader work items and the mesh
                       shader work items they spawn and is shared
                       across all work items in a local workgroup
(add to table of auxiliary storage qualifiers, p. 44)
Auxiliary Storage
Qualifier              Meaning
------------------     -----------------------------------------------
perprimitiveEXT        mesh shader outputs with per-primitive instances
Add a sub-section to Section 4.3 (Storage Qualifiers)
4.3.X taskPayloadSharedEXT Variables
These are allowed only in task and mesh shaders. It is a compile-time error
to use them in any other stage. They can be both read from and written to
in task shaders, but only read from in mesh shaders. Storage is shared
between all work items in a task or mesh shader local workgroup.
There can be only a single variable at global scope with this qualifier in
stages where this qualifier is permitted. It is a compile-time error to
declare multiple variables or unsized arrays of this type.
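
As a sketch of the mesh shader side of these rules, the example below
declares a single payload variable and only reads from it. The struct
"TaskPayload", its member "center", and the array size of 8 are
hypothetical. The example assumes that the spawning task shader declared a
payload of the same type, filled it, and emitted at most 8 mesh shader
workgroups along x, so that gl_WorkGroupID.x is a valid index.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;
layout(points, max_vertices = 1, max_primitives = 1) out;

// Hypothetical payload type; must match the task shader's declaration.
struct TaskPayload {
    vec4 center[8];
};

// Exactly one payload variable at global scope; read-only in mesh shaders.
taskPayloadSharedEXT TaskPayload payload;

void main() {
    SetMeshOutputsEXT(1u, 1u);

    // Each mesh workgroup draws the point selected by its workgroup ID.
    gl_MeshVerticesEXT[0].gl_Position  = payload.center[gl_WorkGroupID.x];
    gl_MeshVerticesEXT[0].gl_PointSize = 1.0;
    gl_PrimitivePointIndicesEXT[0] = 0u;
}
```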
Modify Section 4.3.4, Input Variables (p. 46)
(modify third paragraph, p. 47, to treat all mesh shader outputs as
"arrayed" interfaces)
Some inputs and outputs are arrayed ... Geometry shader inputs,
tessellation control shader inputs and outputs, tessellation evaluation
inputs, and mesh shader outputs all have an additional level of arrayness
relative to other shader inputs and outputs. Component limits for these
arrayed interfaces (e.g., gl_MaxTessControlInputComponents) are limits for
a single instance and not for the entire interface.
(insert before the last paragraph, p. 47, "Fragment shader inputs get")
Task shaders do not permit user-defined input variables and do not form a
formal interface with any previous shader stage. See section 7.1 "Built-In
Variables" for a description of built-in task shader input variables.
All other input to a task shader is retrieved explicitly through image
loads, texture fetches, loads from uniforms, uniform buffers, or shader
storage buffers, or other user supplied code. Redeclaration of built-in
input variables in task shaders is not permitted.
Mesh shaders do not permit user-defined input variables. See section 7.1
"Built-In Variables" for a description of built-in mesh shader input
variables.
All other input to a mesh shader is retrieved explicitly through image
loads, texture fetches, loads from uniforms, uniform buffers, or shader
storage buffers, or other user supplied code. Redeclaration of built-in
input variables in mesh shaders is not permitted.
(modify last paragraph, p. 47)
Fragment shader inputs get... The auxiliary storage qualifiers centroid,
sample, and perprimitiveEXT can also be applied, as well as...
(modify first paragraph, p. 48)
Fragment shader inputs that are signed or unsigned integers, integer
vectors, or any double-precision floating-point type must be qualified
with the interpolation qualifier flat or with the auxiliary storage
qualifier perprimitiveEXT.
(add a new example to the second paragraph, p. 48)
perprimitiveEXT in vec3 triangleNormal;
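
A fuller sketch of a fragment shader consuming such an input is shown
below. The input "vertexColor", the output "fragColor", the location
numbers, and the fixed light direction are hypothetical and serve only to
make the example complete.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

// Interpolated per-vertex input written by the mesh shader.
layout(location = 0) in vec3 vertexColor;

// Per-primitive input: the same value for every fragment of a primitive.
layout(location = 1) perprimitiveEXT in vec3 triangleNormal;

layout(location = 0) out vec4 fragColor;

void main() {
    // Hypothetical light direction used for simple diffuse shading.
    vec3 lightDir = vec3(0.0, 0.0, 1.0);
    float diffuse = max(dot(normalize(triangleNormal), lightDir), 0.0);
    fragColor = vec4(vertexColor * diffuse, 1.0);
}
```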
(modify third paragraph, p. 48)
The fragment shader inputs form an interface with the mesh shader or last
active shader in the conventional vertex processing pipeline (e.g.,
vertex, tessellation evaluation, geometry). ... Also, interpolation
qualification (e.g., flat) and auxiliary qualification other than
"perprimitiveEXT" (e.g. centroid) may differ. ...
Modify Section 4.3.6, Output Variables (p. 49)
(modify last paragraph, p. 49 to add task and mesh shaders)
It is a compile-time error to declare a vertex, tessellation evaluation,
tessellation control, geometry, task, or mesh shader output that contains
any of the following: ...
(insert before the next-to-last paragraph "The order of execution", p. 50)
Task shaders have no built-in output variables and do not permit
user-defined output variables.
Mesh shader output variables may be used to write per-vertex or
per-primitive data. Output variables qualified with "perprimitiveEXT"
have separate instances for each primitive in the output mesh; all other
output variables have separate instances for each vertex in the output
mesh. It is a compile-time error to use the "perprimitiveEXT" qualifier
in output declarations in any other shader stage. Both types of output
variables are arrayed (see "arrayed" under 4.3.4, Inputs) and each
per-vertex or per-primitive output variable (or output block, see
interface blocks below) needs to be declared as an array. For example,
out float vertexColor[]; // per-vertex color
perprimitiveEXT out vec3 triangleNormal[]; // per-triangle normal
Each element of such an array corresponds to one vertex or primitive of
the output mesh. Each array can optionally have a size declared. The
array size will be set by (or if provided must be consistent with) the
output layout declaration(s) establishing the maximum number of vertices
and primitives in the output mesh. When checking a mesh shader against
implementation limits on the total number of output variable components,
the compiler adds the number of per-vertex outputs for a single vertex
instance and the number of per-primitive outputs for a single primitive
instance. Unlike tessellation control shaders, a mesh shader invocation
may write to outputs for any vertex or primitive.
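
The sizing rules above can be illustrated with the following sketch of a
mesh shader that emits a quad as two triangles. The location numbers,
colors, and positions are arbitrary. One output array is sized explicitly
to match max_vertices and the other is left unsized, and a single
invocation writes the outputs for every vertex and primitive.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 4, max_primitives = 2) out;

// Explicit size, consistent with max_vertices.
layout(location = 0) out vec4 vertexColor[4];

// No size given; sized by max_primitives.
layout(location = 1) perprimitiveEXT out vec3 triangleNormal[];

void main() {
    SetMeshOutputsEXT(4u, 2u);

    // One invocation may write any vertex of the output mesh.
    gl_MeshVerticesEXT[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[2].gl_Position = vec4( 0.5,  0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[3].gl_Position = vec4(-0.5,  0.5, 0.0, 1.0);
    for (int v = 0; v < 4; ++v) {
        vertexColor[v] = vec4(1.0, 1.0, 1.0, 1.0);
    }

    // One invocation may also write any primitive.
    gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
    gl_PrimitiveTriangleIndicesEXT[1] = uvec3(0u, 2u, 3u);
    triangleNormal[0] = vec3(0.0, 0.0, 1.0);
    triangleNormal[1] = vec3(0.0, 0.0, 1.0);
}
```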
(modify the next-to-last and last paragraph, p. 50)
The order of execution of tessellation control, task, and mesh shader
invocations relative to the other invocations for the same input patch or
local workgroup is undefined unless the built-in function barrier() is
used to provide some control over relative execution order. When a shader
invocation calls barrier(), ...
Because tessellation control, task, and mesh shader invocations execute in
undefined order between barriers, the values of output variables will
sometimes be undefined. ...
Modify Section 4.3.8, Shared Variables (p. 52)
(modify first paragraph of the section, p. 52)
The shared or taskPayloadSharedEXT qualifier is used to declare variables
that have storage shared between all work items in a compute, task, or mesh
shader local workgroup. Variables declared as shared may only be used in
compute, task, or mesh shaders. Variables declared as taskPayloadSharedEXT
may only be used in task or mesh shaders. ...
(modify last paragraph of the section, p. 52)
There is a limit to the total size of all variables declared as shared and
taskPayloadSharedEXT in a single shader stage. This limit, expressed in
units of basic machine units, may be determined by using the OpenGL API to
query the value of MAX_COMPUTE_SHARED_MEMORY_SIZE (compute shaders),
MAX_TASK_SHARED_MEMORY_SIZE_EXT (task shaders), or
MAX_MESH_SHARED_MEMORY_SIZE_EXT (mesh shaders).
Modify Section 4.3.9, Interface Blocks, p. 52
(rework grammar rules, p. 53, to allow "perprimitiveEXT" to qualify blocks)
interface-qualifier:
in-out-block-qualifier(_opt) in // Note: Qualifiers can be in any order.
in-out-block-qualifier(_opt) out
uniform
buffer
// Note: Not shown for simplicity, but memory qualifiers may also be used
in-out-block-qualifier:
patch
perprimitiveEXT
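
As an example of the reworked grammar, the sketch below qualifies an
output block with "perprimitiveEXT" in a mesh shader. The block names
"PerVertexData" and "PerPrimitiveData", their members, and the location
numbers are hypothetical.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

// Per-vertex output block: one instance per vertex.
layout(location = 0) out PerVertexData {
    vec4 color;
} vertexData[];

// Per-primitive output block: one instance per primitive.
layout(location = 1) perprimitiveEXT out PerPrimitiveData {
    vec3 normal;
} primitiveData[];

void main() {
    SetMeshOutputsEXT(3u, 1u);

    gl_MeshVerticesEXT[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[2].gl_Position = vec4( 0.0,  0.5, 0.0, 1.0);
    for (int v = 0; v < 3; ++v) {
        vertexData[v].color = vec4(1.0, 0.5, 0.0, 1.0);
    }

    gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);
    primitiveData[0].normal = vec3(0.0, 0.0, 1.0);
}
```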
Modify Section 4.4, Layout Qualifiers, p. 57
(modify the layout qualifier table, pp. 58-59)
Layout Qualifier | Qualifier | Individual | Block | Block | Allowed interfaces
| only | variable | | Member |
-------------------+-----------+------------+-------+--------+--------------------
local_size_x = | | | | | compute in
local_size_y = | X | | | | mesh in
local_size_z = | | | | | task in
-------------------+-----------+------------+-------+--------+--------------------
max_vertices = | X | | | | geometry out
| | | | | mesh out
-------------------+-----------+------------+-------+--------+--------------------
max_primitives = | X | | | | mesh out
-------------------+-----------+------------+-------+--------+--------------------
[ points ] | | | | |
[ lines ] | X | | | | mesh out
[ triangles ] | | | | |
Add new Section 4.4.1.5, Task and Mesh Shader Inputs, p. 67
(note: the content of this section is nearly identical to the content of
section 4.4.1.4, Compute Shader Inputs)
There are no layout location qualifiers for task or mesh shader inputs.
Layout qualifier identifiers for task and mesh shader inputs are the
workgroup size qualifiers:
layout-qualifier-id :
local_size_x = integer-constant-expression
local_size_y = integer-constant-expression
local_size_z = integer-constant-expression
These task and mesh shader input layout qualifiers behave identically to
the equivalent compute shader qualifiers and specify a fixed local group
size used for each respective task or mesh shader workgroup.
If no size is specified in any of the three dimensions, a default size of
one will be used.
If the fixed local group size of the shader in any dimension is greater
than the maximum size supported by the implementation for that dimension,
a compile-time error results. Also, if such a layout qualifier is
declared more than once in the same shader, all those declarations must
set the same set of local workgroup sizes and set them to the same values;
otherwise a compile-time error results. If multiple task shaders attached
to a single program object declare a fixed local group size, the
declarations must be identical; otherwise a link-time error results.
Similarly, if multiple mesh shaders attached to a single program object
declare a fixed local group size, the declarations must be identical;
otherwise a link-time error results.
Furthermore, if a program object contains any task or mesh shaders, at
least one shader of each type present must contain an input layout
qualifier specifying a fixed local group size for the program, or a
link-time error will occur.
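
A minimal sketch of a task shader declaring a fixed local group size is
shown below. The sizes 8 and 4 and the policy of emitting one mesh shader
workgroup per task invocation are arbitrary choices for illustration.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

// Fixed local group size of 8 x 4 x 1. local_size_z is not specified,
// so it takes the default size of one.
layout(local_size_x = 8, local_size_y = 4) in;

void main() {
    // gl_WorkGroupSize reflects the declaration above: uvec3(8, 4, 1).
    uint invocationsPerGroup =
        gl_WorkGroupSize.x * gl_WorkGroupSize.y * gl_WorkGroupSize.z;

    // Hypothetical policy: one mesh shader workgroup per task invocation.
    EmitMeshTasksEXT(invocationsPerGroup, 1u, 1u);
}
```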
Modify section 4.4.2.1, Transform Feedback Layout Qualifiers, p. 69
(add a new paragraph at the end of the section, p. 71)
Transform feedback is not supported to capture the outputs of task and
mesh shaders. Use of transform feedback layout qualifiers in these shader
types will result in a compile-time error.
Add new Section 4.4.2.5, Mesh Shader Outputs, p. 75
Mesh shaders can have three additional types of output layout identifiers:
an output primitive type, a maximum output vertex count, and a maximum
output primitive count. The primitive type, vertex and primitive count
identifiers are allowed only on the interface qualifier out, not on an
output block, block member, or variable declaration.
The layout qualifier identifiers for mesh shader outputs are
layout-qualifier-id :
points
lines
triangles
max_vertices = integer-constant-expression
max_primitives = integer-constant-expression
The primitive type identifiers "points", "lines", and "triangles" are used
to specify the type of output primitive produced by the mesh shader, and
only one of these is accepted. At least one mesh shader (compilation
unit) in a program must declare an output primitive type, and all mesh
shader output primitive type declarations in a program must declare the
same primitive type. It is not required that all mesh shaders in a
program declare an output primitive type.
The vertex count identifier "max_vertices" is used to specify the maximum
number of vertices the shader will ever emit for the invocation group. At
least one mesh shader (compilation unit) in a program must declare a
maximum output vertex count, and all mesh shader output vertex count
declarations in a program must declare the same count. It is not required
that all mesh shaders in a program declare a count.
The primitive count identifier "max_primitives" is used to specify the
maximum number of primitives the shader will ever emit for the invocation
group. At least one mesh shader (compilation unit) in a program must
declare a maximum output primitive count, and all mesh shader output
primitive count declarations in a program must declare the same count. It
is not required that all mesh shaders in a program declare a count.
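
For illustration, a mesh shader using the "lines" primitive type might
declare its output layout as in the sketch below. The positions and the
maximum counts are arbitrary.

```glsl
#version 450
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;

// Primitive type, maximum vertex count, and maximum primitive count are
// declared on the interface qualifier "out", not on a variable or block.
layout(lines, max_vertices = 2, max_primitives = 1) out;

void main() {
    SetMeshOutputsEXT(2u, 1u);

    gl_MeshVerticesEXT[0].gl_Position = vec4(-0.5, 0.0, 0.0, 1.0);
    gl_MeshVerticesEXT[1].gl_Position = vec4( 0.5, 0.0, 0.0, 1.0);

    // With "lines" declared, each primitive takes two vertex indices.
    gl_PrimitiveLineIndicesEXT[0] = uvec2(0u, 1u);
}
```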
Output variables must be accessed only after SetMeshOutputsEXT has been
called in the current invocation group. Primitives can be emitted from the
group only through the use of this function. In contrast to other
shaders, mesh shaders do not permit read access to output variables;
reading them results in undefined behavior.
The intrinsically declared output block gl_MeshVerticesEXT[] and any user-defined
output variables or blocks not qualified with "perprimitiveEXT" will be
sized by the "max_vertices" output declaration. The intrinsically
declared output block gl_MeshPrimitivesEXT[] and any user-defined output
variables or blocks qualified with "perprimitiveEXT" will be sized by the
"max_primitives" output declaration. The intrinsically declared arrays
gl_PrimitivePointIndicesEXT[], gl_PrimitiveLineIndicesEXT[], and
gl_PrimitiveTriangleIndicesEXT[] will be sized according to the
"max_primitives" declaration, where only one of the arrays is available
for write access:
* "gl_PrimitivePointIndicesEXT" if "points" is declared,
* "gl_PrimitiveLineIndicesEXT" if "lines" is declared, or
* "gl_PrimitiveTriangleIndicesEXT" if "triangles" is declared.
For outputs declared without an array size, including intrinsically
declared outputs (e.g., gl_MeshVerticesEXT), a layout must be declared before any use
of the method length() or other array use that requires its size to be
known. It is a compile- or link-time error if an output array is declared with an
explicit size that does not match the array size derived from the layout
qualifier.
For such array outputs, SetMeshOutputsEXT sets up the actual vertex and primitive count.
Hence indexing these arrays beyond the respective vertex and primitive count
arguments used when calling SetMeshOutputsEXT results in undefined behavior.
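The following sketch illustrates both rules: an explicitly sized output
array that matches the size derived from the layout qualifier, and a guard
that keeps every write inside the counts passed to SetMeshOutputsEXT. The
output name vWeight and the count of 20 are hypothetical.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 32) in;
layout(points, max_vertices = 32, max_primitives = 32) out;

// An explicit size is accepted only because it matches max_vertices = 32.
layout(location = 0) out float vWeight[32];

void main()
{
    // Hypothetical workload: only 20 of the 32 possible points are emitted.
    const uint count = 20u;
    SetMeshOutputsEXT(count, count);

    uint i = gl_LocalInvocationIndex;

    // Writing element i >= count would be undefined behavior, so guard it.
    if (i < count) {
        gl_MeshVerticesEXT[i].gl_Position  = vec4(0.0, 0.0, 0.0, 1.0);
        gl_MeshVerticesEXT[i].gl_PointSize = 1.0;
        vWeight[i] = float(i) / float(count);
        gl_PrimitivePointIndicesEXT[i] = i;
    }
}
```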
Modify Section 4.5, Interpolation Qualifiers, p. 83
(modify first paragraph of the section, p. 83)
The presence of and type of interpolation is controlled by the above
interpolation qualifiers as well as the auxiliary storage qualifiers
centroid and sample. The auxiliary storage qualifiers "patch" and
"perprimitiveEXT" are not used for interpolation; it is a compile-time
error to use interpolation qualifiers with those auxiliary storage
qualifiers.
(add a new paragraph at the end of the section, p. 84)
A variable qualified with the auxiliary storage qualifier
"perprimitiveEXT" will also not be interpolated. Instead, it will use
the same per-primitive value for all fragments generated by each
primitive.
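A fragment shader consuming such a value might be sketched as follows. The
input names and locations are assumptions for illustration and would have to
match the outputs of the preceding mesh shader.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

// Interpolated across the primitive as usual.
layout(location = 0) in vec3 vColor;

// Not interpolated: every fragment of a primitive reads the same value.
// Adding an interpolation qualifier such as "flat" here would be a
// compile-time error.
layout(location = 1) perprimitiveEXT in vec3 pFaceColor;

layout(location = 0) out vec4 fragColor;

void main()
{
    fragColor = vec4(vColor * pFaceColor, 1.0);
}
```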
Modify Section 7.1, Built-In Language Variables (p. 120)
(insert after the first paragraph and variable list, p. 123)
In the task language, built-in variables are intrinsically declared as:
// workgroup dimensions
in uvec3 gl_NumWorkGroups;
const uvec3 gl_WorkGroupSize;
// workgroup and invocation IDs
in uvec3 gl_WorkGroupID;
in uvec3 gl_LocalInvocationID;
in uvec3 gl_GlobalInvocationID;
in uint gl_LocalInvocationIndex;
In the mesh language, built-in variables are intrinsically declared as:
// workgroup dimensions
in uvec3 gl_NumWorkGroups;
const uvec3 gl_WorkGroupSize;
// workgroup and invocation IDs
in uvec3 gl_WorkGroupID;
in uvec3 gl_LocalInvocationID;
in uvec3 gl_GlobalInvocationID;
in uint gl_LocalInvocationIndex;
// write only access
out uint gl_PrimitivePointIndicesEXT[];
out uvec2 gl_PrimitiveLineIndicesEXT[];
out uvec3 gl_PrimitiveTriangleIndicesEXT[];
// write only access
out gl_MeshPerVertexEXT {
vec4 gl_Position;
float gl_PointSize;
float gl_ClipDistance[];
float gl_CullDistance[];
} gl_MeshVerticesEXT[];
// write only access
perprimitiveEXT out gl_MeshPerPrimitiveEXT {
int gl_PrimitiveID;
int gl_Layer;
int gl_ViewportIndex;
bool gl_CullPrimitiveEXT;
int gl_PrimitiveShadingRateEXT;
} gl_MeshPrimitivesEXT[];
(modify the discussion of the built-in variables shared with compute
shaders, which starts on p. 123)
The built-in variable gl_NumWorkGroups is a compute, task, or mesh shader
input variable containing the total number of global work items in each
dimension of the workgroup that will execute the compute, task or mesh
shader. ...
The built-in constant gl_WorkGroupSize is a compute, task, or mesh shader
constant containing the local workgroup size of the shader. The size ...
The built-in variable gl_WorkGroupID is a compute, task, or mesh shader
input variable containing the three-dimensional index of the global work
group that the current invocation is executing in. ...
The built-in variable gl_LocalInvocationID is a compute, task, or mesh
shader input variable containing the three-dimensional index of the local
workgroup within the global workgroup that the current invocation is
executing in. ...
The built-in variable gl_GlobalInvocationID is a compute, task, or mesh
shader input variable containing the global index of the current work
item. This value uniquely identifies this invocation from all other
invocations across all local and global workgroups initiated by the
current DispatchCompute or DispatchMeshTasksEXT call or by a previously
executed task shader. ...
The built-in variable gl_LocalInvocationIndex is a compute, task, or mesh
shader input variable that contains the one-dimensional representation of
the gl_LocalInvocationID.
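The relationship between these built-ins can be sketched in a small mesh
shader. The helper flattenLocalID is hypothetical and only restates the
flattening rule of the core GLSL specification; the local size and the
position scale are arbitrary.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 4, local_size_y = 2) in;
layout(points, max_vertices = 8, max_primitives = 8) out;

// Hypothetical helper: recomputes the one-dimensional index from the
// three-dimensional local ID. The result equals gl_LocalInvocationIndex.
uint flattenLocalID(uvec3 id, uvec3 size)
{
    return id.z * size.x * size.y + id.y * size.x + id.x;
}

void main()
{
    SetMeshOutputsEXT(8u, 8u);

    uint idx = flattenLocalID(gl_LocalInvocationID, gl_WorkGroupSize);

    // Equivalent to gl_GlobalInvocationID.
    uvec3 globalID = gl_WorkGroupID * gl_WorkGroupSize + gl_LocalInvocationID;

    // Each of the 8 invocations emits one point at its own slot.
    gl_MeshVerticesEXT[idx].gl_Position  = vec4(vec2(globalID.xy) * 0.1, 0.0, 1.0);
    gl_MeshVerticesEXT[idx].gl_PointSize = 1.0;
    gl_PrimitivePointIndicesEXT[idx] = idx;
}
```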
(modify discussion of gl_PrimitiveID, gl_Layer, and gl_ViewportIndex to
allow as a mesh output, pp. 125-127)
The output variable gl_PrimitiveID is available only in the geometry and
mesh languages and provides a single integer that serves as a primitive
identifier. This is then available to fragment shaders as the fragment
input gl_PrimitiveID, which will select the written primitive ID from the
provoking vertex in the primitive being shaded when using a geometry
shader or from the appropriate per-primitive output value when using a
mesh shader. If a fragment shader using gl_PrimitiveID is active and a
geometry or mesh shader is also active, the geometry or mesh shader must
write to gl_PrimitiveID or the fragment shader input gl_PrimitiveID is
undefined. ...
The variable gl_Layer is available as an output variable in the geometry
and mesh languages and an input variable in the fragment language. In the
geometry and mesh languages, it is used to select a specific layer (or
face and layer of a cube map) of a multi-layer framebuffer attachment.
When using a geometry shader, the actual layer used will come from one of
the vertices in the primitive being shaded. Which vertex the layer comes
from is discussed in section 11.3.4.6 "Layer and Viewport Selection" of
the OpenGL Specification. It might be undefined, so it is best to write
the same layer value for all vertices of a primitive. When using a mesh
shader, the actual layer will come from the appropriate per-primitive
output value written by the mesh shader. ...
The input variable gl_Layer in the fragment language will have the same
value that was written to the output variable gl_Layer in the geometry or
mesh language. If the geometry or mesh stage does not dynamically assign
... If the geometry or mesh stage makes no static assignment to gl_Layer,
the input value... Otherwise, the fragment stage will read the same value
written by the geometry or mesh stage, even if...
The variable gl_ViewportIndex is available as an output variable in the
geometry and mesh languages and an input variable in the fragment
language. In the geometry and mesh languages, it provides the ...
Primitives generated by the geometry or mesh shader will undergo viewport
transformation and scissor testing using the viewport transformation and
scissor rectangle selected by the value of gl_ViewportIndex. When using a
geometry shader, the viewport index used will come from one of the
vertices in the primitive being shaded. However, which vertex the
viewport index comes from is implementation-dependent, so it is best to
use the same viewport index for all vertices of the primitive. When using
a mesh shader, the viewport index used will come from the appropriate
per-primitive output value written by the mesh shader. If a geometry or
mesh shader does not assign a value to gl_ViewportIndex, ... If a
geometry or mesh shader statically assigns a value to gl_ViewportIndex...
The input variable gl_ViewportIndex in the fragment stage will have the
same value that was written to the output variable gl_ViewportIndex in the
geometry or mesh stage. If the geometry or mesh stage does not dynamically
assign... If the geometry or mesh stage makes no static assignment...
Otherwise, the fragment stage will read the same value written by the
geometry or mesh stage, even if...
The output variable gl_CullPrimitiveEXT is only available in the
mesh language. When set to true, it marks that this primitive should
be culled. If not written to, it defaults to false.
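As a sketch of per-primitive culling, the shader below marks its single
triangle as culled when its signed area in the xy plane is not positive.
The culling test itself is a hypothetical example and assumes w == 1 for
all vertices; the extension only defines the meaning of the flag.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;
layout(triangles, max_vertices = 3, max_primitives = 1) out;

void main()
{
    SetMeshOutputsEXT(3u, 1u);

    vec4 p0 = vec4(-0.5, -0.5, 0.0, 1.0);
    vec4 p1 = vec4( 0.5, -0.5, 0.0, 1.0);
    vec4 p2 = vec4( 0.0,  0.5, 0.0, 1.0);

    gl_MeshVerticesEXT[0].gl_Position = p0;
    gl_MeshVerticesEXT[1].gl_Position = p1;
    gl_MeshVerticesEXT[2].gl_Position = p2;
    gl_PrimitiveTriangleIndicesEXT[0] = uvec3(0u, 1u, 2u);

    // Hypothetical test: twice the signed area of the triangle in xy.
    vec2 e1 = p1.xy - p0.xy;
    vec2 e2 = p2.xy - p0.xy;
    float signedArea2 = e1.x * e2.y - e1.y * e2.x;

    // true marks the primitive as culled; false keeps it.
    gl_MeshPrimitivesEXT[0].gl_CullPrimitiveEXT = (signedArea2 <= 0.0);
}
```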
The output array variables gl_PrimitivePointIndicesEXT[],
gl_PrimitiveLineIndicesEXT[] or gl_PrimitiveTriangleIndicesEXT[] are only
available in the mesh language.
Depending on the output primitive type declared using a
layout qualifier, the appropriate array element specifies
the indices of the vertices making up the primitive.
All index values must be in the range [0, N-1], where N is the value of
the "vertexCount" argument of a previous call to SetMeshOutputsEXT.
Out-of-bounds index values will result in undefined behavior.
Each array element must be written as a whole; partial writes to the
vector components for line and triangle primitives are not allowed.
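The whole-element rule can be illustrated with a closed loop of four lines.
The geometry is an arbitrary example; every index refers to a vertex that
was made available by the call to SetMeshOutputsEXT.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 1) in;
layout(lines, max_vertices = 4, max_primitives = 4) out;

void main()
{
    SetMeshOutputsEXT(4u, 4u);

    gl_MeshVerticesEXT[0].gl_Position = vec4(-0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[1].gl_Position = vec4( 0.5, -0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[2].gl_Position = vec4( 0.5,  0.5, 0.0, 1.0);
    gl_MeshVerticesEXT[3].gl_Position = vec4(-0.5,  0.5, 0.0, 1.0);

    // Each element is assigned as a complete uvec2.
    gl_PrimitiveLineIndicesEXT[0] = uvec2(0u, 1u);
    gl_PrimitiveLineIndicesEXT[1] = uvec2(1u, 2u);
    gl_PrimitiveLineIndicesEXT[2] = uvec2(2u, 3u);
    gl_PrimitiveLineIndicesEXT[3] = uvec2(3u, 0u);

    // Not allowed: a partial write to a single component, such as
    //     gl_PrimitiveLineIndicesEXT[0].x = 0u;
}
```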
(modify the fifth paragraph, p. 129)
The gl_PerVertex, gl_MeshPerVertexEXT, and gl_MeshPerPrimitiveEXT blocks can
be redeclared in a shader to explicitly indicate what subset of the fixed
pipeline interface will be used. ...
(modify the sixth paragraph, p. 129)
This establishes the output interface the shader will use with the
subsequent pipeline stage. It must be a subset of the built-in members of
gl_PerVertex, gl_MeshPerVertexEXT, or gl_MeshPerPrimitiveEXT. ...
Add new Section 8.xx, Task Shader Functions, after section 8.15, p. 187
These functions are only available in task shaders.
Insert a syntax/description table similar to the previous section.
Syntax:
void EmitMeshTasksEXT(uint groupCountX,
uint groupCountY,
uint groupCountZ)
Description:
Defines the grid size of subsequent mesh shader workgroups to generate
upon completion of the task shader invocation group that called this
function and exits the shader. These mesh shader workgroups will have
access to the data that was written to a variable with the
taskPayloadSharedEXT qualifier. The function call implies a barrier().
The arguments are taken from the first invocation in each workgroup.
Any invocation must call this function exactly once and under uniform
control flow, otherwise behavior is undefined.
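A task shader using this function might be sketched as follows. The payload
layout, the shared counter visibleCount, and the visibility test are all
hypothetical; only EmitMeshTasksEXT and taskPayloadSharedEXT come from this
extension.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 32) in;

// Hypothetical payload handed to the launched mesh shader workgroups.
struct TaskPayload {
    uint meshletIndices[32];
};
taskPayloadSharedEXT TaskPayload payload;

shared uint visibleCount;

void main()
{
    if (gl_LocalInvocationIndex == 0u) {
        visibleCount = 0u;
    }
    barrier();

    // Placeholder visibility test; a real shader would cull against a frustum.
    bool visible = (gl_GlobalInvocationID.x % 2u) == 0u;
    if (visible) {
        uint slot = atomicAdd(visibleCount, 1u);
        payload.meshletIndices[slot] = gl_GlobalInvocationID.x;
    }

    // Make the final counter value visible to every invocation.
    barrier();

    // Called exactly once by every invocation, in uniform control flow.
    // Launches one mesh shader workgroup per surviving entry.
    EmitMeshTasksEXT(visibleCount, 1u, 1u);
}
```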
Add new Section 8.xx, Mesh Shader Functions, after previous Task Shader
section
These functions are only available in mesh shaders.
Insert a syntax/description table similar to the previous section.
Syntax:
void SetMeshOutputsEXT(uint vertexCount,
uint primitiveCount)
Description:
Sets the actual output size of the primitives and vertices that this
mesh shader workgroup will emit upon completion. The vertexCount
argument must be less than or equal to the provided max_vertices identifier
and the primitiveCount argument must be less than or equal to max_primitives,
otherwise behavior is undefined.
The arguments are taken from the first invocation in each workgroup.
Any invocation must call this function no more than once and under
uniform control flow, otherwise behavior is undefined. There must not
be any control flow path to an output write that is not preceded by a
call to this function, otherwise behavior is undefined.
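The constraints above can be sketched in a mesh shader that takes its
primitive count from a task payload. The payload member triangleCount and
the generated positions are hypothetical; the clamp keeps both arguments
within the declared maxima.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

layout(local_size_x = 32) in;
layout(triangles, max_vertices = 96, max_primitives = 32) out;

// Hypothetical payload written by the task shader.
struct TaskPayload {
    uint triangleCount;
};
taskPayloadSharedEXT TaskPayload payload;

void main()
{
    // The same values in every invocation, so the call below is uniform.
    uint primitiveCount = min(payload.triangleCount, 32u);
    uint vertexCount    = 3u * primitiveCount;

    // Called once, before any path that writes an output.
    SetMeshOutputsEXT(vertexCount, primitiveCount);

    uint i = gl_LocalInvocationIndex;
    if (i < primitiveCount) {
        uint v  = 3u * i;
        float x = float(i) / 32.0;

        gl_MeshVerticesEXT[v + 0u].gl_Position = vec4(x,        0.0,  0.0, 1.0);
        gl_MeshVerticesEXT[v + 1u].gl_Position = vec4(x + 0.02, 0.0,  0.0, 1.0);
        gl_MeshVerticesEXT[v + 2u].gl_Position = vec4(x,        0.02, 0.0, 1.0);

        gl_PrimitiveTriangleIndicesEXT[i] = uvec3(v, v + 1u, v + 2u);
    }
}
```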
Modify Section 8.16, Shader Invocation Control Functions, p. 186
(modify first paragraph of the section, p. 186)
The shader invocation control function is available only in tessellation
control, compute, task, and mesh shaders. It is used
to control the relative execution order of multiple shader invocations
used to process a patch (in the case of tessellation control shaders) or a
local workgroup (in the case of compute, task, and mesh shaders), which
are otherwise executed with an undefined relative order.
(modify the last paragraph, p. 186)
For compute, task, and mesh shaders, the barrier() function may be placed
within flow control, but that flow control must be uniform flow control.
...
Modify Section 8.17, Shader Memory Control Functions, p. 187
(modify table of functions, p. 187)
void memoryBarrierShared()
Control the ordering of memory transactions to shared variables issued
within a single shader invocation.
Only available in compute, task, and mesh shaders.
void groupMemoryBarrier()
Control the ordering of all memory transactions issued within a single
shader invocation, as viewed by other invocations in the same work
group.
Only available in compute, task, and mesh shaders.
(modify last paragraph, p. 187)
... all of the above variable types. The functions memoryBarrierShared()
and groupMemoryBarrier() are available only in compute, task, and mesh
shaders; the other functions are available in all shader types.
(modify last paragraph, p. 188)
... When using the function groupMemoryBarrier(), this ordering guarantee
applies only to other shader invocations in the same compute, task, or
mesh shader workgroup; all other memory barrier functions provide the
guarantee to all other shader invocations. ...
Interactions with GLSL 4.60 and GL_KHR_vulkan_glsl
If GLSL 4.60 or GL_KHR_vulkan_glsl is supported, the layout qualifiers
"local_size_x_id", "local_size_y_id", and "local_size_z_id" are supported
in mesh and task shaders, as in compute shaders.
In the big layout qualifier table in section 4.4, add:
Layout Qualifier | Qualifier | Individual | Block | Block | Allowed interfaces
| only | variable | | Member |
-------------------+-----------+------------+-------+--------+--------------------
local_size_x_id = | | | | | compute in
local_size_y_id = | X | | | | mesh in
local_size_z_id = | | | | | task in
| | | | | (SPIR-V generation
| | | | | only)
No changes are required to the spec language describing these layout
qualifiers, since the language doesn't specifically reference compute
shaders and the mesh/task support should be identical.
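For illustration only, a task shader whose workgroup size is supplied through
specialization constants might be declared as below. The constant IDs 0, 1
and 2 are arbitrary, and the sketch assumes compilation for SPIR-V.

```glsl
#version 460
#extension GL_EXT_mesh_shader : require

// Workgroup size is taken from specialization constants 0, 1 and 2.
layout(local_size_x_id = 0, local_size_y_id = 1, local_size_z_id = 2) in;

void main()
{
    // Launch a single mesh shader workgroup.
    EmitMeshTasksEXT(1u, 1u, 1u);
}
```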
Interactions with GL_ARB_shader_draw_parameters
If GL_ARB_shader_draw_parameters is supported, the task and mesh shaders
will also have the following built-in inputs:
in int gl_DrawIDARB;
The variable <gl_DrawIDARB> is a vertex, task and mesh language input
variable that holds the integer index of the drawing command to which the
current vertex (in the vertex language) or the current workgroup (in the
task and mesh languages) belongs (see "Shader Inputs" in section 11.1.3.9
of the OpenGL Graphics System Specification). If the vertex or workgroup is
not invoked by a Multi* form of a draw command, then the value of
gl_DrawIDARB is zero.
Interactions with GL_EXT_clip_cull_distance
If implemented with OpenGL ES ESSL and GL_EXT_clip_cull_distance is not
supported, remove references to gl_ClipDistance, gl_CullDistance,
gl_ClipDistancePerViewEXT and gl_CullDistancePerViewEXT.
Interactions with GL_KHR_shader_subgroup
If GL_KHR_shader_subgroup is supported, the built-in variables and functions
added by that extension are available in the task and mesh shaders.
Interactions with GL_EXT_multiview
If GL_EXT_multiview is supported, the mesh shader
will also have the following built-in inputs:
in int gl_ViewIndex;
If a variable is dependent on gl_ViewIndex (either directly or indirectly
through a memory load operation that used a dependent address),
it becomes a per-view variable and further restrictions may apply.
All view instances of per-view outputs count