Name

    ARB_shader_storage_buffer_object

Name Strings

    GL_ARB_shader_storage_buffer_object

Contact

    Pat Brown, NVIDIA (pbrown 'at' nvidia.com)

Contributors

    Jeff Bolz, NVIDIA
    Piers Daniell, NVIDIA
    Christophe Riccio, AMD
    Graham Sellers, AMD
    Bruce Merry
    John Kessenich

Notice

    Copyright (c) 2012-2013 The Khronos Group Inc. Copyright terms at
        http://www.khronos.org/registry/speccopyright.html

Status

    Complete. Approved by the ARB on 2012/06/12.

Version

    Last Modified Date: September 23, 2013
    Revision: 15

Number

    ARB Extension #137

Dependencies

    OpenGL 4.0 (either core or compatibility profile) is required.

    OpenGL 4.3 or ARB_program_interface_query is required.

    This extension is written against the OpenGL 4.2 (Compatibility Profile)
    Specification.

    This extension interacts with OpenGL 4.3 and ARB_compute_shader.

    This extension interacts with OpenGL 4.3 and ARB_program_interface_query.

    This extension interacts with NV_bindless_texture.

Overview

    This extension provides the ability for OpenGL shaders to perform random
    access reads, writes, and atomic memory operations on variables stored in
    a buffer object. Application shader code can declare sets of variables
    (referred to as "buffer variables") arranged into interface blocks in a
    manner similar to that done with uniform blocks in OpenGL 3.1. In both
    cases, the values of the variables declared in a given interface block
    are taken from a buffer object bound to a binding point associated with
    the block. Buffer objects used in this extension are referred to as
    "shader storage buffers".

    While the capability provided by this extension is similar to that
    provided by OpenGL 3.1 and ARB_uniform_buffer_object, there are several
    significant differences. Most importantly, shader code is allowed to
    write to shader storage buffers, while uniform buffers are always
    read-only. Shader storage buffers have a separate set of binding points,
    with different counts and size limits.
    The maximum usable size for shader storage buffers is
    implementation-dependent, but its minimum value is substantially larger
    than the minimum for uniform buffers.

    The ability to write to buffer objects creates the potential for multiple
    independent shader invocations to read and write the same underlying
    memory. The same issue exists with the ARB_shader_image_load_store
    extension provided in OpenGL 4.2, which can write to texture objects and
    buffers. In both cases, the specification makes few guarantees related to
    the relative order of memory reads and writes performed by the shader
    invocations. For ARB_shader_image_load_store, the OpenGL API and shading
    language do provide some control over memory transactions; those
    mechanisms also affect reads and writes of shader storage buffers. In the
    OpenGL API, the glMemoryBarrier() call can be used to ensure that certain
    memory operations related to commands issued prior to the barrier
    complete before other operations related to commands issued after the
    barrier. Additionally, the shading language provides the memoryBarrier()
    function to control the relative order of memory accesses within
    individual shader invocations and provides various memory qualifiers
    controlling how the memory corresponding to individual variables is
    accessed.

New Procedures and Functions

    void ShaderStorageBlockBinding(uint program, uint storageBlockIndex,
                                   uint storageBlockBinding);

New Tokens

    Accepted by the <target> parameters of BindBuffer, BufferData,
    BufferSubData, MapBuffer, UnmapBuffer, GetBufferSubData, and
    GetBufferPointerv:

        SHADER_STORAGE_BUFFER                           0x90D2

    Accepted by the <pname> parameter of GetIntegerv, GetIntegeri_v,
    GetBooleanv, GetInteger64v, GetFloatv, GetDoublev, GetBooleani_v,
    GetFloati_v, GetDoublei_v, and GetInteger64i_v:

        SHADER_STORAGE_BUFFER_BINDING                   0x90D3

    Accepted by the <pname> parameter of GetIntegeri_v, GetBooleani_v,
    GetFloati_v, GetDoublei_v, and GetInteger64i_v:

        SHADER_STORAGE_BUFFER_START                     0x90D4
        SHADER_STORAGE_BUFFER_SIZE                      0x90D5

    Accepted by the <pname> parameter of GetIntegerv, GetBooleanv,
    GetInteger64v, GetFloatv, and GetDoublev:

        MAX_VERTEX_SHADER_STORAGE_BLOCKS                0x90D6
        MAX_GEOMETRY_SHADER_STORAGE_BLOCKS              0x90D7
        MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS          0x90D8
        MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS       0x90D9
        MAX_FRAGMENT_SHADER_STORAGE_BLOCKS              0x90DA
        MAX_COMPUTE_SHADER_STORAGE_BLOCKS               0x90DB
        MAX_COMBINED_SHADER_STORAGE_BLOCKS              0x90DC
        MAX_SHADER_STORAGE_BUFFER_BINDINGS              0x90DD
        MAX_SHADER_STORAGE_BLOCK_SIZE                   0x90DE
        SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT          0x90DF

    Accepted in the <barriers> bitfield in glMemoryBarrier:

        SHADER_STORAGE_BARRIER_BIT                      0x2000

    Also, add a new alias for the existing token
    MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS:

        MAX_COMBINED_SHADER_OUTPUT_RESOURCES            0x8F39 (alias)

Additions to Chapter 2 of the OpenGL 4.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.9, Buffer Objects, p. 56

    (Add to Table 2.9, p. 57)

      Target Name             Purpose              Described in section(s)
      ---------------------   ------------------   -----------------------
      SHADER_STORAGE_BUFFER   read-write storage   2.14.X
                              for shaders

    (modify next-to-last paragraph, p. 58) <target> must be one of
    ATOMIC_COUNTER_BUFFER, SHADER_STORAGE_BUFFER, TRANSFORM_FEEDBACK_BUFFER,
    UNIFORM_BUFFER. ...

    Modify Section 2.14.7, Uniform Variables, p. 113

    (modify Table 2.16, pp. 122-125) Add a new column labeled "Buffer".
    Include dots for all the types on p. 122 (including BOOL types not
    supported for "Attrib" and "Xfb"). Add dots for the "DOUBLE_MAT*" rows on
    p. 123. Add no dots for any image or sampler types. In the description of
    the table (p. 125), add a new sentence:

    Types whose "Buffer" column is marked may be declared as buffer variables
    (see section 2.14.X).

    Modify unnumbered "Standard Uniform Block Layout" section, p. 132

    (insert a new paragraph at the end of the section, at the bottom of
    p. 133)

    Shader storage blocks (section 2.14.X) also support the "std140" layout
    qualifier, as well as a "std430" layout qualifier not supported for
    uniform blocks. When using the "std430" storage layout, shader storage
    blocks will be laid out in buffer storage identically to uniform and
    shader storage blocks using the "std140" layout, except that the base
    alignment of arrays of scalars and vectors in rule (4) and of structures
    in rule (9) are not rounded up to a multiple of the base alignment of a
    vec4.

    Add new section immediately before Section 2.14.8, Subroutine Uniform
    Variables (p. 135)

    2.14.X, Shader Buffer Variables

    Shaders can declare named /buffer variables/, as described in the OpenGL
    Shading Language Specification. Sets of buffer variables are grouped into
    interface blocks called /shader storage blocks/. The values of each
    buffer variable in a shader storage block are read from or written to the
    data store of a buffer object bound to the binding point associated with
    the block. The values of active buffer variables may be changed by
    executing shaders that assign values to them or perform atomic memory
    operations on them, by modifying the contents of the bound buffer
    object's data store with commands such as BufferSubData, by binding a new
    buffer object to the binding point associated with the block, or by
    changing the binding point associated with the block.
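
    The difference between "std140" and "std430" noted above for rule (4) can
    be illustrated with a small calculation. The following is only a sketch:
    the helper names round_up and array_stride are hypothetical and not part
    of any API, and the sketch covers arrays of scalars and vectors only, not
    the structure case of rule (9). Element base alignments are taken to be
    4 for float, 8 for vec2, and 16 for vec3 and vec4.

```c
#include <assert.h>
#include <stddef.h>

/* Base alignment of a vec4 in basic machine units (bytes). */
#define VEC4_BASE_ALIGNMENT 16u

/* Hypothetical helper: round value up to the next multiple of align. */
static size_t round_up(size_t value, size_t align)
{
    assert(align != 0);
    return (value + align - 1) / align * align;
}

/* Hypothetical helper: stride of an array of scalars or vectors whose
 * elements have base alignment elem_align.  Under "std140" the stride is
 * rounded up to the base alignment of a vec4; under "std430" it is not. */
static size_t array_stride(size_t elem_align, int use_std430)
{
    return use_std430 ? elem_align
                      : round_up(elem_align, VEC4_BASE_ALIGNMENT);
}
```

    Under these assumptions a float array is packed four times more tightly
    with "std430" than with "std140", while vec3 and vec4 arrays have the
    same stride in both layouts.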

    Buffer variables in shader storage blocks are represented in memory in
    the same way as uniforms stored in uniform blocks, as described in the
    "Uniform Buffer Object Storage" subsection of Section 2.14.7. When a
    program is linked successfully, each active buffer variable is assigned
    an offset relative to the base of the buffer object binding associated
    with its shader storage block. For buffer variables declared as arrays
    and matrices, strides between array elements or matrix columns or rows
    will also be assigned. Offsets and strides of buffer variables will be
    assigned in an implementation-dependent manner unless the shader storage
    block is declared using the "std140" or "std430" storage layout
    qualifiers. For "std140" and "std430" shader storage blocks, offsets will
    be assigned using the method described in the "Standard Uniform Block
    Layout" subsection of Section 2.14.7. If a program is re-linked, existing
    buffer variable offsets and strides are invalidated, and a new set of
    active variables, offsets, and strides will be generated.

    The total amount of buffer object storage that can be accessed in any
    shader storage block is subject to an implementation-dependent limit. The
    maximum amount of available space, in basic machine units, can be queried
    by calling GetIntegerv with the constant MAX_SHADER_STORAGE_BLOCK_SIZE.
    If the amount of storage required for any shader storage block exceeds
    this limit, a program will fail to link.

    If the number of active shader storage blocks referenced by the shaders
    in a program exceeds implementation-dependent limits, the program will
    fail to link.
    The limits for vertex, tessellation control, tessellation evaluation,
    geometry, fragment, and compute shaders can be obtained by calling
    GetIntegerv with <pname> values of MAX_VERTEX_SHADER_STORAGE_BLOCKS,
    MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS,
    MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS,
    MAX_GEOMETRY_SHADER_STORAGE_BLOCKS, MAX_FRAGMENT_SHADER_STORAGE_BLOCKS,
    and MAX_COMPUTE_SHADER_STORAGE_BLOCKS, respectively. Additionally, a
    program will fail to link if the sum of the number of active shader
    storage blocks referenced by each shader stage in a program exceeds the
    value of the implementation-dependent limit
    MAX_COMBINED_SHADER_STORAGE_BLOCKS. If a shader storage block in a
    program is referenced by multiple shaders, each such reference counts
    separately against this combined limit.

    When a named shader storage block is declared by multiple shaders in a
    program, it must be declared identically in each shader. The buffer
    variables within the block must be declared with the same names, types,
    qualification, and declaration order. If a program contains multiple
    shaders with different declarations for the same named shader storage
    block, the program will fail to link.

    Regions of buffer objects are bound as storage for shader storage blocks
    by calling one of the commands BindBufferRange or BindBufferBase (see
    section 2.9.1) with <target> set to SHADER_STORAGE_BUFFER. In addition to
    the general errors described in section 2.9.1, BindBufferRange will
    generate an INVALID_VALUE error if <index> is greater than or equal to
    the value of MAX_SHADER_STORAGE_BUFFER_BINDINGS, or if <offset> is not a
    multiple of the implementation-dependent alignment requirement (the value
    of SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT).

    Each of a program's active shader storage blocks has a corresponding
    shader storage buffer object binding point.
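
    The two SHADER_STORAGE_BUFFER-specific BindBufferRange checks described
    above can be mirrored on the application side before the call is made.
    The following is a sketch only: the SsboLimits type and the
    ssbo_range_args_ok function are hypothetical names, the limit values
    would in practice come from GetIntegerv, and the general errors of
    section 2.9.1 are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for two limits an application would query with
 * GetIntegerv; it is not part of the OpenGL API. */
typedef struct {
    uint32_t max_bindings;     /* MAX_SHADER_STORAGE_BUFFER_BINDINGS */
    uint32_t offset_alignment; /* SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT */
} SsboLimits;

/* Returns true when index and offset pass both checks; a false result
 * corresponds to the INVALID_VALUE cases described in the text. */
static bool ssbo_range_args_ok(const SsboLimits *limits,
                               uint32_t index, uint64_t offset)
{
    assert(limits->offset_alignment != 0);
    if (index >= limits->max_bindings) {
        return false; /* index is not a valid binding point */
    }
    if (offset % limits->offset_alignment != 0) {
        return false; /* offset is not suitably aligned */
    }
    return true;
}
```

    An application suballocating several regions from one buffer object
    could use such a check when choosing region offsets, rounding each one
    up to the queried alignment.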
    When a program object is linked, the shader storage buffer object binding
    point assigned to each of its active shader storage blocks is reset to
    the value specified by the corresponding "binding" layout qualifier, if
    present, or zero otherwise. After a program is linked, the command

        void ShaderStorageBlockBinding(uint program, uint storageBlockIndex,
                                       uint storageBlockBinding);

    changes the binding point of the active shader storage block with an
    assigned index of <storageBlockIndex> in program object <program>. The
    error INVALID_VALUE is generated if <storageBlockIndex> is not an active
    shader storage block index in <program>, or if <storageBlockBinding> is
    greater than or equal to the value of MAX_SHADER_STORAGE_BUFFER_BINDINGS.
    If successful, ShaderStorageBlockBinding specifies that <program> will
    use the data store of the buffer object bound to the binding point
    <storageBlockBinding> to read and write the values of the buffer
    variables in the shader storage block identified by <storageBlockIndex>.

    When executing shaders that access shader storage blocks, the binding
    point corresponding to each active shader storage block must be populated
    with a buffer object with a size no smaller than the minimum required
    size of the shader storage block (the value of BUFFER_SIZE for the
    appropriate SHADER_STORAGE_BUFFER resource). For binding points populated
    by BindBufferRange, the size in question is the value of the <size>
    parameter or the size of the buffer minus the value of the <offset>
    parameter, whichever is smaller. If any active shader storage block is
    not backed by a sufficiently large buffer object, the results of shader
    execution are undefined, and may result in GL interruption or
    termination.

    Shaders may be executed to process the primitives and vertices specified
    between Begin and End, or by vertex array commands (see section 2.8).
    Shaders may also be executed as a result of DrawPixels, Bitmap, or
    RasterPos* commands.

    Modify Section 2.14.12, Shader Execution (p. 145)

    (add new sub-section before "Shader Inputs", p. 151)

    Shader Storage Buffer Access

    Shaders have the ability to read and write to buffer memory via buffer
    variables in shader storage blocks. The maximum numbers of shader storage
    blocks available to shaders are the values of the
    implementation-dependent constants

      * MAX_VERTEX_SHADER_STORAGE_BLOCKS (for vertex shaders),

      * MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS (for tessellation control
        shaders),

      * MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS (for tessellation
        evaluation shaders),

      * MAX_GEOMETRY_SHADER_STORAGE_BLOCKS (for geometry shaders),

      * MAX_FRAGMENT_SHADER_STORAGE_BLOCKS (for fragment shaders), and

      * MAX_COMPUTE_SHADER_STORAGE_BLOCKS (for compute shaders).

    All active shaders combined cannot use more than the value of
    MAX_COMBINED_SHADER_STORAGE_BLOCKS shader storage blocks. If more than
    one pipeline stage accesses the same shader storage block, each such
    access counts separately against this combined limit.

    (add to the list of bullets in the "Validation" section on p. 153)

      * The sum of the number of active shader storage blocks used by the
        current program objects exceeds the combined limit on the number of
        active shader storage blocks (MAX_COMBINED_SHADER_STORAGE_BLOCKS).

    Modify Section 2.14.13, Shader Memory Access (p. 153)

    (modify last paragraph, p. 153)

    Shaders may perform random-access reads and writes to texture or buffer
    object memory by using built-in image load, store, and atomic functions
    operating on shader image variables, or by reading from, assigning to, or
    performing atomic memory operations on shader buffer variables, as
    described in the OpenGL Shading Language Specification. The ability to
    perform such random-access reads and writes in systems that may be highly
    pipelined results in ordering and synchronization issues discussed in the
    sections below.

    (add to list of MemoryBarrier bullets, p. 158)

      * SHADER_STORAGE_BARRIER_BIT: Memory accesses using shader buffer
        variables issued after the barrier will reflect data written by
        shaders prior to the barrier.
        Additionally, assignments to and atomic operations performed on
        shader buffer variables after the barrier will not execute until all
        memory accesses (e.g., loads, stores, texture fetches, vertex
        fetches) initiated prior to the barrier complete.

Additions to Chapter 3 of the OpenGL 4.2 (Compatibility Profile) Specification
(Rasterization)

    Modify Section 3.10.22, Texture Image Loads and Stores (p. 358)

    (modify first paragraph, p. 367)

    Implementations may support a limited combined number of image units,
    shader storage blocks, and active fragment shader outputs (see section
    4.2.1). A link error will be generated if the sum of the number of active
    image uniforms used in all shaders, the number of active shader storage
    blocks, and the number of active fragment shader outputs exceeds the
    implementation-dependent value of MAX_COMBINED_SHADER_OUTPUT_RESOURCES.

Additions to Chapter 4 of the OpenGL 4.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 4.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 4.2 (Compatibility Profile) Specification
(State and State Requests)

    Modify Section 6.1.15, Buffer Object Queries (p. 490)

    (add to end of section)

    To query which buffer objects are bound to the array of shader storage
    buffer binding points and will be used as the storage for active shader
    storage blocks, call GetIntegeri_v with <pname> set to
    SHADER_STORAGE_BUFFER_BINDING. <index> must be in the range zero to the
    value of MAX_SHADER_STORAGE_BUFFER_BINDINGS-1. The name of the buffer
    object bound to <index> is returned. If no buffer object is bound for
    <index>, zero is returned.

    To query the starting offset or size of the range of each buffer object
    binding used for shader storage buffers, call GetInteger64i_v with
    <pname> set to SHADER_STORAGE_BUFFER_START or SHADER_STORAGE_BUFFER_SIZE
    respectively.
    <index> must be in the range zero to the value of
    MAX_SHADER_STORAGE_BUFFER_BINDINGS-1. If the parameter (starting offset
    or size) was not specified when the buffer object was bound (e.g. if
    bound with BindBufferBase), or if no buffer object is bound to <index>,
    zero is returned.

Additions to Appendix A of the OpenGL 4.2 (Compatibility Profile)
Specification (Invariance)

    Modify Section A.1, Repeatability (p. 583)

    (modify last sentence of the first paragraph, p. 583)

    ... This repeatability requirement doesn't apply when using shaders
    containing side effects (image stores, image atomic operations, atomic
    counter operations, buffer variable stores, buffer variable atomic
    operations), because these memory operations are not guaranteed to be
    processed in a defined order.

    Modify Section A.3, Invariance (p. 584)

    (modify first sentence of the paragraph after Rule 5, p. 586)

    If a sequence of GL commands specifies primitives to be rendered with
    shaders containing side effects (image stores, image atomic operations,
    atomic counter operations, buffer variable stores, buffer variable atomic
    operations), invariance rules are relaxed. ...

    (modify first paragraph, p. 587)

    When any sequence of GL commands triggers shader invocations that perform
    image stores, image atomic operations, atomic counter operations, buffer
    variable stores, or buffer variable atomic operations and subsequent GL
    commands read the memory written by those shader invocations, these
    operations must be explicitly synchronized. For more details, see Section
    2.14.13, Shader Memory Access.

Additions to Appendix D of the OpenGL 4.2 (Compatibility Profile)
Specification (Shared Objects and Multiple Contexts)

    Modify Section D.3, Propagating State Changes, p. 611

    (modify second bullet, p. 612)

      * Rendering commands that trigger shader invocations, where the shader
        performs image stores, image atomic operations, atomic counter
        operations, buffer variable stores, or buffer variable atomic
        operations.

Additions to the OpenGL Shading Language 4.20 Specification

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_ARB_shader_storage_buffer_object : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_ARB_shader_storage_buffer_object 1

    Modify Section 3.6, Keywords (p. 15)

    (add to list of keywords)

      buffer

    Modify Section 4.1.9, Arrays (p. 29)

    (modify first paragraph of the section, p. 29, adding an exception
    allowing general indexing of the last array of a shader storage block)

    ... Except for the last declared member of a shader storage block
    (section 4.3.X), the size of an array must be declared before it is
    indexed with anything other than an integral constant expression. The
    size of an array must be declared before passing it as an argument to a
    function. ...

    (modify last paragraph, p. 30)

    ... This returns a type int. If an array has been explicitly sized, the
    value returned by the length method is a constant expression. If an array
    has not been explicitly sized and is not the last declared member of a
    shader storage block, the value returned by the length method is not a
    constant expression and will be determined when a program is linked. If
    an array has not been explicitly sized and is the last declared member of
    a shader storage block, the value returned will not be a constant
    expression and will be determined at run time based on the size of the
    buffer object providing storage for the block. For such arrays, the value
    returned by the length method will be undefined if the array is contained
    in an array of shader storage blocks that is indexed with a non-constant
    expression less than zero or greater than or equal to the number of
    blocks in the array.

    (add a new paragraph to end of the section, at the bottom of p. 30)

    In a shader storage block, the last member may be declared without an
    explicit size.
    In this case, the effective array size is inferred at run-time from the
    size of the data store backing the interface block. Such unsized arrays
    may be indexed with general integer expressions, but may not be passed as
    an argument to a function or indexed with a negative constant expression.

    Modify Section 4.3, Storage Qualifiers (p. 36)

      Storage Qualifier   Meaning
      -----------------   ---------------------------------------------------
      buffer              value is stored in a buffer object, and can be read
                          or written by shader invocations and the OpenGL API

    Modify Section 4.3.3, Constant Expressions (p. 38)

    (modify first bullet, p. 39, clarifying that the length() method only
    produces constant expressions on explicitly sized objects, since we now
    allow it on implicitly sized or unsized arrays)

      * valid use of the length() method on an explicitly sized object,
        whether or not the object itself is constant (implicitly sized or
        unsized arrays do not return a constant expression)

    Insert after Section 4.3.5, Uniform (p. 40)

    4.3.X, Buffer Variables

    The "buffer" qualifier is used to declare global variables whose values
    are stored in the data store of a buffer object bound through the OpenGL
    API. Buffer variables can be read and written, with the underlying
    storage shared among all active shader invocations. Buffer variable
    memory reads and writes within a single shader invocation are processed
    in order. However, the order of reads and writes performed in one
    invocation relative to those performed by another invocation is largely
    undefined. Buffer variables may be qualified with memory qualifiers
    affecting how the underlying memory is accessed, as described in Section
    4.10.

    The "buffer" qualifier can be used with any of the basic data types, or
    when declaring a variable whose type is a structure, or an array of any
    of these. Buffer variables may only be declared inside interface blocks
    (Section 4.3.7), which are referred to as shader storage blocks.
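
    Section 4.1.9 above says only that the length of an unsized last member
    is inferred at run time from the size of the backing data store; it does
    not give a formula. One plausible computation, shown here purely as an
    assumption, divides the storage remaining after the array's offset by
    the array stride and rounds down. The function name
    unsized_array_length and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the run-time length of an unsized array that is
 * the last member of a shader storage block.
 *   bound_size   - size of the buffer range backing the block
 *   array_offset - offset of the array within the block
 *   array_stride - stride between consecutive array elements
 * The formula is an assumption made for illustration, not spec text. */
static size_t unsized_array_length(size_t bound_size,
                                   size_t array_offset,
                                   size_t array_stride)
{
    assert(array_stride != 0);
    if (bound_size <= array_offset) {
        return 0; /* no storage is left for any array element */
    }
    return (bound_size - array_offset) / array_stride;
}
```

    Under this assumption, a 1024-byte range backing a block whose unsized
    array starts at offset 16 with a stride of 16 would hold 63 elements.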
    It is illegal to declare buffer variables at global scope (outside a
    block). Buffer variables cannot have initializers.

    There are implementation-dependent limits on the number of shader storage
    blocks used for each type of shader, the combined number of shader
    storage blocks used for a program, and the amount of storage required by
    each individual shader storage block. If any of these limits is exceeded,
    it will cause a compile-time or link-time error.

    If multiple shaders are linked together, then they will share a single
    global buffer variable name space, including within a language as well as
    across languages. Hence, the types of buffer variables with the same name
    must match across all shaders that are linked into a single program.

    Modify Section 4.3.7, Interface Blocks (p. 43)

    (modify first paragraph)

    Input, output, uniform, and buffer variable declarations can be grouped
    into named interface blocks ... A uniform block is backed by the
    application with a buffer object. A block of buffer variables, called a
    shader storage block, is also backed by the application with a buffer
    object. ...

    (modify second paragraph)

    An interface block is started by an in, out, uniform, or buffer keyword,
    followed by ...

    (add "buffer" to the grammar rules)

      interface-qualifier:
          in
          out
          uniform
          buffer

    (modify first paragraph, p. 44)

    Types and declarators are the same as for other input, output, uniform,
    and buffer variable declarations...

    (modify third paragraph, p. 44)

    If no optional qualifier is used in a member-declaration, the
    qualification of the variable is just in, out, uniform, or buffer as
    determined by <interface-qualifier>. ... Input variables, output
    variables, uniform variables, and buffer variables can only be in in
    blocks, out blocks, uniform blocks, and shader storage blocks,
    respectively. Repeating the "in", "out", "uniform", or "buffer" interface
    qualifier for a member's storage qualifier is optional. ...

    (modify fourth paragraph, p. 44)

    For this section, define an interface to be one of these:

      * All the uniforms of a program. This spans all compilation units
        linked together within one program.

      * All the buffer variables of a program.

      * The boundary between adjacent programmable pipeline stages: ...

    (modify next-to-last paragraph, p. 45)

    For uniform or shader storage blocks declared as an array, each
    individual array element corresponds to a separate buffer object bind
    range, backing one instance of the block. As the array size indicates the
    number of buffer objects needed, uniform and shader storage block array
    declarations must specify an array size. A uniform or shader storage
    block array can only be indexed with a dynamically uniform integral
    expression, otherwise results are undefined.

    (modify last paragraph of the section, p. 46)

    There are implementation-dependent limits on the number of uniform blocks
    and the number of shader storage blocks that can be used per stage. If
    either limit is exceeded, it will cause a link error.

    Modify Section 4.4.1.2, Geometry Shader Inputs (p. 49)

    (modify example at the top of p. 51, since it's now legal to take the
    length of implicitly sized arrays)

      // code sequence within one shader...
      in vec4 Color1[];     // legal, size still unknown
      in vec4 Color2[2];    // legal, size is 2
      in vec4 Color3[3];    // illegal, input sizes are inconsistent
      layout(lines) in;     // legal for Color2, input size is 2,
                            // matching Color2
      in vec4 Color4[3];    // illegal, contradicts layout of lines
      layout(lines) in;     // legal, matches other layout() declaration
      layout(triangles) in; // illegal, does not match earlier layout()
                            // declaration

    Modify Section 4.4.3, Uniform Block Layout Qualifiers (p. 57). Rename
    section title to "Uniform and Shader Storage Block Layout Qualifiers".

    (modify first paragraph)

    Layout qualifiers can be used for uniform and shader storage blocks, but
    not for non-block uniform declarations.
    The layout qualifier identifiers for uniform and shader storage blocks
    are

      layout-qualifier-id
          shared
          packed
          std140
          std430
          row_major
          column_major
          binding = integer-constant

    (modify last paragraph, p. 57)

    Uniform and shader storage block layout qualifiers can be declared for
    global scope, on a single uniform or shader storage block, or on a single
    block member declaration.

    (modify first paragraph, p. 58)

    Default layouts are established (except for binding) at global scope for
    uniform blocks as

      layout(layout-qualifier-id-list) uniform;

    and for shader storage blocks as

      layout(layout-qualifier-id-list) buffer;

    ... The result becomes the new default qualification scoped to subsequent
    uniform or shader storage block definitions.

    (modify third paragraph, p. 58)

    The initial state of compilation is as if the following were declared:

      layout(shared, column_major) uniform;
      layout(shared, column_major) buffer;

    (modify fourth paragraph, p. 58)

    Uniform and shader storage blocks can be declared with optional layout
    qualifiers, and so can their individual member declarations. Such block
    layout qualification is scoped only to the content of the block. As with
    global layout declarations, block layout qualification first inherits
    from the current default qualification and then overrides it. Similarly,
    individual member layout qualification is scoped just to the member
    declaration, and inherits from and overrides the block's qualification.

    (modify the fifth paragraph, p. 58)

    The shared qualifier overrides only the std140, std430, and packed
    qualifiers; other qualifiers are inherited. The compiler/linker will
    ensure that multiple programs and programmable stages containing this
    definition will share the same memory layout for this block, as long as
    all arrays are declared with explicit sizes and all matrices have
    matching row_major and/or column_major qualifications (which may come
    from a declaration outside the block definition). ...

    (modify sixth paragraph, p. 58)

    The packed qualifier overrides only std140, std430, and shared; other
    qualifiers are inherited. ... Attempts to share a packed uniform or
    shader storage block across programs or stages will generally fail. ...

    (modify seventh paragraph, p. 58)

    The std140 and std430 qualifiers override only the packed, shared,
    std140, and std430 qualifiers; other qualifiers are inherited. The std430
    qualifier is supported only for shader storage blocks; a shader using the
    std430 qualifier on a uniform block will fail to compile. ...

    (modify eighth paragraph, p. 58)

    Layout qualifiers on member declarations cannot use the shared, packed,
    std140, or std430 qualifiers. ...

    (modify last paragraph, p. 58)

    The <binding> identifier specifies the buffer binding point corresponding
    to the uniform or shader storage block, which will be used to obtain the
    values of the member variables of the block. It is an error to specify
    the binding identifier for the global scope or for block member
    declarations. Any uniform or shader storage block declared without a
    binding identifier is initially assigned to block binding point zero.
    After a program is linked, the binding points used for uniform and shader
    storage blocks declared with or without a binding identifier can be
    updated by the OpenGL API.

    (modify second paragraph, p. 59)

    If the <binding> identifier is used with a uniform or shader storage
    block instanced as an array then the first element of the array takes the
    specified block binding and each subsequent element takes the next
    consecutive block binding point.

    (modify third paragraph, p. 59)

    If the binding point for any uniform or shader storage block instance is
    less than zero or greater than or equal to the implementation-dependent
    maximum number of bindings for the block type (uniform or shader
    storage), a compilation error will occur. When the binding identifier is
    used with a uniform or shader storage block instanced as an array of size
    <N>, all elements of the array from <binding> through <binding>+<N>-1
    must be within this range.
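
    The binding rules for blocks instanced as arrays can be restated as a
    short calculation. This is only a sketch: element_binding and
    block_array_bindings_ok are hypothetical helper names, and max_bindings
    stands for the implementation-dependent maximum number of bindings for
    the block type, which a real application would query.

```c
#include <assert.h>
#include <stdbool.h>

/* Binding point taken by element i of a block array whose declaration
 * uses layout(binding = first_binding): consecutive points from the first. */
static int element_binding(int first_binding, int i)
{
    return first_binding + i;
}

/* True when every element of a block array with count instances lands on
 * a binding point in the range [0, max_bindings). */
static bool block_array_bindings_ok(int first_binding, int count,
                                    int max_bindings)
{
    assert(count >= 1);
    return first_binding >= 0 &&
           count <= max_bindings &&
           first_binding <= max_bindings - count;
}
```

    For instance, with a hypothetical limit of 8 bindings, a four-element
    block array declared at binding 4 occupies points 4 through 7 and is
    accepted, while the same array declared at binding 5 would not fit.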

    Modify Section 4.10, Memory Qualifiers (p. 71)

    (modify first paragraph of section, p. 71, removing the "Only" from
    "Only variables")

    Variables declared as image types (the basic opaque types with "image" in
    their keyword) can be qualified with a memory qualifier.

    (add to the end of the third paragraph, p. 73)

    ... It is an error to qualify an image variable with both "readonly" and
    "writeonly".

    (insert after third paragraph, p. 73)

    The memory qualifiers "coherent", "volatile", "restrict", "readonly", and
    "writeonly" may be used in the declaration of buffer variables (i.e.,
    members of shader storage blocks). When a buffer variable is declared
    with a memory qualifier, the behavior specified for memory accesses
    involving image variables described above applies identically to memory
    accesses involving that buffer variable. It is an error to assign to a
    buffer variable qualified with "readonly" or to read from a buffer
    variable qualified with "writeonly".

    Additionally, memory qualifiers may also be used in the declaration of
    shader storage blocks. When a block declaration is qualified with a
    memory qualifier, it is as if all of its members were declared with the
    same memory qualifier. For example, the block declaration

      coherent buffer Block {
        readonly vec4 member1;
        vec4 member2;
      };

    is equivalent to

      buffer Block {
        coherent readonly vec4 member1;
        coherent vec4 member2;
      };

    Memory qualifiers are only supported in the declarations of image
    variables, buffer variables, and shader storage blocks; it is an error to
    use such qualifiers in any other declaration.

    Modify Section 5.5, Vector and Scalar Components and Length, p. 79

    (modify last paragraph of section, p. 81)

    ... The type returned by .length() on a vector is int, and the value
    returned is considered a constant expression.

    Modify Section 5.6, Matrix Components, p. 81

    (modify last paragraph of section, p. 81)

    ... The type returned by .length() on a matrix is int, and the value
    returned is considered a constant expression.
Modify Section 5.9, Expressions, p. 83

(insert after 4th bullet of section, p. 83, correcting the oversight that .length() can also be used on vectors and matrices)

  * an expression of vector or matrix type with the length method applied

Insert new section after Section 8.10, Atomic Counter Functions (p. 149)

8.X Atomic Memory Functions

Atomic memory functions perform atomic operations on an individual signed or unsigned integer found in buffer object or shared variable storage. All of the atomic memory operations read a value from memory, compute a new value using one of the operations described below, write the new value to memory, and return the original value read. The contents of the memory being updated by the atomic operation are guaranteed not to be modified by any other assignment or atomic memory function in any shader invocation between the time the original value is read and the time the new value is written.

Atomic memory functions are supported only for a limited set of variables. A shader will fail to compile if the value passed to the <mem> argument of an atomic memory function does not correspond to a buffer or shared variable. It is acceptable to pass an element of an array or a single component of a vector to the <mem> argument of an atomic memory function, as long as the underlying array or vector is a buffer or shared variable.

Functions:

    uint atomicAdd(inout uint mem, uint data);
    int atomicAdd(inout int mem, int data);

      Computes a new value by adding the value of <data> to the contents of <mem>.

    uint atomicMin(inout uint mem, uint data);
    int atomicMin(inout int mem, int data);

      Computes a new value by taking the minimum of the value of <data> and the contents of <mem>.

    uint atomicMax(inout uint mem, uint data);
    int atomicMax(inout int mem, int data);

      Computes a new value by taking the maximum of the value of <data> and the contents of <mem>.
    uint atomicAnd(inout uint mem, uint data);
    int atomicAnd(inout int mem, int data);

      Computes a new value by performing a bit-wise and of the value of <data> and the contents of <mem>.

    uint atomicOr(inout uint mem, uint data);
    int atomicOr(inout int mem, int data);

      Computes a new value by performing a bit-wise or of the value of <data> and the contents of <mem>.

    uint atomicXor(inout uint mem, uint data);
    int atomicXor(inout int mem, int data);

      Computes a new value by performing a bit-wise exclusive or of the value of <data> and the contents of <mem>.

    uint atomicExchange(inout uint mem, uint data);
    int atomicExchange(inout int mem, int data);

      Computes a new value by simply copying the value of <data>.

    uint atomicCompSwap(inout uint mem, uint compare, uint data);
    int atomicCompSwap(inout int mem, int compare, int data);

      Compares the value of <compare> and the contents of <mem>. If the values are equal, the new value is given by <data>; otherwise, it is taken from the original contents of <mem>.

Additions to the AGL/EGL/GLX/WGL Specifications

None

GLX Protocol

TBD

Dependencies on OpenGL 4.3 and ARB_compute_shader:

If OpenGL 4.3 and ARB_compute_shader are not supported, any references to uses of shader storage blocks in compute shaders, as well as the enumerant MAX_COMPUTE_SHADER_STORAGE_BLOCKS, should be removed.

Additionally, this extension provides GLSL atomic memory functions that can be used with buffer variables (from this extension) and shared variables (from ARB_compute_shader). If ARB_compute_shader is not supported, references to shared variables should be removed from the language describing these functions. Note that no "#extension" directive is necessary to use atomic memory functions on shared variables in compute shaders.

Dependencies on OpenGL 4.3 and ARB_program_interface_query

If OpenGL 4.3 and ARB_program_interface_query are not supported, it wouldn't be possible to use GLSL query APIs to enumerate active buffer variables and shader storage blocks used by a program.
We require that OpenGL 4.3 or ARB_program_interface_query be supported; this shouldn't be a problem for any implementations of this extension.

Dependencies on NV_bindless_texture

If NV_bindless_texture is supported (and enabled via the #extension directive), the restriction that image and sampler variables must be uniform variables not in blocks is lifted. In this case, image and sampler variables may be members in shader storage blocks.

If an image variable is declared as a member of a shader storage block, the memory qualifiers on such variable declarations apply to the memory holding the block member and *not* the memory referenced by the image. If it is necessary to apply a memory qualifier to the memory referenced by an image variable found inside a shader storage block, it's possible to embed the image variable declaration in a structure and then embed the structure in a block. In the following example:

    struct S {
      readonly image2D x;
    };
    buffer Block {
      S m;
    };

"readonly" is considered to apply to the memory pointed to by the image variable <x>. In this example:

    buffer Block {
      readonly image2D m;
    };

"readonly" is considered to apply to the memory holding the image handle. It would be illegal to write to <m>, but it would be legal to write to the texture memory pointed to by <m> (i.e., you can pass <m> to imageStore).

Errors

INVALID_VALUE is generated by BindBufferRange if <target> is SHADER_STORAGE_BUFFER and <index> is greater than or equal to the value of MAX_SHADER_STORAGE_BUFFER_BINDINGS.

INVALID_VALUE is generated by BindBufferRange if <target> is SHADER_STORAGE_BUFFER and <offset> is not a multiple of the value of SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT.

INVALID_VALUE is generated by ShaderStorageBlockBinding if <storageBlockIndex> is not an active shader storage block index of <program>.

INVALID_VALUE is generated by ShaderStorageBlockBinding if <storageBlockBinding> is greater than or equal to the value of MAX_SHADER_STORAGE_BUFFER_BINDINGS.

New State

Add new table, labeled "Shader Storage Buffer State", after Table 6.58 (Atomic Counter State), p.
562:

Get Value                                  Type  Get Command      Initial Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
SHADER_STORAGE_BUFFER_BINDING              Z+    GetIntegerv      0              2.14.X   Current value of generic shader storage buffer binding
SHADER_STORAGE_BUFFER_BINDING              n*Z+  GetIntegeri_v    0              2.14.X   Buffer object bound to each shader storage buffer binding point
SHADER_STORAGE_BUFFER_START                n*Z+  GetInteger64i_v  0              2.14.X   Start offset of binding range for each shader storage buffer
SHADER_STORAGE_BUFFER_SIZE                 n*Z+  GetInteger64i_v  0              2.14.X   Size of binding range for each shader storage buffer

New Implementation Dependent State

Add to Table 6.66, Implementation Dependent Vertex Shader Limits, p. 570

Get Value                                  Type  Get Command      Minimum Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
MAX_VERTEX_SHADER_STORAGE_BLOCKS           Z+    GetIntegerv      0              2.14.X   Number of shader storage blocks accessed by a vertex shader

Add to Table 6.67, Implementation Dependent Tessellation Shader Limits, p. 571

Get Value                                  Type  Get Command      Minimum Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
MAX_TESS_CONTROL_SHADER_STORAGE_BLOCKS     Z+    GetIntegerv      0              2.14.X   Number of shader storage blocks accessed by a tess. control shader
MAX_TESS_EVALUATION_SHADER_STORAGE_BLOCKS  Z+    GetIntegerv      0              2.14.X   Number of shader storage blocks accessed by a tess. evaluation shader

Add to Table 6.68, Implementation Dependent Geometry Shader Limits, p. 572

Get Value                                  Type  Get Command      Minimum Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
MAX_GEOMETRY_SHADER_STORAGE_BLOCKS         Z+    GetIntegerv      0              2.14.X   Number of shader storage blocks accessed by a geometry shader

Add to Table 6.69, Implementation Dependent Fragment Shader Limits, p. 573

Get Value                                  Type  Get Command      Minimum Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
MAX_FRAGMENT_SHADER_STORAGE_BLOCKS         Z+    GetIntegerv      8              2.14.X   Number of shader storage blocks accessed by a fragment shader

Add to new table in ARB_compute_shader, Implementation Dependent Compute Shader Limits

Get Value                                  Type  Get Command      Minimum Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
MAX_COMPUTE_SHADER_STORAGE_BLOCKS          Z+    GetIntegerv      8              2.14.X   Number of shader storage blocks accessed by a compute shader

Add to Table 6.70, Implementation Dependent Aggregate Shader Limits, p. 574

Get Value                                  Type  Get Command      Minimum Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
MAX_COMBINED_SHADER_STORAGE_BLOCKS         Z+    GetIntegerv      8              2.14.X   Number of shader storage blocks accessed by a program
MAX_SHADER_STORAGE_BLOCK_SIZE              Z+    GetInteger64v    2^24           2.14.X   Maximum size in basic machine units of a shader storage block
SHADER_STORAGE_BUFFER_OFFSET_ALIGNMENT     Z+    GetIntegerv      256            2.14.X   Minimum required alignment for shader storage buffer binding offsets
MAX_SHADER_STORAGE_BUFFER_BINDINGS         Z+    GetIntegerv      8              2.14.X   Maximum number of shader storage buffer bindings in the context

Modify Table 6.71, Implementation Dependent Aggregate Shader Limits (cont.), p. 575

Get Value                                  Type  Get Command      Minimum Value  Sec.     Description
-----------------------------------------  ----  ---------------  -------------  -------  -----------
MAX_COMBINED_SHADER_OUTPUT_RESOURCES       Z+    GetIntegerv      8              3.10.22  limit on active image units, shader storage blocks, and fragment outputs

(The only change here is a rename of the token formerly called MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS.)

Sample Code

The following example code records a list of fragment (x,y) coordinates and colors in rasterized primitives into a buffer object.
Fragment shader code would include:

    #extension GL_ARB_shader_storage_buffer_object : require

    // Use an atomic counter to keep a running count of the number of
    // fragments recorded in the shader storage buffer.
    layout(binding=0, offset=0) uniform atomic_uint fragmentCounter;

    // Keep a uniform with the number of fragments that can be recorded in
    // the buffer.
    uniform uint maxFragmentCount;

    // Structure with the per-fragment information to record.
    struct FragmentData {
      ivec2 position;
      vec4 color;
    };

    // Shader storage block holding an array declared without
    // a fixed size. Application code should determine how many fragments
    // it wants to record and allocate a buffer appropriately. With the
    // "std140" layout, each FragmentData record will take 32B. With other
    // layouts, the stride of the array is implementation-dependent. The
    // "binding=2" layout qualifier says that the block should
    // be associated with shader storage buffer binding point #2.
    layout(std140, binding=2) buffer Fragments {
      FragmentData fragments[];
    };

    in vec4 color;

    void main()
    {
      uint fragmentNumber = atomicCounterIncrement(fragmentCounter);
      if (fragmentNumber < maxFragmentCount) {
        fragments[fragmentNumber].position = ivec2(gl_FragCoord.xy);
        fragments[fragmentNumber].color = color;
      }
    }

In application code

    #define NFRAGMENTS 100000
    #define FRAGMENT_SIZE 32    // known due to "std140" usage

    GLuint fragmentBuffer, counterBuffer;

    // Generate, bind, and specify the data store to hold fragments. The
    // NULL pointer in BufferData says that the initial buffer contents are
    // undefined. They will be filled in by the fragment shader code.
    glGenBuffers(1, &fragmentBuffer);
    glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 2, fragmentBuffer);
    glBufferData(GL_SHADER_STORAGE_BUFFER, NFRAGMENTS*FRAGMENT_SIZE,
                 NULL, GL_DYNAMIC_DRAW);

    // Generate, bind, and specify the data store for the atomic counter.
    glGenBuffers(1, &counterBuffer);
    glBindBufferBase(GL_ATOMIC_COUNTER_BUFFER, 0, counterBuffer);
    glBufferData(GL_ATOMIC_COUNTER_BUFFER, sizeof(GLuint), NULL,
                 GL_DYNAMIC_DRAW);

    // Reset the atomic counter to zero, then draw stuff. This will record
    // values into the shader storage buffer as fragments are generated.
    GLuint zero = 0;
    glBufferSubData(GL_ATOMIC_COUNTER_BUFFER, 0, sizeof(GLuint), &zero);
    glUseProgram(program);
    glDrawElements(GL_TRIANGLES, ...);

    // You could inspect the contents with a call such as:
    void *ptr = glMapBuffer(GL_SHADER_STORAGE_BUFFER, GL_READ_ONLY);
    ...
    glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);

    // You could also use the storage buffer contents for vertex pulling.
    // The glMemoryBarrier() command ensures that the data writes to the
    // storage buffer complete prior to vertex pulling.
    // Note: glVertexAttribIPointer takes no "normalized" argument.
    glMemoryBarrier(GL_VERTEX_ATTRIB_ARRAY_BARRIER_BIT);
    glBindBuffer(GL_ARRAY_BUFFER, fragmentBuffer);
    glVertexAttribIPointer(0, 2, GL_INT, FRAGMENT_SIZE, (void*)0);
    glVertexAttribPointer(1, 4, GL_FLOAT, GL_FALSE, FRAGMENT_SIZE,
                          (void*)16);
    glEnableVertexAttribArray(0);
    glEnableVertexAttribArray(1);
    glDrawArrays(GL_POINTS, ...);

Conformance Tests

TBD

Issues

(1) The main goal of this extension is to allow C-style GLSL shader code to write to buffer objects without using roundabout hacks like creating buffer textures and using shader image loads and stores. What other approaches could we take to achieving the same thing?

RESOLVED: We are using "shader storage blocks" as an abstraction similar to uniform blocks, except that we allow shaders to write to "shader storage blocks". Other options considered include:

- Use uniform blocks, but with a special layout qualifier (e.g., "writeonly" or "readwrite") that implies different semantics and implementation-dependent limits. This would avoid the need for a new storage qualifier in the shading language, and could also avoid adding new GL APIs to enumerate active buffer variables and shader storage blocks.
However, it would have the disadvantage of shoehorning two features, which might be implemented very differently in hardware, into a single abstraction. - Use C-style pointer syntax as in NV_shader_buffer_store, but treat the pointers as referring to a buffer binding rather than a specific GPU address. In this approach, pointers might be required to be uniform. (In NV_shader_buffer_store, pointers are just data. They can be passed as uniforms, uniform block members, shader inputs/outputs, reconstructed from texture data, or however the application wants to pass them.) (2) When using shader storage blocks to append records to a buffer, the storage is provided by a buffer object. There doesn't seem to be any reason why the shader really needs to know the "length" of the buffer. It might therefore want to declare global storage blocks containing unsized arrays. Should we allow this? If so, how does that interact with bounds checking? What does it mean for the ".length()" method in GLSL? What would happen if you tried to pass such an array as a function parameter? What does it mean for a possible introspection API allowing applications to query how big the block needs to be? RESOLVED: We will support shader storage blocks whose last member is an unsized array. For this unsized array, the effective size will be determined at run-time from the size of the data store. Such unsized arrays can be indexed with general integer expressions (other than negative constant expressions, which are generally forbidden for array indexing in GLSL). The ".length()" method is not supported, nor is passing the array as a function argument. 
When using the ARB_program_interface_query extension to enumerate the set of active buffer variables, only the first element of arrays (sized or unsized) will be enumerated; the array size and offsets for array elements other than the first can be determined by querying the TOP_LEVEL_ARRAY_SIZE and TOP_LEVEL_ARRAY_STRIDE properties of the buffer variable. The bounds checking rules for unsized arrays at the end of shader storage blocks are the same as for uniform blocks. If the array is accessed using an index pointing at memory beyond the end of the buffer object associated with the shader storage blocks, the results are undefined and can lead to program termination; see also issue (7). Other options considered here included having the shader declare an array with a dummy size that's either unrealistically small (1 or 2) or unrealistically large, and providing guarantees like: - (small) if the last element of the storage block is an array, we have defined behavior for indexed accesses off the end of the array, as long as the effective offset is contained within the buffer; or - (large) if the buffer is too small for the large declared array, we have defined behavior for accesses to array elements as long as the effective offset is contained within the buffer. Note that it wouldn't be possible for the application to determine the stride of an array of structures if it were declared with a size of 1. For a size of 2 or larger, you could use offset(array[1].member) - offset(array[0].member) for "shared" layouts at least, but that's not possible if there is no "array[1].member". (3) Do we allow arrays of shader storage blocks? RESOLVED: Yes; we already allow arrays of uniform blocks, where each block instance has an identical layout but is backed by a separate buffer object. It seems like we should do this here for consistency. 
If we had overloaded the existing uniform block APIs (e.g., by applying a "readwrite" layout qualifier to uniform blocks), it would be really weird if we disallowed arrays of writeable uniform blocks since we already allow it for regular (read-only) uniform blocks.

(4) We have typically provided some sort of "introspection" API where application code written with no explicit knowledge of the shaders used can discover properties of active variables. Should we provide some here? If so, any pitfalls?

RESOLVED: Yes, we will provide an introspection API, but not as part of this extension. Instead, we require support for the ARB_program_interface_query extension, which provides a generic mechanism for enumerating the set of active resources for a number of "interfaces". This API includes interfaces for all active shader storage blocks as well as all active buffer variables. Supporting enumeration of these new resources was one of the primary motivations for the generic ARB_program_interface_query extension; however, that extension also added enumeration support for other resources that previously had no enumeration API.

The enumeration of buffer variables follows slightly different rules than other variables; in particular, only the first element of each member declared as an array is enumerated. The previous enumeration rules would have awful consequences when applied to large arrays of structures in shader storage blocks. For example, the following declaration would report 80K active uniforms, starting with "records[0].position" and ending with "records[39999].texcoord". Ouch!

    struct FragmentData {
      vec4 position;
      vec2 texcoord;
    };
    buffer FragmentInfo {
      FragmentData records[40000];
    };

Regular uniforms and UBOs also have exactly the same problem; the primary difference is that current implementation limits on uniform storage provide a bound on how bad this could get.
Even those limits might not actually bound the GetActiveUniform* badness, as the spec doesn't require a program to link successfully for GetActiveUniform* to enumerate uniforms.

(5) Uniform blocks already have a well-established usage model, for which implementations may have dedicated support as well as limits that reflect this usage model. If we were to overload uniform blocks, some new uses might not meet this limit and usage model. Is that a problem?

RESOLVED: Yes, it could have been a problem if we had overloaded uniform blocks. Implementations may be able to distinguish between different types of uniform blocks, which might be implemented differently. One might be able to distinguish based on the size of the block as well as the layout qualifier (i.e., "readwrite" might be "different" than "readonly").

Note that if an implementation wants to use the size of the block as a factor for determining how the block is accessed, this would introduce a new wrinkle into the unsized array use case above. That might not be a huge deal; implementations could make a worst-case assumption and treat the effective size of an unsized array as resulting in a maximum-size buffer object.

Note that this consideration applies equally to purely read-only uniform storage. For example, implementations might have a limit on the size of uniform blocks that can be accessed by shaders with accelerated hardware support. However, applications might well want to store large data sets in buffer objects and access them using random-access reads in shader code. OpenGL 4.2's mechanisms allow data to be pulled from buffer objects for vertex shaders using vertex buffers (but only using the vertex/instance number as an index). Data can also be read from a texture buffer object via texelFetch(), but that doesn't allow for more complex data structures (as noted in the "write" example above).
It would be desirable to have a mechanism to allow random access reads to "large" buffer objects, even if the implementation and performance characteristics are different from regular UBO usage. NVIDIA's NV_shader_buffer_load extension fills this need by allowing the use of read-only pointers. That extension has been supported for a longer time and is supported on more platforms than the NV_shader_buffer_store mentioned above.

(6) The size of uniform blocks on typical OpenGL 3/4 implementations is 64KB. Is this good enough for shader storage buffers, or do we need a higher limit?

RESOLVED: 64K is not good enough; a higher limit is required. The current specification requires implementations to support shader storage blocks of at least 2^24 bytes (16MB). Implementations may support larger sizes; the maximum size can be determined by querying MAX_SHADER_STORAGE_BLOCK_SIZE. Because implementations may choose to support block sizes >= 2^31 bytes, applications should query the maximum size with GetInteger64v().

(7) How are write accesses to shader storage blocks bounds-checked?

RESOLVED: For shader storage blocks, we use the same language found in the current OpenGL 4.2 specification for uniform blocks, which guarantees no bounds-checking:

    | If any active uniform block is not backed by a sufficiently
    | large buffer object, the results of shader execution are
    | undefined, and may result in GL interruption or termination.

It would be desirable to have a "robustness" feature that provides more solid guarantees when accessing outside the bounds of a buffer object range. However, such a feature is not present in the existing ARB_robustness specification and is considered orthogonal to the functionality being added here.

If we were to add bounds-checking here or in the future, there may still be issues of how bounds-checking would be performed, with multiple use cases.
For example, some existing UBO hardware might include hardware bounds checking (e.g., return zeroes if accessing off the end of a buffer object), but that support might not be extended to cover writes or even some other read-only use cases. If shader-based bounds checking is required, using code inserted by the compiler, we'd have to figure out how to specify it. In particular, we'd have to figure out what granularity the check should be done at. At the byte/word level? Using the first index of the array?

    struct FragmentData {
      vec4 position;
      vec2 texcoord;
    };
    layout(writeonly,binding=2) uniform FragmentInfo {
      FragmentData records[40000];
    };

In the example above, let's assume that the structure was tightly packed, where each element of <records> requires exactly 24 bytes -- 16 for <position> and 8 for <texcoord>. If we bound a 32-byte buffer, what would happen to reads/writes of records[1].position? Are reads/writes of the x/y components guaranteed to work, with "out-of-bounds" behavior on z/w? What about a 31-byte buffer -- do you read/write partial data for records[1].position.y? What about a 40-byte buffer, which contains sufficient storage for all of records[1].position? Is it guaranteed to work, or should we allow implementations to treat accesses to array elements as out of bounds unless the buffer provides storage for the entire element (including records[1].texcoord in this case)?

(8) Should we provide new "packing" layout qualifiers to augment the existing vec4-centric "std140" rule for uniform blocks?

RESOLVED: Yes, add a new "std430" layout that provides for tighter packing of arrays and structures. With "std140", the base alignment of arrays of scalars and vectors and of structures is always a multiple of the base alignment of a vec4 (16B), which means that the stride of an array of type "float", "int", or "uint" is 16B instead of 4B. With "std430", such arrays will now be tightly packed.
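The stride difference described in issue (8) can be sketched in C. This is a minimal illustration, not part of the specification: the names block_layout, vector_base_alignment, and array_stride are hypothetical, and the sketch covers only arrays of scalars and vectors with 4-byte components.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the two layout rule sets discussed in issue (8). */
enum block_layout { LAYOUT_STD140, LAYOUT_STD430 };

/* Base alignment in bytes of a scalar or vector with `components`
 * 4-byte components (float, int, or uint). A 3-component vector is
 * aligned like a 4-component vector under both layouts. */
static size_t vector_base_alignment(unsigned components)
{
    assert(components >= 1 && components <= 4);
    unsigned rounded = (components == 3) ? 4 : components;
    return (size_t)rounded * 4;
}

/* Stride in bytes of an array whose elements are scalars or vectors
 * with `components` 4-byte components. */
static size_t array_stride(enum block_layout layout, unsigned components)
{
    size_t align = vector_base_alignment(components);
    if (layout == LAYOUT_STD140 && align < 16)
        align = 16; /* std140 rounds array element alignment up to a vec4 */
    return align;   /* for these element types the stride equals the alignment */
}
```

Under these assumptions, array_stride(LAYOUT_STD140, 1) is 16 for a float array, while array_stride(LAYOUT_STD430, 1) is 4. An array of vec3 has a 16-byte stride under either layout.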
Note that in the "std430" packing, arrays of vec3s are still not tightly packed; vec3 types still require a 16B alignment as in "std140". Note that the "std430" layout is supported only for shader storage blocks, and not for uniform blocks.

(9) Should we allow memory qualifiers ("coherent", "volatile", "restrict", "readonly", and "writeonly") to apply to entire shader storage blocks? To individual shader storage block members?

RESOLVED: We allow memory qualifiers to apply to both shader storage blocks and block members (buffer variables). When a memory qualifier is applied to a block declaration, it is considered to apply to all block members.

Note that the extension NV_bindless_texture allows image variables (which accept memory qualifiers) to be declared as members of shader storage blocks (which also accept memory qualifiers). This spec adds an interaction that says that if this case occurs, the qualifier is considered to apply to the image handle stored in the block, and not the memory referenced by the image.

(10) Should we allow mutable assignments of storage blocks to binding points?

RESOLVED: Yes, allow them in a manner similar to uniform blocks. OpenGL 4.2's atomic counter buffer feature requires the "binding=N" layout in atomic counter declarations and doesn't let you change the binding used post-link. However, we decided to use the same behavior as uniform blocks, since the functionality seems so similar.

(11) Is this extension/feature really needed? Isn't it possible to do something similar in unextended OpenGL 4.2?

RESOLVED: Yes, it's possible to achieve similar functionality in unextended OpenGL 4.2, but something cleaner is clearly desirable.

One of the intended uses of OpenGL 4.2's atomic counter feature (ARB_shader_atomic_counters) is to allow shader invocations to write values generated by shaders into a buffer object, using the atomic counters to reserve a unique slot number in an array of outputs.
The array itself is accessed by associating the buffer object with a buffer texture (ARB_texture_buffer_object) and writing to that texture using shader image stores (ARB_shader_image_load_store). There are a number of unfortunate limitations of this approach: * Buffers written to using image stores must have a 1- to 4-component texture format associated with them. It's not possible to write out an array of structures, though one can use multiple buffers with each buffer holding a separate member. * The image store function takes a canonical vec4/ivec4/uvec4 value to write, regardless of the value stored. If you're only storing a float or a vec2, you need to use a constructor (or a swizzle hack) to generate a vec4 in which the extra components are ignored. * The image store function takes signed integer coordinates (like the texelFetch built-ins). However, the atomic counter returns an unsigned value, and GLSL doesn't support implicit conversions from unsigned to signed. * Image stores to buffers require the use of a buffer texture, even though we don't ever use it as a texture. The solution offered here is far more direct -- shader code simply declares the format of the buffer object as an interface block and can read and write the buffer using normal shader code. (12) Are there other extensions providing similar functionality? RESOLVED: Yes. The NVIDIA extension NV_shader_buffer_store also provides a mechanism where buffer objects can be written to with regular shader code. Using that extension, an application is able to query a GPU address of a buffer, make that buffer resident, and then access the buffer in GLSL code using the queried GPU address as a pointer. Applications using NV_shader_buffer_store are required to ensure that pointers are valid and no automatic bounds checking is provided. This proposed extension is intended to provide GLSL functionality similar to what you can get with NV_shader_buffer_store, but without general pointers. 
Instead, this extension uses bindings, with shader code effectively extracting a pointer from the bound buffer.

(13) Do we need some sort of limit on the combined sum of actively used shader storage blocks and other resources, similar to what we had for image units in OpenGL 4.2 (MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS)?

RESOLVED: Yes. For this extension, we just add shader storage blocks to the set of resources that have a combined limit and also create a new general token name (MAX_COMBINED_SHADER_OUTPUT_RESOURCES) that is a new alias of the old combined limit token.

Some OpenGL 4.2 and 4.3 implementations need to share a single set of internal hardware resources to handle fragment shader outputs, image loads and stores (from OpenGL 4.2 and ARB_shader_image_load_store), as well as shader storage buffers. We specify that a link error will occur if a program requires more of these internal resources than are available. It is expected that implementations without a need for a combined limit will expose a limit greater than or equal to the sum of the individual limits for each shader stage and resource type.

This link error has interaction problems with the ARB_separate_shader_objects extension and OpenGL 4.1. When linking a separable program, the linker will not know anything about the usage of fragment shader outputs, image units, and shader storage blocks from other programs that could be in use at the same time as the program being linked. This makes it seemingly impossible to enforce a combined limit. In practice, this is unlikely to be a problem because the implementations needing to enforce this combined limit will support the use of image uniforms and shader storage blocks only in fragment and compute shaders, and those two stages can't run concurrently.

(14) Are accesses to shader storage buffers coherent with other accesses to the same underlying resource (e.g., image loads/stores, texture fetches)? In the same shader invocation?
In different shader invocations? RESOLVED: No; we don't guarantee coherent accesses between shader resources of different types. Spec language corresponding to this issue will be proposed outside this extension. (15) Do we really need to have a combined limit on the sum of the number of active shader storage blocks for each program stage (MAX_COMBINED_SHADER_STORAGE_BLOCKS)? RESOLVED: We include such a limit, following the precedent of providing a combined limit for each new resource with per-stage limits. It's not clear that this combined limit is needed by any current implementation, though we envision an implementation that could have a set of physical resources shared between shader stages without providing a full set of resources for every stage. Some implementations do need a combined limit on the number of fragment shader outputs, image uniforms, and shader storage blocks, which is handled by the separate MAX_COMBINED_SHADER_OUTPUT_RESOURCES limit discussed in issue (13). (16) How does an application determine the required buffer object size for a shader storage block whose last member is an unsized array? RESOLVED: The ARB_program_interface_query extension includes a property BUFFER_SIZE that can be queried for active shader storage blocks. For blocks where all members have known storage requirements, the value of this property gives the minimum buffer size required to back the shader storage block. For shader storage blocks ending in an unsized array, the BUFFER_SIZE property returns the minimum buffer size needed to store a single element in the unsized array. The actual storage requirements are a function of the number of elements the application wants to store in the buffer object. 
If an application needs to store N elements in the unsized array, the required size can be derived by

    minimum_size = buffer_size + (N-1) * top_level_stride

where <buffer_size> is the value of the BUFFER_SIZE property of the shader storage block, and <top_level_stride> is the value of the TOP_LEVEL_STRIDE property for the unsized array.

Note that when using the "std140" layout qualifier, applications can determine the layout of shader storage blocks without any queries by following the layout rules documented in the API specification.

(17) Should we provide GLSL constants for the implementation-dependent limits in this specification (e.g., gl_MaxVertexShaderStorageBlocks)?

RESOLVED: No. It's not clear that these constants are of any real value, and they've been specified inconsistently. In particular, we have a bunch of constants for atomic counters, atomic counter buffers, and image units/uniforms, but we don't have any limits for uniform blocks (ARB_uniform_buffer_object).

(18) Other than the last member of a shader storage block, should we allow block members declared without an explicit size?

RESOLVED: Yes, for consistency with the rest of GLSL. GLSL in general allows for arrays declared without a size. Such arrays are implicitly sized by the compiler based on usage. For example, if a shader includes code such as:

    uniform int array[];                // no explicit size
    ...
    expression = array[2] * array[9];   // only references to <array>

the array is likely to be implicitly sized to 10 elements, since it needs to provide storage for array[9]. These implicitly sized arrays are also permitted in interface blocks, such as uniform blocks.

When an implicitly sized array is declared in shader code, there are limitations on how the array can be used. Such arrays may not be passed to functions in their entirety or used by the ".length()" method. Additionally, the array may only be indexed with integer constant expressions.
If the last member of a shader storage block is declared as an array without an explicit size, it will be considered to be an explicitly unsized array whose size will be inferred at run-time based on the provided buffer object. Such arrays can be indexed with arbitrary expressions, but cannot be passed as function arguments or be used by the ".length()" method.

Note that shaders using uniform or shader storage blocks with the "shared" or "std140" layout qualifier should avoid using implicitly sized arrays. In this case, the size will be inferred by the compiler based on shader code and might not be computed identically for multiple programs using the same block.

(19) Should the ".length()" method be supported for unsized arrays at the end of a shader storage block? If not, how can shader code determine the effective size of an unsized array?

RESOLVED: In previous versions of GLSL, the ".length()" method is not supported for arrays without a declared size, which means that its value is known at compile time. As a result, the value returned by ".length()" is considered a constant expression.

In this extension, we allow unsized arrays at the end of shader storage blocks, and allow the ".length()" method to be used to determine the size of such arrays based on the size of the provided buffer object. The array size can be derived by reversing the process described in issue (16):

    array.length() = max((buffer_object_size - offset_of_array) / stride_of_array, 0)

Given that we will support the ".length()" method on unsized arrays, we will also support it on implicitly sized arrays for consistency. For such arrays, the array size will be determined at link time but will not be considered a constant expression.

Revision History

Revision 15, September 23, 2013 (Jon Leech)
  - Fix typo ShaderStorageBinding -> ShaderStorageBlockBinding in the description of that command (Bug 10715).
Revision 14, September 6, 2013 (Jon Leech)
  - Fix typo SHADER_STORAGE_BLOCK -> SHADER_STORAGE_BUFFER in the description of ShaderStorageBlockBinding (Bug 10795).

Revision 13, June 1, 2012 (pbrown)
  - Mark issues (8) and (9) as resolved.

Revision 12, May 31, 2012 (pbrown)
  - Modify spec to allow the "std430" layout qualifier only on shader storage blocks, not uniform blocks (bug 8992).

Revision 11, May 14, 2012 (pbrown)
  - Further clarify the interaction with ARB_compute_shader on atomic memory functions; add a clarification that no #extension directive is needed to use these functions on shared memory variables in compute shaders.

Revision 10, May 8, 2012 (pbrown)
  - Add explicit language specifying that the value returned by the .length() method for unsized arrays is undefined when the array is in an array of blocks dereferenced with an out-of-bounds index.

Revision 9, May 7, 2012 (pbrown)
  - Allow the use of the .length() method on unsized and implicitly sized arrays. For unsized arrays in shader storage blocks, .length() will be computed from the size of the associated buffer object. For implicitly sized arrays, .length() will be determined at link time.

Revision 8, May 3, 2012 (pbrown)
  - Add a "std430" layout qualifier supporting more tightly packed arrays and structures relative to "std140" for issue (8).
  - Add support for memory qualifiers on shader storage block declarations for issue (9), also add more explicit language on how these qualifiers work on buffer variables.
  - Add spec language making it illegal to use "readonly" and "writeonly" memory qualifiers on the same declaration.
  - Remove built-in constants for shader storage block implementation limits, as described in issue (17).
  - Mark various spec issues as resolved per the Khronos F2F.
  - Add interaction with NV_bindless_texture, describing the behavior of memory qualifiers on image variables inside shader storage blocks.
Revision 7, April 25, 2012 (pbrown)
  - Remove the GLSL spec language generally disallowing unsized arrays in interface blocks (bug 8837). We have supported implicitly sized arrays in blocks in previous versions of GLSL and decided to retain backward compatibility.
  - Added a warning in the description of the "shared" layout qualifier indicating that such blocks might not be shareable between programs if they contain implicitly-sized array members.
  - Minor typo/wording fixes.
  - Fixed token table to describe all the general query functions (e.g., GetIntegerv, GetInteger64v) where certain tokens can be used.
  - Update the spec to require dynamically uniform indexing on arrays of shader storage blocks.
  - Added issues (18) and (19).

Revision 6, April 16, 2012 (pbrown)
  - Tentatively add built-in constants for implementation limits on shader storage blocks, as well as new issue (17) on the topic.

Revision 5, April 13, 2012 (pbrown)
  - Add missing #extension and #define built-in documentation for the GLSL part of the extension.
  - Add GLSL spec language documenting support for unsized arrays at the end of shader storage blocks.
  - Add GLSL spec language generally disallowing unsized arrays in interface blocks, including input/output blocks, uniform blocks, and shader storage buffers (bug 8837). This borrows from similar language where unsized arrays are not permitted in structures.
  - Extend the tables describing API tokens enumerating GLSL types to indicate the set of types that can be used for buffer variables.
  - Add sample code.
  - Update language for several issues, and mark them as resolved.
  - Add an issue indicating how an application can determine the required size of a shader storage buffer when using unsized arrays.

Revision 4, April 12, 2012 (pbrown)
  - Remove the enumeration APIs for buffer variables and shader storage blocks; these resources can only be enumerated using the new APIs provided by the ARB_program_interface_query extension.
  - Add an interaction with ARB_program_interface_query, and have this spec require that extension to ensure that the queries are available.
  - Add a new interaction with ARB_compute_shader; the atomic memory functions provided in this extension for buffer variables can also be used for shared variables in compute shaders. Also add new compute shader limit for active storage blocks.
  - Add values for new enumerants in this extension.
  - Fix up the "New Procedures and Functions" and "New Tokens" sections.
  - Assign enumerant values for all tokens.
  - Add a new token MAX_COMBINED_SHADER_OUTPUT_RESOURCES that's an alias for MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS. That combined limit now needs to apply to fragment outputs, image units, and shader storage blocks.
  - General cleanup of API specification language for shader storage blocks.
  - Add documentation of per-stage and combined limits in "Shader Execution" spec language, and a validation error for exceeding combined limits with separate program objects.
  - Add new edits to Appendix A and Appendix D.
  - Add appropriate text to the Dependencies, New Errors, New State, and New Implementation-Dependent State sections.
  - Add some new issues; update issue (13).

Revision 3, January 23, 2012 (pbrown)
  - Add actual spec language in place of the previous "here's our options" overview. Clean up the overview and issues section to reflect the general approach chosen in the initial feature discussion.
  - Note: Lists of new enumerants, functions, state, and errors have not been built yet.

Revision 2, January 3, 2012 (pbrown)
  - Move issues from overview to separate section in preparation for further edits; no other changes.

Revision 1, October 26, 2011 (pbrown)
  - Initial sketch/proposal, containing only an introduction and issues list.