Name
NV_gpu_shader5
Name Strings
GL_NV_gpu_shader5
Contact
Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)
Contributors
Barthold Lichtenbelt, NVIDIA
Chris Dodd, NVIDIA
Eric Werness, NVIDIA
Greg Roth, NVIDIA
Jeff Bolz, NVIDIA
Piers Daniell, NVIDIA
Status
Shipping.
Version
Last Modified Date: 03/23/2010
NVIDIA Revision: 6
Number
389
Dependencies
This extension is written against the OpenGL 3.2 (Compatibility Profile)
Specification.
This extension is written against version 1.50 (revision 09) of the OpenGL
Shading Language Specification.
OpenGL 3.2 and GLSL 1.50 are required.
ARB_gpu_shader5 is required.
This extension interacts with ARB_gpu_shader5.
This extension interacts with ARB_gpu_shader_fp64.
This extension interacts with ARB_tessellation_shader.
This extension interacts with NV_shader_buffer_load.
This extension interacts with EXT_direct_state_access.
This extension interacts with EXT_vertex_attrib_64bit and
NV_vertex_attrib_integer_64bit.
Overview
This extension provides a set of new features to the OpenGL Shading
Language and related APIs to support capabilities of new GPUs. Shaders
using the new functionality provided by this extension should enable this
functionality via the construct
#extension GL_NV_gpu_shader5 : require (or enable)
This extension was developed concurrently with the ARB_gpu_shader5
extension, and provides a superset of the features provided there. The
features common to both extensions are documented in the ARB_gpu_shader5
specification; this document describes only the additional language features
not available via ARB_gpu_shader5. A shader that enables this extension
via an #extension directive also implicitly enables the common
capabilities provided by ARB_gpu_shader5.
In addition to the capabilities of ARB_gpu_shader5, this extension
provides a variety of new features for all shader types, including:
* support for a full set of 8-, 16-, 32-, and 64-bit scalar and vector
data types, including uniform API, uniform buffer object, and shader
input and output support;
* the ability to aggregate samplers into arrays, index these arrays with
arbitrary expressions, and not require that non-constant indices be
uniform across all shader invocations;
* new built-in functions to pack and unpack 64-bit integer types into a
two-component 32-bit integer vector;
* new built-in functions to pack and unpack 32-bit unsigned integer
types into a two-component 16-bit floating-point vector;
* new built-in functions to convert double-precision floating-point
values to or from their 64-bit integer bit encodings;
* new built-in functions to compute the composite of a set of boolean
conditions across a group of shader threads;
* vector relational functions supporting comparisons of vectors of 8-,
16-, and 64-bit integer types or 16-bit floating-point types; and
* extending texel offset support to allow loading texel offsets from
regular integer operands computed at run-time, except for lookups with
gradients (textureGrad*).
This extension also provides additional support for processing patch
primitives (introduced by ARB_tessellation_shader).
ARB_tessellation_shader requires the use of a tessellation evaluation
shader when processing patches, which means that patches will never
survive past the tessellation pipeline stage. This extension lifts that
restriction, and allows patches to proceed further in the pipeline and be
used
* as input to a geometry shader, using a new "patches" layout qualifier;
* as input to transform feedback;
* by fixed-function rasterization stages, in which case the patches are
drawn as independent points.
Additionally, it allows geometry shaders to read per-patch attributes
written by a tessellation control shader using input variables declared
with "patch in".
New Procedures and Functions
void Uniform1i64NV(int location, int64EXT x);
void Uniform2i64NV(int location, int64EXT x, int64EXT y);
void Uniform3i64NV(int location, int64EXT x, int64EXT y, int64EXT z);
void Uniform4i64NV(int location, int64EXT x, int64EXT y, int64EXT z,
int64EXT w);
void Uniform1i64vNV(int location, sizei count, const int64EXT *value);
void Uniform2i64vNV(int location, sizei count, const int64EXT *value);
void Uniform3i64vNV(int location, sizei count, const int64EXT *value);
void Uniform4i64vNV(int location, sizei count, const int64EXT *value);
void Uniform1ui64NV(int location, uint64EXT x);
void Uniform2ui64NV(int location, uint64EXT x, uint64EXT y);
void Uniform3ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z);
void Uniform4ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z,
uint64EXT w);
void Uniform1ui64vNV(int location, sizei count, const uint64EXT *value);
void Uniform2ui64vNV(int location, sizei count, const uint64EXT *value);
void Uniform3ui64vNV(int location, sizei count, const uint64EXT *value);
void Uniform4ui64vNV(int location, sizei count, const uint64EXT *value);
void GetUniformi64vNV(uint program, int location, int64EXT *params);
(The following function is also provided by NV_shader_buffer_load.)
void GetUniformui64vNV(uint program, int location, uint64EXT *params);
(All of the following ProgramUniform* functions are supported if and only
if EXT_direct_state_access is supported.)
void ProgramUniform1i64NV(uint program, int location, int64EXT x);
void ProgramUniform2i64NV(uint program, int location, int64EXT x,
int64EXT y);
void ProgramUniform3i64NV(uint program, int location, int64EXT x,
int64EXT y, int64EXT z);
void ProgramUniform4i64NV(uint program, int location, int64EXT x,
int64EXT y, int64EXT z, int64EXT w);
void ProgramUniform1i64vNV(uint program, int location, sizei count,
const int64EXT *value);
void ProgramUniform2i64vNV(uint program, int location, sizei count,
const int64EXT *value);
void ProgramUniform3i64vNV(uint program, int location, sizei count,
const int64EXT *value);
void ProgramUniform4i64vNV(uint program, int location, sizei count,
const int64EXT *value);
void ProgramUniform1ui64NV(uint program, int location, uint64EXT x);
void ProgramUniform2ui64NV(uint program, int location, uint64EXT x,
uint64EXT y);
void ProgramUniform3ui64NV(uint program, int location, uint64EXT x,
uint64EXT y, uint64EXT z);
void ProgramUniform4ui64NV(uint program, int location, uint64EXT x,
uint64EXT y, uint64EXT z, uint64EXT w);
void ProgramUniform1ui64vNV(uint program, int location, sizei count,
const uint64EXT *value);
void ProgramUniform2ui64vNV(uint program, int location, sizei count,
const uint64EXT *value);
void ProgramUniform3ui64vNV(uint program, int location, sizei count,
const uint64EXT *value);
void ProgramUniform4ui64vNV(uint program, int location, sizei count,
const uint64EXT *value);
New Tokens
Returned by the <type> parameter of GetActiveAttrib, GetActiveUniform, and
GetTransformFeedbackVarying:
INT64_NV 0x140E
UNSIGNED_INT64_NV 0x140F
INT8_NV 0x8FE0
INT8_VEC2_NV 0x8FE1
INT8_VEC3_NV 0x8FE2
INT8_VEC4_NV 0x8FE3
INT16_NV 0x8FE4
INT16_VEC2_NV 0x8FE5
INT16_VEC3_NV 0x8FE6
INT16_VEC4_NV 0x8FE7
INT64_VEC2_NV 0x8FE9
INT64_VEC3_NV 0x8FEA
INT64_VEC4_NV 0x8FEB
UNSIGNED_INT8_NV 0x8FEC
UNSIGNED_INT8_VEC2_NV 0x8FED
UNSIGNED_INT8_VEC3_NV 0x8FEE
UNSIGNED_INT8_VEC4_NV 0x8FEF
UNSIGNED_INT16_NV 0x8FF0
UNSIGNED_INT16_VEC2_NV 0x8FF1
UNSIGNED_INT16_VEC3_NV 0x8FF2
UNSIGNED_INT16_VEC4_NV 0x8FF3
UNSIGNED_INT64_VEC2_NV 0x8FF5
UNSIGNED_INT64_VEC3_NV 0x8FF6
UNSIGNED_INT64_VEC4_NV 0x8FF7
FLOAT16_NV 0x8FF8
FLOAT16_VEC2_NV 0x8FF9
FLOAT16_VEC3_NV 0x8FFA
FLOAT16_VEC4_NV 0x8FFB
(If ARB_tessellation_shader is supported, the following enum is accepted
as a new primitive mode.)
Accepted by the <primitiveMode> parameter of BeginTransformFeedback:
PATCHES
Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)
Modify Section 2.6.1, Begin and End, p. 22
(Extend language describing PATCHES introduced by ARB_tessellation_shader.
In particular, add the following to the end of the description of the
primitive type.)
If a patch primitive is drawn, each patch is drawn separately as a
collection of points, with each patch vertex defining a separate point.
Extra vertices from an incomplete patch are never drawn.
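As a rough illustration of the rule above, the C sketch below counts how many points reach rasterization when a PATCHES draw is not consumed by a tessellation evaluation shader. It assumes the per-patch vertex count is the value set through the PATCH_VERTICES patch parameter of ARB_tessellation_shader; the helper name is hypothetical and not part of any GL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of points drawn when `vertex_count`
   vertices are rendered as PATCHES with `patch_vertices` vertices per
   patch. Every vertex of a complete patch becomes one point; vertices
   left over from a trailing incomplete patch are discarded. */
static size_t points_drawn_for_patches(size_t vertex_count,
                                       size_t patch_vertices)
{
    assert(patch_vertices > 0);
    size_t complete_patches = vertex_count / patch_vertices;
    return complete_patches * patch_vertices;
}
```

For example, drawing 10 vertices with 4 vertices per patch forms two complete patches, so 8 points are drawn and the last 2 vertices are dropped.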
Modify Section 2.14.3, Vertex Attributes, p. 86
(modify the second paragraph, p. 87) ... exceeds MAX_VERTEX_ATTRIBS. For
the purposes of this comparison, attribute variables of the types i64vec3,
u64vec3, i64vec4, and u64vec4 count as consuming twice as many attributes
as equivalent single-precision types.
(extend the list of types in the first paragraph, p. 88)
... UNSIGNED_INT_VEC3, UNSIGNED_INT_VEC4, INT8_NV, INT8_VEC2_NV,
INT8_VEC3_NV, INT8_VEC4_NV, INT16_NV, INT16_VEC2_NV, INT16_VEC3_NV,
INT16_VEC4_NV, INT64_NV, INT64_VEC2_NV, INT64_VEC3_NV, INT64_VEC4_NV,
UNSIGNED_INT8_NV, UNSIGNED_INT8_VEC2_NV, UNSIGNED_INT8_VEC3_NV,
UNSIGNED_INT8_VEC4_NV, UNSIGNED_INT16_NV, UNSIGNED_INT16_VEC2_NV,
UNSIGNED_INT16_VEC3_NV, UNSIGNED_INT16_VEC4_NV, UNSIGNED_INT64_NV,
UNSIGNED_INT64_VEC2_NV, UNSIGNED_INT64_VEC3_NV, UNSIGNED_INT64_VEC4_NV,
FLOAT16_NV, FLOAT16_VEC2_NV, FLOAT16_VEC3_NV, or FLOAT16_VEC4_NV.
Modify Section 2.14.4, Uniform Variables, p. 89
(modify third paragraph, p. 90) ... uniform variable storage for a vertex
shader. A scalar or vector uniform with 64-bit integer components
will consume no more than 2<n> components, where <n> is 1 for scalars, and
the component count for vectors. A link error is generated ...
(add to Table 2.13, p. 96)
Type Name Token Keyword
-------------------- ----------------
INT8_NV int8_t
INT8_VEC2_NV i8vec2
INT8_VEC3_NV i8vec3
INT8_VEC4_NV i8vec4
INT16_NV int16_t
INT16_VEC2_NV i16vec2
INT16_VEC3_NV i16vec3
INT16_VEC4_NV i16vec4
INT64_NV int64_t
INT64_VEC2_NV i64vec2
INT64_VEC3_NV i64vec3
INT64_VEC4_NV i64vec4
UNSIGNED_INT8_NV uint8_t
UNSIGNED_INT8_VEC2_NV u8vec2
UNSIGNED_INT8_VEC3_NV u8vec3
UNSIGNED_INT8_VEC4_NV u8vec4
UNSIGNED_INT16_NV uint16_t
UNSIGNED_INT16_VEC2_NV u16vec2
UNSIGNED_INT16_VEC3_NV u16vec3
UNSIGNED_INT16_VEC4_NV u16vec4
UNSIGNED_INT64_NV uint64_t
UNSIGNED_INT64_VEC2_NV u64vec2
UNSIGNED_INT64_VEC3_NV u64vec3
UNSIGNED_INT64_VEC4_NV u64vec4
FLOAT16_NV float16_t
FLOAT16_VEC2_NV f16vec2
FLOAT16_VEC3_NV f16vec3
FLOAT16_VEC4_NV f16vec4
(modify list of commands at the bottom of p. 99)
void Uniform{1,2,3,4}{i64,ui64}NV(int location, T value);
void Uniform{1,2,3,4}{i64,ui64}vNV(int location, sizei count, const T value);
(insert after fourth paragraph, p. 100) The Uniform*i64{v}NV and
Uniform*ui64{v}NV commands will load sets of one to four 64-bit
signed or unsigned integer values into a uniform location defined as a
64-bit signed or unsigned integer scalar or vector type.
(modify "Uniform Buffer Object Storage", p. 102, adding three bullets after
the last "Members of type", and modifying the subsequent bullet)
* Members of type int8_t, int16_t, and int64_t are extracted from a
buffer object by reading a single byte, short, or int64-typed value at
the specified offset.
* Members of type uint8_t, uint16_t, and uint64_t are extracted from a
buffer object by reading a single ubyte, ushort, or uint64-typed value
at the specified offset.
* Members of type float16_t are extracted from a buffer object by reading
a single half-typed value at the specified offset.
* Vectors with N elements with basic data types of bool, int, uint,
float, double, int8_t, int16_t, int64_t, uint8_t, uint16_t, uint64_t,
or float16_t are extracted as N values in consecutive memory locations
beginning at the specified offset, with components stored in order with
the first (X) component at the lowest offset. The GL data type used for
component extraction is derived according to the rules for scalar
members above.
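To make the vector extraction rule concrete, the C sketch below reads a u16vec<N> member as N consecutive ushort values starting at the member offset, with X at the lowest address. The helper name is hypothetical, and it assumes the buffer contents were written by the host in its native byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: extract an N-component u16vec member from raw
   buffer bytes. Component i lives at offset + i * sizeof(uint16_t), so X
   is at the lowest offset. memcpy avoids unaligned loads. */
static void read_u16vec(const unsigned char *buffer, size_t offset,
                        uint16_t *out, int n)
{
    assert(n >= 1 && n <= 4);
    for (int i = 0; i < n; i++) {
        memcpy(&out[i], buffer + offset + (size_t)i * sizeof(uint16_t),
               sizeof(uint16_t));
    }
}
```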
Modify Section 2.14.6, Varying Variables, p. 106
(modify third paragraph, p. 107) ... For the purposes of counting input
and output components consumed by a shader, variables declared as vectors,
matrices, and arrays will all consume multiple components. Each component
of variables declared as 64-bit integer scalars or vectors will be
counted as consuming two components.
(add after the bulleted list, p. 108) For the purposes of counting the
total number of components to capture, each component of outputs declared
as 64-bit integer scalars or vectors will be counted as consuming two
components.
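The counting rule for varying and transform feedback components (which mirrors the 2<n> bound given for 64-bit integer uniforms in Section 2.14.4) can be sketched in C as below. The helper name is hypothetical and only covers scalars and vectors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: components counted against varying or transform
   feedback limits for a scalar or vector with `n` components (1..4).
   Each 64-bit integer component counts as two; all others count as one. */
static int counted_components(int n, bool is_64bit_integer)
{
    assert(n >= 1 && n <= 4);
    return is_64bit_integer ? 2 * n : n;
}
```

Under this rule an i64vec3 output counts as 6 components, while a u8vec4 output counts as 4.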
Modify Section 2.15.1, Geometry Shader Input Primitives, p. 118
(add new qualifier at the end of the section, p. 120)
Patches (patches)
Geometry shaders that operate on patches are valid for the PATCHES
primitive type. The number of vertices available to each program
invocation is equal to the vertex count of the variable-size patch, with
vertices presented to the geometry shader in the order specified in the
patch.
Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121
(add to the end of "Geometry Shader Inputs", p. 123)
Geometry shaders also support built-in and user-defined per-primitive
inputs. The following built-in inputs, not replicated per-vertex and not
contained in gl_in[], are supported:
* The variable gl_PatchVerticesIn is filled with the number of
vertices in the input primitive.
* The variables gl_TessLevelOuter[] and gl_TessLevelInner[] are arrays
holding outer and inner tessellation levels of an input patch. If a
tessellation control shader is active, the tessellation levels will be
taken from the corresponding outputs of the tessellation control
shader. Otherwise, the default levels provided as patch parameters
are used. Tessellation level values loaded in these variables are the
values before the clamping and rounding operations performed by the
primitive generator as described in Section 2.X.2 of
ARB_tessellation_shader. For triangular tessellation,
gl_TessLevelOuter[3] and gl_TessLevelInner[1] will be undefined. For
isoline tessellation, gl_TessLevelOuter[2], gl_TessLevelOuter[3], and
both values in gl_TessLevelInner[] are undefined.
Additionally, a geometry shader with an input primitive type of "patches"
may declare per-patch input variables using the qualifier "patch in".
Unlike per-vertex inputs, per-patch inputs do not correspond to any
specific vertex in the input primitive, and are not indexed by vertex
number. Per-patch inputs declared as arrays have multiple values for the
input patch; similarly declared per-vertex inputs would indicate a single
value for each vertex in the input patch. User-defined per-patch input
variables are filled with corresponding per-patch output values written by
the tessellation control shader. If no tessellation control shader is
active, all such variables are undefined.
Per-patch input variables and the built-in inputs "gl_PatchVerticesIn",
"gl_TessLevelOuter[]", and "gl_TessLevelInner[]" are supported only for
geometry shaders with an input primitive type of "patches". A program
will fail to link if any such variable is used in a geometry shader with an
input primitive type other than "patches".
Modify Section 2.19, Transform Feedback, p. 130
(add to Table 2.14, p. 131)
Transform Feedback
primitiveMode allowed render primitive modes
---------------------- ---------------------------------
PATCHES PATCHES
(modify first paragraph, p. 131) ... <primitiveMode> is one of TRIANGLES,
LINES, POINTS, or PATCHES and specifies the type of primitives that will
be recorded into the buffer objects bound for transform feedback (see
below). ...
(modify last paragraph, p. 131 and first paragraph, p. 132, adding patch
support, and dealing with capture of 8- and 16-bit components)
When an individual point, line, triangle, or patch primitive reaches the
transform feedback stage ... When capturing line, triangle, and patch
primitives, all attributes ... For multi-component varying variables or
varying array elements, the individual components are written in order.
For variables with 8- or 16-bit fixed- or floating-point components,
individual components will be converted to and stored as equivalent values
of type "int", "uint", or "float". The value for any attribute specified
...
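The capture rule for 8- and 16-bit integer components can be sketched in C as follows: each value is widened to a 32-bit "int" or "uint" holding the same numeric value. The helper names are hypothetical, and float16_t components (stored as "float") are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers modeling transform feedback storage of small
   integer components. Signed values are sign-extended to int, unsigned
   values are zero-extended to uint, so the numeric value is preserved. */
static int32_t capture_i8(int8_t v)     { return (int32_t)v; }
static int32_t capture_i16(int16_t v)   { return (int32_t)v; }
static uint32_t capture_u8(uint8_t v)   { return (uint32_t)v; }
static uint32_t capture_u16(uint16_t v) { return (uint32_t)v; }
```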
(modify next-to-last paragraph, p. 132) ... is not incremented. If
transform feedback receives a primitive that fits in the remaining space
after such an overflow occurs, that primitive may or may not be recorded.
Primitives that fail to fit in the remaining space are never recorded.
Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)
None.
Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)
None.
Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)
None.
Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)
Modify Section 6.1.15, Shader and Program Queries, p. 332
(add to the first list of commands, p. 337)
void GetUniformi64vNV(uint program, int location, int64EXT *params);
void GetUniformui64vNV(uint program, int location, uint64EXT *params);
Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)
None.
Additions to the AGL/GLX/WGL Specifications
None.
Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)
Including the following line in a shader can be used to control the
language features described in this extension:
#extension GL_NV_gpu_shader5 : <behavior>
where <behavior> is as specified in section 3.3.
New preprocessor #defines are added to the OpenGL Shading Language:
#define GL_NV_gpu_shader5 1
If the features of this extension are enabled by an #extension directive,
shading language features documented in the ARB_gpu_shader5 extension will
also be provided.
Modify Section 3.6, Keywords, p. 15
(add the following to the list of reserved keywords)
int8_t i8vec2 i8vec3 i8vec4
int16_t i16vec2 i16vec3 i16vec4
int32_t i32vec2 i32vec3 i32vec4
int64_t i64vec2 i64vec3 i64vec4
uint8_t u8vec2 u8vec3 u8vec4
uint16_t u16vec2 u16vec3 u16vec4
uint32_t u32vec2 u32vec3 u32vec4
uint64_t u64vec2 u64vec3 u64vec4
float16_t f16vec2 f16vec3 f16vec4
float32_t f32vec2 f32vec3 f32vec4
float64_t f64vec2 f64vec3 f64vec4
(note: the "float64_t" and "f64vec*" types are available if and only if
ARB_gpu_shader_fp64 is also supported)
Modify Section 4.1, Basic Types, p. 18
(add to the basic "Transparent Types" table, p. 18)
Types Meaning
-------- ----------------------------------------------------------
int8_t an 8-bit signed integer
i8vec2 a two-component signed integer vector (8-bit components)
i8vec3 a three-component signed integer vector (8-bit components)
i8vec4 a four-component signed integer vector (8-bit components)
int16_t a 16-bit signed integer
i16vec2 a two-component signed integer vector (16-bit components)
i16vec3 a three-component signed integer vector (16-bit components)
i16vec4 a four-component signed integer vector (16-bit components)
int32_t a 32-bit signed integer
i32vec2 a two-component signed integer vector (32-bit components)
i32vec3 a three-component signed integer vector (32-bit components)
i32vec4 a four-component signed integer vector (32-bit components)
int64_t a 64-bit signed integer
i64vec2 a two-component signed integer vector (64-bit components)
i64vec3 a three-component signed integer vector (64-bit components)
i64vec4 a four-component signed integer vector (64-bit components)
uint8_t an 8-bit unsigned integer
u8vec2 a two-component unsigned integer vector (8-bit components)
u8vec3 a three-component unsigned integer vector (8-bit components)
u8vec4 a four-component unsigned integer vector (8-bit components)
uint16_t a 16-bit unsigned integer
u16vec2 a two-component unsigned integer vector (16-bit components)
u16vec3 a three-component unsigned integer vector (16-bit components)
u16vec4 a four-component unsigned integer vector (16-bit components)
uint32_t a 32-bit unsigned integer
u32vec2 a two-component unsigned integer vector (32-bit components)
u32vec3 a three-component unsigned integer vector (32-bit components)
u32vec4 a four-component unsigned integer vector (32-bit components)
uint64_t a 64-bit unsigned integer
u64vec2 a two-component unsigned integer vector (64-bit components)
u64vec3 a three-component unsigned integer vector (64-bit components)
u64vec4 a four-component unsigned integer vector (64-bit components)
float16_t a single 16-bit floating-point value
f16vec2 a two-component floating-point vector (16-bit components)
f16vec3 a three-component floating-point vector (16-bit components)
f16vec4 a four-component floating-point vector (16-bit components)
float32_t a single 32-bit floating-point value
f32vec2 a two-component floating-point vector (32-bit components)
f32vec3 a three-component floating-point vector (32-bit components)
f32vec4 a four-component floating-point vector (32-bit components)
float64_t a single 64-bit floating-point value
f64vec2 a two-component floating-point vector (64-bit components)
f64vec3 a three-component floating-point vector (64-bit components)
f64vec4 a four-component floating-point vector (64-bit components)
Modify Section 4.1.3, Integers, p. 20
(add after the first paragraph of the section, p. 20)
Variables with the types "int8_t", "int16_t", and "int64_t" represent
signed integer values with exactly 8, 16, or 64 bits, respectively.
Variables with the types "uint8_t", "uint16_t", and "uint64_t" represent
unsigned integer values with exactly 8, 16, or 64 bits, respectively.
Variables with the types "int32_t" and "uint32_t" represent signed and
unsigned integer values with 32 bits, and are equivalent to "int" and
"uint" types, respectively.
(modify the grammar, p. 21, adding "L" and "UL" suffixes)
integer-suffix: one of
u U l L ul UL
(modify next-to-last paragraph, p. 21) ... When the suffix "u" or "U" is
present, the literal has type <uint>. When the suffix "l" or "L" is
present, the literal has type <int64_t>. When the suffix "ul" or "UL" is
present, the literal has type <uint64_t>. Otherwise, the type is
<int>. ...
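The suffix-to-type mapping above can be summarized by a small C lookup. This is a hypothetical helper for illustration only; it accepts exactly the six suffixes in the integer-suffix grammar (plus no suffix) and returns NULL for anything else.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: GLSL type of an integer literal with the given
   suffix, per the integer-suffix grammar (none, u/U, l/L, ul/UL). */
static const char *literal_type_for_suffix(const char *suffix)
{
    if (strcmp(suffix, "") == 0)
        return "int";
    if (strcmp(suffix, "u") == 0 || strcmp(suffix, "U") == 0)
        return "uint";
    if (strcmp(suffix, "l") == 0 || strcmp(suffix, "L") == 0)
        return "int64_t";
    if (strcmp(suffix, "ul") == 0 || strcmp(suffix, "UL") == 0)
        return "uint64_t";
    return NULL; /* not a suffix accepted by the grammar */
}
```

Under this mapping, 7UL is a uint64_t literal and 7L is an int64_t literal.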
Modify Section 4.1.4, Floats, p. 22
(insert after second paragraph, p. 22)
Variables of type "float16_t" represent floating-point values using exactly
16 bits and are stored using the 16-bit floating-point representation
described in the OpenGL Specification. Variables of type "float32_t"
and "float64_t" represent floating-point with 32 or 64 bits, and are
equivalent to "float" and "double" types, respectively.
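For illustration, the C sketch below decodes the bit pattern of a float16_t value into a 32-bit float. It assumes the 16-bit format is the one the GL uses for half-typed data (1 sign bit, 5 exponent bits with a bias of 15, 10 mantissa bits) and that the host float is IEEE-754 single precision; the helper name half_to_float is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: convert 16-bit floating-point bits to a float by
   building the equivalent IEEE-754 single-precision bit pattern. */
static float half_to_float(uint16_t h)
{
    uint32_t sign = (uint32_t)(h >> 15) << 31;
    uint32_t exponent = (h >> 10) & 0x1Fu;
    uint32_t mantissa = h & 0x3FFu;
    uint32_t bits;

    if (exponent == 0x1Fu) {
        /* Infinity or NaN: all-ones float exponent, mantissa kept. */
        bits = sign | 0x7F800000u | (mantissa << 13);
    } else if (exponent != 0) {
        /* Normal number: rebias exponent from 15 to 127 (+112). */
        bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
    } else if (mantissa == 0) {
        bits = sign; /* signed zero */
    } else {
        /* Denormal: value is mantissa * 2^-24, exact in single precision. */
        float magnitude = (float)mantissa / 16777216.0f;
        memcpy(&bits, &magnitude, sizeof bits);
        bits |= sign;
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, 0x3C00 decodes to 1.0 and 0x7BFF to 65504.0, the largest finite 16-bit value.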
Modify Section 4.1.7, Samplers, p. 23
(modify 1st paragraph of the section, deleting the restriction requiring
constant indexing of sampler arrays) ... Samplers may be aggregated into
arrays within a shader (using square brackets [ ]) and can be indexed with
general integer expressions. The results of accessing a sampler array
with an out-of-bounds index are undefined. ...
(remove the additional restriction added by ARB_gpu_shader5 making a
similar edit requiring uniform indexing across shader invocations for
defined results. NV_gpu_shader5 has no such limitation.)
Modify Section 4.1.10, Implicit Conversions, p. 27
(modify table of implicit conversions)
Can be implicitly
Type of expression converted to
-------------------- -------------------------------
int uint, float, double(*)
ivec2 uvec2, vec2, dvec2(*)
ivec3 uvec3, vec3, dvec3(*)
ivec4 uvec4, vec4, dvec4(*)
int8_t int16_t int, int64_t, uint, uint64_t,
float, double(*)
i8vec2 i16vec2 ivec2, i64vec2, uvec2, u64vec2,
vec2, dvec2(*)
i8vec3 i16vec3 ivec3, i64vec3, uvec3, u64vec3,
vec3, dvec3(*)
i8vec4 i16vec4 ivec4, i64vec4, uvec4, u64vec4,
vec4, dvec4(*)
int64_t uint64_t, double(*)
i64vec2 u64vec2, dvec2(*)
i64vec3 u64vec3, dvec3(*)
i64vec4 u64vec4, dvec4(*)
uint float, double(*)
uvec2 vec2, dvec2(*)
uvec3 vec3, dvec3(*)
uvec4 vec4, dvec4(*)
uint8_t uint16_t uint, uint64_t, float, double(*)
u8vec2 u16vec2 uvec2, u64vec2, vec2, dvec2(*)
u8vec3 u16vec3 uvec3, u64vec3, vec3, dvec3(*)
u8vec4 u16vec4 uvec4, u64vec4, vec4, dvec4(*)
uint64_t double(*)
u64vec2 dvec2(*)
u64vec3 dvec3(*)
u64vec4 dvec4(*)
float double(*)
vec2 dvec2(*)
vec3 dvec3(*)
vec4 dvec4(*)
float16_t float, double(*)
f16vec2 vec2, dvec2(*)
f16vec3 vec3, dvec3(*)
f16vec4 vec4, dvec4(*)
(*) if ARB_gpu_shader_fp64 is supported
(Note: Expressions of type "int32_t", "uint32_t", "float32_t", and
"float64_t" are treated as identical to those of type "int", "uint",
"float", and "double", respectively. Implicit conversions to and from
these explicitly-sized types are allowed whenever conversions involving
the equivalent base type are allowed.)
(modify second paragraph of the section) No implicit conversions are
provided to convert from unsigned to signed integer types, from
floating-point to integer types, from higher-precision to lower-precision
types, from 8-bit to 16-bit types, or between matrix types. There are no
implicit array or structure conversions.
(add before the final paragraph of the section, p. 27)
When performing
implicit conversion for binary operators, there may be multiple data types
to which the two operands can be converted. For example, when adding an
int8_t value to a uint16_t value, both values can be implicitly converted
to uint, uint64_t, float, and double. In such cases, a floating-point
type is chosen if either operand has a floating-point type. Otherwise, an
unsigned integer type is chosen if either operand has an unsigned integer
type. Otherwise, a signed integer type is chosen. If operands can be
converted to both 32- and 64-bit versions of the chosen base data type,
the 32-bit version is used.
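The operand-type selection rule above can be sketched in C. To avoid restating the conversion table, the hypothetical helper below takes as input the list of types that both operands can reach (read off the table in this section) and applies only the tie-break: floating-point if either operand is floating-point, else unsigned if either is unsigned, else signed, preferring the 32-bit version over the 64-bit one. The enum and function names are illustrative, not part of any GL API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of GLSL scalar types. */
typedef enum {
    T_INT8, T_INT16, T_INT, T_INT64,
    T_UINT8, T_UINT16, T_UINT, T_UINT64,
    T_FLOAT16, T_FLOAT, T_DOUBLE
} ScalarType;

typedef enum { K_SIGNED, K_UNSIGNED, K_FLOAT } Kind;

static Kind kind_of(ScalarType t)
{
    if (t <= T_INT64) return K_SIGNED;
    if (t <= T_UINT64) return K_UNSIGNED;
    return K_FLOAT;
}

static int bits_of(ScalarType t)
{
    static const int bits[] = { 8, 16, 32, 64, 8, 16, 32, 64, 16, 32, 64 };
    return bits[t];
}

/* `common` lists the types BOTH operands can be implicitly converted to.
   Returns the chosen type, or -1 if no candidate has the wanted kind. */
static int choose_common_type(ScalarType a, ScalarType b,
                              const ScalarType *common, size_t n)
{
    Kind want;
    if (kind_of(a) == K_FLOAT || kind_of(b) == K_FLOAT)
        want = K_FLOAT;
    else if (kind_of(a) == K_UNSIGNED || kind_of(b) == K_UNSIGNED)
        want = K_UNSIGNED;
    else
        want = K_SIGNED;

    int best = -1;
    for (size_t i = 0; i < n; i++) {
        if (kind_of(common[i]) != want)
            continue;
        /* Implicit conversions only target 32- or 64-bit types, so
           keeping the narrowest candidate prefers the 32-bit version. */
        if (best < 0 || bits_of(common[i]) < bits_of((ScalarType)best))
            best = (int)common[i];
    }
    return best;
}
```

With the example from the text, an int8_t plus a uint16_t has the common candidates uint, uint64_t, float, and double, and the helper selects uint.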
Modify Section 4.3.4, Inputs, p. 31
(modify third paragraph of section, p. 31, allowing explicitly-sized
types) ... Vertex shader input variables can only be signed and unsigned
integers, floats, doubles, explicitly-sized integers and floating-point
values, vectors of any of these types, and matrices. ...
(modify edits done in ARB_tessellation_shader adding support for "patch
in", allowing for geometry shaders as well) Additionally, tessellation
evaluation and geometry shaders support per-patch input variables declared
with the "patch in" qualifier. Per-patch input ...
(modify third paragraph, p. 32) ... Fragment inputs can only be signed and
unsigned integers, floats, doubles, explicitly-sized integers and
floating-point values, vectors of any of these types, matrices, or arrays
or structures of these. Fragment inputs declared as signed or unsigned
integers, doubles, 64-bit floating-point values, including vectors,
matrices, or arrays derived from those types, must be qualified as "flat".
Modify Section 4.3.6, Outputs, p. 33
(modify third paragraph of the section, p. 33) ... They can only be signed
and unsigned integers, floats, doubles, explicitly-sized integers and
floating-point values, vectors of any of these types, matrices, or arrays
or structures of these.
(modify last paragraph, p. 33) ... Fragment outputs can only be signed
and unsigned integers, floats, explicitly-sized integers and
floating-point values with 32 or fewer bits, vectors of any of these
types, or arrays of these. Doubles, 64-bit integers or floating-point
values, vectors or arrays of those types, matrices, and structures cannot
be output. ...
Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37
(add to the list of qualifiers for geometry shaders, p. 37)
layout-qualifier-id:
...
triangles_adjacency
patches
(modify the "size of input arrays" table, p. 38)
Layout Size of Input Arrays
------------ --------------------
patches gl_MaxPatchVertices
(add paragraph below that table, p. 38)
When using the input primitive type "patches", the geometry shader is used
to process a set of patches with vertex counts that may vary from patch to
patch. For the purposes of input array sizing, patches are treated as
having a vertex count fixed at the implementation-dependent maximum patch
size, gl_MaxPatchVertices. If a shader reads an input corresponding to a
vertex not found in the patch being processed, the values read are
undefined.
Modify Section 5.4.1, Conversion and Scalar Constructors, p. 49
(add after first list of constructor examples)
Similar constructors are provided to convert to and from explicitly-sized
scalar data types, as well:
float(uint8_t) // converts an 8-bit uint value to a float
int64_t(double) // converts a double value to a 64-bit int
float64_t(int16_t) // converts a 16-bit int value to a 64-bit float
uint16_t(bool) // converts a Boolean value to a 16-bit uint
(replace final two paragraphs, p. 49, and the first paragraph, p. 50,
using more general language)
When constructors are used to convert any floating-point type to any
integer type, the fractional part of the floating-point value is dropped.
It is undefined to convert a negative floating-point value to an unsigned
integer type.
When a constructor is used to convert any integer or floating-point type
to bool, 0 and 0.0 are converted to false, and non-zero values are
converted to true. When a constructor is used to convert a bool to any
integer or floating-point type, false is converted to 0 or 0.0, and true
is converted to 1 or 1.0.
Constructors converting between signed and unsigned integers with the same
bit count always preserve the bit pattern of the input. This will change
the value of the argument if its most significant bit is set, converting a
negative signed integer to a large unsigned integer, or vice versa.
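The constructor rules above have close C analogues, sketched below with hypothetical helper names. Signed/unsigned conversions of the same width keep the bit pattern, and float-to-integer conversion drops the fractional part. The undefined negative-float-to-unsigned case is deliberately not modeled (it is also undefined behavior in C).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Like uint16_t(int16_t): C defines this conversion modulo 2^16, which
   keeps the two's-complement bit pattern. */
static uint16_t u16_from_i16(int16_t v) { return (uint16_t)v; }

/* Like int16_t(uint16_t): copy the bits directly, avoiding C's
   implementation-defined conversion of out-of-range values to signed. */
static int16_t i16_from_u16(uint16_t v)
{
    int16_t r;
    memcpy(&r, &v, sizeof r);
    return r;
}

/* Like int(float) for in-range values: the fractional part is dropped,
   i.e. truncation toward zero. */
static int32_t int_from_float(float f) { return (int32_t)f; }
```

For instance, -1 as an int16_t becomes 0xFFFF as a uint16_t, and 0x8000 as a uint16_t becomes -32768 as an int16_t.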
Modify Section 5.9, Expressions, p. 57
(modify bulleted list as follows, adding support for expressions with
64-bit integer types)
Expressions in the shading language are built from the following:
* Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
types, and all matrix types.
...
* The arithmetic binary operators add (+), subtract (-), multiply (*), and
divide (/) operate on 32-bit integer, 64-bit integer, and floating-point
scalars, vectors, and matrices. If the fundamental types of the
operands do not match, the conversions from Section 4.1.10 "Implicit
Conversions" are applied to produce matching types. ...
* The operator modulus (%) operates on 32- and 64-bit integer scalars or
vectors. If the fundamental types of the operands do not match, the
conversions from Section 4.1.10 "Implicit Conversions" are applied to
produce matching types. ...
* The arithmetic unary operators negate (-), post- and pre-increment and
decrement (-- and ++) operate on 32-bit integer, 64-bit integer, and
floating-point values (including vectors and matrices). ...
* The relational operators greater than (>), less than (<), greater than or
equal (>=), and less than or equal (<=) operate only on scalar 32-bit
integer, 64-bit integer, and floating-point expressions. The result is
scalar Boolean. The
fundamental type of the two operands must match, either as specified, or
after one of the implicit type conversions specified in Section 4.1.10.
...
* The equality operators equal (==), and not equal (!=) operate only on
scalar 32-bit integer, 64-bit integer, and floating-point expressions.
The result is scalar Boolean. The fundamental type of the two operands
must match, either as specified, or after one of the implicit type
conversions specified in Section 4.1.10. ...
Modify Section 6.1, Function Definitions, p. 63
(ARB_gpu_shader5 adds a set of rules for defining whether implicit
conversions for one matching function definition are better or worse than
those for another. These comparisons are done argument by argument.
Extend the edits made by ARB_gpu_shader5 to add several new rules for
comparing implicit conversions for a single argument, corresponding to the
new data types introduced by this extension.)
To determine whether the conversion for a single argument in one match is
better than that for another match, the following rules are applied, in
order:
1. An exact match is better than a match involving any implicit
conversion.
2. A match involving a conversion from a signed integer, unsigned
integer, or floating-point type to a similar type having a larger
number of bits is better than a match involving any other implicit
conversion.
The set of conversions qualifying under this rule are:
source types destination types
----------------- -----------------
int8_t, int16_t int, int64_t
int int64_t
uint8_t, uint16_t uint, uint64_t
uint uint64_t
float16_t float
float double
3. A match involving one conversion in rule 2 is better than a match
involving another conversion in rule 2 if:
(a) both conversions start with the same type and the first
conversion is to a type with a smaller number of bits (e.g.,
converting from int16_t to int is preferred to converting
int16_t to int64_t), or
(b) both conversions end with the same type and the first
conversion is from a type with a larger number of bits (e.g.,
converting an "out" parameter from int16_t to int is preferred
to converting from int8_t to int).
4. A match involving an implicit conversion from any integer type to
float is better than a match involving an implicit conversion from
any integer type to double.
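As an illustration only, the four rules above can be modeled on the host as a C comparison between two candidate conversions for a single argument. The types Kind, Type, and Conv and the function conversion_is_better() are hypothetical names for this sketch, not part of any GL or GLSL API. The sketch covers only the conversions named in rules 1 through 4 and treats any other pair of conversions as equally good. For an "out" parameter, the Conv passed in is assumed to describe the conversion actually performed, from the function's parameter type to the caller's variable type.

```c
#include <stdbool.h>

/* Hypothetical model of a scalar GLSL type: fundamental kind plus size in
   bits (e.g., int16_t is {KIND_INT, 16}, double is {KIND_FLOAT, 64}). */
typedef enum { KIND_INT, KIND_UINT, KIND_FLOAT } Kind;
typedef struct { Kind kind; int bits; } Type;

/* One implicit conversion applied to a single argument. */
typedef struct { Type from; Type to; } Conv;

static bool same_type(Type a, Type b)
{
    return a.kind == b.kind && a.bits == b.bits;
}

/* Rule 2 table: widening within the same kind. Integers widen only to
   32 or 64 bits; floats widen only 16 -> 32 and 32 -> 64. */
static bool is_promotion(Conv c)
{
    if (c.from.kind != c.to.kind || c.to.bits <= c.from.bits)
        return false;
    if (c.from.kind == KIND_FLOAT)
        return c.to.bits == 2 * c.from.bits;
    return c.to.bits == 32 || c.to.bits == 64;
}

/* Any integer type converted to a floating-point type (used by rule 4). */
static bool is_int_to_float(Conv c)
{
    return c.from.kind != KIND_FLOAT && c.to.kind == KIND_FLOAT;
}

/* Returns true if conversion a is strictly better than conversion b,
   applying rules 1-4 in order. */
bool conversion_is_better(Conv a, Conv b)
{
    bool exact_a = same_type(a.from, a.to);
    bool exact_b = same_type(b.from, b.to);
    if (exact_a != exact_b)                  /* rule 1: exact match wins */
        return exact_a;

    bool promo_a = is_promotion(a);
    bool promo_b = is_promotion(b);
    if (promo_a != promo_b)                  /* rule 2: promotion wins */
        return promo_a;

    if (promo_a && promo_b) {                /* rule 3: both promotions */
        if (same_type(a.from, b.from))       /* 3(a): smaller destination */
            return a.to.bits < b.to.bits;
        if (same_type(a.to, b.to))           /* 3(b): larger source */
            return a.from.bits > b.from.bits;
        return false;
    }

    if (is_int_to_float(a) && is_int_to_float(b))   /* rule 4 */
        return a.to.bits == 32 && b.to.bits == 64;
    return false;
}
```

Overload resolution would apply this comparison argument by argument, as described in the ARB_gpu_shader5 edits this section extends.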
Modify Section 7.1, Vertex and Geometry Shader Special Variables, p. 69
(NOTE: These edits are written against the re-organized section in the
ARB_tessellation_shader specification.)
(add to the list of built-in inputs for geometry shaders) In the geometry
language, built-in input and output variables are intrinsically declared
as:
in int gl_PatchVerticesIn;
patch in float gl_TessLevelOuter[4];
patch in float gl_TessLevelInner[2];
...
The input variable gl_PatchVerticesIn behaves as in the identically-named
tessellation control and evaluation shader inputs.
The input variables gl_TessLevelOuter[] and gl_TessLevelInner[] behave as
in the identically-named tessellation evaluation shader inputs.
Modify Chapter 8, Built-in Functions, p. 81
(add to description of generic types, last paragraph of p. 69) ... Where
the input arguments (and corresponding output) can be int64_t, i64vec2,
i64vec3, or i64vec4, genI64Type is used as the argument. Where the
input arguments (and corresponding output) can be uint64_t, u64vec2,
u64vec3, or u64vec4, genU64Type is used as the argument.
Modify Section 8.3, Common Functions, p. 84
(add support for 64-bit integer packing and unpacking functions)
Syntax:
int64_t packInt2x32(ivec2 v);
uint64_t packUint2x32(uvec2 v);
ivec2 unpackInt2x32(int64_t v);
uvec2 unpackUint2x32(uint64_t v);
The functions packInt2x32() and packUint2x32() return a signed or unsigned
64-bit integer obtained by packing the components of a two-component
signed or unsigned integer vector, respectively. The first vector
component specifies the 32 least significant bits; the second component
specifies the 32 most significant bits.
The functions unpackInt2x32() and unpackUint2x32() return a signed or
unsigned integer vector built from a 64-bit signed or unsigned integer
scalar, respectively. The first component of the vector contains the 32
least significant bits of the input; the second component contains the 32
most significant bits.
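A host-side C sketch of the same bit layout may help when preparing or decoding such values on the CPU. The IVec2 and UVec2 structs and the snake_case function names are hypothetical stand-ins for the GLSL vector types and built-ins; only the ordering (first component in the 32 least significant bits) is taken from the text above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical host-side stand-ins for the GLSL ivec2 and uvec2 types. */
typedef struct { int32_t x, y; } IVec2;
typedef struct { uint32_t x, y; } UVec2;

/* First component -> 32 LSBs, second component -> 32 MSBs. */
uint64_t pack_uint_2x32(UVec2 v)
{
    return (uint64_t)v.x | ((uint64_t)v.y << 32);
}

UVec2 unpack_uint_2x32(uint64_t v)
{
    UVec2 r = { (uint32_t)(v & 0xFFFFFFFFu), (uint32_t)(v >> 32) };
    return r;
}

/* The signed variants move exactly the same bits; memcpy reinterprets the
   two's complement patterns without implementation-defined casts. */
int64_t pack_int_2x32(IVec2 v)
{
    UVec2 u;
    memcpy(&u.x, &v.x, sizeof u.x);
    memcpy(&u.y, &v.y, sizeof u.y);
    uint64_t bits = pack_uint_2x32(u);
    int64_t r;
    memcpy(&r, &bits, sizeof r);
    return r;
}

IVec2 unpack_int_2x32(int64_t v)
{
    uint64_t bits;
    memcpy(&bits, &v, sizeof bits);
    UVec2 u = unpack_uint_2x32(bits);
    IVec2 r;
    memcpy(&r.x, &u.x, sizeof r.x);
    memcpy(&r.y, &u.y, sizeof r.y);
    return r;
}
```

For example, packing (1, 2) yields 0x0000000200000001, and packing the signed vector (0, -1) yields -4294967296, because the all-ones second component lands in the upper 32 bits.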
(add support for 16-bit floating-point packing and unpacking functions)
Syntax:
uint packFloat2x16(f16vec2 v);
f16vec2 unpackFloat2x16(uint v);
The function packFloat2x16() returns an unsigned integer obtained by
interpreting the components of a two-component 16-bit floating-point
vector as 16-bit integers according to the OpenGL Specification, and then packing the two
16-bit integers into a 32-bit unsigned integer. The first vector
component specifies the 16 least significant bits of the result; the
second component specifies the 16 most significant bits.
The function unpackFloat2x16() returns a two-component vector with 16-bit
floating-point components obtained by unpacking a 32-bit unsigned integer
into a pair of 16-bit values, and interpreting those values as 16-bit
floating-point numbers according to the OpenGL Specification. The first
component of the vector is obtained from the 16 least significant bits of
the input; the second component is obtained from the 16 most significant
bits.
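The following C sketch mirrors the packing order and, assuming the 16-bit floating-point format described in the OpenGL Specification (1 sign bit, 5 exponent bits with a bias of 15, 10 mantissa bits), decodes a single component. F16Vec2Bits and the function names are hypothetical; because C has no portable 16-bit floating-point type, components are carried as raw uint16_t encodings.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical stand-in for f16vec2, holding raw 16-bit encodings. */
typedef struct { uint16_t x, y; } F16Vec2Bits;

/* First component -> 16 LSBs, second component -> 16 MSBs. */
uint32_t pack_float_2x16(F16Vec2Bits v)
{
    return (uint32_t)v.x | ((uint32_t)v.y << 16);
}

F16Vec2Bits unpack_float_2x16(uint32_t v)
{
    F16Vec2Bits r = { (uint16_t)(v & 0xFFFFu), (uint16_t)(v >> 16) };
    return r;
}

/* Decode one 16-bit float encoding into a 32-bit float. */
float half_bits_to_float(uint16_t h)
{
    unsigned sign = h >> 15;
    unsigned exponent = (h >> 10) & 0x1Fu;
    unsigned mantissa = h & 0x3FFu;
    float value;
    if (exponent == 0) {
        /* Zero or denormal: mantissa * 2^-24. */
        value = (float)mantissa / 16777216.0f;
    } else if (exponent == 31) {
        /* Infinity or NaN. */
        value = mantissa ? NAN : INFINITY;
    } else {
        /* Normal: (1024 + mantissa) * 2^(exponent - 25). */
        float significand = (float)(mantissa | 0x400u);
        if (exponent >= 25)
            value = significand * (float)(1u << (exponent - 25));
        else
            value = significand / (float)(1u << (25 - exponent));
    }
    return sign ? -value : value;
}
```

With this layout, the vector (1.0, 2.0), encoded as (0x3C00, 0x4000), packs to 0x40003C00.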
(add functions to get/set the bit encoding for floating-point values)
64-bit floating-point data types in the OpenGL shading language are
specified to be encoded according to the IEEE specification for
double-precision floating-point values. The functions below allow shaders
to convert double-precision floating-point values to and from 64-bit
signed or unsigned integers representing their encoding.
To obtain signed or unsigned integer values holding the encoding of a
floating-point value, use:
genI64Type doubleBitsToInt64(genDType value);
genU64Type doubleBitsToUint64(genDType value);
Conversions are done on a component-by-component basis.
To obtain a floating-point value corresponding to a signed or unsigned
integer encoding, use:
genDType int64BitsToDouble(genI64Type value);
genDType uint64BitsToDouble(genU64Type value);
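On the host, the equivalent reinterpretation of a single component can be sketched in C with memcpy, assuming (as on mainstream platforms) that C double uses the IEEE 754 binary64 encoding. The function names are hypothetical analogs of the GLSL built-ins; the GLSL versions additionally operate component-wise on vectors.

```c
#include <stdint.h>
#include <string.h>

/* Assumes sizeof(double) == 8 and an IEEE 754 binary64 encoding. */

int64_t double_bits_to_int64(double value)
{
    int64_t bits;
    memcpy(&bits, &value, sizeof bits);   /* copy the encoding unchanged */
    return bits;
}

uint64_t double_bits_to_uint64(double value)
{
    uint64_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

double int64_bits_to_double(int64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

double uint64_bits_to_double(uint64_t bits)
{
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}
```

Under these assumptions, 1.0 maps to 0x3FF0000000000000, and -0.0 maps to the signed value with only the sign bit set (INT64_MIN).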
(add functions to evaluate predicates over groups of threads)
Syntax:
bool anyThreadNV(bool value);
bool allThreadsNV(bool value);
bool allThreadsEqualNV(bool value);
Implementations of the OpenGL Shading Language may, but are not required
to, run multiple shader threads for a single stage as a SIMD thread group,
where individual execution threads are assigned to thread groups in an
undefined, implementation-dependent order. Algorithms may benefit from
being able to evaluate a composite of boolean values over all active
threads in the thread group.
The function anyThreadNV() returns true if and only if <value> is true for
at least one active thread in the group. The function allThreadsNV()
returns true if and only if <value> is true for all active threads in the
group. The function allThreadsEqualNV() returns true if <value> is the
same for all active threads in the group; the result of
allThreadsEqualNV() will be true if and only if anyThreadNV() and
allThreadsNV() would return the same value.
Since these functions depend on the values of <value> in an undefined
group of threads, the value returned by these functions is largely
undefined. However, anyThreadNV() is guaranteed to return true if
<value> is true, and allThreadsNV() is guaranteed to return false if
<value> is false.
Since implementations are generally not required to combine threads into
groups, simply returning <value> for anyThreadNV() and allThreadsNV() and
returning true for allThreadsEqualNV() is a legal implementation of these
functions.
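To make these semantics concrete, the sketch below models a single thread group in C: each lane contributes a value, an active mask selects which lanes participate, and every function returns the same result for all lanes of the group. The names are hypothetical, and the grouping is chosen by the caller, whereas a real implementation groups threads in an undefined order. With a group of one lane, the functions reduce to the legal trivial implementation described above.

```c
#include <stdbool.h>
#include <stddef.h>

/* value[i] is the <value> argument seen by lane i; active[i] says whether
   lane i is active. Inactive lanes do not participate in the vote. */
bool any_thread(const bool *value, const bool *active, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        if (active[i] && value[i])
            return true;
    return false;
}

bool all_threads(const bool *value, const bool *active, size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        if (active[i] && !value[i])
            return false;
    return true;
}

/* True exactly when any_thread() and all_threads() agree, i.e. when
   <value> is the same for every active lane. */
bool all_threads_equal(const bool *value, const bool *active, size_t lanes)
{
    return any_thread(value, active, lanes) == all_threads(value, active, lanes);
}
```

For the values (true, false, true, true) with all four lanes active, any_thread() is true and the other two functions are false; deactivating lane 1 makes all three true.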
Modify Section 8.6, Vector Relational Functions, p. 90
(modify the first paragraph, p. 90, adding support for relational
functions operating on explicitly-sized types)
Relational and equality operators (<, <=, >, >=, ==, !=) are defined (or
reserved) to operate on scalars and produce scalar Boolean results. For
vector results, use the following built-in functions. In the definitions
below, the following terms are used as placeholders for all vector types
for a given fundamental data type:
placeholder fundamental types
----------- ------------------------------------------------
bvec bvec2, bvec3, bvec4
ivec ivec2, ivec3, ivec4, i8vec2, i8vec3, i8vec4,
i16vec2, i16vec3, i16vec4, i64vec2, i64vec3, i64vec4
uvec uvec2, uvec3, uvec4, u8vec2, u8vec3, u8vec4,
u16vec2, u16vec3, u16vec4, u64vec2, u64vec3, u64vec4
vec vec2, vec3, vec4, dvec2(*), dvec3(*), dvec4(*),
f16vec2, f16vec3, f16vec4
(*) only if ARB_gpu_shader_fp64 is supported
In all cases, the sizes of the input and return vectors for any
particular call must match.
Modify Section 8.7, Texture Lookup Functions, p. 91
(modify text for textureOffset() functions, p. 94, allowing non-constant
offsets)
Do a texture lookup as in texture but with offset added to the (u,v,w)
texel coordinates before looking up each texel. The <offset> value need
not be constant; however, only a limited range of offset values is supported.
If any component of <offset> is less than MIN_PROGRAM_TEXEL_OFFSET_EXT or
greater than MAX_PROGRAM_TEXEL_OFFSET_EXT, the offset applied to the
texture coordinates is undefined. Note that offset does not apply to the
layer coordinate for texture arrays. This is explained in detail in
section 3.9.9 of the OpenGL Specification (Version 3.2, Compatibility
Profile), where offset is (delta_u, delta_v, delta_w). Note that texel
offsets are also not supported for cube maps.
(Note: This lifting of the constant offset restriction also applies to
texelFetchOffset, p. 95, textureProjOffset, p. 95, textureLodOffset,
p. 96, textureProjLodOffset, p. 96.)
(modify to the description of the textureGradOffset() functions, p. 97,
preserving the restriction on constant offsets)
Do a texture lookup with both explicit gradient and offset, as described
in textureGrad and textureOffset. For these functions, the offset value
must be a constant expression. A limited range of offset values is
supported; the minimum and maximum offset values are
implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET and
MAX_PROGRAM_TEXEL_OFFSET, respectively.
(modify to the description of the textureProjGradOffset() functions,
p. 98, preserving the restriction on constant offsets)
Do a texture lookup projectively and with explicit gradient as described
in textureProjGrad, as well as with offset, as described in textureOffset.
For these functions, the offset value must be a constant expression. A
limited range of offset values is supported; the minimum and maximum
offset values are implementation-dependent and given by
MIN_PROGRAM_TEXEL_OFFSET and MAX_PROGRAM_TEXEL_OFFSET, respectively.
Modify Section 9, Shading Language Grammar, p. 92
!!! TBD !!!
GLX Protocol
TBD
Dependencies on ARB_gpu_shader5
This extension also incorporates all the changes to the OpenGL Shading
Language made by ARB_gpu_shader5; enabling this extension by a #extension
directive in shader code also enables all features of ARB_gpu_shader5 as
though the shader code has also declared
#extension GL_ARB_gpu_shader5 : enable
The converse is not true; implementations supporting both extensions
should not provide the shading language features in this extension if
shader code #extension directives enable only ARB_gpu_shader5.
This specification and ARB_gpu_shader5 both lift the restriction in GLSL
1.50 requiring that indexing in arrays of samplers must be done with
constant expressions. However, ARB_gpu_shader5 specifies that results are
undefined if the indices would diverge if multiple shader invocations are
run in lockstep. This extension does not impose the non-divergent
indexing requirement.
Dependencies on ARB_gpu_shader_fp64
This extension and ARB_gpu_shader_fp64 both provide support for shading
language variables with 64-bit components. If both extensions are
supported, the various edits describing this new support should be
combined.
If ARB_gpu_shader_fp64 is not supported, the following edits should be
removed:
* language adding the data types "float64_t", "f64vec2", "f64vec3", and
"f64vec4";
* language allowing implicit conversions of various types to double,
dvec2, dvec3, or dvec4; and
* the built-in functions doubleBitsToInt64(), doubleBitsToUint64(),
int64BitsToDouble(), and uint64BitsToDouble().
Dependencies on ARB_tessellation_shader
If ARB_tessellation_shader is not supported, language introduced by this
extension describing processing patches in geometry shaders, transform
feedback, and rasterization should be removed.
If this extension and ARB_tessellation_shader are supported, it is legal
to send patches past the tessellation stage -- the following language from
ARB_tessellation_shader is removed:
Patch primitives are not supported by pipeline stages below the
tessellation evaluation shader. If there is no active program object or
the active program object does not contain a tessellation evaluation
shader, the error INVALID_OPERATION is generated by Begin (or vertex
array commands that implicitly call Begin) if the primitive mode is
PATCHES.
Dependencies on NV_shader_buffer_load
If NV_shader_buffer_load is supported, that specification should be edited
as follows, to allow pointers to dereference the new data types added by
this extension.
Modify "Section 2.20.X, Shader Memory Access" from NV_shader_buffer_load.
(add rules for loads of variables having the new data types from this
extension to the list of bullets following "When a shader dereferences a
pointer variable")
- Data of type "int8_t", "int16_t", "int32_t", and "int64_t" are read
from or written to memory as a single 8-, 16-, 32-, or 64-bit signed
integer value at the specified GPU address.
- Data of type "uint8_t", "uint16_t", "uint32_t", and "uint64_t" are read
from or written to memory as a single 8-, 16-, 32-, or 64-bit unsigned
integer value at the specified GPU address.
- Data of type "float16_t", "float32_t", and "float64_t" are read from or
written to memory as a single 16-, 32-, or 64-bit floating-point value
at the specified GPU address.
Dependencies on EXT_direct_state_access
If EXT_direct_state_access is supported, that specification should be
edited as follows to include new ProgramUniform* functions.
(modify the ProgramUniform* language)
The following commands:
....
void ProgramUniform{1,2,3,4}{i64,ui64}NV
(uint program, int location, T value);
void ProgramUniform{1,2,3,4}{i64,ui64}vNV
(uint program, int location, const T *value);
operate identically to the corresponding command where "Program" is
deleted from the name (and extension suffixes are dropped or updated
appropriately) except, rather than updating the currently active program
object, these "Program" commands update the program object named by the
<program> parameter. ...
Dependencies on EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit
The EXT_vertex_attrib_64bit extension provides the ability to declare
64-bit floating-point vertex attributes in a GLSL vertex shader and to
specify the values of these attributes via the OpenGL API. To
successfully compile vertex shaders with fp64 input variables, it is
necessary to include
#extension GL_EXT_vertex_attrib_64bit : enable
in the shader text.
However, this extension is also considered to enable 64-bit
floating-point and integer inputs. Including the following code in a
vertex shader
#extension GL_NV_gpu_shader5 : enable
will enable 64-bit floating-point or integer input variables whose values
would be specified using the OpenGL API mechanisms found in the
EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit extensions.
Errors
None.
New State
None.
New Implementation Dependent State
None.
Issues
(1) What implicit conversions are supported by this extension on top of
those provided by related extensions?
RESOLVED: ARB_gpu_shader5 and ARB_gpu_shader_fp64 provide new implicit
conversions from "int" to "uint", and from "int", "uint", and "float" to
"double".
This extension provides integer types of multiple sizes and supports
implicit conversions from small integer types to 32- or 64-bit integer
types of the same signedness, as well as float and double. It also
provides floating-point types of multiple sizes and supports implicit
conversions from smaller to larger types. Additionally, it supports
conversion from 64-bit integer types to double.
(2) How do these implicit conversions impact binary operators?
RESOLVED: For binary operators, we prefer converting to a common type
that is as close as possible in size and type to the original
expression.
(3) How do these implicit conversions impact function overloading rules?
RESOLVED: We extend the preference rules in ARB_gpu_shader5 to account
for the new data types, adding rules to:
* favor new "promotions" in integer/floating point types (previously,
the only promotion was float-to-double)
* for promotions, favor conversion to the type closer in size (e.g.,
prefer converting from int16_t to int over converting to int64_t)
(4) What should be done to distinguish between 32- and 64-bit integer
constants?
RESOLVED: We will use "L" and "UL" to identify signed and unsigned
64-bit integer constants; the use of "L" matches a similar ("long")
suffix in the C programming language. C leaves the size of integer
types implementation-dependent, and many implementations require an "LL"
suffix to declare 64-bit integer constants. With our size definitions,
"L" will be considered sufficient to make an integer constant 64-bit.
(5) Should we provide support for vertex attributes with 64-bit components,
and if so, how should the support be provided in the OpenGL API?
RESOLVED: Yes, this seems like useful functionality, particularly for
applications wanting to provide double-precision or 64-bit integer data
to shaders performing computations on such types. We provide
VertexAttribL* entry points for 64-bit components in the separate
EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit extensions,
which should be supported on all implementations supporting this extension.
(6) Should we allow vertex attributes with 8- or 16-bit components in the
shading language, and if so, how does it interact with the OpenGL API?
RESOLVED: Yes, but we will use existing APIs to specify such
attributes, which already typically allow 8- and 16-bit components on
the API side. Vertex attribute components (other than 64-bit ones)
specified by the API will be converted from the type specified in the
vertex attribute commands to the component type of the attribute. For
floating-point values, that may involve 16-to-32 bit conversion or vice
versa. For integer types, that may involve dropping all but the least
significant bits of attribute components.
(7) Should we support uniforms with double or 64-bit integer types, and
if so, how? Should we support uniforms with <32-bit components, and
if so, how?
RESOLVED: We will support uniforms of all component types, either in a
buffer object (via OpenGL 3.1 or ARB_uniform_buffer_object) or in
storage associated with the program.
When uniforms are stored in a buffer object, they are stored using their
native data types according to the pre-existing packing and layout
rules. Those rules were already written to be able to accommodate both
the larger and smaller new data types.
Uniforms stored in program objects are loaded with Uniform* APIs. There
are no pre-existing uniform APIs accepting doubles or other "long"
types, so there was no clear need to add an extra "L" to the name to
distinguish from other APIs like we do with VertexAttribL* APIs.
Uniforms with 8- and 16-bit components are loaded with the "larger"
Uniform*{i,ui,f} APIs; it didn't seem worth it to add numerous entry
points to the APIs to handle all those new types.
(8) How do the uniform loading commands introduced by this extension
interact with similar commands added by NV_shader_buffer_load?
RESOLVED: NV_shader_buffer_load provided the command Uniformui64NV to
load pointer uniforms with a single 64-bit unsigned integer. This
extension provides vectors of 64-bit unsigned integers, so we needed
Uniform{2,3,4}ui64NV commands. We chose to provide a Uniform1ui64NV
command, which will be functionally equivalent to Uniformui64NV.
(9) How will transform feedback work for capturing variables with double
or 64-bit components? Should we support transform feedback on
variables with components with fewer than 32 bits?
RESOLVED: Transform feedback will support variables with any component
size. Components with fewer than 32 bits are converted to their
equivalent 32-bit types.
For doubles and variables with 64-bit components, each component
captured will count as 64-bit values and occupy two components for the
purpose of component counting rules. This could be a problem for the
SEPARATE_ATTRIBS mode, since the minimum component limit is four, which
would not be sufficient to capture a dvec3 or dvec4. However,
implementations supporting this extension should also be able to support
ARB_transform_feedback3, which extends INTERLEAVED_ATTRIBS mode to
capture vertex attribute values interleaved into multiple buffers. That
functionality effectively obsoletes the SEPARATE_ATTRIBS mode, since it
is a functional superset.
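The counting rule can be sketched in C as follows. Varying, feedback_component_count(), and fits_separate() are hypothetical names; the per-attribute limit is passed in as a parameter, standing in for the implementation's MAX_TRANSFORM_FEEDBACK_SEPARATE_COMPONENTS value, whose minimum is four.

```c
#include <stdbool.h>

/* Hypothetical description of a captured variable: number of components
   and size of each component in bits. */
typedef struct { int components; int bits_per_component; } Varying;

/* Components narrower than 32 bits are converted to 32-bit types and
   count once; 64-bit components count as two components each. */
int feedback_component_count(Varying v)
{
    return v.components * (v.bits_per_component == 64 ? 2 : 1);
}

/* In SEPARATE_ATTRIBS mode, each captured variable must fit within the
   per-attribute component limit. */
bool fits_separate(Varying v, int max_separate_components)
{
    return feedback_component_count(v) <= max_separate_components;
}
```

With the minimum limit of four, a dvec2 (four components) still fits, while a dvec3 (six) or dvec4 (eight) does not, which is the problem described above.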
We considered support for capturing 8- and 16-bit values directly, which
had a number of problems. First, full byte addressing might impose both
alignment issues (e.g., capturing a uint8_t followed by a float might
misalign the float) and additional hardware implementation burdens. One
other option would be to pack multiple values into a 32-bit integer
(e.g., f16vec2 would be packed with .x in the LSBs and .y in the MSBs).
This could work, even with word addressing, but would require padding
for odd sizes (e.g., f16vec3 padded to two words, with the second word
holding only .z). It would also have endianness issues; packed values
would look like arrays of the corresponding smaller type on
little-endian systems, but not on big-endian ones.
(10) What precision will be used for computation, storage, and inter-stage
transfer of 8- and 16-bit component data types?
RESOLVED: The components may be considered to occupy a full 32 bits for
the purposes of input/output component count limits. 8- and 16-bit
values should, however, be passed at that precision.
(11) Is the new support for non-constant texel offsets completely
orthogonal?
RESOLVED: No. Non-constant offsets are not supported for the existing
functions textureGradOffset() and textureProjGradOffset(), or for the
new functions textureGatherOffsets() and shadowGatherOffsets().
(12) Should we provide functions like intBitsToFloat() that operate on
16-bit floating-point values?
RESOLVED: Not in this extension. Such conversions can be performed
using the following code:
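// Returns the 16-bit encoding of v. packFloat2x16() places .x in the 16
// least significant bits; the second component is zero, so the packed
// value fits in 16 bits and converting it to uint16_t loses nothing.
uint16_t float16BitsToUint16(float16_t v)
{
    return uint16_t(packFloat2x16(f16vec2(v, 0)));
}
// Returns the float16_t whose encoding is v. The 16 low bits of uint(v)
// become the .x component of the unpacked vector.
float16_t uint16BitsToFloat16(uint16_t v)
{
    return unpackFloat2x16(uint(v)).x;
}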
(13) Should we provide distinct sized types for 32-bit integers and
floats, and 64-bit floats? Should we provide those types as aliases
for existing unsized types? Or should we provide no such types at
all?
RESOLVED: We will provide sized versions of these types, which are
defined as completely equivalent to unsized types according to the
following table:
unsized type sized types
------------- ---------------
int int32_t
uint uint32_t
float float32_t
double float64_t
Vector types with sized and unsized components have equivalent
relationships.
Note that the nominally "unsized" data types in the GLSL 1.30 spec are
actually sized. The specification explicitly defines signed and unsigned
integers (int, uint) to be 32-bit values. It also defines
floating-point values to "match the IEEE single precision floating-point
definition for precision and dynamic range", which are also 32-bit
values.
This type equivalence has minor implications on function overloading:
* You can't declare separate versions of a function with an "int"
argument in one version and an "int32_t" argument in another.
* Because there is no implicit conversion between equivalent types, we
will get an exact match if an argument is declared with one type
(e.g., "int") in the caller and a textually different but equivalent
type ("int32_t") in the function.
Note that the type equivalence also applies to API data type queries.
For example, the type INT will be returned for a variable declared as
"int32_t".
(14) What are functions like anyThreadNV() and allThreadsNV() good for?
RESOLVED: If an implementation performs SIMD thread execution,
divergent branching may result in reduced performance if the "if" and
"else" blocks of an "if" statement are executed sequentially. For
example, an algorithm may have both a "fast path" that performs a
computation quickly for a subset of all cases and a "slow path" that
performs a computation more slowly but correctly for all cases. When
performing SIMD execution, code like the following:
if (condition) {
result = do_fast_path(...);
} else {
result = do_slow_path(...);
}
may end up executing *both* the fast and slow paths for a SIMD thread
group if <condition> diverges, and may execute more slowly than simply
executing the slow path unconditionally. These functions allow code
like:
if (allThreadsNV(condition)) {
result = do_fast_path(...);
} else {
result = do_slow_path(...);
}
that executes the fast path if and only if it can be used for *all*
threads in the group. For thread groups where <condition> diverges,
this algorithm would unconditionally run the slow path, but would never
run both in sequence.
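A toy cost model in C illustrates the difference, assuming that all lanes of the group are active and that a SIMD group executes a block whenever at least one lane needs it. PathsRun, naive_branch(), and voted_branch() are hypothetical names; real hardware may schedule divergent branches differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Which blocks one SIMD group ends up executing. */
typedef struct { bool ran_fast; bool ran_slow; } PathsRun;

/* if (condition) fast(); else slow();
   The group runs each block that at least one lane takes. */
PathsRun naive_branch(const bool *condition, size_t lanes)
{
    PathsRun r = { false, false };
    for (size_t i = 0; i < lanes; i++) {
        if (condition[i])
            r.ran_fast = true;
        else
            r.ran_slow = true;
    }
    return r;
}

/* if (allThreadsNV(condition)) fast(); else slow();
   The vote is uniform across the group, so exactly one block runs. */
PathsRun voted_branch(const bool *condition, size_t lanes)
{
    bool all = true;
    for (size_t i = 0; i < lanes; i++)
        all = all && condition[i];
    PathsRun r = { all, !all };
    return r;
}
```

For a divergent group such as (true, false, true, true), naive_branch() reports both paths executed while voted_branch() reports only the slow path; for a uniform group both report the same single path.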
There may be other cases where "voting" across shader invocations may be
useful. Note that we provide no control over how shader invocations may
be packed within a SIMD thread group, unlike various "compute" APIs
(CUDA, OpenCL).
(15) Can the 64-bit uniform APIs be used to load values for uniforms of
type "bool", "bvec2", "bvec3", or "bvec4"?
RESOLVED: No. OpenGL 2.0 and beyond did allow "bool" variables to be
set with Uniform*i* and Uniform*f APIs, and OpenGL 3.0 extended that
support to Uniform*ui* for orthogonality. But it seems pointless to
extend this capability forward to 64-bit Uniform APIs as well.
(19) The ARB_tessellation_shader extension adds support for patch
primitives that might survive to the transform feedback stage. How
are such primitives captured?
RESOLVED: If patch primitives survive to the transform feedback stage,
they are recorded on a patch-by-patch basis. Incomplete patches are not
recorded. As with other primitive types, if the transform feedback
buffers do not contain enough space to capture an entire patch, no
vertices are recorded.
Note that the only way to get patch primitives all the way to transform
feedback is to have tessellation evaluation and geometry shaders
disabled; the output streams from both of those shader stages are
collections of points, lines, or triangles.
(20) Previously, transform feedback allowed capturing only fixed-size
primitives; this extension supports variable-sized patches. What
interactions does this functionality have with transform feedback
buffer overflow?
RESOLVED: With fixed-size point, line, or triangle primitives, once any
primitive fails to be recorded due to insufficient space, all subsequent
primitives would also fail. With variable-size patch primitives, the
transform feedback stage might first receive a large patch that doesn't
fit, followed by a smaller patch that could squeeze into the remaining
space.
To allow for different types of implementation of this extension without
requiring special-case handling of this corner case, we've chosen to
leave this behavior undefined -- the smaller patch may or may not be
recorded.
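The two behaviors this resolution permits can be illustrated with a small C model of a feedback buffer measured in vertices. record_patches() and its stop_after_overflow flag are hypothetical; an implementation may behave either way, so portable code should not depend on which one it gets.

```c
#include <stdbool.h>
#include <stddef.h>

/* Records whole patches into a buffer holding `capacity` vertices and
   returns the number of vertices recorded. A patch that does not fit is
   dropped entirely. `stop_after_overflow` selects between the two allowed
   outcomes for later, smaller patches after the first overflow. */
size_t record_patches(const size_t *patch_vertices, size_t patch_count,
                      size_t capacity, bool stop_after_overflow)
{
    size_t used = 0;
    bool overflowed = false;
    for (size_t i = 0; i < patch_count; i++) {
        if (overflowed && stop_after_overflow)
            break;                          /* record nothing further */
        if (used + patch_vertices[i] <= capacity)
            used += patch_vertices[i];      /* whole patch recorded */
        else
            overflowed = true;              /* no vertices of this patch */
    }
    return used;
}
```

With patches of 3, 16, and 4 vertices and room for 10 vertices, the 16-vertex patch overflows; the result is 3 vertices if recording stops there, or 7 if the 4-vertex patch is squeezed in afterwards.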
Revision History
Rev.  Date      Author    Changes
----  --------  --------  -----------------------------------------
6     03/23/10  pbrown    Update overview, dependencies, remove references
                          to old extension names. Extend the function
                          overloading prioritization rules from
                          ARB_gpu_shader5 to account for new data types.
                          Major overhaul of the issues section to match
                          the refactoring done to produce ARB specs.
5     03/08/10  pbrown    Add interaction with EXT_vertex_attrib_64bit and
                          NV_vertex_attrib_integer_64bit; enabling this
                          extension automatically enables 64-bit floating-
                          point and integer vertex inputs.
4     03/01/10  pbrown    Fix prototype for GetUniformui64vNV.
3     01/14/10  pbrown    Fix with updated enum assignments.
2     12/08/09  pbrown    Add explicit component counting rules for
                          64-bit integer attributes similar to those
                          in the ARB_gpu_shader_fp64 spec.
1               pbrown    Internal revisions.