Name

    NV_gpu_shader5

Name Strings

    GL_NV_gpu_shader5

Contact

    Pat Brown, NVIDIA Corporation (pbrown 'at' nvidia.com)

Contributors

    Barthold Lichtenbelt, NVIDIA
    Chris Dodd, NVIDIA
    Eric Werness, NVIDIA
    Greg Roth, NVIDIA
    Jeff Bolz, NVIDIA
    Piers Daniell, NVIDIA

Status

    Shipping.

Version

    Last Modified Date:         03/23/2010
    NVIDIA Revision:            6

Number

    389

Dependencies

    This extension is written against the OpenGL 3.2 (Compatibility Profile)
    Specification.

    This extension is written against version 1.50 (revision 09) of the OpenGL
    Shading Language Specification.

    OpenGL 3.2 and GLSL 1.50 are required.

    ARB_gpu_shader5 is required.

    This extension interacts with ARB_gpu_shader5.

    This extension interacts with ARB_gpu_shader_fp64.

    This extension interacts with ARB_tessellation_shader.

    This extension interacts with NV_shader_buffer_load.

    This extension interacts with EXT_direct_state_access.

    This extension interacts with EXT_vertex_attrib_64bit and
    NV_vertex_attrib_integer_64bit.


Overview

    This extension provides a set of new features to the OpenGL Shading
    Language and related APIs to support capabilities of new GPUs.  Shaders
    using the new functionality provided by this extension should enable this
    functionality via the construct

      #extension GL_NV_gpu_shader5 : require     (or enable)

    This extension was developed concurrently with the ARB_gpu_shader5
    extension, and provides a superset of the features provided there.  The
    features common to both extensions are documented in the ARB_gpu_shader5
    specification; this document describes only the additional language
    features not available via ARB_gpu_shader5.  A shader that enables this
    extension via an #extension directive also implicitly enables the common
    capabilities provided by ARB_gpu_shader5.

    In addition to the capabilities of ARB_gpu_shader5, this extension
    provides a variety of new features for all shader types, including:

      * support for a full set of 8-, 16-, 32-, and 64-bit scalar and vector
        data types, including uniform API, uniform buffer object, and shader
        input and output support;

      * the ability to aggregate samplers into arrays, index these arrays with
        arbitrary expressions, and not require that non-constant indices be
        uniform across all shader invocations;

      * new built-in functions to pack and unpack 64-bit integer types into a
        two-component 32-bit integer vector;

      * new built-in functions to pack and unpack 32-bit unsigned integer
        types into a two-component 16-bit floating-point vector;

      * new built-in functions to convert double-precision floating-point
        values to or from their 64-bit integer bit encodings;

      * new built-in functions to compute the composite of a set of boolean
        conditions across a group of shader threads;

      * vector relational functions supporting comparisons of vectors of 8-,
        16-, and 64-bit integer types or 16-bit floating-point types; and

      * extending texel offset support to allow loading texel offsets from
        regular integer operands computed at run-time, except for lookups with
        gradients (textureGrad*).
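
    (Non-normative illustration.)  The bit-level effect of the 64-bit
    pack/unpack and double bit-encoding built-ins listed above can be
    sketched in C.  The helper names (pack_u64, unpack_u64, double_to_bits,
    bits_to_double) and the u32x2 struct are hypothetical stand-ins, not
    names defined by this extension.  The sketch assumes that the first
    vector component holds the 32 least significant bits, which this
    overview does not itself state, and that "double" uses the IEEE 754
    binary64 encoding.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two-component 32-bit vector, standing in for a GLSL uvec2. */
typedef struct { uint32_t x, y; } u32x2;

/* Assumption: x carries the 32 least significant bits, y the 32 most
 * significant bits. */
static uint64_t pack_u64(u32x2 v)
{
    return ((uint64_t)v.y << 32) | (uint64_t)v.x;
}

static u32x2 unpack_u64(uint64_t bits)
{
    u32x2 v;
    v.x = (uint32_t)(bits & 0xFFFFFFFFu);
    v.y = (uint32_t)(bits >> 32);
    return v;
}

/* Reinterpret a double as its 64-bit integer encoding, without changing
 * any bits. */
static uint64_t double_to_bits(double d)
{
    uint64_t bits;
    assert(sizeof bits == sizeof d);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Inverse of double_to_bits. */
static double bits_to_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

    Packing followed by unpacking returns the original two components, and
    the bit-encoding helpers round-trip any double value unchanged.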

    This extension also provides additional support for processing patch
    primitives (introduced by ARB_tessellation_shader).
    ARB_tessellation_shader requires the use of a tessellation evaluation
    shader when processing patches, which means that patches will never
    survive past the tessellation pipeline stage.  This extension lifts that
    restriction, and allows patches to proceed further in the pipeline and be
    used

      * as input to a geometry shader, using a new "patches" layout qualifier;

      * as input to transform feedback;

      * by fixed-function rasterization stages, in which case the patches are
        drawn as independent points.

    Additionally, it allows geometry shaders to read per-patch attributes
    written by a tessellation control shader using input variables declared
    with "patch in".


New Procedures and Functions

    void Uniform1i64NV(int location, int64EXT x);
    void Uniform2i64NV(int location, int64EXT x, int64EXT y);
    void Uniform3i64NV(int location, int64EXT x, int64EXT y, int64EXT z);
    void Uniform4i64NV(int location, int64EXT x, int64EXT y, int64EXT z,
                       int64EXT w);
    void Uniform1i64vNV(int location, sizei count, const int64EXT *value);
    void Uniform2i64vNV(int location, sizei count, const int64EXT *value);
    void Uniform3i64vNV(int location, sizei count, const int64EXT *value);
    void Uniform4i64vNV(int location, sizei count, const int64EXT *value);

    void Uniform1ui64NV(int location, uint64EXT x);
    void Uniform2ui64NV(int location, uint64EXT x, uint64EXT y);
    void Uniform3ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z);
    void Uniform4ui64NV(int location, uint64EXT x, uint64EXT y, uint64EXT z,
                       uint64EXT w);
    void Uniform1ui64vNV(int location, sizei count, const uint64EXT *value);
    void Uniform2ui64vNV(int location, sizei count, const uint64EXT *value);
    void Uniform3ui64vNV(int location, sizei count, const uint64EXT *value);
    void Uniform4ui64vNV(int location, sizei count, const uint64EXT *value);

    void GetUniformi64vNV(uint program, int location, int64EXT *params);


    (The following function is also provided by NV_shader_buffer_load.)

    void GetUniformui64vNV(uint program, int location, uint64EXT *params);


    (All of the following ProgramUniform* functions are supported if and only
     if EXT_direct_state_access is supported.)

    void ProgramUniform1i64NV(uint program, int location, int64EXT x);
    void ProgramUniform2i64NV(uint program, int location, int64EXT x,
                              int64EXT y);
    void ProgramUniform3i64NV(uint program, int location, int64EXT x, 
                              int64EXT y, int64EXT z);
    void ProgramUniform4i64NV(uint program, int location, int64EXT x, 
                              int64EXT y, int64EXT z, int64EXT w);
    void ProgramUniform1i64vNV(uint program, int location, sizei count,
                               const int64EXT *value);
    void ProgramUniform2i64vNV(uint program, int location, sizei count,
                               const int64EXT *value);
    void ProgramUniform3i64vNV(uint program, int location, sizei count,
                               const int64EXT *value);
    void ProgramUniform4i64vNV(uint program, int location, sizei count,
                               const int64EXT *value);

    void ProgramUniform1ui64NV(uint program, int location, uint64EXT x);
    void ProgramUniform2ui64NV(uint program, int location, uint64EXT x, 
                               uint64EXT y);
    void ProgramUniform3ui64NV(uint program, int location, uint64EXT x, 
                               uint64EXT y, uint64EXT z);
    void ProgramUniform4ui64NV(uint program, int location, uint64EXT x, 
                               uint64EXT y, uint64EXT z, uint64EXT w);
    void ProgramUniform1ui64vNV(uint program, int location, sizei count,
                                const uint64EXT *value);
    void ProgramUniform2ui64vNV(uint program, int location, sizei count,
                                const uint64EXT *value);
    void ProgramUniform3ui64vNV(uint program, int location, sizei count,
                                const uint64EXT *value);
    void ProgramUniform4ui64vNV(uint program, int location, sizei count, 
                                const uint64EXT *value);


New Tokens

    Returned by the <type> parameter of GetActiveAttrib, GetActiveUniform, and
    GetTransformFeedbackVarying:

        INT64_NV                                        0x140E
        UNSIGNED_INT64_NV                               0x140F

        INT8_NV                                         0x8FE0
        INT8_VEC2_NV                                    0x8FE1
        INT8_VEC3_NV                                    0x8FE2
        INT8_VEC4_NV                                    0x8FE3
        INT16_NV                                        0x8FE4
        INT16_VEC2_NV                                   0x8FE5
        INT16_VEC3_NV                                   0x8FE6
        INT16_VEC4_NV                                   0x8FE7
        INT64_VEC2_NV                                   0x8FE9
        INT64_VEC3_NV                                   0x8FEA
        INT64_VEC4_NV                                   0x8FEB
        UNSIGNED_INT8_NV                                0x8FEC
        UNSIGNED_INT8_VEC2_NV                           0x8FED
        UNSIGNED_INT8_VEC3_NV                           0x8FEE
        UNSIGNED_INT8_VEC4_NV                           0x8FEF
        UNSIGNED_INT16_NV                               0x8FF0
        UNSIGNED_INT16_VEC2_NV                          0x8FF1
        UNSIGNED_INT16_VEC3_NV                          0x8FF2
        UNSIGNED_INT16_VEC4_NV                          0x8FF3
        UNSIGNED_INT64_VEC2_NV                          0x8FF5
        UNSIGNED_INT64_VEC3_NV                          0x8FF6
        UNSIGNED_INT64_VEC4_NV                          0x8FF7
        FLOAT16_NV                                      0x8FF8
        FLOAT16_VEC2_NV                                 0x8FF9
        FLOAT16_VEC3_NV                                 0x8FFA
        FLOAT16_VEC4_NV                                 0x8FFB

    (If ARB_tessellation_shader is supported, the following enum is accepted
     by a new primitive.)

    Accepted by the <primitiveMode> parameter of BeginTransformFeedback:

        PATCHES
          


Additions to Chapter 2 of the OpenGL 3.2 (Compatibility Profile) Specification
(OpenGL Operation)

    Modify Section 2.6.1, Begin and End, p. 22

    (Extend language describing PATCHES introduced by ARB_tessellation_shader.
    In particular, add the following to the end of the description of the
    primitive type.)

    If a patch primitive is drawn, each patch is drawn separately as a
    collection of points, with each patch vertex defining a separate point.
    Extra vertices from an incomplete patch are never drawn.


    Modify Section 2.14.3, Vertex Attributes, p. 86

    (modify the second paragraph, p. 87) ... exceeds MAX_VERTEX_ATTRIBS.  For
    the purposes of this comparison, attribute variables of the type i64vec3,
    u64vec3, i64vec4, and u64vec4 count as consuming twice as many attributes
    as equivalent single-precision types.
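
    (Non-normative example, not part of the edits to the specification.)
    The counting rule above can be sketched as a small C helper.  The Attrib
    struct and function names are hypothetical.  The sketch assumes that a
    single-precision scalar or vector attribute consumes one attribute, and
    it does not model matrices or arrays.

```c
#include <assert.h>

typedef struct {
    int components;    /* 1 for a scalar, 2..4 for a vector */
    int is_64bit_int;  /* nonzero for int64_t/uint64_t based types */
} Attrib;

/* Attributes consumed by one scalar or vector attribute variable. */
static int attrib_slots(Attrib a)
{
    assert(a.components >= 1 && a.components <= 4);
    /* Only three- and four-component 64-bit integer vectors count double. */
    return (a.is_64bit_int && a.components >= 3) ? 2 : 1;
}

/* Total to compare against MAX_VERTEX_ATTRIBS. */
static int total_attrib_slots(const Attrib *attribs, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += attrib_slots(attribs[i]);
    return total;
}
```

    For a shader declaring a vec4, a u64vec2, an i64vec3, and a u64vec4, this
    sketch reports 1 + 1 + 2 + 2 = 6 attributes.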


    (extend the list of types in the first paragraph, p. 88)
    ... UNSIGNED_INT_VEC3, UNSIGNED_INT_VEC4, INT8_NV, INT8_VEC2_NV,
    INT8_VEC3_NV, INT8_VEC4_NV, INT16_NV, INT16_VEC2_NV, INT16_VEC3_NV,
    INT16_VEC4_NV, INT64_NV, INT64_VEC2_NV, INT64_VEC3_NV, INT64_VEC4_NV,
    UNSIGNED_INT8_NV, UNSIGNED_INT8_VEC2_NV, UNSIGNED_INT8_VEC3_NV,
    UNSIGNED_INT8_VEC4_NV, UNSIGNED_INT16_NV, UNSIGNED_INT16_VEC2_NV,
    UNSIGNED_INT16_VEC3_NV, UNSIGNED_INT16_VEC4_NV, UNSIGNED_INT64_NV,
    UNSIGNED_INT64_VEC2_NV, UNSIGNED_INT64_VEC3_NV, UNSIGNED_INT64_VEC4_NV,
    FLOAT16_NV, FLOAT16_VEC2_NV, FLOAT16_VEC3_NV, or FLOAT16_VEC4_NV.


    Modify Section 2.14.4, Uniform Variables, p. 89

    (modify third paragraph, p. 90) ... uniform variable storage for a vertex
    shader.  A scalar or vector uniform with 64-bit integer components
    will consume no more than 2<n> components, where <n> is 1 for scalars, and
    the component count for vectors.  A link error is generated ...

    (add to Table 2.13, p. 96)

      Type Name Token           Keyword
      --------------------      ----------------
      INT8_NV                   int8_t
      INT8_VEC2_NV              i8vec2
      INT8_VEC3_NV              i8vec3
      INT8_VEC4_NV              i8vec4
      INT16_NV                  int16_t
      INT16_VEC2_NV             i16vec2
      INT16_VEC3_NV             i16vec3
      INT16_VEC4_NV             i16vec4
      INT64_NV                  int64_t
      INT64_VEC2_NV             i64vec2
      INT64_VEC3_NV             i64vec3
      INT64_VEC4_NV             i64vec4
      UNSIGNED_INT8_NV          uint8_t
      UNSIGNED_INT8_VEC2_NV     u8vec2
      UNSIGNED_INT8_VEC3_NV     u8vec3
      UNSIGNED_INT8_VEC4_NV     u8vec4
      UNSIGNED_INT16_NV         uint16_t
      UNSIGNED_INT16_VEC2_NV    u16vec2
      UNSIGNED_INT16_VEC3_NV    u16vec3
      UNSIGNED_INT16_VEC4_NV    u16vec4
      UNSIGNED_INT64_NV         uint64_t
      UNSIGNED_INT64_VEC2_NV    u64vec2
      UNSIGNED_INT64_VEC3_NV    u64vec3
      UNSIGNED_INT64_VEC4_NV    u64vec4
      FLOAT16_NV                float16_t
      FLOAT16_VEC2_NV           f16vec2
      FLOAT16_VEC3_NV           f16vec3
      FLOAT16_VEC4_NV           f16vec4

    (modify list of commands at the bottom of p. 99)

      void Uniform{1,2,3,4}{i64,ui64}NV(int location, T value);
      void Uniform{1,2,3,4}{i64,ui64}vNV(int location, sizei count,
                                         const T *value);

    (insert after fourth paragraph, p. 100) The Uniform*i64{v}NV and
    Uniform*ui64{v}NV commands will load <count> sets of one to four 64-bit
    signed or unsigned integer values into a uniform location defined as a
    64-bit signed or unsigned integer scalar or vector type.
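
    (Non-normative example, not part of the edits to the specification.)
    The calling convention of the vector commands can be sketched without a
    GL context by passing the entry point as a function pointer.  Everything
    named here is hypothetical: the typedefs stand in for what a real
    program would get from its GL headers and extension loader, MAX_SETS is
    an arbitrary cap, and record_uniform2i64v is a recording stub that takes
    the place of Uniform2i64vNV.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the GL typedefs and entry-point type. */
typedef int64_t int64EXT;
typedef void (*Uniform2i64vFn)(int location, int count,
                               const int64EXT *value);

enum { MAX_SETS = 16 };  /* arbitrary cap for this sketch */

/* Load `count` elements of an i64vec2 uniform array with one call.
 * Each set is two consecutive values: x0, y0, x1, y1, ... */
static void load_i64vec2_array(Uniform2i64vFn uniform2i64v, int location,
                               const int64EXT *xs, const int64EXT *ys,
                               int count)
{
    int64EXT flat[2 * MAX_SETS];
    assert(count >= 1 && count <= MAX_SETS);
    for (int i = 0; i < count; ++i) {
        flat[2 * i + 0] = xs[i];
        flat[2 * i + 1] = ys[i];
    }
    uniform2i64v(location, count, flat);
}

/* Recording stub used in place of a real GL entry point. */
static int      g_location, g_count;
static int64EXT g_values[2 * MAX_SETS];

static void record_uniform2i64v(int location, int count,
                                const int64EXT *value)
{
    g_location = location;
    g_count = count;
    for (int i = 0; i < 2 * count; ++i)
        g_values[i] = value[i];
}
```

    In a real program the first argument would be the loaded Uniform2i64vNV
    entry point and <location> would come from GetUniformLocation.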


    (modify "Uniform Buffer Object Storage", p. 102, adding two bullets after
     the last "Members of type", and modifying the subsequent bullet)

     * Members of type int8_t, int16_t, and int64_t are extracted from a
       buffer object by reading a single byte, short, or int64-typed value at
       the specified offset.

     * Members of type uint8_t, uint16_t, and uint64_t are extracted from a
       buffer object by reading a single ubyte, ushort, or uint64-typed value
       at the specified offset.

     * Members of type float16_t are extracted from a buffer object by reading
       a single half-typed value at the specified offset.

     * Vectors with N elements with basic data types of bool, int, uint,
       float, double, int8_t, int16_t, int64_t, uint8_t, uint16_t, uint64_t,
       or float16_t are extracted as N values in consecutive memory locations
       beginning at the specified offset, with components stored in order with
       the first (X) component at the lowest offset. The GL data type used for
       component extraction is derived according to the rules for scalar
       members above.
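
    (Non-normative example, not part of the edits to the specification.)
    The extraction rules above can be mimicked on a client-side copy of a
    buffer's data store.  BufferView and the read_* helpers are hypothetical
    names.  The sketch assumes the data is in the byte order of the machine
    running the code, and it leaves float16_t members as raw 16-bit
    patterns.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A client-side view of a buffer object's data store. */
typedef struct {
    const unsigned char *data;
    size_t               size;
} BufferView;

/* Copy n bytes starting at `offset`; memcpy is safe for unaligned reads. */
static void read_at(BufferView b, size_t offset, void *out, size_t n)
{
    assert(offset <= b.size && n <= b.size - offset);
    memcpy(out, b.data + offset, n);
}

static int8_t read_int8(BufferView b, size_t off)
{
    int8_t v;
    read_at(b, off, &v, sizeof v);
    return v;
}

static int16_t read_int16(BufferView b, size_t off)
{
    int16_t v;
    read_at(b, off, &v, sizeof v);
    return v;
}

static int64_t read_int64(BufferView b, size_t off)
{
    int64_t v;
    read_at(b, off, &v, sizeof v);
    return v;
}

/* Raw bits of a float16_t member; decoding to float is a separate step. */
static uint16_t read_float16_bits(BufferView b, size_t off)
{
    uint16_t v;
    read_at(b, off, &v, sizeof v);
    return v;
}

/* An N-component vector is N consecutive scalars, X at the lowest offset. */
static void read_i16vec(BufferView b, size_t off, int n, int16_t *out)
{
    assert(n >= 2 && n <= 4);
    for (int i = 0; i < n; ++i)
        out[i] = read_int16(b, off + (size_t)i * sizeof(int16_t));
}
```

    The offsets passed to these helpers would normally be the values queried
    for each uniform block member.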


    Modify Section 2.14.6, Varying Variables, p. 106

    (modify third paragraph, p. 107) ... For the purposes of counting input
    and output components consumed by a shader, variables declared as vectors,
    matrices, and arrays will all consume multiple components.  Each component
    of variables declared as 64-bit integer scalars or vectors will be
    counted as consuming two components.

    (add after the bulleted list, p. 108) For the purposes of counting the
    total number of components to capture, each component of outputs declared
    as 64-bit integer scalars or vectors will be counted as consuming two
    components.
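
    (Non-normative example, not part of the edits to the specification.)
    The component-counting rule for inputs, outputs, and captured varyings
    can be sketched as follows.  The Varying struct and function names are
    hypothetical.  The sketch assumes that 8-, 16-, and 32-bit components
    each count as one component, which the text here does not state
    explicitly, and it does not model matrices or doubles.

```c
#include <assert.h>

typedef struct {
    int components;    /* 1 for a scalar, 2..4 for a vector */
    int array_size;    /* 1 for non-arrays */
    int is_64bit_int;  /* nonzero for int64_t/uint64_t based types */
} Varying;

/* Components consumed by one variable. */
static int components_consumed(const Varying *v)
{
    assert(v->components >= 1 && v->components <= 4);
    assert(v->array_size >= 1);
    /* Each 64-bit integer component counts as two components. */
    int per_component = v->is_64bit_int ? 2 : 1;
    return v->components * v->array_size * per_component;
}

/* Total for a whole interface, to compare against the relevant limit. */
static int total_components(const Varying *vars, int n)
{
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += components_consumed(&vars[i]);
    return total;
}
```

    For example, a vec4, an i64vec3, and an array of two i16vec2 values
    total 4 + 6 + 4 = 14 components in this sketch.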


    Modify Section 2.15.1, Geometry Shader Input Primitives, p. 118

    (add new qualifier at the end of the section, p. 120)

    Patches (patches)

    Geometry shaders that operate on patches are valid for the PATCHES
    primitive type.  The number of vertices available to each program
    invocation is equal to the vertex count of the variable-size patch, with
    vertices presented to the geometry shader in the order specified in the
    patch.


    Modify Section 2.15.4, Geometry Shader Execution Environment, p. 121

    (add to the end of "Geometry Shader Inputs", p. 123)

    Geometry shaders also support built-in and user-defined per-primitive
    inputs.  The following built-in inputs, not replicated per-vertex and not
    contained in gl_in[], are supported:

      * The variable gl_PatchVerticesIn is filled with the number of the
        vertices in the input primitive.

      * The variables gl_TessLevelOuter[] and gl_TessLevelInner[] are arrays
        holding outer and inner tessellation levels of an input patch.  If a
        tessellation control shader is active, the tessellation levels will be
        taken from the corresponding outputs of the tessellation control
        shader.  Otherwise, the default levels provided as patch parameters
        are used.  Tessellation level values loaded in these variables are
        those in effect prior to the clamping and rounding operations
        performed by the primitive generator as described in Section 2.X.2 of
        ARB_tessellation_shader.  For triangular tessellation,
        gl_TessLevelOuter[3] and gl_TessLevelInner[1] will be undefined.  For
        isoline tessellation, gl_TessLevelOuter[2], gl_TessLevelOuter[3], and
        both values in gl_TessLevelInner[] are undefined.

    Additionally, a geometry shader with an input primitive type of "patches"
    may declare per-patch input variables using the qualifier "patch in".
    Unlike per-vertex inputs, per-patch inputs do not correspond to any
    specific vertex in the input primitive, and are not indexed by vertex
    number.  Per-patch inputs declared as arrays have multiple values for the
    input patch; similarly declared per-vertex inputs would indicate a single
    value for each vertex in the input patch.  User-defined per-patch input
    variables are filled with corresponding per-patch output values written by
    the tessellation control shader.  If no tessellation control shader is
    active, all such variables are undefined.

    Per-patch input variables and the built-in inputs "gl_PatchVerticesIn",
    "gl_TessLevelOuter[]", and "gl_TessLevelInner[]" are supported only for
    geometry shaders with an input primitive type of "patches".  A program
    will fail to link if any such variable is used in a geometry shader with
    an input primitive type other than "patches".


    Modify Section 2.19, Transform Feedback, p. 130

    (add to Table 2.14, p. 131)

      Transform Feedback
      primitiveMode               allowed render primitive modes
      ----------------------      ---------------------------------
      PATCHES                     PATCHES


    (modify first paragraph, p. 131) ... <primitiveMode> is one of TRIANGLES,
    LINES, POINTS, or PATCHES and specifies the type of primitives that will
    be recorded into the buffer objects bound for transform feedback (see
    below). ...

    (modify last paragraph, p. 131 and first paragraph, p. 132, adding patch
    support, and dealing with capture of 8- and 16-bit components)

    When an individual point, line, triangle, or patch primitive reaches the
    transform feedback stage ...  When capturing line, triangle, and patch
    primitives, all attributes ...  For multi-component varying variables or
    varying array elements, the individual components are written in order.
    For variables with 8- or 16-bit fixed- or floating-point components,
    individual components will be converted to and stored as equivalent values
    of type "int", "uint", or "float".  The value for any attribute specified
    ...

    (modify next-to-last paragraph, p. 132) ... is not incremented.  If
    transform feedback receives a primitive that fits in the remaining space
    after such an overflow occurs, that primitive may or may not be recorded.
    Primitives that fail to fit in the remaining space are never recorded.
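
    (Non-normative example, not part of the edits to the specification.)
    The storage of 8- and 16-bit integer components as "int" or "uint"
    values can be sketched in C.  The capture_* names are hypothetical.  The
    sketch assumes that "equivalent values" means sign extension for signed
    components and zero extension for unsigned components, and it does not
    cover 16-bit floating-point components.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store n int8_t components as 32-bit "int" values.
 * Returns the number of bytes written to dst. */
static size_t capture_i8(const int8_t *src, int n, int32_t *dst)
{
    assert(n >= 1 && n <= 4);
    for (int i = 0; i < n; ++i)
        dst[i] = (int32_t)src[i];    /* sign-extends */
    return (size_t)n * sizeof(int32_t);
}

/* Store n uint16_t components as 32-bit "uint" values.
 * Returns the number of bytes written to dst. */
static size_t capture_u16(const uint16_t *src, int n, uint32_t *dst)
{
    assert(n >= 1 && n <= 4);
    for (int i = 0; i < n; ++i)
        dst[i] = (uint32_t)src[i];   /* zero-extends */
    return (size_t)n * sizeof(uint32_t);
}
```

    Under these assumptions a captured i8vec4 occupies 16 bytes rather than
    4, which matters when sizing the buffer bound for transform feedback.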


Additions to Chapter 3 of the OpenGL 3.2 (Compatibility Profile) Specification
(Rasterization)

    None.

Additions to Chapter 4 of the OpenGL 3.2 (Compatibility Profile) Specification
(Per-Fragment Operations and the Frame Buffer)

    None.

Additions to Chapter 5 of the OpenGL 3.2 (Compatibility Profile) Specification
(Special Functions)

    None.

Additions to Chapter 6 of the OpenGL 3.2 (Compatibility Profile) Specification
(State and State Requests)

    Modify Section 6.1.15, Shader and Program Queries, p. 332

    (add to the first list of commands, p. 337)

      void GetUniformi64vNV(uint program, int location, int64EXT *params);
      void GetUniformui64vNV(uint program, int location, uint64EXT *params);


Additions to Appendix A of the OpenGL 3.2 (Compatibility Profile)
Specification (Invariance)

    None.

Additions to the AGL/GLX/WGL Specifications

    None.

Modifications to The OpenGL Shading Language Specification, Version 1.50
(Revision 09)

    Including the following line in a shader can be used to control the
    language features described in this extension:

      #extension GL_NV_gpu_shader5 : <behavior>

    where <behavior> is as specified in section 3.3.

    New preprocessor #defines are added to the OpenGL Shading Language:

      #define GL_NV_gpu_shader5         1

    If the features of this extension are enabled by an #extension directive,
    shading language features documented in the ARB_gpu_shader5 extension will
    also be provided.


    Modify Section 3.6, Keywords, p. 15

    (add the following to the list of reserved keywords)

    int8_t              i8vec2          i8vec3          i8vec4
    int16_t             i16vec2         i16vec3         i16vec4
    int32_t             i32vec2         i32vec3         i32vec4
    int64_t             i64vec2         i64vec3         i64vec4
    uint8_t             u8vec2          u8vec3          u8vec4
    uint16_t            u16vec2         u16vec3         u16vec4
    uint32_t            u32vec2         u32vec3         u32vec4
    uint64_t            u64vec2         u64vec3         u64vec4
    float16_t           f16vec2         f16vec3         f16vec4
    float32_t           f32vec2         f32vec3         f32vec4
    float64_t           f64vec2         f64vec3         f64vec4

    (note:  the "float64_t" and "f64vec*" types are available if and only if
    ARB_gpu_shader_fp64 is also supported)


    Modify Section 4.1, Basic Types, p. 18

    (add to the basic "Transparent Types" table, p. 18)
    
      Types       Meaning
      --------    ----------------------------------------------------------
      int8_t      an 8-bit signed integer
      i8vec2      a two-component signed integer vector (8-bit components)
      i8vec3      a three-component signed integer vector (8-bit components)
      i8vec4      a four-component signed integer vector (8-bit components)

      int16_t     a 16-bit signed integer
      i16vec2     a two-component signed integer vector (16-bit components)
      i16vec3     a three-component signed integer vector (16-bit components)
      i16vec4     a four-component signed integer vector (16-bit components)

      int32_t     a 32-bit signed integer
      i32vec2     a two-component signed integer vector (32-bit components)
      i32vec3     a three-component signed integer vector (32-bit components)
      i32vec4     a four-component signed integer vector (32-bit components)

      int64_t     a 64-bit signed integer
      i64vec2     a two-component signed integer vector (64-bit components)
      i64vec3     a three-component signed integer vector (64-bit components)
      i64vec4     a four-component signed integer vector (64-bit components)

      uint8_t     an 8-bit unsigned integer
      u8vec2      a two-component unsigned integer vector (8-bit components)
      u8vec3      a three-component unsigned integer vector (8-bit components)
      u8vec4      a four-component unsigned integer vector (8-bit components)

      uint16_t    a 16-bit unsigned integer
      u16vec2     a two-component unsigned integer vector (16-bit components)
      u16vec3     a three-component unsigned integer vector (16-bit components)
      u16vec4     a four-component unsigned integer vector (16-bit components)

      uint32_t    a 32-bit unsigned integer
      u32vec2     a two-component unsigned integer vector (32-bit components)
      u32vec3     a three-component unsigned integer vector (32-bit components)
      u32vec4     a four-component unsigned integer vector (32-bit components)

      uint64_t    a 64-bit unsigned integer
      u64vec2     a two-component unsigned integer vector (64-bit components)
      u64vec3     a three-component unsigned integer vector (64-bit components)
      u64vec4     a four-component unsigned integer vector (64-bit components)

      float16_t   a single 16-bit floating-point value
      f16vec2     a two-component floating-point vector (16-bit components)
      f16vec3     a three-component floating-point vector (16-bit components)
      f16vec4     a four-component floating-point vector (16-bit components)

      float32_t   a single 32-bit floating-point value
      f32vec2     a two-component floating-point vector (32-bit components)
      f32vec3     a three-component floating-point vector (32-bit components)
      f32vec4     a four-component floating-point vector (32-bit components)

      float64_t   a single 64-bit floating-point value
      f64vec2     a two-component floating-point vector (64-bit components)
      f64vec3     a three-component floating-point vector (64-bit components)
      f64vec4     a four-component floating-point vector (64-bit components)


    Modify Section 4.1.3, Integers, p. 20

    (add after the first paragraph of the section, p. 20)

    Variables with the types "int8_t", "int16_t", and "int64_t" represent
    signed integer values with exactly 8, 16, or 64 bits, respectively.
    Variables with the type "uint8_t", "uint16_t", and "uint64_t" represent
    unsigned integer values with exactly 8, 16, or 64 bits, respectively.
    Variables with the type "int32_t" and "uint32_t" represent signed and
    unsigned integer values with 32 bits, and are equivalent to "int" and
    "uint" types, respectively.


    (modify the grammar, p. 21, adding "L" and "UL" suffixes)

      integer-suffix:  one of

        u U l L ul UL

    (modify next-to-last paragraph, p. 21) ... When the suffix "u" or "U" is
    present, the literal has type <uint>.  When the suffix "l" or "L" is
    present, the literal has type <int64_t>.  When the suffix "ul" or "UL" is
    present, the literal has type <uint64_t>.  Otherwise, the type is
    <int>. ...

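    (Non-normative example, not part of the edits to the specification.)
    The mapping from integer-literal suffix to literal type described above
    can be written as a small C lookup.  The enum and function names are
    hypothetical.  Suffix spellings that the grammar above does not list,
    such as "lu" or "uL", are reported as invalid in this sketch.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    LIT_INT, LIT_UINT, LIT_INT64, LIT_UINT64, LIT_INVALID
} LiteralType;

/* Type of an integer literal, given only its suffix ("" for none). */
static LiteralType literal_type_from_suffix(const char *suffix)
{
    assert(suffix != NULL);
    if (suffix[0] == '\0')
        return LIT_INT;
    if (strcmp(suffix, "u") == 0 || strcmp(suffix, "U") == 0)
        return LIT_UINT;
    if (strcmp(suffix, "l") == 0 || strcmp(suffix, "L") == 0)
        return LIT_INT64;
    if (strcmp(suffix, "ul") == 0 || strcmp(suffix, "UL") == 0)
        return LIT_UINT64;
    return LIT_INVALID;   /* not one of: u U l L ul UL */
}
```
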

    Modify Section 4.1.4, Floats, p. 22

    (insert after second paragraph, p. 22) 

    Variables of type "float16_t" represent floating-point values using
    exactly 16 bits and are stored using the 16-bit floating-point
    representation described in the OpenGL Specification.  Variables of type
    "float32_t" and "float64_t" represent floating-point values with 32 or 64
    bits, and are equivalent to "float" and "double" types, respectively.
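
    (Non-normative example, not part of the edits to the specification.)
    Decoding a 16-bit floating-point bit pattern to a 32-bit float can be
    sketched in C.  The sketch assumes the usual layout of 1 sign bit, 5
    exponent bits with a bias of 15, and 10 mantissa bits; the function
    names are hypothetical.

```c
#include <assert.h>
#include <math.h>     /* INFINITY, NAN */
#include <stdint.h>

/* 2^e by repeated doubling or halving; exact for normal float exponents. */
static float exact_pow2(int e)
{
    float r = 1.0f;
    assert(e >= -126 && e <= 127);
    for (; e > 0; --e) r *= 2.0f;
    for (; e < 0; ++e) r *= 0.5f;
    return r;
}

/* Decode a 16-bit floating-point bit pattern into a float. */
static float half_to_float(uint16_t h)
{
    int sign     = (h >> 15) & 0x1;
    int exponent = (h >> 10) & 0x1F;
    int mantissa =  h        & 0x3FF;
    float value;

    if (exponent == 0)          /* zero or denormal: m * 2^-24 */
        value = (float)mantissa * exact_pow2(-24);
    else if (exponent == 31)    /* infinity or NaN */
        value = mantissa ? NAN : INFINITY;
    else                        /* normal: (1024 + m) * 2^(e - 25) */
        value = (float)(1024 + mantissa) * exact_pow2(exponent - 25);

    return sign ? -value : value;
}
```

    Every finite 16-bit value is exactly representable as a float, so the
    decode in this sketch involves no rounding.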


    Modify Section 4.1.7, Samplers, p. 23

    (modify 1st paragraph of the section, deleting the restriction requiring
    constant indexing of sampler arrays) ... Samplers may be aggregated into
    arrays within a shader (using square brackets [ ]) and can be indexed with
    general integer expressions.  The results of accessing a sampler array
    with an out-of-bounds index are undefined. ...

    (remove the additional restriction added by ARB_gpu_shader5 making a
    similar edit requiring uniform indexing across shader invocations for
    defined results.  NV_gpu_shader5 has no such limitation.)


    Modify Section 4.1.10, Implicit Conversions, p. 27

    (modify table of implicit conversions)

                                Can be implicitly
        Type of expression        converted to
        --------------------    -------------------------------
        int                     uint, float, double(*)
        ivec2                   uvec2, vec2, dvec2(*)
        ivec3                   uvec3, vec3, dvec3(*)
        ivec4                   uvec4, vec4, dvec4(*)

        int8_t   int16_t        int, int64_t, uint, uint64_t,
                                  float, double(*)
        i8vec2   i16vec2        ivec2, i64vec2, uvec2, u64vec2,
                                  vec2, dvec2(*)
        i8vec3   i16vec3        ivec3, i64vec3, uvec3, u64vec3,
                                  vec3, dvec3(*)
        i8vec4   i16vec4        ivec4, i64vec4, uvec4, u64vec4,
                                  vec4, dvec4(*)

        int64_t                 uint64_t, double(*)
        i64vec2                 u64vec2, dvec2(*)
        i64vec3                 u64vec3, dvec3(*)
        i64vec4                 u64vec4, dvec4(*)

        uint                    float, double(*)
        uvec2                   vec2, dvec2(*)
        uvec3                   vec3, dvec3(*)
        uvec4                   vec4, dvec4(*)

        uint8_t  uint16_t       uint, uint64_t, float, double(*)
        u8vec2   u16vec2        uvec2, u64vec2, vec2, dvec2(*)
        u8vec3   u16vec3        uvec3, u64vec3, vec3, dvec3(*)
        u8vec4   u16vec4        uvec4, u64vec4, vec4, dvec4(*)

        uint64_t                double(*)
        u64vec2                 dvec2(*)
        u64vec3                 dvec3(*)
        u64vec4                 dvec4(*)

        float                   double(*)
        vec2                    dvec2(*)
        vec3                    dvec3(*)
        vec4                    dvec4(*)

        float16_t               float, double(*)
        f16vec2                 vec2, dvec2(*)
        f16vec3                 vec3, dvec3(*)
        f16vec4                 vec4, dvec4(*)

        (*) if ARB_gpu_shader_fp64 is supported

    (Note:  Expressions of type "int32_t", "uint32_t", "float32_t", and
    "float64_t" are treated as identical to those of type "int", "uint",
    "float", and "double", respectively.  Implicit conversions to and from
    these explicitly-sized types are allowed whenever conversions involving
    the equivalent base type are allowed.)


    (modify second paragraph of the section) No implicit conversions are
    provided to convert from unsigned to signed integer types, from
    floating-point to integer types, from higher-precision to lower-precision
    types, from 8-bit to 16-bit types, or between matrix types.  There are no
    implicit array or structure conversions.

    (add before the final paragraph of the section, p. 27) When performing
    implicit conversion for binary operators, there may be multiple data types
    to which the two operands can be converted.  For example, when adding an
    int8_t value to a uint16_t value, both values can be implicitly converted
    to uint, uint64_t, float, and double.  In such cases, a floating-point
    type is chosen if either operand has a floating-point type.  Otherwise, an
    unsigned integer type is chosen if either operand has an unsigned integer
    type.  Otherwise, a signed integer type is chosen.  If operands can be
    converted to both 32- and 64-bit versions of the chosen base data type,
    the 32-bit version is used.
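
    (Non-normative example, not part of the edits to the specification.)
    The selection rule above can be sketched in C for scalar operands.  The
    Type enum and function names are hypothetical, the conversion sets are
    transcribed from the scalar rows of the table in this section, and
    double is assumed to be available.  Where the two operands share no type
    in the chosen class, the sketch returns T_NONE; that outcome is a choice
    made here, not something the text above specifies.

```c
#include <assert.h>

typedef enum {
    T_I8, T_I16, T_INT, T_I64,
    T_U8, T_U16, T_UINT, T_U64,
    T_F16, T_FLOAT, T_DOUBLE,
    T_COUNT,
    T_NONE = -1
} Type;

#define BIT(t)        (1u << (t))
#define UNSIGNED_MASK (BIT(T_U8) | BIT(T_U16) | BIT(T_UINT) | BIT(T_U64))
#define FLOAT_MASK    (BIT(T_F16) | BIT(T_FLOAT) | BIT(T_DOUBLE))

/* Types that `t` can be implicitly converted to (scalar table rows). */
static unsigned implicit_targets(Type t)
{
    switch (t) {
    case T_I8: case T_I16:
        return BIT(T_INT) | BIT(T_I64) | BIT(T_UINT) | BIT(T_U64) |
               BIT(T_FLOAT) | BIT(T_DOUBLE);
    case T_INT:   return BIT(T_UINT) | BIT(T_FLOAT) | BIT(T_DOUBLE);
    case T_I64:   return BIT(T_U64) | BIT(T_DOUBLE);
    case T_U8: case T_U16:
        return BIT(T_UINT) | BIT(T_U64) | BIT(T_FLOAT) | BIT(T_DOUBLE);
    case T_UINT:  return BIT(T_FLOAT) | BIT(T_DOUBLE);
    case T_U64:   return BIT(T_DOUBLE);
    case T_F16:   return BIT(T_FLOAT) | BIT(T_DOUBLE);
    case T_FLOAT: return BIT(T_DOUBLE);
    default:      return 0u;   /* double converts to nothing */
    }
}

/* Common type chosen for a binary operator with operand types a and b. */
static Type binary_operand_type(Type a, Type b)
{
    assert(a >= 0 && a < T_COUNT && b >= 0 && b < T_COUNT);
    if (a == b)
        return a;              /* no conversion needed */

    /* Types reachable from both operands, counting "no conversion". */
    unsigned common = (BIT(a) | implicit_targets(a)) &
                      (BIT(b) | implicit_targets(b));
    unsigned either = BIT(a) | BIT(b);
    Type t32, t64;

    if (either & FLOAT_MASK)         { t32 = T_FLOAT; t64 = T_DOUBLE; }
    else if (either & UNSIGNED_MASK) { t32 = T_UINT;  t64 = T_U64;    }
    else                             { t32 = T_INT;   t64 = T_I64;    }

    if (common & BIT(t32)) return t32;   /* the 32-bit version wins */
    if (common & BIT(t64)) return t64;
    return T_NONE;
}
```

    With this sketch, the int8_t plus uint16_t example from the text yields
    uint, and int8_t plus int16_t yields int.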


    Modify Section 4.3.4, Inputs, p. 31

    (modify third paragraph of section, p. 31, allowing explicitly-sized
    types) ... Vertex shader input variables can only be signed and unsigned
    integers, floats, doubles, explicitly-sized integers and floating-point
    values, vectors of any of these types, and matrices.  ...

    (modify edits done in ARB_tessellation_shader adding support for "patch
    in", allowing for geometry shaders as well) Additionally, tessellation
    evaluation and geometry shaders support per-patch input variables declared
    with the "patch in" qualifier.  Per-patch input ...


    (modify third paragraph, p. 32) ... Fragment inputs can only be signed and
    unsigned integers, floats, doubles, explicitly-sized integers and
    floating-point values, vectors of any of these types, matrices, or arrays
    or structures of these.  Fragment inputs declared as signed or unsigned
    integers, doubles, or 64-bit floating-point values, including vectors,
    matrices, or arrays derived from those types, must be qualified as "flat".


    Modify Section 4.3.6, Outputs, p. 33

    (modify third paragraph of the section, p. 33) ... They can only be signed
    and unsigned integers, floats, doubles, explicitly-sized integers and
    floating-point values, vectors of any of these types, matrices, or arrays
    or structures of these.

    (modify last paragraph, p. 33) ...  Fragment outputs can only be signed
    and unsigned integers, floats, explicitly-sized integers and
    floating-point values with 32 or fewer bits, vectors of any of these
    types, or arrays of these.  Doubles, 64-bit integers or floating-point
    values, vectors or arrays of those types, matrices, and structures cannot
    be output. ...


    Modify Section 4.3.8.1, Input Layout Qualifiers, p. 37

    (add to the list of qualifiers for geometry shaders, p. 37)

      layout-qualifier-id:
        ...
        triangles_adjacency
        patches

    (modify the "size of input arrays" table, p. 38)

        Layout          Size of Input Arrays
      ------------      --------------------
        patches         gl_MaxPatchVertices

    (add paragraph below that table, p. 38)

    When using the input primitive type "patches", the geometry shader is used
    to process a set of patches with vertex counts that may vary from patch to
    patch.  For the purposes of input array sizing, patches are treated as
    having a vertex count fixed at the implementation-dependent maximum patch
    size, gl_MaxPatchVertices.  If a shader reads an input corresponding to a
    vertex not found in the patch being processed, the values read are
    undefined.
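
    (Note:  As a non-normative sketch, a geometry shader consuming patches
    might be written as follows.  The output primitive type and the averaging
    computation are illustrative assumptions only.)

      #version 150
      #extension GL_NV_gpu_shader5 : enable

      layout(patches) in;
      layout(points, max_vertices = 1) out;

      void main()
      {
        // Average the positions of the vertices actually present in the
        // patch.  Inputs at or beyond gl_PatchVerticesIn are undefined and
        // are never read here.
        vec4 sum = vec4(0.0);
        for (int i = 0; i < gl_PatchVerticesIn; i++) {
          sum += gl_in[i].gl_Position;
        }
        gl_Position = sum / float(gl_PatchVerticesIn);
        EmitVertex();
        EndPrimitive();
      }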


    Modify Section 5.4.1, Conversion and Scalar Constructors, p. 49

    (add after first list of constructor examples)

    Similar constructors are provided to convert to and from explicitly-sized
    scalar data types, as well:

      float(uint8_t)      // converts an 8-bit uint value to a float
      int64_t(double)     // converts a double value to a 64-bit int
      float64_t(int16_t)  // converts a 16-bit int value to a 64-bit float
      uint16_t(bool)      // converts a Boolean value to a 16-bit uint

    (replace final two paragraphs, p. 49, and the first paragraph, p. 50,
    using more general language)

    When constructors are used to convert any floating-point type to any
    integer type, the fractional part of the floating-point value is dropped.
    It is undefined to convert a negative floating-point value to an unsigned
    integer type.

    When a constructor is used to convert any integer or floating-point type
    to bool, 0 and 0.0 are converted to false, and non-zero values are
    converted to true.  When a constructor is used to convert a bool to any
    integer or floating-point type, false is converted to 0 or 0.0, and true
    is converted to 1 or 1.0.

    Constructors converting between signed and unsigned integers with the same
    bit count always preserve the bit pattern of the input.  This will change
    the value of the argument if its most significant bit is set, converting a
    negative signed integer to a large unsigned integer, or vice versa.
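
    (Note:  The following declarations are a non-normative sketch of the
    conversion rules above.  The variable names are hypothetical.)

      int      i = int(-2.75);            // fractional part dropped:  -2
      bool     b = bool(uint8_t(4));      // non-zero converts to true
      uint16_t u = uint16_t(true);        // true converts to 1
      uint8_t  v = uint8_t(int8_t(-1));   // bit pattern preserved:  255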


    Modify Section 5.9, Expressions, p. 57

    (modify bulleted list as follows, adding support for expressions with
    64-bit integer types)

    Expressions in the shading language are built from the following:

    * Constants of type bool, int, int64_t, uint, uint64_t, float, all vector
      types, and all matrix types.

    ...

    * The arithmetic binary operators add (+), subtract (-), multiply (*), and
      divide (/) operate on 32-bit integer, 64-bit integer, and floating-point
      scalars, vectors, and matrices.  If the fundamental types of the
      operands do not match, the conversions from Section 4.1.10 "Implicit
      Conversions" are applied to produce matching types.  ...

    * The operator modulus (%) operates on 32- and 64-bit integer scalars or
      vectors. If the fundamental types of the operands do not match, the
      conversions from Section 4.1.10 "Implicit Conversions" are applied to
      produce matching types.  ...

    * The arithmetic unary operators negate (-), post- and pre-increment and
      decrement (-- and ++) operate on 32-bit integer, 64-bit integer, and
      floating-point values (including vectors and matrices). ...

    * The relational operators greater than (>), less than (<), greater than
      or equal (>=), and less than or equal (<=) operate only on scalar 32-bit
      integer, 64-bit integer, and floating-point expressions.  The result is
      scalar Boolean.  The fundamental type of the two operands must match,
      either as specified, or after one of the implicit type conversions
      specified in Section 4.1.10.  ...

    * The equality operators equal (==), and not equal (!=) operate only on
      scalar 32-bit integer, 64-bit integer, and floating-point expressions.
      The result is scalar Boolean.  The fundamental type of the two operands
      must match, either as specified, or after one of the implicit type
      conversions specified in Section 4.1.10.  ...


    Modify Section 6.1, Function Definitions, p. 63

    (ARB_gpu_shader5 adds a set of rules for defining whether implicit
    conversions for one matching function definition are better or worse than
    those for another.  These comparisons are done argument by argument.
    Extend the edits made by ARB_gpu_shader5 to add several new rules for
    comparing implicit conversions for a single argument, corresponding to the
    new data types introduced by this extension.)

     To determine whether the conversion for a single argument in one match is
     better than that for another match, the following rules are applied, in
     order:

       1.  An exact match is better than a match involving any implicit
           conversion.

       2.  A match involving a conversion from a signed integer, unsigned
           integer, or floating-point type to a similar type having a larger
           number of bits is better than a match involving any other implicit
           conversion.  The conversions qualifying under this rule are:

            source types                destination types
            -----------------           -----------------
            int8_t, int16_t             int, int64_t
            int                         int64_t
            uint8_t, uint16_t           uint, uint64_t
            uint                        uint64_t
            float16_t                   float
            float                       double

       3.  A match involving one conversion in rule 2 is better than a match
           involving another conversion in rule 2 if:

            (a) both conversions start with the same type and the first
                conversion is to a type with a smaller number of bits (e.g.,
                converting from int16_t to int is preferred to converting
                int16_t to int64_t), or

            (b) both conversions end with the same type and the first
                conversion is from a type with a larger number of bits (e.g.,
                converting an "out" parameter from int16_t to int is preferred
                to converting from int8_t to int).

       4. A match involving an implicit conversion from any integer type to
          float is better than a match involving an implicit conversion from
          any integer type to double.
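
     (Note:  As a non-normative sketch of rules 3 and 4, consider the
     following overloaded functions.  The function names are hypothetical,
     and the double overload assumes ARB_gpu_shader_fp64 is supported.)

       void f(int x)     { }
       void f(int64_t x) { }
       void g(float x)   { }
       void g(double x)  { }

       void main()
       {
         int16_t s = int16_t(1);
         f(s);   // rule 3(a) selects f(int):  same source type, fewer bits
         g(s);   // rule 4 selects g(float) over g(double)
       }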


    Modify Section 7.1, Vertex and Geometry Shader Special Variables, p. 69

    (NOTE:  These edits are written against the re-organized section in the
    ARB_tessellation_shader specification.)

    (add to the list of built-in inputs for geometry shaders) In the geometry
    language, built-in input and output variables are intrinsically declared
    as:

      in int gl_PatchVerticesIn;
      patch in float gl_TessLevelOuter[4];
      patch in float gl_TessLevelInner[2];

    ...

    The input variable gl_PatchVerticesIn behaves as in the identically-named
    tessellation control and evaluation shader inputs.

    The input variables gl_TessLevelOuter[] and gl_TessLevelInner[] behave as
    in the identically-named tessellation evaluation shader inputs.
    

    Modify Chapter 8, Built-in Functions, p. 81

    (add to description of generic types, last paragraph of p. 69) ...  Where
    the input arguments (and corresponding output) can be int64_t, i64vec2,
    i64vec3, or i64vec4, <genI64Type> is used as the argument.  Where the
    input arguments (and corresponding output) can be uint64_t, u64vec2,
    u64vec3, or u64vec4, <genU64Type> is used as the argument.


    Modify Section 8.3, Common Functions, p. 84

    (add support for 64-bit integer packing and unpacking functions)

    Syntax:

      int64_t  packInt2x32(ivec2 v);
      uint64_t packUint2x32(uvec2 v);

      ivec2  unpackInt2x32(int64_t v);
      uvec2  unpackUint2x32(uint64_t v);

    The functions packInt2x32() and packUint2x32() return a signed or unsigned
    64-bit integer obtained by packing the components of a two-component
    signed or unsigned integer vector, respectively.  The first vector
    component specifies the 32 least significant bits; the second component
    specifies the 32 most significant bits.

    The functions unpackInt2x32() and unpackUint2x32() return a signed or
    unsigned integer vector built from a 64-bit signed or unsigned integer
    scalar, respectively.  The first component of the vector contains the 32
    least significant bits of the input; the second component contains the 32
    most significant bits.
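
    (Note:  The following fragment is a non-normative sketch of the component
    ordering described above.  The variable names are hypothetical.)

      uvec2    halves = uvec2(0x89ABCDEFu, 0x01234567u);
      uint64_t packed = packUint2x32(halves);     // 0x0123456789ABCDEF
      uvec2    again  = unpackUint2x32(packed);   // equal to <halves>
      uint     low    = again.x;                  // 0x89ABCDEF
      uint     high   = again.y;                  // 0x01234567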


    (add support for 16-bit floating-point packing and unpacking functions)

    Syntax:

      uint      packFloat2x16(f16vec2 v);
      f16vec2   unpackFloat2x16(uint v);

    The function packFloat2x16() returns an unsigned integer obtained by
    interpreting the components of a two-component 16-bit floating-point
    vector as integers according to the OpenGL Specification, and then packing
    the two 16-bit integers into a 32-bit unsigned integer.  The first vector
    component specifies the 16 least significant bits of the result; the
    second component specifies the 16 most significant bits.

    The function unpackFloat2x16() returns a two-component vector with 16-bit
    floating-point components obtained by unpacking a 32-bit unsigned integer
    into a pair of 16-bit values, and interpreting those values as 16-bit
    floating-point numbers according to the OpenGL Specification.  The first
    component of the vector is obtained from the 16 least significant bits of
    the input; the second component is obtained from the 16 most significant
    bits.
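
    (Note:  The following fragment is a non-normative sketch of the packing
    order, assuming the usual 16-bit floating-point encodings of 1.0 (0x3C00)
    and -2.0 (0xC000).  The variable names are hypothetical.)

      f16vec2 h      = f16vec2(1.0, -2.0);
      uint    packed = packFloat2x16(h);          // 0xC0003C00
      f16vec2 again  = unpackFloat2x16(packed);   // equal to <h>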


    (add functions to get/set the bit encoding for floating-point values)

    64-bit floating-point data types in the OpenGL shading language are
    specified to be encoded according to the IEEE specification for
    double-precision floating-point values.  The functions below allow shaders
    to convert double-precision floating-point values to and from 64-bit
    signed or unsigned integers representing their encoding.

    To obtain signed or unsigned integer values holding the encoding of a
    floating-point value, use:

      genI64Type doubleBitsToInt64(genDType value);
      genU64Type doubleBitsToUint64(genDType value);

    Conversions are done on a component-by-component basis.

    To obtain a floating-point value corresponding to a signed or unsigned
    integer encoding, use:

      genDType int64BitsToDouble(genI64Type value);
      genDType uint64BitsToDouble(genU64Type value);
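
    (Note:  As a non-normative sketch, the encoding functions can be combined
    with the unpacking functions above to inspect individual bits of a
    double.  The helper name is hypothetical, and the code assumes
    ARB_gpu_shader_fp64 is supported.)

      // Returns true if the sign bit of the IEEE encoding of <d> is set,
      // which includes negative zero.
      bool signBitSet(double d)
      {
        uvec2 words = unpackUint2x32(doubleBitsToUint64(d));
        return (words.y & 0x80000000u) != 0u;
      }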


    (add functions to evaluate predicates over groups of threads)

    Syntax:

      bool anyThreadNV(bool value); 
      bool allThreadsNV(bool value);
      bool allThreadsEqualNV(bool value);

    Implementations of the OpenGL Shading Language may, but are not required
    to, run multiple shader threads for a single stage as a SIMD thread group,
    where individual execution threads are assigned to thread groups in an
    undefined, implementation-dependent order.  Algorithms may benefit from
    being able to evaluate a composite of boolean values over all active
    threads in the thread group.  

    The function anyThreadNV() returns true if and only if <value> is true for
    at least one active thread in the group.  The function allThreadsNV()
    returns true if and only if <value> is true for all active threads in the
    group.  The function allThreadsEqualNV() returns true if <value> is the
    same for all active threads in the group; the result of
    allThreadsEqualNV() will be true if and only if anyThreadNV() and
    allThreadsNV() would return the same value.

    Since these functions depend on the values of <value> in an undefined
    group of threads, the value returned by these functions is largely
    undefined.  However, anyThreadNV() is guaranteed to return true if <value>
    is true, and allThreadsNV() is guaranteed to return false if <value> is
    false.  

    Since implementations are generally not required to combine threads into
    groups, simply returning <value> for anyThreadNV() and allThreadsNV() and
    returning true for allThreadsEqualNV() is a legal implementation of these
    functions.
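
    (Note:  The trivial implementation described above is equivalent to the
    following non-normative sketch.  The function names are hypothetical
    stand-ins used only to show the behavior; they are not part of this
    extension.)

      bool anyThreadRef(bool value)       { return value; }
      bool allThreadsRef(bool value)      { return value; }
      bool allThreadsEqualRef(bool value) { return true;  }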


    Modify Section 8.6, Vector Relational Functions, p. 90

    (modify the first paragraph, p. 90, adding support for relational
    functions operating on explicitly-sized types)

    Relational and equality operators (<, <=, >, >=, ==, !=) are defined (or
    reserved) to operate on scalars and produce scalar Boolean results.  For
    vector results, use the following built-in functions.  In the definitions
    below, the following terms are used as placeholders for all vector types
    for a given fundamental data type:

        placeholder     fundamental types
        -----------     ------------------------------------------------
        bvec            bvec2, bvec3, bvec4

        ivec            ivec2, ivec3, ivec4, i8vec2, i8vec3, i8vec4,
                        i16vec2, i16vec3, i16vec4, i64vec2, i64vec3, i64vec4

        uvec            uvec2, uvec3, uvec4, u8vec2, u8vec3, u8vec4,
                        u16vec2, u16vec3, u16vec4, u64vec2, u64vec3, u64vec4

        vec             vec2, vec3, vec4, dvec2(*), dvec3(*), dvec4(*),
                        f16vec2, f16vec3, f16vec4

        (*) only if ARB_gpu_shader_fp64 is supported

    In all cases, the sizes of the input and return vectors for any
    particular call must match.
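
    (Note:  As a non-normative sketch, the relational functions would be
    expected to accept the explicitly-sized vector types as follows.  The
    variable names are hypothetical.)

      i64vec2 a  = i64vec2(1L, 5L);
      i64vec2 b  = i64vec2(2L, 5L);
      bvec2   lt = lessThan(a, b);   // (true, false)
      bvec2   eq = equal(a, b);      // (false, true)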


    Modify Section 8.7, Texture Lookup Functions, p. 91

    (modify text for textureOffset() functions, p. 94, allowing non-constant
    offsets) 

    Do a texture lookup as in texture but with offset added to the (u,v,w)
    texel coordinates before looking up each texel.  The value <offset> need
    not be constant; however, only a limited range of offset values is
    supported.  If any component of <offset> is less than
    MIN_PROGRAM_TEXEL_OFFSET or greater than MAX_PROGRAM_TEXEL_OFFSET, the
    offset applied to the texture coordinates is undefined.  Note that offset
    does not apply to the layer coordinate for texture arrays. This is
    explained in detail in section 3.9.9 of the OpenGL Specification (Version
    3.2, Compatibility Profile), where offset is (delta_u, delta_v, delta_w).
    Note that texel offsets are also not supported for cube maps.

    (Note:  This lifting of the constant offset restriction also applies to
    texelFetchOffset, p. 95, textureProjOffset, p. 95, textureLodOffset,
    p. 96, textureProjLodOffset, p. 96.)
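
    (Note:  The following fragment shader is a non-normative sketch of a
    lookup with a non-constant offset.  The sampler, uniform, and varying
    names are hypothetical.)

      #version 150
      #extension GL_NV_gpu_shader5 : enable

      uniform sampler2D tex;
      uniform ivec2     tapOffset;   // set by the application at run time
      in  vec2 texCoord;
      out vec4 color;

      void main()
      {
        // The offset is a uniform rather than a constant expression.  The
        // application is assumed to keep each component within the
        // implementation-dependent texel offset range; otherwise, the
        // offset applied is undefined.
        color = textureOffset(tex, texCoord, tapOffset);
      }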


    (modify the description of the textureGradOffset() functions, p. 97,
    preserving the restriction on constant offsets) 

    Do a texture lookup with both explicit gradient and offset, as described
    in textureGrad and textureOffset.  For these functions, the offset value
    must be a constant expression.  A limited range of offset values are
    supported; the minimum and maximum offset values are
    implementation-dependent and given by MIN_PROGRAM_TEXEL_OFFSET and
    MAX_PROGRAM_TEXEL_OFFSET, respectively.


    (modify the description of the textureProjGradOffset() functions,
    p. 98, preserving the restriction on constant offsets)


    Do a texture lookup projectively and with explicit gradient as described
    in textureProjGrad, as well as with offset, as described in textureOffset.
    For these functions, the offset value must be a constant expression.  A
    limited range of offset values are supported; the minimum and maximum
    offset values are implementation-dependent and given by
    MIN_PROGRAM_TEXEL_OFFSET and MAX_PROGRAM_TEXEL_OFFSET, respectively.


    Modify Section 9, Shading Language Grammar, p. 92

    !!! TBD !!!


GLX Protocol

    TBD

Dependencies on ARB_gpu_shader5

    This extension also incorporates all the changes to the OpenGL Shading
    Language made by ARB_gpu_shader5; enabling this extension via an
    #extension directive in shader code also enables all features of
    ARB_gpu_shader5 as though the shader code had also declared

      #extension GL_ARB_gpu_shader5 : enable

    The converse is not true; implementations supporting both extensions
    should not provide the shading language features in this extension if
    shader code #extension directives enable only ARB_gpu_shader5.

    This specification and ARB_gpu_shader5 both lift the restriction in GLSL
    1.50 requiring that indexing in arrays of samplers must be done with
    constant expressions.  However, ARB_gpu_shader5 specifies that results are
    undefined if the indices would diverge if multiple shader invocations are
    run in lockstep.  This extension does not impose the non-divergent
    indexing requirement.
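
    (Note:  As a non-normative sketch, the following fragment shader indexes
    an array of samplers with a value that may differ between invocations.
    All names are hypothetical, and the application is assumed to supply
    index values between 0 and 3.)

      #version 150
      #extension GL_NV_gpu_shader5 : enable

      uniform sampler2D textures[4];
      flat in int materialIndex;     // may diverge across invocations
      in  vec2 texCoord;
      out vec4 color;

      void main()
      {
        color = texture(textures[materialIndex], texCoord);
      }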


Dependencies on ARB_gpu_shader_fp64

    This extension and ARB_gpu_shader_fp64 both provide support for shading
    language variables with 64-bit components.  If both extensions are
    supported, the various edits describing this new support should be
    combined.

    If ARB_gpu_shader_fp64 is not supported, the following edits should be
    removed:

     * language adding the data types "float64_t", "f64vec2", "f64vec3", and
       "f64vec4";

     * language allowing implicit conversions of various types to double,
       dvec2, dvec3, or dvec4; and

     * the built-in functions doubleBitsToInt64(), doubleBitsToUint64(),
       int64BitsToDouble(), and uint64BitsToDouble().

Dependencies on ARB_tessellation_shader

    If ARB_tessellation_shader is not supported, language introduced by this
    extension describing processing patches in geometry shaders, transform
    feedback, and rasterization should be removed.

    If this extension and ARB_tessellation_shader are supported, it is legal
    to send patches past the tessellation stage -- the following language from
    ARB_tessellation_shader is removed:

      Patch primitives are not supported by pipeline stages below the
      tessellation evaluation shader.  If there is no active program object or
      the active program object does not contain a tessellation evaluation
      shader, the error INVALID_OPERATION is generated by Begin (or vertex
      array commands that implicitly call Begin) if the primitive mode is
      PATCHES.

Dependencies on NV_shader_buffer_load

    If NV_shader_buffer_load is supported, that specification should be edited
    as follows, to allow pointers to dereference the new data types added by
    this extension.

    Modify "Section 2.20.X, Shader Memory Access" from NV_shader_buffer_load.

    (add rules for loads of variables having the new data types from this
    extension to the list of bullets following "When a shader dereferences a
    pointer variable")

    - Data of type "int8_t", "int16_t", "int32_t", and "int64_t" are read
      from or written to memory as a single 8-, 16-, 32-, or 64-bit signed
      integer value at the specified GPU address.

    - Data of type "uint8_t", "uint16_t", "uint32_t", and "uint64_t" are read
      from or written to memory as a single 8-, 16-, 32-, or 64-bit unsigned
      integer value at the specified GPU address.

    - Data of type "float16_t", "float32_t", and "float64_t" are read from or
      written to memory as a single 16-, 32-, or 64-bit floating-point value
      at the specified GPU address.

Dependencies on EXT_direct_state_access

    If EXT_direct_state_access is supported, that specification should be
    edited as follows to include new ProgramUniform* functions.

    (modify the ProgramUniform* language)

    The following commands:

        ....
        void ProgramUniform{1,2,3,4}{i64,ui64}NV
            (uint program, int location, T value);
        void ProgramUniform{1,2,3,4}{i64,ui64}vNV
            (uint program, int location, const T *value);
   
    operate identically to the corresponding command where "Program" is
    deleted from the name (and extension suffixes are dropped or updated
    appropriately) except, rather than updating the currently active program
    object, these "Program" commands update the program object named by the
    <program> parameter.  ...

Dependencies on EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit

    The EXT_vertex_attrib_64bit extension provides the ability to specify
    64-bit floating-point vertex attributes in a GLSL vertex shader and to
    specify the values of these attributes via the OpenGL API.  To
    successfully compile vertex shaders with fp64 input variables, it is
    necessary to include

      #extension GL_EXT_vertex_attrib_64bit : enable

    in the shader text.

    However, this extension is considered to enable 64-bit floating-point and
    integer inputs.  Including the following code in a vertex shader

      #extension GL_NV_gpu_shader5 : enable

    will enable 64-bit floating-point or integer input variables whose values
    would be specified using the OpenGL API mechanisms found in the
    EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit extensions.

Errors

    None.

New State

    None.

New Implementation Dependent State

    None.

Issues

    (1) What implicit conversions are supported by this extension on top of
        those provided by related extensions?

      RESOLVED:  ARB_gpu_shader5 and ARB_gpu_shader_fp64 provide new implicit
      conversions from "int" to "uint", and from "int", "uint", and "float" to
      "double".

      This extension provides integer types of multiple sizes and supports
      implicit conversions from small integer types to 32- or 64-bit integer
      types of the same signedness, as well as float and double.  It also
      provides floating-point types of multiple sizes and supports implicit
      conversions from smaller to larger types.  Additionally, it supports
      conversion from 64-bit integer types to double.

    (2) How do these implicit conversions impact binary operators?

      RESOLVED:  For binary operators, we prefer converting to a common type
      that is as close as possible in size and type to the original
      expression.

    (3) How do these implicit conversions impact function overloading rules?

      RESOLVED:  We extend the preference rules in ARB_gpu_shader5 to account
      for the new data types, adding rules to:

        * favor new "promotions" in integer/floating point types (previously,
          the only promotion was float-to-double)

        * for promotions, favor conversion to the type closer in size (e.g.,
          prefer converting from int16_t to int over converting to int64_t)

    (4) What should be done to distinguish between 32- and 64-bit integer
        constants?

      RESOLVED:  We will use "L" and "UL" to identify signed and unsigned
      64-bit integer constants; the use of "L" matches a similar ("long")
      suffix in the C programming language.  C leaves the size of integer
      types implementation-dependent, and many implementations require an "LL"
      suffix to declare 64-bit integer constants.  With our size definitions,
      "L" will be considered sufficient to make an integer constant 64-bit.
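
      As a non-normative sketch, constants using these suffixes might be
      written as follows (the variable names are hypothetical):

        int64_t  big  = 0x100000000L;           // does not fit in 32 bits
        uint64_t mask = 0xFFFFFFFF00000000UL;   // unsigned 64-bit constant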

    (5) Should we provide support for vertex attributes with 64-bit components,
        and if so, how should the support be provided in the OpenGL API?

      RESOLVED:  Yes, this seems like useful functionality, particularly for
      applications wanting to provide double-precision or 64-bit integer data
      to shaders performing computations on such types.  We provide
      VertexAttribL* entry points for 64-bit components in the separate
      EXT_vertex_attrib_64bit and NV_vertex_attrib_integer_64bit extensions,
      which should be supported on all implementations supporting this
      extension.

    (6) Should we allow vertex attributes with 8- or 16-bit components in the
        shading language, and if so, how does it interact with the OpenGL API?

      RESOLVED:  Yes, but we will use existing APIs to specify such
      attributes, which already typically allow 8- and 16-bit components on
      the API side.  Vertex attribute components (other than 64-bit ones)
      specified by the API will be converted from the type specified in the
      vertex attribute commands to the component type of the attribute.  For
      floating-point values, that may involve 16-to-32 bit conversion or vice
      versa.  For integer types, that may involve dropping all but the least
      significant bits of attribute components.

    (7) Should we support uniforms with double or 64-bit integer types, and
        if so, how?  Should we support uniforms with <32-bit components, and
        if so, how?

      RESOLVED:  We will support uniforms of all component types, either in a
      buffer object (via OpenGL 3.1 or ARB_uniform_buffer_object) or in
      storage associated with the program.

      When uniforms are stored in a buffer object, they are stored using their
      native data types according to the pre-existing packing and layout
      rules.  Those rules were already written to be able to accommodate both
      the larger and smaller new data types.

      Uniforms stored in program objects are loaded with Uniform* APIs.  There
      are no pre-existing uniform APIs accepting doubles or other "long"
      types, so there was no clear need to add an extra "L" to the name to
      distinguish from other APIs like we do with VertexAttribL* APIs.

      Uniforms with 8- and 16- bit components are loaded with the "larger"
      Uniform*{i,ui,f} APIs; it didn't seem worth it to add numerous entry
      points to the APIs to handle all those new types.

    (8) How do the uniform loading commands introduced by this extension
        interact with similar commands added by NV_shader_buffer_load?

      RESOLVED:  NV_shader_buffer_load provided the command Uniformui64NV to
      load pointer uniforms with a single 64-bit unsigned integer.  This
      extension provides vectors of 64-bit unsigned integers, so we needed
      Uniform{2,3,4}ui64NV commands.  We chose to provide a Uniform1ui64NV
      command, which will be functionally equivalent to Uniformui64NV.

    (9) How will transform feedback work for capturing variables with double
        or 64-bit components?  Should we support transform feedback on
        variables with components with fewer than 32 bits?

      RESOLVED:  Transform feedback will support variables with any component
      size.  Components with fewer than 32 bits are converted to their
      equivalent 32-bit types.

      For doubles and variables with 64-bit components, each component
      captured will count as a 64-bit value and occupy two components for the
      purpose of component counting rules.  This could be a problem for the
      SEPARATE_ATTRIBS mode, since the minimum component limit is four, which
      would not be sufficient to capture a dvec3 or dvec4.  However,
      implementations supporting this extension should also be able to support
      ARB_transform_feedback3, which extends INTERLEAVED_ATTRIBS mode to
      capture vertex attribute values interleaved into multiple buffers.  That
      functionality effectively obsoletes the SEPARATE_ATTRIBS mode, since it
      is a functional superset.

      We considered support for capturing 8- and 16-bit values directly, which
      had a number of problems.  First, full byte addressing might impose both
      alignment issues (e.g., capturing a uint8_t followed by a float might
      misalign the float) and additional hardware implementation burdens.  One
      other option would be to pack multiple values into a 32-bit integer
      (e.g., f16vec2 would be packed with .x in the LSBs and .y in the MSBs).
      This could work, even with word addressing, but would require padding
      for odd sizes (e.g., f16vec3 padded to two words, with the second word
      holding only .z).  It would also have endianness issues; packed values
      would look like arrays of the corresponding smaller type on
      little-endian systems, but not on big-endian ones.

    (10) What precision will be used for computation, storage, and inter-stage
         transfer of 8- and 16-bit component data types?

      RESOLVED:  The components may be considered to occupy a full 32 bits for
      the purposes of input/output component count limits.  8- and 16-bit
      values should, however, be passed at that precision.

    (11) Is the new support for non-constant texel offsets completely
         orthogonal?

      RESOLVED:  No.  Non-constant offsets are not supported for the existing
      functions textureGradOffset() and textureProjGradOffset(), or for the
      new functions textureGatherOffsets() and shadowGatherOffsets().

    (12) Should we provide functions like intBitsToFloat() that operate on
         16-bit floating-point values?

      RESOLVED:  Not in this extension.  Such conversions can be performed
      using the following code:

        uint16_t float16BitsToUint16(float16_t v)
        {
          return uint16_t(packFloat2x16(f16vec2(v, 0)));
        }

        float16_t uint16BitsToFloat16(uint16_t v)
        {
          return unpackFloat2x16(uint(v)).x;
        }

    (13) Should we provide distinct sized types for 32-bit integers and
         floats, and 64-bit floats?  Should we provide those types as aliases
         for existing unsized types?  Or should we provide no such types at
         all?

      RESOLVED:  We will provide sized versions of these types, which are
      defined as completely equivalent to unsized types according to the
      following table:

        unsized type     sized types
        -------------    ---------------
        int              int32_t
        uint             uint32_t
        float            float32_t
        double           float64_t

      Vector types with sized and unsized components have equivalent
      relationships.

      Note that the nominally "unsized" data types in the GLSL 1.30 spec are
      actually sized.  The specification explicitly defines signed and unsigned
      integers (int, uint) to be 32-bit values.  It also defines
      floating-point values to "match the IEEE single precision floating-point
      definition for precision and dynamic range", which are also 32-bit
      values.

      This type equivalence has minor implications on function overloading:

        * You can't declare separate versions of a function with an "int"
          argument in one version and an "int32_t" argument in another.

        * Because there is no implicit conversion between equivalent types, we
          will get an exact match if an argument is declared with one type
          (e.g., "int") in the caller and a textually different but equivalent
          type ("int32_t") in the function.

      Note that the type equivalence also applies to API data type queries.
      For example, the type INT will be returned for a variable declared as
      "int32_t".

    (14) What are functions like anyThreadNV() and allThreadsNV() good for?

      RESOLVED:  If an implementation performs SIMD thread execution,
      divergent branching may result in reduced performance if the "if" and
      "else" blocks of an "if" statement are executed sequentially.  For
      example, an algorithm may have both a "fast path" that performs a
      computation quickly for a subset of all cases and a "slow path" that
      performs a computation more slowly but correctly.  When performing SIMD
      execution, code like the following:

        if (condition) {
          result = do_fast_path(...);
        } else {
          result = do_slow_path(...);
        }

      may end up executing *both* the fast and slow paths for a SIMD thread
      group if <condition> diverges, and may execute more slowly than simply
      executing the slow path unconditionally.  These functions allow code
      like:

        if (allThreadsNV(condition)) {
          result = do_fast_path(...);
        } else {
          result = do_slow_path(...);
        }

      that executes the fast path if and only if it can be used for *all*
      threads in the group.  For thread groups where <condition> diverges,
      this algorithm would unconditionally run the slow path, but would never
      run both in sequence.
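
      The effect of voting can be modeled on the host.  The following C sketch
      is an illustration only.  It assumes a SIMD thread group can be
      represented as an array of per-thread condition values, and the names
      all_threads(), any_thread(), paths_plain_branch(), and voted_path() are
      hypothetical.  It counts how many of the two paths a group runs for a
      plain per-thread branch and reports the single path chosen when
      branching on the group-wide vote.

```c
#include <stddef.h>

enum path { FAST_PATH, SLOW_PATH };

/* Model of allThreadsNV(): true only if the condition holds for every
   thread in the group. */
static int all_threads(const int *cond, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (!cond[i])
            return 0;
    return 1;
}

/* Model of anyThreadNV(): true if the condition holds for at least one
   thread in the group. */
static int any_thread(const int *cond, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (cond[i])
            return 1;
    return 0;
}

/* Number of paths the group runs for a plain "if (condition)":
   two when the condition diverges within the group. */
static int paths_plain_branch(const int *cond, size_t n)
{
    const int some_take_if = any_thread(cond, n);
    const int some_take_else = !all_threads(cond, n);
    return some_take_if + some_take_else;
}

/* Path taken by the whole group for "if (allThreadsNV(condition))".
   The vote is uniform across the group, so exactly one path runs. */
static enum path voted_path(const int *cond, size_t n)
{
    return all_threads(cond, n) ? FAST_PATH : SLOW_PATH;
}
```

      In this model, a group with conditions { 1, 0, 1, 1 } runs two paths
      with the plain branch, but only the slow path with the voted branch.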

      There may be other cases where "voting" across shader invocations may be
      useful.  Note that we provide no control over how shader invocations may
      be packed within a SIMD thread group, unlike various "compute" APIs
      (CUDA, OpenCL).

    (15) Can the 64-bit uniform APIs be used to load values for uniforms of
         type "bool", "bvec2", "bvec3", or "bvec4"?

      RESOLVED:  No.  OpenGL 2.0 and beyond did allow "bool" variables to be
      set with Uniform*i* and Uniform*f* APIs, and OpenGL 3.0 extended that
      support to Uniform*ui* for orthogonality.  But it seems pointless to
      extend this capability forward to 64-bit Uniform APIs as well.

    (19) The ARB_tessellation_shader extension adds support for patch
         primitives that might survive to the transform feedback stage.  How
         are such primitives captured?

      RESOLVED:  If patch primitives survive to the transform feedback stage,
      they are recorded on a patch-by-patch basis.  Incomplete patches are not
      recorded.  As with other primitive types, if the transform feedback
      buffers do not contain enough space to capture an entire patch, no
      vertices are recorded.

      Note that the only way to get patch primitives all the way to transform
      feedback is to have tessellation evaluation and geometry shaders
      disabled; the output streams from both of those shader stages are
      collections of points, lines, or triangles.

    (20) Previous transform feedback allowed capturing only fixed-size
         primitives; this extension supports variable-sized patches.  What
         interactions does this functionality have with transform feedback
         buffer overflow?

      RESOLVED:  With fixed-size point, line, or triangle primitives, once any
      primitive fails to be recorded due to insufficient space, all subsequent
      primitives would also fail.  With variable-size patch primitives, the
      transform feedback stage might first receive a large patch that doesn't
      fit, followed by a smaller patch that could squeeze into the remaining
      space.  

      To allow for different types of implementation of this extension without
      requiring special-case handling of this corner case, we've chosen to
      leave this behavior undefined -- the smaller patch may or may not be
      recorded.
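
      The corner case can be made concrete with a small host-side model.  The
      following C sketch is an illustration only.  It assumes buffer space is
      measured in vertices, and the two policies and the function
      vertices_recorded() are hypothetical and do not describe any particular
      implementation.  It shows that the two policies agree for fixed-size
      primitives but can differ for variable-size patches.

```c
#include <stddef.h>

/* Two hypothetical implementation strategies; the behavior is left
   undefined, so either outcome is permitted. */
enum overflow_policy {
    STOP_AFTER_FIRST_OVERFLOW, /* once a patch fails, later ones fail too */
    TEST_EACH_PATCH            /* each patch is checked against the space left */
};

/* Returns the number of vertices recorded when patches with the given
   vertex counts arrive in order at a buffer holding `capacity` vertices.
   A patch is recorded whole or not at all. */
static size_t vertices_recorded(const size_t *patch_sizes, size_t num_patches,
                                size_t capacity, enum overflow_policy policy)
{
    size_t used = 0;
    int overflowed = 0;

    for (size_t i = 0; i < num_patches; ++i) {
        if (overflowed && policy == STOP_AFTER_FIRST_OVERFLOW)
            continue;                    /* drop everything after a failure */
        if (patch_sizes[i] <= capacity - used)
            used += patch_sizes[i];      /* the whole patch fits */
        else
            overflowed = 1;              /* the patch is not recorded */
    }
    return used;
}
```

      With patches of 4, 16, and 3 vertices and room for 8 vertices, the first
      policy records 4 vertices and the second records 7.  With three
      triangles and room for 7 vertices, both policies record 6.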


Revision History

    Rev.    Date    Author    Changes
    ----  --------  --------  -----------------------------------------
     6    03/23/10  pbrown    Update overview, dependencies, remove references
                              to old extension names.  Extend the function
                              overloading prioritization rules from
                              ARB_gpu_shader5 to account for new data types.
                              Major overhaul of the issues section to match
                              the refactoring done to produce ARB specs.

     5    03/08/10  pbrown    Add interaction with EXT_vertex_attrib_64bit and
                              NV_vertex_attrib_integer_64bit; enabling this
                              extension automatically enables 64-bit floating-
                              point and integer vertex inputs.

     4    03/01/10  pbrown    Fix prototype for GetUniformui64vNV.

     3    01/14/10  pbrown    Fix with updated enum assignments.

     2    12/08/09  pbrown    Add explicit component counting rules for
                              64-bit integer attributes similar to those
                              in the ARB_gpu_shader_fp64 spec.

     1              pbrown    Internal revisions.