================================================================================
NVIDIA CUDA Toolkit v3.2 for MacOS X : PATCH
================================================================================

In CUDA Toolkit 3.2 for MacOS, the nvcc compiler mishandles variables
of type 'size_t' in device code when compiling 64-bit device code.
Consider the following example:

__global__ void kernel(size_t *out, size_t in)
{
    *out = in;
}

In this example, 'in' should be an 8-byte entity when the kernel is
built in 64-bit mode. However, the MacOS version of nvcc in CUDA
Toolkit 3.2 instead treats it as a 4-byte value, which leads to
unexpected results (for example, for values that do not fit in 32
bits). The incorrect width is visible in the parameter declaration in
the following PTX:

.entry _Z6kernelPmm (
    .param .u64 __cudaparm__Z6kernelPmm_out,
    .param .u32 __cudaparm__Z6kernelPmm_in) // NOTE: the .u32 is incorrect
{
    ...

To address this issue, NVIDIA has released this patch for the relevant
executable ('gfec') from the CUDA Toolkit 3.2 nvcc toolchain for MacOS.
If you are experiencing this issue, replace the existing
open64/lib/gfec file in your CUDA Toolkit 3.2 installation with this
new version.
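
One way to check whether an installation is affected, or whether the
patched 'gfec' has taken effect, is to launch the example kernel with
a value that needs more than 32 bits and compare what comes back. The
following is a minimal sketch and not part of the patch: the test
value and the host code around the kernel are assumptions made for
illustration, and the exact value returned by an affected toolchain is
not specified here.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Same kernel as in the example above.
__global__ void kernel(size_t *out, size_t in)
{
    *out = in;
}

int main(void)
{
    // The check is only meaningful when size_t is 8 bytes on the host,
    // i.e. when the program is built in 64-bit mode.
    if (sizeof(size_t) < 8) {
        printf("size_t is %u bytes; rebuild in 64-bit mode\n",
               (unsigned)sizeof(size_t));
        return EXIT_FAILURE;
    }

    // Illustrative test value (an assumption): bit 32 and bit 0 are set,
    // so it cannot be represented in 4 bytes.
    const size_t expected = (size_t)0x100000001ULL;

    size_t *d_out = NULL;
    size_t result = 0;

    cudaError_t err = cudaMalloc((void **)&d_out, sizeof(size_t));
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaMalloc: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    kernel<<<1, 1>>>(d_out, expected);

    // This copy waits for the kernel to finish; it may also report an
    // error raised by the preceding launch.
    err = cudaMemcpy(&result, d_out, sizeof(size_t),
                     cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel/cudaMemcpy: %s\n", cudaGetErrorString(err));
        return EXIT_FAILURE;
    }

    printf("expected 0x%llx, got 0x%llx\n",
           (unsigned long long)expected, (unsigned long long)result);

    // A mismatch suggests that 'in' was not handled as an 8-byte value.
    return (result == expected) ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Assuming the sketch is saved as check_size_t.cu (a hypothetical file
name), it can be built with 'nvcc -m64 check_size_t.cu -o check_size_t'.
Running 'nvcc -m64 -ptx check_size_t.cu' instead emits the PTX, where
the 'in' parameter should be declared as .u64 once the patch is
installed.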

Note that this issue does not apply to platforms other than MacOS, and
it was not present in CUDA Toolkit 3.1 or earlier.