================================================================================
NVIDIA CUDA Toolkit v3.2 Errata for Windows, Linux, and MacOS X
================================================================================

--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

  Last updated 2/16/2011
  Version 3.2

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
Documentation Related
--------------------------------------------------------------------------------

* An updated version of the document ptx_isa_2.2.pdf in your installation
  directory is available. Get the latest version from the 3.2 downloads page:

    www.nvidia.com/getcuda

* The command-line Compute Profiler document "Compute_Profiler.txt" in your
  installation directory lists incorrect names for the "instructions_*"
  counters. The correct counter names for compute capability 2.0 or higher
  are as follows:

    inst_issued
    inst_executed
    inst_issued1_0
    inst_issued2_0
    inst_issued1_1
    inst_issued2_1

  That is, the counter names are "inst_*", not "instructions_*".

--------------------------------------------------------------------------------
Linux Related
--------------------------------------------------------------------------------

Linux users should review the "Install the NVIDIA Driver" section in the
Getting Started Guide for Linux:

  http://developer.download.nvidia.com/compute/cuda/3_2_prod/docs/Getting_Started_Linux.pdf

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

* Visual Studio custom build rules for CUDA Toolkit 3.2 do not allow
  specifying the sm_21 GPU architecture.
  For users of Visual Studio custom build rules files with CUDA Toolkit 3.2,
  the following files have been updated to include the sm_21 option for GPU
  architecture:

    - NvCudaDriverApi.rules
    - NvCudaDriverApi.v3.2.rules
    - NvCudaRuntimeApi.rules
    - NvCudaRuntimeApi.v3.2.rules

  The updated rules files can be obtained from the following .zip file:

    http://developer.download.nvidia.com/compute/cuda/3_2_prod/toolkit/cudatoolkit_3.2_win_buildrules-patch.zip

  Copy the new files into the Visual Studio VCProjectDefaults folder,
  replacing the existing files. The typical VCProjectDefaults locations are:

    Win32 = C:\Program Files\Microsoft Visual Studio 9.0\VC\VCProjectDefaults
    Win64 = C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\VCProjectDefaults

* In CUBLAS 3.2, the GEMM, SYRK, and HERK routines for Fermi GPUs can enter
  an infinite recursion, leading to an application crash, for certain input
  sizes that meet the criteria below. To work around this problem,
  recursively subdivide the input to CUBLAS until none of the individual
  calls to these routines matches the criteria.

  Given the threshold size T = 2^27 - 512 (i.e., 134217216), the crash might
  occur in any of the following circumstances:

    1) A is not transposed, lda * k >= T, and T is divisible by lda.
    2) B is not transposed, ldb * n >= T, T is divisible by n, and n is
       divisible by 32.
    3) A is transposed, lda * m >= T, T is divisible by m, and m is
       divisible by 32.
    4) B is transposed, ldb * k >= T, and T is divisible by ldb.

  This issue will be fixed in the next release of the CUDA Toolkit.

* When "m" is not a multiple of 16, the CGEMM kernel used in some cases on
  Fermi GPUs unnecessarily fetches a few bytes past the end of the "A"
  matrix. Under certain conditions, this can lead to a kernel launch failure
  (it never produces incorrect results, however).
  As a workaround, round the size of the memory allocated for matrix "A" up
  to the next multiple of 64 bytes.

  This issue will be fixed in the next release of the CUDA Toolkit.

--------------------------------------------------------------------------------
Mac Related
--------------------------------------------------------------------------------

* To save power, some Apple products automatically power down the
  CUDA-capable GPU in the system. If the operating system has powered down
  the CUDA-capable GPU, CUDA applications fail to run and report that no
  device was found. To ensure that the operating system does not power down
  your CUDA-capable GPU, do the following:

    1. Go to "System Preferences".
    2. Open the "Energy Saver" section.
    3. Uncheck the "Automatic graphics switching" check box in the upper
       left.

* On MacOS only, the NVIDIA C Compiler (nvcc) handles size_t incorrectly
  during 64-bit compilation: the version of nvcc included with CUDA Toolkit
  3.2 does not treat variables of type size_t as 8-byte entities in PTX when
  compiling 64-bit device code.

  To address this issue, NVIDIA has released a patch that updates
  components of nvcc. The patch is available as "CUDA Toolkit: GFEC Patch
  for MacOS" from the following location:

    http://developer.nvidia.com/object/cuda_3_2_downloads.html

  For additional information and installation instructions, refer to the
  README file distributed with the patch.

--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

For more information and help with CUDA, please visit:

  http://www.nvidia.com/cuda

--------------------------------------------------------------------------------
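As an illustration only (this is not NVIDIA code and not part of CUBLAS), the
two CUBLAS 3.2 known issues listed under Known Issues can be handled on the
host before calling CUBLAS. The C sketch below encodes the four crash criteria
as a predicate and implements the 64-byte padding for the CGEMM "A"
allocation. The names cublas32_gemm_matches_crash_criteria, round_up_to_64,
and cgemm_a_alloc_bytes are hypothetical. The predicate uses GEMM's parameter
names (m, n, k, lda, ldb); how the criteria map onto SYRK and HERK arguments is
an assumption left to the reader. The padding helper assumes column-major
storage and leaves a size that is already a multiple of 64 bytes unchanged.

```c
#include <stddef.h>

/* Threshold from the errata: T = 2^27 - 512 = 134217216. */
#define CUBLAS32_T 134217216LL

/* Hypothetical helper (not a CUBLAS API): returns 1 if a GEMM-style call
 * with these parameters matches any of the four crash criteria listed for
 * CUBLAS 3.2 on Fermi GPUs, 0 otherwise. transa/transb are nonzero when
 * the corresponding matrix is transposed. */
static int cublas32_gemm_matches_crash_criteria(int transa, int transb,
                                                long long m, long long n,
                                                long long k,
                                                long long lda, long long ldb)
{
    const long long T = CUBLAS32_T;

    /* 1) A not transposed, lda*k >= T, T divisible by lda */
    if (!transa && lda > 0 && lda * k >= T && T % lda == 0)
        return 1;
    /* 2) B not transposed, ldb*n >= T, T divisible by n, n divisible by 32 */
    if (!transb && n > 0 && ldb * n >= T && T % n == 0 && n % 32 == 0)
        return 1;
    /* 3) A transposed, lda*m >= T, T divisible by m, m divisible by 32 */
    if (transa && m > 0 && lda * m >= T && T % m == 0 && m % 32 == 0)
        return 1;
    /* 4) B transposed, ldb*k >= T, T divisible by ldb */
    if (transb && ldb > 0 && ldb * k >= T && T % ldb == 0)
        return 1;
    return 0;
}

/* Rounds a byte count up to a multiple of 64; exact multiples stay as-is. */
static size_t round_up_to_64(size_t bytes)
{
    return (bytes + 63) & ~(size_t)63;
}

/* Hypothetical helper: padded allocation size for a column-major CGEMM "A"
 * operand with leading dimension lda and ncols stored columns (k when A is
 * not transposed, m when it is). A single-precision complex element is two
 * floats, i.e. 8 bytes. */
static size_t cgemm_a_alloc_bytes(size_t lda, size_t ncols)
{
    size_t bytes = lda * ncols * 2 * sizeof(float);
    return round_up_to_64(bytes);
}
```

For example, a non-transposed A with lda = 512 and k = 262143 gives
lda * k = T with T divisible by 512, so criterion 1 matches. A caller that
gets 1 from the predicate would split that call into smaller CUBLAS calls; the
errata does not prescribe a split, so which dimension to subdivide depends on
which criterion matched.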