================================================================================
NVIDIA CUDA Toolkit v4.0 Errata for Windows, Linux, and Mac OS X
================================================================================
--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

Last updated 9/06/11 Version 4.0

--------------------------------------------------------------------------------
Resolved Issues
--------------------------------------------------------------------------------

* A previous version of this Errata reported that, for applications using multiple streams, the CUDA Visual Profiler can drop profiler data rows and report the following error:
"In this profiling session some profiler output rows are dropped due to incorrect gpu time stamp values and the profiler output is incomplete."
This issue has been fixed with a patch for the Linux toolkits. You can download the patches from the main download page:
http://developer.nvidia.com/cuda-toolkit-40
Each patch is listed with its corresponding Linux package; the description in the Downloads column specifies "Visual Profiler Patch" in parentheses.

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

* Visual Profiler incorrectly treats kernels whose names start with "memcpy" as memory copies. As a result, the profiling data reported for these kernels is incorrect. To work around this issue, rename the kernel so that its name does not start with "memcpy".
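
The naming rule above can be checked mechanically, for example in a build or review script that scans kernel names. The following is a minimal sketch, not part of the toolkit: the helper name name_starts_with_memcpy is hypothetical, and it assumes the profiler's match is a case-sensitive prefix match on the kernel name as written in the source.

```c
#include <string.h>

/* Hypothetical helper (not a CUDA API): returns 1 when kernel_name begins
 * with "memcpy", the prefix that causes Visual Profiler to treat the kernel
 * as a memory copy, and 0 otherwise.
 * Assumption: the match is case-sensitive and applies to the name as
 * written in the source. */
static int name_starts_with_memcpy(const char *kernel_name)
{
    static const char prefix[] = "memcpy";
    /* sizeof(prefix) - 1 excludes the terminating NUL from the comparison. */
    return strncmp(kernel_name, prefix, sizeof(prefix) - 1) == 0;
}
```

For example, a kernel named memcpyTiles would be flagged, while tileMemcpy or copyTiles would not.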

* On Mac OS X, a 64-bit CUDA application may crash when the OS is configured to run a 32-bit kernel and the CUDA driver version is earlier than 4.0.31.

Follow these steps to determine your default OS kernel configuration:
1. Choose About This Mac from the Apple menu.
2. Click on More Info. 
3. Select Software in the Contents pane.
4. Look for "64-bit Kernel and Extensions: Yes (or No)" under the System Software Overview heading.
With the CUDA 4.0.31 driver for Mac, a CUDA context cannot be created in this mode (32-bit kernel, 64-bit CUDA application).
If a 64-bit CUDA application tries to create a CUDA context in this mode, cuInit() will return a CUDA error.
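
An application can detect this configuration itself and print a clearer message before it calls cuInit(). The following is a minimal sketch under stated assumptions: the helper names are hypothetical, and it assumes that a 32-bit Mac OS X kernel reports "i386" as its machine name through uname(), the same value that "uname -m" prints.

```c
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: returns 1 for the unsupported pairing described
 * above (64-bit application on a 32-bit kernel), 0 otherwise.
 * Assumption: a 32-bit Mac OS X kernel reports "i386" as its machine name. */
static int is_unsupported_pairing(int app_is_64bit, const char *kernel_machine)
{
    return app_is_64bit && strcmp(kernel_machine, "i386") == 0;
}

/* Hypothetical helper: checks the running process against the running
 * kernel. Returns 1 if the pairing is unsupported, 0 if it is not,
 * and -1 if uname() fails. */
static int current_pairing_is_unsupported(void)
{
    struct utsname info;
    if (uname(&info) != 0)
        return -1;
    /* A process built as 64-bit has 8-byte pointers. */
    return is_unsupported_pairing(sizeof(void *) == 8, info.machine);
}
```

An application could call current_pairing_is_unsupported() at startup and, when it returns 1, report the kernel mode to the user instead of relying only on the error returned by cuInit().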

The CUDA driver 4.0.31 on Mac OS X 10.7 supports the following configurations:
- 32-bit Kernel running with 32-bit CUDA application.
- 64-bit Kernel running with 64-bit CUDA application.
Support for 32-bit OS kernel with 64-bit CUDA applications will require a future CUDA driver update in conjunction with a Lion Software Update.

If your system is running a 32-bit kernel and you want to run a 64-bit CUDA application, one option is to set your OS to run in 64-bit kernel mode. This requires Apple hardware that supports running the OS with a 64-bit kernel; please refer to the Apple website for a detailed list of supported hardware.

You can enable your OS to run in 64-bit kernel mode in one of the following ways:
At startup time:
- If the 32-bit kernel is your default configuration, hold the 6 and 4 keys during startup to boot into 64-bit kernel mode.
To change the default configuration for the current startup disk (persistent):
- To default to the 64-bit kernel, open a Terminal window and run:
sudo systemsetup -setkernelbootarchitecture x86_64
- To default to the 32-bit kernel, open a Terminal window and run:
sudo systemsetup -setkernelbootarchitecture i386

Note: Mac OS X systems using Xcode 4.0 or higher will be supported starting with CUDA 4.1, due for release late this year. Pre-built CUDA applications will work with the released CUDA driver for 10.7, but there is no toolchain support for creating new CUDA applications on 10.7 or with Xcode 4.0 or higher until CUDA 4.1.


* The CUDA 4.0 SDK code samples for Windows platforms have been updated from version 4.0.17 to 4.0.19 to address the following issues:
1. Problems with building DEBUG targets using Visual Studio 2010
Specifically, the Visual Studio 2010 cutil project solution file did not build correctly when a DEBUG configuration was chosen. The .sln/.vcxproj solution and project files have been updated to resolve this.
2. The CUDA 4.0 SDK projects build using the last installed CUDA Toolkit instead of the latest one.  
In some cases where a developer had both the CUDA 3.2 and 4.0 Toolkits installed, Visual Studio 2010 SDK projects would choose the last installed toolkit instead of the newest one. CUDA project files previously specified the include path as $(CUDA_PATH)\include. To address this, SDK sample projects now specify either $(CudaToolkitIncludeDir) or $(CudaToolkitDir)\include.
3. Individual SDK solutions for VS2005, VS2008, and VS2010 do not build properly.
Each SDK sample solution may depend on the cutil, shrUtils, or oclUtils libraries, which are also part of the SDK. Previously, to build with the proper dependencies, developers had to open the release_vs200?.sln solution file. The individual SDK sample solutions for CUDA, CUDALibraries, and OpenCL now include these dependencies directly.

* In some cases, Visual Profiler global memory derived statistics and hints may be incorrect. If the kernel has local memory accesses, the derived statistics "global memory excess load %" and "global memory excess store %" can yield incorrect results. This is because the L2 throughput used to calculate these values also includes local memory accesses. As a result, the hints that use these statistics are incorrect as well, since the excess loads and stores reported by these statistics are caused by the local memory accesses (in addition to any uncoalesced memory access pattern).

* In a multi-GPU setup where the compute mode is set to "compute prohibited" for some GPUs, the Visual Profiler cannot profile a CUDA runtime application; it reports an error and no profiling data is shown.

* cudaHostRegister() is not supported in RHEL4. Please refer to the NVIDIA CUDA C Programming Guide for details on cudaHostRegister().

--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

For more information and help with CUDA, please visit:
http://www.nvidia.com/cuda

--------------------------------------------------------------------------------