--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA
MacOS X Release Notes
Version 2.3
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
New Features
--------------------------------------------------------------------------------

  Hardware Support
  o See http://www.nvidia.com/object/cuda_learn_products.html

  CUFFT Features
  o Performance enhancements
  o Double precision
    - CUFFT now supports double-precision transforms, with types and functions
      analogous to the existing single-precision versions. Similarly, the
      "cufftType" enumeration (used in calls like cufftPlan1d) has expanded to
      include double-precision identifiers:

        Precision:   Single          Double
        Type:        cufftReal       cufftDoubleReal
        Type:        cufftComplex    cufftDoubleComplex
        cufftType:   CUFFT_R2C       CUFFT_D2Z
        cufftType:   CUFFT_C2R       CUFFT_Z2D
        cufftType:   CUFFT_C2C       CUFFT_Z2Z
        Function:    cufftExecC2C    cufftExecZ2Z
        Function:    cufftExecR2C    cufftExecD2Z
        Function:    cufftExecC2R    cufftExecZ2D

    - The double-precision versions are invoked in the same manner as the
      single-precision ones, with arguments changed from the single- to the
      double-precision types. See "cufft.h" for exact definitions of the
      above.

  Separate Packaging
  o CUDA Driver and CUDA Toolkit are now available via separate packages.

  Double Handling by the Compiler
  o When a PTX file with an SM version prior to sm_13 contains
    double-precision instructions, ptxas now emits a warning that
    double-precision instructions are demoted to single precision.
    ptxas has a new option, --suppress-double-demote-warning, to suppress this
    warning.

--------------------------------------------------------------------------------
Major Bug Fixes
--------------------------------------------------------------------------------

  C++ Support for Device Emulation
  o Support is restored for using C++ code in device emulation mode.

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

o GPU enumeration order on multi-GPU systems is non-deterministic and may
  change with this or future releases. Users should make sure to enumerate
  all CUDA-capable GPUs in the system and select the most appropriate one(s)
  to use.

o OpenGL interop always uses a software path, which reduces performance
  compared to interop on other platforms.

o CUDA kernels that do not terminate, or that run without interruption for
  several tens of seconds, may trigger a GPU reset that disrupts any attached
  displays. The display image may become corrupted; the corruption disappears
  after a reboot.

o If a GPU is used without a display attached, it may not exit a reduced
  power state, causing CUDA programs to perform poorly when run on that GPU.
  Cycling the system's power saving state or rebooting should reset the GPU.
  In general it is best to use a GPU with a display attached.

o The kernel driver may leak wired (i.e. unpageable) memory if CUDA
  applications terminate in unexpected ways. Continued leaks lead to severely
  degraded system performance and require a reboot to fix.

o When compiling with GCC, special care must be taken for structs that
  contain 64-bit integers. This is because GCC aligns long longs to a 4-byte
  boundary by default, while NVCC aligns long longs to an 8-byte boundary by
  default. Thus, when using GCC to compile a file that has such a
  struct/union, users must give the -malign-double option to GCC.
  When using NVCC, this option is automatically passed to GCC.

o It is a known issue that cudaThreadExit() may not be called implicitly on
  host thread exit. Until this issue is resolved, developers are advised to
  call cudaThreadExit() explicitly.

o On systems with multiple GPUs installed, or systems with multiple monitors
  connected to a single GPU, OpenGL interoperability always copies shared
  buffers through host memory.

o Current hardware limits the number of asynchronous memcopies that can be
  overlapped with kernel execution. Overlap is also limited to kernels
  executing for less than 1 second. These limitations are expected to improve
  on future hardware.

o The following APIs exhibit high CPU utilization if they wait for the
  hardware for a significant amount of time. As a workaround, apps may use
  cu(da)StreamQuery and/or cu(da)EventQuery to check whether the GPU is busy
  and yield the thread as desired.
  - cuCtxSynchronize
  - cuEventSynchronize
  - cuStreamSynchronize
  - cudaThreadSynchronize
  - cudaEventSynchronize
  - cudaStreamSynchronize

o OpenGL interoperability
  - OpenGL cannot access a buffer that is currently *mapped*. If the buffer
    is registered but not mapped, OpenGL can do any requested operations on
    the buffer.
  - Deleting a buffer while it is mapped for CUDA results in undefined
    behavior.
  - Attempting to map or unmap a buffer while the bound context differs from
    the one that was current when the buffer was registered will generally
    result in a program error and should be avoided.

o When the profiler gathers performance signals on G80-based products, the
  driver reduces the clock rate on the device. If the CUDA app crashes or
  otherwise exits uncleanly, the clocks will not be reset to their previous
  values. The system must be rebooted to restore the original clock rate.

o The MacBook Pro currently presents both GPUs as available for use in
  Performance mode. This is incorrect behavior, as only one GPU is available
  at a time.
  CUDA applications that try to run on the second GPU (device ID 1) may
  hang. The hang can be terminated by pressing Ctrl-C or closing the
  offending application.

o The shared libraries should not be redistributed from this release. Any
  CUDA application shipped on Macintosh requires the end user to install the
  CUDA driver from the CUDA install package.

--------------------------------------------------------------------------------
Open64 Sources
--------------------------------------------------------------------------------

The Open64 source files are controlled under the terms of the GPL.
Current and previously released versions are located via anonymous ftp at
download.nvidia.com in the CUDAOpen64 directory.

--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

  07/2009 - Version 2.3
  06/2009 - Version 2.3 Beta
  05/2009 - Version 2.2
  03/2009 - Version 2.2 Beta
  03/2009 - Version 2.1 Beta
  07/2008 - Version 2.0
  01/2008 - Version 1.1
          - Initial public Beta

--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

  For more information and help with CUDA, please visit
  http://www.nvidia.com/cuda