--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA MacOS X Release Notes
Version 3.0
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
New Features
--------------------------------------------------------------------------------

Hardware Support
  o See http://www.nvidia.com/object/cuda_learn_products.html

Platform Support
  o Added support for Snow Leopard

  o CUBLAS Library Support
      - Added the BLAS1 functions:
          * cublasZaxpy()
          * cublasZcopy()
          * cublasZswap()
      - Added the BLAS2 functions:
          * cublasDtrmv()
          * cublasCtrmv()
          * cublasCgemv()
          * cublasCgeru()
          * cublasCgerc()
          * cublasZtrmv()
          * cublasZgemv()
          * cublasZgeru()
          * cublasZgerc()
      - Added the BLAS3 functions:
          * cublasCtrsm()
          * cublasCtrmm()
          * cublasCsyrk()
          * cublasCsymm()
          * cublasCherk()
          * cublasZtrsm()
          * cublasZtrmm()
          * cublasZsyrk()
          * cublasZsymm()
          * cublasZherk()

--------------------------------------------------------------------------------
Bug Fixes
--------------------------------------------------------------------------------

  o The asynchronous memcpy routines require every host pointer passed to
    them to refer to pinned (page-locked) memory. In CUDA 2.1, 2.2, and
    2.3, some host-to-device copies issued on the NULL stream returned no
    error when given non-pinned memory. This release restores the error
    check: such transfers now return cudaErrorInvalidValue (runtime API)
    or CUDA_ERROR_INVALID_VALUE (driver API).
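
The double-complex BLAS1 routines listed under New Features follow standard BLAS semantics; cublasZaxpy(), for example, computes y := alpha*x + y. When validating GPU results it can help to compare them against a host-side reference. The following C99 sketch is one way to write that reference; zaxpy_ref() is a hypothetical helper, not part of CUBLAS, and unlike reference BLAS it assumes positive increments.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Host reference for ZAXPY, y := alpha * x + y, on double-complex vectors.
 * Hypothetical helper for checking GPU results; not part of CUBLAS.
 * Unlike reference BLAS, this sketch accepts only positive increments. */
static void zaxpy_ref(int n, double complex alpha,
                      const double complex *x, int incx,
                      double complex *y, int incy)
{
    if (n <= 0)
        return;                       /* BLAS-style quick return */
    assert(incx > 0 && incy > 0);     /* negative strides not handled here */
    for (int i = 0; i < n; ++i)
        y[(size_t)i * (size_t)incy] += alpha * x[(size_t)i * (size_t)incx];
}
```

Results copied back from the device would typically be compared with the reference using a small tolerance rather than exact equality, because floating-point rounding can differ between the host and the GPU.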

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

  o GPU enumeration order on multi-GPU systems is non-deterministic and may
    change with this or future releases. Users should enumerate all
    CUDA-capable GPUs in the system and select the most appropriate one(s)
    to use.

  o OpenGL interop always uses a software path, which reduces performance
    compared to interop on other platforms.

  o CUDA kernels that do not terminate, or that run without interruption
    for several tens of seconds, may cause the GPU to reset, disrupting any
    attached displays. The displayed image may become corrupted; the
    corruption disappears after a reboot.

  o If a GPU is used without a display attached, it may not exit a reduced
    power state, causing CUDA programs to perform poorly on that GPU.
    Cycling the system's power-saving state or rebooting should reset the
    GPU. In general, it is best to use a GPU with a display attached.

  o The kernel driver may leak wired (i.e., unpageable) memory if CUDA
    applications terminate unexpectedly. Repeated leaks severely degrade
    system performance, and a reboot is required to recover.

  o When compiling with GCC, take special care with structs and unions that
    contain 64-bit integers: by default GCC aligns long long to a 4-byte
    boundary, while NVCC aligns it to an 8-byte boundary. When using GCC to
    compile a file that contains such a struct or union, pass the
    -malign-double option to GCC. When compiling with NVCC, this option is
    passed to GCC automatically.

  o cudaThreadExit() may not be called implicitly when a host thread exits.
    Until this issue is resolved, developers should call cudaThreadExit()
    explicitly before a host thread that uses CUDA exits.

  o On systems with multiple GPUs installed, or with multiple monitors
    connected to a single GPU, OpenGL interoperability always copies shared
    buffers through host memory.

  o Current hardware limits the number of asynchronous memory copies that
    can be overlapped with kernel execution. Overlap is also limited to
    kernels that execute for less than 1 second. These limitations are
    expected to improve on future hardware.

  o The following APIs exhibit high CPU utilization when they wait on the
    hardware for a significant amount of time. As a workaround,
    applications may use cu(da)StreamQuery and/or cu(da)EventQuery to check
    whether the GPU is busy and yield the thread as desired.
      - cuCtxSynchronize
      - cuEventSynchronize
      - cuStreamSynchronize
      - cudaThreadSynchronize
      - cudaEventSynchronize
      - cudaStreamSynchronize

  o OpenGL interoperability
      - OpenGL cannot access a buffer that is currently *mapped*. If the
        buffer is registered but not mapped, OpenGL can perform any
        requested operation on it.
      - Deleting a buffer while it is mapped for CUDA results in undefined
        behavior.
      - Map and unmap a buffer only while the context that was current when
        the buffer was registered is bound. Doing so with a different
        context bound generally results in a program error.

  o When the profiler gathers performance signals on G80-based products,
    the driver reduces the device's clock rate. If the CUDA application
    crashes or otherwise exits uncleanly, the clocks are not reset to their
    previous values, and the system must be rebooted to restore the
    original clock rate.

  o The MacBook Pro currently presents both GPUs as available for use in
    Performance mode. This is incorrect, as only one GPU is available at a
    time. CUDA applications that try to run on the second GPU (device ID 1)
    may hang. The hang can be ended by pressing Ctrl-C or by closing the
    offending application.

  o The shared libraries from this release should not be redistributed.
    Any CUDA application shipped for the Macintosh requires the end user to
    install the CUDA driver from the CUDA install package.

--------------------------------------------------------------------------------
Open64 Sources
--------------------------------------------------------------------------------

The Open64 source files are governed by the terms of the GNU General Public
License (GPL). Current and previously released versions are available via
anonymous FTP from download.nvidia.com, in the CUDAOpen64 directory.

--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

10/2009 - Version 3.0 Beta
07/2009 - Version 2.3
06/2009 - Version 2.3 Beta
05/2009 - Version 2.2
03/2009 - Version 2.2 Beta
03/2009 - Version 2.1 Beta
07/2008 - Version 2.0
01/2008 - Version 1.1 - Initial public Beta

--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

For more information and help with CUDA, please visit http://www.nvidia.com/cuda
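
The Known Issues entry on GPU enumeration order recommends enumerating every CUDA-capable GPU and choosing by capability rather than by position. The C sketch below shows one way to do that. The gpu_summary struct, the ranking policy, and the pick_gpu() helper are hypothetical; in a real program the fields could be copied from cudaDeviceProp (major, minor, multiProcessorCount, totalGlobalMem) after calling cudaGetDeviceProperties().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-GPU summary. In a real program these fields could be
 * copied from cudaDeviceProp (major, minor, multiProcessorCount,
 * totalGlobalMem) after calling cudaGetDeviceProperties(). */
struct gpu_summary {
    int id;               /* device ordinal as enumerated in this run */
    int major, minor;     /* compute capability */
    int multiprocessors;
    size_t total_mem;     /* bytes of global memory */
};

/* Assumed ranking policy: higher compute capability first, then more
 * multiprocessors, then more memory. Returns nonzero if a beats b. */
static int ranks_above(const struct gpu_summary *a,
                       const struct gpu_summary *b)
{
    if (a->major != b->major)
        return a->major > b->major;
    if (a->minor != b->minor)
        return a->minor > b->minor;
    if (a->multiprocessors != b->multiprocessors)
        return a->multiprocessors > b->multiprocessors;
    return a->total_mem > b->total_mem;
}

/* Scans every enumerated GPU and returns the id of the best one that meets
 * the minimum compute capability, or -1 if none qualifies. */
static int pick_gpu(const struct gpu_summary *gpus, int count,
                    int min_major, int min_minor)
{
    assert(gpus != NULL || count == 0);
    const struct gpu_summary *best = NULL;
    for (int i = 0; i < count; ++i) {
        const struct gpu_summary *g = &gpus[i];
        if (g->major < min_major ||
            (g->major == min_major && g->minor < min_minor))
            continue;     /* below the required capability */
        if (best == NULL || ranks_above(g, best))
            best = g;
    }
    return best != NULL ? best->id : -1;
}
```

Because the choice depends only on device properties, reordering the list does not change the selected GPU unless two devices tie on every criterion; in that case the first one scanned wins. The returned id would then be passed to cudaSetDevice().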
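
For the Known Issues entry on structs containing 64-bit integers, an alternative to relying on -malign-double is to pin the alignment of the 64-bit member explicitly, so GCC and NVCC agree on the layout regardless of compiler flags. This sketch swaps in GCC's aligned attribute (the CUDA headers also provide an __align__(n) macro for this purpose) plus a C11 static_assert that fails the build if the layout drifts. The struct shared_params name and its fields are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct passed between GCC-compiled host code and
 * NVCC-compiled code. Without the attribute, GCC on targets that align
 * long long to 4 bytes (such as 32-bit x86) would place `total` at
 * offset 4, while NVCC would place it at offset 8. */
struct shared_params {
    int count;                                    /* offset 0 */
    long long total __attribute__((aligned(8)));  /* forced to offset 8 */
};

/* Fail the build if either compiler lays the struct out differently. */
static_assert(offsetof(struct shared_params, total) == 8,
              "total must sit at an 8-byte offset for NVCC compatibility");
static_assert(sizeof(struct shared_params) == 16,
              "struct size must match the NVCC layout");
```

Passing -malign-double, as the Known Issues entry describes, remains a valid fix; the explicit attribute makes the alignment requirement visible in the source itself.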
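
The Known Issues entry on high CPU utilization suggests polling with cu(da)StreamQuery or cu(da)EventQuery and yielding the thread instead of blocking in a *Synchronize call. A minimal C sketch of that loop follows. To keep it runnable without a GPU, the query is abstracted as a callback. The done_fn type, wait_yielding(), and the countdown_done() stand-in are hypothetical. With the CUDA runtime, the callback could return cudaStreamQuery(stream) != cudaErrorNotReady.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Returns nonzero once the awaited GPU work has finished. With the CUDA
 * runtime this could wrap cudaStreamQuery(stream) != cudaErrorNotReady,
 * after which the caller checks the final status for errors. */
typedef int (*done_fn)(void *ctx);

/* Polls is_done(), yielding the CPU between polls instead of calling a
 * blocking *Synchronize routine. Returns the number of polls made. */
static unsigned long wait_yielding(done_fn is_done, void *ctx)
{
    assert(is_done != NULL);
    unsigned long polls = 1;
    while (!is_done(ctx)) {
        sched_yield();   /* give other runnable threads a turn */
        ++polls;
    }
    return polls;
}

/* Stand-in query for trying the loop without a GPU: reports completion
 * once the counter pointed to by ctx has been decremented to zero. */
static int countdown_done(void *ctx)
{
    int *remaining = ctx;
    if (*remaining == 0)
        return 1;
    --*remaining;
    return 0;
}
```

sched_yield() only hands the CPU to other runnable threads. If nothing else is runnable, the loop still spins, so an application that wants lower utilization could sleep briefly between polls (for example with nanosleep()) at the cost of added latency.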