NVIDIA GPU Computing Documentation

The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture. You can get quick access to many of the SDK resources on this page, or download the complete SDK.

Please note that you may need to install the latest NVIDIA drivers and CUDA Toolkit to compile and run the code samples.


CUDA Getting Started Guide (Windows) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Windows.
 

Download


CUDA Getting Started Guide (Linux) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Linux.
 

Download


CUDA Getting Started Guide (Mac OS X) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Mac OS X.
 

Download


Getting Started with CUDA SDK samples 

This guide covers the introductory CUDA SDK samples that beginning CUDA developers should review before developing their own projects.
 

Download


SDK Code Sample Guide to New Features in CUDA Toolkit 4.2 

This guide covers what is new in CUDA Toolkit 4.2 and the new code samples that are part of the CUDA SDK 4.2.
 

Download


CUDA Toolkit 4.2 Release Notes 

NVIDIA CUDA Toolkit version 4.2 release notes for all OS platforms.
 

Download


CUDA C Programming Guide 

This is a detailed programming guide for CUDA C developers.
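
To give a flavor of the programming model the guide documents, here is a minimal sketch of a CUDA C kernel and launch. The kernel name vecAdd, the array size, and the launch configuration are illustrative choices, not taken from the guide itself.

    // Minimal CUDA C sketch: element-wise vector addition on the GPU.
    // vecAdd and the problem size are illustrative names/values.
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMalloc((void **)&a, bytes);
        cudaMalloc((void **)&b, bytes);
        cudaMalloc((void **)&c, bytes);
        // ... fill a and b on the device or copy from the host (omitted) ...
        vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);   // one thread per element
        cudaDeviceSynchronize();
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }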
 

Download


CUDA C Best Practices Guide 

This is a manual to help developers obtain the best performance from the NVIDIA CUDA Architecture. It presents established optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for the CUDA architecture.
 

Download


CUDA Occupancy Calculator 

The CUDA Occupancy Calculator allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel. The tool provides guidance for choosing the kernel launch configuration that yields the best possible occupancy on the GPU.
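
Occupancy is the ratio of resident warps to the maximum number of warps a multiprocessor supports. As a rough worked example, assume a compute capability 2.0 (Fermi) multiprocessor, which supports at most 48 resident warps; the block size and resident-block count below are hypothetical inputs of the kind you would enter into the calculator spreadsheet, which also accounts for register and shared memory limits.

    /* Rough sketch of the occupancy ratio the calculator reports.
       Assumes a compute capability 2.0 SM (max 48 resident warps);
       the launch configuration and resident-block count are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        const int maxWarpsPerSM   = 48;   // compute capability 2.0 limit
        const int threadsPerBlock = 256;  // hypothetical launch configuration
        const int residentBlocks  = 6;    // limited by registers/shared memory (hypothetical)

        int warpsPerBlock = threadsPerBlock / 32;            // 8 warps per block
        int activeWarps   = residentBlocks * warpsPerBlock;  // 48 active warps
        printf("occupancy = %.0f%%\n", 100.0 * activeWarps / maxWarpsPerSM);  // 100%
        return 0;
    }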
 

Download


CUDA Developer Guide for Optimus Platforms 

This document provides guidance to CUDA developers and explains how NVIDIA CUDA APIs can be used to query for GPU capabilities in Optimus systems. Developers are strongly encouraged to follow these guidelines to ensure their CUDA applications are compatible with all notebooks featuring Optimus.
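
A minimal sketch of the device-query pattern, using the standard CUDA runtime API calls cudaGetDeviceCount and cudaGetDeviceProperties; the specific checks and fallback behavior recommended for Optimus systems are described in the guide itself.

    /* Minimal device-query sketch using the standard CUDA runtime API.
       The Optimus-specific recommendations are covered in the guide. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            printf("No CUDA-capable GPU found; fall back to a CPU code path.\n");
            return 0;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s, compute capability %d.%d\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }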
 

Download


OpenCL Programming Guide 

This is a detailed programming guide for OpenCL developers.
 

Download


OpenCL Best Practices Guide 

This is a manual to help developers obtain the best performance from OpenCL.
 

Download


OpenCL Overview for the CUDA Architecture 

This whitepaper gives an overview of OpenCL on the CUDA architecture and summarizes guidelines for choosing implementations that perform best on NVIDIA GPUs.
 

Download


OpenCL Implementation Notes 

This document describes the "implementation defined" behavior of the NVIDIA OpenCL implementation as required by the OpenCL specification version 1.0. The implementation-defined behavior is referenced below in the order in which it appears in the OpenCL specification and is grouped by the specification's section number.
 

Download


DirectCompute Programming Guide 

This is a detailed programming guide for DirectCompute developers.
 

Download


CUDA API Reference Manual (HTML) 

This is the CUDA Runtime and Driver API reference manual in HTML format.
 

Browse Online


CUDA API Reference Manual (PDF) 

This is the CUDA Runtime and Driver API reference manual in PDF format.
 

Download


CUDA API Reference Manual (CHM) 

This is the CUDA Runtime and Driver API reference manual in CHM format (Microsoft Compiled HTML Help).
 

Download


Floating Point and IEEE 754 Compliance for NVIDIA GPUs 

A number of issues related to floating point accuracy and compliance are a frequent source of confusion on both CPUs and GPUs. The purpose of this white paper is to discuss the most common issues related to NVIDIA GPUs and to supplement the documentation in the CUDA C Programming Guide.
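
One common source of such confusion is the fused multiply-add, which rounds once where a separate multiply and add round twice. A small illustrative kernel contrasting the two is sketched below; the __fmaf_rn, __fmul_rn, and __fadd_rn device intrinsics are standard CUDA C, while the kernel name and inputs are arbitrary examples.

    /* Illustrative kernel: a fused multiply-add rounds once, while a separate
       multiply and add round twice, so the two results can differ slightly. */
    __global__ void fmaDemo(const float *a, const float *b, const float *c,
                            float *fused, float *separate)
    {
        *fused    = __fmaf_rn(*a, *b, *c);             // single rounding
        *separate = __fadd_rn(__fmul_rn(*a, *b), *c);  // two roundings
    }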
 

Download


Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS 

In this white paper we show how to use the CUSPARSE and CUBLAS libraries to achieve a 2× speedup over the CPU in the incomplete-LU and Cholesky preconditioned iterative methods. We focus on the Bi-Conjugate Gradient Stabilized and Conjugate Gradient iterative methods, which can be used to solve large sparse nonsymmetric and symmetric positive definite linear systems, respectively. We also comment on the parallel sparse triangular solve, which is an essential building block in these algorithms.
 

Download


Using Inline PTX Assembly in CUDA 

The NVIDIA® CUDA™ programming environment provides a parallel thread execution (PTX) instruction set architecture (ISA) for using the GPU as a data-parallel computing device. For more information on the PTX ISA, refer to the latest version of the PTX ISA reference document ptx_isa_[version].pdf in the CUDA Toolkit doc folder. This application note describes how to inline PTX assembly language statements into CUDA code.
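
As a small example of the kind of statement the note covers, the sketch below reads the warp lane ID special register through inline PTX. The asm syntax and the %laneid register are standard; the helper function name is illustrative.

    // Illustrative device function: read the warp lane ID via inline PTX.
    // %laneid is a PTX special register; lane_id() is just an example name.
    __device__ unsigned int lane_id(void)
    {
        unsigned int id;
        asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
        return id;
    }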
 

Download


The CUDA Compiler Driver (NVCC) 

Compiling a CUDA source file involves several steps, some of which are subtly different for different modes of CUDA compilation (such as generation of device code repositories). It is the purpose of the CUDA compiler driver nvcc to hide these intricate details of CUDA compilation from developers.
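
In practice most of that machinery sits behind a single command line. A typical invocation, assuming a Fermi-class target and an illustrative source file name, looks like:

    nvcc -arch=sm_20 vectorAdd.cu -o vectorAdd    # compile and link for a compute capability 2.0 GPU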
 

Download


NVML: API Reference (PDF) 

The NVIDIA Management Library (NVML) is a C-based programmatic interface for monitoring and managing various states within NVIDIA Tesla GPUs. It is intended to be a platform for building third-party applications, and is also the underlying library for the NVIDIA-supported nvidia-smi tool.
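
A minimal sketch of the NVML query pattern: initialize, obtain a device handle, read one value, and shut down. The temperature query is just one example of the states NVML exposes, and error checking is omitted for brevity.

    /* Minimal NVML sketch: initialize, query the first GPU, shut down. */
    #include <nvml.h>
    #include <stdio.h>

    int main(void)
    {
        nvmlDevice_t dev;
        unsigned int temp = 0;
        nvmlInit();
        nvmlDeviceGetHandleByIndex(0, &dev);                         // first GPU
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);  // current temperature
        printf("GPU 0 temperature: %u C\n", temp);
        nvmlShutdown();
        return 0;
    }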
 

Browse Online


PTX: Parallel Thread Execution ISA Version 3.0 

This document describes PTX, a low-level parallel thread execution virtual machine and instruction set architecture (ISA). PTX exposes the GPU as a data-parallel computing device.
 

Download


CUDA-Memcheck User Manual 

The CUDA debugger, cuda-gdb, includes a memory-checking feature for detecting and debugging memory errors in CUDA applications. This document describes that feature and the standalone cuda-memcheck tool, which is designed to detect out-of-bounds and misaligned memory access errors in your CUDA application.
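
The sketch below shows the kind of defect the tool flags: a deliberately buggy (hypothetical) kernel whose bounds check is off by one, so that when the grid is rounded up past the element count one thread writes past the end of its buffer. Running the application under cuda-memcheck reports the invalid write.

    // Deliberately buggy example kernel: the bounds check should be i < n,
    // so a thread with i == n writes one element past the end of the buffer.
    __global__ void offByOne(float *buf, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i <= n)           // bug: should be i < n
            buf[i] = 0.0f;
    }
    // Run the application under the tool, e.g.:  cuda-memcheck ./app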
 

Download


CUDA-gdb Debugger User Manual 

CUDA-GDB is the NVIDIA tool for debugging CUDA applications running on actual hardware. It runs on Linux and Mac OS X, in 32-bit and 64-bit flavors. The Linux edition is based on GDB 6.6, whereas the Mac edition is based on GDB 6.3.5.
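
A typical (illustrative) workflow: build with host and device debug information, then launch the application under the debugger; the -g and -G flags are standard nvcc options, and the file names are placeholders.

    nvcc -g -G app.cu -o app     # -g: host debug info, -G: device debug info
    cuda-gdb ./app               # then set breakpoints and run as with ordinary gdb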
 

Download


Compute Command Line Profiler User Guide 

The Compute Command Line Profiler is a command line based profiling tool that can be used to measure performance and find potential opportunities for CUDA and OpenCL optimizations, to achieve maximum performance from NVIDIA GPUs. The Compute Command Line Profiler provides metrics in the form of plots and counter values presented in tables and as graphs. It tracks events with hardware counters on signals in the chip; this is explained in detail in the chapter entitled "Compute Command Line Profiler Counters."
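
A typical invocation pattern, assuming the environment variables documented in the guide (COMPUTE_PROFILE to enable profiling and COMPUTE_PROFILE_LOG to name the output file); the application and log names are illustrative.

    export COMPUTE_PROFILE=1                 # enable profiling for the next run
    export COMPUTE_PROFILE_LOG=profile.log   # write profiler output to this file (name is illustrative)
    ./app                                    # run the CUDA or OpenCL application as usual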
 

Download


CUDA Fermi Compatibility Guide 

The Fermi Compatibility Guide for CUDA Applications is intended to help developers ensure that their NVIDIA CUDA applications will run effectively on GPUs based on the NVIDIA Fermi Architecture. This document provides guidance to developers who are already familiar with programming in CUDA C/C++ and want to make sure that their software applications are compatible with Fermi.
 

Download


CUDA Fermi Tuning Guide 

This guide provides an overview of how to tune applications for the Fermi architecture to further increase performance. More details are available in the CUDA C Programming Guide (version 3.2 and later) as noted throughout the document.
 

Download


CUBLAS Library User Guide 

The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access the computational resources of NVIDIA Graphics Processing Units (GPUs), but it does not auto-parallelize across multiple GPUs.
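
A minimal sketch of a single-precision AXPY (y = alpha*x + y) using the handle-based CUBLAS API; the device arrays are assumed to be already allocated and filled by the caller, and error checking is omitted.

    /* Sketch: y = alpha * x + y with the handle-based CUBLAS API.
       d_x and d_y are assumed to be device arrays of length n, already populated. */
    #include <cublas_v2.h>

    void saxpy_device(int n, float alpha, const float *d_x, float *d_y)
    {
        cublasHandle_t handle;
        cublasCreate(&handle);
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);  // runs on the GPU
        cublasDestroy(handle);
    }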
 

Download


CUFFT Library User Guide 

This document describes CUFFT, the NVIDIA CUDA Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets, and it is one of the most important and widely used numerical algorithms, with applications that include computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom, GPU-based FFT implementation.
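
A minimal sketch of that interface: plan, execute, destroy, here for an in-place 1D complex-to-complex transform. The input array is assumed to be a device allocation of N cufftComplex values already populated by the caller.

    /* Sketch: in-place 1D complex-to-complex FFT of N points.
       d_signal is assumed to be a device array of N cufftComplex values. */
    #include <cufft.h>

    void forward_fft(cufftComplex *d_signal, int N)
    {
        cufftHandle plan;
        cufftPlan1d(&plan, N, CUFFT_C2C, 1);                    // one batch of N points
        cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);  // in-place forward transform
        cufftDestroy(plan);
    }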
 

Download


CUSPARSE Library User Guide 

The NVIDIA CUDA CUSPARSE library contains a set of basic linear algebra subroutines used for handling sparse matrices and is designed to be called from C or C++. These subroutines fall into four categories: operations between a sparse vector and a dense vector (Level 1), between a sparse matrix and a dense vector (Level 2), between a sparse matrix and a set of dense vectors (Level 3), and conversion routines between different sparse matrix storage formats.
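
The routines operate on sparse storage formats such as CSR (compressed sparse row). As a small worked example, the 3x3 matrix below is laid out in the three CSR arrays (values, column indices, row pointers) with zero-based indexing, the form in which a matrix is passed to CUSPARSE routines.

    /* CSR layout example (zero-based indexing) for the 3x3 matrix
           | 1 0 2 |
           | 0 3 0 |
           | 4 0 5 |                                                     */
    float csrVal[]    = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };  // nonzero values, row by row
    int   csrColInd[] = { 0, 2, 1, 0, 2 };                 // column index of each value
    int   csrRowPtr[] = { 0, 2, 3, 5 };                    // start of each row in csrVal, plus the end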
 

Download


CURAND Library User Guide 

The NVIDIA CURAND library provides facilities that focus on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers.
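
A minimal sketch of the host-side generator API filling a device buffer with uniform single-precision values; the seed is arbitrary and the buffer is assumed to be a device allocation of at least n floats.

    /* Sketch: fill a device buffer with n uniform floats using the CURAND host API. */
    #include <curand.h>

    void fill_uniform(float *d_data, size_t n)
    {
        curandGenerator_t gen;
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
        curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);  // arbitrary seed
        curandGenerateUniform(gen, d_data, n);             // uniform values in (0, 1]
        curandDestroyGenerator(gen);
    }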
 

Download


NVIDIA Performance Primitives (NPP) Library User Guide 

NVIDIA NPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NPP library is written to maximize flexibility, while maintaining high performance.
 

Download


CUDA Profiler Tools SDK Interface (CUPTI) User Guide 

The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides four APIs: the Activity API, the Callback API, the Event API, and the Metric API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.
 

Download


CUDA Profiler Tools SDK Interface Release Notes 

The CUDA Profiler Tools Interface Release Notes.
 

Download


Thrust Quick Start Guide 

Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.
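
A minimal sketch in the spirit of the guide's examples: generate data on the host, copy it to the device through an STL-style container, and sort it in parallel on the GPU.

    // Thrust sketch: sort random integers on the GPU through the STL-style interface.
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <cstdlib>

    int main(void)
    {
        thrust::host_vector<int> h_vec(1 << 20);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);       // fill on the host
        thrust::device_vector<int> d_vec = h_vec;                 // copy to the device
        thrust::sort(d_vec.begin(), d_vec.end());                 // parallel sort on the GPU
        thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());  // copy the result back
        return 0;
    }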
 

Download


NVIDIA CUDA H.264 Video Encoder Library User Guide 

The NVIDIA CUDA H.264 Video Encoder is a library for performing CUDA-accelerated video encoding. The functionality in the library takes raw YUV frames as input and generates NAL packets. The encoder supports various profiles up to High Profile @ Level 4.1.
 

Download


NVIDIA CUDA Video Decoder Library User Guide  

The CUDA Video Decoder API gives developers access to hardware video decoding capabilities on NVIDIA GPUs. The actual hardware decode can run on either the Video Processor (VP) or CUDA hardware, depending on the hardware capabilities and the codec. This API supports the following video stream formats for Linux and Windows platforms: MPEG-2, VC-1, and H.264 (AVCHD).
 

Download


CUDA C SDK Release Notes 

CUDA C SDK Release Notes.
 

Download


DirectCompute SDK Release Notes 

DirectCompute SDK Release Notes.
 

Download


OpenCL SDK Release Notes 

OpenCL SDK Release Notes.
 

Download


CUDA Toolkit Software License Agreement 

This is the Software License Agreement for the NVIDIA CUDA Toolkit.
 

Download


GPU Computing SDK End User License Agreement 

This is the Software License Agreement for developers or licensees.
 

Download