NVIDIA GPU Computing Documentation

The GPU Computing SDK includes 100+ code samples, utilities, whitepapers, and additional documentation to help you get started developing, porting, and optimizing your applications for the CUDA architecture. You can get quick access to many of the SDK resources on this page, or download the complete SDK.

Please note that you may need to install the latest NVIDIA drivers and CUDA Toolkit to compile and run the code samples.


CUDA Getting Started Guide (Windows) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Windows.
 

Download


CUDA Getting Started Guide (Linux) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Linux.
 

Download


CUDA Getting Started Guide (Mac OS X) 

This guide will show you how to install and check the correct operation of the CUDA development tools in Mac OS X.
 

Download


Getting Started with CUDA SDK samples 

This guide covers the introductory CUDA SDK samples that beginning CUDA developers should review before developing their own projects.
 

Download


SDK Code Sample Guide to New Features in CUDA Toolkit 4.2 

This guide covers what is new in CUDA Toolkit 4.2 and the new code samples that are part of the CUDA SDK 4.2.
 

Download


CUDA Toolkit 4.2 Release Notes 

NVIDIA CUDA Toolkit version 4.2 release notes for all OS platforms.
 

Download


CUDA C Programming Guide 

This is a detailed programming guide for CUDA C developers.
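
To give a flavor of the programming model the guide documents, here is a minimal sketch of a CUDA C kernel and launch. The kernel name vecAdd, the array size, and the launch configuration are illustrative choices, not taken from the guide itself.

    // Minimal CUDA C sketch: element-wise vector addition on the GPU.
    // vecAdd and the problem size are illustrative names/values.
    #include <cuda_runtime.h>

    __global__ void vecAdd(const float *a, const float *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
        if (i < n)
            c[i] = a[i] + b[i];
    }

    int main(void)
    {
        const int n = 1 << 20;
        size_t bytes = n * sizeof(float);
        float *a, *b, *c;
        cudaMalloc((void **)&a, bytes);
        cudaMalloc((void **)&b, bytes);
        cudaMalloc((void **)&c, bytes);
        // ... fill a and b on the device or copy from the host (omitted) ...
        vecAdd<<<(n + 255) / 256, 256>>>(a, b, c, n);   // one thread per element
        cudaDeviceSynchronize();
        cudaFree(a); cudaFree(b); cudaFree(c);
        return 0;
    }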
 

Download


CUDA C Best Practices Guide 

This is a manual to help developers obtain the best performance from the NVIDIA CUDA Architecture. It presents established optimization techniques and explains coding metaphors and idioms that can greatly simplify programming for the CUDA architecture.
 

Download


CUDA Occupancy Calculator 

The CUDA Occupancy Calculator allows you to compute the multiprocessor occupancy of a GPU by a given CUDA kernel. The tool provides guidance for choosing the kernel launch configuration that yields the best possible occupancy on the GPU.
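
Occupancy is the ratio of resident warps to the maximum number of warps a multiprocessor supports. As a rough worked example, assume a compute capability 2.0 (Fermi) multiprocessor, which supports at most 48 resident warps; the block size and resident-block count below are hypothetical inputs of the kind you would enter into the calculator spreadsheet, which also accounts for register and shared memory limits.

    /* Rough sketch of the occupancy ratio the calculator reports.
       Assumes a compute capability 2.0 SM (max 48 resident warps);
       the launch configuration and resident-block count are hypothetical. */
    #include <stdio.h>

    int main(void)
    {
        const int maxWarpsPerSM   = 48;   // compute capability 2.0 limit
        const int threadsPerBlock = 256;  // hypothetical launch configuration
        const int residentBlocks  = 6;    // limited by registers/shared memory (hypothetical)

        int warpsPerBlock = threadsPerBlock / 32;            // 8 warps per block
        int activeWarps   = residentBlocks * warpsPerBlock;  // 48 active warps
        printf("occupancy = %.0f%%\n", 100.0 * activeWarps / maxWarpsPerSM);  // 100%
        return 0;
    }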
 

Download


CUDA Developer Guide for Optimus Platforms 

This document provides guidance to CUDA developers and explains how NVIDIA CUDA APIs can be used to query for GPU capabilities in Optimus systems. Developers are strongly encouraged to follow these guidelines to ensure their CUDA applications are compatible with all notebooks featuring Optimus.
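
A minimal sketch of the device-query pattern, using the standard CUDA runtime API calls cudaGetDeviceCount and cudaGetDeviceProperties; the specific checks and fallback behavior recommended for Optimus systems are described in the guide itself.

    /* Minimal device-query sketch using the standard CUDA runtime API.
       The Optimus-specific recommendations are covered in the guide. */
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(void)
    {
        int count = 0;
        if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
            printf("No CUDA-capable GPU found; fall back to a CPU code path.\n");
            return 0;
        }
        for (int i = 0; i < count; ++i) {
            cudaDeviceProp prop;
            cudaGetDeviceProperties(&prop, i);
            printf("Device %d: %s, compute capability %d.%d\n",
                   i, prop.name, prop.major, prop.minor);
        }
        return 0;
    }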
 

Download


OpenCL Programming Guide 

This is a detailed programming guide for OpenCL developers.
 

Download


OpenCL Best Practices Guide 

This is a manual to help developers obtain the best performance from OpenCL.
 

Download


OpenCL Overview for the CUDA Architecture 

This whitepaper gives an overview of OpenCL on the CUDA architecture and summarizes guidelines for choosing implementations that perform best on NVIDIA GPUs.
 

Download


OpenCL Implementation Notes 

This document describes the "implementation defined" behavior of the NVIDIA OpenCL implementation as required by the OpenCL specification version 1.0. The implementation-defined behavior is referenced below in the order in which it appears in the OpenCL specification and is grouped by the specification's section number.
 

Download


DirectCompute Programming Guide 

This is a detailed programming guide for DirectCompute developers.
 

Download


CUDA API Reference Manual (HTML) 

This is the CUDA Runtime and Driver API reference manual in HTML format.
 

Browse Online


CUDA API Reference Manual (PDF) 

This is the CUDA Runtime and Driver API reference manual in PDF format.
 

Download


CUDA API Reference Manual (CHM) 

This is the CUDA Runtime and Driver API reference manual in CHM format (Microsoft Compiled HTML Help).
 

Download


Floating Point and IEEE 754 Compliance for NVIDIA GPUs 

A number of issues related to floating point accuracy and compliance are a frequent source of confusion on both CPUs and GPUs. The purpose of this white paper is to discuss the most common issues related to NVIDIA GPUs and to supplement the documentation in the CUDA C Programming Guide.
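
One common source of such confusion is the fused multiply-add, which rounds once where a separate multiply and add round twice. A small illustrative kernel contrasting the two is sketched below; the __fmaf_rn, __fmul_rn, and __fadd_rn device intrinsics are standard CUDA C, while the kernel name and inputs are arbitrary examples.

    /* Illustrative kernel: a fused multiply-add rounds once, while a separate
       multiply and add round twice, so the two results can differ slightly. */
    __global__ void fmaDemo(const float *a, const float *b, const float *c,
                            float *fused, float *separate)
    {
        *fused    = __fmaf_rn(*a, *b, *c);             // single rounding
        *separate = __fadd_rn(__fmul_rn(*a, *b), *c);  // two roundings
    }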
 

Download


Incomplete-LU and Cholesky Preconditioned Iterative Methods Using CUSPARSE and CUBLAS 

In this white paper we show how to use the CUSPARSE and CUBLAS libraries to achieve a 2× speedup over the CPU in the incomplete-LU and Cholesky preconditioned iterative methods. We focus on the Bi-Conjugate Gradient Stabilized and Conjugate Gradient iterative methods, which can be used to solve large sparse nonsymmetric and symmetric positive definite linear systems, respectively. We also comment on the parallel sparse triangular solve, which is an essential building block in these algorithms.
 

Download


Using Inline PTX Assembly in CUDA 

The NVIDIA® CUDA™ programming environment provides a parallel thread execution (PTX) instruction set architecture (ISA) for using the GPU as a data-parallel computing device. For more information on the PTX ISA, refer to the latest version of the PTX ISA reference document ptx_isa_[version].pdf in the CUDA Toolkit doc folder. This application note describes how to inline PTX assembly language statements into CUDA code.
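
As a small example of the kind of statement the note covers, the sketch below reads the warp lane ID special register through inline PTX. The asm syntax and the %laneid register are standard; the helper function name is illustrative.

    // Illustrative device function: read the warp lane ID via inline PTX.
    // %laneid is a PTX special register; lane_id() is just an example name.
    __device__ unsigned int lane_id(void)
    {
        unsigned int id;
        asm volatile("mov.u32 %0, %%laneid;" : "=r"(id));
        return id;
    }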
 

Download


The CUDA Compiler Driver (NVCC) 

Compiling a CUDA source file involves several steps, some of which are subtly different for different modes of CUDA compilation (such as generation of device code repositories). It is the purpose of the CUDA compiler driver nvcc to hide these intricate details of CUDA compilation from developers.
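
In practice most of that machinery sits behind a single command line. A typical invocation, assuming a Fermi-class target and an illustrative source file name, looks like:

    nvcc -arch=sm_20 vectorAdd.cu -o vectorAdd    # compile and link for a compute capability 2.0 GPU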
 

Download


NVML: API Reference (PDF) 

The NVIDIA Management Library (NVML) is a C-based programmatic interface for monitoring and managing various states within NVIDIA Tesla GPUs. It is intended to be a platform for building third-party applications, and is also the underlying library for the NVIDIA-supported nvidia-smi tool.
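
A minimal sketch of the NVML query pattern: initialize, obtain a device handle, read one value, and shut down. The temperature query is just one example of the states NVML exposes, and error checking is omitted for brevity.

    /* Minimal NVML sketch: initialize, query the first GPU, shut down. */
    #include <nvml.h>
    #include <stdio.h>

    int main(void)
    {
        nvmlDevice_t dev;
        unsigned int temp = 0;
        nvmlInit();
        nvmlDeviceGetHandleByIndex(0, &dev);                         // first GPU
        nvmlDeviceGetTemperature(dev, NVML_TEMPERATURE_GPU, &temp);  // current temperature
        printf("GPU 0 temperature: %u C\n", temp);
        nvmlShutdown();
        return 0;
    }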
 

Browse Online


PTX: Parallel Thread Execution ISA Version 3.0 

This document describes PTX, a low-level parallel thread execution virtual machine and instruction set architecture (ISA). PTX exposes the GPU as a data-parallel computing device.
 

Download


CUDA-Memcheck User Manual 

The CUDA debugger, cuda-gdb, includes a memory-checking feature for detecting and debugging memory errors in CUDA applications. This document describes that feature and the standalone cuda-memcheck tool, which is designed to detect out-of-bounds and misaligned memory access errors in your CUDA application.
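
The sketch below shows the kind of defect the tool flags: a deliberately buggy (hypothetical) kernel whose bounds check is off by one, so that when the grid is rounded up past the element count one thread writes past the end of its buffer. Running the application under cuda-memcheck reports the invalid write.

    // Deliberately buggy example kernel: the bounds check should be i < n,
    // so a thread with i == n writes one element past the end of the buffer.
    __global__ void offByOne(float *buf, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i <= n)           // bug: should be i < n
            buf[i] = 0.0f;
    }
    // Run the application under the tool, e.g.:  cuda-memcheck ./app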
 

Download


CUDA-gdb Debugger User Manual 

CUDA-GDB is the NVIDIA tool for debugging CUDA applications running on actual hardware. It runs on Linux and Mac OS X, in 32-bit and 64-bit flavors. The Linux edition is based on GDB 6.6, whereas the Mac edition is based on GDB 6.3.5.
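
A typical (illustrative) workflow: build with host and device debug information, then launch the application under the debugger; the -g and -G flags are standard nvcc options, and the file names are placeholders.

    nvcc -g -G app.cu -o app     # -g: host debug info, -G: device debug info
    cuda-gdb ./app               # then set breakpoints and run as with ordinary gdb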
 

Download


Compute Command Line Profiler User Guide 

The Compute Command Line Profiler is a command line based profiling tool that can be used to measure performance and find potential opportunities for CUDA and OpenCL optimizations, to achieve maximum performance from NVIDIA GPUs. The Compute Command Line Profiler provides metrics in the form of plots and counter values presented in tables and as graphs. It tracks events with hardware counters on signals in the chip; this is explained in detail in the chapter entitled "Compute Command Line Profiler Counters."
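
A typical invocation pattern, assuming the environment variables documented in the guide (COMPUTE_PROFILE to enable profiling and COMPUTE_PROFILE_LOG to name the output file); the application and log names are illustrative.

    export COMPUTE_PROFILE=1                 # enable profiling for the next run
    export COMPUTE_PROFILE_LOG=profile.log   # write profiler output to this file (name is illustrative)
    ./app                                    # run the CUDA or OpenCL application as usual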
 

Download


CUDA Fermi Compatibility Guide 

The Fermi Compatibility Guide for CUDA Applications is intended to help developers ensure that their NVIDIA CUDA applications will run effectively on GPUs based on the NVIDIA Fermi Architecture. This document provides guidance to developers who are already familiar with programming in CUDA C/C++ and want to make sure that their software applications are compatible with Fermi.
 

Download


CUDA Fermi Tuning Guide 

This guide provides an overview of how to tune applications for the Fermi architecture to further increase performance. More details are available in the CUDA C Programming Guide (version 3.2 and later) as noted throughout the document.
 

Download


CUBLAS Library User Guide 

The CUBLAS library is an implementation of BLAS (Basic Linear Algebra Subprograms) on top of the NVIDIA CUDA runtime. It allows the user to access the computational resources of NVIDIA Graphics Processing Units (GPUs), but it does not auto-parallelize across multiple GPUs.
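
A minimal sketch of a single-precision AXPY (y = alpha*x + y) using the handle-based CUBLAS API; the device arrays are assumed to be already allocated and filled by the caller, and error checking is omitted.

    /* Sketch: y = alpha * x + y with the handle-based CUBLAS API.
       d_x and d_y are assumed to be device arrays of length n, already populated. */
    #include <cublas_v2.h>

    void saxpy_device(int n, float alpha, const float *d_x, float *d_y)
    {
        cublasHandle_t handle;
        cublasCreate(&handle);
        cublasSaxpy(handle, n, &alpha, d_x, 1, d_y, 1);  // runs on the GPU
        cublasDestroy(handle);
    }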
 

Download


CUFFT Library User Guide 

This document describes CUFFT, the NVIDIA CUDA Fast Fourier Transform (FFT) library. The FFT is a divide-and-conquer algorithm for efficiently computing discrete Fourier transforms of complex or real-valued data sets, and it is one of the most important and widely used numerical algorithms, with applications that include computational physics and general signal processing. The CUFFT library provides a simple interface for computing parallel FFTs on an NVIDIA GPU, which allows users to leverage the floating-point power and parallelism of the GPU without having to develop a custom, GPU-based FFT implementation.
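
A minimal sketch of that interface: plan, execute, destroy, here for an in-place 1D complex-to-complex transform. The input array is assumed to be a device allocation of N cufftComplex values already populated by the caller.

    /* Sketch: in-place 1D complex-to-complex FFT of N points.
       d_signal is assumed to be a device array of N cufftComplex values. */
    #include <cufft.h>

    void forward_fft(cufftComplex *d_signal, int N)
    {
        cufftHandle plan;
        cufftPlan1d(&plan, N, CUFFT_C2C, 1);                    // one batch of N points
        cufftExecC2C(plan, d_signal, d_signal, CUFFT_FORWARD);  // in-place forward transform
        cufftDestroy(plan);
    }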
 

Download


CUSPARSE Library User Guide 

The NVIDIA CUDA CUSPARSE library contains a set of basic linear algebra subroutines used for handling sparse matrices and is designed to be called from C or C++. These subroutines fall into four categories: operations between a sparse vector and a dense vector (Level 1), between a sparse matrix and a dense vector (Level 2), between a sparse matrix and a set of dense vectors (Level 3), and conversion routines between different sparse matrix storage formats.
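
The routines operate on sparse storage formats such as CSR (compressed sparse row). As a small worked example, the 3x3 matrix below is laid out in the three CSR arrays (values, column indices, row pointers) with zero-based indexing, the form in which a matrix is passed to CUSPARSE routines.

    /* CSR layout example (zero-based indexing) for the 3x3 matrix
           | 1 0 2 |
           | 0 3 0 |
           | 4 0 5 |                                                     */
    float csrVal[]    = { 1.0f, 2.0f, 3.0f, 4.0f, 5.0f };  // nonzero values, row by row
    int   csrColInd[] = { 0, 2, 1, 0, 2 };                 // column index of each value
    int   csrRowPtr[] = { 0, 2, 3, 5 };                    // start of each row in csrVal, plus the end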
 

Download


CURAND Library User Guide 

The NVIDIA CURAND library provides facilities that focus on the simple and efficient generation of high-quality pseudorandom and quasirandom numbers.
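
A minimal sketch of the host-side generator API filling a device buffer with uniform single-precision values; the seed is arbitrary and the buffer is assumed to be a device allocation of at least n floats.

    /* Sketch: fill a device buffer with n uniform floats using the CURAND host API. */
    #include <curand.h>

    void fill_uniform(float *d_data, size_t n)
    {
        curandGenerator_t gen;
        curandCreateGenerator(&gen, CURAND_RNG_PSEUDO_DEFAULT);
        curandSetPseudoRandomGeneratorSeed(gen, 1234ULL);  // arbitrary seed
        curandGenerateUniform(gen, d_data, n);             // uniform values in (0, 1]
        curandDestroyGenerator(gen);
    }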
 

Download


NVIDIA Performance Primitives (NPP) Library User Guide 

NVIDIA NPP is a library of functions for performing CUDA-accelerated processing. The initial set of functionality in the library focuses on imaging and video processing and is widely applicable for developers in these areas. NPP will evolve over time to encompass more of the compute-heavy tasks in a variety of problem domains. The NPP library is written to maximize flexibility, while maintaining high performance.
 

Download


CUDA Profiler Tools SDK Interface (CUPTI) User Guide 

The CUDA Profiling Tools Interface (CUPTI) enables the creation of profiling and tracing tools that target CUDA applications. CUPTI provides four APIs: the Activity API, the Callback API, the Event API, and the Metric API. Using these APIs, you can develop profiling tools that give insight into the CPU and GPU behavior of CUDA applications. CUPTI is delivered as a dynamic library on all platforms supported by CUDA.
 

Download


CUDA Profiler Tools SDK Interface Release Notes 

The CUDA Profiler Tools Interface Release Notes.
 

Download


Thrust Quick Start Guide 

Thrust is a C++ template library for CUDA based on the Standard Template Library (STL). Thrust allows you to implement high performance parallel applications with minimal programming effort through a high-level interface that is fully interoperable with CUDA C.
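
A minimal sketch in the spirit of the guide's examples: generate data on the host, copy it to the device through an STL-style container, and sort it in parallel on the GPU.

    // Thrust sketch: sort random integers on the GPU through the STL-style interface.
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <cstdlib>

    int main(void)
    {
        thrust::host_vector<int> h_vec(1 << 20);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);       // fill on the host
        thrust::device_vector<int> d_vec = h_vec;                 // copy to the device
        thrust::sort(d_vec.begin(), d_vec.end());                 // parallel sort on the GPU
        thrust::copy(d_vec.begin(), d_vec.end(), h_vec.begin());  // copy the result back
        return 0;
    }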
 

Download


NVIDIA CUDA H.264 Video Encoder Library User Guide 

The NVIDIA CUDA H.264 Video Encoder is a library for performing CUDA-accelerated video encoding. The functionality in the library takes raw YUV frames as input and generates NAL packets. The encoder supports various profiles up to High Profile @ Level 4.1.
 

Download


NVIDIA CUDA Video Decoder Library User Guide  

The CUDA Video Decoder API gives developers access to hardware video decoding capabilities on NVIDIA GPUs. The actual hardware decode can run on either the Video Processor (VP) or CUDA hardware, depending on the hardware capabilities and the codec. This API supports the following video stream formats for Linux and Windows platforms: MPEG-2, VC-1, and H.264 (AVCHD).
 

Download


CUDA C SDK Release Notes 

CUDA C SDK Release Notes.
 

Download


DirectCompute SDK Release Notes 

DirectCompute SDK Release Notes.
 

Download


OpenCL SDK Release Notes 

OpenCL SDK Release Notes.
 

Download


CUDA Toolkit Software License Agreement 

This is the Software License Agreement for the NVIDIA CUDA Toolkit.
 

Download


GPU Computing SDK End User License Agreement 

This is the Software License Agreement for developers or licensees.
 

Download