--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA Software Development Kit (CUDA SDK)
Release Notes
Version 2.1 for 32-bit or 64-bit Windows Vista or XP
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

Please, also refer to the release notes of version 2.1 of CUDA, installed by the
CUDA Toolkit installer.

--------------------------------------------------------------------------------
TABLE OF CONTENTS
--------------------------------------------------------------------------------
I. Installation Instructions
II. Creating Your Own CUDA Program
III. Known Issues
IV. Frequently Asked Questions
V. Change Log
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
I. Installation Instructions
--------------------------------------------------------------------------------

0. CUDA 2.1 requires at least version 181.20 of the Windows Vista or Windows XP
NVIDIA Display Driver. See the NVIDIA CUDA Toolkit 2.1 release notes for more
information.

Please make sure to read the Driver Installation Hints Document before you
install the driver:
http://www.nvidia.com/object/driver_installation_hints.html

1. Uninstall any previous versions of the NVIDIA CUDA Toolkit and NVIDIA CUDA
SDK.
You can uninstall the NVIDIA CUDA Toolkit through the Start menu:
Start menu->All Programs->NVIDIA Corporation->CUDA Toolkit->Uninstall CUDA
You can uninstall the NVIDIA CUDA SDK through the Start menu:
Start menu->All Programs->NVIDIA Corporation
->NVIDIA CUDA SDK->Uninstall NVIDIA CUDA SDK

2. Install version 2.1 of the NVIDIA CUDA Toolkit by running
NVIDIA_CUDA_Toolkit_2.1_[Win|Vista][32|64].exe corresponding to your operating
system.

3. Install version 2.1 of the NVIDIA CUDA SDK by running
NVIDIA_CUDA_SDK_2.1_[Win|Vista][32|64].exe corresponding to your operating
system.

4. Build the 32-bit and/or 64-bit, release, debug, emurelease, and/or emudebug
configurations of the SDK project examples using the provided *.sln solution
files for Microsoft Visual Studio 2005 or *_vc90.sln solution files for
Microsoft Visual Studio 2008
You can:
- either use the solution files located in each of the examples'
directories in "NVIDIA CUDA SDK\projects",
- or use the global solution files release.sln or release_vc90.sln located
in "NVIDIA CUDA SDK\projects".

Notes:

- The simpleD3D example requires to have a Direct3D SDK installed and the
VC++ directory paths (located in Tools->Options...) properly setup.

- Most samples link to a utility library called "cutil" whose source code
is in "NVIDIA CUDA SDK\common". The release and emurelease versions of
these samples link to cutil[32|64].lib and dynamically load
cutil[32|64].dll. The debug and emudebug versions of these samples link
to cutil[32D|64D].lib and dynamically load cutil[32D|64D].dll.
To build the 32-bit and/or 64-bit, release and/or debug configurations
of the cutil library, use the solution files located in
"NVIDIA CUDA SDK\common". The output of the compilation goes to
"NVIDIA CUDA SDK\common\lib":
- cutil[32|64].lib and cutil[32D|64D].lib are the release and debug
import libraries,
- cutil[32|64].dll and cutil[32D|64D].dll are the release and debug
dynamic-link libraries, which get also copied to
"NVIDIA CUDA SDK\bin\win[32|64]\[release|emurelease]" and
"NVIDIA CUDA SDK\bin\win[32|64]\[debug|emudebug]"
respectively;

5. Run the examples from the release, debug, emurelease, or emudebug directories
located in "NVIDIA CUDA SDK\bin\win[32|64]\[release|debug|emurelease|emudebug]".

Notes:

- The release and debug configurations require a CUDA-capable GPU to run
properly (see Appendix A.1 of the CUDA Programming Guide for a complete
list of CUDA-capable GPUs).

- The emurelease and emudebug configurations run in device emulation mode,
and therefore do not require a CUDA-capable GPU to run properly.

--------------------------------------------------------------------------------
II. Creating Your Own CUDA Program
--------------------------------------------------------------------------------

Creating a new CUDA Program using the NVIDIA CUDA SDK infrastructure is easy.
We have provided a "template" project that you can copy and modify to suit your
needs. Just follow these steps:

1. Copy the content of "NVIDIA CUDA SDK\projects\template" to a directory of
your own "NVIDIA CUDA SDK\projects\myproject"

2. Edit the filenames of the project to suit your needs.

3. Edit the *.sln, *.vcproj and source files. Just search and replace all
occurences of "template" with "myproject".

4. Build the 32-bit and/or 64-bit, release, debug, emurelease, and/or emudebug
configurations using myproject.sln or myproject_vc90.sln.

5. Run myproject.exe from the release, debug, emurelease, or emudebug
directories located in
"NVIDIA CUDA SDK\bin\win[32|64]\[release|debug|emurelease|emudebug]".

(It should print "Test PASSED".)

6. Now modify the code to perform the computation you require. See the CUDA
Programming Guide for details of programming in CUDA.

--------------------------------------------------------------------------------
III. Known Issues
--------------------------------------------------------------------------------

Note: Please see the CUDA Toolkit release notes for additional issues.

1. In code sample alignedTypes, the following aligned type does not provide
maximum throughput because of a compiler bug:
typedef struct __align__(16) {
unsigned int r, g, b;
} RGB32;
The workaround is to use the following type instead:
typedef struct __align__(16) {
unsigned int r, g, b, a;
} RGBA32;
as illustrated in the sample.

2. Installing the CUDA SDK under Vista with UAC (User Access Control) enabled will not
fail if the path is "Program Files\NVIDIA Corporation\NVIDIA CUDA SDK".
By default, UAC is enabled for Vista. This prevents users from installing the SDK
in the folder "Program Files". If UAC is disabled, the user is free to install the
SDK in other folders.

Before the CUDA 2.1 Beta, the SDK installations path would be under:
"Program Files\NVIDIA Corporation\NVIDIA CUDA SDK".

Starting with CUDA 2.1 Beta, the new default installation folder is:
"Application Data\NVIDIA Corporation\NVIDIA CUDA SDK" residing under "All Users" or "Current".

--------------------------------------------------------------------------------
IV. Frequently Asked Questions
--------------------------------------------------------------------------------

The Official CUDA FAQ is available online on the NVIDIA CUDA Forums:
http://forums.nvidia.com/index.php?showtopic=36286

Note: Please also see the CUDA Toolkit release notes for additional Frequently
Asked Questions.

--------------------------------------------------------------------------------
V. Change Log
--------------------------------------------------------------------------------

Release 2.1
* CUDA samples that use OpenGL interop now call cudaGLSetGLDevice after the GL context is created.
This ensures that OpenGL/CUDA interop gets the best possible performance possible.
* Projects that depend on paramGL now build the paramGL source files instead of
statically linking with paramGL*.lib.
* Bug fixes

Release 2.1 Beta
* Now supports Visual Studio 2008 projects, all samples also include VS2008
* Removed Visual Studio 2003.NET projects
* Added Visual Studio CUDA.rules to support *.cu files. Most projects now use this
rule with VS2005 and VS2008 projects.
* Added CUDA smokeParticles (volumetric particle shadows samples)
* Note: added cutil_inline.h for CUDA functions as an alternative to using the
cutil.h macro definitions
* Default CUDA SDK installation folder is under "All Users" or "Current User" in a sub-folder
"Application Data\NVIDIA Corporation\NVIDIA CUDA SDK". See section "III. Known issues" for
more details.

Release 2.0 Beta2
* 2 new code samples:
cudaVideoDecode and simpleVoteIntrinsics

Release 2.0 Beta
* Updated to the 2.0 CUDA Toolkit
* CUT_DEVICE_INIT macro modified to take command line arguments. All samples now
support specifying the CUDA device to run on from the command line (“-device=n”).
* deviceQuery sample: Updated to query number of multiprocessors and overlap
flag.
* fluidsD3D sample: Renamed to fluidsD3D9 and updated to the new Direct3D
interoperability API.
* multiGPU sample: Renamed to simpleMultiGPU.
* reduction, MonteCarlo, and binomialOptions samples: updated with optional
double precision support for upcoming hardware.
* simpleAtomics sample: Renamed to simpleAtomicIntrinsics.
* simpleD3D sample: Renamed to simpleD3D9 and updated to the new Direct3D
interoperability API.
* 7 new code samples:
dct8x8, quasirandomGenerator, recursiveGaussian, simpleD3D9Texture,
simpleTexture3D, threadMigration, and volumeRender

Release 1.1
* Updated to the 1.1 CUDA Toolkit
* Removed isInteropSupported() from cutil: graphics interoperability now works
on multi-GPU systems
* MonteCarlo sample: Improved performance. Previously it was very fast for
large numbers of paths and options, now it is also very fast for
small- and medium-sized runs.
* Transpose sample: updated kernel to use a 2D shared memory array for clarity,
and optimized bank conflicts.
* 15 new code samples:
asyncAPI, cudaOpenMP, eigenvalues, fastWalshTransform, histogram256,
lineOfSight, Mandelbrot, marchingCubes, MonteCarloMultiGPU, nbody, oceanFFT,
particles, reduction, simpleAtomics, and simpleStreams

Release 1.0
* Updated to the 1.0 CUDA Toolkit.
* Added 4 new code samples: convolutionTexture, convolutionFFT2D,
histogram64, and SobelFilter.
* All graphics interop samples now call the cutil library function
isInteropSupported(), which returns false on machines with multiple CUDA GPUs,
currently (see above).
* When compiling in DEBUG mode, CU_SAFE_CALL() now calls cuCtxSynchronize() and
CUDA_SAFE_CALL() and CUDA_CHECK_ERROR() now call cudaThreadSynchronize() in
order to return meaningful errors. This means that performance might suffer in
DEBUG mode.

Release 0.9
* Updated to the 0.9 CUDA Toolkit.
* Added 6 new code samples: MersenneTwister, MonteCarlo, imageDenoising,
simpleTemplates, deviceQuery, alignedTypes, and convolutionSeparable.
* Removed 3 old code samples:
- vectorLoads and loadUByte replaced by alignedTypes;
- convolution replaced by convolutionSeparable.

Release 0.8.1 beta
* Standardized project and file naming conventions. Several project names
changed as a result.
* cppIntegration output now matches the other samples ("Test PASSED").
* Modified transpose16 sample to transpose arbitrary matrices efficiently, and
renamed it to transpose.
* Added 11 new code samples: bandwidthTest, binomialOptions, BlackScholes,
boxFilter, convolution, dxtc, fluidsGL, multiGPU, postProcessGL,
simpleTextureDrv, and vectorLoads.

Release 0.8 beta
* First public release.