--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA Software Development Kit (CUDA SDK)
Release Notes
Version 1.1 for Windows XP
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

Please also refer to the release notes for version 1.1 of CUDA, installed by the
CUDA Toolkit installer.

--------------------------------------------------------------------------------
TABLE OF CONTENTS
--------------------------------------------------------------------------------
I.   Installation Instructions
II.  Creating Your Own CUDA Program
III. Known Issues
IV.  Frequently Asked Questions
V.   Change Log
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
I.   Installation Instructions
--------------------------------------------------------------------------------

0. Install the Windows XP display driver provided along with the CUDA SDK by
   running setup.exe. Please make sure to read the Driver Installation Hints
   Document before you install the driver:
   http://www.nvidia.com/object/driver_installation_hints.html
   This release of the CUDA SDK has only been tested with this display driver.
   Other driver versions are unsupported.

1. Uninstall any previous versions of the NVIDIA CUDA Toolkit and NVIDIA CUDA
   SDK.
   You can uninstall the NVIDIA CUDA Toolkit through the Start menu:
      Start menu->All Programs->NVIDIA Corporation->CUDA Toolkit->Uninstall CUDA
   You can uninstall the NVIDIA CUDA SDK through the Start menu:
      Start menu->All Programs->NVIDIA Corporation
                                    ->NVIDIA CUDA SDK->Uninstall NVIDIA CUDA SDK

2. Install version 1.1 of the NVIDIA CUDA Toolkit by running
   NVIDIA_CUDA_Toolkit_1.1.exe.

3. Install version 1.1 of the NVIDIA CUDA SDK by running
   NVIDIA_CUDA_SDK_1.1.exe.

4. Build the release, debug, emurelease, and/or emudebug configurations of the
   SDK project examples using the provided *.sln solution files for Microsoft
   Visual Studio Version 8 or *_vc7.sln solution files for Microsoft Visual
   Studio Version 7.
   You can:
      - either use the solution files located in each of the examples'
        directories in "NVIDIA CUDA SDK\projects",
      - or use the global solution files release.sln or release_vc7.sln located
        in "NVIDIA CUDA SDK\projects".
   
   Notes:
   
      - The simpleD3D example requires a Direct3D SDK to be installed and the
        VC++ directory paths (located in Tools->Options...) to be set up
        properly.
      
      - Most samples link to a utility library called "cutil" whose source code
        is in "NVIDIA CUDA SDK\common". The release and emurelease versions of
        these samples link to cutil32.lib and dynamically load cutil32.dll. The
        debug and emudebug versions of these samples link to cutil32D.lib and
        dynamically load cutil32D.dll.
        To build the release and/or debug configurations of the cutil library,
        use the solution files located in "NVIDIA CUDA SDK\common". The output of
        the compilation goes to "NVIDIA CUDA SDK\common\lib":
         - cutil32.lib and cutil32D.lib are the release and debug import
           libraries,
         - cutil32.dll and cutil32D.dll are the release and debug dynamic-link
           libraries, which are also copied to
           "NVIDIA CUDA SDK\common\bin\win32\[release|emurelease]" and
           "NVIDIA CUDA SDK\common\bin\win32\[debug|emudebug]" respectively.

5. Run the examples from the release, debug, emurelease, or emudebug directories 
   located in "NVIDIA CUDA SDK\bin\win32\[release|debug|emurelease|emudebug]".
   
   Notes:
   
    - The release and debug configurations require a GeForce 8 Series,
      a Quadro FX 4600, or a Quadro FX 5600 GPU to run properly.
      
    - The emurelease and emudebug configurations run in device emulation mode, 
      and therefore do not require a GeForce 8 Series, a Quadro FX 4600, or
      a Quadro FX 5600 GPU to run properly.

--------------------------------------------------------------------------------
II.  Creating Your Own CUDA Program
--------------------------------------------------------------------------------

Creating a new CUDA Program using the NVIDIA CUDA SDK infrastructure is easy.
We have provided a "template" project that you can copy and modify to suit your
needs. Just follow these steps:

1. Copy the contents of "NVIDIA CUDA SDK\projects\template" to a directory of
   your own, for example "NVIDIA CUDA SDK\projects\myproject".

2. Rename the project files to suit your needs.

3. Edit the *.sln, *.vcproj, and source files. Just search and replace all
   occurrences of "template" with "myproject".

4. Build the release, debug, emurelease, and/or emudebug configurations using
   myproject.sln or myproject_vc7.sln.

5. Run myproject.exe from the release, debug, emurelease, or emudebug
   directories located in
   "NVIDIA CUDA SDK\bin\win32\[release|debug|emurelease|emudebug]".

   (It should print "Test PASSED".)

6. Now modify the code to perform the computation you require. See the CUDA
   Programming Guide for details of programming in CUDA.
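
The search and replace in step 3 can also be scripted. The following is only a
sketch: replaceAll is a hypothetical helper, not part of the SDK or of cutil,
and it works on text already loaded into a string rather than on files.

```cpp
#include <string>

// Hypothetical helper (not part of the SDK): returns a copy of text in which
// every occurrence of from has been replaced by to. Matching is case-sensitive.
std::string replaceAll(std::string text, const std::string& from,
                       const std::string& to)
{
    if (from.empty()) {
        return text;                        // nothing to search for
    }
    std::string::size_type pos = 0;
    while ((pos = text.find(from, pos)) != std::string::npos) {
        text.replace(pos, from.length(), to);
        pos += to.length();                 // continue after the inserted text
    }
    return text;
}
```

For example, replaceAll(line, "template", "myproject") rewrites one line of a
project file. A plain textual replace also matches the C++ keyword "template",
so review the result if your sources use C++ templates.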

--------------------------------------------------------------------------------
III. Known Issues
--------------------------------------------------------------------------------

Note: Please see the CUDA Toolkit release notes for additional issues.
   
1. In code sample alignedTypes, the following aligned type does not provide
   maximum throughput because of a compiler bug:
       typedef struct __align__(16) { 
           unsigned int r, g, b; 
       } RGB32;
   The workaround is to use the following type instead:
       typedef struct __align__(16) { 
           unsigned int r, g, b, a; 
       } RGBA32;
   as illustrated in the sample.
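
The workaround does not increase memory use, which the following host-side
sketch illustrates. It assumes that standard C++11 alignas(16) has the same
effect on size and alignment as CUDA's __align__(16), and that unsigned int is
4 bytes wide. The helper names rgbArrayBytes and rgbaArrayBytes are
hypothetical and do not appear in the sample.

```cpp
#include <cstddef>

// Host-side stand-ins for the two types above. alignas(16) is assumed here to
// mirror the size and alignment effect of CUDA's __align__(16).
struct alignas(16) RGB32  { unsigned int r, g, b;    };
struct alignas(16) RGBA32 { unsigned int r, g, b, a; };

// The 16-byte alignment pads the three-component type from 12 to 16 bytes,
// so the fourth component of RGBA32 occupies what would otherwise be padding.
static_assert(sizeof(RGB32)  == 16, "RGB32 is padded to 16 bytes");
static_assert(sizeof(RGBA32) == 16, "RGBA32 fills the same 16 bytes");
static_assert(alignof(RGB32) == 16 && alignof(RGBA32) == 16,
              "both types are 16-byte aligned");

// Bytes needed for an array of n elements of each type (hypothetical helpers).
inline std::size_t rgbArrayBytes(std::size_t n)  { return n * sizeof(RGB32);  }
inline std::size_t rgbaArrayBytes(std::size_t n) { return n * sizeof(RGBA32); }
```

Under these assumptions an array of either type takes 16 bytes per element, so
switching to RGBA32 changes only how the compiler can load each element, not
the size of the buffers.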

--------------------------------------------------------------------------------
IV.  Frequently Asked Questions
--------------------------------------------------------------------------------

The Official CUDA FAQ is available online on the NVIDIA CUDA Forums:
http://forums.nvidia.com/index.php?showtopic=36286

Note: Please also see the CUDA Toolkit release notes for additional Frequently 
Asked Questions.  

--------------------------------------------------------------------------------
V.   Change Log
--------------------------------------------------------------------------------

Release 1.1
* Updated to the 1.1 CUDA Toolkit
* Removed isInteropSupported() from cutil: graphics interoperability now works
  on multi-GPU systems
* MonteCarlo sample: Improved performance. Previously it was very fast only for
  large numbers of paths and options; now it is also very fast for small- and
  medium-sized runs.
* Transpose sample: updated kernel to use a 2D shared memory array for clarity,
  and optimized it to reduce bank conflicts.
* 13 new code samples: 
  asyncAPI, cudaOpenMP, eigenvalues, fastWalshTransform, histogram256, Mandelbrot,
  MonteCarloMultiGPU, nbody, oceanFFT, particles, reduction, simpleAtomics, and
  simpleStreams
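
These notes do not say how the Transpose sample deals with bank conflicts. One
common technique for a 2D shared memory tile is to pad each row by one word.
The host-side sketch below models why that helps. It assumes 16 shared memory
banks of 32-bit words and a half-warp of 16 threads, which is not stated in
these notes. The function name maxBankConflictDegree is hypothetical.

```cpp
#include <algorithm>
#include <array>

// Assumed hardware model (not stated in these notes): 16 shared memory banks,
// each one 32-bit word wide, serving a half-warp of 16 threads at a time.
const int kNumBanks = 16;
const int kHalfWarp = 16;

// Thread t of a half-warp reads tile[t][column] from a 2D array of 32-bit
// words whose rows are rowPitch words long. Returns the largest number of
// threads that land in the same bank (1 means the access is conflict-free).
int maxBankConflictDegree(int rowPitch, int column)
{
    std::array<int, kNumBanks> hits = {};   // accesses per bank, all zero
    for (int t = 0; t < kHalfWarp; ++t) {
        int wordIndex = t * rowPitch + column;
        ++hits[wordIndex % kNumBanks];
    }
    return *std::max_element(hits.begin(), hits.end());
}
```

In this model a column read from a tile with 16-word rows puts all 16 threads
in one bank, while a tile with 17-word rows (one padding word per row) spreads
them over 16 different banks.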

Release 1.0
* Updated to the 1.0 CUDA Toolkit.
* Added 4 new code samples: convolutionTexture, convolutionFFT2D,
  histogram64, and SobelFilter.
* All graphics interop samples now call the cutil library function
  isInteropSupported(), which currently returns false on machines with multiple
  CUDA GPUs (see Release 1.1 above).
* When compiling in DEBUG mode, CU_SAFE_CALL() now calls cuCtxSynchronize() and
  CUDA_SAFE_CALL() and CUDA_CHECK_ERROR() now call cudaThreadSynchronize() in
  order to return meaningful errors. This means that performance might suffer in
  DEBUG mode.

Release 0.9
* Updated to the 0.9 CUDA Toolkit.
* Added 7 new code samples: MersenneTwister, MonteCarlo, imageDenoising,
  simpleTemplates, deviceQuery, alignedTypes, and convolutionSeparable.
* Removed 3 old code samples:
  - vectorLoads and loadUByte replaced by alignedTypes;
  - convolution replaced by convolutionSeparable.

Release 0.8.1 beta
* Standardized project and file naming conventions. Several project names 
  changed as a result.
* cppIntegration output now matches the other samples ("Test PASSED").
* Modified transpose16 sample to transpose arbitrary matrices efficiently, and
  renamed it to transpose.
* Added 11 new code samples: bandwidthTest, binomialOptions, BlackScholes, 
  boxFilter, convolution, dxtc, fluidsGL, multiGPU, postProcessGL, 
  simpleTextureDrv, and vectorLoads.

Release 0.8 beta
* First public release.