--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
CUDA SDK Release Notes Version 2.3
NVIDIA GPU Computing Software Development Kit
R190 Driver Update Release

  Windows Vista, XP, and Win7 (32-bit and 64-bit)
  Linux OS (32-bit and 64-bit)
  Mac OSX  (10.5.x Leopard 32-bit, 10.6 SnowLeopard 32-bit)

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

Please also refer to the release notes of version 2.3 of the CUDA Toolkit, 
installed by the CUDA Toolkit installer.

--------------------------------------------------------------------------------
TABLE OF CONTENTS
--------------------------------------------------------------------------------
I.  (a) Windows Installation Instructions
I.  (b) Linux Installation Instructions
I.  (c) Mac OSX Installation Instructions

II. (a) Creating Your Own CUDA Projects for Windows 
II. (b) Creating Your Own CUDA Projects for Linux
II. (c) Creating Your Own CUDA Projects for Mac OSX

III.(a) Known Issues on CUDA SDK for Windows
III.(b) Known Issues on CUDA SDK for Linux
III.(c) Known Issues on CUDA SDK for Mac OSX

IV.  Frequently Asked Questions
V.   Change Log
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
I. (a)   Windows Installation Instructions
--------------------------------------------------------------------------------
	0. CUDA 2.3 Toolkit requires at least version 190.38 of the Windows Vista or 
	   Windows XP NVIDIA Display Driver. See the NVIDIA CUDA Toolkit 2.3 release 
	   notes for more detailed information.

	   Please make sure to read the Driver Installation Hints Document before you 
	   install the driver:
	   http://www.nvidia.com/object/driver_installation_hints.html

	1. Uninstall any previous versions of the NVIDIA CUDA Toolkit and NVIDIA GPU Computing SDK.
	   You can uninstall the NVIDIA CUDA Toolkit through the Start menu:
		  Start menu->All Programs->NVIDIA Corporation->CUDA Toolkit->Uninstall CUDA
	      
	   You can uninstall the NVIDIA GPU Computing SDK through the Start menu:
		  Start menu->All Programs->NVIDIA Corporation
					  ->NVIDIA GPU Computing SDK->Uninstall NVIDIA GPU Computing SDK

	2. Install version 2.3 of the NVIDIA CUDA Toolkit by running
	   cudatoolkit_2.3_Win_[32|64].exe corresponding to your operating
	   system.

	3. Install version 2.3 of the NVIDIA GPU Computing SDK by running
	   gpucomputingsdk_2.3_Win_[32|64].exe corresponding to your operating
	   system.

	4. Build the 32-bit and/or 64-bit, release, debug, emurelease, and/or emudebug
	   configurations of the SDK project examples using the provided *.sln solution
	   files for Microsoft Visual Studio 2005 or *_vc90.sln solution files for
	   Microsoft Visual Studio 2008.
	   You can:
		  - either use the solution files located in each of the examples'
			directories in "NVIDIA GPU Computing SDK\C\src",
		  - or use the global solution files release.sln or release_vc90.sln located
			in "NVIDIA GPU Computing SDK\C\src".
	   
	   Notes:
	   
		  - The simpleD3D9 example requires a Direct3D SDK to be installed and the
			VC++ directory paths (located in Tools->Options...) to be properly set up.
	      
		  - Most samples link to a utility library called "cutil" whose source code
			is in "NVIDIA GPU Computing SDK\C\common". The release and emurelease versions of
			these samples link to cutil[32|64].lib and dynamically load
			cutil[32|64].dll. The debug and emudebug versions of these samples link
			to cutil[32D|64D].lib and dynamically load cutil[32D|64D].dll.
			To build the 32-bit and/or 64-bit, release and/or debug configurations
			of the cutil library, use the solution files located in
			"NVIDIA GPU Computing SDK\C\common". The output of the compilation goes to
			"NVIDIA GPU Computing SDK\C\common\lib":
			 - cutil[32|64].lib and cutil[32D|64D].lib are the release and debug
			   import libraries,
			 - cutil[32|64].dll and cutil[32D|64D].dll are the release and debug
			   dynamic-link libraries, which are also copied to
			   "NVIDIA GPU Computing SDK\C\bin\win[32|64]\[release|emurelease]" and
			   "NVIDIA GPU Computing SDK\C\bin\win[32|64]\[debug|emudebug]"
			   respectively.

	5. Run the examples from the release, debug, emurelease, or emudebug directories 
	   located in "NVIDIA GPU Computing SDK\C\bin\win[32|64]\[release|debug|emurelease|emudebug]".
	   
	   Notes:
	   
		- The release and debug configurations require a CUDA-capable GPU to run
		  properly (see Appendix A.1 of the CUDA Programming Guide for a complete
		  list of CUDA-capable GPUs).
	      
		- The emurelease and emudebug configurations run in device emulation mode, 
		  and therefore do not require a CUDA-capable GPU to run properly.

--------------------------------------------------------------------------------
I. (b)  Linux Installation Instructions
--------------------------------------------------------------------------------
[Part 1/2]

	Note: The default installation folder <SDK_INSTALL_PATH> is "~/NVIDIA_GPU_Computing_SDK"

	For more detailed instructions, see Part 2 of this section below.

	0. Install the NVIDIA Linux display driver by executing the appropriate file:

	   a. For 32-bit Linux distributions use:
		  cudadriver_2.3_linux_32_190.18.run

	   b. For 64-bit Linux distributions use:
		  cudadriver_2.3_linux_64_190.18.run
	  
	   For information on installing NVIDIA Linux display drivers, please refer to 
	   the NVIDIA Accelerated Linux Driver Set README and Installation Guide:
	   http://us.download.nvidia.com/XFree86/Linux-x86/1.0-9755/README/index.html

	1. Install version 2.3 of the NVIDIA CUDA Toolkit by executing the file
	   cudatoolkit_2.3_linux_*.run, where * corresponds to your Linux distribution.

	   Add the CUDA binaries and lib path to your PATH and LD_LIBRARY_PATH 
	   environment variables.

	2. Install version 2.3 of the NVIDIA GPU Computing SDK by executing the file 
	   gpucomputingsdk_2.3_linux.run

	   The installer will prompt you to enter an installation path for the SDK or
	   accept the default.  We will refer to the path you choose as 
	   SDK_INSTALL_PATH.
	   
	3. Build the SDK project examples.  

		cd <SDK_INSTALL_PATH>/C
		make

	   Note: Adding one of the following to the make command builds for a specific target:
	 
			make x86_64=1     (for 64-bit targets)
			make i386=1       (for 32-bit targets)
	    
	4. Run the examples (32-bit or 64-bit Linux)
	    
		cd <SDK_INSTALL_PATH>/C/bin/linux/release
			./matrixMul

	   (or any of the other executables in that directory)

	See Part 2 below for more details on installing, building, and running
	SDK samples.
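
	The target selection in step 3 can be scripted. The following is a minimal
	sketch, not part of the SDK: the helper name sdk_arch_flag is hypothetical,
	and it assumes that `uname -m` reports x86_64 on 64-bit systems and i386
	through i686 on 32-bit systems. It only prints the make command it would run.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the SDK): map a machine name, as
# printed by `uname -m`, to the make variable described in step 3.
sdk_arch_flag() {
    case "$1" in
        x86_64|amd64)        echo "x86_64=1" ;;
        i386|i486|i586|i686) echo "i386=1" ;;
        *)                   echo "" ;;   # unknown machine: pass no target flag
    esac
}

# Print the command instead of running it, so the sketch is safe to try.
# To build for real, run the printed command from <SDK_INSTALL_PATH>/C.
echo "make $(sdk_arch_flag "$(uname -m)")"
```

	Passing no flag on an unrecognized machine leaves the choice to the SDK
	makefiles, which is the same behavior as typing "make" alone.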

[Part 2/2]
	Note: The default installation folder <SDK_INSTALL_PATH> is "~/NVIDIA_GPU_Computing_SDK"

	This package consists of a ".run" file. This is a self-extracting archive that
	decompresses its contents to a temporary folder and then installs the contents 
	to a path that you specify.  The archive is:

	gpucomputingsdk_2.3_linux.run: NVIDIA GPU Computing SDK Installer 

	In addition, an NVIDIA Linux Display Driver is needed to run CUDA code on an 
	NVIDIA GPU.  CUDA 2.3 requires version 190 or newer of the NVIDIA Linux 
	Display Driver.  Please see the NVIDIA CUDA Toolkit 2.3 release notes 
	for more details.

	   For information on installing NVIDIA Linux display drivers, please refer to 
	   the NVIDIA Accelerated Linux Driver Set README and Installation Guide:
	   http://us.download.nvidia.com/XFree86/Linux-x86/1.0-9755/README/index.html

	1. Install version 2.3 of the NVIDIA CUDA Toolkit by executing the file
		 cudatoolkit_2.3_linux_*.run where * corresponds to your Linux distribution

	   To install, run the cudatoolkit_2.3_linux_*.run script.  You will be prompted
	   for the path to where you want to put the CUDA files. In the following we will 
	   call this path <CUDA_INSTALL_PATH>. It is recommended that you run the 
	   installer as root and use the default install path (/usr/local). 

	   Make sure that you add the location of the CUDA binaries (such as nvcc) to 
	   your PATH environment variable and the location of the CUDA libraries
	   (such as libcudart.so) to your LD_LIBRARY_PATH environment variable.

	   In the bash shell, one way to do this is to add the following lines to the 
	   file ~/.bash_profile from your home directory.

	   a. For 32-bit operating systems use the following paths 
		PATH=$PATH:<CUDA_INSTALL_PATH>/bin
		LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<CUDA_INSTALL_PATH>/lib
	    
	   b. For 64-bit operating systems use the following paths
		PATH=$PATH:<CUDA_INSTALL_PATH>/bin
		LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<CUDA_INSTALL_PATH>/lib64

	   Then, to export the environment variables, add these lines to the same file:
		export PATH
		export LD_LIBRARY_PATH
	 
	2. Install the NVIDIA GPU Computing SDK by executing the file gpucomputingsdk_2.3_linux.run

	   To install, run the gpucomputingsdk_2.3_linux.run script.  You will 
	   be prompted for the path to where you want to put the CUDA SDK.  You can 
	   regard the CUDA SDK as user code (it is a set of examples), and therefore
	   the default installation is in the current user's home directory 
	   (~/NVIDIA_GPU_Computing_SDK). You must either accept the default or specify a path
	   to which the user has write permissions. 

	   We will refer to the path you choose as SDK_INSTALL_PATH below.
	   
	3. Build the SDK project examples.  
		a. cd <SDK_INSTALL_PATH>/C
		b. Build:
			- release    configuration by typing "make".
			- debug      configuration by typing "make dbg=1".
			- emurelease configuration by typing "make emu=1".
			- emudebug   configuration by typing "make emu=1 dbg=1".
			- x86_64=1   configuration by typing "make x86_64=1"
			- i386=1     configuration by typing "make i386=1"

		Running make at the top level first builds libcutil, a utility library used
		by the SDK examples (libcutil is simply for convenience -- it is not a part
		of CUDA and is not required for your own CUDA programs).  Make then builds
		each of the projects in the SDK.  

		NOTES:
		- The release and debug configurations require a CUDA-capable GPU to run
		  properly (see Appendix A.1 of the CUDA Programming Guide for a complete
		  list of CUDA-capable GPUs).
		- The emurelease and emudebug configurations run in device emulation mode, 
		  and therefore do not require a CUDA-capable GPU to run properly.
		- You can build an individual sample by typing "make" 
		  (or "make emu=1", etc.) in that sample's project directory. For example:

			cd <SDK_INSTALL_PATH>/C/src/matrixMul
			make emu=1

		  And then execute the sample with:
			<SDK_INSTALL_PATH>/C/bin/linux/emurelease/matrixMul

		- To build just libcutil, type "make" (or "make dbg=1") in the "common" 
		  subdirectory:

			cd <SDK_INSTALL_PATH>/C/common
			make

	4. Run the examples from the release, debug, emurelease, or emudebug 
	   directories located in 

			<SDK_INSTALL_PATH>/C/bin/linux/[release|debug|emurelease|emudebug].
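
	The environment setup from step 1 can be written so that one profile works
	on both 32-bit and 64-bit systems. This is a minimal sketch under stated
	assumptions: the helper name cuda_lib_dir is hypothetical, the fallback
	value /usr/local/cuda for CUDA_INSTALL_PATH is an assumption (use the path
	you gave the toolkit installer), and `uname -m` is assumed to print x86_64
	on 64-bit systems.

```shell
#!/bin/sh
# Hypothetical helper: print the toolkit library directory for a machine type.
# $1 = toolkit install path, $2 = machine name from `uname -m`
cuda_lib_dir() {
    case "$2" in
        x86_64) echo "$1/lib64" ;;   # 64-bit operating systems (step 1.b)
        *)      echo "$1/lib" ;;     # 32-bit operating systems (step 1.a)
    esac
}

# Assumption: fall back to /usr/local/cuda when CUDA_INSTALL_PATH is not set.
CUDA_INSTALL_PATH="${CUDA_INSTALL_PATH:-/usr/local/cuda}"

PATH="$PATH:$CUDA_INSTALL_PATH/bin"
# ${VAR:+...} adds the separator only when LD_LIBRARY_PATH already has a
# value, so an empty variable does not produce a leading ":" entry.
LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$(cuda_lib_dir "$CUDA_INSTALL_PATH" "$(uname -m)")"
export PATH LD_LIBRARY_PATH
```

	These lines could go in ~/.bash_profile in place of the separate 32-bit and
	64-bit variants shown in step 1.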

--------------------------------------------------------------------------------
I. (c) Mac OSX Installation Instructions
--------------------------------------------------------------------------------
[Part 1/2]

	For more detailed instructions, see part 2 of this section below.

	Note: The default installation folder <SDK_INSTALL_PATH> is "/Developer/GPU Computing"

        Note: For SnowLeopard, you may need to boot the 32-bit kernel.  During power-on, hold down
              the '3' and '2' keys and the OS will start up in 32-bit kernel mode.

        Please install the packages in this order.

	0. Install version 2.3 of the NVIDIA CUDA Toolkit package by executing the file
		  cudatoolkit_2.3_macos_32.pkg

	   This package works on Mac OSX in 32-bit mode
           (10.5.x Leopard and 10.6 SnowLeopard).

	1. A. Install the NVIDIA Driver Package (Mac OSX Leopard)

	      i.  For NVIDIA GeForce GPUs, install this package:
                  cudadriver_2.3.0_macos.pkg

	      ii. For NVIDIA Quadro GPUs and GeForce GTX 285, install this package:
                  cudadriver_2.3.1_macos.pkg

	      iii. For NVIDIA GPUs running SnowLeopard, install this package:
                   cudadriver_2.3.1_macos.pkg

	2. Install version 2.3 of the NVIDIA GPU Computing SDK by executing the file 
	   gpucomputingsdk_2.3_macos.pkg

	3. Build the SDK project examples

		cd <SDK_INSTALL_PATH>
		make

	   Note: The make options for building specific targets and configurations are listed in Part 2 below.
	    
	4. Run the examples:
	    
		cd <SDK_INSTALL_PATH>/C/bin/darwin/release
			./matrixMul

	   (or any of the other executables in that directory)

	See Part 2 below for more details on installing, building, and running
	SDK samples.

[Part 2/2]

	Note: The default installation folder <SDK_INSTALL_PATH> is "/Developer/GPU Computing"

	1. Build the SDK project examples.  
		a. Go to <SDK_INSTALL_PATH> ("cd <SDK_INSTALL_PATH>")
		b. Build:
			- release    configuration by typing "make".
			- debug      configuration by typing "make dbg=1".
			- emurelease configuration by typing "make emu=1".
			- emudebug   configuration by typing "make emu=1 dbg=1".
			- x86_64=1   configuration by typing "make x86_64=1"
			- i386=1     configuration by typing "make i386=1"

                Note: x86_64 is not currently working for Leopard or SnowLeopard

		Running make at the top level first builds libcutil, a utility library used
		by the SDK examples (libcutil is simply for convenience -- it is not a part
		of CUDA and is not required for your own CUDA programs).  Make then builds
		each of the projects in the SDK.  

		NOTES:
		- The release and debug configurations require a CUDA-capable GPU to run
		  properly (see Appendix A.1 of the CUDA Programming Guide for a complete
		  list of CUDA-capable GPUs).
		- The emurelease and emudebug configurations run in device emulation mode, 
		  and therefore do not require a CUDA-capable GPU to run properly.
		- You can build an individual sample by typing "make" 
		  (or "make emu=1", etc.) in that sample's project directory. For example:

			cd <SDK_INSTALL_PATH>/C/src/matrixMul
			make emu=1

		  And then execute the sample with:
			<SDK_INSTALL_PATH>/C/bin/darwin/emurelease/matrixMul

		- To build just libcutil, type "make" (or "make dbg=1") in the "common" 
		  subdirectory:

			cd <SDK_INSTALL_PATH>/C/common
			make

	2. Run the examples from the release, debug, emurelease, or emudebug 
	   directories located in
	   
		 <SDK_INSTALL_PATH>/C/bin/darwin/[release|debug|emurelease|emudebug].
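
	The relationship between the four configurations, their make arguments, and
	the directories that receive the binaries can be summarized in a small
	script. This is a sketch only: the helper names sdk_make_args and
	sdk_bin_dir are hypothetical and not part of the SDK, and the script prints
	the commands and directories rather than running a build.

```shell
#!/bin/sh
# Hypothetical helper: print the make arguments for an SDK configuration name.
sdk_make_args() {
    case "$1" in
        release)    echo "" ;;
        debug)      echo "dbg=1" ;;
        emurelease) echo "emu=1" ;;
        emudebug)   echo "emu=1 dbg=1" ;;
        *) echo "unknown configuration: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: print the directory that holds a configuration's binaries.
# $1 = SDK install path, $2 = platform directory (linux or darwin), $3 = configuration
sdk_bin_dir() {
    echo "$1/C/bin/$2/$3"
}

# List every configuration with its build command and output directory.
for cfg in release debug emurelease emudebug; do
    echo "make $(sdk_make_args "$cfg") -> $(sdk_bin_dir "<SDK_INSTALL_PATH>" darwin "$cfg")"
done
```

	Replacing "darwin" with "linux" gives the corresponding directories for the
	Linux SDK.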


--------------------------------------------------------------------------------
II. (a) Creating Your Own CUDA Projects for Windows
--------------------------------------------------------------------------------

	Creating a new CUDA Program using the NVIDIA GPU Computing SDK infrastructure is easy.
	We have provided a "template" project that you can copy and modify to suit your
	needs. Just follow these steps:

	1. Copy the content of "NVIDIA GPU Computing SDK\C\src\template" to a directory of
	   your own "NVIDIA GPU Computing SDK\C\src\myproject"

	2. Edit the filenames of the project to suit your needs.

	3. Edit the *.sln, *.vcproj and source files. Just search and replace all
	   occurrences of "template" with "myproject".

	4. Build the 32-bit and/or 64-bit, release, debug, emurelease, and/or emudebug
	   configurations using myproject.sln or myproject_vc90.sln.

	5. Run myproject.exe from the release, debug, emurelease, or emudebug
	   directories located in
	   "NVIDIA GPU Computing SDK\C\bin\win[32|64]\[release|debug|emurelease|emudebug]".

	   (It should print "Test PASSED".)

	6. Now modify the code to perform the computation you require. See the CUDA
	   Programming Guide for details of programming in CUDA.

--------------------------------------------------------------------------------
II. (b) Creating Your Own CUDA Projects for Linux
--------------------------------------------------------------------------------

	Note: The default installation folder <SDK_INSTALL_PATH> is "~/NVIDIA_GPU_Computing_SDK"

	Creating a new CUDA Program using the NVIDIA GPU Computing SDK infrastructure is easy.
	We have provided a "template" project that you can copy and modify to suit your
	needs. Just follow these steps:

	1. Copy the template project

			cd <SDK_INSTALL_PATH>/C/src
			cp -r template <myproject>

	2. Rename the project files to suit your needs

			cd <myproject>
			mv template.cu myproject.cu
			mv template_kernel.cu myproject_kernel.cu
			mv template_gold.cpp myproject_gold.cpp

	3. Edit the Makefile and source files.  Just search and replace all occurrences 
	   of "template" with "myproject".

	4. Build the project

			make

	   You can build a debug version with "make dbg=1", an emulation version with 
	   "make emu=1", and a debug emulation with "make dbg=1 emu=1".

	5. Run the program

			../../bin/linux/release/myproject

	   (It should print "Test PASSED")

	6. Now modify the code to perform the computation you require.  See the
	   CUDA Programming Guide for details of programming in CUDA.
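
	Steps 1 through 3 above can be automated. The following is a minimal sketch,
	not an SDK tool: the function name new_sdk_project is hypothetical, and it
	assumes the project name contains only letters, digits, and underscores. It
	writes through a temporary file instead of using `sed -i`, because the
	syntax of that option differs between GNU and BSD sed.

```shell
#!/bin/sh
# Hypothetical helper: copy the SDK "template" sample to a new project and
# replace "template" with the new name in file contents and file names.
# $1 = directory containing "template" (normally <SDK_INSTALL_PATH>/C/src)
# $2 = new project name (letters, digits, and underscores only)
new_sdk_project() {
    src_dir=$1
    name=$2
    case "$name" in
        ""|*[!A-Za-z0-9_]*) echo "invalid project name: $name" >&2; return 1 ;;
    esac
    [ -d "$src_dir/template" ] || { echo "no template in $src_dir" >&2; return 1; }
    [ ! -e "$src_dir/$name" ] || { echo "$src_dir/$name already exists" >&2; return 1; }

    cp -r "$src_dir/template" "$src_dir/$name" || return 1

    for f in "$src_dir/$name"/*; do
        [ -f "$f" ] || continue          # skip subdirectories
        # Step 3: replace every occurrence of "template" in the file.
        sed "s/template/$name/g" "$f" > "$f.tmp" && mv "$f.tmp" "$f"
        # Step 2: rename files whose names contain "template".
        base=$(basename "$f")
        case "$base" in
            *template*) mv "$f" "$src_dir/$name/$(echo "$base" | sed "s/template/$name/")" ;;
        esac
    done
}

# Usage, assuming the default SDK location:
#   new_sdk_project "$HOME/NVIDIA_GPU_Computing_SDK/C/src" myproject
```

	Note that a blanket replacement also rewrites the C++ keyword "template" if
	your sources use it, so review the result before building.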

--------------------------------------------------------------------------------
II. (c) Creating Your Own CUDA Projects for Mac OSX
--------------------------------------------------------------------------------

	Note: The default installation folder <SDK_INSTALL_PATH> is "/Developer/GPU Computing"

	Creating a new CUDA Program using the NVIDIA GPU Computing SDK infrastructure is easy.
	We have provided a "template" project that you can copy and modify to suit your
	needs. Just follow these steps:

	1. Copy the template project

			cd <SDK_INSTALL_PATH>/C/src
			cp -r template <myproject>

	2. Rename the project files to suit your needs

			cd <myproject>
			mv template.cu myproject.cu
			mv template_kernel.cu myproject_kernel.cu
			mv template_gold.cpp myproject_gold.cpp

	3. Edit the Makefile and source files.  Just search and replace all occurrences 
	   of "template" with "myproject".

	4. Build the project

			make

	   You can build a debug version with "make dbg=1", an emulation version with 
	   "make emu=1", and a debug emulation with "make dbg=1 emu=1".

	5. Run the program

			../../bin/darwin/release/myproject

	   (It should print "Test PASSED")

	6. Now modify the code to perform the computation you require.  See the
	   CUDA Programming Guide for details of programming in CUDA.

--------------------------------------------------------------------------------
III. (a) Known Issues on CUDA SDK for Windows
--------------------------------------------------------------------------------

	Note: Please see the CUDA Toolkit release notes for additional issues.
	   
	1. In code sample alignedTypes, the following aligned type does not provide
	   maximum throughput because of a compiler bug:
		   typedef struct __align__(16) { 
			   unsigned int r, g, b; 
		   } RGB32;
	   The workaround is to use the following type instead:
		   typedef struct __align__(16) { 
			   unsigned int r, g, b, a; 
		   } RGBA32;
	   as illustrated in the sample.

	2. Installing the "NVIDIA GPU Computing SDK" under Vista with UAC (User Account Control) enabled will
	   fail if the path is "Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK".  
	   By default, UAC is enabled for Vista.  This prevents users from installing the SDK
	   in the folder "Program Files".  If UAC is disabled, the user is free to install the 
	   SDK in other folders.

	   Before CUDA 2.1, the SDK installation path was under:
	   "Program Files\NVIDIA Corporation\NVIDIA CUDA SDK".  

	   Starting with CUDA 2.1, the default installation folder is:
	   "Application Data\NVIDIA Corporation\NVIDIA CUDA SDK", residing under "All Users" or "Current User".  

	   For NVIDIA GPU Computing 2.3, the SDK installation path would be under:
	   "Program Files\NVIDIA Corporation\NVIDIA GPU Computing SDK".  

	   Starting with NVIDIA GPU Computing 2.3, the default installation folder is:
	   "Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK", residing under "All Users" or "Current User".  

--------------------------------------------------------------------------------
III. (b) Known Issues on CUDA SDK for Linux
--------------------------------------------------------------------------------

	Note: Please see the CUDA Toolkit release notes for additional issues.

	1. "ld: cannot find -lglut".  On some Linux installations (notably default RHEL 
	   4 update 3 installations), building the simpleGL example (and other examples 
	   that use OpenGL) can result in a linking error like the following.

			/usr/bin/ld: cannot find -lglut

	   Typically this is because the SDK makefiles look for libglut.so and not for
	   variants of it (like libglut.so.3). To confirm this is the problem, simply 
	   run the following command.

			ls /usr/lib | grep glut

	   You should see the following (or similar) output.

		lrwxrwxrwx  1 root root     16 Jan  9 14:06 libglut.so.3 -> libglut.so.3.8.0
		-rwxr-xr-x  1 root root 164584 Aug 14  2004 libglut.so.3.8.0

	   If you have libglut.so.3 in /usr/lib, simply run the following command 
	   as root.

			ln -s /usr/lib/libglut.so.3 /usr/lib/libglut.so

	   If you do NOT have libglut.so.3 then you can check whether the glut package
	   is installed on your RHEL system with the following command.

			rpm -qa | grep glut

	   You should see "freeglut-2.2.2-14" or similar in the output.  If not, you 
	   or your system administrator should install the package "freeglut-2.2.2-14".
	   Refer to the Red Hat and/or rpm documentation for instructions.

	   If you have libglut.so.3 but you do not have write access to /usr/lib, you 
	   can also fix the problem by creating the soft link in a directory to which 
	   you have write permissions and then adding that directory to the library 
	   search path (-L) in the Makefile.

	2. In code sample alignedTypes, the following aligned type does not provide
	   maximum throughput because of a compiler bug:
		   typedef struct __align__(16) { 
			   unsigned int r, g, b; 
		   } RGB32;
	   The workaround is to use the following type instead:
		   typedef struct __align__(16) { 
			   unsigned int r, g, b, a; 
		   } RGBA32;
	   as illustrated in the sample.

	3. Some Linux distributions may not include the GLU library. You can find a package
	   that provides it for your Linux distribution with the following search:

	   http://fr.rpmfind.net/linux/rpm2html/search.php?query=libGLU.so.1&submit=Search+...

	4. (SLED11) SUSE Linux Enterprise Desktop 11 is missing:
	   "libGLU", "libX11", "libXi", "libXm" 

	   This particular version of SUSE Linux 11 does not have the proper symbolic links for the following libraries:

	   a. libGLU
		  ls /usr/lib | grep GLU

		  libGLU.so.1
		  libGLU.so.1.3.0370300

		  To create the proper symbolic links:

		  ln -s /usr/lib/libGLU.so.1  /usr/lib/libGLU.so

	   b. libX11
		  ls /usr/lib | grep X11

		  libX11.so.6
		  libX11.so.6.2.0

		  To create the proper symbolic links:

		  ln -s /usr/lib/libX11.so.6  /usr/lib/libX11.so 

	   c. libXi
		  ls /usr/lib | grep Xi

		  libXi.so.6
		  libXi.so.6.0.0

		  To create the proper symbolic links:

		  ln -s /usr/lib/libXi.so.6  /usr/lib/libXi.so 

	   d. libXm
		  ls /usr/lib | grep Xm

		  libXm.so.6
		  libXm.so.6.0.0

		  To create the proper symbolic links:

		  ln -s /usr/lib/libXm.so.6  /usr/lib/libXm.so 
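
	The symbolic links described in issues 1 and 4 can be created with one
	helper. This is a minimal sketch, not an SDK tool: the function name
	ensure_dev_link is hypothetical, and it assumes the versioned libraries are
	in the directory you pass (for example /usr/lib). It creates the link in a
	directory of your choice, which is useful when you cannot write to /usr/lib.

```shell
#!/bin/sh
# Hypothetical helper: make sure an unversioned lib<name>.so exists so that
# the linker can resolve -l<name>.
# $1 = directory to search for the versioned library (e.g. /usr/lib)
# $2 = library name without the "lib" prefix (e.g. glut, GLU, X11, Xi, Xm)
# $3 = writable directory in which to create the link
ensure_dev_link() {
    lib_dir=$1
    name=$2
    link_dir=$3
    if [ -e "$lib_dir/lib$name.so" ] || [ -e "$link_dir/lib$name.so" ]; then
        return 0                         # an unversioned name already exists
    fi
    # Use the first versioned name in sorted order, e.g. libglut.so.3.
    for candidate in "$lib_dir/lib$name".so.[0-9]*; do
        [ -e "$candidate" ] || continue  # the pattern matched nothing
        ln -s "$candidate" "$link_dir/lib$name.so"
        return 0
    done
    echo "lib$name.so.* not found in $lib_dir; install the package first" >&2
    return 1
}

# Usage, assuming $HOME/lib exists and is writable:
#   for lib in glut GLU X11 Xi Xm; do
#       ensure_dev_link /usr/lib "$lib" "$HOME/lib"
#   done
```

	If the links are created outside /usr/lib, add that directory to the library
	search path (-L) in the Makefile, as described in issue 1.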

--------------------------------------------------------------------------------
III. (c) Known Issues on CUDA SDK for Mac OSX
--------------------------------------------------------------------------------

  Note: Please see the CUDA Toolkit release notes for additional issues.

  - On Mac OSX 10.5.x (Leopard), CUDA applications can run only in 32-bit mode.  Support for
    64-bit is not working.

    This is a known issue that NVIDIA plans to address in a future update.

--------------------------------------------------------------------------------
IV. Frequently Asked Questions
--------------------------------------------------------------------------------

The Official CUDA FAQ is available online on the NVIDIA CUDA Forums:
http://forums.nvidia.com/index.php?showtopic=84440

Note: Please also see the CUDA Toolkit release notes for additional Frequently 
Asked Questions.  

--------------------------------------------------------------------------------
V.  Change Log
--------------------------------------------------------------------------------

Note: Unless specifically noted, the change log entries apply to all operating systems.

Release 2.3 (R190 Release Update)
* New SDK Samples
  - vectorAdd, vectorAddDrv - Two samples (Runtime and Driver API) that 
    implement element-by-element vector addition.  These simple samples do
    not use the CUTIL library.

[Linux SDK]
* Support for Cross Compilation
  - Adding one of the following to the make command builds for a specific target:
     x86_64=1     (for 64-bit targets)
     i386=1       (for 32-bit targets)
  - Note: On Linux, the 190.18 or newer driver installs the necessary libcuda.so for both 32-bit and 64-bit targets.

Release 2.3
* New Samples
  - 3DFD - 3D Finite Difference sample that demonstrates 3DFD stencil computation on a regular grid.
  - matrixMulDynlinkJIT - Matrix Multiply sample that uses PTXJIT (inlined) and also supports dynamic linking to nvcuda.dll
  - bitonicSort has been renamed to sortingNetworks
  - histogram64 and histogram256 have been combined to a single project histogram

Release 2.3 Beta
* Added PTXJIT
  - New SDK sample that illustrates how to use cuModuleLoadDataEx 
  - Loads a PTX source file from memory instead of file.
* Windows SDK only
  - Changing the name NVIDIA CUDA SDK -> NVIDIA GPU Computing SDK
  - Added cudaDecodeD3D9 and cudaDecodeGL
    - NVCUVID now supports OpenGL interop. These two new SDK samples 
	  illustrate how to use NVCUVID to decode MPEG-2, VC-1, or H.264 content.
    - The samples show how to decode video and pass it to D3D9 or OpenGL; 
	  however, these samples do not display the decoded video frames.
  
Release 2.2.1
[Windows SDK]
	* Updated Cuda.rules files (no longer use -m32) to generate CUBIN/PTX output
	* Cuda.rules option to generate PTX and to inline CUDA source with PTX 
	  generated assembly
[Mac and Linux SDK]
	* Updated common.mk file to remove -m32 when generating CUBIN output
	* Support for PTX output has been added to common.mk
	  
[All SDK Packages]
	* CUDA Driver API samples: simpleTextureDrv, matrixMulDrv, and threadMigration
	  have been updated to reflect changes:
		- Previously when compiling these CUDA SDK samples, gcc would generate a 
		  compilation error when building on a 64-bit Linux OS if the 32-bit glibc 
		  compatibility libraries were not previously installed.  This SDK release
		  addresses this problem.  The CUDA Driver API samples have been modified 
		  and solve this problem by casting device pointers correctly before 
		  being passed to CUDA kernels.
		- When setting parameters for CUDA kernel functions, the address offset 
		  calculation is now properly aligned so that CUDA code and applications
		  will be compatible on 32-bit and 64-bit Linux platforms.
		- The new CUDA Driver API samples by default build CUDA kernels with the 
		  output as PTX instead of CUBIN.  The CUDA Driver API samples now use 
		  PTXJIT to load the CUDA kernels and launch them.

	* Added sample pitchLinearTexture that shows how to texture from pitch linear
	  memory

Release 2.2 Final
[Windows and Linux SDK]
	* Supports CUDA Event Blocking Stream Synchronization (CU_CTX_BLOCKING_SYNC) on Linux and Windows

[All SDK Packages]
	* Added Mandelbrot (Julia Set), deviceQueryDrv, radixSort, SobolQRNG, threadFenceReduction
	* New CUDA 2.2 capabilities: 
		- supports zero-mem copy (GT200, MCP79) 
			* simpleZeroCopy SDK sample
		- supports OS allocated pinned memory (write combined memory).  Test this by:
			> bandwidthTest -memory=PINNED -wc

Release 2.1
[Windows SDK]
	* Projects that depend on paramGL now build the paramGL source files instead of 
	  statically linking with paramGL*.lib.

[All SDK Packages]
	* CUDA samples that use OpenGL interop now call cudaGLSetGLDevice after the GL context is created.
	  This ensures that OpenGL/CUDA interop gets the best possible performance.
	* Bug fixes

Release 2.1 Beta
[Windows SDK]
	* Now supports Visual Studio 2008; all samples also include VS2008 projects
	* Removed Visual Studio 2003.NET projects
	* Added Visual Studio CUDA.rules to support *.cu files.  Most projects now use this
	  rule with VS2005 and VS2008 projects.
	* Default CUDA SDK installation folder is under "All Users" or "Current User" in a sub-folder 
	  "Application Data\NVIDIA Corporation\NVIDIA GPU Computing SDK".  See section "III. Known issues" for 
	  more details.

[Mac and Linux SDK]
	* For CUDA samples that use the Driver API, you must install the Linux 32-bit
	  compatibility (glibc) binaries on Linux 64-bit platforms.  See the Known Issues
	  section for how to do this.

[All SDK Packages]
	* Added CUDA smokeParticles (volumetric particle shadows samples)
	* Note: added cutil_inline.h for CUDA functions as an alternative to using the
			cutil.h macro definitions

Release 2.0 Beta2
[Windows SDK]
	* 2 new code samples:
	  cudaVideoDecode and simpleVoteIntrinsics

Release 2.0 Beta
[All SDK Packages]
	* Updated to the 2.0 CUDA Toolkit
	* CUT_DEVICE_INIT macro modified to take command line arguments. All samples now
	  support specifying the CUDA device to run on from the command line ("-device=n").
	* deviceQuery sample: Updated to query number of multiprocessors and overlap
	  flag.
	* multiGPU sample: Renamed to simpleMultiGPU.
	* reduction, MonteCarlo, and binomialOptions samples: updated with optional
	  double precision support for upcoming hardware.
	* simpleAtomics sample: Renamed to simpleAtomicIntrinsics.
	* 7 new code samples: 
	  dct8x8, quasirandomGenerator, recursiveGaussian, simpleD3D9Texture,
	  simpleTexture3D, threadMigration, and volumeRender
  
[Windows SDK]
	* simpleD3D sample: Renamed to simpleD3D9 and updated to the new Direct3D
	  interoperability API.
	* fluidsD3D sample: Renamed to fluidsD3D9 and updated to the new Direct3D
	  interoperability API.

Release 1.1
* Updated to the 1.1 CUDA Toolkit
* Removed isInteropSupported() from cutil: graphics interoperability now works
  on multi-GPU systems
* MonteCarlo sample: Improved performance.  Previously it was very fast for
  large numbers of paths and options, now it is also very fast for 
  small- and medium-sized runs.
* Transpose sample: updated kernel to use 2D shared memory array for clarity, 
  and optimized bank conflicts.
* 15 new code samples: 
  asyncAPI, cudaOpenMP, eigenvalues, fastWalshTransform, histogram256,
  lineOfSight, Mandelbrot, marchingCubes, MonteCarloMultiGPU, nbody, oceanFFT,
  particles, reduction, simpleAtomics, and simpleStreams

Release 1.0
* Added support for CUDA on the Mac
* Updated to the 1.0 CUDA Toolkit.
* Added 4 new code samples: convolutionTexture, convolutionFFT2D,
  histogram64, and SobelFilter.
* All graphics interop samples now call the cutil library function 
  isInteropSupported(), which currently returns false on machines with multiple
  CUDA GPUs (see Release 1.1 above).
* When compiling in DEBUG mode, CU_SAFE_CALL() now calls cuCtxSynchronize() and
  CUDA_SAFE_CALL() and CUDA_CHECK_ERROR() now call cudaThreadSynchronize() in
  order to return meaningful errors. This means that performance might suffer in
  DEBUG mode.

Release 0.9
* Updated to version 0.9 of the CUDA Toolkit.
* Added 7 new code samples: MersenneTwister, MonteCarlo, imageDenoising, 
  simpleTemplates, deviceQuery, alignedTypes, and convolutionSeparable.
* Removed 3 old code samples:
  - vectorLoads and loadUByte replaced by alignedTypes;
  - convolution replaced by convolutionSeparable.

Release 0.8.1 beta
* Standardized project and file naming conventions. Several project names 
  changed as a result.
* cppIntegration output now matches the other samples ("Test PASSED").
* Modified transpose16 sample to transpose arbitrary matrices efficiently, and
  renamed it to transpose.
* Added 11 new code samples: bandwidthTest, binomialOptions, BlackScholes, 
  boxFilter, convolution, dxtc, fluidsGL, multiGPU, postProcessGL, 
  simpleTextureDrv, and vectorLoads.

Release 0.8 beta
* First public release.


--------------------------------------------------------------------------------
VI. OS Platforms Supported
--------------------------------------------------------------------------------

[Windows Platforms]
	OS Platform Support from 2.2
		* Vista 32 & 64-bit
			  o Visual Studio 8 (2005)
			  o Visual Studio 9 (2008) 
		* WinXP 32 & 64-bit
			  o Visual Studio 8 (2005)
			  o Visual Studio 9 (2008) 

	OS Platform Support added to 2.3
		* Win7 32 & 64-bit

[Mac Platforms]
	OS Platform Support from 2.2
		* MacOS X Leopard 10.5.6+ (32-bit)
			  o (llvm-)gcc 4.2 Apple 

	OS Platform Support added to 2.3
		* MacOS X SnowLeopard 10.6 (32-bit)

[Linux Platforms]
	OS Platform Support from 2.2
		* Linux distros 32 & 64-bit: 
			RHEL-4.x (4.7), 
			RHEL-5.x (5.3), 
			SLED-10-SP2, 
			Fedora10,         
			Ubuntu-8.10, 
			OpenSUSE 11.1
			  o gcc 3.4, gcc 4 

	OS Platform Support added to 2.3
		* SLED11
		* Ubuntu-9.04

	OS Platforms not officially supported in 2.3
		* Fedora 9, 
		  Ubuntu-8.04, 
		  OpenSUSE-11.0