--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
NVIDIA CUDA 
Linux Release Notes
Version 1.0
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
New Features
--------------------------------------------------------------------------------

  Platform Support
  o  64-bit Linux support
  o  GCC 4.1.x support
  o  New distributions supported
     - Red Hat Enterprise Linux 3.8 (32-bit and 64-bit)
     - Red Hat Enterprise Linux 4.3 (64-bit)
     - Red Hat Enterprise Linux 4.4 (32-bit and 64-bit)
     - Red Hat Enterprise Linux 5.0 (32-bit and 64-bit)
     - SUSE Linux Enterprise Desktop 10.0 (32-bit and 64-bit)
     - SUSE Linux 10.1 (32-bit and 64-bit)
     - SUSE Linux 10.2 (32-bit and 64-bit)

  Hardware Support
  o  Additional hardware support added
     - Quadro FX 5600
     - Quadro FX 4600
     - GeForce 8800 Ultra 
     - GeForce 8600 GTS
     - GeForce 8600 GT
     - GeForce 8500 GT 

  Compiler and Toolchain
  o  PTX ISA support
  o  64-bit integer support
  o  maxrregcount option added to NVCC

  Mathematical Functions
  o  Additional functions added
     - sincos()
     - rsqrt()
     - exp10()
  o  Improved accuracy of mathematical functions     

  Miscellaneous
  o  Asynchronous launches
  o  Asynchronous device to device memory copy

  CUFFT Library
  o  Real to Complex and Complex to Real FFT support
  o  Increased maximum 1-D FFT size to 8 million elements

  CUBLAS Library
  o  Additional functions added
     - cublasIsamin()
     - cublasIcamin()

--------------------------------------------------------------------------------
Major Performance Improvements
--------------------------------------------------------------------------------

  o  Improved device to device memory copy bandwidth
  o  Improved launch overhead

--------------------------------------------------------------------------------
Major Bug Fixes
--------------------------------------------------------------------------------

  o  Fixed memory leak that required reboot
  o  CUDA no longer causes harmless screen corruption in full screen terminal 
     mode

--------------------------------------------------------------------------------
Known Issues
--------------------------------------------------------------------------------

o Individual GPU program launches are limited to a run time 
  of less than 5 seconds on a GPU with a display attached.
  Exceeding this time limit causes a launch failure reported
  through the CUDA driver or the CUDA runtime.  GPUs without
  a display attached are not subject to the 5 second run time
  restriction.  For this reason it is recommended that CUDA be
  run on a GPU that is NOT attached to an X display.

o Context creation is not thread safe.  Applications must take
  care that only one thread creates a context at a time.
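
  One way to meet this requirement is to put context creation
  behind a single process-wide lock.  The sketch below uses POSIX
  threads.  create_context_unsafe() is a hypothetical stand-in for
  whatever context creation call the application makes, and the
  other names are likewise illustrative, not part of CUDA.

```c
#include <assert.h>
#include <pthread.h>

#define MAX_WORKERS 16

/* One process-wide lock guarding context creation. */
static pthread_mutex_t ctx_lock = PTHREAD_MUTEX_INITIALIZER;
static int contexts_created = 0;

/* Hypothetical stand-in for the real, non-thread-safe context
   creation call; here it only counts invocations. */
static int create_context_unsafe(void)
{
    contexts_created++;
    return 0;
}

/* Serialized wrapper: at most one thread creates a context at a time. */
int create_context_serialized(void)
{
    int rc;

    pthread_mutex_lock(&ctx_lock);
    rc = create_context_unsafe();
    pthread_mutex_unlock(&ctx_lock);
    return rc;
}

static void *worker(void *arg)
{
    (void)arg;
    create_context_serialized();
    return NULL;
}

/* Starts up to n threads that each create one context, waits for
   them, and returns how many contexts were created. */
int run_workers(int n)
{
    pthread_t threads[MAX_WORKERS];
    int started = 0;
    int i;

    assert(n >= 0 && n <= MAX_WORKERS);
    contexts_created = 0;
    for (i = 0; i < n; i++) {
        if (pthread_create(&threads[i], NULL, worker, NULL) != 0)
            break;              /* stop launching on failure */
        started++;
    }
    for (i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return contexts_created;
}
```

  Only the creation call needs to be serialized in this sketch;
  work done after the context exists is outside the lock.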

o Launches that use texture are synchronous.

o While X does not need to be running in order to use CUDA, 
  X must have been initialized at least once after booting 
  in order to properly load the NVIDIA kernel module.  The 
  NVIDIA kernel module remains loaded even after X shuts down, 
  allowing CUDA to continue to function.

o OpenGL interoperability may not function correctly on some
  systems with multiple displays enabled.  To avoid this issue,
  it is recommended that the system be configured to use TwinView
  and not separate X screens.

o When compiling with GCC, special care must be taken for structs
  that contain 64-bit integers.  This is because GCC aligns long
  longs to a 4 byte boundary by default, while NVCC aligns long
  longs to an 8 byte boundary by default.  Thus, when using GCC to
  compile a file that has a struct/union containing 64-bit
  integers, users must give the
  -malign-double
  option to GCC.  When using NVCC, this option is automatically
  passed to GCC.
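
  The alignment difference described above can be checked with a
  small host-side program.  The sketch below is illustrative only:
  struct Params and its fields are hypothetical, and the explicitly
  aligned variant uses a GCC attribute as an alternative that these
  notes do not themselves prescribe.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical struct shared between GCC-compiled and NVCC-compiled
   code.  The offset of 'total' depends on how long long is aligned. */
struct Params {
    int       count;
    long long total;    /* 64-bit integer */
};

/* Same fields, with 8 byte alignment requested explicitly through a
   GCC attribute, independent of command line options. */
struct ParamsAligned {
    int       count;
    long long total __attribute__((aligned(8)));
};

size_t params_total_offset(void)
{
    return offsetof(struct Params, total);
}

size_t params_aligned_total_offset(void)
{
    return offsetof(struct ParamsAligned, total);
}

/* Prints both layouts so builds with different options can be compared. */
void print_layout(void)
{
    assert(sizeof(long long) == 8);
    printf("Params:        offset of total = %lu, size = %lu\n",
           (unsigned long)params_total_offset(),
           (unsigned long)sizeof(struct Params));
    printf("ParamsAligned: offset of total = %lu, size = %lu\n",
           (unsigned long)params_aligned_total_offset(),
           (unsigned long)sizeof(struct ParamsAligned));
}
```

  Calling print_layout() from a 32-bit build made once without and
  once with -malign-double should show whether the offset of
  'total' in struct Params moves from 4 to 8.  On 64-bit targets
  GCC typically already uses 8 byte alignment for long long.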

o For graphics interoperability, OpenGL must be running on the same
  GPU as the compute context.  As a result, graphics interoperability
  does not work on systems with multiple GPUs installed.

o The function pow(float,int) delivers incorrect results.  Use the
  function powf(float,float) instead.
   
 
--------------------------------------------------------------------------------
Revision History
--------------------------------------------------------------------------------

  06/2007 - Version 1.0
  06/2007 - Version 0.9 
  02/2007 - Version 0.8 - Initial public Beta 


--------------------------------------------------------------------------------
Open64 Sources
--------------------------------------------------------------------------------

The Open64 source files covered by the terms of the GPL license are
available via anonymous FTP at download.nvidia.com in the CUDAOpen64 directory.


--------------------------------------------------------------------------------
More Information
--------------------------------------------------------------------------------

  For more information and help with CUDA, please visit
  http://www.nvidia.com/cuda