Use the Memory Checker

The CUDA Memory Checker detects problems in global and shared memory. If the CUDA Debugger detects an MMU fault when running a kernel, it will not be able to specify the exact location of the fault. In this case, enable the CUDA Memory Checker and restart debugging, and the CUDA Memory Checker will pinpoint the exact statements that are triggering the fault.

The CUDA Memory Checker also detects problems in code that is built in release mode. However, without debug symbols (-G0), it cannot show the associated source code.

NOTE: The CUDA Memory Checker must be enabled before debugging is started.

NOTE: Keep in mind that there is a performance penalty associated with running kernels with the CUDA Memory Checker enabled.

To use the CUDA Memory Checker:

  1. In Visual Studio, open a CUDA-based project.
  2. Enable the Memory Checker using one of three methods: 
  3. You can enable checking for global memory and for shared memory individually, in addition to turning the CUDA Memory Checker on or off as a whole.
    When checking of the global memory space is enabled, NVIDIA Nsight also checks for violations in memory allocated from device code via malloc and free.
  4. Launch the CUDA Debugger.
    1. Make sure that the Nsight Monitor is running on the target system.
    2. From the Nsight menu, select Start CUDA Debugging.
      Alternatively, right-click the project in Solution Explorer and choose Start CUDA Debugging.
    3. The CUDA Debugger starts and launches the target application.

During the debugging session, if the target attempts to write to an invalid memory location, the debugger triggers a breakpoint on the offending line of code, which stops execution. The CUDA Debugger stops before executing the instruction that would have written to the invalid memory location.

The CUDA Memory Checker treats misaligned pointers and attempted writes to an invalid address as two separate checks. This means that you can hit two separate breakpoints for the same attempted memory access.

For example, the following statement would first trigger a breakpoint because the pointer is misaligned, and then trigger a second breakpoint because it attempts to write to an invalid memory address:

*(int*)0xffffffff = 0xbad;
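
The two checks can be pictured as independent tests applied to the same address. The following is a minimal host-side C++ sketch, not Nsight's actual implementation. The names AccessCheck and checkAccess, and the idea of a single valid range [base, base + size), are assumptions made purely for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: each flag corresponds to one separate check.
struct AccessCheck {
    bool misaligned;      // address is not a multiple of the access size
    bool invalidAddress;  // access falls outside the valid range
};

// Illustrative model only: tests one access of accessSize bytes at addr
// against a single valid range [base, base + size).
// Assumes accessSize is nonzero.
inline AccessCheck checkAccess(std::uint64_t addr, std::uint64_t accessSize,
                               std::uint64_t base, std::uint64_t size) {
    AccessCheck result;
    result.misaligned = (addr % accessSize) != 0;
    // Written so that no intermediate sum can overflow.
    result.invalidAddress = addr < base || size < accessSize ||
                            (addr - base) > (size - accessSize);
    return result;
}

// Usage: the 4-byte store to 0xffffffff from the statement above, checked
// against a made-up 256-byte allocation at 0x1000, fails both tests.
inline void checkAccessExample() {
    const AccessCheck r = checkAccess(0xffffffffULL, 4, 0x1000, 0x100);
    assert(r.misaligned);      // 0xffffffff is not a multiple of 4
    assert(r.invalidAddress);  // far outside [0x1000, 0x1100)
}
```

Because the two flags are computed independently, one access can report both conditions, which mirrors the two breakpoints described above.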

Out of Patch RAM Errors

When the CUDA Memory Checker is enabled, it consumes extra memory on the GPU. If there is not enough patch RAM, the CUDA Debugger reports the following error:

Internal debugger error occurred while attempting to launch "KernelName - CUmodule 0x04e67f10: code patching failed due to lack of code patching memory.

If this happens, increase the patch RAM factor by going to Nsight > Options > CUDA > Code Patching Memory Factor.

This is a multiplier of the kernel's instruction size, which is added to a base patch RAM size of 64k.
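
As a rough worked example of that arithmetic, the sketch below estimates the patch RAM size from a kernel's instruction size and the factor. It is an illustration under stated assumptions: "64k" is taken to mean 64 KiB (65,536 bytes), the factor is treated as a whole number, and the name estimatePatchRamBytes and its kernelInstructionBytes parameter are hypothetical, since Nsight performs this computation internally.

```cpp
#include <cassert>
#include <cstdint>

// Base patch RAM size stated in the text; "64k" is assumed to mean 64 KiB.
const std::uint64_t kBasePatchRamBytes = 64 * 1024;

// Estimated patch RAM = base + factor * kernel instruction size.
// kernelInstructionBytes is a hypothetical input for illustration.
inline std::uint64_t estimatePatchRamBytes(std::uint64_t kernelInstructionBytes,
                                           std::uint64_t factor) {
    return kBasePatchRamBytes + factor * kernelInstructionBytes;
}

// Usage: a 10,000-byte kernel with a factor of 2.
inline void patchRamExample() {
    // 65,536 + 2 * 10,000 = 85,536 bytes
    assert(estimatePatchRamBytes(10000, 2) == 85536);
}
```

Under this model, raising the factor grows the available patch RAM in proportion to the kernel's size, while the 64k base stays fixed.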

Another option is to disable the shared or global memory checking, in order to use less patch RAM.

Memory Checker Results

The CUDA Memory Checker results go to the Nsight page of the Output window, as well as to the CUDA Information tool window.

Error Code   Meaning
mis ld       misaligned access during a memory load
mis st       misaligned access during a memory store
mis atom     misaligned access during an atomic memory transaction - an atomic function was passed a misaligned address
adr ld       invalid address during a memory load
adr st       invalid address during a memory store - attempted write to a memory location that was out of range, also sometimes referred to as a limit violation
adr atom     invalid address during an atomic memory transaction - an atomic function attempted a memory access at an invalid address
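
Each code listed above combines a problem type (mis or adr) with an operation (ld, st, or atom). The sketch below shows one way to expand a code into its meaning when post-processing results. The helper name describeErrorCode is hypothetical and is not part of Nsight. It is only a convenience one might write for log handling.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: expands an error code such as "adr st" into the
// meaning given in the list above. Returns an empty string for unknown codes.
inline std::string describeErrorCode(const std::string& code) {
    const std::string::size_type space = code.find(' ');
    if (space == std::string::npos) return "";
    const std::string kind = code.substr(0, space);
    const std::string op = code.substr(space + 1);

    std::string problem;
    if (kind == "mis") problem = "misaligned access";
    else if (kind == "adr") problem = "invalid address";
    else return "";

    if (op == "ld") return problem + " during a memory load";
    if (op == "st") return problem + " during a memory store";
    if (op == "atom") return problem + " during an atomic memory transaction";
    return "";
}

// Usage: decoding the code reported for an out-of-range write.
inline void describeErrorCodeExample() {
    assert(describeErrorCode("adr st") ==
           "invalid address during a memory store");
}
```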

Example 1

The following code snippet shows an example of attempting to write to a memory address out-of-bounds of the declared array.

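Code Example 1: Attempt To Write To Array Index Out Of Bounds
__device__ float globalArray[10];

__global__ void naughtyWriter()
{
    // Loop body reconstructed; the original statement is missing from this copy.
    for (int i = 0; i <= 10; i++)   // error: index will exceed array length
        globalArray[i] = 0.0f;
}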

Notice that Code Example 1 declares globalArray[] as an array of data type float, in global memory (__device__ memory). If the array had instead been declared as a local variable inside a function executed on the GPU, it would have been placed in thread-local memory, which is NOT checked by the CUDA Memory Checker.

Example 2

The following code snippet shows an example of an attempt to write to an invalid memory address. If the CUDA Memory Checker is enabled, debugging an application that attempts to execute this code would trigger a breakpoint before executing the statement that assigns 0xd1e to the array.

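Code Example 2: Attempt To Write To An Invalid Memory Address
__global__ void otherNaughtyWriter(int* pValues, int numElements)
{
    // Index reconstructed as an assumption; the left-hand side is missing from this copy.
    pValues[numElements] = 0xd1e;   // error: invalid memory address
}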
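
One common way to avoid the kind of write shown in Example 2 is to bounds-check the index before storing. The sketch below is written as plain host-side C++ so that it can be compiled and tested without a GPU. In a .cu file, a helper like this would typically also carry the __device__ qualifier so that a kernel could call it. The name guardedWrite and its index parameter are hypothetical and do not come from the original example.

```cpp
#include <cassert>

// Hypothetical helper: writes value to pValues[index] only when the index is
// inside the buffer, and reports whether the write happened.
inline bool guardedWrite(int* pValues, int numElements, int index, int value) {
    if (pValues == nullptr || index < 0 || index >= numElements) {
        return false;  // this would have been an out-of-range write
    }
    pValues[index] = value;
    return true;
}

// Usage: the last valid element is written, one past the end is rejected.
inline void guardedWriteExample() {
    int values[4] = {0, 0, 0, 0};
    const bool wroteLast = guardedWrite(values, 4, 3, 0xd1e);
    const bool wrotePastEnd = guardedWrite(values, 4, 4, 0xd1e);
    assert(wroteLast);
    assert(!wrotePastEnd);
    assert(values[3] == 0xd1e);
}
```

A guard of this kind trades a small amount of per-access work for the guarantee that the store never leaves the buffer. The CUDA Memory Checker remains useful for finding the accesses that were not guarded.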




NVIDIA® Nsight™ Development Platform, Visual Studio Edition User Guide Rev. 2.2.120522 ©2009-2012. NVIDIA Corporation. All Rights Reserved.