The CUDA Memory Checker detects problems in global and shared memory. If the CUDA Debugger detects an MMU fault when running a kernel, it will not be able to specify the exact location of the fault. In this case, enable the CUDA Memory Checker and restart debugging, and the CUDA Memory Checker will pinpoint the exact statements that are triggering the fault.
The CUDA Memory Checker also detects problems in code that is built in release mode. However, when the code is built without symbols (-G0), it cannot show the associated source code.
NOTE: The CUDA Memory Checker must be enabled before debugging is started.
NOTE: Keep in mind that there is a performance penalty associated with running kernels with the CUDA Memory Checker enabled.
To use the CUDA Memory Checker:
During the debugging session, if the target attempts to write to an invalid memory location, the debugger triggers a breakpoint on the offending line of code, which stops execution. The CUDA Debugger stops before executing the instruction that would have written to the invalid memory location.
The CUDA Memory Checker treats misaligned pointers and attempted writes to an invalid address as two separate checks. This means that you can hit two separate breakpoints for the same attempted memory access.
For example, the following statement would first trigger a breakpoint because the pointer is misaligned, and then trigger a second breakpoint because it attempts to write to an invalid memory address:
*(int*)0xffffffff = 0xbad;
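The alignment half of that double report can be checked with simple arithmetic. The following is a minimal sketch, not the checker's actual implementation; the helper name isMisaligned is hypothetical, and it assumes the access size is a power of two.

```cuda
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not part of Nsight or the CUDA runtime):
// returns true when addr is not a multiple of accessSize.
// Assumes accessSize is a power of two, as it is for int, float, double, etc.
inline bool isMisaligned(std::uintptr_t addr, std::size_t accessSize)
{
    return (addr & (accessSize - 1)) != 0;
}
```

For the statement above, 0xffffffff leaves a remainder of 3 when divided by sizeof(int) (4 on typical platforms), so the store is misaligned regardless of whether the address is also outside every valid allocation.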
When the CUDA Memory Checker is enabled, it will consume extra memory on the GPU. If there is not enough patch RAM for the CUDA Debugger, it will give the following error:
Internal debugger error occurred while attempting to launch "KernelName - CUmodule 0x04e67f10: code patching failed due to lack of code patching memory.
If this happens, increase the patch RAM factor by going to Nsight > Options > CUDA > Code Patching Memory Factor.
This is a multiplier of the kernel's instruction size, which is added to a base patch RAM size of 64k.
Another option is to disable the shared or global memory checking, in order to use less patch RAM.
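The sizing rule described above (a base of 64k plus the factor multiplied by the kernel's instruction size) can be sketched as arithmetic. This is an illustration only: the helper name is hypothetical, "64k" is assumed to mean 64 KiB, and the factor is assumed to be a whole number.

```cuda
#include <cstddef>

// Base patch RAM size from the text ("64k"); assumed here to mean 64 KiB.
constexpr std::size_t kBasePatchRamBytes = 64 * 1024;

// Hypothetical helper: estimated patch RAM for one kernel, computed as
// base size + (Code Patching Memory Factor * kernel instruction size).
constexpr std::size_t estimatePatchRamBytes(std::size_t kernelInstructionBytes,
                                            std::size_t factor)
{
    return kBasePatchRamBytes + factor * kernelInstructionBytes;
}
```

Under these assumptions, a kernel with 8192 bytes of instructions and a factor of 2 would be given 65536 + 2 * 8192 = 81920 bytes of patch RAM.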
The CUDA Memory Checker results go to the Nsight page of the Output window, as well as to the CUDA Information tool window.
|Error Code|Meaning|
|mis ld|misaligned access during a memory load|
|mis st|misaligned access during a memory store|
|mis atom|misaligned access during an atomic memory transaction - an atomic function was passed a misaligned address|
|adr ld|invalid address during a memory load|
|adr st|invalid address during a memory store - attempted write to a memory location that was out of range, also sometimes referred to as a limit violation|
|adr atom|invalid address during an atomic memory transaction - an atomic function attempted a memory access at an invalid address|
The following code snippet shows an example of attempting to write to a memory address out-of-bounds of the declared array.
|Code Example 1: Attempt To Write To Array Index Out Of Bounds|
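A minimal sketch of this kind of out-of-bounds write follows. Only the name globalArray, its float element type, and its __device__ placement come from the text; the array size, kernel name, helper names, and launch configuration are assumptions made for illustration.

```cuda
#include <cuda_runtime.h>

// Assumed array size; the text does not state one.
constexpr int kArraySize = 64;

// Declared at file scope with __device__, so the array lives in global memory.
__device__ float globalArray[kArraySize];

// Hypothetical helper: maps a thread index to an index past the end of the array.
__host__ __device__ constexpr int outOfBoundsIndex(int tid)
{
    return kArraySize + tid;
}

// Hypothetical kernel: every thread writes beyond the last valid element.
__global__ void outOfBoundsWrite()
{
    int index = outOfBoundsIndex(threadIdx.x);  // always >= kArraySize
    globalArray[index] = 1.0f;                  // out-of-bounds store to global memory
}

// Launches the kernel and returns whatever status the runtime reports.
cudaError_t launchOutOfBoundsWrite()
{
    outOfBoundsWrite<<<1, 32>>>();
    return cudaDeviceSynchronize();
}
```

With the CUDA Memory Checker enabled, a store like the one in outOfBoundsWrite would be expected to stop the debugger on that line; without it, a small overrun like this may go unreported.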
Notice that Code Example 1 declares globalArray as an array of data type float in global memory (__device__ memory). If the array had instead been declared locally inside a function executed by the GPU, it would have been placed in thread-local memory, which is NOT checked by the CUDA Memory Checker.
The following code snippet shows an example of an attempt to write to an invalid memory address. If the CUDA Memory Checker is enabled, debugging an application that attempts to execute this code would trigger a breakpoint before executing the statement that assigns 0xd1e to the array.
|Code Example 2: Attempt To Write To An Invalid Memory Address|
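A minimal sketch of such a write follows. Only the value 0xd1e comes from the text; the address, kernel name, and launcher are hypothetical, and the address is merely assumed not to belong to any allocation. It is chosen to be 4-byte aligned so that only the invalid-address check would apply, not the misalignment check.

```cuda
#include <cstdint>
#include <cuda_runtime.h>

// Hypothetical address, assumed not to be backed by any device allocation.
// It is a multiple of 4, so an int store to it is aligned.
constexpr std::uintptr_t kInvalidAddress = 0xffff0000u;

// Hypothetical kernel: stores 0xd1e through whatever pointer it is given.
__global__ void invalidAddressWrite(int* array)
{
    array[0] = 0xd1e;  // store to an address outside any valid allocation
}

// Launches the kernel with the bogus pointer and returns the runtime status.
cudaError_t launchInvalidAddressWrite()
{
    int* bogus = reinterpret_cast<int*>(kInvalidAddress);
    invalidAddressWrite<<<1, 1>>>(bogus);
    return cudaDeviceSynchronize();
}
```

Passing the pointer in as a kernel argument keeps the compiler from reasoning about the constant address at compile time, so the store is still issued at run time.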
|NVIDIA® Nsight™ Development Platform, Visual Studio Edition User Guide Rev. 2.2.120522 ©2009-2012. NVIDIA Corporation. All Rights Reserved.|