NVIDIA® Nsight™ Development Platform, Visual Studio Edition 2.2 User Guide
The SDK consists of the include files, pre-built stub libraries, DLLs, and several SDK samples.
The SDK is installed at the following location:
32-bit: C:\Program Files\NVIDIA GPU Computing Toolkit\nvToolsExt
64-bit: C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\nvToolsExt
The installation directory has the following layout:
    bin
        Win32
            nvToolsExt32_1.dll
        x64
            nvToolsExt64_1.dll
    include
        nvToolsExt.h
        nvToolsExtCuda.h
        nvToolsExtOpenCL.h
    lib
        Win32
            nvToolsExt32_1.lib
        x64
            nvToolsExt64_1.lib
The SDK includes two sample projects located at C:\ProgramData\NVIDIA Nsight <ver>\Samples\NvToolsExt\
The SDK contains the following samples:
nvtxSimple | Demonstrates how to use the NVTX API to generate marker and range events, and name OS Threads and Categories. |
nvtxMultithreaded | Demonstrates more advanced usages of the NVTX C API. Introduces two sample C++ wrappers that simplify use of the API. |
The core NVTX API is defined in the file nvToolsExt.h, whereas domain-specific extensions to the NVTX interface are exposed in separate header files. For example, see nvToolsExtCuda.h for CUDA-specific NVTX API functions.
The library (.lib) and runtime components (.dll) are provided in both 32-bit and 64-bit versions. The naming scheme for these files is nvToolsExt<bitness=32|64>_<version>.{dll|lib}.
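As a rough illustration of the naming scheme above, the following sketch composes a file name from its parts. The helper nvtxLibraryFileName is hypothetical and is not part of the NVTX API; it only mirrors the pattern nvToolsExt<bitness>_<version>.{dll|lib}.

```cpp
#include <cassert>
#include <string>

// Builds a file name following nvToolsExt<bitness>_<version>.<extension>.
// nvtxLibraryFileName is a hypothetical helper, not part of the NVTX API.
std::string nvtxLibraryFileName(int bitness, int version, const std::string& extension)
{
    // Only 32-bit and 64-bit builds are described in this guide.
    assert(bitness == 32 || bitness == 64);
    return "nvToolsExt" + std::to_string(bitness) + "_" +
           std::to_string(version) + "." + extension;
}
```

For example, nvtxLibraryFileName(64, 1, "dll") yields nvToolsExt64_1.dll, which matches the file shipped in the x64 bin directory.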
All NVTX API functions start with an nvtx name prefix and may end with one of three suffixes: A, W, or Ex. NVTX functions with such a suffix exist in multiple variants, performing the same core functionality with different parameter encodings. Depending on the version of the NVTX library, available encodings may include ASCII (A), Unicode (W), or event structures (Ex).
Some of the NVTX functions are defined to have return values. For example, the nvtxRangeStart functions return a unique range identifier, and the nvtxRangePush/nvtxRangePop functions return the current stack level. It is recommended not to use the returned values as part of conditional code in the instrumented application. The returned values can differ between implementations of the NVTX library; consequently, code that depends on the return values might work with one tool but fail with another.
The NVTX API is a straight C API. The nvtxMultithreaded sample contains an example of a C++ wrapper. It is recommended to use such a customized wrapper layer on top of the raw API to simplify inclusion of NVTX in your application.
Another advantage of a wrapper library is that it hides any changes to the base API from the end user’s program. So if one API call is changed, the developer only needs to update the wrapper library code, rather than go through the entire code and change every reference.
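A minimal sketch of such a wrapper is shown below. It is not the wrapper from the nvtxMultithreaded sample; ScopedRange and the nvtx_stub namespace are hypothetical names. The stub functions stand in for nvtxRangePushA and nvtxRangePop so that the sketch compiles without the SDK, and the wrapper deliberately ignores the return values.

```cpp
#include <cassert>

// Stand-ins so the sketch compiles without the SDK. In a real project, remove
// this namespace, include nvToolsExt.h, and call nvtxRangePushA/nvtxRangePop.
namespace nvtx_stub {
    static int g_depth = 0;  // nesting depth tracked by the stand-ins only

    inline int rangePushA(const char* /*message*/) { return g_depth++; }

    inline int rangePop()
    {
        assert(g_depth > 0);  // a pop must match an earlier push
        return --g_depth;
    }
}

// Hypothetical wrapper: opens a range on construction, closes it on destruction.
class ScopedRange {
public:
    explicit ScopedRange(const char* message) { nvtx_stub::rangePushA(message); }
    ~ScopedRange() { nvtx_stub::rangePop(); }

    // Copying would pop the same range twice, so it is disabled.
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Example use: nested scopes produce nested ranges, and every exit path pops.
inline int nestedDepthDemo()
{
    int depthInside = 0;
    ScopedRange outer("outer");
    {
        ScopedRange inner("inner");
        depthInside = nvtx_stub::g_depth;  // both ranges are open here
    }
    return depthInside;
}
```

Because the range is closed in the destructor, early returns and exceptions cannot leave a range open. If the underlying API changes, only the two calls inside the wrapper need to be updated.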
Markers are used to describe events that occurred at a specific time during the execution of an application, while ranges detail the time span in which they occurred. This information is presented alongside all of the other captured data, which makes it easier to understand the collected information.
The first version of the NVTX C API only allowed the caller to specify a message. Each function is available in both ASCII and Unicode variants.
The second version of the NVTX C API added support for per-event attributes. Attributes include category, color, message, and payload. All attributes are optional.
The nvtxEventAttributes_t structure is used to describe the attributes of an event. The layout of the structure is defined by a specific version of the Tools Extension library and can change between different versions of the library.
Markers and ranges can use attributes to provide additional information for an event or to guide the tool's visualization of the data. Each of the attributes is optional and if left unspecified, the attributes fall back to a default value.
To specify any attribute other than the text message, the Ex variant of the function must be called.
3.2.2.1 Message
The message field can be used to specify an optional string. The caller must set both the messageType and message fields. The default value is NVTX_MESSAGE_UNKNOWN.
Code Sample:
    // VALID NVTX_MESSAGE_TYPE_ASCII
    nvtxEventAttributes_t eventAttrib1 = {0};
    eventAttrib1.version = NVTX_VERSION;
    eventAttrib1.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    eventAttrib1.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib1.message.ascii = __FUNCTION__ ":ascii";
    nvtxMarkEx(&eventAttrib1);
    DELAY();

    // VALID NVTX_MESSAGE_TYPE_UNICODE
    nvtxEventAttributes_t eventAttrib2 = {0};
    eventAttrib2.version = NVTX_VERSION;
    eventAttrib2.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    eventAttrib2.messageType = NVTX_MESSAGE_TYPE_UNICODE;
    eventAttrib2.message.unicode = __FUNCTIONW__ L":unicode \u2603 snowman";
    nvtxMarkEx(&eventAttrib2);
    DELAY();
3.2.2.2 Category
A category attribute is a user-controlled ID that can be used to group events. The tool may use category IDs to improve filtering or to group events. The functions nvtxNameCategoryA or nvtxNameCategoryW can be used to name a category. The default value is 0.
Code Sample:
    nvtxEventAttributes_t eventAttrib = {0};
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    eventAttrib.category = 1;
    // specifying message to help identify this event in the tool.
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = __FUNCTION__;
    nvtxMarkEx(&eventAttrib);

    // Categories can be named using nvtxNameCategory{A,W}().
    nvtxNameCategoryA(1, __FUNCTION__);
3.2.2.3 Color
The color attribute is used to help visually identify events in the tool. The caller must set both the colorType and color fields.
Code Sample:
    // valid specification of color
    nvtxEventAttributes_t eventAttrib1 = {0};
    eventAttrib1.version = NVTX_VERSION;
    eventAttrib1.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    eventAttrib1.colorType = NVTX_COLOR_ARGB;
    eventAttrib1.color = COLOR_RED;
    // specifying message to help identify this event in the tool.
    eventAttrib1.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib1.message.ascii = __FUNCTION__ ":valid color";
    nvtxMarkEx(&eventAttrib1);

    // default color
    nvtxEventAttributes_t eventAttrib2 = {0};
    eventAttrib2.version = NVTX_VERSION;
    eventAttrib2.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    // specifying message to help identify this event in the tool.
    eventAttrib2.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib2.message.ascii = __FUNCTION__ ":default color";
    nvtxMarkEx(&eventAttrib2);
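The color samples use constants such as COLOR_RED without showing how they are defined. The sketch below shows one plausible way to build such values, assuming that NVTX_COLOR_ARGB expects a 32-bit value laid out as 0xAARRGGBB and that the COLOR_ constants are defined by the application rather than by the NVTX headers. The helper names packArgb and opaqueRed are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Packs 8-bit channels into a 32-bit value laid out as 0xAARRGGBB.
// packArgb is a hypothetical helper, not part of the NVTX API.
constexpr std::uint32_t packArgb(std::uint8_t a, std::uint8_t r,
                                 std::uint8_t g, std::uint8_t b)
{
    return (static_cast<std::uint32_t>(a) << 24) |
           (static_cast<std::uint32_t>(r) << 16) |
           (static_cast<std::uint32_t>(g) << 8) |
            static_cast<std::uint32_t>(b);
}

// Example: opaque red, as an application-defined constant like COLOR_RED
// might be written (assumed value, not taken from the SDK).
inline std::uint32_t opaqueRed()
{
    const std::uint32_t red = packArgb(0xFF, 0xFF, 0x00, 0x00);
    assert((red >> 24) == 0xFF);  // alpha occupies the top byte
    return red;
}
```

The resulting value would be assigned to the color field after colorType has been set to NVTX_COLOR_ARGB, as in the sample above.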
3.2.2.4 Payload
The payload attribute can be used to provide additional data for markers and ranges. Range events can only specify values at the beginning of a range. The caller must specify valid values for both payloadType and payload.
Code Sample:
    nvtxEventAttributes_t eventAttrib = {0};
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    eventAttrib.payloadType = NVTX_PAYLOAD_TYPE_UNSIGNED_INT64;
    // use the unsigned member of the payload union to match the payload type
    eventAttrib.payload.ullValue = 0;
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = __FUNCTION__ ":UNSIGNED_INT64 = 0";
    nvtxMarkEx(&eventAttrib);
3.2.3.1 Initializing Attributes
The caller should always perform the following three tasks when using attributes:
- Zero the structure;
- Set the version field;
- Set the size field.
Zeroing the structure sets all the event attribute types and values to their default values. The version and size fields are used by the Tools Extension implementation to handle multiple versions of the attributes structure.
3.2.3.2 Versioning
It is recommended that the caller use one of the following two methods to initialize the event attributes structure:
• Method 1: Version Safe
The version and size fields are used by the Tools Extension implementation to handle multiple versions of the attributes structure. This example shows how to initialize the structure for forward compatibility.
    nvtxEventAttributes_t eventAttrib = {0};
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    eventAttrib.colorType = NVTX_COLOR_ARGB;
    eventAttrib.color = ::COLOR_YELLOW;
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = __FUNCTION__;
    nvtxMarkEx(&eventAttrib);

    // This can be done using C99 designated initializers.
    nvtxEventAttributes_t eventAttrib2 =
    {
        .version = NVTX_VERSION,                  // version
        .size = NVTX_EVENT_ATTRIB_STRUCT_SIZE,    // size
        .colorType = NVTX_COLOR_ARGB,             // colorType
        .color = COLOR_YELLOW,                    // color
        .messageType = NVTX_MESSAGE_TYPE_ASCII,   // messageType
        .message.ascii = __FUNCTION__ ":Designated Initializer"
    };
    // nvtxMarkEx takes a pointer to the attributes structure
    nvtxMarkEx(&eventAttrib2);
• Method 2: Version Specific
This example shows how to initialize the structure to a specific version of the library.
    nvtxEventAttributes_v1 eventAttrib = {0};
    eventAttrib.version = 1;
    eventAttrib.size = (uint16_t)(sizeof(nvtxEventAttributes_v1));
    eventAttrib.colorType = NVTX_COLOR_ARGB;
    eventAttrib.color = COLOR_MAGENTA;
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = __FUNCTION__;
    nvtxMarkEx(&eventAttrib);

    // This can be done using ordered initialization.
    nvtxEventAttributes_v1 eventAttrib2 =
    {
        1,                                          // version
        (uint16_t)(sizeof(nvtxEventAttributes_v1)), // size
        0,                                          // category
        NVTX_COLOR_ARGB,                            // colorType
        COLOR_CYAN,                                 // color
        NVTX_PAYLOAD_TYPE_UNSIGNED_INT64,           // payloadType
        1,                                          // payload
        NVTX_MESSAGE_TYPE_ASCII,                    // messageType
        __FUNCTION__ ":Ordered Initialization"      // message
    };
    nvtxMarkEx(&eventAttrib2);
If the caller uses Method 1, it is critical that the entire binary layout of the structure be set to 0 so that all fields are initialized to their default values. The caller should either use both NVTX_VERSION and NVTX_EVENT_ATTRIB_STRUCT_SIZE (Method 1) or use explicit values and a versioned type (Method 2). Mixing the two methods will likely cause either source-level or binary incompatibility in the future.
A marker is used to describe a single point in time.
nvtxMarkEx | A marker can contain a text message or specify additional information using the event attributes structure. These attributes include a text message, color, category, and a payload. Each of the attributes is optional and can only be sent out using the nvtxMarkEx function. If nvtxMarkA or nvtxMarkW is used to specify the marker, or if an attribute is unspecified, then a default value will be used. |
nvtxMarkA, nvtxMarkW | A marker created using nvtxMarkA or nvtxMarkW contains only a text message. |
    // nvtxMark{A,W}
    nvtxMarkA(__FUNCTION__ ":nvtxMarkA");
    nvtxMarkW(__FUNCTIONW__ L":nvtxMarkW");

    // nvtxMarkEx
    // zero the structure
    nvtxEventAttributes_t eventAttrib = {0};
    // set the version and the size information
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    // configure the attributes. 0 is the default for all attributes.
    eventAttrib.colorType = NVTX_COLOR_ARGB;
    eventAttrib.color = COLOR_RED;
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = __FUNCTION__ ":nvtxMarkEx";
    nvtxMarkEx(&eventAttrib);
Push/Pop ranges are an excellent way to track nested time ranges which occur on a CPU thread. The duration of each range is defined by the corresponding pair of nvtxRangePush and nvtxRangePop calls in the application's source code. Nested ranges are handled automatically on a per-CPU thread basis, and no special developer code is necessary.
nvtxRangePushEx | Marks the start of a nested range. Returns the 0-based level of the range being started. If an error occurs, a negative value is returned. |
nvtxRangePushA | Marks the start of a nested range. Returns the 0-based level of the range being started. If an error occurs, a negative value is returned. |
nvtxRangePushW | Marks the start of a nested range. Returns the 0-based level of the range being started. If an error occurs, a negative value is returned. |
nvtxRangePop | Marks the end of a nested range on the current thread. If an error occurs, a negative value is returned. |
    // nvtxRangePush{A,W}
    nvtxRangePushA(__FUNCTION__ ":nvtxRangePushA");
    nvtxRangePop();
    nvtxRangePushW(__FUNCTIONW__ L":nvtxRangePushW");
    nvtxRangePop();

    // nvtxRangePushEx
    // zero the structure
    nvtxEventAttributes_t eventAttrib = {0};
    // set the version and the size information
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    // configure the attributes. 0 is the default for all attributes.
    eventAttrib.colorType = NVTX_COLOR_ARGB;
    eventAttrib.color = COLOR_GREEN;
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = __FUNCTION__ ":nvtxRangePushEx";
    nvtxRangePushEx(&eventAttrib);
    nvtxRangePop();
Start/End ranges are used to denote a time span; however, they expose arbitrary concurrency (not just nesting), and the start of a range can occur on a different thread than the end. For the correlation of a start/end pair, a unique correlation ID is created that is returned from nvtxRangeStart, and is then passed into nvtxRangeEnd.
nvtxRangeStartEx | Marks the start of a range. Ranges defined by start/end can overlap. This API call returns the unique ID used to correlate a pair of Start and End events. |
nvtxRangeStartA, nvtxRangeStartW | Marks the start of a range. Ranges defined by start/end can overlap. This API call returns the unique ID used to correlate a pair of Start and End events. |
nvtxRangeEnd | Marks the end of a range. |
    // nvtxRangeStart{A,W}
    nvtxRangeId_t id1 = nvtxRangeStartA(__FUNCTION__ ":nvtxRangeStartA");
    nvtxRangeEnd(id1);
    nvtxRangeId_t id2 = nvtxRangeStartW(__FUNCTIONW__ L":nvtxRangeStartW");
    nvtxRangeEnd(id2);

    // nvtxRangeStartEx
    // zero the structure
    nvtxEventAttributes_t eventAttrib = {0};
    // set the version and the size information
    eventAttrib.version = NVTX_VERSION;
    eventAttrib.size = NVTX_EVENT_ATTRIB_STRUCT_SIZE;
    // configure the attributes. 0 is the default for all attributes.
    eventAttrib.colorType = NVTX_COLOR_ARGB;
    eventAttrib.color = COLOR_BLUE;
    eventAttrib.messageType = NVTX_MESSAGE_TYPE_ASCII;
    eventAttrib.message.ascii = __FUNCTION__ ":nvtxRangeStartEx";
    nvtxRangeId_t id3 = nvtxRangeStartEx(&eventAttrib);
    nvtxRangeEnd(id3);

    // overlapping events
    // re-use eventAttrib
    eventAttrib.message.ascii = __FUNCTION__ ":Range 1";
    nvtxRangeId_t r1 = nvtxRangeStartEx(&eventAttrib);
    eventAttrib.message.ascii = __FUNCTION__ ":Range 2";
    nvtxRangeId_t r2 = nvtxRangeStartEx(&eventAttrib);
    // Range 1 ends while Range 2 is still open, so the two ranges overlap
    nvtxRangeEnd(r1);
    nvtxRangeEnd(r2);
Categories and threads are used to group sets of events. Each category is identified through a unique ID; that ID is passed into any of the aforementioned marker/range events in order to assign that event to a specific category. The following API calls can be used to assign a name to a category ID.
nvtxNameCategoryA, nvtxNameCategoryW | Allows the user to assign a name to a category ID. |
nvtxNameOsThreadA | Allows the user to name an active thread of the current process. If an invalid thread ID is provided, or a thread ID from a different process is used, the behavior of the tool is implementation-dependent. |
    // The ASCII (A) variants are called explicitly because the names are narrow strings.
    nvtxNameCategoryA(1, "Memory Allocation");
    nvtxNameCategoryA(2, "Memory Transfer");
    nvtxNameCategoryA(3, "Memory Object Lifetime");
    nvtxNameOsThreadA(GetCurrentThreadId(), "MAIN_THREAD");
CUDA devices, contexts, and streams can be named with the nvtxName-prefixed functions defined in the nvToolsExtCuda.h header. Each of these functions takes the object handle and the name that should be assigned to the object.
nvtxNameCuDeviceA, nvtxNameCuDeviceW | Allows the user to associate a CUDA device with a user-provided name. |
nvtxNameCuContextA, nvtxNameCuContextW | Allows the user to associate a CUDA context with a user-provided name. |
nvtxNameCuStreamA, nvtxNameCuStreamW | Allows the user to associate a CUDA stream with a user-provided name. |
    // initialize the driver API and obtain the first device
    CUdevice device = 0;
    cuInit(0);
    cuDeviceGet(&device, 0);
    CUcontext context;
    cuCtxCreate(&context, 0, device);
    nvtxNameCuContextA(context, "Context1");
    nvtxNameCuDeviceA(device, "Device0");
Equivalent functions provide the same naming functionality for OpenCL resources. The namable resources in this case include devices, contexts, command queues, memory objects, samplers, programs, and events.
nvtxNameClDeviceA | Allows the association of an OpenCL device with a user-provided name. |
nvtxNameClContextA | Allows the association of an OpenCL context with a user-provided name. |
nvtxNameClCommandQueueA, nvtxNameClCommandQueueW | Allows the association of an OpenCL command queue with a user-provided name. |
nvtxNameClMemObjectA, nvtxNameClMemObjectW | Allows the association of an OpenCL memory object with a user-provided name. |
nvtxNameClSamplerA, nvtxNameClSamplerW | Allows the association of an OpenCL sampler with a user-provided name. |
nvtxNameClProgramA, nvtxNameClProgramW | Allows the association of an OpenCL program with a user-provided name. |
nvtxNameClEventA, nvtxNameClEventW | Allows the association of an OpenCL event with a user-provided name. |
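    cl_context context = clCreateContextFromType(0, CL_DEVICE_TYPE_GPU, 0, 0, 0);
    // clCreateCommandQueue requires a valid device, so query the devices of the
    // context (this sketch assumes the context holds at most 16 devices).
    cl_device_id devices[16] = {0};
    clGetContextInfo(context, CL_CONTEXT_DEVICES, sizeof(devices), devices, NULL);
    cl_command_queue queue = clCreateCommandQueue(context, devices[0], 0, NULL);
    nvtxNameClContextA(context, "Context1");
    nvtxNameClCommandQueueA(queue, "Queue0");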
The NVTX API is installed by the NVIDIA Nsight “host” installer (by default) into the following location:
On a 32-bit system:
C:\Program Files\NVIDIA GPU Computing Toolkit\nvToolsExt
On a 64-bit system:
C:\Program Files (x86)\NVIDIA GPU Computing Toolkit\nvToolsExt
Both the header files and the library files themselves (.lib, .dll) are located underneath this path.
By default, the NVIDIA Nsight installer sets up the environment variable NVTOOLSEXT_PATH to point to the aforementioned location that matches the system's bitness.
To compile your project with NVTX support in Visual Studio, set up your project as follows:
- Add $(NVTOOLSEXT_PATH)\include to the Additional Include Directories.
- Add $(NVTOOLSEXT_PATH)\lib\$(Platform) to the Additional Library Directories.
- Add nvToolsExt32_1.lib or nvToolsExt64_1.lib (according to your target platform) to the Additional Dependencies.
If you use NVTX to annotate code in .cu files, also make sure the following configuration is set up (in addition to the steps discussed in the previous section):
- Add $(NVTOOLSEXT_PATH)\include to the include directories used by the CUDA compiler settings of your project.
It is recommended that you copy the NVTX headers and library files into your own source tree prior to integrating this API into your application. By doing this, you will ensure that your application does not require NVIDIA Nsight to be installed, in order for your application to build. The NVTX .dll has no direct dependencies on CUDA, DirectX, or other external libraries.
Once you have placed NVTX into your source tree, add a path to the NVTX headers into your include path, and include nvToolsExt.h into any CPU code source files. You may then begin to use the NVTX API calls as you wish, in order to annotate your application's runtime behavior.
When linking, you may either link using the stub .lib provided with NVTX, or use a LoadLibrary call to load the .dll directly.
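The LoadLibrary option can be sketched as follows. This is an illustration under several assumptions, not the SDK's own loader: the NvtxDynamic structure and the helper functions are hypothetical, the 64-bit DLL name is used (a 32-bit process would load nvToolsExt32_1.dll instead), the entry points are assumed to be exported under their undecorated names, and the calling convention is assumed to match the one declared in nvToolsExt.h. On platforms other than Windows the loader does nothing and the annotation helpers become no-ops.

```cpp
#include <cassert>
#ifdef _WIN32
#include <windows.h>
#define NVTX_CALL __stdcall  // assumption: matches the convention in nvToolsExt.h
#else
#define NVTX_CALL
#endif

typedef int (NVTX_CALL *RangePushA_t)(const char* message);
typedef int (NVTX_CALL *RangePop_t)(void);

// Hypothetical holder for dynamically resolved NVTX entry points.
struct NvtxDynamic {
    RangePushA_t rangePushA = nullptr;
    RangePop_t rangePop = nullptr;

    bool loaded() const { return rangePushA != nullptr && rangePop != nullptr; }
};

// Tries to load the NVTX DLL under its original file name.
// The pointers stay null if the DLL or an entry point cannot be found.
inline NvtxDynamic loadNvtx()
{
    NvtxDynamic api;
#ifdef _WIN32
    HMODULE module = LoadLibraryA("nvToolsExt64_1.dll");
    if (module != NULL) {
        api.rangePushA = reinterpret_cast<RangePushA_t>(
            GetProcAddress(module, "nvtxRangePushA"));
        api.rangePop = reinterpret_cast<RangePop_t>(
            GetProcAddress(module, "nvtxRangePop"));
    }
#endif
    return api;
}

// Annotation helpers that silently do nothing when NVTX is unavailable.
inline void pushRange(const NvtxDynamic& api, const char* message)
{
    if (api.loaded()) {
        api.rangePushA(message);
    }
}

inline void popRange(const NvtxDynamic& api)
{
    if (api.loaded()) {
        api.rangePop();
    }
}
```

With this approach the application still starts when the DLL is missing, at the cost of resolving each function by hand. Linking against the stub .lib avoids that bookkeeping.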
The NVTX .dll is not installed into c:\Windows\System32 or another global location. Instead, make sure to deploy the .dll with your application. One common way to do this in Visual Studio is to copy the NVTX .dll into a directory which contains the application's executable, using a Custom Post-Build Step.
Warning: Do not rename the .dll in any way. Renaming the library will affect how NVIDIA Nsight interacts with the library to collect data.
NVIDIA® Nsight™ Development Platform, Visual Studio Edition User Guide Rev. 2.2.120522 ©2009-2012. NVIDIA Corporation. All Rights Reserved.