-->

The CUPTI installation includes several samples that demonstrate the use of the CUPTI APIs. Refer to them for usage examples of the different APIs that CUPTI supports. The samples are grouped below by API:

Activity API

activity_trace_async
This sample shows how to collect a trace of CPU and GPU activity using the asynchronous activity buffer APIs.
callback_timestamp
This sample shows how to use the callback API to record a trace of API start and stop times.
cuda_graphs_trace
This sample shows how to collect a trace of CUDA graphs and correlate each graph node launch with the API call that created the node, using CUPTI callbacks.
cuda_memory_trace
This sample shows how to collect a trace of CUDA memory operations. The sample also traces CUDA memory operations performed through the default memory pool.
cupti_correlation
This sample shows how to correlate CUDA API calls with the corresponding GPU activities.
cupti_external_correlation
This sample shows how to correlate CUDA API activity records with external APIs.
cupti_finalize
This sample shows how to use the cuptiFinalize() API to dynamically detach and attach CUPTI.
cupti_nvtx
This sample shows how to receive NVTX callbacks and collect NVTX records in CUPTI.
cupti_trace_injection
This sample shows how to build an injection library using the CUPTI activity and callback APIs. It can be used to trace CUDA APIs and GPU activities for any CUDA application. It does not require the CUDA application to be modified.
nvlink_bandwidth
This sample shows how to collect NVLink topology and NVLink throughput metrics in continuous mode.
openacc_trace
This sample shows how to use CUPTI APIs for OpenACC data collection.
pc_sampling
This sample shows how to collect PC Sampling profiling information for a kernel using the PC Sampling Activity APIs.
sass_source_map
This sample shows how to generate CUpti_ActivityInstructionExecution records and how to map SASS assembly instructions to CUDA C source.
unified_memory
This sample shows how to collect information about page transfers for unified memory.
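
The correlation idea behind cupti_correlation can be illustrated without CUPTI itself. The following Python sketch is a simplified model, not CUPTI code: the record classes, field names, and sample values are hypothetical stand-ins. It assumes only that each API record and each GPU activity record carries a shared correlation ID, and it joins the two sets on that ID.

```python
from dataclasses import dataclass


@dataclass
class ApiRecord:
    """Hypothetical stand-in for a CUDA API activity record."""
    correlation_id: int
    name: str


@dataclass
class GpuRecord:
    """Hypothetical stand-in for a GPU activity record (kernel, memcpy, ...)."""
    correlation_id: int
    kind: str
    duration_ns: int


def correlate(api_records, gpu_records):
    """Pair each API record with the GPU records that share its correlation ID."""
    by_id = {}
    for rec in gpu_records:
        by_id.setdefault(rec.correlation_id, []).append(rec)
    return [(api, by_id.get(api.correlation_id, [])) for api in api_records]


# Made-up records; GPU records may arrive in a different order than API records.
api_records = [
    ApiRecord(1, "cudaMemcpy"),
    ApiRecord(2, "cudaLaunchKernel"),
    ApiRecord(3, "cudaDeviceSynchronize"),
]
gpu_records = [GpuRecord(2, "kernel", 1500), GpuRecord(1, "memcpy", 800)]

for api, matches in correlate(api_records, gpu_records):
    kinds = [m.kind for m in matches] or ["(no GPU activity)"]
    print(f"{api.name} (id={api.correlation_id}) -> {', '.join(kinds)}")
```

Indexing the GPU records by ID first keeps the join linear in the number of records, which matters when activity buffers are delivered out of order and in large batches.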

Event and Metric APIs

callback_event
This sample shows how to use both the callback and event APIs to record the events that occur during the execution of a simple kernel. The sample shows the required ordering for synchronization, and for event group enabling, disabling, and reading.
callback_metric
This sample shows how to use both the callback and metric APIs to record the metric's events during the execution of a simple kernel, and then use those events to calculate the metric value.
cupti_query
This sample shows how to query CUDA-enabled devices for their event domains, events, and metrics.
event_multi_gpu
This sample shows how to use the CUPTI event and CUDA APIs to sample events on a setup with multiple GPUs. The sample shows the required ordering for synchronization, and for event group enabling, disabling, and reading.
event_sampling
This sample shows how to use the event APIs to sample events using a separate host thread.
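
The step described for callback_metric, using collected event values to calculate a metric value, can be modeled in a few lines. This Python sketch is illustrative only and makes no CUPTI calls: the metric formula, the event names, and the counter values are assumptions chosen for the example.

```python
def compute_metric(formula, required_events, event_values):
    """Evaluate a metric once all of its required event values are available."""
    missing = [name for name in required_events if name not in event_values]
    if missing:
        raise ValueError(f"missing event values: {missing}")
    return formula(event_values)


def instructions_per_cycle(events):
    """Hypothetical metric defined as a ratio of two event counts."""
    return events["inst_executed"] / events["active_cycles"]


required = ["inst_executed", "active_cycles"]

# Made-up event totals, as if read back after the kernel finished.
collected = {"inst_executed": 4096, "active_cycles": 2048}

print(compute_metric(instructions_per_cycle, required, collected))
```

Checking for missing events before evaluating mirrors the point of the sample: a metric value is only meaningful after every event it depends on has been collected.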

Profiling API

extensions
This includes utilities used in some of the samples.
autorange_profiling
This sample shows how to use profiling APIs to collect metrics in autorange mode.
callback_profiling
This sample shows how to use the callback and profiling APIs to collect metrics during the execution of a kernel. It shows how to use the different phases of profiling (enumeration, configuration, collection, and evaluation) in the appropriate callbacks.
concurrent_profiling
This sample shows how to use the profiling API to record metrics from concurrent kernels launched in two different ways: using multiple streams on a single device, and using multiple threads with multiple devices.
cupti_metric_properties
This sample shows how to query various properties of metrics using the Profiling APIs. The sample reports the collection method (hardware or software) and the number of passes required to collect a list of metrics.
nested_range_profiling
This sample shows how to profile nested ranges using the Profiling APIs.
profiling_injection
This sample, for Linux systems, shows how to build an injection library that can automatically enable CUPTI's Profiling API using Auto Ranges with Kernel Replay mode. It can attach to an application that was not instrumented with CUPTI and profile any kernel launches.
userrange_profiling
This sample shows how to use profiling APIs to collect metrics in user-specified range mode.
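
The difference between the autorange and user-specified range modes named above can be sketched with a small model. This Python example makes no CUPTI calls and rests on stated assumptions: in autorange mode every kernel launch forms its own range, while in user range mode only launches between an explicit push and pop marker are grouped into a named range. The event tuples and kernel names are hypothetical, and nested ranges are deliberately left out.

```python
def auto_ranges(events):
    """Model of autorange mode: one range per kernel launch."""
    return [[name] for kind, name in events if kind == "launch"]


def user_ranges(events):
    """Model of user range mode: group launches between push and pop markers."""
    ranges = {}
    current = None
    for kind, name in events:
        if kind == "push":
            current = name
            ranges[current] = []
        elif kind == "pop":
            current = None
        elif kind == "launch" and current is not None:
            ranges[current].append(name)
    return ranges


# Hypothetical timeline of an application run.
events = [
    ("launch", "init"),
    ("push", "step"),
    ("launch", "compute"),
    ("launch", "reduce"),
    ("pop", "step"),
    ("launch", "cleanup"),
]

print(auto_ranges(events))
print(user_ranges(events))
```

In this model, launches outside any user range (here `init` and `cleanup`) are simply not attributed to a range.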

PC Sampling API

pc_sampling_continuous
This injection sample shows how to collect PC Sampling profiling information using the PC Sampling APIs. A Perl script, libpc_sampling_continuous.pl, is provided to run the CUDA application with different PC sampling options. Use the command './libpc_sampling_continuous.pl --help' to list all the options. The CUDA application code does not need to be modified. Refer to the README.txt file shipped with the sample for instructions to build and use the injection library.
pc_sampling_start_stop
This sample shows how to collect PC Sampling profiling information for kernels within a range using the PC Sampling start/stop APIs.
pc_sampling_utility
This utility takes the PC sampling data file generated by the pc_sampling_continuous injection library as input. It prints the stall reason counter values at the GPU assembly instruction level. It also correlates GPU assembly instructions with CUDA C source and shows the CUDA C source file name and line number. Refer to the README.txt file shipped with the sample for instructions to build and run the utility.
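
Reporting stall reason counters at the instruction level, as pc_sampling_utility does, amounts to aggregating samples per program counter. The Python sketch below models only that aggregation step. It does not read the real data file format, and the sample tuples and stall reason labels are made up for illustration.

```python
from collections import Counter, defaultdict


def aggregate(samples):
    """Count stall reasons per program counter (PC)."""
    per_pc = defaultdict(Counter)
    for pc, reason in samples:
        per_pc[pc][reason] += 1
    return per_pc


# Hypothetical (pc, stall_reason) samples.
samples = [
    (0x10, "memory"),
    (0x10, "memory"),
    (0x10, "sync"),
    (0x20, "memory"),
]

table = aggregate(samples)
for pc in sorted(table):
    counts = ", ".join(f"{reason}={n}" for reason, n in sorted(table[pc].items()))
    print(f"pc=0x{pc:x}: {counts}")
```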

Checkpoint API

checkpoint_kernels
This sample shows how to use the Checkpoint API to restore device memory, allowing a kernel to be replayed, even if it modifies its input data.
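
Why a checkpoint makes replay possible can be shown with a host-side model. This Python sketch does not use the Checkpoint API: the `Checkpoint` class and the `scale_in_place` function are hypothetical stand-ins for a saved device memory snapshot and for a kernel that overwrites its own input.

```python
class Checkpoint:
    """Hypothetical stand-in for a saved snapshot of device memory."""

    def __init__(self, buffer):
        self._buffer = buffer
        self._saved = list(buffer)  # copy taken when the checkpoint is created

    def restore(self):
        """Write the saved contents back into the original buffer."""
        self._buffer[:] = self._saved


def scale_in_place(data, factor):
    """Stand-in 'kernel' that modifies its input data."""
    for i in range(len(data)):
        data[i] *= factor


data = [1, 2, 3]
checkpoint = Checkpoint(data)

scale_in_place(data, 2)
first_pass = list(data)

checkpoint.restore()      # without this, the replay would start from modified input
scale_in_place(data, 2)
second_pass = list(data)

print(first_pass == second_pass)
```

Restoring before each replay is what keeps the passes identical. Skipping the restore would make the second pass operate on already scaled values.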