OS runtime libraries can be traced to gather information about low-level userspace APIs. The trace covers the system call wrappers and thread synchronization interfaces exposed by the C runtime and POSIX Threads (pthread) libraries. It is not a complete runtime library API trace; instead, it focuses on functions that can take a long time to execute or that could cause your thread to be unscheduled from the CPU while waiting for an event to complete.
OS runtime tracing complements and enhances sampling information by:
Visualizing when the process is communicating with the hardware, controlling resources, performing multi-threading synchronization or interacting with the kernel scheduler.
Adding thread states by correlating OS runtime library calls with their effect on thread scheduling.
Collecting backtraces for long OS runtime library calls. This provides a way to gather blocked-state backtraces, giving you more context about why the thread was blocked for so long while avoiding unnecessary overhead for short events.
To enable OS runtime libraries tracing from Nsight Systems:
CLI: Use the -t, --trace option with the osrt parameter. See Command Line Options for more information.
GUI: Select the Collect OS runtime libraries trace checkbox.
An additional configuration parameter is available:
The functions listed below receive special treatment. A call to one of them is traced only if the tool detects that the resource is already held by another thread, so that the call will block. Otherwise, the call is never traced.
pthread_mutex_lock
pthread_rwlock_rdlock
pthread_rwlock_wrlock
pthread_spin_lock
sem_wait
Note that even if a call is determined to be potentially blocking, there is a chance that it will not actually block once a few cycles have elapsed. The call is still traced in this scenario.
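The contended case described above can be reproduced with a small sketch. A helper thread takes a mutex and holds it, and the calling thread then calls pthread_mutex_lock on the same mutex, so that call has to wait. All names here (demo_lock, holder_thread, contended_lock_wait_ms) are hypothetical, and the code says nothing about how the tool detects contention; it only creates the situation from the application side.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static sem_t holder_ready;

/* Takes the free mutex, announces that it is held, keeps it for hold_ms. */
static void *holder_thread(void *arg) {
    long hold_ms = *(const long *)arg;
    struct timespec ts = { hold_ms / 1000, (hold_ms % 1000) * 1000000L };

    pthread_mutex_lock(&demo_lock);    /* uncontended: the mutex is free */
    sem_post(&holder_ready);           /* tell the caller the mutex is now held */
    nanosleep(&ts, NULL);              /* keep the mutex for about hold_ms */
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns the milliseconds spent in the contended pthread_mutex_lock call,
 * or -1.0 if the helper thread could not be set up. */
static double contended_lock_wait_ms(long hold_ms) {
    pthread_t tid;
    struct timespec t0, t1;

    if (sem_init(&holder_ready, 0, 0) != 0)
        return -1.0;
    if (pthread_create(&tid, NULL, holder_thread, &hold_ms) != 0) {
        sem_destroy(&holder_ready);
        return -1.0;
    }

    /* Wait until the helper really owns the mutex (retry if interrupted). */
    while (sem_wait(&holder_ready) == -1 && errno == EINTR)
        continue;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_mutex_lock(&demo_lock);    /* contended: held by holder_thread */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    pthread_mutex_unlock(&demo_lock);

    pthread_join(tid, NULL);
    sem_destroy(&holder_ready);
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

In this sketch, the pthread_mutex_lock call inside holder_thread finds the mutex free, while the one in contended_lock_wait_ms finds it held, which is the distinction the paragraph above draws.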
Nsight Systems only traces system call wrappers exposed by the C runtime. It cannot trace system calls invoked directly through assembly code.
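To make the distinction concrete, the following sketch performs the same write in two ways: once through the libc write() wrapper, and once by issuing the system call instruction directly. The function names are hypothetical. The inline assembly path assumes x86-64 Linux with GCC or Clang; on other targets the sketch falls back to the libc syscall() helper only so that it still compiles.

```c
#define _GNU_SOURCE
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Path 1: goes through the libc wrapper write(), which is in the traced list. */
static ssize_t write_via_wrapper(int fd, const char *msg) {
    return write(fd, msg, strlen(msg));
}

/* Path 2: enters the kernel without calling the write() wrapper. */
static ssize_t write_via_raw(int fd, const char *msg) {
    size_t len = strlen(msg);
#if defined(__x86_64__)
    long ret;
    /* x86-64 Linux convention: number in rax, arguments in rdi, rsi, rdx;
     * the syscall instruction clobbers rcx and r11. */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"((long)SYS_write), "D"((long)fd), "S"(msg), "d"(len)
                      : "rcx", "r11", "memory");
    return (ssize_t)ret;   /* on failure this is a negative errno value */
#else
    /* Fallback for other targets: uses libc's generic syscall() helper. */
    return (ssize_t)syscall(SYS_write, fd, msg, len);
#endif
}
```

Both functions produce the same bytes on the file descriptor, but only the first one passes through a C runtime wrapper.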
Additional thread states, as well as backtrace collection on long calls, are only enabled if sampling is turned on.
It is not possible to configure the depth and duration threshold when collecting backtraces. Currently, only OS runtime libraries calls longer than 80 μs will generate a backtrace with a maximum of 24 frames. This limitation will be removed in a future version of the product.
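The two fixed limits can be summarized as a small model. This is only an illustration of the rule as stated above, not the tool's implementation; the macro and function names are hypothetical.

```c
#include <stddef.h>

/* Limits quoted in the text above; the names are made up for this sketch. */
#define OSRT_BACKTRACE_MIN_US      80.0  /* calls must be longer than this */
#define OSRT_BACKTRACE_MAX_FRAMES  24    /* at most this many frames are kept */

/* Model of the stated rule: how many frames a backtrace would contain for a
 * call that lasted call_duration_us and was made stack_depth frames deep. */
static size_t frames_recorded(double call_duration_us, size_t stack_depth) {
    if (call_duration_us <= OSRT_BACKTRACE_MIN_US)
        return 0;                          /* too short: no backtrace */
    return stack_depth < OSRT_BACKTRACE_MAX_FRAMES
               ? stack_depth
               : OSRT_BACKTRACE_MAX_FRAMES;
}
```

Under this model, a 200 μs call made 40 frames deep yields a 24-frame backtrace, while a 50 μs call yields none.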
You must compile your application and libraries with the -funwind-tables compiler flag for Nsight Systems to unwind the backtraces correctly.
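A minimal sketch of code built this way is shown below. The build line in the comment assumes GCC and a hypothetical file name; the function names are also hypothetical. The noinline attribute (a GCC and Clang extension) is used only so that each function keeps its own stack frame in an optimized build.

```c
/* Assumed build line (file name is hypothetical):
 *   gcc -O2 -funwind-tables chain.c -o chain
 */
#include <time.h>

#define NOINLINE __attribute__((noinline))

/* Innermost frame: blocks in nanosleep() for about 1 ms, well above the
 * 80 us threshold mentioned earlier. */
NOINLINE static int wait_for_device(void) {
    struct timespec ts = { 0, 1000000L };
    return nanosleep(&ts, NULL);
}

/* Middle frame: returns 1 if the wait completed, -1 otherwise. */
NOINLINE static int read_block(void) {
    int rc = wait_for_device();
    return rc == 0 ? 1 : -1;
}

/* Outermost frame: reads two blocks and returns how many succeeded. */
NOINLINE static int load_dataset(void) {
    int first = read_block();
    int second = read_block();
    return first + second;
}
```

When load_dataset is called from main, a backtrace collected on the nanosleep call would be expected to pass through wait_for_device, read_block, and load_dataset, provided the unwinder has the tables it needs.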
The OS runtime libraries tracing is limited to a select list of functions. It also depends on the version of the C runtime linked to the application.
Libc system call wrappers
accept accept4 acct alarm arch_prctl bind bpf brk chroot clock_nanosleep connect copy_file_range creat creat64 dup dup2 dup3 epoll_ctl epoll_pwait epoll_wait fallocate fallocate64 fcntl fdatasync flock fork fsync ftruncate futex ioctl ioperm iopl kill killpg listen membarrier mlock mlock2 mlockall mmap mmap64 mount move_pages mprotect mq_notify mq_open mq_receive mq_send mq_timedreceive mq_timedsend mremap msgctl msgget msgrcv msgsnd msync munmap nanosleep nfsservctl open open64 openat openat64 pause pipe pipe2 pivot_root poll ppoll prctl pread pread64 preadv preadv2 preadv64 process_vm_readv process_vm_writev pselect6 ptrace pwrite pwrite64 pwritev pwritev2 pwritev64 read readv reboot recv recvfrom recvmmsg recvmsg rt_sigaction rt_sigqueueinfo rt_sigsuspend rt_sigtimedwait sched_yield seccomp select semctl semget semop semtimedop send sendfile sendfile64 sendmmsg sendmsg sendto shmat shmctl shmdt shmget shutdown sigaction sigsuspend sigtimedwait socket socketpair splice swapoff swapon sync sync_file_range syncfs tee tgkill tgsigqueueinfo tkill truncate umount2 unshare uselib vfork vhangup vmsplice wait wait3 wait4 waitid waitpid write writev _sysctl
POSIX Threads
pthread_barrier_wait pthread_cancel pthread_cond_broadcast pthread_cond_signal pthread_cond_timedwait pthread_cond_wait pthread_create pthread_join pthread_kill pthread_mutex_lock pthread_mutex_timedlock pthread_mutex_trylock pthread_rwlock_rdlock pthread_rwlock_timedrdlock pthread_rwlock_timedwrlock pthread_rwlock_tryrdlock pthread_rwlock_trywrlock pthread_rwlock_wrlock pthread_spin_lock pthread_spin_trylock pthread_timedjoin_np pthread_tryjoin_np pthread_yield sem_timedwait sem_trywait sem_wait
I/O
aio_fsync aio_fsync64 aio_suspend aio_suspend64 fclose fcloseall fflush fflush_unlocked fgetc fgetc_unlocked fgets fgets_unlocked fgetwc fgetwc_unlocked fgetws fgetws_unlocked flockfile fopen fopen64 fputc fputc_unlocked fputs fputs_unlocked fputwc fputwc_unlocked fputws fputws_unlocked fread fread_unlocked freopen freopen64 ftrylockfile fwrite fwrite_unlocked getc getc_unlocked getdelim getline getw getwc getwc_unlocked lockf lockf64 mkfifo mkfifoat posix_fallocate posix_fallocate64 putc putc_unlocked putwc putwc_unlocked
Miscellaneous
forkpty popen posix_spawn posix_spawnp sigwait sigwaitinfo sleep system usleep
Copyright (c) 2012-2020, NVIDIA Corporation. All rights reserved.