GPU Technology Conference 2010 - Recorded Sessions
  Keynotes
ID Title Abstract Speakers Affiliation Topic Area(s) Streaming Downloads
1001 Opening Keynote with Jen-Hsun Huang, NVIDIA The opening keynote features Jen-Hsun Huang, CEO and Co-Founder of NVIDIA, and special guests. Hear about what's next in computing and graphics, and preview disruptive technologies and exciting demonstrations from across industries. Jen-Hsun Huang NVIDIA General Interest Watch now WMV Video
1002 Day 2 Keynote with Dr. Klaus Schulten, University of Illinois at Urbana-Champaign How does the H1N1 “Swine Flu” virus avoid drugs while attacking our cells? What can we learn about solar energy by studying biological photosynthesis? How do our cells read the genetic code? What comes next in computational biology?

Computational biology is approaching a new and exciting frontier: the ability to simulate structures and processes in living cells. Come learn about the “computational microscope,” a new research instrument that scientists can use to simulate biomolecules at nearly infinite resolution. The computational microscope complements the most advanced physical microscopes to guide today’s biomedical research. In this keynote address, computational biology pioneer Dr. Klaus Schulten of the University of Illinois, Urbana-Champaign, will introduce the computational microscope, showcase the widely used software underlying it, and highlight major discoveries made with its aid, ranging from protein folding to the translation of the genetic code in cells and the harvesting of solar energy in photosynthesis. He will also look toward a future when cell tomography and computing will establish atom-by-atom views of entire life forms.
Klaus Schulten University of Illinois at Urbana-Champaign General Interest Watch now WMV Video
1003 Closing Keynote with Dr. Sebastian Thrun, Stanford University and Google What really causes accidents and congestion on our roadways? How close are we to fully autonomous cars?

In his keynote address, Stanford Professor and Google Distinguished Engineer Dr. Sebastian Thrun will show how his two autonomous vehicles, Stanley (DARPA Grand Challenge winner) and Junior (2nd place in the DARPA Urban Challenge), demonstrate how close, and yet how far, we are from fully autonomous cars. Using computer vision combined with lasers, radars, GPS sensors, gyros, accelerometers, and wheel velocity data, the vehicle control systems are able to perceive their surroundings and plan routes to safely navigate Stanley and Junior through the courses. However, these closed courses are a far cry from everyday driving. Find out what the team will do next to get one step closer to the “holy grail” of computer vision, and a huge leap forward toward the concept of fully autonomous vehicles.
Sebastian Thrun Stanford University and Google General Interest Watch now WMV Video
4006 Fireside Chat with Jen-Hsun Huang (Co-Founder & CEO, NVIDIA) Jen-Hsun Huang was joined in a fireside chat by Quentin Hardy, National Editor at Forbes Magazine.
They discussed the rise of GPUs, current trends in visual and parallel computing, and the transformational changes ahead for the industry.
Jen-Hsun Huang NVIDIA General Interest Not Available FLV   PDF
  Pre-Conference Tutorials
ID Title Abstract Speakers Affiliation Topic Area(s) Streaming Downloads
2004 Languages, APIs and Development Tools for GPU Computing (Pre-Conference Tutorial) Get a head start on the conference with this first-day introduction to key technologies for GPU Computing.  This 90-minute tutorial session will cover the key features and differences between the major programming languages, APIs and development tools available today.  Attendees will also learn several high level design patterns for consumer, professional and HPC applications, with practical programming considerations for each. Will Ramey NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
2131 Introduction to CUDA C (Pre-Conference Tutorial) Starting with a background in C or C++, learn everything you need to know in order to start programming in CUDA C.  Beginning with a "Hello, World" CUDA C program, explore parallel programming with CUDA through a number of hands-on code examples. Examine more deeply the various APIs available to CUDA applications and learn the best (and worst) ways in which to employ them in applications. Master the first half of the book "CUDA by Example" as taught by the author, pointing you on a trajectory to complete the second half on your own after course completion. Jason Sanders NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
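As a concrete taste of the style of code this tutorial starts from, here is a minimal sketch (not material from the course itself): a CUDA C vector-add kernel launched from a small host program.

```
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 256;
    int ha[N], hb[N], hc[N];
    for (int i = 0; i < N; ++i) { ha[i] = i; hb[i] = 2 * i; }

    // Allocate device buffers and copy the inputs over.
    int *da, *db, *dc;
    cudaMalloc((void **)&da, N * sizeof(int));
    cudaMalloc((void **)&db, N * sizeof(int));
    cudaMalloc((void **)&dc, N * sizeof(int));
    cudaMemcpy(da, ha, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, N * sizeof(int), cudaMemcpyHostToDevice);

    // Launch enough 128-thread blocks to cover N elements.
    add<<<(N + 127) / 128, 128>>>(da, db, dc, N);

    cudaMemcpy(hc, dc, N * sizeof(int), cudaMemcpyDeviceToHost);
    printf("Hello, World from CUDA C: c[42] = %d\n", hc[42]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```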
2018 OpenCL on the GPU (Pre-Conference Tutorial) OpenCL is Khronos’ new open standard for parallel programming of heterogeneous systems. This tutorial session will introduce the main concepts behind the standard and illustrate them with some simple code walkthrough. Attendees will also learn how to make efficient use of the API to achieve good performance on the GPU. Cliff Woolley NVIDIA Tools & Libraries Watch now FLV MP4  
2157 DirectX 11 Overview (Pre-Conference Tutorial) This presentation gives an overview of the DirectX 11 pipeline and how it extends previous DirectX versions to enable stunning visual effects in real-time graphics applications.

Cem Cebenoyan NVIDIA Computer Graphics Watch now FLV MP4 PDF
2260 DirectCompute (Pre-Conference Tutorial) Learn how to use the DirectCompute API to solve GPU computing problems.  This tutorial will introduce the DirectCompute API, cover the recommended best practices for GPU programming, and go over examples of how to use this API efficiently and effectively to solve compute-intensive problems. Eric Young, Matt Sandy NVIDIA, Microsoft Programming Languages & Techniques Watch now FLV MP4 PDF
2127 OpenGL (Pre-Conference Tutorial) This session will discuss the latest OpenGL features offered by NVIDIA for both the Quadro and GeForce product lines. Learn more about OpenGL 4 as well as NVIDIA-specific OpenGL extensions. Mark Kilgard NVIDIA Corporation Programming Languages & Techniques Watch now FLV MP4 PDF
2245 Parallel Nsight for Microsoft Visual Studio (Pre-Conference Tutorial) NVIDIA Parallel Nsight provides access to the power of the GPU from within the familiar environment of Microsoft Visual Studio. In this session, you will learn how to use Parallel Nsight to develop GPU computing and graphics applications.
Learn how to use the powerful Parallel Nsight debugger to identify errors in CUDA C/C++ kernels and HLSL shaders using GPU breakpoints and direct memory and variable inspection. See how Parallel Nsight displays system-wide performance characteristics, allowing you to create efficient GPU algorithms. 
Kumar Iyer NVIDIA Tools & Libraries Watch now FLV MP4  
2024 NVIDIA Acceleration Engines Overview (Pre-Conference Tutorial) Come learn about the software engines NVIDIA freely provides to application developers to rapidly leverage new GPU capabilities and dramatically reduce the time it takes to bring compelling features to end users. Phillip Miller, Holger Kunz, Brian Harrison, Thomas Ruge NVIDIA Programming Languages & Techniques Watch now FLV MP4
2261 Introduction to GPU Ray Tracing with NVIDIA OptiX (Pre-Conference Tutorial) Learn how to use NVIDIA OptiX to quickly develop high performance ray tracing applications for interactive rendering, offline rendering, or scientific visualization. This session will explore the latest available OptiX version.  Dave McAllister, Phillip Miller NVIDIA Ray Tracing Watch now FLV MP4  
2158 Programming the NVIDIA Digital Video Pipeline with OpenGL (Pre-Conference Tutorial) This tutorial session teaches attendees how to program the NVIDIA Quadro Digital Video Pipeline with OpenGL.  It will go in-depth into the techniques and recommended practices. Thomas True NVIDIA Programming Languages & Techniques Watch now FLV MP4  
2159 Programming the NVIDIA Digital Video Pipeline with Direct3D (Pre-Conference Tutorial) Learn how to program the NVIDIA Quadro Digital Video pipeline using Direct3D.  This session will provide an overview of the SDK, discuss device control, data transfers, performance measuring and tuning, ancillary data and application design considerations. Thomas True NVIDIA Programming Languages & Techniques Watch now FLV MP4  
2010 Implementing Stereoscopic 3D in Your Applications (Pre-Conference Tutorial) Let's dive into the 3rd dimension.  This talk presents a comprehensive technical overview of NVIDIA’s stereo technology and tools.  After a complete introduction to NVIDIA’s stereo technology, we will then explore in more detail production techniques for the new artistic space of effects and creativity offered by 3D stereo.  The take away of this session will be a solid understanding of NVIDIA’s stereo technology and how to take best advantage of it. Samuel Gateau, Steve Nash NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
  Developer Summit & Research Summit Sessions
ID Title Abstract Speakers Affiliation Topic Area(s) Streaming Downloads
2015 Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Learn about new techniques to efficiently implement the Alternating Direction Implicit method on GPU for large 2D and 3D domains with complex boundaries.

A novel tridiagonal solver for systems with variable sizes and a new hybrid approach will be covered in detail. Comprehensive performance analysis and key Fermi optimizations will be explored.

Various applications of tridiagonal solvers such as 3D direct numerical fluid simulation and a 2D depth-of-field effect for games will be briefly discussed.
Nikolai Sakharnykh NVIDIA Algorithms & Numerical Techniques Watch now FLV MP4 PDF
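For orientation, the sketch below shows the simplest GPU mapping for the kind of batched tridiagonal systems that arise in ADI sweeps: one thread runs the classic Thomas algorithm on one independent system. This is a baseline illustration, not the speaker's solver; the contiguous per-system layout and the scratch array cp are assumptions of the sketch.

```
#include <cstdio>
#include <vector>

// One thread solves one tridiagonal system with the Thomas algorithm.
// a, b, c: sub-, main- and super-diagonals; d: right-hand side, overwritten
// with the solution; cp: scratch space for the modified super-diagonal.
__global__ void thomas_batch(const float *a, const float *b, const float *c,
                             float *d, float *cp, int n, int num_systems) {
    int sys = blockIdx.x * blockDim.x + threadIdx.x;
    if (sys >= num_systems) return;

    const float *as = a + sys * n, *bs = b + sys * n, *cs = c + sys * n;
    float *ds = d + sys * n, *cps = cp + sys * n;

    // Forward elimination.
    cps[0] = cs[0] / bs[0];
    ds[0]  = ds[0] / bs[0];
    for (int i = 1; i < n; ++i) {
        float m = 1.0f / (bs[i] - as[i] * cps[i - 1]);
        cps[i] = cs[i] * m;
        ds[i]  = (ds[i] - as[i] * ds[i - 1]) * m;
    }
    // Back substitution.
    for (int i = n - 2; i >= 0; --i)
        ds[i] -= cps[i] * ds[i + 1];
}

int main() {
    const int n = 8, num_systems = 4;
    size_t bytes = n * num_systems * sizeof(float);
    std::vector<float> a(n * num_systems, -1.0f), b(n * num_systems, 4.0f),
                       c(n * num_systems, -1.0f), d(n * num_systems, 1.0f);

    float *da, *db, *dc, *dd, *dcp;
    cudaMalloc((void **)&da, bytes);  cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);  cudaMalloc((void **)&dd, bytes);
    cudaMalloc((void **)&dcp, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dc, c.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dd, d.data(), bytes, cudaMemcpyHostToDevice);

    thomas_batch<<<1, 32>>>(da, db, dc, dd, dcp, n, num_systems);

    cudaMemcpy(d.data(), dd, bytes, cudaMemcpyDeviceToHost);
    printf("first unknown of system 0: %f\n", d[0]);
    return 0;
}
```

In practice the per-system data would be interleaved so that neighboring threads make coalesced accesses; the session itself covers far more sophisticated variable-size and hybrid solvers.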
2020 GPU-Accelerated Data Expansion for the Marching Cubes Algorithm Learn how to accelerate marching cubes on the GPU by taking advantage of the GPU’s high memory bandwidth and fast on-chip shared memory in a data expansion algorithm that can extract the complete iso-surface mesh from (dynamic) volume data without requiring any data transfers back to the CPU. Gernot Ziegler, Chris Dyken NVIDIA, SINTEF Algorithms & Numerical Techniques Watch now FLV   PDF
2021 Efficient Volume Segmentation on the GPU Explore a new technique in the detection of common regions in a 2D/3D data array. Connected components along the axes are linked before actual label propagation starts. The algorithm is completely gather-based, which allows for several optimizations in the CUDA C implementation. It enables real-time frame rates for the analysis of typical 2D images and interactive frame rates for the analysis of typical volume data. Allan Rasmusson, Gernot Ziegler University of Aarhus, NVIDIA  Algorithms & Numerical Techniques Watch now FLV MP4 PDF
2038 The Best of Both Worlds: Flexible Data Structures for Heterogeneous Computing Learn how to switch between array of structs (AoS) and struct of arrays (SoA) storage without having to change the data access syntax. A few changes to the struct and container definitions will enable you to evaluate the performance of AoS vs. SoA on your existing AoS code.  We present a simple abstraction that retains the more intuitive AoS syntax array[index].component, yet allows you to switch between AoS and SoA storage with a single template parameter at class definition. Robert Strzodka Max Planck Institut Informatik Algorithms & Numerical Techniques Watch now FLV MP4 PDF
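The idea can be illustrated with a small, hypothetical C++ sketch (not the speaker's library): the storage layout is chosen by one template parameter, while the familiar particles[i].x access syntax is preserved through a tiny proxy.

```
#include <cstdio>
#include <vector>

enum Layout { AOS, SOA };

template <Layout L> struct Particles;

// AoS storage: one struct per element.
template <> struct Particles<AOS> {
    struct Elem { float x, y; };
    std::vector<Elem> data;
    explicit Particles(size_t n) : data(n) {}
    Elem &operator[](size_t i) { return data[i]; }
};

// SoA storage: one array per component, accessed through a proxy
// so that p[i].x still works.
template <> struct Particles<SOA> {
    std::vector<float> x, y;
    explicit Particles(size_t n) : x(n), y(n) {}
    struct Proxy { float &x, &y; };
    Proxy operator[](size_t i) { return Proxy{x[i], y[i]}; }
};

int main() {
    Particles<AOS> a(4);           // flip AOS -> SOA to change the layout
    Particles<SOA> s(4);
    a[0].x = 1.0f; s[0].x = 2.0f;  // identical access syntax in both cases
    printf("%f %f\n", a[0].x, s[0].x);
    return 0;
}
```

Benchmarking the two layouts on the same kernel or loop body then becomes a one-line change at the class definition.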
2061 Accelerating Explicit FEM Shock & Blast Simulations Explicit finite element codes are widely used to simulate the response of structures and mechanical equipment subjected to shock, blast and wave propagation phenomena. High-resolution models requiring run times ranging from a few seconds to a few months are common, and hence the payoff from GPU acceleration is tremendous. We describe the acceleration of our commercial finite element code NLFLEX using CUDA. We developed GPU kernels in CUDA based on our production code NLFLEX, for linear elasticity, explosives, elasto-plasticity and large deformation elasticity. We attained order of magnitude (10X) acceleration in single precision and approximately 5X in double precision mode.  Nachiket Gokhale  Weidlinger Associates Inc Algorithms & Numerical Techniques Watch now FLV MP4
2068 Parallelizing FPGA Technology Mapping using GPUs FPGA technology mapping is an algorithm that is heavily data parallel, but contains many features that make it unattractive for GPU implementation. The algorithm uses data in irregular ways since it is graph-based. It also makes heavy use of constructs such as recursion, which are not supported by GPU hardware. In this work, we take a state-of-the-art FPGA technology mapping algorithm within Berkeley’s ABC package and attempt to parallelize it on a GPU. We show that runtime gains of 3.1x are achievable while maintaining identical quality, as demonstrated by running these netlists through Altera’s Quartus II place-and-route tool. Doris Chen University of Toronto Algorithms & Numerical Techniques Watch now FLV MP4
2084 State of the Art in GPU Data-Parallel Algorithm Primitives Learn about the importance of optimized data-parallel algorithm primitives as building blocks for efficient real-world applications.  Fundamental parallel algorithms like sorting, parallel reduction, and parallel scan are key components in a wide range of applications from video games to serious science.  This session will cover the state of the art in data-parallel primitive algorithms for GPUs. Starting with an explanation of the purpose and applications of the algorithms, we will discuss key algorithm design principles, demonstrate current open source algorithm libraries for GPUs (CUDPP and Thrust), describe optimizations using new features in the Fermi architecture, and explore future directions. Mark Harris NVIDIA Algorithms & Numerical Techniques Watch now FLV MP4 PDF
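As a flavor of what these libraries provide, the following minimal sketch uses Thrust (which ships with the CUDA toolkit) to run a sort, a reduction, and a prefix sum on device data; CUDPP exposes similar primitives through a C API.

```
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <cstdio>

int main() {
    // Copy a small host array into a device vector.
    int h[8] = {5, 3, 7, 1, 8, 2, 6, 4};
    thrust::device_vector<int> d(8);
    thrust::copy(h, h + 8, d.begin());

    thrust::sort(d.begin(), d.end());                       // data-parallel sort
    int sum = thrust::reduce(d.begin(), d.end(), 0);        // parallel reduction
    thrust::exclusive_scan(d.begin(), d.end(), d.begin());  // prefix sum (scan)

    printf("sum = %d, last scan element = %d\n", sum, (int)d.back());
    return 0;
}
```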
2085 Tridiagonal Solvers: Auto-Tuning and Optimizations In this presentation, we will discuss and analyze the performance of three optimization techniques for tridiagonal solvers. We first present a hybrid Parallel Cyclic Reduction (PCR)-Gaussian Elimination (GE) tridiagonal solver, which combines work-efficient and step-efficient algorithms for high performance. We further discuss an auto-tuned variant of this technique which selects the optimal switching point between algorithms on a per-machine basis. Next, we present a technique to handle large systems, where shared memory constraints prevented previous work from solving these systems directly. Finally, we will discuss optimizations on a cyclic reduction technique that avoid bank conflicts on current hardware. Andrew Davidson, Yao Zhang University of California, Davis Algorithms & Numerical Techniques Watch now FLV MP4
2136 Pseudo Random Number Generators for Massively Parallel Apps Learn how to select the best and fastest pseudo-random number generator for your massively parallel Monte Carlo simulation. Pseudo-random number generators (PRNGs) are a fundamental building block of these simulations, and it is thus necessary to select suitable PRNGs with regard to the specific problem at hand while considering the parallel hardware architecture.

Recent developments in random number generation provide a wide variety of choices, each with different properties and trade-offs. We provide a comprehensive survey of the current state of the art for massively parallel PRNGs and show a broad range of applications.
Holger Dammertz Ulm University Algorithms & Numerical Techniques Watch now FLV MP4 PDF
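As one concrete example of a parallel-friendly generator (CURAND is used here purely for illustration; the session surveys many alternatives), the sketch below gives each thread its own generator state and an independent subsequence, the standard pattern for massively parallel Monte Carlo.

```
#include <curand_kernel.h>
#include <cstdio>

// Monte Carlo estimate of pi: each thread runs its own trials using an
// independent CURAND subsequence, then accumulates hits atomically.
__global__ void mc_pi(unsigned long long seed, int trials, int *hits) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);   // distinct subsequence per thread

    int local = 0;
    for (int i = 0; i < trials; ++i) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++local;
    }
    atomicAdd(hits, local);
}

int main() {
    int *d_hits, h_hits = 0;
    cudaMalloc((void **)&d_hits, sizeof(int));
    cudaMemcpy(d_hits, &h_hits, sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256, blocks = 64, trials = 1000;
    mc_pi<<<blocks, threads>>>(1234ULL, trials, d_hits);

    cudaMemcpy(&h_hits, d_hits, sizeof(int), cudaMemcpyDeviceToHost);
    printf("pi ~= %f\n", 4.0 * h_hits / (double)(threads * blocks * trials));
    cudaFree(d_hits);
    return 0;
}
```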
2140 Superfast Nearest Neighbor Searches Using a Minimal kd-tree Learn how to adapt a kd-tree spatial data structure for efficient nearest neighbor (NN) searches on a GPU.  Although the kd-tree is not a natural fit for GPU implementation, it can still be effective with the right engineering decisions. By bounding the maximum height of the kd-tree, minimizing the memory footprint of data structures, and optimizing the GPU kernel code, multi-core GPU NN searches with tens of thousands to tens of millions of points run 10-40 times faster than the equivalent single-core CPU NN searches. Shawn Brown UNC, Chapel Hill Algorithms & Numerical Techniques Watch now FLV   PDF
2163 Leveraging GPUs for Evolutionary Game Theory Learn how GPUs are being used to accelerate the study of the emergence of cooperative behavior in biology, from the interactions of humans to viruses to bacteria.  The work presented here achieves a speedup of 209x on a cluster of 4 Tesla GPUs. Amanda Peters Harvard University Algorithms & Numerical Techniques Watch now FLV   PDF
2166 The Triad of Extreme Computing: Fast Algorithms, Open Software and Heterogeneous Systems The first wave of successful GPU accelerations has been crowded with highly-parallel methods that adapted well to the hardware.  But the easy pickings are now running out.  The truly challenging applications require "going back to the algorithmic drawing board."  To develop new versions of the most effective fast algorithms, such that our science can most benefit, an ideal environment is created by the open software model, where efforts can be shared.  We will describe one area of application, the electrostatics of biomolecules in solution, where we see at work the triad of extreme computing: fast algorithms, open software, and heterogeneous computing. Lorena Barba Boston University Algorithms & Numerical Techniques Watch now FLV
2171 Parallel Algorithms for Interactive Mechanical CAD The broad objective of our research is to develop mechanical Computer-Aided Design tools that provide interactive feedback to the designer. We have developed GPU algorithms for fundamental CAD operations (NURBS evaluation, surface-surface intersection, separation distance computation, moment computation, etc.) that are one to two orders of magnitude faster, and often more accurate, than current commercial CPU implementations. We will touch on strategies we have employed to meet GPU programming challenges, such as the separation of CPU/GPU operations, imposing artificial structure on computations, and transforming problem definitions to suit GPU-computation models. Adarsh Krishnamurthy, Sara McMains University of California Berkeley Algorithms & Numerical Techniques Watch now FLV MP4 PDF
2000 Gravitational N-body Simulations: How Massive Black Holes Interact with Stellar Systems Astrophysics is a field where supercomputing is a must to obtain new scientific results. In particular, the study of the interaction among massive black holes and surrounding stars is a hot topic, which requires heavy computations to obtain a good representation of what happens in the inner regions of galaxies. We present the results obtained with our high-precision N-body code, NBSymple, which exploits the joint power of a multi-core CPU system together with the high performance NVIDIA Tesla C1060 GPUs.

The code is available at the website:

astrowww.phys.uniroma1.it/dolcetta/nbsymple.html
Roberto Capuzzo-Dolcetta, Alessandra Mastrobuono Battisti Sapienza Univ. of Roma Astronomy & Astrophysics Watch now FLV MP4  
2044 GRASSY: Leveraging GPU Texture Units for Asteroseismic Data Analysis Learn how to use the hidden computation capability of GPU texture units for general purpose computation. We describe GRASSY, a system for stellar spectral synthesis where the core problem is interpolation between pre-computed intensity values. We map these pre-computed tables to the GPU's texture memory. Interpolation then becomes a texture lookup where the hardware automatically performs the interpolation, albeit at very low precision. Our mathematical framework reasons about the impact of this precision and our performance results show 500X speedups. This work generalizes the GPU texture units as computation engines and opens up new problems for GPU acceleration.

Matt Sinclair UW-Madison Astronomy & Astrophysics Watch now FLV    
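The core mechanism is easy to show in isolation. The sketch below (an illustration, not GRASSY code) stores a small table in a CUDA array bound to a texture with linear filtering, so a fetch at a fractional coordinate returns a hardware-interpolated value; it uses the CUDA texture reference API of that era.

```
#include <cstdio>
#include <cuda_runtime.h>

// 1D float texture; with cudaFilterModeLinear the hardware interpolates
// between neighboring texels during the fetch.
texture<float, 1, cudaReadModeElementType> tableTex;

__global__ void lookup(float *out, float pos) {
    out[0] = tex1D(tableTex, pos);   // hardware-interpolated table lookup
}

int main() {
    float h_table[4] = {0.0f, 10.0f, 20.0f, 30.0f};

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *d_array;
    cudaMallocArray(&d_array, &desc, 4);
    cudaMemcpyToArray(d_array, 0, 0, h_table, sizeof(h_table),
                      cudaMemcpyHostToDevice);

    tableTex.filterMode = cudaFilterModeLinear;   // enable hardware lerp
    cudaBindTextureToArray(tableTex, d_array);

    float *d_out, h_out;
    cudaMalloc((void **)&d_out, sizeof(float));
    // Texel centers sit at i + 0.5, so coordinate 2.0 lands halfway between
    // entries 1 and 2 and should return roughly 15.
    lookup<<<1, 1>>>(d_out, 2.0f);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("interpolated value: %f\n", h_out);

    cudaUnbindTexture(tableTex);
    cudaFreeArray(d_array);
    cudaFree(d_out);
    return 0;
}
```

The interpolation weights are computed in low-precision fixed point, which is exactly the accuracy trade-off the session's analysis addresses.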
2082 CU-LSP: GPU-based Spectral Analysis of Unevenly Sampled Data Standard FFT algorithms cannot be applied to spectral analysis of unevenly sampled data. Alternative approaches scale as O(N^2), making them an ideal target for harnessing the raw computing power of GPUs. To this end, I have developed CU-LSP, a CUDA spectral analysis code based on the Lomb-Scargle periodogram. Preliminary benchmarking indicates impressive speed-ups, on the order of 400 relative to a single core of a modern CPU. An initial application of CU-LSP will be the analysis of time-series data from planet-search and asteroseismology satellites.

Richard Townsend University of Wisconsin-Madison Astronomy & Astrophysics Watch now FLV MP4  
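The O(N^2) structure maps naturally onto one thread per trial frequency. The kernel below is a straightforward sketch of the textbook Lomb-Scargle periodogram, not CU-LSP itself, driven by a tiny synthetic data set.

```
#include <cstdio>
#include <cmath>
#include <vector>

// One thread evaluates the Lomb-Scargle periodogram at one trial frequency,
// the natural parallelization for unevenly sampled, mean-subtracted series.
__global__ void lomb_scargle(const float *t, const float *y, int n,
                             const float *freq, float *power, int nfreq) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= nfreq) return;

    float w = 2.0f * 3.14159265f * freq[k];

    // Time offset tau that decouples the sine and cosine terms.
    float s2 = 0.0f, c2 = 0.0f;
    for (int j = 0; j < n; ++j) {
        s2 += sinf(2.0f * w * t[j]);
        c2 += cosf(2.0f * w * t[j]);
    }
    float tau = 0.5f * atan2f(s2, c2) / w;

    float cy = 0.0f, sy = 0.0f, cc = 0.0f, ss = 0.0f;
    for (int j = 0; j < n; ++j) {
        float c = cosf(w * (t[j] - tau));
        float s = sinf(w * (t[j] - tau));
        cy += y[j] * c;  sy += y[j] * s;
        cc += c * c;     ss += s * s;
    }
    power[k] = 0.5f * (cy * cy / cc + sy * sy / ss);
}

int main() {
    const int n = 64, nfreq = 32;
    std::vector<float> t(n), y(n), f(nfreq), p(nfreq);
    for (int j = 0; j < n; ++j) {             // unevenly sampled sine wave
        t[j] = j + 0.3f * sinf(0.7f * j);
        y[j] = sinf(2.0f * 3.14159265f * 0.1f * t[j]);
    }
    for (int k = 0; k < nfreq; ++k) f[k] = 0.01f + 0.01f * k;

    float *dt, *dy, *df, *dp;
    cudaMalloc((void **)&dt, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMalloc((void **)&df, nfreq * sizeof(float));
    cudaMalloc((void **)&dp, nfreq * sizeof(float));
    cudaMemcpy(dt, t.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(df, f.data(), nfreq * sizeof(float), cudaMemcpyHostToDevice);

    lomb_scargle<<<1, nfreq>>>(dt, dy, n, df, dp, nfreq);

    cudaMemcpy(p.data(), dp, nfreq * sizeof(float), cudaMemcpyDeviceToHost);
    printf("power at f = %.2f: %f\n", f[9], p[9]);   // near the injected signal
    return 0;
}
```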
2099 Cosmology Powered by GPUs Redux Cosmological simulations aim at reproducing the physical processes which occur on the largest scales of the Universe since the Big Bang by means of numerical calculations on supercomputers. Using CUDA, I have implemented standard cosmological techniques on GPU architecture (PM N-Body solver, Hydrodynamics & moment-based radiative transfer) and designed them to run on supercomputing facilities by means of MPI+CUDA mixed programming. These applications are able to run on 100 or more graphics devices with typical accelerations of 50x over scalar code and with a communication overhead limited to 15%. They allow us to explore physical regimes which were out of reach of previous simulations. Dominique Aubert Strasbourg University Astronomy & Astrophysics Watch now FLV MP4
2108 Binary Black Holes Simulations using CUDA Get the latest information on how to evolve binary black holes simulations on GPUs. Abdul Mroue CITA, Univ. Of Toronto Astronomy & Astrophysics Watch now FLV MP4  
2178 Using GPUs to Track Changes in the Sun Learn how GPU computing is enabling astrophysicists to study our closest star. NASA's recently launched Solar Dynamics Observatory is continuously streaming full-disk images of the Sun at visible, UV and EUV wavelengths.  This presentation will discuss ways that GPU computing is helping scientists cope with the analysis of the immense data volumes as well as in numerical modeling of the Sun. Mark Cheung Lockheed Martin Solar & Astrophysics Laboratory Astronomy & Astrophysics Watch now FLV   PDF
2042 Interactive 3D Audio Rendering Systems  Learn how to leverage GPUs for interactive audio rendering.  This session will give a short overview of the architecture of current GPUs, emphasizing some key differences between GPU and CPUs programming models for audio processing. We will illustrate the benefits of GPU-accelerated audio rendering with results from 3D audio processing and sound scattering simulations. Finally, we will discuss best practices for GPU implementations as well as future opportunities for audio rendering on massively parallel architectures. Nicolas Tsingos Dolby Laboratories Audio Processing Watch now FLV MP4 PDF
2076 Implementing CUDA Audio Networks Learn how to implement a commercial software library that exploits CUDA for audio applications. We focus on the overall threading architecture and the underlying math for implementing general purpose audio processing on CUDA devices.  Covers the use of inter-process communication to make a plug-in implementation loadable in 32-bit hosts installed on 64-bit systems, distributing the GPU load on remote servers, and creating a CUDA network for high-end purposes such as a big recording facility. Giancarlo Del Sordo Acustica Audio Audio Processing Watch now FLV MP4 PDF
2116 Real-time Multichannel Audio Convolution Learn how a synthesis of 3D sound scenes can be achieved using a peer-to-peer music streaming environment and GPU.  We will discuss the technical and cost benefits to this approach, while noting that it frees the CPU for other tasks. Jose Antonio Belloch, Alberto Gonzalez, Antonio M. Vidal Institute of Telecommunications and Multimedia Applications, Universidad Politecnica de Valencia Audio Processing Watch now FLV MP4 PDF
2026 MatCloud: Accelerating Matrix Math GPU Operations with SaaS We present MatCloud (www.mat-cloud.com), a cloud infrastructure and service for scientific computing using state-of-the-art GPU clusters. MatCloud is a service infrastructure exposed by a simple web terminal interface to run Matlab-like commands/scripts. Join us to see how GPU technology can not only be applied to the cloud computing community, but also boost the adoption of cloud computing for its dramatic performance gains over traditional cloud infrastructures. MatCloud is an in-progress academic project and is under active development.
Xing Wu, Frank Mueller North Carolina State University Cloud Computing Watch now FLV MP4  
2243 Microsoft RemoteFX - GPU Virtualization for Desktop Centralization Learn about Microsoft's upcoming GPU Virtualization feature, RemoteFX, which will ship in Windows Server 2008 R2 SP1. Microsoft RemoteFX enables GPUs to be hosted in the datacenter as a service that can be shared by multiple users for streaming the real-time and complete Windows 7 desktop experience to ultra-lightweight client devices anywhere on the corporate network. With Microsoft RemoteFX, users will be able to work remotely in a Windows Aero desktop environment, watch full-motion video, enjoy Silverlight animations, and run 3D applications – all with the fidelity of local-like performance. Tad Brockway Microsoft Cloud Computing Watch now FLV    
2022 Solving PDEs on Regular Grids with OpenCurrent OpenCurrent is an open source library with support for structured 3D grids and various PDE solvers that operate on them, including a multigrid Poisson solver and an incompressible Navier-Stokes solver.  It also includes extensions for splitting grids across multiple GPUs.  This talk will provide a basic introduction to the code base and its design principles. Jonathan Cohen NVIDIA Research Computational Fluid Dynamics Watch now FLV MP4  
2037 Numtech & GPGPU, a SME Point of View Hear why and how Numtech, a French SME working in the field of atmospheric dispersion and expertise of meteorological events, is benchmarking GPGPU for its future applications. Compressible and incompressible interactive flow solvers are described. Vivien Clauzon   Computational Fluid Dynamics Watch now FLV MP4
2045 Roe-Pike Scheme for 2D Euler Equations Hear how we are improving our elsA and CEDRE computational fluid dynamics software by working on solving the Euler equations set on the GPU.  We discuss how our implementation considers the associated Riemann problem and the Roe-Pike differencing scheme at several orders in space while also introducing immersed boundary conditions.  Covers the significant speedup obtained through algorithmic and computational optimizations. Matthieu Lefebvre ONERA Computational Fluid Dynamics Watch now FLV MP4
2049 Deflated Preconditioned Conjugate Gradient on the GPU Explore how to use deflation as a second level preconditioning technique to speed up Block Incomplete Cholesky Preconditioned Conjugate Gradient Method.  We use it to solve the Pressure correction equation involved in the solution of the Two-Phase Fluid Flow problem.  Our implementation reaches speedup factors between 25-30, for more than 260,000 unknowns, when compared to the CPU. Rohit Gupta, Kees Vuik Delft University Of Technology Computational Fluid Dynamics Watch now FLV MP4 PDF
2058 A Practical Introduction to Computational Fluid Dynamics on GPUs  Learn step-by-step procedures to write an explicit CFD solver based on finite difference methods with staggered grid allocations and boundary-fitted coordinates.  We will discuss the derivation of the mathematical model, discretization of the model equations, development of the algorithms, and parallelization and visualization of the computed data using OpenCL and OpenGL.  Compares case studies of natural convection, driven cavity, scaling analysis, and magneto-thermal convection computed using CSIRO's CPU/GPU supercomputer cluster to known analytical and experimental solutions. Tomasz Bednarz, Con Caris, John Taylor  CSIRO Computational Fluid Dynamics Watch now FLV MP4 PDF
2078 Shockingly fast and accurate CFD simulations In the last three years we have demonstrated how GPU accelerated discontinuous Galerkin methods have enabled simulation of time-dependent, electromagnetic scattering from airplanes and helicopters.  In this talk we will discuss how we have extended these techniques to enable GPU accelerated simulation of supersonic airflow as well. Timothy Warburton Rice University Computational Fluid Dynamics Watch now FLV MP4  
2079 A Fast, Scalable High-Order Unstructured Compressible Flow Solver  We will describe a scalable and efficient high-order unstructured compressible flow solver for GPUs. The solver achieves arbitrary order of accuracy for flows over complex geometries. High-order solvers require more operations per degree of freedom, thus making them highly suitable for massively parallel processors. Preliminary results indicate speed-ups of up to 70x with the Tesla C1060 compared to the Intel i7 CPU. Memory access was optimized using shared and texture memory. David M. Williams, Patrice Castonguay Stanford University Computational Fluid Dynamics Watch now FLV MP4
2083 GPU Accelerated Solver for the 3D Two-phase Incompressible Navier-Stokes Equations  This demonstrates the potential of GPUs for solving complex free surface flow problems using level set methods. These methods are capable of producing complex surface deformations, and therefore are used widely in computer graphics, as well as engineering applications. This work demonstrates that GPUs can be used to accelerate the most computationally expensive part of free surface flow calculations, and therefore allows much larger problems to be solved on workstation machines than was previously possible.  These techniques will be exemplified by our current project to port our in-house fluid solver NaSt3DGPF to the GPU. Peter Zaspel University of Bonn Computational Fluid Dynamics Watch now FLV    
2103 Development of an Efficient GPU-Accelerated Model for Fully Nonlinear Water Waves This work is concerned with the development of an efficient high-throughput scalable model for simulation of fully nonlinear water waves (OceanWave3D) applicable to solve and analyze large-scale problems in coastal engineering. The goal can be achieved through algorithm redesign and parallelization of an optimized sequential single-CPU algorithm based on a flexible-order Finite Difference Method. High performance is pursued by utilizing many-core processing in the model focusing on GPUs for acceleration of code execution. This involves combining analytical methods with an algorithm redesign of the current numerical model.  Allan Peter Engsig-Karup Technical University of Denmark Computational Fluid Dynamics Watch now FLV MP4  
2106 Particleworks: Particle-based CAE Software on Multi-GPU Prometech Software, Inc. is a university-launched technology venture in Japan and has been working in the field of particle-based computational fluid dynamics for several years.  Through collaborations with major automotive and material companies in Japan, Prometech has implemented its particle technology on multiple GPUs and delivered it as the CAE software "Particleworks".  In this session, we will discuss the theoretical background of our simulation (MPS; Moving Particle Simulation method), multi-GPU programming techniques for the sparse matrix solver, performance results of Particleworks, and analysis examples from the automotive and materials industries. Issei Masaie Prometech Software, Inc. Computational Fluid Dynamics Watch now FLV MP4
2110 Acceleration of a Novel Rotorcraft Wake Simulation Dive deep as we present the details of a new CUDA-based algorithm for accurate rotorcraft wake simulations.  We use a vortex particle method, accelerated with a multipole tree algorithm, combined with a traditional grid-based CFD code.  This CUDA algorithm can evaluate the velocity and velocity-gradient with an effective throughput approaching 300 billion interactions per second on a C1060.  This gives 10x speed-up and 2.5x better accuracy compared to the parallel CPU version. Christopher Stone Intelligent Light Computational Fluid Dynamics Watch now FLV MP4  
2118 Large-scale Gas Turbine Simulations on GPU Clusters This talk describes a strategy for implementing structured grid PDE solvers on GPUs. Techniques covered include the use of source-to-source compilation and the use of sparse matrix vector multiplications for complicated boundary conditions. A new production-quality solver for flows in turbomachines called Turbostream that uses these techniques is presented. The impact of the use of GPUs on the turbomachinery design process is demonstrated by two 64-GPU simulations that  have recently been performed on the University of Cambridge's GPU cluster. Tobias Brandvik University of Cambridge Computational Fluid Dynamics Watch now FLV MP4  
2170 Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs Learn how a very efficient implementation of multiphase lattice Boltzmann methods (LBM) based on CUDA delivers significant benefits for predictions of properties in rocks.  This simulator on NVIDIA hardware enables us to perform pore scale multi-phase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure.  We will show videos of these simulations in complex real world porous media and rocks. Jonas Toelke Ingrain Computational Fluid Dynamics Watch now FLV   PDF
2206 Accelerated Computational Fluid Dynamics Employing GPUs None provided. Daniel Gaudlitz FluiDyna Computational Fluid Dynamics Watch now FLV MP4  
2234 Unstructured Finite Volume Code on a Cluster with Multiple GPUs per Node Explore how a code written to run in parallel using OpenMP and on a single GPU was modified to run across multiple GPUs and nodes on a multi-CPU, multi-GPU cluster installed at the Naval Research Laboratory.  We will discuss the performance of this code running in parallel using MPI/OpenMP and MPI/CUDA. Keith Obenschain, Andrew Corrigan Naval Research Lab Code 6440 Computational Fluid Dynamics Watch now FLV   PDF
2239 Fast GPU Preconditioning for Fluid Simulations in Film Production Explore how a less efficient, but highly parallel algorithm can still be a superior alternative to a sequential CPU method. This talk will present a simple CUDA-based Poisson solver built on the conjugate gradient method, designed for solving well-conditioned matrices such as those that arise from the pressure projection stage of a Navier-Stokes fluid solver. In contrast to other active areas of research in this field, we show that a more brute-force approach can still significantly out-perform the best CPU alternatives by sacrificing a high convergence rate in exchange for much faster iterations. Dan Bailey Double Negative Computational Fluid Dynamics Not Available FLV   PDF
2292 Implementation of High-Order Adaptive CFD Methods on GPUs A discontinuous high-order formulation named the Correction Procedure via Reconstruction (CPR) has recently been implemented on NVIDIA GPUs. The CPR formulation is related to the discontinuous Galerkin (DG) method, and unifies several methods such as the DG, spectral volume and spectral difference methods into a single framework efficient for hybrid meshes. In preliminary 2D inviscid flow computations, a single GPU has been able to deliver a speedup of 44 over a CPU of the same generation. The extension to viscous flow computation is under way, and results will be presented in the final presentation. Z.J. Wang, Lizandro Solano, Arun Somani Iowa State University Computational Fluid Dynamics Watch now FLV
2295 Large-scale CFD Applications and a Full GPU Implementation of a Weather Prediction Code on the TSUBAME Supercomputer Many CFD applications have been successfully accelerated on GPUs, but for large-scale simulations that require memory beyond a single GPU, communication is required between GPUs over cluster nodes through PCI-Express and interconnects. To overcome performance bottlenecks and preserve parallel scalability, an overlapping technique between computation and communication is essential. This work presents results of an LBM for incompressible flow, and a Tsunami simulation solving the shallow water equation for simulations on the NVIDIA Tesla-based TSUBAME supercomputer of Tokyo Tech. In addition results will be presented on a complete GPU implementation of a production-level weather prediction code developed by the JMA that achieves 15 TFLOPS for an 80-fold speedup. Takayuki  Aoki Tokyo Institute of Technology Computational Fluid Dynamics Watch now FLV MP4  
2056 Next-Generation Rendering with CgFX Dive into the details of using CgFX – Cg’s effect framework – to combine ray-tracing with real-time rendering and enable the next generation of complex high-quality rendering. You will learn how to use CgFX to create complex rendering effects in a concise and elegant fashion by blending material-level and scene-level effects in a consistent way, seamlessly integrating CUDA-based data processing within the CgFX rendering pipeline, and mixing OptiX-based rendering with CgFX and OpenGL.
Tristan Lorach NVIDIA Computer Graphics Watch now FLV MP4  
2071 Large Scale Visualization Soup The unprecedented realism that is possible today allows for visualization at an ever larger scale.  This talk will walk through several case studies, from high resolution single displays to completely immersive environments.  Details will be shared on how to architect and implement these installations, with attention to the typical issues encountered.  It will cover how to implement stereo 3D in OpenGL and Direct3D, as well as how that relates to the different display technologies (projectors, multi-display, CAVEs, etc.).
Steve Nash NVIDIA Computer Graphics Watch now FLV MP4 PDF
2129 Hardware Subdivision and Tessellation of Catmull-Clark Surfaces See how the new DirectX 11 Hardware Tessellation and Compute Shader can be used to implement an adaptive Catmull-Clark subdivision surface renderer. We use a table driven approach to performing Catmull-Clark subdivision in parallel utilizing one thread per output mesh vertex. Charles Loop Microsoft Research Computer Graphics Watch now FLV MP4 PDF
2134 Ultra High Resolution Displays and Interactive Eyepoint Using CUDA We'll go over the challenges we have overcome in building 100 million pixel seamless displays. One customer requirement involves interactive changes of the eyepoint as a person moves relative to the screen, yet the distortions computed are quite non-linear. We discuss our use of a GPU to implement this procedure. Rajeev Surati Scalable Display Technologies Computer Graphics Watch now FLV MP4
2152 Using Virtual Texturing to Handle Massive Texture Data A virtual texture implementation allows applications the ability to manage gigantic amounts of texture data for rendering complex data sets. However, practical utilization involves feeding it adequate data. The GPU offers a powerful engine capable of accelerating the transcoding of efficient storage formats into formats useful for rendering. This session will demonstrate a virtual texturing implementation and the steps needed to GPU accelerate the non-rendering portions of managing and loading the virtual texture data. Evan Hart, Johannes van Waveren NVIDIA, id Software Computer Graphics Watch now FLV MP4 PDF
2161 NVIDIA Quadro Digital Video Pipeline Overview This session will provide an overview of the Quadro Digital Video Pipeline. It will cover a description of the DVP components, application architectures, software architectures, and programming resources available.

Thomas True NVIDIA Computer Graphics Watch now FLV MP4  
2162 Real-time Reyes: Programmable Rendering on Graphics Processors We present a discussion of ideas and techniques behind programmable graphics pipelines on modern GPUs, specifically the example design of a real-time Reyes renderer. Walking through this example, we address the philosophy beneath programmable GPU graphics, the broad strategy for the specific pipeline, and algorithmic and implementation-level details for key rendering stages. We cover several issues concerning GPU efficiency, including those involving work scheduling, parallelization of traditional stages, and balancing of rendering workloads. We expect the audience to gain an in-depth exposure to the state of research in programmable graphics, and an insight into efficient pipeline design for irregular workloads. Anjul Patney, Stanley Tzeng University of California, Davis Computer Graphics Watch now FLV
2165 Rendering Revolution Learn how GPU technologies are transforming the making of pixels. This talk will cover GPU-centric rendering techniques that leverage both the raw computational capabilities of NVIDIA’s GPUs and advanced pixel-shading techniques for interactive visualization and rendering. Ken Pimentel Autodesk Computer Graphics Watch now FLV MP4 PDF
2227 OpenGL 4.0 Tessellation for Professional Applications The new generation of accelerated graphics is elevating visual computing to new heights. Tessellation, one of its most anticipated features, is already used in many scenarios to bring 3D graphics to an unprecedented level of realism.

This talk will introduce tessellation using OpenGL 4.0. We will also describe how an existing application can be adapted to efficiently take advantage of this new feature and also how to overcome some of the challenges.

Philippe Rollin NVIDIA Computer Graphics Watch now FLV MP4 PDF
2308 Building Cutting-Edge Realtime 3D Applications with NVIDIA SceniX Learn how NVIDIA SceniX is a rapid start to building state of the art, realtime 3D applications, and how raytracing can be combined with raster graphics for new levels of interactive realism. Brian Harrison, Michael Morrison NVIDIA Computer Graphics Watch now FLV    
2029 Computer Vision Algorithms for Automating HD Post-Production Discover how post-production tasks can be accelerated by taking advantage of GPU-based algorithms. In this talk we present computer vision algorithms for corner detection, feature point tracking, image warping and image inpainting, and their efficient implementation on GPUs using CUDA.  We also show how to use these algorithms to do real-time stabilization and temporal re-sampling (re-timing) of high definition video sequences, both common tasks in post-production. Benchmarking of the GPU implementations against optimized CPU algorithms demonstrates a speedup of approximately an order of magnitude. Hannes Fassold JOANNEUM RESEARCH Computer Vision Watch now FLV MP4 PDF
2065 Massively Accelerating Iterative Gauss-Newton Fitting To measure three-dimensional shape data of objects, we build up a measurement system that assigns three-dimensional coordinates to the position of projected measurement labels in a camera image. To achieve high measurement accuracy across large numbers of measurement points, we need a very quick routine to localize measurement labels with high precision. To speed up the computation, we evaluate the fits using the CUDA architecture. The final implementation speeds up the fitting of 104 two-dimensional Gauss functions by a factor of 90. Daniel Härter University of Freiburg, IMTEK,  Laboratory for Process Technology Computer Vision Watch now FLV MP4
2114 Cascaded HOG on GPU We propose a real-time HOG-based object detector implemented on the GPU. To accelerate the detection process, the proposed method uses two serially-cascaded HOG detectors. The first, low-dimensional HOG detector discards detection windows that obviously do not show target objects, reducing the computational cost of the second, high-dimensional HOG detector. The method was tested on 640x480 color images and video of the same size. The computation time decreases to 70ms per image, which is 4 times faster than a single-detector approach. The method provides real-time performance even on mid-range GPUs such as the GeForce GTS 250. Kento Tarui AquaCast Corporation Computer Vision Watch now FLV MP4
2123 Enabling Augmented Reality with GPU Computing This talk will take a detailed look at Sportvision's “First and 10” system, perhaps the most widely experienced example of AR ever, with 106 million viewers during the 2010 Superbowl alone. We'll examine the current implementation and the GPU features that enable low latency, video-rate performance. Ryan Ismert Sportvision, Inc. Computer Vision Watch now FLV MP4  
2132 Accelerating Biologically Inspired Computer Vision Models Join us for a discussion on applying commodity-server-based clusters and GPU-based clusters to simulating computer vision algorithms at a scale that approaches that of biological vision. We consider the limitations of each technology, survey approaches taken thus far, and suggest new hybrid models and programming frameworks to overcome current limitations and substantially improve performance. Tom Dean Google Inc. Computer Vision Watch now FLV MP4 PDF
2173 Enabling Large-Scale CCTV Face Recognition  Learn how to use CUDA and GPGPU to perform large scale face search for both forensics as well as CCTV face recognition. Abbas Bigdeli, Ben Lever NICTA Computer Vision Watch now FLV MP4  
2204 Bridging GPU Computing and Neuroscience to Build Large-Scale Face Recognition on Facebook Biologically-inspired computer vision algorithms – those that aim to mirror the computations performed by the brain's visual system – have emerged as exceptionally promising candidates in object and face recognition research, achieving strong performance on a range of object and face recognition tasks. Recently, we have begun harnessing the newly-available power of NVIDIA GPUs to tackle the problem of biologically-inspired model selection within a large-scale model search framework, drawing inspiration from high-throughput screening approaches in molecular biology and genetics where a large number of organisms are screened in parallel for a given property of interest.  As the available computational power provided by massively parallel technology from NVIDIA continues to expand, we hope that this research will hold great potential for new social networking applications in addition to rapidly accelerating progress in artificial vision, and for generating new, experimentally testable hypotheses for the study of biological vision.
Nicolas Pinto, David Cox MIT, Harvard University Computer Vision Watch now FLV    
2209 Accelerating Computer Vision on the Fermi Architecture GPUs have evolved from fixed function to general purpose, and continue to evolve with new features being added in every generation. This talk will discuss how to exploit the new features introduced by the Fermi architecture (such as concurrent kernel execution and writes to textures) to accelerate computer vision algorithms. James Fung NVIDIA Computer Vision Watch now FLV MP4
2215 Extending OpenCV with GPU Acceleration OpenCV is a widely popular computer vision library, with millions of downloads and hundreds of thousands of users. Applications span many industries including robotics, industrial machine vision, automotive, film & broadcast, medical, and consumer applications.  NVIDIA and the OpenCV development team are collaborating to provide CUDA implementations of the most demanding algorithms, thus enabling a new level of real-time capability and higher quality results.
This talk will introduce OpenCV, summarize the new CUDA-enabled capabilities, and provide an overview of future plans.
 
Joe Stam NVIDIA Computer Vision Watch now FLV MP4  
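For context, the GPU-accelerated functions live in OpenCV's gpu module (cv::gpu in the 2.x series). The hedged sketch below assumes that module is built with CUDA support; it simply uploads an image, converts it to grayscale on the GPU, and downloads the result.

```
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) { printf("usage: %s image\n", argv[0]); return 1; }

    cv::Mat src = cv::imread(argv[1]);               // load on the host
    if (src.empty()) { printf("could not read %s\n", argv[1]); return 1; }

    cv::gpu::GpuMat d_src, d_gray;
    d_src.upload(src);                               // copy image to the GPU
    cv::gpu::cvtColor(d_src, d_gray, CV_BGR2GRAY);   // color conversion on GPU

    cv::Mat gray;
    d_gray.download(gray);                           // copy result back
    printf("converted %dx%d image on the GPU\n", gray.cols, gray.rows);
    return 0;
}
```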
2242 Swarming Bacteria and Diffusing Particles: High-Throughput Analysis of Microscopic 3D Motion Ever since the 1827 discovery of Brownian motion by observing pollen grains, quantifying motion under the microscope has led to breakthroughs in physics, biology and engineering. Here, I present methods we have developed using confocal microscopy to deduce 3D structure and dynamics from 2D image sequences. We analyze the motion of diffusing colloidal particles and swarms of bacteria free to swim in 3D, which we observe at the single-organism level. We rely heavily on GPU computing to process our large data sets, making extensive use of NPP, CuFFT and optical-flow CUDA algorithms originally developed for machine vision in automobiles. Peter Lu Harvard University Computer Vision Watch now FLV    
2298 Accelerated Image Quality Assessment using Structural Similarity Explores the GPU porting and performance analysis of the image quality assessment algorithm based on the structural similarity index (SSI). This index is a powerful tool for image quality assessment, and the algorithm is highly suitable for the GPU architecture, offering rapid image quality assessment in many image restoration applications. Mahesh Khadtare CRL India Computer Vision Watch now FLV
2069 GPU-Accelerated Business Intelligence Analytics Join us and learn why GPU computing is a game changer for business intelligence (BI).  We will discuss how GPUs can be used to accelerate BI analytics at much lower cost, higher performance, and better power efficiency than other alternatives. Ren Wu HP Labs Databases & Data Mining Watch now FLV MP4  
2092 Integrating CUDA into a  Large-Scale Commercial Database Management System In a large-scale database installation where data tables are distributed across multiple servers, computational throughput can be optimized by using GPUs on each server and integrating database management with GPU resources.  In the Department of Physics and Astronomy at The Johns Hopkins University, we are experimenting with a set of software tools that closely couple SQL statements with GPU functionality. While still under development, the new framework is now routinely used in our research projects, e.g., to study the spatial clustering of galaxies as well as genomics. Richard Wilton, Tamas Budavari, Alex Szalay The Johns Hopkins University Databases & Data Mining Watch now FLV MP4  
2237 Accelerating Business Intelligence Applications with Fast Multidimensional Aggregation In this research session, we present an approach using NVIDIA GPUs as massively parallel coprocessors for in-memory OLAP computations. Early tests have shown speedup factors of more than 40x compared to optimized sequential algorithms on a CPU. In addition to the data structures and algorithms involved, we describe a method to extend the approach to systems with more than one GPU in order to scale it to larger data sets. Tobias Lauer, Christoffer Anselm University of Freiburg, Jedox AG Databases & Data Mining Watch now FLV    
2013 iray - GPUs and the Photorealistic Rendering Revolution Hear about the ongoing revolution in the production of photorealistic imagery being powered by GPUs.  We will explore the algorithms and concepts behind iray – a CUDA accelerated software library from mental images/NVIDIA that provides an interactive, push-button, fast synthetic digital camera in software to a variety of OEM applications and platforms.  We will demonstrate iray embedded in commercial CAD and Digital Content Creation applications as well as in 3D cloud computing platforms. Michael Kaplan, Tamrat Belayneh mental images/NVIDIA, ESRI Digital Content Creation (DCC) Watch now FLV MP4  
2222 Working Man's Guide to 3D Video Editing Video editing is currently at two simultaneous inflection points: the use of GPUs for video processing and the beginning of widespread adoption of 3D. At this time, however, identifying and navigating through the necessary tools and equipment to create compelling 3D video content is challenging.  This session is intended to provide a pragmatic guide to creating prosumer 3D video content and how the GPU greatly assists and speeds up this process. The intended audience is anyone interested in how to create compelling 3D movies at a prosumer level.
Ian Williams, Kevan O'Brien NVIDIA Digital Content Creation (DCC) Watch now FLV MP4 PDF
2279 Working Man's Guide to 3D Video Editing Video editing is currently at two simultaneous inflection points: the use of GPUs for video processing and the beginning of widespread adoption of 3D. At this time, however, identifying and navigating through the necessary tools and equipment to create compelling 3D video content is challenging. This session is intended to provide a pragmatic guide to creating prosumer 3D video content and how the GPU greatly assists and speeds up this process. The intended audience is anyone interested in how to create compelling 3D movies at a prosumer level.

Ian Williams, Rudy Sarzo, Kevan O'Brien NVIDIA, SMI, NVIDIA Digital Content Creation (DCC) Watch now PDF
2305 PantaRay: Accelerating Out-Of-Core Ray Tracing of Sparsely Sampled Occlusion  Modern VFX rendering pipelines are faced with major complexity challenges: a film like Avatar requires rendering hundreds of thousands of frames, each containing hundreds of millions or billions of polygons. Furthermore, the process of lighting requires many rendering iterations across all shots. In this talk, we present the architecture of an efficient out-of-core ray tracing system designed to make rendering precomputations of gigantic assets practical on GPUs. The system we describe, dubbed PantaRay, leverages the development of modern ray tracing algorithms for massively parallel GPU architectures and combines them with new out-of-core streaming and level of detail rendering techniques.  David Luebke, Sebastian Sylwan NVIDIA, Weta Digital Digital Content Creation (DCC) Not Available      
2175 Hello GPU: High-Quality, Real-Time Speech Recognition on Embedded GPUs In this presentation, we will talk about our experiences of implementing an end-to-end automatic speech recognition system that runs faster than real time on embedded GPUs, targeted towards small form-factor consumer devices. Focusing specifically on some of the challenges encountered during the design process, a major portion of our talk will focus on giving insights into modifications we made to well-established speech algorithms to fit well within the GPU programming model. We will show how these changes helped us in realizing a highly optimized system on platforms with limited memory bandwidth and compute resources. Kshitij Gupta UC Davis Embedded & Automotive Watch now FLV MP4 PDF
2303 Using Tegra to Solve The Electric Car Power Dilemma Explore how advanced SoC technologies are transforming the automotive industry. Learn how using NVIDIA Tegra increased the available range while pushing the envelope on the next-gen driver experience. We share lessons learned in the world of electric cars and the challenges of constructing a mass-production electric vehicle. Theo Valich Bright Side Network Inc Embedded & Automotive Watch now FLV MP4 PDF
2304 Harnessing the GPU to Accelerate Automotive Development Learn how GPU technologies broke speed limits in automotive development. By using GPU-accelerated tools, a small team of engineers created a complete, certifiable vehicle in only two years, using a fraction of the budget typical in the industry. The talk will cover the tools and techniques used in the creation of the XD concept, as well as how to overcome the challenges of moving a product from concept to the mass-production stage.  Theo Valich Bright Side Network Inc Embedded & Automotive Watch now FLV MP4
2014 Scalable Subsurface Data Visualization Framework Mental Images’ DiCE-based geospatial library is a CUDA and cluster-based visualization framework that enables scalable processing and rendering of huge amounts of subsurface data for interactive seismic interpretation.
Geospatial exploration in the oil and gas industries is concerned with scanning the earth’s subsurface structure for detecting oil and for cost-effective drilling of detected oil reservoirs.  Efficient seismic interpretation requires the interpreters to be able to interactively explore huge amounts of volumetric seismic information with embedded stacked horizons to gain visual insight into the subsurface structure and to determine where oil recovery facilities and drilling infrastructure shall be built.
Tom-Michael Thamm, Marc Nienhaus mental images GmbH Energy Exploration Watch now FLV MP4  
2059 Industrial Seismic Imaging on GPUs At Hess Corporation, we have moved the most computationally intensive parts of our seismic imaging codes from CPUs to GPUs over the past few years.  In this talk I will give an overview of seismic imaging, highlighting the physical and computational algorithms of these codes.  I will discuss our software approach and the programming effort to port them to GPUs, concluding with a summary of our progress in adopting GPUs in production. Scott Morton Hess Corporation Energy Exploration Watch now FLV MP4  
2141 Moving the Frontier of Oil and Gas Exploration and Production with GPUs Learn how the oil and gas industry is embracing GPUs in order to tackle new and complex oil and gas plays around the world. The first part of this talk gives an overview of the business and geopolitical drivers of the industry, followed by the critical contribution of computation in the quest for a secure supply of energy. Maurice Nessim, Shashi Menon Schlumberger Energy Exploration Not Available
2142 Complex Geophysical Imaging Algorithms Enabled by GPU Technology Learn how computationally expensive geophysical methods involving hundreds of terabytes of data become a commercial reality through the adoption of GPUs. The first part of the talk will give an overview of the imaging challenges facing the oil and gas industry. The second part will show how the most advanced current methods are taking advantage of GPU technology. David Nichols Schlumberger Energy Exploration Not Available
2174 Reverse Time Migration on GPUs Learn how GPUs can be used to accelerate subsurface imaging for Oil & Gas exploration.  We will discuss results and lessons learned while implementing a Reverse Time Migration algorithm on GPUs achieving significant performance improvements over a comparable CPU implementation. Alex Loddoch Chevron Energy Exploration Not Available      
2226 Reverse Time Migration with GMAC Get a close look at implementing Reverse Time Migration (RTM) applications across multiple GPUs.  We will focus on how RTM applications can be scaled using the GMAC asymmetric distributed shared memory (ADSM) library to break the problem into manageable chunks.  We will provide an introduction to GMAC and discuss handling boundary conditions and using separate kernels to improve efficiency. Javier Cabezas, Mauricio Araya Barcelona Supercomputing Center Energy Exploration Watch now FLV    
2072 GPUs at the Computer Animation Studio Learn five simple ways in which GPUs have been adopted in the production pipeline at Blue Sky Studios.  Covers how we use GPUs to improve animation tools, add real-time anaglyph support, and accelerate noise functions including code samples from production tools. Hugo Ayala Blue Sky Studios Film Not Available      
2125 Developing GPU Enabled Visual Effects For Film And Video  The arrival of fully programmable GPUs is now changing the visual effects industry, which has traditionally relied on CPU computation to create its spectacular imagery. Implementing the complex image processing algorithms used by VFX is a challenge, but the payoffs in terms of interactivity and throughput can be enormous. Hear how The Foundry's novel image processing architecture simplifies the implementation of GPU-enabled VFX software and eases the transition from a CPU-based infrastructure to a GPU-based one. Bruno Nicoletti The Foundry Film Watch now FLV PDF
2284 GPU Implementation of Collision-Based Deformation Addressing the production needs of the upcoming Disney animated movie, we are developing a new Maya deformer that incorporates state-of-the-art collision-based deformations. Our deformer includes both dynamic and quasi-static solutions. Our solvers conserve volume and constrain surface area by solving linear systems in a graded volume mesh. To achieve realistic deformation on production-ready data at interactive rates, we leverage the computational power of the NVIDIA GPU architecture using CUDA. Our underlying data structure is specifically designed and optimized for CUDA (e.g., coalesced data access, minimized CPU-GPU interaction, use of shared memory). Dmitriy Pinskiy, Garrett Aldrich Walt Disney Animation Studios Film Not Available
2285 Walt Disney Animation Studios' GPU-Accelerated Animatic Lighting Process with Soft Shadows and Depth of Field See how Walt Disney Animation's software uses OpenGL and GLSL shaders to interactively display depth of field, accurate lighting, and soft shadows in the Maya viewport. Learn how this improved our animatic process and helps us make better animated movies.

We'll show the tools in action, follow the progression of a shot from standard Maya to the final animatic look, and compare the result with a production RenderMan render. We'll also walk you through the GLSL shader render passes used for deferred lighting and shadowing.

David Adler Walt Disney Animation Studios Film Not Available      
2032 Practical Methods Beyond Monte Carlo in Finance Murex will share its practical experience using GPUs to accelerate high-performance analytics based on GPU-enabled Monte Carlo and PDE methods.  We will also briefly describe Murex’s experience developing a high-level payoff scripting language that allows user-definable payoffs for single and cross-asset instruments. Pierre Spatz Murex SAS Finance Watch now FLV MP4 PDF
2033 Accelerating Pricing Models with Virtual GPUs Join Citadel to explore our three-year investigation into the feasibility of GPGPU computing for option pricing.  We will discuss our 140X performance boost and the hurdles we had to overcome to integrate GPUs into our existing infrastructure.  Please note that our talk will not get into the details of the model (that’s proprietary information), but we will share our innovative solution to drive a grid of virtual GPUs. Scott Donovan Citadel Investment Group Finance Watch now FLV MP4 PDF
2040 Derivatives & Bond Portfolio Valuation in a Hybrid CPU/GPU Environment Learn how to compute traditional end of day computations in real time through the use of a hybrid GPU/CPU computing environment.  We will detail how computing intensive tasks are delegated to the GPU while interface issues are dealt with by the CPU.  We will discuss our methodology consisting of the following three components: (1) valuations; (2) by tenor risk measures; and (3) full distributions allowing for more complex analytics such as exotic options products valuation and counterparty value adjustments calculation. Peter Decrem Quantifi Finance Watch now FLV MP4  
2063 Banking on Monte Carlo… and Beyond Last year NAG presented spectacular results for Monte Carlo techniques on GPUs using NAG’s GPU library.  This year we will talk about new projects in the areas of Monte Carlo and PDE techniques, delivering additional benefits to the finance industry for real-world problems, including credit modeling. Ian Reid NAG Finance Watch now FLV   PDF
2064 Correlated Paths for Monte Carlo Simulations Learn how the GPU can be deployed to generate correlated paths for Monte Carlo simulation. Using Asian basket options as an example, the session shows the generation of correlated paths with a local volatility model for each of the underlying assets. Once the paths have been computed, the payoff in each scenario is computed and reduced to determine the expected value, all on the GPU. Thomas Bradley NVIDIA Finance Watch now FLV MP4 PDF
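The correlated-path idea in session 2064 can be illustrated with a short CUDA sketch. The kernel below is a hypothetical, minimal example (not the presenter's code): each thread simulates one scenario of a two-asset basket, correlating the normal draws with a hard-coded 2x2 Cholesky factor and using a constant volatility in place of a full local-volatility surface. All names and parameters are assumptions for illustration.

// Minimal sketch: one thread per scenario, two correlated assets, Asian basket payoff.
#include <curand_kernel.h>
#include <math.h>

__global__ void asian_basket_paths(float *payoffs, int nScenarios, int nSteps,
                                   float s0, float r, float sigma, float dt,
                                   float rho, float strike, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nScenarios) return;

    curandState rng;
    curand_init(seed, tid, 0, &rng);

    // 2x2 Cholesky factor of the correlation matrix [[1, rho], [rho, 1]]
    float l21 = rho;
    float l22 = sqrtf(1.0f - rho * rho);

    float s1 = s0, s2 = s0, avg = 0.0f;
    float drift = (r - 0.5f * sigma * sigma) * dt;
    float vol   = sigma * sqrtf(dt);

    for (int step = 0; step < nSteps; ++step) {
        float z1 = curand_normal(&rng);      // independent standard normals
        float z2 = curand_normal(&rng);
        float w1 = z1;                       // correlated draws via the Cholesky factor
        float w2 = l21 * z1 + l22 * z2;
        s1 *= expf(drift + vol * w1);
        s2 *= expf(drift + vol * w2);
        avg += 0.5f * (s1 + s2);             // running basket average
    }
    avg /= nSteps;
    payoffs[tid] = expf(-r * dt * nSteps) * fmaxf(avg - strike, 0.0f);
}

A subsequent reduction over the payoffs array (for example with a reduction kernel or thrust::reduce), divided by nScenarios, gives the Monte Carlo estimate of the expected value, all on the GPU as the abstract describes.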
2077 Catastrophic Risk Management:  Fast and Flexible with GPU Analytics RMS will describe our experience leveraging GPUs and simple software architectural principles to deliver both spectacular performance gains and enhanced flexibility in next generation portfolio risk management applications. Philippe Stephan RMS Finance Watch now FLV MP4  
2098 Enabling On Demand Value-At-Risk for Financial Markets Learn how financial market risk managers can increase their ability to preempt exposure limit breaching and tighten risk control to increase investor confidence. Gain insight into the techniques for obtaining high performance Monte-Carlo based market value-at-risk (VaR) estimates over a hierarchy of risk aggregation levels. This session will focus on how the new Fermi platform can be used by financial institutions to enable on-demand estimates of the market VaR, and discuss important software architecture decisions, the benefits of the new GigaThread Engine and Parallel DataCache, as well as the guiding principles for constructing efficient algorithms on GPUs. Matthew Dixon, Jike Chong UC Davis, Parasians, LLC Finance Watch now FLV MP4  
2101 Pricing American Options Using GPUs This presentation focuses on the challenging problem of Pricing High-Dimensional American Options (PHAO) and how GPUs can be applied to this task. On the one hand, we present a method based on Malliavin calculus that is effective on parallel architectures. On the other hand, we compare this method with the Longstaff-Schwartz method, which is better suited to sequential architectures. We will conclude with some ideas about parallelizing the former method on a cluster of machines, and finally we will discuss this method as a reformulation of a non-linear parabolic problem using BSDEs.
Lokman A. Abbas-Turki Paris-Est University  Finance Watch now FLV MP4  
2081 Morphing a GPU into a Network Processor Modern Internet routers must meet two conflicting objectives, high performance and good programmability, to satisfy ever-increasing bandwidth requirements under fast-changing network protocols. A few recent works have shown that GPUs have great potential to serve as the packet processing engine for software routers. However, the current GPU batched execution model cannot guarantee quality-of-service (QoS) requirements. In this work, we show how to convert a GPU into an effective packet processor through minimal changes to both the hardware architecture and the scheduling mechanism. Experimental results show that the new GPU architecture can meet stringent QoS requirements while maintaining high processing throughput. Yangdong Deng Tsinghua University General Interest Watch now FLV MP4
2214 Faster Simulations of the National Airspace System  Learn about twenty-four hour, fast-time simulations of traffic in the National Airspace System, which use GPU technology to help perform key steps in the trajectory prediction of flights.  GPUs enabled us to improve the runtime by up to two orders of magnitude versus the previously required tens of minutes per execution.  We will present a brief overview of the problem domain and a description of how the GPU has opened doors to uncharted research areas. Joseph Rios NASA General Interest Watch now FLV MP4 PDF
2223 Academic Welcome Social and Poster Preview This session is open to academic attendees only.  We invite you to join your fellow academics to preview this year’s NVIDIA Research Summit Posters and mingle with your colleagues.   Included will be a special presentation from our 2010-2011 Graduate Fellowship recipients to showcase the research that earned them this prestigious award.  These students were selected from 268 applications in 28 countries. Their research confronts a variety of challenges of immense technical and strategic importance, including light-transport simulation, computer vision, programmability and optimization for heterogeneous systems, and much more.  We believe these minds will lead the future of our industry. Ken Pimentel Autodesk General Interest Watch now FLV MP4
2262 CUDA Centers of Excellence Super-Session I  Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering.  In this session we will hear from Professor Hanspeter Pfister of Harvard University and Professor Jeff Vetter of Georgia Tech and Oak Ridge National Laboratory.  Hanspeter Pfister, Jeffrey Vetter Harvard University, Georgia Tech and ORNL General Interest Watch now FLV MP4
2263 CUDA Centers of Excellence Super-Session II Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering. In this session we will hear from Dr. Wei Ge of the Chinese Academy of Sciences, Professor Amitabh Varshney of the University of Maryland, and Adjunct Assistant Professor Stan Tomov of the University of Tennessee, Knoxville.  Stan Tomov, Amitabh Varshney, Wei Ge University of Tennessee, University of Maryland, Institute of Process Engineering, Chinese Academy of Sciences General Interest Watch now FLV MP4
2264 CUDA Centers of Excellence Super-Session III Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering. In this session we will hear from Dr. Wen-mei Hwu of the University of Illinois at Urbana-Champaign, Professor Yangdong Deng of Tsinghua University and Dr. Charles D. Hansen of the University of Utah.  Yangdong Deng, Charles Hansen, Wen-mei Hwu Tsinghua University, University of Utah, University of Illinois General Interest Watch now FLV MP4
2265 CUDA Centers of Excellence Super-Session IV Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering.  In this session we will hear from Professor Ting-wai Chiu of National Taiwan University, Dr. Satoshi Matsuoka of Tokyo Tech and Dr. Paul Calleja of the University of Cambridge.  Paul Calleja, Ting-Wai Chiu, Satoshi Matsuoka University of Cambridge, National Taiwan University, Tokyo Institute of Technology General Interest Watch now FLV MP4
2268 Think Data-Parallel! Building Data-Parallel Code with M Discover and leverage the parallelism inherent in pre-existing codes. Oftentimes, parallelism is hidden in seemingly serial programs, obscured by indexing or looping that makes it appear non-existent. Several real-world examples of seemingly serial code demonstrate simple yet surprisingly effective rules for detecting potential parallelism. For each example, learn how to express the code at a higher, more concise level in M by vectorizing computations. We give several canned vectorization techniques for many common, and sometimes very difficult, use cases. Learn how such vectorization concisely brings the parallelism of the code to the forefront and transforms programs that were originally difficult to run on a SIMT device into ones very suitable for execution on the GPU. GPU speedups will be shown utilizing Jacket.

Gallagher Pryor AccelerEyes General Interest Watch now FLV MP4  
2275 The Evolution of GPUs for General Purpose Computing Learn how the GPU evolved from its humble beginning as a “VGA Accelerator” to become a massively parallel general purpose accelerator for heterogeneous computing systems.  This talk will focus on significant milestones in GPU hardware architecture and software programming models, covering several key concepts that demonstrate why advances in GPU parallel processing performance and power efficiency will continue to outpace CPUs. Ian Buck NVIDIA General Interest Watch now FLV   PDF
2276 Using GPUs to Run Next-Generation Weather Models We are using GPUs to run a new weather model being developed at NOAA’s Earth System Research Laboratory (ESRL) called the Non-hydrostatic Icosahedral Model (NIM). NIM is slated to run at high resolution (4km global scale) within two years. This presentation will highlight work required to parallelize and run the NIM. We will describe progress running on multiple GPUs, report on our evaluation of two FORTRAN GPU compilers, and give performance updates of NIM using Fermi.  We will also discuss special challenges developing and running operational weather models on GPUs. Mark Govett NOAA Earth System Research Laboratory General Interest Watch now FLV MP4  
2306 Gate-Level Simulation with GP-GPUs Logic simulation is a critical component of the digital design tool flow. It is used from high-level descriptions down to gate level to validate several aspects of the design, particularly functional correctness. Despite development houses investing vast resources in the simulation task, it is still far from meeting the performance demands of validating complex modern designs at gate level. We developed a GP-GPU accelerated gate-level simulator using NVIDIA CUDA.

We leverage novel algorithms for circuit netlist partitioning, and our experimental prototype can handle large, industrial-scale designs comprising millions of gates while delivering a 13x speedup on average over a typical commercial simulator.

Debapriya  Chatterjee  University of Michigan General Interest Watch now FLV    
2309 Greater ROI with Green GDDR5 and LPDDR2 High-end graphics memory has been an essential ingredient in designing PC cards for many years, just as mobile DRAM has been a part of virtually all mobile devices since they were first developed. In the face of increasing upward pressures on power consumption, Green GDDR5 and Low Power mobile DDR2 (or LPDDR2) provide outstanding performance at exceptionally low power levels, for a greater return on investment in designing desktop and mobile devices, respectively. This Samsung presentation will provide an overview of Green GDDR5’s and Green LPDDR2’s power savings compared to other much less energy efficient alternatives. The presenter also will take a close look at how GDDR5 and LPDDR2 work to improve performance and extend battery life, while helping to substantially reduce electricity usage worldwide.  Jimmy Chung Samsung Semiconductor Inc. General Interest Watch now FLV    
2019 GPU-Accelerated Internet Technologies & Trends Join us for a whirlwind demo-punctuated tour of up-and-coming technologies that promise to bring GPU acceleration to the Worldwide Web.  We'll cover 2D graphics, 3D graphics and video.  In addition to summarizing the emerging standards and technologies, performance test results showing how they scale on various GPUs will be presented, along with recommendations for how to design for best performance.  Finally, adoption trends and ecosystem dynamics will be summarized.  Attendees should leave with a richer understanding of the possibilities enabled by the GPU-Accelerated Web, and new insights into when and how it will matter.  Chris Pedersen NVIDIA GPU Accelerated Internet Watch now FLV MP4 PDF
2060 GPUs in a Flash:  Mapping the Flash Animated Software Vector Rendering Model to the GPU Explore the Flash rendering architecture including the challenges of mapping from an animated software vector rendering model to a GPU.  We will also discuss how the landscape of mobile, desktop, devices, drivers, and APIs impacts the design and deployment of a GPU based Flash Player. Lee Thomason Adobe Systems GPU Accelerated Internet Watch now     PDF
2113 WebGL: Bringing 3D to the Web WebGL is a newly emerging standard for 3D graphics and visual computing on the web.  Supported and developed by major web browser vendors, WebGL enables rich interactive 3D graphics delivered through a web browser, on both desktop and mobile platforms.  This session will contain an introduction to WebGL, and will focus on application development issues unique to the web platform, optimization concerns, and how web technologies such as offline app support, HTML5 video and audio, File, and WebSockets integrate with WebGL.  Experienced OpenGL developers will learn how to transition their knowledge to WebGL development. Vladimir Vukicevic Mozilla Corporation GPU Accelerated Internet Watch now FLV MP4
2274 Harnessing the Power of the GPU in Internet Explorer 9 Internet Explorer 9 is bringing the power of modern GPUs to the Web. Thanks to hardware-accelerated graphics, the websites that you use every day become faster, and developers can create new classes of web applications that were previously not possible. This session will provide an inside look into how Internet Explorer was redesigned to leverage the GPU. We’ll show detailed performance results, discuss our architectural approach, and look at the impact of the GPU on HTML5. A session by engineers for engineers, with lots of fun demos. Jason Weber Microsoft GPU Accelerated Internet Watch now FLV MP4
2017 Lessons Learned Deploying the World’s First GPU-Based Petaflop System Learn what to expect when deploying PetaFLOP or larger systems.  The June 2010 list of the Top 500 computer systems featured the first GPU-based cluster to exceed 1 PetaFLOP of floating-point power -- a system that was built in a fraction of the time and at a fraction of the cost that a CPU-only system of that performance would have required.  We provide an overview of how system builders and administrators should prepare for large-scale HPC deployments. Dale Southard NVIDIA High Performance Computing Watch now FLV MP4 PDF
2052 Power Management Techniques for Heterogeneous Exascale Computing Power consumption has become the leading design constraint for large scale computing systems. In order to achieve exascale computing, system energy efficiency must be improved significantly. Our approach will focus on investigating software methodologies to achieve energy efficient computing on heterogeneous systems accelerated with GPUs.  Xiaohui Cui Oak Ridge National Laboratory  High Performance Computing Watch now FLV MP4  
2057 CUDA-Accelerated LINPACK on Clusters  This talk will illustrate the use of GPUs to accelerate the LINPACK benchmark on clusters with GPUs, where both the CPUs and the GPUs are used in synergy.  The acceleration is obtained by executing DGEMM (matrix multiply) and DTRSM (solution of triangular systems) calls simultaneously on both the GPU and the CPU cores.  Details of the implementation will be presented, together with results that show how effective the solution is in both performance and power efficiency.
Everett Phillips, Massimiliano Fatica NVIDIA High Performance Computing Watch now FLV MP4 PDF
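The GPU/CPU synergy described in session 2057 can be sketched in a few lines of hedged, hypothetical code (not the benchmark implementation itself): the columns of the update C = C - A*B are split so that cuBLAS works on the larger part asynchronously on the GPU while a host BLAS updates the remainder on the CPU cores, and the two meet at a synchronization point. All parameter names are assumptions; a single leading dimension ld is used for brevity.

// Hypothetical sketch of overlapping a GPU DGEMM with a CPU DGEMM via a column split.
// dA, dB, dC are device copies; hA, hB, hC are the corresponding host arrays (column-major).
#include <cublas_v2.h>
#include <cblas.h>
#include <cuda_runtime.h>

void hybrid_dgemm(cublasHandle_t handle, cudaStream_t stream,
                  int m, int n, int k, int nGpu,   // nGpu = columns of C updated on the GPU
                  const double *dA, const double *dB, double *dC,
                  const double *hA, const double *hB, double *hC, int ld)
{
    const double alpha = -1.0, beta = 1.0;

    // Launch the GPU portion asynchronously: the first nGpu columns of C.
    cublasSetStream(handle, stream);
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, nGpu, k,
                &alpha, dA, ld, dB, ld, &beta, dC, ld);

    // Meanwhile the CPU cores update the remaining n - nGpu columns.
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n - nGpu, k,
                alpha, hA, ld, hB + (size_t)nGpu * ld, ld,
                beta,  hC + (size_t)nGpu * ld, ld);

    // Wait for the GPU portion before the factorization proceeds.
    cudaStreamSynchronize(stream);
}

Choosing nGpu so that the GPU and CPU portions finish at roughly the same time is the kind of load balancing this style of hybrid LINPACK relies on.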
2089 Analyzing CUDA Accelerated Application Performance at 20 PFLOP/s Learn how applications can be executed over multiple GPUs located in multiple hosts, what the challenges are to scale one application to a 20 PFLOP/s machine and why tool support is a necessity. Receive an overview on the available performance analysis tools that support CUDA developers in generating applications with outstanding speedups. Guido Juckeland, Jeremy Meredith TU Dresden - ZIH, Oak Ridge National Laboratory High Performance Computing Watch now FLV MP4  
2100 Hybrid GPU/Multicore Solutions for Large Linear Algebra Problems Large linear algebra problems may be solved using recursive block decomposition in which GPUs efficiently compute the sub-blocks and multicore CPUs put the sub-blocks back together within a large shared memory space.  This talk will present benchmark results for such a hybrid approach, implemented in Matlab® and using Jacket® to access the GPU compute power. Nolan Davis SAIC High Performance Computing Watch now FLV MP4  
2104 Rapid Prototyping Using Thrust: Saving Lives with High Performance Dosimetry Radiation poisoning is an ever-present danger for intervention teams that must visit nuclear sites. Virtual reality can help teams prepare for interventions, but efficient computation of radiation dosage is critical to study complex scenarios. Radiation protection research often uses codes based on the straight-line attenuation method. As with other approaches, the geometrical computations (finding all ray/object intersections) remain the simulation bottleneck. This talk will describe how we have used the Thrust high-level library for CUDA C/C++ to quickly prototype innovative algorithms and achieve a significant speedup. Guillaume Saupin Atomic and Alternative Energies Commission (CEA) High Performance Computing Watch now FLV MP4
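To give a flavor of the Thrust-style prototyping mentioned in session 2104, the snippet below is a hypothetical sketch (assumed names, not the CEA code): it applies a straight-line attenuation law, source strength S attenuated as exp(-mu * L) over each ray's path length L, with thrust::transform, then sums the per-ray contributions with thrust::reduce.

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>

// Per-ray attenuation functor: source strength S attenuated over path length L
// through a material with linear attenuation coefficient mu.
struct attenuate {
    float S, mu;
    attenuate(float S_, float mu_) : S(S_), mu(mu_) {}
    __host__ __device__ float operator()(float L) const { return S * expf(-mu * L); }
};

float dose_at_point(const thrust::device_vector<float> &pathLengths, float S, float mu)
{
    thrust::device_vector<float> contrib(pathLengths.size());
    thrust::transform(pathLengths.begin(), pathLengths.end(),
                      contrib.begin(), attenuate(S, mu));          // per-ray dose on the GPU
    return thrust::reduce(contrib.begin(), contrib.end(), 0.0f);   // total dose at the point
}

The path lengths themselves would come from the ray/object intersection stage that the abstract identifies as the bottleneck; the point of the sketch is how little code the high-level Thrust primitives require.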
2117 Migration of C and Fortran Apps to GPGPU using HMPP GPGPU is a tremendous opportunity for many application fields.  Migrating legacy software to GPGPU is a complex process that requires mastering the technological risks (e.g. loss of code portability, extensive code restructuring, debugging complexity) as well as the costs. In this talk, we present a methodology based on HMPP (Heterogeneous Multicore Parallel Programming) that allows incremental processes which reduce the cost and risks of porting codes to GPGPU. Francois Bodin CAPS entreprise High Performance Computing Watch now FLV MP4
2119 Supercomputing for the Masses: Killer-Apps, Parallel Mappings, Scalability and Application Lifespan  Hear the latest on how supercomputing for the masses is changing the world.  We will look at some of the killer apps that run one to three orders of magnitude faster and see how they do it.  We will discuss specific mappings to GPGPU hardware and techniques for high performance and near-linear scalability both within and across multiple GPGPUs.  We will also consider software investment and the decades-long lifespan of successful massively parallel multithreaded software, along with scalability, balance metrics, the lack of consensus on programming models, and lifecycle considerations. Robert Farber PNNL High Performance Computing Watch now FLV MP4 PDF
2133 3D Full Wave EM Simulations Accelerated by GPU Computing 3D full-wave electromagnetic simulations of RF components, antennas, and printed circuit boards can be quite time-consuming.  The Computer Simulation Technology (CST) tool suite includes the capability to activate GPU computing.   Examples will be shown of using Tesla C1060 and S1070 configurations to provide significant performance improvements for complex simulations. Fabrizio Zanella CST of America High Performance Computing Watch now FLV MP4 PDF
2135 Processing Petabytes per Second with the ATLAS Experiment at the Large Hadron Collider at CERN Learn how GPUs could be adopted by the ATLAS detector at the Large Hadron Collider (LHC) at CERN. The detector, located at one of the collision points, must trigger on unprecedented data acquisition rates (PB/s) to decide whether to record each event or lose it forever. We begin by introducing the ATLAS experiment and the computational challenges it faces. The second part will focus on how GPUs can be used for algorithm acceleration, using two critical algorithms as exemplars. Finally, we will outline how GPGPU acceleration could be exploited and incorporated into the future ATLAS computing framework.

Philip Clark, Andrew Washbrook University of Edinburgh High Performance Computing Watch now FLV   PDF
2138 Faster, Cheaper, Better – Hybridization of Linear Algebra for GPUs Learn how to develop faster, cheaper and better linear algebra software for GPUs through a hybridization methodology that is built on (1) Representing linear algebra algorithms as directed acyclic graphs where nodes correspond to tasks and edges to dependencies among them, and (2) Scheduling the execution of the tasks over hybrid architectures of GPUs and multicore. Examples will be given using MAGMA, a new generation of linear algebra libraries that extends the sequential LAPACK-style algorithms to the highly parallel GPU and multicore heterogeneous architectures. Stan Tomov, Hatem Ltaief University of Tennessee High Performance Computing Watch now FLV MP4  
2147 GPGPU Development for Windows HPC Server Attend this demo-driven session to see how to schedule jobs to a Windows compute cluster that includes GPUs.  We will also demonstrate GPU-enhanced versions of some commonly used HPC open-source codes, and show how NVIDIA Parallel Nsight™ can be used to debug GPU applications on a cluster.  Provides a brief introduction to performance profiling tools that allow developers to analyze system, CPU and GPU events. Calvin Clark Microsoft High Performance Computing Not Available   MP4  
2153 CULA - A Hybrid GPU Linear Algebra Package Get the latest information on CULA, a library of hybrid GPU/CPU linear algebra routines optimized for NVIDIA GPUs. CULA launched at GTC2009 and has since gained large speedups and many new features. We will cover the features, performance, and inner workings, and how users can integrate CULA into their applications. New features for 2010 and 2011 will be in the spotlight, with exciting new developments for sparse matrices including general direct sparse solvers, iterative sparse solvers, and specialized block tridiagonal solvers. Learn how your existing linear algebra applications can benefit from a high quality library. Much more information is available at www.culatools.com and at our presentation and booth. John Humphrey EM Photonics, Inc High Performance Computing Watch now FLV MP4
2154 GPU Military Applications: Image Processing, Embedded Computing, and CFD Discover how different branches of the U.S. military are utilizing GPU accelerated solutions in mission-critical operations. This session will detail GPU-related projects that the engineers at EM Photonics have developed specifically for military applications. An image processing example will discuss how GPUs are being used to accelerate long-range battlefield surveillance to protect soldiers. Other military examples include low-power embedded GPU solutions utilized by UAVs and CFD simulations used to model complex interactions between vehicles at sea.   EM Photonics, Inc. High Performance Computing Watch now FLV    
2205 A Highly Reliable RAID System Based on GPUs While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than offered by current implementations. To attain needed performance, customers have often sought after hardware-based RAID solutions. This talk describes a RAID system that offloads erasure correction coding calculations to GPUs, allowing increased reliability by supporting new RAID levels while maintaining high performance. Matthew Curry Sandia National Laboratories and the University of Alabama at Birmingham High Performance Computing Watch now FLV MP4  
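To make the GPU-offloaded coding in session 2205 concrete, here is a deliberately simplified, hypothetical kernel that computes single-parity (RAID-5-style) XOR across data blocks. The system described in the talk uses more general erasure-correcting codes (replacing XOR with Galois-field arithmetic), but the data-parallel structure, one thread per output word, is the same.

// Hypothetical illustration: XOR parity over nDisks data blocks of 'words' 32-bit words each.
// 'data' holds the blocks back to back; real erasure codes replace XOR with GF(2^8) arithmetic.
__global__ void xor_parity(const unsigned int *data, unsigned int *parity,
                           int nDisks, int words)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= words) return;

    unsigned int p = 0;
    for (int d = 0; d < nDisks; ++d)
        p ^= data[(size_t)d * words + i];   // coalesced: consecutive threads read consecutive words
    parity[i] = p;
}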
2208 Acceleration of SIMULIA’s Abaqus Solver on NVIDIA GPUs Learn about Acceleware's and Dassault Systemes' integrated solution that performs an LDL^T factorization on GPUs within the Abaqus software package.  We will discuss efficient GPU parallelization of the factorization algorithm and enabling the CPU and GPU to overlap their computations and data transfers.  Includes an end user simulation case study and GPU performance measurements including 300 GFlops in single precision and 145 GFlops in double precision on NVIDIA Tesla C2050. Chris Mason Acceleware High Performance Computing Watch now FLV MP4 PDF
2217 GPU-Based Conjugate Gradient Solvers for Lattice QCD Learn how to perform state-of-the-art quantum chromodynamics (QCD) computation using NVIDIA GPUs at 1% of the cost of a conventional supercomputer and 10% of its power consumption.  We will discuss how physicists around the world are using GPU clusters to solve QCD.  We will focus upon how TWQCD have been using a large GPU cluster (200 GPUs) to simulate QCD, attaining 36 Teraflops (sustained). Ting-Wai Chiu National Taiwan University High Performance Computing Watch now FLV    
2232 What If You Had a Petabyte of Memory and/or a Petaflop of Compute? (Sponsored by SGI) We will explore application spaces where GPU compute coupled with very large shared memory architectures and/or petaflops of compute are allowing new science or new business questions to be addressed. Bill Mannel SGI High Performance Computing Watch now FLV    
2233 Solving Your GPU Computing Needs (Sponsored by HP) In this session we will go into detail about HP’s GPU-enabled systems, from workstations to our GPU-enabled servers and clusters.  You will get the latest information on configurations, options, GPU management and use cases. Dave Korf, Will Wade HP High Performance Computing Not Available FLV MP4
2238 Better Performance at Lower Occupancy It is usually advised to optimize CUDA kernels for higher occupancy to hide memory and arithmetic latencies better. In this presentation, I show that increasing occupancy is not the only way, and not always the best way, to hide latency on the GPU. Instead, it may be advantageous to rely on the parallelism within threads: instruction-level parallelism. This insight yields a simple optimization technique that is used in later versions of CUBLAS and CUFFT. I discuss the rationale behind the technique and illustrate it by speeding up matrix multiplication, starting with the basic implementation found in the NVIDIA GPU Computing SDK.
Vasily Volkov UC Berkeley High Performance Computing Watch now FLV   PDF
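A minimal illustration of the thread-level ILP idea in session 2238 (a hypothetical sketch, not the speaker's benchmark code): each thread keeps several independent accumulators in registers, giving the hardware independent instructions to overlap memory and arithmetic latency even when occupancy is low.

// Hypothetical sketch: each thread processes ILP independent elements, so the scheduler has
// ILP independent instruction streams per thread with which to hide latency.
#define ILP 4

__global__ void scale_add_ilp(const float *x, const float *y, float *out, float a, int n)
{
    int base = blockIdx.x * blockDim.x * ILP + threadIdx.x;

    float r[ILP];
    #pragma unroll
    for (int j = 0; j < ILP; ++j) {          // independent, coalesced loads
        int i = base + j * blockDim.x;
        r[j] = (i < n) ? a * x[i] + y[i] : 0.0f;
    }
    #pragma unroll
    for (int j = 0; j < ILP; ++j) {          // independent stores
        int i = base + j * blockDim.x;
        if (i < n) out[i] = r[j];
    }
}

Because each block now covers blockDim.x * ILP elements, the kernel is launched with ILP times fewer threads, i.e. at lower occupancy, yet the execution units stay busy.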
2240 Accelerating LS-DYNA with MPI, OpenMP, and CUDA When solving implicit problems, the computational bottleneck in LS-DYNA is the multifrontal linear solver.  These operations are performed in double-precision arithmetic; hence, until the arrival of the Tesla C2050, experiments with GPU acceleration were only a curiosity.  This is no longer the case, and in this talk we will describe how LS-DYNA's hybrid (MPI and OpenMP) solver is further accelerated using GPUs to factor large dense frontal matrices. Bob Lucas USC High Performance Computing Watch now FLV
2247 Reconfiguring a Pool of GPUs on The Fly (Sponsored by NextIO) Today’s HPC applications break down large data set problems into smaller, independent elements solved by massively parallel processor systems. GPUs as co-processing devices are optimized for this task, and their popularity in technical computing is rapidly advancing. Like many rapidly advancing technologies, they leave in their wake new and challenging problems. In the effort to cut costs while increasing performance, damaging ripple effects can occur: resources can be over- or under-provisioned, inventory becomes difficult to manage, numerous single points of failure mean constant job interruptions, resources must be manually reconfigured for each job, and servicing and lifecycle management require outages. Most of these problems can be addressed and overcome by combining GPU resources into managed, structured pools. NextIO will present and demonstrate a new and innovative approach to consolidating and managing pools of NVIDIA GPU resources, along with the cost and operational savings associated with top-of-rack GPU consolidation appliances. K.C. Murphy NextIO High Performance Computing Watch now FLV MP4
2248 Parallel Processing on GPUs at the University of Utah The University of Utah is a CUDA Center of Excellence. We have been doing both basic and applied research using CUDA. In this session, we plan to give 3-4 talks on ongoing research. Most of the work that we will be presenting has been peer reviewed at top conferences. Huy Vo, Claudio Silva University of Utah High Performance Computing Watch now FLV
2270 Appro’s GPU Computing Solutions Learn how GPUs are changing the High Performance Computing landscape to deliver price/performance levels that were previously considered unachievable. Join Appro (http://www.appro.com), a leading provider of supercomputing solutions, to discuss the introduction of the Appro Tetra server, the most powerful GPU server available today in a 1U form factor, and the availability of a new modular GPU expansion blade, both based on NVIDIA Tesla 20-series GPUs. The availability of these two products confirms Appro’s commitment to providing the most innovative and powerful computing platforms at very attractive prices to the High Performance Computing markets.  John Lee Appro High Performance Computing Watch now FLV MP4
2273 GPUs In the Front Line of our Defenses (Sponsored by GE) Find out how GPUs are accelerating defense and aerospace applications and providing superior information processing to drive the next generation of capabilities to protect both homelands and soldiers.  Learn how rugged VPX hardware and software architectures are able to scale from small power- and weight-constrained vehicles through to large, complex processing arrays, on platforms as diverse as unmanned aerial vehicles (UAVs), tracked ground vehicles, and shipborne radar. Simon Collins GE Intelligent Platforms High Performance Computing Watch now FLV
2280 TSUBAME2.0 Experience Tsubame2.0 is the next-generation multi-petaflops supercomputer designed and built at Tokyo Tech, with more than 4000 NVIDIA Fermi GPUs, as a successor to the highly successful Tsubame1. Deep design considerations were made, based on experience with Tsubame1 retrofitted with previous-generation Tesla GPUs, to maximize the versatility and competitiveness of the system across a considerable number of application domains, as well as to accommodate as much strong scaling as possible. This resulted in a totally new custom system design, in collaboration with HP and NEC, rather than a machine with retrofitted GPUs. The resulting supercomputer will, we hope, become a design template for future large-scale GPU systems. Satoshi Matsuoka Tokyo Institute of Technology High Performance Computing Watch now FLV
2283 500 Teraflops Heterogeneous Cluster The HPC Affiliated Resource Center (ARC) will host a very large interactive HPC system.  The large cluster (CONDOR) will integrate Cell Broadband Engine processors, GPGPUs and powerful x86 server nodes, with a combined capability of 500 Teraflops.  Applications will include neuromorphic computing, video synthetic aperture radar backprojection, matrix multiplications, and others.  This presentation will discuss progress on performance optimization using the heterogeneous cluster and lessons learned from this research. Mark Barnell Air Force Research Lab (AFRL) High Performance Computing Watch now FLV MP4 PDF
2286 Towards Peta-Scale Green Computation - Applications of the GPU Supercomputers in the Chinese Academy of Sciences (CAS) China now holds three spots in the June 2010 Top500 list of GPU-based supercomputers, and two of them, using NVIDIA GPUs, are related to CAS. Efficient use of these systems is more important than peak or Linpack performance. This session will cover some of the large-scale multi-GPU applications in CAS, ranging from molecular dynamics below nano-scale to complex flows on meter-scale and porous media on geological scales, as well as fundamental linear algebra and data/image analysis. The idea of keeping high-efficiency and generality of the computation platform by maintaining a consistency among the target physical system, the computational model and algorithm, and the computer hardware will be explained in detail and demonstrated through a number of super-computing applications in the chemical, oil, mining, metallurgical and biological industries. Wei Ge, Xiaowei Wang, Yunquan Zhang, Long Wang Institute of Process Engineering, Chinese Academy of Sciences, Institute of Process Engineering, Institute of Software, CAS, Super Computing Center, Institute of Computer Network Information of CAS High Performance Computing Watch now FLV    
2287 Internal GPUs on Dedicated x16 Slots - Are They Needed For HPC? (Sponsored by Dell) We have benchmarked the real performance impact on a series of GPU accelerated applications to understand the benefits and drawbacks of different system level configurations.  Come hear about the effects on performance of GPUs in shared slots and of GPUs that are externally connected. Mark Fernandez Dell High Performance Computing Watch now FLV    
2293 Scaling Up and Scaling Out GPUs with Supermicro's Twin™ Architecture (Sponsored by Supermicro) Find out how Supermicro scales up and scales out GPU performance by using the Twin™ architecture.  In this session, we outline Supermicro's Twin™ architecture advantages across 1U/2U GPU servers and the design of personal supercomputers, and how we are able to scale and optimize GPU technology for datacenter environments and for professional workstations. Don Clegg Super Micro Computer, Inc. High Performance Computing Watch now FLV
2301 GPU Cluster Computing: Accelerating Scientific Discovery We propose holding a research roundtable focused on using GPU clusters to support scientific research. The roundtable will bring together researchers who have recently deployed, or are interested in deploying, GPU clusters to enable scientific research.  At the research roundtable they will be able to share their experiences deploying this new technology and discuss its future in supporting research to tackle the world’s most challenging scientific problems.

To open discussion we will provide a brief presentation about the deployment of CSIRO's latest supercomputer cluster, which is among the world's first to combine traditional CPUs with more powerful NVIDIA GPUs and which provides a world-class computational and simulation science facility to advance priority CSIRO science.

John Taylor CSIRO High Performance Computing Watch now FLV    
2302 Microsoft Technologies for High Performance Computing (Sponsored by Microsoft) NVIDIA Parallel Nsight provides access to the power of the GPU from within the familiar environment of Microsoft Visual Studio. In this session, we will expand on the computational power of Visual Studio 2010, Windows HPC Server and the Technical Computing Libraries and show how to increase your performance.   Calvin Clark Microsoft High Performance Computing Not Available      
2051 GPGPU in Commercial Software: Lessons From Three Cycles of the Adobe Creative Suite Learn about leveraging GPUs for commercial software.  We will discuss lessons learned creating and using the Adobe Image Foundation libraries to accelerate image and video processing using GPUs and multi-core.  These libraries are used by most of Adobe's applications as well as integrated by hobbyist and professional applications with different levels of experience with GPUs and diverse user bases. Kevin Goldsmith Adobe Systems, Incorporated Imaging Watch now FLV MP4 PDF
2093 Computational Photography: Real-Time Plenoptic Rendering Get the latest information on GPU-based plenoptic rendering including a demonstration of refocusing, novel view generation, polarization, high dynamic range, and stereo 3D.  Learn how GPU hardware enables plenoptic rendering tasks with high-resolution imagery to be performed interactively, opening up entirely new possibilities for modern photography. Andrew Lumsdaine, Georgi Chunev, Todor Georgiev Indiana University, Indiana University, Adobe Systems Imaging Watch now FLV MP4 PDF
2145 Photo Editing on the GPU with MuseMage See how MuseMage greatly accelerates image processing and editing while providing real-time feedback by harnessing the power of GPUs.  We will discuss the majority of MuseMage tools which are fully implemented on GPUs. Kaiyong Zhao, Yubo Zhang HKBU, UC Davis Imaging Watch now FLV MP4 PDF
2300 High-Performance Compressive Sensing using Jacket This talk will present ongoing work in the L1-optimization group at Rice University. The purpose of the work is to merge compressive sensing, for image/signal reconstruction, with GPU computation, using NVIDIA GPUs to enhance CS technology.
This talk will cover basic concepts in compressive sensing and how readily they can be adapted to run on the GPU, in particular working with Jacket (by AccelerEyes). We will then cover some of our numerical experiments, which encompass the use of different flavors of algorithms.
Nabor Reyna Rice University Imaging Watch now     PDF
2007 Folding@home: Petaflops on the Cheap Today; Exaflops Soon? Learn how Folding@home has used petascale computing with GPUs to make fundamental breakthroughs in computational biology and how this technology can make an impact in your work. Vijay Pande Stanford University Life Sciences Watch now FLV    
2030 High-Throughput Cell Signaling Network Learning with GPUs Explore how GPUs are being used to enable high-throughput cell signaling network discovery and data-intensive computational systems biology more generally. Systems biology is transitioning from a largely reductive discipline to one focused on building predictive models of large-scale biological systems. New instrumentation will provide the necessary raw data for such an approach; the key challenge now is building the hardware and software tools to efficiently and interactively build these models. This session will describe how GPUs can and will play a key role in these efforts. Michael Linderman Stanford University Life Sciences Watch now FLV MP4
2034 Reformulating Algorithms for the GPU  Important applications in signal processing, data processing and bioinformatics that use dynamic programming are difficult to parallelize due to intrinsic data dependencies.  We demonstrate a novel technique to extract parallelism out of data-dependent algorithms and reformulate them for GPUs. This simple technique breaks the dependencies and resolves them at an optimal point later in time, thus obtaining remarkable speedups on GPUs. We present a case study from computational biology, namely protein motif finding. We also present how the same technique can be extended and applied to other relevant problems such as gene prediction and phylogenetics.
Narayan Ganesan, Michela Taufer University of Delaware Life Sciences Watch now FLV MP4  
2055 Application of Fermi GPU to Flow Cytometry and Cancer Detection Learn how a Tesla C2050 enabled scientists to explore cancer data sets 400 times faster than a PC-only implementation.  Discusses how the results of this work may lead to better diagnostics for detecting leukemia in blood cells. Robert Zigon Beckman Coulter Life Sciences Watch now FLV MP4 PDF
2088 Nucleotide String Matching Using CUDA-Accelerated Agrep Dive deep into the intelligent use of the various CUDA memory spaces to remarkably speed up an approximate DNA/RNA nucleotide sequence matching algorithm in bioinformatics, by a factor of 67 compared to a multi-threaded quad-core CPU counterpart. Our talk provides a good example of how to use an indexable array to keep frequently updated variables in GPU registers, how to organize shared memory as a 2D array to avoid bank conflicts, and how to arrange the data structure to satisfy the requirements for coalesced global memory access. Our CUDA implementation employs an online approach and can be applied in real time. Hongjian Li The Chinese University of Hong Kong Life Sciences Watch now
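The three memory-space tricks listed in session 2088 can be sketched generically (hypothetical names, not the authors' kernel): a small per-thread array with unrolled, constant indexing that the compiler can keep entirely in registers, a shared-memory tile declared as a 2D array with a padding column so that column accesses do not collide on a bank, and a staging loop in which consecutive threads read consecutive addresses so global loads coalesce. The matching logic itself is omitted.

#define TILE_ROWS 32
#define TILE_COLS 32

__global__ void match_kernel(const char *text, int n)
{
    // Per-thread state in a small fixed-size array; with unrolled constant indexing
    // the compiler can place it in registers rather than spilling to local memory.
    unsigned int state[4];
    #pragma unroll
    for (int k = 0; k < 4; ++k) state[k] = 0u;

    // Shared-memory staging area declared as 2D with one padding column so that
    // warp accesses down a column fall into different banks.
    __shared__ char tile[TILE_ROWS][TILE_COLS + 1];

    // Coalesced load: threads with consecutive threadIdx.x read consecutive bytes.
    int base = blockIdx.x * TILE_ROWS * TILE_COLS;
    for (int r = 0; r < TILE_ROWS; ++r) {
        int idx = base + r * TILE_COLS + threadIdx.x;
        if (threadIdx.x < TILE_COLS && idx < n)
            tile[r][threadIdx.x] = text[idx];
    }
    __syncthreads();

    // ... approximate-matching logic over 'tile', updating 'state', would follow here ...
}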
2105 CUDA-FRESCO: An Efficient Algorithm for Mapping Short Reads Learn about CUDA-FRESCO and how it addresses issues with MUMmerGPU.  We will detail how CUDA-FRESCO overcomes MUMmerGPU's problems processing reads with errors or mismatches and delivers additional performance beyond MUMmerGPU's 5-12x speedup with less than 100bp query length. Chun-Yuan Lin Department of CSIE, Chang Gung University Life Sciences Watch now FLV MP4  
2115 Modified Smith-Waterman-Gotoh Algorithm for CUDA Implementation It is axiomatic that computational throughput can be increased by exploiting the parallelism of GPU hardware, but what if the computational algorithm is not easy to implement in parallel?  We have modified one such algorithm, the Smith-Waterman-Gotoh dynamic programming algorithm for local sequence alignment, so as to make it more amenable to data-parallel computation.  The result is a successful CUDA implementation that fully exploits GPU parallelism. Richard Wilton The Johns Hopkins University Life Sciences Watch now FLV MP4
2172 Unveiling Cellular & Molecular Events of Cardiac Arrhythmias  George Mason University is using CUDA technology to get a 20x speed-up in simulations of intracellular calcium dynamics, thought to play a major role in the generation of cardiac arrhythmias.  We will discuss the novel algorithms we have developed for Markov Chain Monte Carlo Simulation and their use in investigating elementary events of calcium release in the cardiac myocyte.  The resulting extremely fast simulation time has generated new insights into how defects in the control of intracellular calcium may lead to cardiac arrhythmia. Tuan Hoang-Trong George Mason University Life Sciences Watch now FLV MP4 PDF
2203 Modeling Evolution: Computing the Tree of Life Learn how GPUs are being used to accelerate our understanding of the tree of life. This session will cover BEAGLE, which is an open API and library for evaluating phylogenetic likelihoods of biomolecular sequence evolution. BEAGLE uses novel algorithms and methods for evaluating phylogenies under arbitrary molecular evolutionary models on GPUs, making use of the large number of processing cores to efficiently parallelize calculations. Daniel Ayres University of Maryland Life Sciences Watch now FLV MP4
2046 Efficient Automatic Speech Recognition on the GPU Learn how the GPU is able to meet the challenges of implementing automatic speech recognition (ASR), and gain insights into data-parallel implementation techniques that can provide 10x faster performance compared to sequential ASR on a CPU. The state-of-the-art algorithm for ASR performs a graph traversal on a large, irregular graph with millions of states and arcs, guided by speech input known only at runtime. We present four generalizable techniques: a dynamic data-gather buffer, find-unique, lock-free data structures using atomics, and hybrid global/local task queues. Used together, these techniques can effectively resolve ASR implementation challenges on a GPU. Jike Chong Parasians, LLC Machine Learning & Artificial Intelligence Watch now FLV MP4 PDF
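One of the four techniques named in session 2046, a lock-free structure built on atomics, can be sketched as follows (a simplified, hypothetical version, not the presenter's code): producers reserve a slot with atomicAdd on a shared tail counter and write their item there, so many threads can emit a variable number of active states without any locks.

// Simplified lock-free output queue: each thread appends items by atomically
// reserving slots in a preallocated global buffer.
struct Queue {
    int          *items;     // preallocated storage, 'capacity' elements
    unsigned int *tail;      // running count of claimed slots
    unsigned int  capacity;
};

__device__ void queue_push(Queue q, int item)
{
    unsigned int slot = atomicAdd(q.tail, 1u);   // reserve a unique slot, lock-free
    if (slot < q.capacity)
        q.items[slot] = item;                    // write without any critical section
}

__global__ void expand_states(Queue q, const int *activeStates, int nActive)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nActive) return;

    int state = activeStates[tid];
    // In a real decoder each state would emit the destinations of its outgoing arcs;
    // here we simply emit the state itself as a placeholder.
    queue_push(q, state);
}

The hybrid global/local variant mentioned in the abstract typically accumulates items in shared memory per block first and claims one global slot range per block, reducing contention on the single global counter.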
2091 The GPU in the Reactive Control of Industrial Robots Universal Robotics is using GPUs for real-time visual sensing in the reactive control of industrial robots.   Working in a complex, dynamic environment to achieve a loosely specified goal, such as moving arbitrary boxes from a pallet to a conveyor, requires reactivity.  Reactive control requires intensive, concurrent, low-latency computation for motion planning, exception handling, and sensing.  We describe and demonstrate how GPU-based computation enables visual servoing and box moving.  We also discuss the potential of the GPU to solve more difficult sensory problems such as multi-robot cooperation, multimodal sensor binding, attention, sensitization, and habituation. Dr. Alan Peters Universal Robotics, Inc. Machine Learning & Artificial Intelligence Watch now FLV MP4
2207 Playing Zero-Sum Games on the GPU A zero-sum game is a match in which one player's gain is the other's loss. Tic-Tac-Toe, Checkers and Chess are examples of zero-sum board games. To find the best move, the game is abstracted as a tree, often quite deep, consisting of all possible configurations. We present an efficient GPU implementation of the Mini-Max search algorithm, enhanced with Alpha-Beta pruning. We highlight the challenges of deploying the non-tail recursion of a highly irregular algorithm on GPUs, proposing a hybrid of compiler- and user-managed stacks. We demonstrate superior performance when running many thousands of 3D Tic-Tac-Toe matches simultaneously. Avi Bleiweiss NVIDIA Corporation Machine Learning & Artificial Intelligence Watch now FLV PDF
2001 Acceleration of the Freesurfer Suite for Neuroimaging Analysis See how GPU technology has dramatically accelerated the Freesurfer suite of tools used by thousands of researchers for the analysis of neuroimaging data. Richard Edgar Mass. General Hospital Medical Imaging & Visualization Watch now FLV MP4 PDF
2009 4D Visualization and Analysis of Flow 4D flow or vector data is now common in CFD simulations as well as in acquisition techniques like 4D flow MRI used to study abnormal blood flow patterns. We show how, by mixing compute and graphics combined with stereo, we are now able to interactively analyze and visualize the resulting data to understand abnormal flow patterns. Topics include flow field rendering, computing derived quantities, merging volumetric rendering with computed geometry such as particles and surfaces, and integration of 3D Vision stereo. Shalini Venkataraman NVIDIA Medical Imaging & Visualization Watch now FLV MP4 PDF
2036 Algorithms for Automated Segmentation of Medical Imaging Studies Utilizing CUDA Discover how GPU computing can help doctors make sense of modern imaging studies.  This session is intended for a general audience as well as medical informatics specialists.  The focus will be on algorithmic approaches to segmentation as it pertains to CTA (computed tomography angiography) studies.  Topics covered will include specialized optimization algorithms and novel lumen tracking methodologies.   Supratik Moulik University of Pennsylvania Medical Imaging & Visualization Watch now FLV MP4 PDF
2094 Nearly Instantaneous Reconstruction for MRIs GE’s Autocalibrating Reconstruction for Cartesian Imaging (ARC) is a computationally intensive, widely used algorithm in MRI Reconstruction using Parallel Imaging.  We demonstrate that an optimized CUDA implementation of ARC on a GPU can enable nearly instantaneous reconstruction and speedups of up to 10x over an optimized dual socket QuadCore CPU implementation. We will discuss challenges both with computational intensity and data read/write efficiency. We will also compare the Fermi C2050 with the C1060. Srihari Narasimhan GE Global Research Medical Imaging & Visualization Watch now FLV MP4  
2096 High-Speed CT Reconstruction in Medical Diagnosis & Industrial NDT Applications We present the software platform CERA developed by Siemens, which utilizes (multiple) graphics processing units (GPUs) in order to deliver high-speed CT reconstructions, and describe its implementation challenges using CUDA and OpenCL. We further show how GPU acceleration enables the utilization of reconstruction approaches which provide highly improved reconstruction quality in NDT applications.  Holger Scherl Siemens AG Medical Imaging & Visualization Watch now FLV MP4  
2139 Interactive Histology of Large-Scale Biomedical Image Stacks  Get the latest information on leveraging GPU computing to process and visualize large-scale biomedical image stacks.  We will discuss both display-aware processing and GPU-accelerated texture compression for histology applications on the GPU. Won-Ki Jeong, Jens Schneider Harvard University, King Abdullah University of Science and Technology Medical Imaging & Visualization Watch now FLV MP4  
2144 Large-Scale Visualization Using A GPU Cluster Learn how to visualize extremely large-scale scientific data using GPGPU techniques on a GPU-accelerated visualization cluster. Recent advances in general-purpose GPU (GPGPU) computing provide a promising solution to compute-intensive scientific visualization. However, the largest scientific simulations produce datasets that are orders of magnitude larger than the memory available on current GPUs. Many distributed GPUs must be used in parallel. We present Longhorn, currently the world's largest GPU-enhanced cluster dedicated for visualization and data analysis, and describe the distributed HW/SW architecture to interactively visualize massive datasets. Furthermore, we discuss the techniques to optimize a CUDA isosurfacer and to accelerate isosurface extraction of extremely large-scale data using preprocessed metadata. Byungil Jeong, Paul Navratil TACC / UT-Austin, Texas Advanced Computing Center Medical Imaging & Visualization Watch now FLV    
2146 Virtual Surgery  Come see how 3D Vision technology is used in virtual surgery training for medical education.  BioDigital Systems, in conjunction with the University of California, San Francisco (UCSF), has developed a dental injection simulator to teach students of dentistry the mechanics of nerve block injection.  3D Vision technology has added a new dimension of realism by providing users with a unique immersive experience. Aaron Oliker BioDigital Medical Imaging & Visualization Watch now FLV
2169 Real-time Volumetric Medical Ultrasound Applications for GPU Computing Real-time volumetric medical ultrasound requires computationally intensive, rapid processing for visualization of acquired acoustic data. Clinical applications of GPU-based technologies in obstetrics and cardiology will be discussed. Roee Lazebnik Siemens Healthcare Medical Imaging & Visualization Watch now FLV
2201 A Case Study of Accelerating Matlab Based Applications using GPUs Learn how to accelerate MATLAB-based applications using GPUs. We cover a popular neuro-imaging software package called SPM and show how to use CUDA and Jacket to speed up computationally intensive MATLAB applications.   Aniruddha Dasgupta Georgia Institute of Technology Medical Imaging & Visualization Watch now
2211 Modern Architecture for Massively Parallel Medical Tomographic Image Reconstruction on a GPU Cluster Learn how to combine GPU and Cluster Programming with a real-world example. Many aspects of medical tomographic image reconstruction are embarrassingly parallel, but require massive compute power. We distribute the load onto a cluster of multi-GPU equipped nodes using Message Passing Interface (MPI) and CUDA. The Thrust library allows for a modern object-oriented approach. Sven Prevrhal, Jingyu Cui Philips, Stanford University Medical Imaging & Visualization Watch now      
2235 Advanced Medical Volume Rendering and Segmentation on the GPU Learn how to speed up your interactive medical visualization pipeline by an order of magnitude and dramatically improve rendering quality at the same time. Leading researchers in medical imaging informatics describe recent advances in volume visualization and interactive segmentation.

Emphasis is on the underlying parallel GPU algorithms and acceleration data structures.
Mike Roberts, Eric Penner Hotchkiss Brain Institute, University of Calgary, Canada Medical Imaging & Visualization Watch now FLV MP4  
2236 A Work-Efficient GPU Algorithm for Level Set Segmentation Explore a novel GPU level set segmentation algorithm that is both work-efficient and step-efficient.  Our algorithm has O(logn) step-complexity, in contrast to previous GPU algorithms which have O(n) step-complexity. We apply our algorithm to 3D medical images and we show that in typical clinical scenarios, our algorithm reduces the total number of processed level set field elements by 16x and is 14x faster than previous GPU algorithms with no reduction in segmentation accuracy. Mike Roberts Hotchkiss Brain Institute, University of Calgary, Canada Medical Imaging & Visualization Watch now      
2282 GPU-Enabled Biomedical Imaging The purpose of this presentation is to describe several novel biomedical imaging applications which make extensive use of GPUs.  In CT iterative reconstructions, for example, high performance computing is allowing us to see details and structures we previously were not able to discern. Homer Pien MGH / HMS Medical Imaging & Visualization Watch now      
2006 Short-Range Molecular Dynamics on GPU Learn how to accelerate short-range molecular dynamics using CUDA C. We will cover building the neighbor list and calculating the forces on the GPU. To handle the case where a few particles have significantly more neighbors than most other particles, we propose a hybrid data structure for the neighbor list that can achieve a good balance between performance and storage efficiency. A CUDA C implementation of the technique for Lennard-Jones forces can be found in the LAMMPS molecular dynamics open source code. Peng Wang NVIDIA Molecular Dynamics Watch now FLV MP4 PDF
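To make the approach in session 2006 concrete, here is a minimal sketch of a neighbor-list force kernel of the kind the abstract describes. It assumes a simple dense neighbor list laid out so that neighbor k of particle i sits at nbr[k * n + i], which keeps loads coalesced across consecutive threads; the names, layout and truncated Lennard-Jones force expression are illustrative and are not taken from the LAMMPS implementation.

    // Illustrative neighbor-list Lennard-Jones force kernel (not LAMMPS code).
    // Neighbor k of particle i is stored at nbr[k * n + i] so that a warp of
    // consecutive particles reads consecutive addresses.
    __global__ void lj_forces(const float4* pos, const int* nbr, const int* nbr_count,
                              float4* force, int n, float cutoff2,
                              float epsilon, float sigma2)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float4 pi = pos[i];
        float3 f = make_float3(0.f, 0.f, 0.f);
        int count = nbr_count[i];

        for (int k = 0; k < count; ++k) {
            int j = nbr[k * n + i];
            float4 pj = pos[j];
            float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
            float r2 = dx * dx + dy * dy + dz * dz;
            if (r2 < cutoff2) {
                float s2 = sigma2 / r2;                 // (sigma/r)^2
                float s6 = s2 * s2 * s2;                // (sigma/r)^6
                float fscale = 24.f * epsilon * s6 * (2.f * s6 - 1.f) / r2;
                f.x += fscale * dx; f.y += fscale * dy; f.z += fscale * dz;
            }
        }
        force[i] = make_float4(f.x, f.y, f.z, 0.f);
    }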
2035 Simulations of Large Membrane Regions Learn how to study membrane-bound protein receptors by moving beyond the current state-of-the-art simulations that only consider small patches of physiological membranes. Towards this end, this session presents how to apply large-scale GPU-enabled computations of extended phospholipid bilayer membranes using a GPU code based on the CHARMM force field for MD simulations. Our code enables fast simulations of large membrane regions in NVT and NVE ensembles and includes different methods for the representation of the electrostatic interactions, i.e., reaction force field and Ewald summation (PME) methods. Performance and scientific results for dimyristoylphosphatidylcholine (PC) based lipid bilayers are presented.

Narayan Ganesan, Michela Taufer, Sandeep Patel University of Delaware Molecular Dynamics Watch now FLV MP4  
2054 NAMD, CUDA, and Clusters: Taking GPU Molecular Dynamics Beyond the Desktop A supercomputer is only as fast as its weakest link.  The highly parallel molecular dynamics code NAMD was one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007.  Now, after three short years, the Fermi architecture opens the possibility of new algorithms, simpler code, and easier optimization.  Come learn the opportunities and pitfalls of taking GPU computing to the petascale. James Phillips University of Illinois Molecular Dynamics Watch now FLV MP4  
2062 HOOMD-blue: Fast and Flexible Many-Particle Dynamics  See the newest capabilities and performance enhancements in HOOMD-blue, a general-purpose many-particle dynamics application written for GPUs. Speedups of 80-100x are attained for a wide range of simulation types. Topics for this presentation include an overview of HOOMD-blue, design and implementation details of the underlying algorithms, and a discussion on how generality is maintained without sacrificing performance. Joshua Anderson University of Michigan Molecular Dynamics Watch now FLV MP4 PDF
2073 High Performance Molecular Simulation, Visualization, and Analysis on GPUs This talk will present recent successes in the use of GPUs to accelerate interactive visualization and analysis tasks on desktop computers, and batch-mode simulation and analysis jobs on GPU-accelerated HPC clusters.  We'll present Fermi-specific algorithms and optimizations and compare with those for other devices. We'll also present performance and performance/watt results for NAMD molecular dynamics simulations and VMD analysis calculations on GPU clusters, and conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.

John Stone University of Illinois at Urbana-Champaign Molecular Dynamics Watch now FLV MP4 PDF
2086 GPGPU DL_POLY Discover DL_POLY, an MD code that ICHEC has ported to CUDA. The presentation focuses especially on the auto-tuning of the work distribution between CPU and GPU.
Gilles Civario ICHEC Molecular Dynamics Watch now FLV MP4  
2168 Interactive Molecular Dynamics for Nanomechanical and Nanochemical Experiments Hear how the combination of GPU accelerated molecular dynamics simulation software, 3D TV displays, affordable haptic game controllers, and high performance molecular visualization is leading to new ways to study materials and objects on the nanoscale.  We will present the concept of an appliance for integrated virtual nanoscale experiments and challenges related to software and hardware. Axel Kohlmeyer Institute for Computational Molecular Science, Temple University Molecular Dynamics Watch now     PDF
2218 Redesigning Molecular Dynamics for GPUs and GPU Clusters Generalized Born and Particle Mesh Ewald (PME) molecular dynamics are two computationally intensive algorithms for simulating biological molecules.  While several adaptations of Generalized Born have attained excellent speedup on GPUs, high performance Particle Mesh Ewald has been more elusive.  Here we describe in detail a recent port of PME implemented within AMBER 11 that has achieved performance on par with up to 128 nodes of a top ten supercomputer. Scott Le Grand NVIDIA Molecular Dynamics Watch now      
2269 Bringing GPUs to Mainstream Molecular Dynamics Packages Recent work in close collaboration with NVIDIA has produced a GPU-accelerated version of the AMBER molecular dynamics code PMEMD that runs at between 20 and 130 times the speed of a single 2.8 GHz Intel Nehalem processor, with even higher performance on multiple GPUs, and it does so without sacrificing the accuracy or validity of the calculations. The GPU-accelerated version supports both explicit solvent Particle Mesh Ewald (PME) and implicit solvent simulations and is available as part of the new AMBER 11 package. This talk will provide an overview of the AMBER software, the background behind this GPU work, benchmarks, the impact that GPU-accelerated MD can have on the field, the techniques used to achieve this performance without sacrificing accuracy, and finally the validation methods used to ensure simulations are directly equivalent to CPU-based calculations. Ensuring that a GPU implementation of an MD package provides results that are indistinguishable from the CPU code is extremely tricky, and the desire to take shortcuts to boost performance can affect accuracy with unpredictable results. We have developed a comprehensive validation suite that can be used to perform the detailed testing required to ensure that the approximations necessary for GPU performance do not impact the scientific results. Additionally, we will discuss how we have made careful use of mixed single and double precision arithmetic in the AMBER implementation to achieve equivalence in the results without excessively compromising performance. Finally, we provide examples of recent breakthrough simulations conducted using GPU-enabled AMBER 11. Ross Walker San Diego Supercomputer Center Molecular Dynamics Watch now
2122 Using GPUs for Real-Time Brain-Computer Interfaces Learn how GPU processing can provide researchers with an inexpensive and versatile alternative to dedicated signal processing hardware for real-time neural prosthetics. Topics will include an overview of algorithms, current state-of-the-art hardware, GPU processing in a real-time environment, multi-platform processing, and future directions in BCIs using GPU processing. Adam Wilson University of Cincinnati Neuroscience Watch now FLV MP4  
2252 Simulating Housefly Vision Elements Using OpenCL An OpenCL GPU-based computer simulation of a biologically motivated model, based on the anatomy of the housefly's first optic ganglion, the lamina ganglionaris (the lamina layer), is presented. Specific to GPU technology, the computer model demonstrates: the implementation of a second-order Runge-Kutta method to approximate coupled differential equations on GPU hardware; and the mapping of a non-Cartesian coordinate system onto the Cartesian layout of the threads.  Testing examined usage and access across device memory spaces to determine the optimal usage/access method for the ANN. This result was generalized for OpenCL GPU devices, using the capabilities of OpenCL.   Karen Haines WASP/The University of Western Australia Neuroscience Watch now
2066 Accelerating System Level Signal Integrity Simulation  Discuss how GPU acceleration for key parts of the ANSYS Nexxim Simulator resulted in significant speedup over multi-core processors.  We will cover time consumption and data parallelism exposure considerations, and focus on key areas where GPU acceleration was applied including convolution and Eye rendering. Danil Kirsanov, Ekanathan Palamadai ANSYS Physics Simulation Watch now FLV MP4  
2080 Tackling Multi-Gigabit Design Challenges with a Practical Virtual EMI/ESD Lab Learn about efficient methodologies for performant and cost-effective EMI and ESD suppression techniques by means of massively parallel GPU simulation.  We will discuss solving ever more complicated EMI and ESD challenges very early in the design process using a so-called ‘Virtual EMI/ESD lab’. Davy Pissoort, Amolak Badesha, Hany Fahmy KHBO-FMEC, Agilent Technologies, NVIDIA Physics Simulation Watch now FLV MP4
2090 Developing Highly Scalable Particle-Mesh Codes for GPUs: A Generic Approach Dive deep into a multi-parallel Particle in Cell code that utilizes MPI, pthreads, and CUDA. Around this specific application a general C++ framework for transparent data transfers between GPUs has been developed and will be presented. Further techniques employed include interleaving of communication and computation, particle tiling and a study of how well CUDA performance can be transferred to OpenCL. Guido Juckeland, Michael Bussmann TU Dresden - ZIH, Forschungszentrum Dresden-Rossendorf Physics Simulation Watch now FLV MP4  
2102 Evacuate Now?  Faster-than-real-time Shallow Water Simulation on GPUs Learn how to simulate a half-hour dam break in 27 seconds! We present how shallow water simulation with interactive visualization is successfully mapped to modern graphics hardware. Featuring a live demo, we will present interactive shallow water simulations running on a standard laptop. The implementation has been verified against analytical and experimental data, supports multi-GPU simulation, and can run domain sizes of up to 6300x6300 at 320 million cells per second on the GTX 480. André Rigland Brodtkorb SINTEF ICT Physics Simulation Watch now FLV MP4
2112 The Heisenberg Spin Glass Model on GPU:  Myth versus Fact Dive into implementations of the 3D Heisenberg spin glass model for GPUs.  We will discuss results showing that fast shared memory gives better performance with respect to slow global memory only under certain conditions.  Covers careful kernel tuning to achieve significant speedup with respect to a state-of-the-art high-end multicore processor. Massimo Bernaschi Istituto Applicazioni del Calcolo - C.N.R. Physics Simulation Watch now FLV MP4
2137 CUDA for Real-Time Multigrid Finite Element Simulation of Soft Tissue Deformations The take-away of this presentation is an efficient CUDA implementation of a finite hexahedra multigrid solver for simulating elastic deformable models in real time. Due to the regular shape of the numerical stencil induced by the hexahedral regime, computations and data layout can be restructured to avoid execution divergence and to support memory access patterns that enable the hardware to coalesce multiple memory accesses into single memory transactions. This makes it possible to effectively exploit the GPU's parallel processing units and high memory bandwidth. Performance gains of up to a factor of 12 compared to a highly optimized CPU implementation are demonstrated. Christian Dick, Joachim Georgii Technische Universität München Physics Simulation Watch now FLV MP4 PDF
2155 GPGPU in the Real World: The ABAQUS Experience We describe the ABAQUS experience of integrating GPGPU acceleration into complex, high-performance commercial engineering software. In particular, we discuss the trade-offs we had to make and the benefits we obtained from this technology. Luis Crivelli Dassault Systèmes Simulia Corporation Physics Simulation Watch now FLV MP4
2231 Driving on Mars, Redux: System Level Simulation of Dynamic Systems Learn how GPU and HPC computing are used to predict through simulation the dynamics of large complex mechanical systems such as tracked vehicles including the Mars Rover.  The presentation outlines the physics based approach and numerical solution methods that enabled the simulation of dynamic systems with millions of bodies on the GPU.  The presentation will also explain how a HPC cluster is used to effectively render scenes with tens of thousands of bodies for generating animations that can be used by Engineers in the design process. Dan Negrut University of Wisconsin-Madison Physics Simulation Watch now      
2246 The challenges of integrating CUDA engines into an existing package, yet not sinking the boat Based on a true story, come listen to a daring tale about integrating a large CUDA component (a physics engine) into an existing product (a 3D engine), replacing some of its functionality. We will cover the architectural difficulties and finer points that needed to be addressed, and the tuning and testing of such a large system, all without affecting the stability of the original system. Eri Rubin OptiTex Physics Simulation Watch now
2005 Porting Large-Scale Legacy Fortran Codes Explore a new automatic Fortran translator which has been developed and used to port the numerical subroutines of FEFLO , a general-purpose legacy Computational Fluid Dynamics code operating on unstructured grids, to run on the GPU.  Data transfer to the CPU is minimized throughout the course of a CFD run.  Benchmarks of large-scale production runs will be presented. Andrew Corrigan, Rainald Löhner Naval Research Laboratory & George Mason University, George Mason University Programming Languages & Techniques Watch now FLV MP4 PDF
2011 Fundamental Performance Optimizations for GPUs This presentation covers the major CUDA optimizations.  Topics will include: maximizing memory throughput, kernel launch configuration, using shared memory, and improving GPU/CPU interaction.  While C for CUDA is used for illustration, the concepts covered will apply equally to programs written with OpenCL and DirectCompute APIs. Paulius Micikevicius NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
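As a concrete illustration of two of the topics session 2011 lists (maximizing memory throughput and using shared memory), the sketch below shows the widely used shared-memory matrix transpose idiom, in which each tile is staged in shared memory so that both the global loads and the global stores are coalesced. It is a generic example, not material from the session, and assumes a launch with 32x32 thread blocks covering the matrix.

    // Coalesced transpose via shared memory staging: a naive transpose reads rows
    // and writes columns, so one side is uncoalesced; staging a tile in shared
    // memory keeps both the loads and the stores coalesced.
    #define TILE 32

    __global__ void transpose_coalesced(const float* in, float* out, int width, int height)
    {
        __shared__ float tile[TILE][TILE + 1];   // +1 padding avoids bank conflicts

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        if (x < width && y < height)
            tile[threadIdx.y][threadIdx.x] = in[y * width + x];

        __syncthreads();

        // Swap block indices so the store is also coalesced.
        x = blockIdx.y * TILE + threadIdx.x;     // column in the transposed matrix
        y = blockIdx.x * TILE + threadIdx.y;     // row in the transposed matrix
        if (x < height && y < width)
            out[y * height + x] = tile[threadIdx.x][threadIdx.y];
    }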
2023 Processing Device Arrays with C++ Metaprogramming I will describe tricks for building APIs using C++ metaprogramming that generate custom kernels for complex manipulation of device-side arrays in CUDA.  Using a variation of Expression Templates, multiple operations can be fused into a single kernel that executes with reasonable efficiency. Jonathan Cohen NVIDIA Research Programming Languages & Techniques Watch now FLV MP4  
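The core idea behind session 2023 can be sketched in a few lines: composing device arrays with overloaded operators builds a lightweight expression object, and a single templated kernel evaluates the whole expression per element, fusing several operations into one pass over memory. The types below (Leaf, Add, assign) are hypothetical stand-ins for illustration only, not the API presented in the talk.

    // Minimal expression-template sketch: a + b + c builds Add<Add<Leaf,Leaf>,Leaf>
    // on the host, and one kernel evaluates it element-by-element with no temporaries.
    struct Leaf {
        const float* p;
        __device__ float eval(int i) const { return p[i]; }
    };

    template <class L, class R>
    struct Add {
        L l; R r;
        __device__ float eval(int i) const { return l.eval(i) + r.eval(i); }
    };

    template <class L, class R>
    Add<L, R> operator+(L l, R r) { return Add<L, R>{l, r}; }

    template <class Expr>
    __global__ void assign(float* out, Expr e, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = e.eval(i);   // whole expression fused into one kernel
    }

    // Hypothetical host-side usage, given device pointers d_out, d_a, d_b, d_c:
    //   assign<<<(n + 255) / 256, 256>>>(d_out, Leaf{d_a} + Leaf{d_b} + Leaf{d_c}, n);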
2028 Mathematica for GPU Programming Mathematica is widely used in scientific, engineering, mathematical fields and education.  In this session, new tools for general GPU programming in the next release of Mathematica are presented.  These tools build on top of Mathematica’s technology which provides a simple, yet powerful, interface to the large base of compiling tools. Applications of CUDA and OpenCL from within Mathematica will be presented. These examples will provide a general overview of the powerful development environment for GPU programming that Mathematica can offer not just for researchers but for anybody with basic knowledge of Mathematica and GPU programming. Ulises Cervantes-Pimentel Wolfram Research Programming Languages & Techniques Watch now     PDF
2067 Experiences with Code Optimizations for High Performance GPGPU Programs Attend this session to learn and share code optimizations to achieve high performance GPU computing.  We will cover code transformations for memory coalescing, workload management at both thread and thread-block levels, and different ways to handle memory partition conflicts.  We will also discuss the integration of these code optimizations into a compiler. Huiyang Zhou, Yi Yang North Carolina State University Programming Languages & Techniques Watch now FLV MP4 PDF
2124 Operating System Abstractions for GPU Programming GPGPU frameworks such as CUDA improve programmability, but GPU parallelism remains inaccessible in many application domains. This session argues that poor OS support causes this problem.  OSes do not provide the kind of high-level abstractions for GPUs that applications expect for other resources like CPUs and file systems.  We advocate reorganizing kernel abstractions to support GPUs as first-class computing resources, with traditional guarantees such as fairness and isolation. We demonstrate shortcomings in Windows 7 GPU support, and show that better OS abstractions can accelerate interactive workloads like gesture recognition by a factor of 10X over a CUDA implementation.
Christopher Rossbach, Emmett Witchel Microsoft Research, University of Texas at Austin Programming Languages & Techniques Watch now FLV MP4 PDF
2167 Designing a Geoscience Accelerator Library Accessible from High Level Languages Explore a library for geoscience applications on CUDA and OpenCL platforms. Target applications span atmosphere, ocean, geomorphology and porous media flows.  These areas are linked by common numerical techniques encapsulated in our library.  We will review the scope of the library, its meta-programming approaches, and its key design attributes.  We will also demonstrate its support for multi-GPU parallelism within and across address spaces and provide examples of its use from high-level languages including C, Fortran, and Python. Chris Hill, Alan Richardson M.I.T Programming Languages & Techniques Watch now
2212 Parallel Nsight for Accelerated DirectX 11 Development [Advanced] Parallel Nsight is NVIDIA's new development environment for graphics and GPU computing.  In this advanced session, you will learn how Parallel Nsight can accelerate debugging and profiling of Direct3D 11 applications.



Attendees will learn how to debug Direct3D frames and HLSL shaders using Parallel Nsight's powerful Graphics Inspector and Debugger which allows developers to inspect Direct3D resources and state, set breakpoints in HLSL shaders, examine shader variables, and see which graphics primitives are live on the GPU.



Attendees will also learn how to use the Frame Profiler to capture and mine performance information, and easily pinpoint bottlenecked GPU units.
Simon Barrett NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
2278 Strategies for Code Encapsulation in GPU Implementations  Code encapsulation is a common technique used to reduce code complexity that a given programmer has to understand. It allows the use of increasingly complex systems of hardware, software, and algorithms to tackle increasingly difficult scientific problems. Unfortunately, code encapsulation is not easily attainable in current GPU environments.  We will share our OpenCL development experiences for achieving partial encapsulation in GPU implementations, and discuss best practices in this area. Brian Cole OpenEye Scientific Software Programming Languages & Techniques Watch now      
2281 Domain-Specific Languages Computer graphics has introduced several domain-specific languages (DSLs) that enable high performance and parallelism for narrow problem domains - RenderMan, Cg, GLSL, and recently OpenRL and OptiX. We think that similar approaches can benefit other areas of GPU computing - visualization, animation, physics simulation, or scientific data analysis. In this talk, we present Shadie, a domain-specific shading language for rapid development of complex custom volume visualizations in radiation oncology. The shaders are written in a high-level Python-like language and translated to CUDA for efficiency. We will explain how you can develop your own DSLs using source-to-source translation and a suitable backend library.  Milos  Hasan, Hanspeter Pfister Harvard University Programming Languages & Techniques Watch now      
2294 GPU.NET with TidePowerd Join TidePowerd for a demonstration of GPU.NET, our innovative new product which dramatically cuts the time needed to develop and maintain a GPU-based application by extending Microsoft's .NET Framework onto GPUs. With GPU.NET, your device-accelerated code can be written in any .NET-supported language (e.g., C#, F#, IronPython) and called like any other method - so it's easy to create new GPU-based applications without having to retrain your developers. You'll learn how to use GPU.NET to quickly develop a financial calculator in C#, use the built-in Visual Studio unit-testing tools to ensure the correctness of the code, and seamlessly deploy the application into a mixed Windows / Linux environment. We'll also discuss how GPU.NET expands the frontiers of GPU computing into lucrative new markets such as business intelligence, database processing, and data visualization. Jack Pappas TidePowerd Programming Languages & Techniques Watch now     PDF
2296 CUDA Optimization for Ninjas: A Case Study of High-Performance Sorting In this presentation, we use our implementation for high performance radix sorting as a case study for illustrating advanced design patterns and idioms. These techniques have allowed us to demonstrate Fermi sorting rates that exceed 1.0 billion 32-bit keys per second (and over 770 million key-value pairs per second), making it the fastest fully-programmable micro-architecture for this genre of sorting problems.



Although the CUDA programming model is elegantly decoupled from any particular hardware configuration, we present techniques for exploiting knowledge of the NVIDIA GPU machine model in order to produce more efficient implementations.  Our design patterns enable the compiler to specialize a single program text for a variety of architectures, resulting in target code that “fits” the underlying hardware significantly better than more general approaches.  In particular, we discuss strategies for kernel fusion, warp-synchronous programming, flexible granularity via meta-programming, algorithm serialization, and data-movement tuning. 

Duane Merrill University of Virginia Programming Languages & Techniques Watch now      
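One of the idioms session 2296 names, warp-synchronous programming, is easiest to see in the tail of a block-wide reduction: once only 32 active threads remain they belong to a single warp, so implementations of that era dropped __syncthreads() and relied on lockstep execution within the warp, using a volatile pointer to keep intermediate values in shared memory. The sketch below illustrates that idiom generically; it is not taken from the sorting implementation, and on current GPUs the same pattern additionally requires explicit __syncwarp() calls.

    // Classic block reduction with a warp-synchronous tail (2010-era idiom).
    __device__ void warp_reduce(volatile float* sdata, int tid)
    {
        sdata[tid] += sdata[tid + 32];
        sdata[tid] += sdata[tid + 16];
        sdata[tid] += sdata[tid + 8];
        sdata[tid] += sdata[tid + 4];
        sdata[tid] += sdata[tid + 2];
        sdata[tid] += sdata[tid + 1];
    }

    __global__ void block_sum(const float* in, float* out, int n)   // blockDim.x == 256
    {
        __shared__ float sdata[256];
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        sdata[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        for (int s = blockDim.x / 2; s > 32; s >>= 1) {
            if (tid < s) sdata[tid] += sdata[tid + s];
            __syncthreads();
        }
        if (tid < 32) warp_reduce(sdata, tid);        // warp-synchronous tail
        if (tid == 0) out[blockIdx.x] = sdata[0];
    }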
2047 Bridging Ray and Raster Processing on GPUs Explore new techniques in real time rendering.  We will discuss a system for ray traced global illumination (GI) carefully integrated with a traditional raster renderer using an incremental irradiance cache.  Covers novel GPU methods for spawning secondary GI rays on only visible cells, smoothly sampling the visible 3D cache into 2D, and incrementally ray traced spherical harmonics basis.  Details applying a range of optimizations to achieve real-time frame rates with the OptiX ray tracing engine. Kenny Mitchell Black Rock Studio   Not Available FLV MP4  
2074 Driving a Product from Rasterization to Ray Tracing: The Developer Experience Learn from the challenges encountered while using DirectX to update the Bunkspeed Move rasterization engine to work with Mental Images' iRay.  This work was part of the creation of Bunkspeed Shot, which allows the user to leverage both the high quality image generation of iRay and a highly interactive, good quality rasterization engine (used for quick setup of a scene).  Covers major differences between a ray tracing based interactive system, including GPU based ray tracing, and a traditional GPU rasterization engine. Nicolas Gebbie Bunkspeed Ray Tracing Watch now FLV MP4  
2250 GPU Ray Tracing Exposed: Under the Hood of the NVIDIA OptiX Ray Tracing Engine Take a deep dive into many of the design choices and implementation details of the NVIDIA OptiX ray tracing engine.  Learn how domain specific compilation, a unique execution model and a general object model, are combined into a flexible and powerful API. Steve Parker, Austin Robison, Phillip Miller NVIDIA Ray Tracing Watch now FLV MP4  
2003 Using CUDA to Accelerate Radar Image Processing Come see how current GPU technology provides the means for the first portable real-time radar image processing algorithm. This session will outline how the GPU has afforded nearly three orders of magnitude improvement in performance for Synthetic Aperture Radar's (SAR) hallmark image processing algorithm.  We will present algorithm details and further improvements. Aaron Rogan Neva Ridge Technologies Signal processing Watch now FLV MP4 PDF
2126 Accelerating Signal Processing: Introduction to GPU VSIPL Learn how to use the Vector Signal Image Processing Library to accelerate signal processing applications without needing to understand platform-specific programming and optimization techniques.  We will discuss how GPU VSIPL implements the VSIPL API and uses CUDA-capable GPUs to maximize performance of several example applications. Dan Campbell Georgia Tech Research Institute Signal processing Watch now FLV MP4 PDF
2043 Disparity Map Generation  Explore the algorithms and implementation of disparity maps on the GPU.  We will discuss how a disparity map facilitates stereoscopic content creation, applications and approaches tried, and final results of real time calculations on GPUs. Henry Gu GIC Stereoscopic 3D Watch now FLV MP4  
2107 Accelerating Stereographic and Multi-View Images Using Layered Rendering Explore applications of geometry shaders in improving the performance of stereo pair or multi-viewer image generation. This session will cover the basic approach of single-pass stereo-pair creation and provides guidelines for when layered rendering can be used to increase performance. A particular emphasis will be placed on virtual reality and scientific visualization, but the techniques discussed apply to a wide range of rendering environments. Results will be shown for three GPU architectures, including the new GF100 GPU. Jonathan  Marbach  TerraSpark Geosciences, LLC Stereoscopic 3D Watch now FLV MP4  
2241 Standing Out: Implementing a Great Stereo UI Learn how to make S3D compatible user interfaces, HUDs, and in-game menus. The first part of this session will outline the common problems users encounter when displaying traditional 2D UI in stereoscopic 3D.  The second part will focus on the different techniques, tips/tricks, and best practices developers can use to create high-quality S3D interfaces.  The presentation will highlight examples from several shipped titles, as well as showcase a complete 3D UI game demo running in S3D on multiple devices including PC and mobile. Brendan Iribe Scaleform Stereoscopic 3D Watch now      
2002 CUDA Debugging on Linux and MacOS with cuda-gdb Boost your development speed by mastering the CUDA debugging tools NVIDIA provides. In this session you will learn the basics of cuda-gdb and cuda-memcheck, as well as their more advanced features with live demonstrations on Linux and MacOS. Satish Salian NVIDIA Tools & Libraries Watch now FLV MP4  
2008 OpenCL Optimization Learn how to optimize your OpenCL application to achieve maximum performance on NVIDIA GPUs. We will first briefly discuss how the OpenCL programming model maps onto NVIDIA GPU’s architecture. We will then talk about memory, instruction, and NDRange optimization techniques, illustrating each with small code samples. Peng Wang NVIDIA Tools & Libraries Watch now     PDF
2012 Analysis-Driven Performance Optimization The goal of this session is to demystify performance optimization by transforming it into an analysis-driven process.  There are three fundamental limiters to kernel performance: instruction throughput, memory throughput, and latency.  In this session we will describe:

• how to use profiling tools and source code instrumentation to assess the significance of each limiter;

• what optimizations to apply for each limiter;

• how to determine when hardware limits are reached.

Concepts will be illustrated with some examples and are equally applicable to both CUDA and OpenCL development.  It is assumed that attendees are already familiar with the fundamental optimization techniques.

Paulius Micikevicius NVIDIA Tools & Libraries Watch now     PDF
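One simple form of the source-code instrumentation referred to in session 2012 is timing a kernel with CUDA events and converting the elapsed time into achieved memory throughput, which can then be compared with the board's peak to judge whether the memory-throughput limit has been reached. The kernel, names and byte counts below are illustrative assumptions, not an excerpt from the session.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical memory-bound kernel: one read and one write per element.
    __global__ void scale(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 2.0f * in[i];
    }

    void profile_scale(const float* d_in, float* d_out, int n)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);

        // 2 * n * sizeof(float) bytes moved; compare against peak bandwidth to see
        // how close this kernel runs to the memory throughput limit.
        double gbps = 2.0 * n * sizeof(float) / (ms * 1.0e6);
        printf("scale: %.3f ms, %.1f GB/s achieved\n", ms, gbps);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }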
2039 GPU Debugging with Allinea DDT Discover how a debugger can help you fix those hard to find bugs in your GPU software, with this introduction to the special CUDA features in Allinea DDT. David Lecomber Allinea Software Tools & Libraries Watch now FLV MP4  
2041 PyCUDA: Even Simpler GPU Programming with Python Explore PyCUDA, a robust, open-source toolkit that lets you control your GPU from the comfort of Python, a Matlab-like scripting language.   Learn about Fermi tuning with PyCUDA, the new interfaces for CUBLAS and CUFFT, the ecosystem of third-party libraries built on PyCUDA, and examples illustrating PyCUDA's benefits to large-scale applications. Andreas Kloeckner Courant Institute, NYU Tools & Libraries Watch now FLV MP4 PDF
2050 Copperhead: Data-Parallel Python for the GPU Learn how to write Python programs that execute highly efficiently on GPUs using Copperhead, a data-parallel Python runtime.  Using standard Python constructs like map and reduce, we will see how to construct data-parallel computations and embed them in Python programs that interoperate with numerical and visualization libraries such as NumPy, SciPy and Matplotlib.  We will examine how to express computations using Copperhead, explore the performance of Copperhead programs running on GPUs, and discuss Copperhead's runtime model, which enables data-parallel execution from within Python. Bryan Catanzaro University of California, Berkeley Tools & Libraries Watch now FLV MP4  
2053 Pixel Bender: Building a Domain Specific Language on the GPU Examine the challenges and advantages of building the Pixel Bender domain specific language for image processing for the GPU.  We will examine how Pixel Bender was made to work within several Adobe applications across a wide range of hardware systems and platforms. Bob Archer Adobe Systems Inc Tools & Libraries Watch now FLV MP4 PDF
2070 CUSPARSE Library: A Set of Basic Linear Algebra Subroutines for Sparse Matrices The CUSPARSE library can impact and enable software solutions for computational science and engineering problems in the fields of energy exploration, physical simulations and life sciences, among many others. It provides sparse linear algebra primitives that can be used to implement iterative linear system and eigenvalue solvers, and can also serve as a building block for state-of-the-art sparse direct solvers. The CUSPARSE library is implemented using the CUDA parallel programming model and provides sparse analogs to BLAS level-1, 2 and 3 operations, such as matrix-vector multiplication, triangular solve and format conversion routines.  Maxim Naumov NVIDIA Tools & Libraries Watch now FLV MP4 PDF
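For readers unfamiliar with the operations listed in the CUSPARSE abstract, the sketch below shows what a sparse matrix-vector multiply over the CSR (compressed sparse row) format looks like when written directly in CUDA C, with one thread per matrix row. It is a minimal illustration of the operation itself, not the CUSPARSE implementation or its API.

    // Scalar CSR sparse matrix-vector multiply, y = A*x, one thread per row.
    __global__ void csr_spmv(int num_rows, const int* row_ptr, const int* col_idx,
                             const float* vals, const float* x, float* y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float sum = 0.0f;
            for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
                sum += vals[k] * x[col_idx[k]];
            y[row] = sum;
        }
    }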
2109 Migration of a Complete 3D Poisson Solver from Legacy Fortran to CUDA We describe our journey of migrating a legacy direct solver library for Poisson equations written in Fortran77 to CUDA in order to harness the computational power provided by the Tesla device (“Fermi”). This legacy library is still widely used today as it is the most complete library that can deal with three different boundary conditions (Dirichlet, Neumann and Cyclic) and two grid configurations (staggered and centered) independently in any of the three dimensions (x, y, z), giving a total of over 200 configurations. Huynh Phung Huynh A*STAR Institute of High Performance Computing Tools & Libraries Watch now FLV MP4
2111 Using R for High-Performance Data Analysis  Data analysis is the art and the science of getting the correct quantitative models and their numerical parameters from the observed data. In this talk, we report on a project to integrate CUDA into the open source data analysis environment R. The combined use of CPU and GPU resources can efficiently exploit the significant amount of data parallelism inherent in most data analysis problems and methods. This makes interactive analysis possible even for large, compute-intensive problems. The implementation and the achievable performance gains will be demonstrated on a concrete example from quantitative finance. Domokos Vermes Worcester Polytechnic Institute Tools & Libraries Watch now FLV MP4
2143 CUDA Fortran Programming for NVIDIA GPUs An introduction to programming NVIDIA GPUs using CUDA Fortran. Suitable for expert Fortran or CUDA C programmers who need to extract maximum performance from GPUs using an explicit GPU Fortran programming model. Introduces the CUDA Fortran language, and through examples, illustrates how to explicitly program GPUs in native Fortran 95/03 through creation of GPU kernel subroutines, management of host and device memory, definition of CUDA grids and thread blocks, launching kernels, and use of the CUDA Fortran runtime API. This talk includes a live component with a Windows laptop containing an NVIDIA GPU and the PGI CUDA Fortran compiler.
Brent Leback The Portland Group Tools & Libraries Watch now      
2148 Rapid Prototyping and Visualization with OpenCL Studio Learn about OpenCL Studio, an integrated OpenCL and OpenGL development environment for parallel programming and visualization.  We will discuss building end user applications and using its integrated visualization capabilities to better understand the output and internal structure of parallel algorithms.  We will also demonstrate its capabilities using several sample applications including particle systems, volumetric rendering, and image processing. Jochen Stier Geist Software Labs Tools & Libraries Watch now   MP4  
2149 Overview of Parallel Nsight for Visual Studio NVIDIA Parallel Nsight provides access to the power of the GPU from within the familiar environment of Microsoft Visual Studio. This session is an entry level overview of the GPU computing and graphics development features of Parallel Nsight as well as a glimpse into the future of this powerful tool. Kumar Iyer NVIDIA Tools & Libraries Watch now FLV MP4  
2150 Parallel Nsight: Debugging Massively Parallel Applications [Advanced] Data parallel algorithms that provide real-time financial options pricing or identification of hidden oil reserves are utilizing the massively parallel nature of the GPU for industry changing performance gains. Developers require industry standard development tools to create the software that accomplishes these parallel tasks.

NVIDIA Parallel Nsight delivers the power of the GPU within the familiar environment of Microsoft Visual Studio. In this session, you will learn advanced techniques for debugging CUDA C/C++ and DirectCompute code using Parallel Nsight, including conditional and data breakpoints as well as out of bound GPU memory access detection.

Sebastien Domine NVIDIA Tools & Libraries Watch now FLV MP4 PDF
2151 Parallel Nsight: Analyzing and Optimizing Massively Parallel Applications [Advanced] Life altering products that provide early detection of breast cancer or simulate molecular behavior, accelerating drug discovery, are becoming reality thanks to the power of the GPU. As these technologies become mainstream, mainstream tools are required to support these development efforts.

NVIDIA Parallel Nsight delivers the power of the GPU within the familiar environment of Microsoft Visual Studio. In this session, you will learn advanced techniques for visualizing your application's workloads and performance characteristics across the CPU, GPU, and operating system, and explore the depths of Parallel Nsight profilers, including GPU performance counters and how to use them.

Sebastien Domine NVIDIA Tools & Libraries Watch now FLV MP4 PDF
2156 GMAC: Global Memory For Accelerators Learn how to use GMAC, a novel run-time for CUDA GPUs. GMAC unifies the host and device memories into a single virtual address space, enabling the host code to directly access the device memory and removing the need for data transfers between host and device memories. Moreover, GMAC allows the same pointers to be used by both host and device code.

This session will present the GMAC run-time and show how to use it in current applications, covering everything from the basics of GMAC to multi-threaded applications using POSIX threads, OpenMP and MPI.
Isaac Gelado Universitat Politecnica de Catalunya Tools & Libraries Watch now FLV MP4  
2160 StarPU: a Runtime System for Scheduling Tasks  See how StarPU provides task scheduling facilities for a hybrid platform and a powerful data management library that transparently takes care of data across the entire machine.  We will discuss the significant performance improvements resulting from its flexible scheduler as well as its ability to mix parallel CPU kernels (eg. written in OpenMP or TBB) with CUDA/OpenCL and MPI. Cedric Augonnet INRIA Tools & Libraries Watch now     PDF
2164 Analytical Performance Models to Improve the Efficiency of GPU Computing Dive deep into a simple analytical model that provides insight into performance bottlenecks of parallel applications on GPU architectures.  We will discuss how the model estimates the execution time of massively parallel programs.  We will also cover how to optimize applications based on our developed performance analysis models. Hyesoon Kim Georgia Tech Tools & Libraries Watch now      
2176 Easy GPU Meta-programming: A Case Study in Biologically-Inspired Computer Vision Learn how to let the computer optimize your CUDA and OpenCL code for you with easy GPU Meta-programming and Scripting (e.g. PyCUDA). We will present a case study in which we consider the step-wise optimization of a 3D filter bank convolution, using a suite of open-source tools.

Nicolas Pinto, David Cox MIT, Harvard University Tools & Libraries Watch now FLV MP4
2177 Simplifying Parallel Programming with Domain Specific Languages Explore a new approach to parallel programming which leverages Domain Specific Languages (DSLs) to simplify programming heterogeneous systems (multi-core processors and GPUs). This approach allows DSL users to take advantage of the power of GPUs without having working knowledge of lower level programming models such as CUDA. Topics will cover the advantages of the DSL approach to parallel programming, and the runtime implementation details and optimizations that deliver the performance benefits of GPUs. HyoukJoong Lee, Hassan Chafi Stanford University Tools & Libraries Watch now     PDF
2179 GPU - An R Library for Native GPU Objects Come learn about the GPU R package. R is the widely popular open source statistical programming language.  The GPU package extends R by providing GPU-based types, classes and methods implementing GPU versions of R vectors, matrices, lists and data frames.  Subsequent operations with these are executed on the GPU. Users are not required to create special bindings or implement special syntax, nor do they need to copy objects between CPU and GPU.  The GPU package allows programmers to access the computational power of GPUs with little modification to existing code.   Christopher Brown Decision Patterns Tools & Libraries Watch now FLV MP4
2202 A Programming Model and Tool for Automatic High-Performance C to CUDA Mapping Discover our automatic C-to-CUDA mapper prototype, and how it optimizes execution and data movement for a broad class of loop codes. Coupled with our powerful mapper, C as an input language offers not only portability but also performance and performance portability. Learn about our optimizations and some of the performance obtained through different uses of the mapper. Benoit Meister Reservoir Labs Tools & Libraries Watch now FLV MP4 PDF
2210 GPU-Ocelot: An Open Source Debugging and Compilation Framework for CUDA Learn how to debug and profile CUDA applications using GPU-Ocelot.  Ocelot is a compilation and emulation framework for CUDA that includes debugging and profiling tools as well as backend compilers for NVIDIA GPUs and x86 CPUs.  We will present examples of applications developed on x86 CPUs and deployed on NVIDIA GPUs.  We will also discuss memory checking, race detection, and deadlock detection tools available within Ocelot. Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili Georgia Institute of Technology Tools & Libraries Watch now FLV MP4  
2213 BCSLIB-GPU:  Significant Performance Gains for CAE Hear product architects and developers describe the algorithmic depth and high-level breadth of the use of GPUs in creating BCSLIB-GPU, the GPU enablement of the industry standard sparse matrix software suite, BCSLIB-EXT.  We provide a range of comparison data for Tesla and Fermi versus multi-core CPU-only systems, across a wide range of realistic, demanding real-world test problems. Danl Pierce Access Analytics Int'l, LLC Tools & Libraries Watch now FLV MP4
2216 CUDA Libraries Open House Learn about NVIDIA’s CUDA libraries and meet the engineers that develop them.  Lead developers will cover the capabilities, performance and future directions for NVIDIA’s CUFFT, CUBLAS, CURAND, and NPP libraries (other libraries such as CUSPARSE and open source Thrust are covered in other talks).  After the presentation, NVIDIA developers will remain in the room to chat and answer questions during the lunch break.

Ujval Kapasi, Philippe Vandermersch, Elif Albuz, Nathan Whitehead, Frank Jargstorff NVIDIA Tools & Libraries Watch now     PDF
2219 High-Productivity CUDA Development with the Thrust Template Library Thrust is a parallel template library for developing CUDA applications. Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing. Thrust provides host and device variants of the STL vector container to simplify memory management and facilitate data transfers. These containers are complemented with a large collection of generic data-parallel algorithms and a suite of useful iterator adaptors. Together, these features form a flexible high-level interface for GPU programming that greatly enhances developer productivity.



In this session we'll discuss Thrust's features and explain the basic design philosophy of the library.
Nathan Bell NVIDIA Research Tools & Libraries Watch now FLV MP4 PDF
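A small, self-contained example in the spirit of the abstract above: the STL-style host and device vectors take care of allocation and transfers, and the generic sort and reduce algorithms execute on the GPU. This is ordinary public Thrust usage, not code taken from the session.

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>
    #include <thrust/copy.h>
    #include <cstdio>
    #include <cstdlib>

    int main()
    {
        thrust::host_vector<int> h(1 << 20);
        for (size_t i = 0; i < h.size(); ++i)
            h[i] = rand();

        thrust::device_vector<int> d = h;                 // host -> device copy

        thrust::sort(d.begin(), d.end());                 // parallel sort on the GPU
        int sum = thrust::reduce(d.begin(), d.end(), 0);  // parallel reduction

        thrust::copy(d.begin(), d.end(), h.begin());      // device -> host copy
        printf("sum = %d\n", sum);
        return 0;
    }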
2220 Thrust by Example: Advanced Features and Techniques Thrust is a parallel template library for developing CUDA applications which is modeled after the C++ Standard Template Library (STL).  In this session we'll show how to decompose problems into the algorithms provided by Thrust.  We'll also discuss the performance implications of "kernel fusion" and "array of structs" vs. "structure of arrays" memory layouts and how they relate to Thrust. Lastly, we'll present evidence that Thrust implementations are fast, while remaining concise and readable.  Jared Hoberock NVIDIA Tools & Libraries Watch now     PDF
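Two of the techniques named in session 2220 can be sketched with standard Thrust components: kernel fusion via transform_reduce, which squares and sums in a single pass without a temporary array, and a structure-of-arrays layout traversed through a zip_iterator so that no array of structs is ever materialized. These are generic illustrations rather than the session's own code.

    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <thrust/tuple.h>
    #include <thrust/iterator/zip_iterator.h>

    struct square { __host__ __device__ float operator()(float x) const { return x * x; } };

    // Kernel fusion: squaring and summing happen in one pass, with no temporary array.
    float sum_of_squares(const thrust::device_vector<float>& v)
    {
        return thrust::transform_reduce(v.begin(), v.end(), square(), 0.0f,
                                        thrust::plus<float>());
    }

    // Structure of arrays: x, y, z live in separate vectors; a zip_iterator presents
    // them as tuples without ever building an array of structs in memory.
    typedef thrust::device_vector<float>::iterator FloatIt;
    typedef thrust::zip_iterator<thrust::tuple<FloatIt, FloatIt, FloatIt> > PointIt;

    PointIt make_point_iterator(thrust::device_vector<float>& x,
                                thrust::device_vector<float>& y,
                                thrust::device_vector<float>& z)
    {
        return thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin(), z.begin()));
    }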
2225 Tools for Managing Clusters of NVIDIA GPUs Learn about the suite of tools NVIDIA provides to manage large installations of GPUs from the NVIDIA Tesla series. The presentation will cover cluster management tools and libraries, as well as the GPUDirect technology that enables GPUs to communicate faster across the network. Peter Buckingham, Andrew Iles NVIDIA Tools & Libraries Watch now FLV MP4 PDF
2249 New Programming Tools for GPU Computing This session will focus on new parallel programming tools for GPU computing. The types of tools that fit into the session include (1) planning tools for porting legacy applications to use GPU computing, (2) high-level programming and scripting tools for GPU computing, (3) automation of common performance optimizations for GPU computing, (4) performance analysis and diagnosis tools for GPU computing, and (5) tools that simplify heterogeneous parallel computing. Andrew Schuh, Wen-mei Hwu University of Illinois Tools & Libraries Watch now
2251 TotalView Debugger for CUDA Hear how the TotalView debugger is being extended to support GPU computation with CUDA. In addition to the basic challenges associated with debugging parallel programming, CUDA programming introduces a number of new concepts for which developers need visibility in debugging: a hierarchical memory, near-SIMD warps, streams, and kernels, among others. How do we create a tool that handles it all?

We'll be discussing the status of our work and the challenges encountered in bringing this all together into a single package, TotalView for CUDA.
Chris Gottbrath TotalView Technologies, Inc., a Rogue Wave Software company Tools & Libraries Watch now      
2267 GPU Computing with MATLAB®  MATLAB is a widely used tool for scientific, engineering and financial applications.  As the popularity of GPUs has grown, there is strong interest from engineers and scientists who solve computationally intensive problems in being able to leverage GPUs within MATLAB and other products from MathWorks. This talk will discuss how MathWorks tools can help engineers and scientists take advantage of GPU resources while continuing to work in the familiar MATLAB environment.  A range of capabilities will be discussed and demonstrated. Loren Dean MathWorks Tools & Libraries Watch now FLV MP4 PDF
2271 Seven Tricks We Learned to Get Top Performance Many people find the performance of a first-draft CUDA implementation unimpressive.  We'll show seven techniques we used to push Jacket's convolution and matrix multiply code from vanilla to record performance. Included is rapid-fire coverage of various topics: scaling from mobile cards to Fermi, balancing register pressure, counting FLOPS, loop unrolling, efficient memory addressing, guarded memory access, and more.

We'll also debut Jacket's new C/C++ interface aimed at providing the same performance and functionality already available in Jacket's GPU support for MATLAB.
James Malcolm AccelerEyes Tools & Libraries Watch now      
2272 GStream: A General-Purpose Data Streaming  Framework on GPUs We present GStream, a general-purpose, scalable C++ template run-time framework amenable to both the streaming problem and GPU architectures. GStream offers transparent streaming data transmissions and automatic memory synchronization over a rich collection of computing resources that are transparently allocated and reused.

Various problems other than streaming applications, such as scientific computing, numerical codes and text processing, can be easily expressed using GStream and subsequently integrated with our GStream library. GStream's ease of use, combined with efficient exploitation of GPU resources, has the potential to lead to higher coding productivity and application performance through our data-centric specification paradigm.

Xing Wu, Frank Mueller North Carolina State University Tools & Libraries Watch now      
2297 Developing CUDA Accelerated .NET Plugins for Microsoft Excel Quantifi will demo its xLDevelopment environment, which provides developers with an easy-to-use development environment for exposing CUDA functionality in Microsoft Excel.  With as little as four lines of code, a developer can also select the position of the function in the menu bar, have XML markup displayed in the Excel help functionality, and easily add objects to the object cache.  These objects can then be inspected by the end user or developer, and performance information can also be displayed in the object cache. The environment lets the developer focus on developing high-performance functionality, while all intermediate layers of interface are taken care of by the environment. Peter Decrem Quantifi Tools & Libraries Watch now FLV MP4 PDF
2299 Integrating CUDA BLAS with IMSL Fortran  As GPU hardware becomes more prevalent in both research and commercial institutions, software that takes advantage of this specialized hardware is growing in demand. In many cases, it is infeasible or impossible to rewrite an existing program to run entirely on the GPU, so the goal is often to offload as much work as possible. As the IMSL Library team at Rogue Wave Software considers how best to tackle the GPU realm with a general mathematical library, the IMSL Fortran Library takes an initial step where the CUDA BLAS library is utilized to offload CPU work to GPU hardware. This presentation will discuss the approach and architecture of the solution. Benchmark results will show where success has been found. Plans for future products will also be covered. Chris Gottbrath TotalView Technologies, Inc., a Rogue Wave Software company Tools & Libraries Watch now FLV MP4  
2016 VDPAU: PureVideo on Unix Learn about VDPAU (Video Decode and Presentation API for Unix). VDPAU provides GPU-accelerated video decoding, post-processing, UI compositing, and display on Unix. VDPAU also supports sharing surfaces with OpenGL and CUDA ("interop"). This allows developers to implement their own post-processing algorithms or scene analysis, or to use decoded video surfaces as part of a scene rendered using OpenGL. Stephen Warren NVIDIA Video Processing Watch now FLV MP4 PDF
2027 GPU-Based Image Processing in Military Applications There are more than 6000 Unmanned Aerial Vehicles (UAVs) in use in the US Military. The US Army alone has flown more than 1 million UAV flight hours. Every UAV captures at least one stream of video; some as many as 9. All this video needs to be processed and analyzed both during the mission, and post-mission. Traditionally, custom ASICs, and FPGAs were required for even the most rudimentary image processing tasks. Now, GPUs provide orders of magnitude more compute at a fraction of the cost. Hear how MotionDSP uses GPUs to provide previously impossible capabilities to military imaging.  Sean Varah MotionDSP Inc. Video Processing Watch now FLV MP4  
2048 H.264/AVC Video Encoding with CUDA and OpenCL Join experts from MainConcept, a leading provider of video codecs to the professional market, as they demonstrate the latest version of their CUDA-based H.264/AVC Encoder. Thomas Kramer MainConcept Video Processing Watch now FLV MP4  
2075 GPU-Accelerated Video Encoding Learn how to accelerate video encoding using the GPU. We will give an overview of the typical video encoding pipeline and discuss how different parts of the pipeline can be ported to the GPU using various approaches. We will focus on block-based motion estimation in particular, as it is the cornerstone of video encoding algorithms. The efficiency of its implementation on the GPU is crucial to the speed and quality of the encoder. Anton Obukhov NVIDIA Video Processing Watch now FLV MP4 PDF
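To illustrate why block-based motion estimation dominates encoder cost, the device function below computes the sum of absolute differences (SAD) between a 16x16 macroblock of the current frame and one candidate block in the reference frame; a search kernel would evaluate many candidate displacements per macroblock in parallel and keep the best. This is a textbook-style sketch, not MainConcept's or NVIDIA's encoder code.

    // SAD between a 16x16 current-frame macroblock at (cur_x, cur_y) and a
    // reference-frame candidate block at (ref_x, ref_y); pitch is the row stride
    // in bytes of both luma planes (assumed equal here for simplicity).
    __device__ unsigned int sad_16x16(const unsigned char* cur, const unsigned char* ref,
                                      int cur_x, int cur_y, int ref_x, int ref_y, int pitch)
    {
        unsigned int sad = 0;
        for (int y = 0; y < 16; ++y)
            for (int x = 0; x < 16; ++x) {
                int a = cur[(cur_y + y) * pitch + cur_x + x];
                int b = ref[(ref_y + y) * pitch + ref_x + x];
                int d = a - b;
                sad += (d < 0) ? -d : d;
            }
        return sad;
    }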
2087 Fast High-Quality Panorama Stitching  We present a panorama stitching application implemented with CUDA C on the GPU. The image processing pipeline consists of SIFT feature detection and matching and Graphcut image stitching to achieve high-quality results. We demonstrate live panorama creation with a webcam. Timo Stich NVIDIA Video Processing Watch now FLV MP4 PDF
2095 Building High Density Real-Time Video Processing Systems Learn how GPUDirect can be used to effectively build real-time, high-performance, cost-effective video processing products.  We will focus especially on how to optimize bus throughput while keeping CPU load and latency minimal. Ronny Dewaele Barco Video Processing Watch now FLV MP4
2121 Maximizing Throughput of Barco's GPU-Enabled Video Processing Server  Find out how Imec middleware realizes the full potential of GPU-enabled video processing servers to manage multiple video processing pipelines.  We will discuss how the middleware monitors GPU and CPU execution to best balance the load.  Covers how we achieved a 30% increase in throughput with only a minimal 0.05% overhead on Barco's GPU-enabled video processing server. Maja D'Hondt imec Video Processing Watch now FLV MP4 PDF
2224 GPU Acceleration in Adobe Creative Tools Hear experts explain how Adobe Creative Suite 5 harnesses the power of CUDA technology in several of its core software applications.  We will focus on the complete redesign of the core video playback and rendering engine in Adobe Premiere Pro CS5 and how it uses the power of GPUs to deliver superior performance and change the game for Adobe in professional video production. Paul Young, Steve Hoeg, Al Mooney Adobe Video Processing Watch now FLV MP4  
  Emerging Companies Summit Presentations More Files
Coming Soon
 
ID Title Abstract Speakers Affiliation Topic Area(s) Downloads
4000 Emerging Companies Summit Opening Address The Emerging Companies Summit is a unique forum for startup companies to showcase innovative applications that leverage the GPU to solve visual and compute-intensive problems. The Opening Address includes an overview of NVIDIA’s GPU ecosystem development activities and an interaction on stage with selected companies building groundbreaking applications on top of the GPU platform.
The ECS is a great opportunity to discover new players in the GPU ecosystem, find great investments, explore partnership opportunities, network/ build relationships, and discuss the future of an industry that is reshaping computing.
Jeff Herbst NVIDIA General Interest FLV   PDF
4001A Emerging Companies: CEO on Stage featuring Elemental Technologies See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Elemental Technologies - covering the field of video processing. Find this session at 5 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft),  Drew Lanza (Partner, Morgenthaler), and Jon Peddie (President, JPR) & Jeff Herbst (VP of Business Development, NVIDIA).
Sam Blackman Elemental Technologies, Inc. General Interest FLV   PDF
4001B Emerging Companies: CEO on Stage featuring Mersive See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Mersive - covering the field of imaging. Find this session at 20 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Rob Balgley Mersive General Interest FLV   PDF
4001C Emerging Companies: CEO on Stage featuring Geomerics See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Geomerics - covering the field of computer graphics. Find this session at 35 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Chris Doran Geomerics General Interest FLV   PDF
4002A Emerging Companies:  CEO on Stage featuring miGenius See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features miGenius - covering the field of cloud computing. Find this session at 5 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Chris Blewitt miGenius General Interest FLV   PDF
4002B Emerging Companies:  CEO on Stage featuring Allegorithmic See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Allegorithmic - covering the field of mobile devices. Find this session at 20 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Dr Sébastien Deguy Allegorithmic General Interest FLV   PDF
4002C Emerging Companies:  CEO on Stage featuring Bunkspeed See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Bunkspeed - covering the field of computer graphics. Find this session at 35 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Philip Lunn Bunkspeed General Interest FLV   PDF
4003 Emerging Companies Summit Panel:  GPUs for Computer Vision Moderated by Jon Peddie (President, Jon Peddie Research)

The GPU (graphics processing unit) runs advanced applications that are transforming existing industries and creating new ones. Join our panel of leading industry experts as they discuss the latest technology advances in the use of GPUs for computer vision, covering facial, gesture, human-motion, and biometric recognition, augmented reality, robotic computing, and more.

Panelists:
Joe Stam (Sr. Applications Engineer, NVIDIA)
Yoram Yaacovi (CTO & General Manager, Technologies at Microsoft Israel, R&D Center)
Sam Cox (CEO, Milabra)
Janko Mrsic-Flogel (CTO, Mirriad)
Tom Dean (Research Scientist, Google)
Jon Peddie Jon Peddie Research General Interest FLV   PDF
4004A Emerging Companies:  CEO on Stage featuring empulse GmbH See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features empulse GmbH - covering the field of databases & data mining. Find this session at 5 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Michael Hummel empulse GmbH General Interest FLV   PDF
4004B Emerging Companies:  CEO on Stage featuring Playcast Media Systems See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Playcast Media Systems - covering the field of video processing. Find this session at 20 minutes into the video.
Panelists will include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Natan Peterfreund Playcast Media Systems General Interest FLV   PDF
4004C Emerging Companies:  CEO on Stage featuring Cooliris See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Cooliris - covering the field of computer graphics. Find this session at 35 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Austin Shoemaker Cooliris General Interest FLV   PDF
4005A Emerging Companies:  CEO on Stage featuring Softkinetic See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Softkinetic - covering the field of computer vision. Find this session at 5 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Michel Tombroff Softkinetic General Interest FLV   PDF
4005B Emerging Companies:  CEO on Stage featuring Rocketick See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Rocketick - covering the field of high performance computing. Find this session at 20 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Uri Tal Rocketick General Interest FLV   PDF
4005C Emerging Companies:  CEO on Stage featuring Jedox AG See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Jedox AG - covering the field of databases & data mining. Find this session at 35 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Kristian Raue Jedox AG General Interest FLV   PDF
4007A Emerging Companies:  CEO on Stage featuring Scalable Display Technologies See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Scalable Display Technologies - covering the field of imaging. Find this session at 5 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Andrew Jamison Scalable Display Technologies General Interest FLV   PDF
4007B Emerging Companies:  CEO on Stage featuring RTT See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features RTT - covering the field of computer graphics. Find this session at 20 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Jeroen Snepvangers RTT General Interest FLV   PDF
4007C Emerging Companies:  CEO on Stage featuring Aqumin See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Aqumin - covering the field of finance. Find this session at 35 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Michael Zeitlin Aqumin General Interest FLV   PDF
4008A Emerging Companies: CEO on Stage featuring OTOY See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features OTOY - covering the field of cloud computing. Find this session at 5 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Jules Urbach OTOY General Interest FLV   PDF
4008B Emerging Companies: CEO on Stage featuring Universal Robotics See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Universal Robotics - covering the field of machine learning & artificial intelligence. Find this session at 20 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
David Peters Universal Robotics General Interest FLV   PDF
4008C Emerging Companies: CEO on Stage featuring ICD See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features ICD - covering the field of mobile devices. Find this session at 35 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
David Hayes ICD General Interest FLV   PDF
4009 Emerging Companies Summit Panel:  The "New Normal" For Building Emerging Companies Based On Disruptive Technologies Moderated by Jeff Herbst (Vice President of Business Development, NVIDIA)

Start-ups are facing unique challenges as a result of the current economic and business environment. Not only is the venture funding environment very difficult, but small companies are finding it increasingly difficult to “break out” of the pack through IPOs and attractive M&A exits. This panel of experts (which includes VC and corporate investors) attempts to assess the current state of both the public and private markets, and explores various strategies and options for building successful companies in this “new” environment. Topics include traditional forms of equity and debt, angel financing, as well as other creative/strategic financing options (e.g., NRE arrangements, strategic partnerships, etc.). The discussion promises to be both lively and provocative.

Panelists:
Garrett Herbert (Partner, M&A Transaction Services, Deloitte & Touche LLP)
Peter Kidder (Division Risk Manager, Silicon Valley Bank)
Michael Tedesco (Managing Director, Citigroup Global Markets)
Andrew T. Sheehan (Managing Director, Sutter Hill Ventures)
Eric Jensen (Partner, Business Department Chair, Cooley LLP)
Jeff Herbst NVIDIA General Interest FLV   PDF
4010A Emerging Companies: CEO on Stage featuring OptiTex See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features OptiTex - covering the field of physics simulation. Find this session at 5 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Yoram Burg OptiTex USA Inc. General Interest FLV   PDF
4010B Emerging Companies: CEO on Stage featuring Useful Progress See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Useful Progress - covering the field of medical imaging & visualization. Find this session at 20 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Sylvain Ordureau UsefulProgress General Interest FLV   PDF
4010C Emerging Companies: CEO on Stage featuring NaturalMotion Limited See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features NaturalMotion Limited - covering the field of computer graphics. Find this session at 35 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Torsten Reil NaturalMotion Limited General Interest FLV   PDF
4011A Emerging Companies: CEO on Stage featuring Perceptive Pixel See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Perceptive Pixel - covering the field of imaging. Find this session at 5 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Jeff Han Perceptive Pixel General Interest FLV   PDF
4011B Emerging Companies: CEO on Stage featuring Cinnafilm See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Cinnafilm - covering the field of film. Find this session at 20 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Lance Maurer Cinnafilm, Inc. General Interest FLV   PDF
4011C Emerging Companies: CEO on Stage featuring Total Immersion See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Total Immersion - covering the field of computer vision. Find this session at 35 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Bruno Uzzan Total Immersion General Interest FLV   PDF