GPU Technology Conference 2010 - Recorded Sessions
  Keynotes
ID Title Abstract Speakers Affiliation Topic Area(s) Streaming Downloads
1001 Opening Keynote with Jen-Hsun Huang, NVIDIA The opening keynote features Jen-Hsun Huang, CEO and Co-Founder of NVIDIA, and special guests. Hear about what's next in computing and graphics, and preview disruptive technologies and exciting demonstrations from across industries. Jen-Hsun Huang NVIDIA General Interest Watch now WMV Video
1002 Day 2 Keynote with Dr. Klaus Schulten, University of Illinois at Urbana-Champaign How does the H1N1 “Swine Flu” virus avoid drugs while attacking our cells? What can we learn about solar energy by studying biological photosynthesis? How do our cells read the genetic code? What comes next in computational biology?

Computational biology is approaching a new and exciting frontier: the ability to simulate structures and processes in living cells. Come learn about the “computational microscope,” a new research instrument that scientists can use to simulate biomolecules at nearly infinite resolution. The computational microscope complements the most advanced physical microscopes to guide today’s biomedical research. In this keynote address, computational biology pioneer Dr. Klaus Schulten of the University of Illinois, Urbana-Champaign, will introduce the computational microscope, showcase the widely used software underlying it, and highlight major discoveries made with its aid, ranging from protein folding to the translation of the genetic code in cells and the harvesting of solar energy in photosynthesis. He will also look toward a future when cell tomography and computing will establish atom-by-atom views of entire life forms.
Klaus Schulten University of Illinois at Urbana-Champaign General Interest Watch now WMV Video
1003 Closing Keynote with Dr. Sebastian Thrun, Stanford University and Google What really causes accidents and congestion on our roadways? How close are we to fully autonomous cars?

In his keynote address, Stanford Professor and Google Distinguished Engineer Dr. Sebastian Thrun will show how his two autonomous vehicles, Stanley (DARPA Grand Challenge winner) and Junior (2nd place in the DARPA Urban Challenge), demonstrate how close, and yet how far, we are from fully autonomous cars. Using computer vision combined with lasers, radars, GPS sensors, gyros, accelerometers, and wheel velocity data, the vehicle control systems are able to perceive their surroundings and plan routes to safely navigate Stanley and Junior through the courses. However, these closed courses are a far cry from everyday driving. Find out what the team will do next to get one step closer to the “holy grail” of computer vision, and a huge leap forward toward the concept of fully autonomous vehicles.
Sebastian Thrun Stanford University and Google General Interest Watch now WMV Video
4006 Fireside Chat with Jen-Hsun Huang (Co-Founder & CEO, NVIDIA) Jen-Hsun Huang was joined in a fireside chat by Quentin Hardy, National Editor at Forbes Magazine.
They discussed the rise of GPUs, current trends in visual and parallel computing, and the transformational changes ahead for the industry.
Jen-Hsun Huang NVIDIA General Interest Not Available FLV   PDF
  Pre-Conference Tutorials
ID Title Abstract Speakers Affiliation Topic Area(s) Streaming Downloads
2004 Languages, APIs and Development Tools for GPU Computing (Pre-Conference Tutorial) Get a head start on the conference with this first-day introduction to key technologies for GPU Computing.  This 90-minute tutorial session will cover the key features and differences between the major programming languages, APIs and development tools available today.  Attendees will also learn several high level design patterns for consumer, professional and HPC applications, with practical programming considerations for each. Will Ramey NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
2131 Introduction to CUDA C (Pre-Conference Tutorial) Starting with a background in C or C++, learn everything you need to know in order to start programming in CUDA C.  Beginning with a "Hello, World" CUDA C program, explore parallel programming with CUDA through a number of hands-on code examples. Examine more deeply the various APIs available to CUDA applications and learn the best (and worst) ways in which to employ them in applications. Master the first half of the book "CUDA by Example" as taught by the author, pointing you on a trajectory to complete the second half on your own after course completion. Jason Sanders NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
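As a concrete taste of the style of code this tutorial starts from, here is a minimal sketch (not material from the course itself): a CUDA C vector-add kernel launched from a small host program.

```
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one pair of elements.
__global__ void add(const int *a, const int *b, int *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int N = 256;
    int ha[N], hb[N], hc[N];
    for (int i = 0; i < N; ++i) { ha[i] = i; hb[i] = 2 * i; }

    // Allocate device buffers and copy the inputs over.
    int *da, *db, *dc;
    cudaMalloc((void **)&da, N * sizeof(int));
    cudaMalloc((void **)&db, N * sizeof(int));
    cudaMalloc((void **)&dc, N * sizeof(int));
    cudaMemcpy(da, ha, N * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, N * sizeof(int), cudaMemcpyHostToDevice);

    // Launch enough 128-thread blocks to cover N elements.
    add<<<(N + 127) / 128, 128>>>(da, db, dc, N);

    cudaMemcpy(hc, dc, N * sizeof(int), cudaMemcpyDeviceToHost);
    printf("Hello, World from CUDA C: c[42] = %d\n", hc[42]);

    cudaFree(da); cudaFree(db); cudaFree(dc);
    return 0;
}
```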
2018 OpenCL on the GPU (Pre-Conference Tutorial) OpenCL is Khronos’ new open standard for parallel programming of heterogeneous systems. This tutorial session will introduce the main concepts behind the standard and illustrate them with some simple code walkthrough. Attendees will also learn how to make efficient use of the API to achieve good performance on the GPU. Cliff Woolley NVIDIA Tools & Libraries Watch now FLV MP4  
2157 DirectX 11 Overview (Pre-Conference Tutorial) This presentation gives an overview of the DirectX 11 pipeline and how it extends previous DirectX versions to enable stunning visual effects in real-time graphics applications.

Cem Cebenoyan NVIDIA Computer Graphics Watch now FLV MP4 PDF
2260 DirectCompute (Pre-Conference Tutorial) Learn how to use the DirectCompute API to solve GPU computing problems.  This tutorial will introduce the DirectCompute API, cover the recommended best practices for GPU programming, and go over examples of how to use this API efficiently and effectively to solve compute-intensive problems. Eric Young, Matt Sandy NVIDIA, Microsoft Programming Languages & Techniques Watch now FLV MP4 PDF
2127 OpenGL (Pre-Conference Tutorial) This session will discuss the latest OpenGL features offered by NVIDIA for both the Quadro and GeForce product lines. Learn more about OpenGL 4 as well as NVIDIA-specific OpenGL extensions. Mark Kilgard NVIDIA Corporation Programming Languages & Techniques Watch now FLV MP4 PDF
2245 Parallel Nsight for Microsoft Visual Studio (Pre-Conference Tutorial) NVIDIA Parallel Nsight provides access to the power of the GPU from within the familiar environment of Microsoft Visual Studio. In this session, you will learn how to use Parallel Nsight to develop GPU computing and graphics applications.
Learn how to use the powerful Parallel Nsight debugger to identify errors in CUDA C/C++ kernels and HLSL shaders using GPU breakpoints and direct memory and variable inspection. See how Parallel Nsight displays system-wide performance characteristics, allowing you to create efficient GPU algorithms. 
Kumar Iyer NVIDIA Tools & Libraries Watch now FLV MP4  
2024 NVIDIA Acceleration Engines Overview (Pre-Conference Tutorial) Come learn about the software engines NVIDIA freely provides to application developers to rapidly leverage new GPU capabilities and dramatically reduce the time it takes to bring compelling features to end users. Phillip Miller, Holger Kunz, Brian Harrison, Thomas Ruge NVIDIA Programming Languages & Techniques Watch now FLV MP4
2261 Introduction to GPU Ray Tracing with NVIDIA OptiX (Pre-Conference Tutorial) Learn how to use NVIDIA OptiX to quickly develop high performance ray tracing applications for interactive rendering, offline rendering, or scientific visualization. This session will explore the latest available OptiX version.  Dave McAllister, Phillip Miller NVIDIA Ray Tracing Watch now FLV MP4  
2158 Programming the NVIDIA Digital Video Pipeline with OpenGL (Pre-Conference Tutorial) This tutorial session teaches attendees how to program the NVIDIA Quadro Digital Video Pipeline with OpenGL.  It will go in-depth into the techniques and recommended practices. Thomas True NVIDIA Programming Languages & Techniques Watch now FLV MP4  
2159 Programming the NVIDIA Digital Video Pipeline with Direct3D (Pre-Conference Tutorial) Learn how to program the NVIDIA Quadro Digital Video pipeline using Direct3D.  This session will provide an overview of the SDK, discuss device control, data transfers, performance measuring and tuning, ancillary data and application design considerations. Thomas True NVIDIA Programming Languages & Techniques Watch now FLV MP4  
2010 Implementing Stereoscopic 3D in Your Applications (Pre-Conference Tutorial) Let's dive into the 3rd dimension.  This talk presents a comprehensive technical overview of NVIDIA’s stereo technology and tools.  After a complete introduction to NVIDIA’s stereo technology, we will then explore in more detail production techniques for the new artistic space of effects and creativity offered by 3D stereo.  The take away of this session will be a solid understanding of NVIDIA’s stereo technology and how to take best advantage of it. Samuel Gateau, Steve Nash NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
  Developer Summit & Research Summit Sessions
ID Title Abstract Speakers Affiliation Topic Area(s) Streaming Downloads
2015 Efficient Tridiagonal Solvers for ADI methods and Fluid Simulation Learn about new techniques to efficiently implement the Alternating Direction Implicit method on GPU for large 2D and 3D domains with complex boundaries.

A novel tridiagonal solver for systems with variable sizes and a new hybrid approach will be covered in detail. Comprehensive performance analysis and key Fermi optimizations will be explored.

Various applications of tridiagonal solvers such as 3D direct numerical fluid simulation and a 2D depth-of-field effect for games will be briefly discussed.
Nikolai Sakharnykh NVIDIA Algorithms & Numerical Techniques Watch now FLV MP4 PDF
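For orientation, the sketch below shows the simplest GPU mapping for the kind of batched tridiagonal systems that arise in ADI sweeps: one thread runs the classic Thomas algorithm on one independent system. This is a baseline illustration, not the speaker's solver; the contiguous per-system layout and the scratch array cp are assumptions of the sketch.

```
#include <cstdio>
#include <vector>

// One thread solves one tridiagonal system with the Thomas algorithm.
// a, b, c: sub-, main- and super-diagonals; d: right-hand side, overwritten
// with the solution; cp: scratch space for the modified super-diagonal.
__global__ void thomas_batch(const float *a, const float *b, const float *c,
                             float *d, float *cp, int n, int num_systems) {
    int sys = blockIdx.x * blockDim.x + threadIdx.x;
    if (sys >= num_systems) return;

    const float *as = a + sys * n, *bs = b + sys * n, *cs = c + sys * n;
    float *ds = d + sys * n, *cps = cp + sys * n;

    // Forward elimination.
    cps[0] = cs[0] / bs[0];
    ds[0]  = ds[0] / bs[0];
    for (int i = 1; i < n; ++i) {
        float m = 1.0f / (bs[i] - as[i] * cps[i - 1]);
        cps[i] = cs[i] * m;
        ds[i]  = (ds[i] - as[i] * ds[i - 1]) * m;
    }
    // Back substitution.
    for (int i = n - 2; i >= 0; --i)
        ds[i] -= cps[i] * ds[i + 1];
}

int main() {
    const int n = 8, num_systems = 4;
    size_t bytes = n * num_systems * sizeof(float);
    std::vector<float> a(n * num_systems, -1.0f), b(n * num_systems, 4.0f),
                       c(n * num_systems, -1.0f), d(n * num_systems, 1.0f);

    float *da, *db, *dc, *dd, *dcp;
    cudaMalloc((void **)&da, bytes);  cudaMalloc((void **)&db, bytes);
    cudaMalloc((void **)&dc, bytes);  cudaMalloc((void **)&dd, bytes);
    cudaMalloc((void **)&dcp, bytes);
    cudaMemcpy(da, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, b.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dc, c.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dd, d.data(), bytes, cudaMemcpyHostToDevice);

    thomas_batch<<<1, 32>>>(da, db, dc, dd, dcp, n, num_systems);

    cudaMemcpy(d.data(), dd, bytes, cudaMemcpyDeviceToHost);
    printf("first unknown of system 0: %f\n", d[0]);
    return 0;
}
```

In practice the per-system data would be interleaved so that neighboring threads make coalesced accesses; the session itself covers far more sophisticated variable-size and hybrid solvers.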
2020 GPU-Accelerated Data Expansion for the Marching Cubes Algorithm Learn how to accelerate marching cubes on the GPU by taking advantage of the GPU’s high memory bandwidth and fast on-chip shared memory in a data expansion algorithm that can extract the complete iso-surface mesh from (dynamic) volume data without requiring any data transfers back to the CPU. Gernot Ziegler, Chris Dyken NVIDIA, SINTEF Algorithms & Numerical Techniques Watch now FLV   PDF
2021 Efficient Volume Segmentation on the GPU Explore a new technique in the detection of common regions in a 2D/3D data array. Connected components along the axes are linked before actual label propagation starts. The algorithm is completely gather-based, which allows for several optimizations in the CUDA C implementation. It enables real-time frame rates for the analysis of typical 2D images and interactive frame rates for the analysis of typical volume data. Allan Rasmusson, Gernot Ziegler University of Aarhus, NVIDIA  Algorithms & Numerical Techniques Watch now FLV MP4 PDF
2038 The Best of Both Worlds: Flexible Data Structures for Heterogeneous Computing Learn how to switch between array of structs (AoS) and struct of arrays (SoA) storage without having to change the data access syntax. A few changes to the struct and container definitions will enable you to evaluate the performance of AoS vs. SoA on your existing AoS code.  We present a simple abstraction that retains the more intuitive AoS syntax array[index].component, yet allows you to switch between AoS and SoA storage with a single template parameter at class definition. Robert Strzodka Max Planck Institut Informatik Algorithms & Numerical Techniques Watch now FLV MP4 PDF
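The idea can be illustrated with a small, hypothetical C++ sketch (not the speaker's library): the storage layout is chosen by one template parameter, while the familiar particles[i].x access syntax is preserved through a tiny proxy.

```
#include <cstdio>
#include <vector>

enum Layout { AOS, SOA };

template <Layout L> struct Particles;

// AoS storage: one struct per element.
template <> struct Particles<AOS> {
    struct Elem { float x, y; };
    std::vector<Elem> data;
    explicit Particles(size_t n) : data(n) {}
    Elem &operator[](size_t i) { return data[i]; }
};

// SoA storage: one array per component, accessed through a proxy
// so that p[i].x still works.
template <> struct Particles<SOA> {
    std::vector<float> x, y;
    explicit Particles(size_t n) : x(n), y(n) {}
    struct Proxy { float &x, &y; };
    Proxy operator[](size_t i) { return Proxy{x[i], y[i]}; }
};

int main() {
    Particles<AOS> a(4);           // flip AOS -> SOA to change the layout
    Particles<SOA> s(4);
    a[0].x = 1.0f; s[0].x = 2.0f;  // identical access syntax in both cases
    printf("%f %f\n", a[0].x, s[0].x);
    return 0;
}
```

Benchmarking the two layouts on the same kernel or loop body then becomes a one-line change at the class definition.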
2061 Accelerating Explicit FEM Shock & Blast Simulations Explicit finite element codes are widely used to simulate the response of structures and mechanical equipment subjected to shock, blast and wave propagation phenomena. High-resolution models requiring run times ranging from a few seconds to a few months are common, and hence the payoff from GPU acceleration is tremendous. We describe the acceleration of our commercial finite element code NLFLEX using CUDA. We developed GPU kernels in CUDA based on our production code NLFLEX, for linear elasticity, explosives, elasto-plasticity and large deformation elasticity. We attained order of magnitude (10X) acceleration in single precision and approximately 5X in double precision mode.  Nachiket Gokhale  Weidlinger Associates Inc Algorithms & Numerical Techniques Watch now FLV MP4
2068 Parallelizing FPGA Technology Mapping using GPUs FPGA technology mapping is an algorithm that is heavily data parallel, but contains many features that make it unattractive for GPU implementation. The algorithm uses data in irregular ways since it is graph-based. It also makes heavy use of constructs such as recursion, which are not supported by GPU hardware. In this work, we take a state-of-the-art FPGA technology mapping algorithm within Berkeley’s ABC package and attempt to parallelize it on a GPU. We show that runtime gains of 3.1x are achievable while maintaining identical quality, as demonstrated by running these netlists through Altera’s Quartus II place-and-route tool. Doris Chen University of Toronto Algorithms & Numerical Techniques Watch now FLV MP4
2084 State of the Art in GPU Data-Parallel Algorithm Primitives Learn about the importance of optimized data-parallel algorithm primitives as building blocks for efficient real-world applications.  Fundamental parallel algorithms like sorting, parallel reduction, and parallel scan are key components in a wide range of applications from video games to serious science.  This session will cover the state of the art in data-parallel primitive algorithms for GPUs. Starting with an explanation of the purpose and applications of the algorithms, we will discuss key algorithm design principles, demonstrate current open source algorithm libraries for GPUs (CUDPP and Thrust), describe optimizations using new features in the Fermi architecture, and explore future directions. Mark Harris NVIDIA Algorithms & Numerical Techniques Watch now FLV MP4 PDF
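As a flavor of what these libraries provide, the following minimal sketch uses Thrust (which ships with the CUDA toolkit) to run a sort, a reduction, and a prefix sum on device data; CUDPP exposes similar primitives through a C API.

```
#include <thrust/device_vector.h>
#include <thrust/copy.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>
#include <thrust/scan.h>
#include <cstdio>

int main() {
    // Copy a small host array into a device vector.
    int h[8] = {5, 3, 7, 1, 8, 2, 6, 4};
    thrust::device_vector<int> d(8);
    thrust::copy(h, h + 8, d.begin());

    thrust::sort(d.begin(), d.end());                       // data-parallel sort
    int sum = thrust::reduce(d.begin(), d.end(), 0);        // parallel reduction
    thrust::exclusive_scan(d.begin(), d.end(), d.begin());  // prefix sum (scan)

    printf("sum = %d, last scan element = %d\n", sum, (int)d.back());
    return 0;
}
```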
2085 Tridiagonal Solvers: Auto-Tuning and Optimizations In this presentation, we will discuss and analyze the performance of three optimization techniques for tridiagonal solvers. We first present a hybrid Parallel Cyclic Reduction (PCR)-Gaussian Elimination (GE) tridiagonal solver, which combines work-efficient and step-efficient algorithms for high performance. We further discuss an auto-tuned variant of this technique which selects the optimal switching point between algorithms on a per-machine basis. Next, we present a technique to handle large systems, where shared memory constraints prevented previous work from solving these systems directly. Finally, we will discuss optimizations on a cyclic reduction technique that avoid bank conflicts on current hardware. Andrew Davidson, Yao Zhang University of California, Davis Algorithms & Numerical Techniques Watch now FLV MP4
2136 Pseudo Random Number Generators for Massively Parallel Apps Learn how to select the best and fastest pseudo-random number generator for your massively parallel Monte Carlo simulation. Pseudo-random number generators (PRNGs) are a fundamental building block of these simulations, and it is thus necessary to select suitable PRNGs with regard to the specific problem at hand while considering the parallel hardware architecture.

Recent developments in random number generation provide a wide variety of choices, each with different properties and trade-offs. We provide a comprehensive survey of the current state of the art for massively parallel PRNGs and show a broad range of applications.
Holger Dammertz Ulm University Algorithms & Numerical Techniques Watch now FLV MP4 PDF
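As one concrete example of a parallel-friendly generator (CURAND is used here purely for illustration; the session surveys many alternatives), the sketch below gives each thread its own generator state and an independent subsequence, the standard pattern for massively parallel Monte Carlo.

```
#include <curand_kernel.h>
#include <cstdio>

// Monte Carlo estimate of pi: each thread runs its own trials using an
// independent CURAND subsequence, then accumulates hits atomically.
__global__ void mc_pi(unsigned long long seed, int trials, int *hits) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);   // distinct subsequence per thread

    int local = 0;
    for (int i = 0; i < trials; ++i) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++local;
    }
    atomicAdd(hits, local);
}

int main() {
    int *d_hits, h_hits = 0;
    cudaMalloc((void **)&d_hits, sizeof(int));
    cudaMemcpy(d_hits, &h_hits, sizeof(int), cudaMemcpyHostToDevice);

    const int threads = 256, blocks = 64, trials = 1000;
    mc_pi<<<blocks, threads>>>(1234ULL, trials, d_hits);

    cudaMemcpy(&h_hits, d_hits, sizeof(int), cudaMemcpyDeviceToHost);
    printf("pi ~= %f\n", 4.0 * h_hits / (double)(threads * blocks * trials));
    cudaFree(d_hits);
    return 0;
}
```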
2140 Superfast Nearest Neighbor Searches Using a Minimal kd-tree Learn how to adapt a kd-tree spatial data structure for efficient nearest neighbor (NN) searches on a GPU.  Although the kd-tree is not a natural fit for GPU implementation, it can still be effective with the right engineering decisions. By bounding the maximum height of the kd-tree, minimizing the memory footprint of data structures, and optimizing the GPU kernel code, multi-core GPU NN searches with tens of thousands to tens of millions of points run 10-40 times faster than the equivalent single-core CPU NN searches. Shawn Brown UNC, Chapel Hill Algorithms & Numerical Techniques Watch now FLV   PDF
2163 Leveraging GPUs for Evolutionary Game Theory Learn how GPUs are being used to accelerate the study of the emergence of cooperative behavior in biology, from the interactions of humans to viruses to bacteria.  The work presented here achieves a speedup of 209x on a cluster of 4 Tesla GPUs. Amanda Peters Harvard University Algorithms & Numerical Techniques Watch now FLV   PDF
2166 The Triad of Extreme Computing: Fast Algorithms, Open Software and Heterogeneous Systems The first wave of successful GPU accelerations has been crowded with highly-parallel methods that adapted well to the hardware.  But the easy pickings are now running out.  The truly challenging applications require "going back to the algorithmic drawing board."  To develop new versions of the most effective fast algorithms, such that our science can most benefit, an ideal environment is created by the open software model, where efforts can be shared.  We will describe one area of application, the electrostatics of biomolecules in solution, where we see at work the triad of extreme computing: fast algorithms, open software, and heterogeneous computing. Lorena Barba Boston University Algorithms & Numerical Techniques Watch now FLV
2171 Parallel Algorithms for Interactive Mechanical CAD The broad objective of our research is to develop mechanical Computer-Aided Design tools that provide interactive feedback to the designer. We have developed GPU algorithms for fundamental CAD operations (NURBS evaluation, surface-surface intersection, separation distance computation, moment computation, etc.) that are one to two orders of magnitude faster, and often more accurate, than current commercial CPU implementations. We will touch on strategies we have employed to meet GPU programming challenges, such as the separation of CPU/GPU operations, imposing artificial structure on computations, and transforming problem definitions to suit GPU-computation models. Adarsh Krishnamurthy, Sara McMains University of California Berkeley Algorithms & Numerical Techniques Watch now FLV MP4 PDF
2000 Gravitational N-body Simulations: How Massive Black Holes Interact with Stellar Systems Astrophysics is a field where supercomputing is a must to obtain new scientific results. In particular, the study of the interaction among massive black holes and surrounding stars is a hot topic, which requires heavy computations to obtain a good representation of what happens in the inner regions of galaxies. We present the results obtained with our high-precision N-body code, NBSymple, which exploits the joint power of a multi-core CPU system together with the high performance NVIDIA Tesla C1060 GPUs.

The code is available at the website:

astrowww.phys.uniroma1.it/dolcetta/nbsymple.html
Roberto Capuzzo-Dolcetta, Alessandra Mastrobuono Battisti Sapienza Univ. of Roma Astronomy & Astrophysics Watch now FLV MP4  
2044 GRASSY: Leveraging GPU Texture Units for Asteroseismic Data Analysis Learn how to use the hidden computation capability of GPU texture units for general purpose computation. We describe GRASSY, a system for stellar spectral synthesis where the core problem is interpolation between pre-computed intensity values. We map these pre-computed tables to the GPU's texture memory. Interpolation then becomes a texture lookup where the hardware automatically performs the interpolation, albeit at very low precision. Our mathematical framework reasons about the impact of this precision and our performance results show 500X speedups. This work generalizes the GPU texture units as computation engines and opens up new problems for GPU acceleration.

Matt Sinclair UW-Madison Astronomy & Astrophysics Watch now FLV    
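The core mechanism is easy to show in isolation. The sketch below (an illustration, not GRASSY code) stores a small table in a CUDA array bound to a texture with linear filtering, so a fetch at a fractional coordinate returns a hardware-interpolated value; it uses the CUDA texture reference API of that era.

```
#include <cstdio>
#include <cuda_runtime.h>

// 1D float texture; with cudaFilterModeLinear the hardware interpolates
// between neighboring texels during the fetch.
texture<float, 1, cudaReadModeElementType> tableTex;

__global__ void lookup(float *out, float pos) {
    out[0] = tex1D(tableTex, pos);   // hardware-interpolated table lookup
}

int main() {
    float h_table[4] = {0.0f, 10.0f, 20.0f, 30.0f};

    cudaChannelFormatDesc desc = cudaCreateChannelDesc<float>();
    cudaArray *d_array;
    cudaMallocArray(&d_array, &desc, 4);
    cudaMemcpyToArray(d_array, 0, 0, h_table, sizeof(h_table),
                      cudaMemcpyHostToDevice);

    tableTex.filterMode = cudaFilterModeLinear;   // enable hardware lerp
    cudaBindTextureToArray(tableTex, d_array);

    float *d_out, h_out;
    cudaMalloc((void **)&d_out, sizeof(float));
    // Texel centers sit at i + 0.5, so coordinate 2.0 lands halfway between
    // entries 1 and 2 and should return roughly 15.
    lookup<<<1, 1>>>(d_out, 2.0f);
    cudaMemcpy(&h_out, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("interpolated value: %f\n", h_out);

    cudaUnbindTexture(tableTex);
    cudaFreeArray(d_array);
    cudaFree(d_out);
    return 0;
}
```

The interpolation weights are computed in low-precision fixed point, which is exactly the accuracy trade-off the session's analysis addresses.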
2082 CU-LSP: GPU-based Spectral Analysis of Unevenly Sampled Data Standard FFT algorithms cannot be applied to spectral analysis of unevenly sampled data. Alternative approaches scale as O(N^2), making them an ideal target for harnessing the raw computing power of GPUs. To this end, I have developed CU-LSP, a CUDA spectral analysis code based on the Lomb-Scargle periodogram. Preliminary benchmarking indicates impressive speed-ups, on the order of 400 relative to a single core of a modern CPU. An initial application of CU-LSP will be the analysis of time-series data from planet-search and asteroseismology satellites.

Richard Townsend University of Wisconsin-Madison Astronomy & Astrophysics Watch now FLV MP4  
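The O(N^2) structure maps naturally onto one thread per trial frequency. The kernel below is a straightforward sketch of the textbook Lomb-Scargle periodogram, not CU-LSP itself, driven by a tiny synthetic data set.

```
#include <cstdio>
#include <cmath>
#include <vector>

// One thread evaluates the Lomb-Scargle periodogram at one trial frequency,
// the natural parallelization for unevenly sampled, mean-subtracted series.
__global__ void lomb_scargle(const float *t, const float *y, int n,
                             const float *freq, float *power, int nfreq) {
    int k = blockIdx.x * blockDim.x + threadIdx.x;
    if (k >= nfreq) return;

    float w = 2.0f * 3.14159265f * freq[k];

    // Time offset tau that decouples the sine and cosine terms.
    float s2 = 0.0f, c2 = 0.0f;
    for (int j = 0; j < n; ++j) {
        s2 += sinf(2.0f * w * t[j]);
        c2 += cosf(2.0f * w * t[j]);
    }
    float tau = 0.5f * atan2f(s2, c2) / w;

    float cy = 0.0f, sy = 0.0f, cc = 0.0f, ss = 0.0f;
    for (int j = 0; j < n; ++j) {
        float c = cosf(w * (t[j] - tau));
        float s = sinf(w * (t[j] - tau));
        cy += y[j] * c;  sy += y[j] * s;
        cc += c * c;     ss += s * s;
    }
    power[k] = 0.5f * (cy * cy / cc + sy * sy / ss);
}

int main() {
    const int n = 64, nfreq = 32;
    std::vector<float> t(n), y(n), f(nfreq), p(nfreq);
    for (int j = 0; j < n; ++j) {             // unevenly sampled sine wave
        t[j] = j + 0.3f * sinf(0.7f * j);
        y[j] = sinf(2.0f * 3.14159265f * 0.1f * t[j]);
    }
    for (int k = 0; k < nfreq; ++k) f[k] = 0.01f + 0.01f * k;

    float *dt, *dy, *df, *dp;
    cudaMalloc((void **)&dt, n * sizeof(float));
    cudaMalloc((void **)&dy, n * sizeof(float));
    cudaMalloc((void **)&df, nfreq * sizeof(float));
    cudaMalloc((void **)&dp, nfreq * sizeof(float));
    cudaMemcpy(dt, t.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(df, f.data(), nfreq * sizeof(float), cudaMemcpyHostToDevice);

    lomb_scargle<<<1, nfreq>>>(dt, dy, n, df, dp, nfreq);

    cudaMemcpy(p.data(), dp, nfreq * sizeof(float), cudaMemcpyDeviceToHost);
    printf("power at f = %.2f: %f\n", f[9], p[9]);   // near the injected signal
    return 0;
}
```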
2099 Cosmology Powered by GPUs Redux Cosmological simulations aim at reproducing the physical processes which occur on the largest scales of the Universe since the Big Bang by means of numerical calculations on supercomputers. Using CUDA, I have implemented standard cosmological techniques on GPU architecture (PM N-Body solver, Hydrodynamics & moment-based radiative transfer) and designed them to run on supercomputing facilities by means of MPI+CUDA mixed programming. These applications are able to run on 100 or more graphics devices with typical accelerations of 50x over scalar code and with a communication overhead limited to 15%. They allow us to explore physical regimes which were out of reach of previous simulations. Dominique Aubert Strasbourg University Astronomy & Astrophysics Watch now FLV MP4
2108 Binary Black Holes Simulations using CUDA Get the latest information on how to evolve binary black holes simulations on GPUs. Abdul Mroue CITA, Univ. Of Toronto Astronomy & Astrophysics Watch now FLV MP4  
2178 Using GPUs to Track Changes in the Sun Learn how GPU computing is enabling astrophysicists to study our closest star. NASA's recently launched Solar Dynamics Observatory is continuously streaming full-disk images of the Sun at visible, UV and EUV wavelengths.  This presentation will discuss ways that GPU computing is helping scientists cope with the analysis of the immense data volumes as well as in numerical modeling of the Sun. Mark Cheung Lockheed Martin Solar & Astrophysics Laboratory Astronomy & Astrophysics Watch now FLV   PDF
2042 Interactive 3D Audio Rendering Systems  Learn how to leverage GPUs for interactive audio rendering.  This session will give a short overview of the architecture of current GPUs, emphasizing some key differences between GPU and CPUs programming models for audio processing. We will illustrate the benefits of GPU-accelerated audio rendering with results from 3D audio processing and sound scattering simulations. Finally, we will discuss best practices for GPU implementations as well as future opportunities for audio rendering on massively parallel architectures. Nicolas Tsingos Dolby Laboratories Audio Processing Watch now FLV MP4 PDF
2076 Implementing CUDA Audio Networks Learn how to implement a commercial software library that exploits CUDA for audio applications. We focus on the overall threading architecture and the underlying math for implementing general purpose audio processing on CUDA devices.  Covers the use of inter-process communication to make a plug-in implementation loadable in 32-bit hosts installed on 64-bit systems, distributing the GPU load on remote servers, and creating a CUDA network for high-end purposes such as a big recording facility. Giancarlo Del Sordo Acustica Audio Audio Processing Watch now FLV MP4 PDF
2116 Real-time Multichannel Audio Convolution Learn how a synthesis of 3D sound scenes can be achieved using a peer-to-peer music streaming environment and GPU.  We will discuss the technical and cost benefits to this approach, while noting that it frees the CPU for other tasks. Jose Antonio Belloch, Alberto Gonzalez, Antonio M. Vidal Institute of Telecommunications and Multimedia Applications, Universidad Politecnica de Valencia Audio Processing Watch now FLV MP4 PDF
2026 MatCloud: Accelerating Matrix Math GPU Operations with SaaS We present MatCloud (www.mat-cloud.com), a cloud infrastructure and service for scientific computing using state-of-the-art GPU clusters. MatCloud is a service infrastructure exposed by a simple web terminal interface to run Matlab-like commands/scripts. Join us to see how GPU technology can not only be applied to the cloud computing community, but also boost the adoption of cloud computing for its dramatic performance gains over traditional cloud infrastructures. MatCloud is an in-progress academic project and is under active development.
Xing Wu, Frank Mueller North Carolina State University Cloud Computing Watch now FLV MP4  
2243 Microsoft RemoteFX - GPU Virtualization for Desktop Centralization Learn about Microsoft's upcoming GPU Virtualization feature, RemoteFX, which will ship in Windows Server 2008 R2 SP1. Microsoft RemoteFX enables GPUs to be hosted in the datacenter as a service that can be shared by multiple users for streaming the real-time and complete Windows 7 desktop experience to ultra-lightweight client devices anywhere on the corporate network. With Microsoft RemoteFX, users will be able to work remotely in a Windows Aero desktop environment, watch full-motion video, enjoy Silverlight animations, and run 3D applications – all with the fidelity of local-like performance. Tad Brockway Microsoft Cloud Computing Watch now FLV    
2022 Solving PDEs on Regular Grids with OpenCurrent OpenCurrent is an open source library with support for structured 3D grids and various PDE solvers that operate on them, including a multigrid Poisson solver and an incompressible Navier-Stokes solver.  It also includes extensions for splitting grids across multiple GPUs.  This talk will provide a basic introduction to the code base and its design principles. Jonathan Cohen NVIDIA Research Computational Fluid Dynamics Watch now FLV MP4  
2037 Numtech & GPGPU, a SME Point of View Hear why and how Numtech, a French SME working in the field of atmospheric dispersion and expertise of meteorological events, is benchmarking GPGPU for its future applications. Compressible and incompressible interactive flow solvers are described. Vivien Clauzon   Computational Fluid Dynamics Watch now FLV MP4
2045 Roe-Pike Scheme for 2D Euler Equations Hear how we are improving our elsA and CEDRE computational fluid dynamics software by working on solving the Euler equations set on the GPU.  We discuss how our implementation considers the associated Riemann problem and the Roe-Pike differencing scheme at several orders in space while also introducing immersed boundary conditions.  Covers the significant speedup obtained through algorithmic and computational optimizations. Matthieu Lefebvre ONERA Computational Fluid Dynamics Watch now FLV MP4
2049 Deflated Preconditioned Conjugate Gradient on the GPU Explore how to use deflation as a second level preconditioning technique to speed up Block Incomplete Cholesky Preconditioned Conjugate Gradient Method.  We use it to solve the Pressure correction equation involved in the solution of the Two-Phase Fluid Flow problem.  Our implementation reaches speedup factors between 25-30, for more than 260,000 unknowns, when compared to the CPU. Rohit Gupta, Kees Vuik Delft University Of Technology Computational Fluid Dynamics Watch now FLV MP4 PDF
2058 A Practical Introduction to Computational Fluid Dynamics on GPUs  Learn step-by-step procedures to write an explicit CFD solver based on finite difference methods with staggered grid allocations and boundary-fitted coordinates.  We will discuss the derivation of the mathematical model, discretization of the model equations, development of the algorithms, and parallelization and visualization of the computed data using OpenCL and OpenGL.  Compares case studies of natural convection, driven cavity, scaling analysis, and magneto-thermal convection computed using CSIRO's CPU/GPU supercomputer cluster to known analytical and experimental solutions. Tomasz Bednarz, Con Caris, John Taylor  CSIRO Computational Fluid Dynamics Watch now FLV MP4 PDF
2078 Shockingly fast and accurate CFD simulations In the last three years we have demonstrated how GPU accelerated discontinuous Galerkin methods have enabled simulation of time-dependent, electromagnetic scattering from airplanes and helicopters.  In this talk we will discuss how we have extended these techniques to enable GPU accelerated simulation of supersonic airflow as well. Timothy Warburton Rice University Computational Fluid Dynamics Watch now FLV MP4  
2079 A Fast, Scalable High-Order Unstructured Compressible Flow Solver  We will describe a scalable and efficient high-order unstructured compressible flow solver for GPUs. The solver achieves arbitrary order of accuracy for flows over complex geometries. High-order solvers require more operations per degree of freedom, thus making them highly suitable for massively parallel processors. Preliminary results indicate speed-ups of up to 70x with the Tesla C1060 compared to the Intel i7 CPU. Memory access was optimized using shared and texture memory. David M. Williams, Patrice Castonguay Stanford University Computational Fluid Dynamics Watch now FLV MP4
2083 GPU Accelerated Solver for the 3D Two-phase Incompressible Navier-Stokes Equations  This demonstrates the potential of GPUs for solving complex free surface flow problems using level set methods. These methods are capable of producing complex surface deformations, and therefore are used widely in computer graphics, as well as engineering applications. This work demonstrates that GPUs can be used to accelerate the most computationally expensive part of free surface flow calculations, and therefore allows much larger problems to be solved on workstation machines than was previously possible.  These techniques will be exemplified by our current project to port our in-house fluid solver NaSt3DGPF to the GPU. Peter Zaspel University of Bonn Computational Fluid Dynamics Watch now FLV    
2103 Development of an Efficient GPU-Accelerated Model for Fully Nonlinear Water Waves This work is concerned with the development of an efficient high-throughput scalable model for simulation of fully nonlinear water waves (OceanWave3D) applicable to solve and analyze large-scale problems in coastal engineering. The goal can be achieved through algorithm redesign and parallelization of an optimized sequential single-CPU algorithm based on a flexible-order Finite Difference Method. High performance is pursued by utilizing many-core processing in the model focusing on GPUs for acceleration of code execution. This involves combining analytical methods with an algorithm redesign of the current numerical model.  Allan Peter Engsig-Karup Technical University of Denmark Computational Fluid Dynamics Watch now FLV MP4  
2106 Particleworks: Particle-based CAE Software on Multi-GPU Prometech Software, Inc. is a university-launched technology venture in Japan and has been working in the field of particle-based computational fluid dynamics for several years.  Through collaborations with major automotive and material companies in Japan, Prometech has implemented its particle technology on multiple GPUs and delivered it as the CAE software "Particleworks".  In this session, we will discuss the theoretical background of our simulation (MPS; Moving Particle Simulation method), multi-GPU programming techniques for the sparse matrix solver, performance results of Particleworks, and analysis examples from the automotive and materials industries. Issei Masaie Prometech Software, Inc. Computational Fluid Dynamics Watch now FLV MP4
2110 Acceleration of a Novel Rotorcraft Wake Simulation Dive deep as we present the details of a new CUDA-based algorithm for accurate rotorcraft wake simulations.  We use a vortex particle method, accelerated with a multipole tree algorithm, combined with a traditional grid-based CFD code.  This CUDA algorithm can evaluate the velocity and velocity-gradient with an effective throughput approaching 300 billion interactions per second on a C1060.  This gives 10x speed-up and 2.5x better accuracy compared to the parallel CPU version. Christopher Stone Intelligent Light Computational Fluid Dynamics Watch now FLV MP4  
2118 Large-scale Gas Turbine Simulations on GPU Clusters This talk describes a strategy for implementing structured grid PDE solvers on GPUs. Techniques covered include the use of source-to-source compilation and the use of sparse matrix vector multiplications for complicated boundary conditions. A new production-quality solver for flows in turbomachines called Turbostream that uses these techniques is presented. The impact of the use of GPUs on the turbomachinery design process is demonstrated by two 64-GPU simulations that  have recently been performed on the University of Cambridge's GPU cluster. Tobias Brandvik University of Cambridge Computational Fluid Dynamics Watch now FLV MP4  
2170 Lattice Boltzmann Multi-Phase Simulations in Porous Media using GPUs Learn how a very efficient implementation of multiphase lattice Boltzmann methods (LBM) based on CUDA delivers significant benefits for predictions of properties in rocks.  This simulator on NVIDIA hardware enables us to perform pore scale multi-phase (oil-water-matrix) simulations in natural porous media and to predict important rock properties like absolute permeability, relative permeabilities, and capillary pressure.  We will show videos of these simulations in complex real world porous media and rocks. Jonas Toelke Ingrain Computational Fluid Dynamics Watch now FLV   PDF
2206 Accelerated Computational Fluid Dynamics Employing GPUs None provided. Daniel Gaudlitz FluiDyna Computational Fluid Dynamics Watch now FLV MP4  
2234 Unstructured Finite Volume Code on a Cluster with Multiple GPUs per Node Explore how a code written to run in parallel using OpenMP and on a single GPU was modified to run across multiple GPUs and nodes on a multi-CPU, multi-GPU cluster installed at the Naval Research Laboratory.  We will discuss the performance of this code running in parallel using MPI/OpenMP and MPI/CUDA. Keith Obenschain, Andrew Corrigan Naval Research Lab Code 6440 Computational Fluid Dynamics Watch now FLV   PDF
2239 Fast GPU Preconditioning for Fluid Simulations in Film Production Explore how a less efficient, but highly parallel algorithm can still be a superior alternative to a sequential CPU method. This talk will present a simple CUDA-based Poisson solver built on the conjugate gradient method, designed for solving well-conditioned matrices such as those that arise from the pressure projection stage of a Navier-Stokes fluid solver. In contrast to other active areas of research in this field, we show that a more brute-force approach can still significantly out-perform the best CPU alternatives by sacrificing a high convergence rate in exchange for much faster iterations. Dan Bailey Double Negative Computational Fluid Dynamics Not Available FLV   PDF
2292 Implementation of High-Order Adaptive CFD Methods on GPUs A discontinuous high-order formulation named the Correction Procedure via Reconstruction (CPR) has recently been implemented on NVIDIA GPUs. The CPR formulation is related to the discontinuous Galerkin (DG) method, and unifies several methods such as the DG, spectral volume and spectral difference methods into a single framework efficient for hybrid meshes. In preliminary 2D inviscid flow computations, a single GPU has been able to deliver a speedup of 44 over a CPU of the same generation. The extension to viscous flow computation is under way, and results will be presented in the final presentation. Z.J. Wang, Lizandro Solano, Arun Somani Iowa State University Computational Fluid Dynamics Watch now FLV
2295 Large-scale CFD Applications and a Full GPU Implementation of a Weather Prediction Code on the TSUBAME Supercomputer Many CFD applications have been successfully accelerated on GPUs, but for large-scale simulations that require memory beyond a single GPU, communication is required between GPUs over cluster nodes through PCI-Express and interconnects. To overcome performance bottlenecks and preserve parallel scalability, an overlapping technique between computation and communication is essential. This work presents results of an LBM for incompressible flow, and a Tsunami simulation solving the shallow water equation for simulations on the NVIDIA Tesla-based TSUBAME supercomputer of Tokyo Tech. In addition results will be presented on a complete GPU implementation of a production-level weather prediction code developed by the JMA that achieves 15 TFLOPS for an 80-fold speedup. Takayuki  Aoki Tokyo Institute of Technology Computational Fluid Dynamics Watch now FLV MP4  
2056 Next-Generation Rendering with CgFX Dive into the details of using CgFX – Cg’s effect framework – to combine ray-tracing with real-time rendering and enable the next generation of complex high-quality rendering. You will learn how to use CgFX to create complex rendering effects in a concise and elegant fashion by blending material-level and scene-level effects in a consistent way, seamlessly integrating CUDA-based data processing within the CgFX rendering pipeline, and mixing OptiX-based rendering with CgFX and OpenGL.
Tristan Lorach NVIDIA Computer Graphics Watch now FLV MP4  
2071 Large Scale Visualization Soup The unprecedented realism that is possible today allows for visualization at an ever larger scale.  This talk will walk through several case studies, from high resolution single displays to completely immersive environments.  Details will be shared on how to architect and implement these installations, with attention to the typical issues encountered.  It will cover how to implement stereo 3D in OpenGL and Direct3D, as well as how that relates to the different display technologies (projectors, multi-display, CAVEs, etc.).
Steve Nash NVIDIA Computer Graphics Watch now FLV MP4 PDF
2129 Hardware Subdivision and Tessellation of Catmull-Clark Surfaces See how the new DirectX 11 Hardware Tessellation and Compute Shader can be used to implement an adaptive Catmull-Clark subdivision surface renderer. We use a table driven approach to performing Catmull-Clark subdivision in parallel utilizing one thread per output mesh vertex. Charles Loop Microsoft Research Computer Graphics Watch now FLV MP4 PDF
2134 Ultra High Resolution Displays and Interactive Eyepoint Using CUDA We'll go over the challenges we have overcome in building 100 million pixel seamless displays. One customer requirement involves interactive changes of the eyepoint as a person moves relative to the screen, yet the distortions computed are quite non-linear. We discuss our use of a GPU to implement this procedure. Rajeev Surati Scalable Display Technologies Computer Graphics Watch now FLV MP4
2152 Using Virtual Texturing to Handle Massive Texture Data A virtual texture implementation allows applications the ability to manage gigantic amounts of texture data for rendering complex data sets. However, practical utilization involves feeding it adequate data. The GPU offers a powerful engine capable of accelerating the transcoding of efficient storage formats into formats useful for rendering. This session will demonstrate a virtual texturing implementation and the steps needed to GPU accelerate the non-rendering portions of managing and loading the virtual texture data. Evan Hart, Johannes van Waveren NVIDIA, id Software Computer Graphics Watch now FLV MP4 PDF
2161 NVIDIA Quadro Digital Video Pipeline Overview This session will provide an overview of the Quadro Digital Video Pipeline. It will cover a description of the DVP components, application architectures, software architectures, and programming resources available.

Thomas True NVIDIA Computer Graphics Watch now FLV MP4  
2162 Real-time Reyes: Programmable Rendering on Graphics Processors We present a discussion of ideas and techniques behind programmable graphics pipelines on modern GPUs, specifically the example design of a real-time Reyes renderer. Walking through this example, we address the philosophy beneath programmable GPU graphics, the broad strategy for the specific pipeline, and algorithmic and implementation-level details for key rendering stages. We cover several issues concerning GPU efficiency, including those involving work scheduling, parallelization of traditional stages, and balancing of rendering workloads. We expect the audience to gain an in-depth exposure to the state of research in programmable graphics, and an insight into efficient pipeline design for irregular workloads. Anjul Patney, Stanley Tzeng University of California, Davis Computer Graphics Watch now FLV
2165 Rendering Revolution Learn how GPU technologies are transforming the making of pixels. This talk will cover GPU-centric rendering techniques that leverage both the raw computational capabilities of NVIDIA’s GPUs and advanced pixel-shading techniques for interactive visualization and rendering. Ken Pimentel Autodesk Computer Graphics Watch now FLV MP4 PDF
2227 OpenGL 4.0 Tessellation for Professional Applications The new generation of accelerated graphics is elevating visual computing to new heights. Tessellation, one of its most anticipated features, is already used in many scenarios to bring 3D graphics to an unprecedented level of realism.

This talk will introduce tessellation using OpenGL 4.0. We will also describe how an existing application can be adapted to efficiently take advantage of this new feature and also how to overcome some of the challenges.

Philippe Rollin NVIDIA Computer Graphics Watch now FLV MP4 PDF
2308 Building Cutting-Edge Realtime 3D Applications with NVIDIA SceniX Learn how NVIDIA SceniX is a rapid start to building state of the art, realtime 3D applications, and how raytracing can be combined with raster graphics for new levels of interactive realism. Brian Harrison, Michael Morrison NVIDIA Computer Graphics Watch now FLV    
2029 Computer Vision Algorithms for Automating HD Post-Production Discover how post-production tasks can be accelerated by taking advantage of GPU-based algorithms. In this talk we present computer vision algorithms for corner detection, feature point tracking, image warping and image inpainting, and their efficient implementation on GPUs using CUDA.  We also show how to use these algorithms to do real-time stabilization and temporal re-sampling (re-timing) of high definition video sequences, both common tasks in post-production. Benchmarking of the GPU implementations against optimized CPU algorithms demonstrates a speedup of approximately an order of magnitude. Hannes Fassold JOANNEUM RESEARCH Computer Vision Watch now FLV MP4 PDF
2065 Massively Accelerating Iterative Gauss-Newton Fitting To measure three-dimensional shape data of objects, we build up a measurement system that assigns three-dimensional coordinates to the position of projected measurement labels in a camera image. To achieve high measurement accuracy across large numbers of measurement points, we need a very quick routine to localize measurement labels with high precision. To speed up the computation, we evaluate the fits using the CUDA architecture. The final implementation speeds up the fitting of 104 two-dimensional Gauss functions by a factor of 90. Daniel Härter University of Freiburg, IMTEK,  Laboratory for Process Technology Computer Vision Watch now FLV MP4
2114 Cascaded HOG on GPU We propose a real-time HOG-based object detector implemented on the GPU. To accelerate the detection process, the proposed method uses two serially-cascaded HOG detectors. The first, low-dimensional HOG detector discards detection windows that obviously do not show target objects, reducing the computational cost of the second, high-dimensional HOG detector. The method was tested on 640x480 color images and video of the same size. The computation time decreases to 70ms per image, which is 4 times faster than a single-detector approach. The method provides real-time performance even on mid-range GPUs such as the GeForce GTS 250. Kento Tarui AquaCast Corporation Computer Vision Watch now FLV MP4
2123 Enabling Augmented Reality with GPU Computing This talk will take a detailed look at Sportvision's “First and 10” system, perhaps the most widely experienced example of AR ever, with 106 million viewers during the 2010 Superbowl alone. We'll examine the current implementation and the GPU features that enable low latency, video-rate performance. Ryan Ismert Sportvision, Inc. Computer Vision Watch now FLV MP4  
2132 Accelerating Biologically Inspired Computer Vision Models Join us for a discussion on applying commodity-server-based clusters and GPU-based clusters to simulating computer vision algorithms at a scale that approaches that of biological vision. We consider the limitations of each technology, survey approaches taken thus far, and suggest new hybrid models and programming frameworks to overcome current limitations and substantially improve performance. Tom Dean Google Inc. Computer Vision Watch now FLV MP4 PDF
2173 Enabling Large-Scale CCTV Face Recognition  Learn how to use CUDA and GPGPU to perform large scale face search for both forensics as well as CCTV face recognition. Abbas Bigdeli, Ben Lever NICTA Computer Vision Watch now FLV MP4  
2204 Bridging GPU Computing and Neuroscience to Build Large-Scale Face Recognition on Facebook Biologically-inspired computer vision algorithms – those that aim to mirror the computations performed by the brain's visual system – have emerged as exceptionally promising candidates in object and face recognition research, achieving strong performance on a range of object and face recognition tasks. Recently, we have begun harnessing the newly-available power of NVIDIA GPUs to tackle the problem of biologically-inspired model selection within a large-scale model search framework, drawing inspiration from high-throughput screening approaches in molecular biology and genetics where a large number of organisms are screened in parallel for a given property of interest.  As the available computational power provided by massively parallel technology from NVIDIA continues to expand, we hope that this research will hold great potential for new social networking applications in addition to rapidly accelerating progress in artificial vision, and for generating new, experimentally testable hypotheses for the study of biological vision.
Nicolas Pinto, David Cox MIT, Harvard University Computer Vision Watch now FLV    
2209 Accelerating Computer Vision on the Fermi Architecture GPUs have evolved from fixed function to general purpose, and continue to evolve with new features being added in every generation. This talk will discuss how to exploit the new features introduced by the Fermi architecture (such as concurrent kernel execution and writes to textures) to accelerate computer vision algorithms. James Fung NVIDIA Computer Vision Watch now FLV MP4
2215 Extending OpenCV with GPU Acceleration OpenCV is a widely popular computer vision library, with millions of downloads and hundreds of thousands of users. Applications span many industries including robotics, industrial machine vision, automotive, film & broadcast, medical, and consumer applications.  NVIDIA and the OpenCV development team are collaborating to provide CUDA implementations of the most demanding algorithms, thus enabling a new level of real-time capability and higher quality results.
This talk will introduce OpenCV, summarize the new CUDA-enabled capabilities, and provide an overview of future plans.
 
Joe Stam NVIDIA Computer Vision Watch now FLV MP4  
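For context, the GPU-accelerated functions live in OpenCV's gpu module (cv::gpu in the 2.x series). The hedged sketch below assumes that module is built with CUDA support; it simply uploads an image, converts it to grayscale on the GPU, and downloads the result.

```
#include <opencv2/opencv.hpp>
#include <opencv2/gpu/gpu.hpp>
#include <cstdio>

int main(int argc, char **argv) {
    if (argc < 2) { printf("usage: %s image\n", argv[0]); return 1; }

    cv::Mat src = cv::imread(argv[1]);               // load on the host
    if (src.empty()) { printf("could not read %s\n", argv[1]); return 1; }

    cv::gpu::GpuMat d_src, d_gray;
    d_src.upload(src);                               // copy image to the GPU
    cv::gpu::cvtColor(d_src, d_gray, CV_BGR2GRAY);   // color conversion on GPU

    cv::Mat gray;
    d_gray.download(gray);                           // copy result back
    printf("converted %dx%d image on the GPU\n", gray.cols, gray.rows);
    return 0;
}
```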
2242 Swarming Bacteria and Diffusing Particles: High-Throughput Analysis of Microscopic 3D Motion Ever since the 1827 discovery of Brownian motion by observing pollen grains, quantifying motion under the microscope has led to breakthroughs in physics, biology and engineering. Here, I present methods we have developed using confocal microscopy to deduce 3D structure and dynamics from 2D image sequences. We analyze the motion of diffusing colloidal particles and swarms of bacteria free to swim in 3D, which we observe at the single-organism level. We rely heavily on GPU computing to process our large data sets, making extensive use of NPP, CuFFT and optical-flow CUDA algorithms originally developed for machine vision in automobiles. Peter Lu Harvard University Computer Vision Watch now FLV    
2298 Accelerated Image Quality Assessment using Structural Similarity Explores the GPU porting and performance analysis of the image quality assessment algorithm based on the structural similarity index (SSI). This index is a powerful tool for image quality assessment, and the algorithm is highly suitable for the GPU architecture, offering rapid image quality assessment in many image restoration applications. Mahesh Khadtare CRL India Computer Vision Watch now FLV
2069 GPU-Accelerated Business Intelligence Analytics Join us and learn why GPU computing is a game changer for business intelligence (BI).  We will discuss how GPUs can be used to accelerate BI analytics at much lower cost, higher performance, and better power efficiency than other alternatives. Ren Wu HP Labs Databases & Data Mining Watch now FLV MP4  
2092 Integrating CUDA into a  Large-Scale Commercial Database Management System In a large-scale database installation where data tables are distributed across multiple servers, computational throughput can be optimized by using GPUs on each server and integrating database management with GPU resources.  In the Department of Physics and Astronomy at The Johns Hopkins University, we are experimenting with a set of software tools that closely couple SQL statements with GPU functionality. While still under development, the new framework is now routinely used in our research projects, e.g., to study the spatial clustering of galaxies as well as genomics. Richard Wilton, Tamas Budavari, Alex Szalay The Johns Hopkins University Databases & Data Mining Watch now FLV MP4  
2237 Accelerating Business Intelligence Applications with Fast Multidimensional Aggregation In this research session, we present an approach using NVIDIA GPUs as massively parallel coprocessors for in-memory OLAP computations. Early tests have shown speedup factors of more than 40x compared to optimized sequential algorithms on a CPU. In addition to the data structures and algorithms involved, we describe a method to extend the approach to systems with more than one GPU in order to scale it to larger data sets. Tobias Lauer, Christoffer Anselm University of Freiburg, Jedox AG Databases & Data Mining Watch now FLV    
2013 iray - GPUs and the Photorealistic Rendering Revolution Hear about the ongoing revolution in the production of photorealistic imagery being powered by GPUs.  We will explore the algorithms and concepts behind iray – a CUDA accelerated software library from mental images/NVIDIA that provides an interactive, push-button, fast synthetic digital camera in software to a variety of OEM applications and platforms.  We will demonstrate iray embedded in commercial CAD and Digital Content Creation applications as well as in 3D cloud computing platforms. Michael Kaplan, Tamrat Belayneh mental images/NVIDIA, ESRI Digital Content Creation (DCC) Watch now FLV MP4  
2222 Working Man's Guide to 3D Video Editing Video editing is currently at two simultaneous inflection points: the use of GPUs for video processing and the beginning of widespread adoption of 3D. At this time, however, identifying and navigating through the necessary tools and equipment to create compelling 3D video content is challenging.  This session is intended to provide a pragmatic guide to creating prosumer 3D video content and how the GPU greatly assists and speeds up this process. The intended audience is anyone interested in how to create compelling 3D movies at a prosumer level.
Ian Williams, Kevan O'Brien NVIDIA Digital Content Creation (DCC) Watch now FLV MP4 PDF
2279 Working Man's Guide to 3D Video Editing Video editing is currently at two simultaneous inflection points: the use of GPUs for video processing and the beginning of widespread adoption of 3D. At this time, however, identifying and navigating through the necessary tools and equipment to create compelling 3D video content is challenging. This session is intended to provide a pragmatic guide to creating prosumer 3D video content and how the GPU greatly assists and speeds up this process. The intended audience is anyone interested in how to create compelling 3D movies at a prosumer level.

Ian Williams, Rudy Sarzo, Kevan O'Brien NVIDIA, SMI, NVIDIA Digital Content Creation (DCC) Watch now PDF
2305 PantaRay: Accelerating Out-Of-Core Ray Tracing of Sparsely Sampled Occlusion  Modern VFX rendering pipelines are faced with major complexity challenges: a film like Avatar requires rendering hundreds of thousands of frames, each containing hundreds of millions or billions of polygons. Furthermore, the process of lighting requires many rendering iterations across all shots. In this talk, we present the architecture of an efficient out-of-core ray tracing system designed to make rendering precomputations of gigantic assets practical on GPUs. The system we describe, dubbed PantaRay, leverages the development of modern ray tracing algorithms for massively parallel GPU architectures and combines them with new out-of-core streaming and level of detail rendering techniques.  David Luebke, Sebastian Sylwan NVIDIA, Weta Digital Digital Content Creation (DCC) Not Available      
2175 Hello GPU: High-Quality, Real-Time Speech Recognition on Embedded GPUs In this presentation, we will talk about our experiences of implementing an end-to-end automatic speech recognition system that runs faster than real time on embedded GPUs, targeted towards small form-factor consumer devices. Focusing specifically on some of the challenges encountered during the design process, a major portion of our talk will focus on giving insights into modifications we made to well-established speech algorithms to fit well within the GPU programming model. We will show how these changes helped us in realizing a highly optimized system on platforms with limited memory bandwidth and compute resources. Kshitij Gupta UC Davis Embedded & Automotive Watch now FLV MP4 PDF
2303 Using Tegra to Solve The Electric Car Power Dilemma Explore how advanced SoC technologies are transforming the automotive industry. Learn how using NVIDIA Tegra increased the available range while pushing the envelope on the next-gen driver experience. We share lessons learned in the world of electric cars and the challenges of constructing a mass-production electric vehicle. Theo Valich Bright Side Network Inc Embedded & Automotive Watch now FLV MP4 PDF
2304 Harnessing the GPU to Accelerate Automotive Development Learn how GPU technologies broke speed limits in automotive development. By using GPU-accelerated tools, a small team of engineers created a complete, certifiable vehicle in only two years, using a fraction of the budget typical in the industry. The talk will cover the tools and techniques used in the creation of the XD concept, as well as how to overcome the challenges of moving a product from concept to the mass-production stage.  Theo Valich Bright Side Network Inc Embedded & Automotive Watch now FLV MP4
2014 Scalable Subsurface Data Visualization Framework Mental Images’ DiCE-based geospatial library is a CUDA and cluster-based visualization framework that enables scalable processing and rendering of huge amounts of subsurface data for interactive seismic interpretation.
Geospatial exploration in the oil and gas industries is concerned with scanning the earth’s subsurface structure for detecting oil and for cost-effective drilling of detected oil reservoirs.  Efficient seismic interpretation requires the interpreters to be able to interactively explore huge amounts of volumetric seismic information with embedded stacked horizons to gain visual insight into the subsurface structure and to determine where oil recovery facilities and drilling infrastructure shall be built.
Tom-Michael Thamm, Marc Nienhaus mental images GmbH Energy Exploration Watch now FLV MP4  
2059 Industrial Seismic Imaging on GPUs At Hess Corporation, we have moved the most computationally intensive parts of our seismic imaging codes from CPUs to GPUs over the past few years.  In this talk I will give an overview of seismic imaging, highlighting the physical and computational algorithms of these codes.  I will discuss our software approach and the programming effort to port them to GPUs, concluding with a summary of our progress in adopting GPUs in production. Scott Morton Hess Corporation Energy Exploration Watch now FLV MP4  
2141 Moving the Frontier of Oil and Gas Exploration and Production with GPUs Learn how the oil and gas industry is embracing GPUs in order to tackle new and complex oil and gas plays around the world. The first part of this talk gives an overview of the business and geopolitical drivers of the industry, followed by the critical contribution of computation in the quest for a secure supply of energy. Maurice Nessim, Shashi Menon Schlumberger Energy Exploration Not Available
2142 Complex Geophysical Imaging Algorithms Enabled by GPU Technology Learn how computationally expensive geophysical methods involving hundreds of terabytes of data become a commercial reality through the adoption of GPUs. The first part of the talk will give an overview of the imaging challenges facing the oil and gas industry. The second part will show how the most advanced current methods are taking advantage of GPU technology. David Nichols Schlumberger Energy Exploration Not Available
2174 Reverse Time Migration on GPUs Learn how GPUs can be used to accelerate subsurface imaging for Oil & Gas exploration.  We will discuss results and lessons learned while implementing a Reverse Time Migration algorithm on GPUs achieving significant performance improvements over a comparable CPU implementation. Alex Loddoch Chevron Energy Exploration Not Available      
2226 Reverse Time Migration with GMAC Get a close look at implementing Reverse Time Migration (RTM) applications across multiple GPUs.  We will focus on how RTM applications can be scaled using the GMAC asymmetric distributed shared memory (ADSM) library to break the problem into manageable chunks.  We will provide an introduction to GMAC and discuss handling boundary conditions and using separate kernels to improve efficiency. Javier Cabezas, Mauricio Araya Barcelona Supercomputing Center Energy Exploration Watch now FLV    
2072 GPUs at the Computer Animation Studio Learn five simple ways in which GPUs have been adopted in the production pipeline at Blue Sky Studios.  Covers how we use GPUs to improve animation tools, add real-time anaglyph support, and accelerate noise functions including code samples from production tools. Hugo Ayala Blue Sky Studios Film Not Available      
2125 Developing GPU Enabled Visual Effects For Film And Video  The arrival of fully programmable GPUs is now changing the visual effects industry, which has traditionally relied on CPU computation to create its spectacular imagery. Implementing the complex image processing algorithms used by VFX is a challenge, but the payoffs in terms of interactivity and throughput can be enormous. Hear how The Foundry's novel image processing architecture simplifies the implementation of GPU-enabled VFX software and eases the transition from a CPU-based infrastructure to a GPU-based one. Bruno Nicoletti The Foundry Film Watch now FLV PDF
2284 GPU Implementation of Collision-Based Deformation Addressing the production needs of the upcoming Disney animated movie, we are developing a new Maya deformer that incorporates state-of-the-art collision-based deformations. Our deformer includes both dynamic and quasi-static solutions. Our solvers conserve volume and constrain surface area by solving linear systems in a graded volume mesh. To achieve realistic deformation on production-ready data at interactive rates, we leverage the computational power of the NVIDIA GPU architecture using CUDA. Our underlying data structure is specifically designed and optimized for CUDA (e.g., coalesced data access, minimized CPU-GPU interaction, use of shared memory). Dmitriy Pinskiy, Garrett Aldrich Walt Disney Animation Studios Film Not Available
2285 Walt Disney Animation Studios' GPU-Accelerated Animatic Lighting Process with Soft Shadows and Depth of Field See how Walt Disney Animation's software uses OpenGL and GLSL shaders to interactively display depth of field, accurate lighting, and soft shadows in the Maya viewport. Learn how this improved our animatic process and helps us make better animated movies.

We'll show the tools in action, follow the progression of a shot from standard Maya to the final animatic look, and compare the result with a production RenderMan render. We'll also walk you through the GLSL shader render passes used for deferred lighting and shadowing.

David Adler Walt Disney Animation Studios Film Not Available      
2032 Practical Methods Beyond Monte Carlo in Finance Murex will share its practical experience using GPUs to accelerate high-performance analytics based on GPU-enabled Monte Carlo and PDE methods.  We will also briefly describe Murex’s experience developing a high-level payoff scripting language that allows user-definable payoffs for single and cross-asset instruments. Pierre Spatz Murex SAS Finance Watch now FLV MP4 PDF
2033 Accelerating Pricing Models with Virtual GPUs Join Citadel to explore our three-year investigation into the feasibility of GPGPU computing for option pricing.  We will discuss our 140X performance boost and the hurdles we had to overcome to integrate GPUs into our existing infrastructure.  Please note that our talk will not get into the details of the model (that’s proprietary information), but we will share our innovative solution to drive a grid of virtual GPUs. Scott Donovan Citadel Investment Group Finance Watch now FLV MP4 PDF
2040 Derivatives & Bond Portfolio Valuation in a Hybrid CPU/GPU Environment Learn how to compute traditional end of day computations in real time through the use of a hybrid GPU/CPU computing environment.  We will detail how computing intensive tasks are delegated to the GPU while interface issues are dealt with by the CPU.  We will discuss our methodology consisting of the following three components: (1) valuations; (2) by tenor risk measures; and (3) full distributions allowing for more complex analytics such as exotic options products valuation and counterparty value adjustments calculation. Peter Decrem Quantifi Finance Watch now FLV MP4  
2063 Banking on Monte Carlo… and Beyond Last year NAG presented spectacular results for Monte Carlo techniques on GPUs using NAG’s GPU library.  This year we will talk about new projects in the areas of Monte Carlo and PDE techniques, delivering additional benefits to the finance industry for real-world problems, including credit modeling. Ian Reid NAG Finance Watch now FLV   PDF
2064 Correlated Paths for Monte Carlo Simulations Learn how the GPU can be deployed to generate correlated paths for Monte Carlo simulation. Using Asian basket options as an example, the session shows the generation of correlated paths with a local volatility model for each of the underlying assets. Once the paths have been computed, the payoff in each scenario is computed and reduced to determine the expected value, all on the GPU. Thomas Bradley NVIDIA Finance Watch now FLV MP4 PDF
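The correlated-path idea in session 2064 can be illustrated with a short CUDA sketch. The kernel below is a hypothetical, minimal example (not the presenter's code): each thread simulates one scenario of a two-asset basket, correlating the normal draws with a hard-coded 2x2 Cholesky factor and using a constant volatility in place of a full local-volatility surface. All names and parameters are assumptions for illustration.

// Minimal sketch: one thread per scenario, two correlated assets, Asian basket payoff.
#include <curand_kernel.h>
#include <math.h>

__global__ void asian_basket_paths(float *payoffs, int nScenarios, int nSteps,
                                   float s0, float r, float sigma, float dt,
                                   float rho, float strike, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nScenarios) return;

    curandState rng;
    curand_init(seed, tid, 0, &rng);

    // 2x2 Cholesky factor of the correlation matrix [[1, rho], [rho, 1]]
    float l21 = rho;
    float l22 = sqrtf(1.0f - rho * rho);

    float s1 = s0, s2 = s0, avg = 0.0f;
    float drift = (r - 0.5f * sigma * sigma) * dt;
    float vol   = sigma * sqrtf(dt);

    for (int step = 0; step < nSteps; ++step) {
        float z1 = curand_normal(&rng);      // independent standard normals
        float z2 = curand_normal(&rng);
        float w1 = z1;                       // correlated draws via the Cholesky factor
        float w2 = l21 * z1 + l22 * z2;
        s1 *= expf(drift + vol * w1);
        s2 *= expf(drift + vol * w2);
        avg += 0.5f * (s1 + s2);             // running basket average
    }
    avg /= nSteps;
    payoffs[tid] = expf(-r * dt * nSteps) * fmaxf(avg - strike, 0.0f);
}

A subsequent reduction over the payoffs array (for example with a reduction kernel or thrust::reduce), divided by nScenarios, gives the Monte Carlo estimate of the expected value, all on the GPU as the abstract describes.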
2077 Catastrophic Risk Management:  Fast and Flexible with GPU Analytics RMS will describe our experience leveraging GPUs and simple software architectural principles to deliver both spectacular performance gains and enhanced flexibility in next generation portfolio risk management applications. Philippe Stephan RMS Finance Watch now FLV MP4  
2098 Enabling On Demand Value-At-Risk for Financial Markets Learn how financial market risk managers can increase their ability to preempt exposure limit breaching and tighten risk control to increase investor confidence. Gain insight into the techniques for obtaining high performance Monte-Carlo based market value-at-risk (VaR) estimates over a hierarchy of risk aggregation levels. This session will focus on how the new Fermi platform can be used by financial institutions to enable on-demand estimates of the market VaR, and discuss important software architecture decisions, the benefits of the new GigaThread Engine and Parallel DataCache, as well as the guiding principles for constructing efficient algorithms on GPUs. Matthew Dixon, Jike Chong UC Davis, Parasians, LLC Finance Watch now FLV MP4  
2101 Pricing American Options Using GPUs This presentation focuses on the challenging problem of Pricing High-Dimensional American Options (PHAO) and how GPUs can be applied to this task. On the one hand, we present a method based on Malliavin calculus that is effective on parallel architectures. On the other hand, we compare this method with the Longstaff-Schwartz method, which is better suited to sequential architectures. We will conclude with some ideas about parallelizing the former method on a cluster of machines, and finally we will discuss this method as a reformulation of a non-linear parabolic problem using BSDEs.
Lokman A. Abbas-Turki Paris-Est University  Finance Watch now FLV MP4  
2081 Morphing a GPU into a Network Processor Modern Internet routers must meet two conflicting objectives, high performance and good programmability, to satisfy ever-increasing bandwidth requirements under fast-changing network protocols. A few recent works have shown that GPUs have great potential to serve as the packet processing engine for software routers. However, the current GPU batched execution model cannot guarantee quality-of-service (QoS) requirements. In this work, we show how to convert a GPU into an effective packet processor through minimal changes to both the hardware architecture and the scheduling mechanism. Experimental results show that the new GPU architecture can meet stringent QoS requirements while maintaining high processing throughput. Yangdong Deng Tsinghua University General Interest Watch now FLV MP4
2214 Faster Simulations of the National Airspace System  Learn about twenty-four hour, fast-time simulations of traffic in the National Airspace System, which use GPU technology to help perform key steps in the trajectory prediction of flights.  GPUs enabled us to improve the runtime by up to two orders of magnitude versus the previously required tens of minutes per execution.  We will present a brief overview of the problem domain and a description of how the GPU has opened doors to uncharted research areas. Joseph Rios NASA General Interest Watch now FLV MP4 PDF
2223 Academic Welcome Social and Poster Preview This session is open to academic attendees only.  We invite you to join your fellow academics to preview this year’s NVIDIA Research Summit Posters and mingle with your colleagues.   Included will be a special presentation from our 2010-2011 Graduate Fellowship recipients to showcase the research that earned them this prestigious award.  These students were selected from 268 applications in 28 countries. Their research confronts a variety of challenges of immense technical and strategic importance, including light-transport simulation, computer vision, programmability and optimization for heterogeneous systems, and much more.  We believe these minds will lead the future of our industry. Ken Pimentel Autodesk General Interest Watch now FLV MP4
2262 CUDA Centers of Excellence Super-Session I  Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering.  In this session we will hear from Professor Hanspeter Pfister of Harvard University and Professor Jeff Vetter of Georgia Tech and Oak Ridge National Laboratory.  Hanspeter Pfister, Jeffrey Vetter Harvard University, Georgia Tech and ORNL General Interest Watch now FLV MP4
2263 CUDA Centers of Excellence Super-Session II Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering. In this session we will hear from Dr. Wei Ge of the Chinese Academy of Sciences, Professor Amitabh Varshney of the University of Maryland, and Adjunct Assistant Professor Stan Tomov of the University of Tennessee, Knoxville.  Stan Tomov, Amitabh Varshney, Wei Ge University of Tennessee, University of Maryland, Institute of Process Engineering, Chinese Academy of Sciences General Interest Watch now FLV MP4
2264 CUDA Centers of Excellence Super-Session III Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering. In this session we will hear from Dr. Wen-mei Hwu of the University of Illinois at Urbana-Champaign, Professor Yangdong Deng of Tsinghua University and Dr. Charles D. Hansen of the University of Utah.  Yangdong Deng, Charles Hansen, Wen-mei Hwu Tsinghua University, University of Utah, University of Illinois General Interest Watch now FLV MP4
2265 CUDA Centers of Excellence Super-Session IV Come hear about the groundbreaking research taking place at the CUDA Centers of Excellence, an elite group of world-renowned research universities that are pushing the frontier of massively parallel computing using CUDA. Researchers from these top institutions will survey cutting-edge research that is advancing the state of the art in GPU computing and dozens of application fields across science and engineering.  In this session we will hear from Professor Ting-wai Chiu of National Taiwan University, Dr. Satoshi Matsuoka of Tokyo Tech and Dr. Paul Calleja of the University of Cambridge.  Paul Calleja, Ting-Wai Chiu, Satoshi Matsuoka University of Cambridge, National Taiwan University, Tokyo Institute of Technology General Interest Watch now FLV MP4
2268 Think Data-Parallel! Building Data-Parallel Code with M Discover and leverage the parallelism inherent in pre-existing codes. Oftentimes, parallelism is hidden in seemingly serial programs, obscured by indexing or looping that makes it appear non-existent. Several real-world examples of seemingly serial code demonstrate simple yet surprisingly effective rules for detecting potential parallelism. For each example, learn how to express the code at a higher, more concise level in M by vectorizing computations. We give several canned vectorization techniques for many common, and sometimes very difficult, use cases. Learn how such vectorization concisely brings the parallelism of the code to the forefront and transforms programs that were originally difficult to run on a SIMT device into ones very suitable for execution on the GPU. GPU speedups will be shown utilizing Jacket.

Gallagher Pryor AccelerEyes General Interest Watch now FLV MP4  
2275 The Evolution of GPUs for General Purpose Computing Learn how the GPU evolved from its humble beginning as a “VGA Accelerator” to become a massively parallel general purpose accelerator for heterogeneous computing systems.  This talk will focus on significant milestones in GPU hardware architecture and software programming models, covering several key concepts that demonstrate why advances in GPU parallel processing performance and power efficiency will continue to outpace CPUs. Ian Buck NVIDIA General Interest Watch now FLV   PDF
2276 Using GPUs to Run Next-Generation Weather Models We are using GPUs to run a new weather model being developed at NOAA’s Earth System Research Laboratory (ESRL) called the Non-hydrostatic Icosahedral Model (NIM). NIM is slated to run at high resolution (4km global scale) within two years. This presentation will highlight work required to parallelize and run the NIM. We will describe progress running on multiple GPUs, report on our evaluation of two FORTRAN GPU compilers, and give performance updates of NIM using Fermi.  We will also discuss special challenges developing and running operational weather models on GPUs. Mark Govett NOAA Earth System Research Laboratory General Interest Watch now FLV MP4  
2306 Gate-Level Simulation with GP-GPUs Logic simulation is a critical component of the digital design tool flow. It is used from high-level descriptions down to gate level to validate several aspects of the design, particularly functional correctness. Despite development houses investing vast resources in the simulation task, it is still far from meeting the performance demands of validating complex modern designs at gate level. We developed a GP-GPU accelerated gate-level simulator using NVIDIA CUDA.

We leverage novel algorithms for circuit netlist partitioning, and our experimental prototype can handle large, industrial-scale designs comprising millions of gates while delivering a 13x speedup on average over a typical commercial simulator.

Debapriya  Chatterjee  University of Michigan General Interest Watch now FLV    
2309 Greater ROI with Green GDDR5 and LPDDR2 High-end graphics memory has been an essential ingredient in designing PC cards for many years, just as mobile DRAM has been a part of virtually all mobile devices since they were first developed. In the face of increasing upward pressures on power consumption, Green GDDR5 and Low Power mobile DDR2 (or LPDDR2) provide outstanding performance at exceptionally low power levels, for a greater return on investment in designing desktop and mobile devices, respectively. This Samsung presentation will provide an overview of Green GDDR5’s and Green LPDDR2’s power savings compared to other much less energy efficient alternatives. The presenter also will take a close look at how GDDR5 and LPDDR2 work to improve performance and extend battery life, while helping to substantially reduce electricity usage worldwide.  Jimmy Chung Samsung Semiconductor Inc. General Interest Watch now FLV    
2019 GPU-Accelerated Internet Technologies & Trends Join us for a whirlwind demo-punctuated tour of up-and-coming technologies that promise to bring GPU acceleration to the Worldwide Web.  We'll cover 2D graphics, 3D graphics and video.  In addition to summarizing the emerging standards and technologies, performance test results showing how they scale on various GPUs will be presented, along with recommendations for how to design for best performance.  Finally, adoption trends and ecosystem dynamics will be summarized.  Attendees should leave with a richer understanding of the possibilities enabled by the GPU-Accelerated Web, and new insights into when and how it will matter.  Chris Pedersen NVIDIA GPU Accelerated Internet Watch now FLV MP4 PDF
2060 GPUs in a Flash:  Mapping the Flash Animated Software Vector Rendering Model to the GPU Explore the Flash rendering architecture including the challenges of mapping from an animated software vector rendering model to a GPU.  We will also discuss how the landscape of mobile, desktop, devices, drivers, and APIs impacts the design and deployment of a GPU based Flash Player. Lee Thomason Adobe Systems GPU Accelerated Internet Watch now     PDF
2113 WebGL: Bringing 3D to the Web WebGL is a newly emerging standard for 3D graphics and visual computing on the web.  Supported and developed by major web browser vendors, WebGL enables rich interactive 3D graphics delivered through a web browser, on both desktop and mobile platforms.  This session will contain an introduction to WebGL, and will focus on application development issues unique to the web platform, optimization concerns, and how web technologies such as offline app support, HTML5 video and audio, File, and WebSockets integrate with WebGL.  Experienced OpenGL developers will learn how to transition their knowledge to WebGL development. Vladimir Vukicevic Mozilla Corporation GPU Accelerated Internet Watch now FLV MP4
2274 Harnessing the Power of the GPU in Internet Explorer 9 Internet Explorer 9 is bringing the power of modern GPUs to the Web. Thanks to hardware-accelerated graphics, the websites that you use every day become faster, and developers can create new classes of web applications that were previously not possible. This session will provide an inside look into how Internet Explorer was redesigned to leverage the GPU. We’ll show detailed performance results, discuss our architectural approach, and look at the impact of the GPU on HTML5. A session by engineers for engineers, with lots of fun demos. Jason Weber Microsoft GPU Accelerated Internet Watch now FLV MP4
2017 Lessons Learned Deploying the World’s First GPU-Based Petaflop System Learn what to expect when deploying PetaFLOP or larger systems.  The June 2010 list of the Top 500 computer systems featured the first GPU-based cluster to exceed 1 PetaFLOP of floating-point power -- a system that was built in a fraction of the time and at a fraction of the cost that a CPU-only system of that performance would have required.  We provide an overview of how system builders and administrators should prepare for large-scale HPC deployments. Dale Southard NVIDIA High Performance Computing Watch now FLV MP4 PDF
2052 Power Management Techniques for Heterogeneous Exascale Computing Power consumption has become the leading design constraint for large scale computing systems. In order to achieve exascale computing, system energy efficiency must be improved significantly. Our approach will focus on investigating software methodologies to achieve energy efficient computing on heterogeneous systems accelerated with GPUs.  Xiaohui Cui Oak Ridge National Laboratory  High Performance Computing Watch now FLV MP4  
2057 CUDA-Accelerated LINPACK on Clusters  This talk will illustrate the use of GPUs to accelerate the LINPACK benchmark on clusters with GPUs, where both the CPUs and the GPUs are used in synergy.  The acceleration is obtained by executing DGEMM (matrix multiply) and DTRSM (solution of triangular systems) calls simultaneously on both the GPU and the CPU cores.  Details of the implementation will be presented, together with results that show how effective the solution is in both performance and power efficiency.
Everett Phillips, Massimiliano Fatica NVIDIA High Performance Computing Watch now FLV MP4 PDF
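The GPU/CPU synergy described in session 2057 can be sketched in a few lines of hedged, hypothetical code (not the benchmark implementation itself): the columns of the update C = C - A*B are split so that cuBLAS works on the larger part asynchronously on the GPU while a host BLAS updates the remainder on the CPU cores, and the two meet at a synchronization point. All parameter names are assumptions; a single leading dimension ld is used for brevity.

// Hypothetical sketch of overlapping a GPU DGEMM with a CPU DGEMM via a column split.
// dA, dB, dC are device copies; hA, hB, hC are the corresponding host arrays (column-major).
#include <cublas_v2.h>
#include <cblas.h>
#include <cuda_runtime.h>

void hybrid_dgemm(cublasHandle_t handle, cudaStream_t stream,
                  int m, int n, int k, int nGpu,   // nGpu = columns of C updated on the GPU
                  const double *dA, const double *dB, double *dC,
                  const double *hA, const double *hB, double *hC, int ld)
{
    const double alpha = -1.0, beta = 1.0;

    // Launch the GPU portion asynchronously: the first nGpu columns of C.
    cublasSetStream(handle, stream);
    cublasDgemm(handle, CUBLAS_OP_N, CUBLAS_OP_N, m, nGpu, k,
                &alpha, dA, ld, dB, ld, &beta, dC, ld);

    // Meanwhile the CPU cores update the remaining n - nGpu columns.
    cblas_dgemm(CblasColMajor, CblasNoTrans, CblasNoTrans, m, n - nGpu, k,
                alpha, hA, ld, hB + (size_t)nGpu * ld, ld,
                beta,  hC + (size_t)nGpu * ld, ld);

    // Wait for the GPU portion before the factorization proceeds.
    cudaStreamSynchronize(stream);
}

Choosing nGpu so that the GPU and CPU portions finish at roughly the same time is the kind of load balancing this style of hybrid LINPACK relies on.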
2089 Analyzing CUDA Accelerated Application Performance at 20 PFLOP/s Learn how applications can be executed over multiple GPUs located in multiple hosts, what the challenges are to scale one application to a 20 PFLOP/s machine and why tool support is a necessity. Receive an overview on the available performance analysis tools that support CUDA developers in generating applications with outstanding speedups. Guido Juckeland, Jeremy Meredith TU Dresden - ZIH, Oak Ridge National Laboratory High Performance Computing Watch now FLV MP4  
2100 Hybrid GPU/Multicore Solutions for Large Linear Algebra Problems Large linear algebra problems may be solved using recursive block decomposition in which GPUs efficiently compute the sub-blocks and multicore CPUs put the sub-blocks back together within a large shared memory space.  This talk will present benchmark results for such a hybrid approach, implemented in Matlab® and using Jacket® to access the GPU compute power. Nolan Davis SAIC High Performance Computing Watch now FLV MP4  
2104 Rapid Prototyping Using Thrust: Saving Lives with High Performance Dosimetry Radiation poisoning is an ever-present danger for intervention teams that must visit nuclear sites. Virtual reality can help teams prepare for interventions, but efficient computation of radiation dosage is critical to study complex scenarios. Radiation protection research often uses codes based on the straight-line attenuation method. As with other approaches, the geometrical computations (finding all ray/object intersections) remain the simulation bottleneck. This talk will describe how we have used the Thrust high-level library for CUDA C/C++ to quickly prototype innovative algorithms and achieve a significant speedup. Guillaume Saupin Atomic and Alternative Energies Commission (CEA) High Performance Computing Watch now FLV MP4
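To give a flavor of the Thrust-style prototyping mentioned in session 2104, the snippet below is a hypothetical sketch (assumed names, not the CEA code): it applies a straight-line attenuation law, source strength S attenuated as exp(-mu * L) over each ray's path length L, with thrust::transform, then sums the per-ray contributions with thrust::reduce.

#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>

// Per-ray attenuation functor: source strength S attenuated over path length L
// through a material with linear attenuation coefficient mu.
struct attenuate {
    float S, mu;
    attenuate(float S_, float mu_) : S(S_), mu(mu_) {}
    __host__ __device__ float operator()(float L) const { return S * expf(-mu * L); }
};

float dose_at_point(const thrust::device_vector<float> &pathLengths, float S, float mu)
{
    thrust::device_vector<float> contrib(pathLengths.size());
    thrust::transform(pathLengths.begin(), pathLengths.end(),
                      contrib.begin(), attenuate(S, mu));          // per-ray dose on the GPU
    return thrust::reduce(contrib.begin(), contrib.end(), 0.0f);   // total dose at the point
}

The path lengths themselves would come from the ray/object intersection stage that the abstract identifies as the bottleneck; the point of the sketch is how little code the high-level Thrust primitives require.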
2117 Migration of C and Fortran Apps to GPGPU using HMPP GPGPU is a tremendous opportunity for many application fields.  Migrating legacy software to GPGPU is a complex process that requires mastering the technological risks (e.g. loss of code portability, extensive code restructuring, debugging complexity) as well as the costs. In this talk, we present a methodology based on HMPP (Heterogeneous Multicore Parallel Programming) that allows incremental processes which reduce the cost and risks of porting codes to GPGPU. Francois Bodin CAPS entreprise High Performance Computing Watch now FLV MP4
2119 Supercomputing for the Masses: Killer-Apps, Parallel Mappings, Scalability and Application Lifespan  Hear the latest on how supercomputing for the masses is changing the world.  We will look at some of the killer apps that run one to three orders of magnitude faster and see how they do it.  We will discuss specific mappings to GPGPU hardware and techniques for high performance and near-linear scalability both within and across multiple GPGPUs.  We will also consider software investment and the decades-long lifespan of successful massively parallel multithreaded software, along with scalability, balance metrics, the lack of consensus on programming models, and lifecycle considerations. Robert Farber PNNL High Performance Computing Watch now FLV MP4 PDF
2133 3D Full Wave EM Simulations Accelerated by GPU Computing 3D full-wave electromagnetic simulations of RF components, antennas, and printed circuit boards can be quite time-consuming.  The Computer Simulation Technology (CST) tool suite includes the capability to activate GPU computing.   Examples will be shown of using Tesla C1060 and S1070 configurations to provide significant performance improvements for complex simulations. Fabrizio Zanella CST of America High Performance Computing Watch now FLV MP4 PDF
2135 Processing Petabytes per Second with the ATLAS Experiment at the Large Hadron Collider at CERN Learn how GPUs could be adopted by the ATLAS detector at the Large Hadron Collider (LHC) at CERN. The detector, located at one of the collision points, must trigger on unprecedented data acquisition rates (PB/s) to decide whether to record each event or lose it forever. We begin by introducing the ATLAS experiment and the computational challenges it faces. The second part will focus on how GPUs can be used for algorithm acceleration, using two critical algorithms as exemplars. Finally, we will outline how GPGPU acceleration could be exploited and incorporated into the future ATLAS computing framework.

Philip Clark, Andrew Washbrook University of Edinburgh High Performance Computing Watch now FLV   PDF
2138 Faster, Cheaper, Better – Hybridization of Linear Algebra for GPUs Learn how to develop faster, cheaper and better linear algebra software for GPUs through a hybridization methodology that is built on (1) Representing linear algebra algorithms as directed acyclic graphs where nodes correspond to tasks and edges to dependencies among them, and (2) Scheduling the execution of the tasks over hybrid architectures of GPUs and multicore. Examples will be given using MAGMA, a new generation of linear algebra libraries that extends the sequential LAPACK-style algorithms to the highly parallel GPU and multicore heterogeneous architectures. Stan Tomov, Hatem Ltaief University of Tennessee High Performance Computing Watch now FLV MP4  
2147 GPGPU Development for Windows HPC Server Attend this demo-driven session to see how to schedule jobs to a Windows compute cluster that includes GPUs.  We will also demonstrate GPU-enhanced versions of some commonly used HPC open-source codes, and show how NVIDIA Parallel Nsight™ can be used to debug GPU applications on a cluster.  Provides a brief introduction to performance profiling tools that allow developers to analyze system, CPU and GPU events. Calvin Clark Microsoft High Performance Computing Not Available   MP4  
2153 CULA - A Hybrid GPU Linear Algebra Package Get the latest information on CULA, a library of hybrid GPU/CPU linear algebra routines optimized for NVIDIA GPUs. CULA launched at GTC2009 and has since gained large speedups and many new features. We will cover the features, performance, and inner workings, and how users can integrate CULA into their applications. New features for 2010 and 2011 will be in the spotlight, with exciting new developments for sparse matrices including general direct sparse solvers, iterative sparse solvers, and specialized block tridiagonal solvers. Learn how your existing linear algebra applications can benefit from a high quality library. Much more information is available at www.culatools.com and at our presentation and booth. John Humphrey EM Photonics, Inc High Performance Computing Watch now FLV MP4
2154 GPU Military Applications: Image Processing, Embedded Computing, and CFD Discover how different branches of the U.S. military are utilizing GPU accelerated solutions in mission-critical operations. This session will detail GPU-related projects that the engineers at EM Photonics have developed specifically for military applications. An image processing example will discuss how GPUs are being used to accelerate long-range battlefield surveillance to protect soldiers. Other military examples include low-power embedded GPU solutions utilized by UAVs and CFD simulations used to model complex interactions between vehicles at sea.   EM Photonics, Inc. High Performance Computing Watch now FLV    
2205 A Highly Reliable RAID System Based on GPUs While RAID is the prevailing method of creating reliable secondary storage infrastructure, many users desire more flexibility than offered by current implementations. To attain needed performance, customers have often sought after hardware-based RAID solutions. This talk describes a RAID system that offloads erasure correction coding calculations to GPUs, allowing increased reliability by supporting new RAID levels while maintaining high performance. Matthew Curry Sandia National Laboratories and the University of Alabama at Birmingham High Performance Computing Watch now FLV MP4  
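To make the GPU-offloaded coding in session 2205 concrete, here is a deliberately simplified, hypothetical kernel that computes single-parity (RAID-5-style) XOR across data blocks. The system described in the talk uses more general erasure-correcting codes (replacing XOR with Galois-field arithmetic), but the data-parallel structure, one thread per output word, is the same.

// Hypothetical illustration: XOR parity over nDisks data blocks of 'words' 32-bit words each.
// 'data' holds the blocks back to back; real erasure codes replace XOR with GF(2^8) arithmetic.
__global__ void xor_parity(const unsigned int *data, unsigned int *parity,
                           int nDisks, int words)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= words) return;

    unsigned int p = 0;
    for (int d = 0; d < nDisks; ++d)
        p ^= data[(size_t)d * words + i];   // coalesced: consecutive threads read consecutive words
    parity[i] = p;
}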
2208 Acceleration of SIMULIA’s Abaqus Solver on NVIDIA GPUs Learn about Acceleware's and Dassault Systemes' integrated solution that performs an LDL^T factorization on GPUs within the Abaqus software package.  We will discuss efficient GPU parallelization of the factorization algorithm and enabling the CPU and GPU to overlap their computations and data transfers.  Includes an end user simulation case study and GPU performance measurements including 300 GFlops in single precision and 145 GFlops in double precision on NVIDIA Tesla C2050. Chris Mason Acceleware High Performance Computing Watch now FLV MP4 PDF
2217 GPU-Based Conjugate Gradient Solvers for Lattice QCD Learn how to perform state-of-the-art quantum chromodynamics (QCD) computation using NVIDIA GPUs at 1% of the cost of a conventional supercomputer and 10% of its power consumption.  We will discuss how physicists around the world are using GPU clusters to solve QCD.  We will focus upon how TWQCD have been using a large GPU cluster (200 GPUs) to simulate QCD, attaining 36 Teraflops (sustained). Ting-Wai Chiu National Taiwan University High Performance Computing Watch now FLV    
2232 What If You Had a Petabyte of Memory and/or a Petaflop of Compute? (Sponsored by SGI) We will explore application spaces where GPU compute coupled with very large shared memory architectures and/or petaflops of compute are allowing new science or new business questions to be addressed. Bill Mannel SGI High Performance Computing Watch now FLV    
2233 Solving Your GPU Computing Needs (Sponsored by HP) In this session we will go into detail about HP’s GPU-enabled systems, from workstations to our GPU-enabled servers and clusters.  You will get the latest information on configurations, options, GPU management and use cases. Dave Korf, Will Wade HP High Performance Computing Not Available FLV MP4
2238 Better Performance at Lower Occupancy It is usually advised to optimize CUDA kernels for higher occupancy to hide memory and arithmetic latencies better. In this presentation, I show that increasing occupancy is not the only way, and not always the best way, to hide latency on the GPU. Instead, it may be advantageous to rely on the parallelism within threads: instruction-level parallelism. This insight yields a simple optimization technique that is used in later versions of CUBLAS and CUFFT. I discuss the rationale behind the technique and illustrate it by speeding up matrix multiplication, starting with the basic implementation found in the NVIDIA GPU Computing SDK.
Vasily Volkov UC Berkeley High Performance Computing Watch now FLV   PDF
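A minimal illustration of the thread-level ILP idea in session 2238 (a hypothetical sketch, not the speaker's benchmark code): each thread keeps several independent accumulators in registers, giving the hardware independent instructions to overlap memory and arithmetic latency even when occupancy is low.

// Hypothetical sketch: each thread processes ILP independent elements, so the scheduler has
// ILP independent instruction streams per thread with which to hide latency.
#define ILP 4

__global__ void scale_add_ilp(const float *x, const float *y, float *out, float a, int n)
{
    int base = blockIdx.x * blockDim.x * ILP + threadIdx.x;

    float r[ILP];
    #pragma unroll
    for (int j = 0; j < ILP; ++j) {          // independent, coalesced loads
        int i = base + j * blockDim.x;
        r[j] = (i < n) ? a * x[i] + y[i] : 0.0f;
    }
    #pragma unroll
    for (int j = 0; j < ILP; ++j) {          // independent stores
        int i = base + j * blockDim.x;
        if (i < n) out[i] = r[j];
    }
}

Because each block now covers blockDim.x * ILP elements, the kernel is launched with ILP times fewer threads, i.e. at lower occupancy, yet the execution units stay busy.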
2240 Accelerating LS-DYNA with MPI, OpenMP, and CUDA When solving implicit problems, the computational bottleneck in LS-DYNA is the multifrontal linear solver.  These operations are performed in double-precision arithmetic; hence, until the arrival of the Tesla C2050, experiments with GPU acceleration were only a curiosity.  This is no longer the case, and in this talk we will describe how LS-DYNA's hybrid (MPI and OpenMP) solver is further accelerated using GPUs to factor large dense frontal matrices. Bob Lucas USC High Performance Computing Watch now FLV
2247 Reconfiguring a Pool of GPUs on The Fly (Sponsored by NextIO) Today’s HPC applications break down large data set problems into smaller, independent elements solved by massively parallel processor systems. GPUs as co-processing devices are optimized for this task, and their popularity in technical computing is rapidly advancing. Like many rapidly advancing technologies, they leave in their wake new and challenging problems. In the effort to cut costs while increasing performance, damaging ripple effects can occur: resources can be over- or under-provisioned, inventory becomes difficult to manage, numerous single points of failure mean constant job interruptions, resources must be manually reconfigured for each job, and servicing and lifecycle management require outages. Most of these problems can be addressed and overcome by combining GPU resources into managed, structured pools. NextIO will present and demonstrate a new and innovative approach to consolidating and managing pools of NVIDIA GPU resources, along with the cost and operational savings associated with top-of-rack GPU consolidation appliances. K.C. Murphy NextIO High Performance Computing Watch now FLV MP4
2248 Parallel Processing on GPUs at the University of Utah The University of Utah is a CUDA Center of Excellence. We have been doing both basic and applied research using CUDA. In this session, we plan to give 3-4 talks on ongoing research. Most of the work that we will be presenting has been peer reviewed at top conferences. Huy Vo, Claudio Silva University of Utah High Performance Computing Watch now FLV
2270 Appro’s GPU Computing Solutions Learn how GPUs are changing the High Performance Computing landscape to deliver price/performance levels that were previously considered unachievable. Join Appro (http://www.appro.com), a leading provider of supercomputing solutions, to discuss the introduction of the Appro Tetra server, the most powerful GPU server available today in a 1U form factor, and the availability of a new modular GPU expansion blade, both based on NVIDIA Tesla 20-series GPUs. The availability of these two products confirms Appro’s commitment to providing the most innovative and powerful computing platforms at very attractive prices to the High Performance Computing markets.  John Lee Appro High Performance Computing Watch now FLV MP4
2273 GPUs In the Front Line of our Defenses (Sponsored by GE) Find out how GPUs are accelerating defense and aerospace applications and providing superior information processing to drive the next generation of capabilities to protect both homelands and soldiers.  Learn how rugged VPX hardware and software architectures are able to scale from small power- and weight-constrained vehicles through to large, complex processing arrays, on platforms as diverse as unmanned aerial vehicles (UAVs), tracked ground vehicles, and shipborne radar. Simon Collins GE Intelligent Platforms High Performance Computing Watch now FLV
2280 TSUBAME2.0 Experience Tsubame2.0 is the next-generation multi-petaflops supercomputer designed and built at Tokyo Tech, with more than 4000 NVIDIA Fermi GPUs, as a successor to the highly successful Tsubame1. Deep design considerations were made, based on experience with Tsubame1 retrofitted with previous-generation Tesla GPUs, to maximize the versatility and competitiveness of the system across a considerable number of application domains, as well as to accommodate as much strong scaling as possible. This resulted in a totally new custom system design, in collaboration with HP and NEC, rather than a machine with retrofitted GPUs. The resulting supercomputer will, we hope, become a design template for future large-scale GPU systems. Satoshi Matsuoka Tokyo Institute of Technology High Performance Computing Watch now FLV
2283 500 Teraflops Heterogeneous Cluster The HPC Affiliated Resource Center (ARC) will host a very large interactive HPC system.  The large cluster (CONDOR) will integrate Cell Broadband Engine processors, GPGPUs and powerful x86 server nodes, with a combined capability of 500 Teraflops.  Applications will include neuromorphic computing, video synthetic aperture radar backprojection, matrix multiplications, and others.  This presentation will discuss progress on performance optimization using the heterogeneous cluster and lessons learned from this research. Mark Barnell Air Force Research Lab (AFRL) High Performance Computing Watch now FLV MP4 PDF
2286 Towards Peta-Scale Green Computation - Applications of the GPU Supercomputers in the Chinese Academy of Sciences (CAS) China now holds three spots in the June 2010 Top500 list of GPU-based supercomputers, and two of them, using NVIDIA GPUs, are related to CAS. Efficient use of these systems is more important than peak or Linpack performance. This session will cover some of the large-scale multi-GPU applications in CAS, ranging from molecular dynamics below nano-scale to complex flows on meter-scale and porous media on geological scales, as well as fundamental linear algebra and data/image analysis. The idea of keeping high-efficiency and generality of the computation platform by maintaining a consistency among the target physical system, the computational model and algorithm, and the computer hardware will be explained in detail and demonstrated through a number of super-computing applications in the chemical, oil, mining, metallurgical and biological industries. Wei Ge, Xiaowei Wang, Yunquan Zhang, Long Wang Institute of Process Engineering, Chinese Academy of Sciences, Institute of Process Engineering, Institute of Software, CAS, Super Computing Center, Institute of Computer Network Information of CAS High Performance Computing Watch now FLV    
2287 Internal GPUs on Dedicated x16 Slots - Are They Needed For HPC? (Sponsored by Dell) We have benchmarked the real performance impact on a series of GPU accelerated applications to understand the benefits and drawbacks of different system level configurations.  Come hear about the effects on performance of GPUs in shared slots and of GPUs that are externally connected. Mark Fernandez Dell High Performance Computing Watch now FLV    
2293 Scaling Up and Scaling Out GPUs with Supermicro's Twin™ Architecture (Sponsored by Supermicro) Find out how Supermicro scales up and scales out GPU performance by using the Twin™ architecture.  In this session, we outline Supermicro's Twin™ architecture advantages across 1U/2U GPU servers and the design of personal supercomputers, and how we are able to scale and optimize GPU technology for datacenter environments and for professional workstations. Don Clegg Super Micro Computer, Inc. High Performance Computing Watch now FLV
2301 GPU Cluster Computing: Accelerating Scientific Discovery We propose holding a research roundtable focused on using GPU clusters to support scientific research. The roundtable will bring together researchers who have recently deployed, or are interested in deploying, GPU clusters to enable scientific research.  At the research roundtable they will be able to share their experiences deploying this new technology and discuss its future in supporting research to tackle the world’s most challenging scientific problems.

To open discussion we will provide a brief presentation about the deployment of CSIRO's latest supercomputer cluster, which is among the world's first to combine traditional CPUs with more powerful NVIDIA GPUs and which provides a world-class computational and simulation science facility to advance priority CSIRO science.

John Taylor CSIRO High Performance Computing Watch now FLV    
2302 Microsoft Technologies for High Performance Computing (Sponsored by Microsoft) NVIDIA Parallel Nsight provides access to the power of the GPU from within the familiar environment of Microsoft Visual Studio. In this session, we will expand on the computational power of Visual Studio 2010, Windows HPC Server and the Technical Computing Libraries and show how to increase your performance.   Calvin Clark Microsoft High Performance Computing Not Available      
2051 GPGPU in Commercial Software: Lessons From Three Cycles of the Adobe Creative Suite Learn about leveraging GPUs for commercial software.  We will discuss lessons learned creating and using the Adobe Image Foundation libraries to accelerate image and video processing using GPUs and multi-core.  These libraries are used by most of Adobe's applications as well as integrated by hobbyist and professional applications with different levels of experience with GPUs and diverse user bases. Kevin Goldsmith Adobe Systems, Incorporated Imaging Watch now FLV MP4 PDF
2093 Computational Photography: Real-Time Plenoptic Rendering Get the latest information on GPU-based plenoptic rendering including a demonstration of refocusing, novel view generation, polarization, high dynamic range, and stereo 3D.  Learn how GPU hardware enables plenoptic rendering tasks with high-resolution imagery to be performed interactively, opening up entirely new possibilities for modern photography. Andrew Lumsdaine, Georgi Chunev, Todor Georgiev Indiana University, Indiana University, Adobe Systems Imaging Watch now FLV MP4 PDF
2145 Photo Editing on the GPU with MuseMage See how MuseMage greatly accelerates image processing and editing while providing real-time feedback by harnessing the power of GPUs.  We will discuss the majority of MuseMage tools which are fully implemented on GPUs. Kaiyong Zhao, Yubo Zhang HKBU, UC Davis Imaging Watch now FLV MP4 PDF
2300 High-Performance Compressive Sensing using Jacket This talk will present ongoing work in the L1-optimization group at Rice University. The purpose of the work is to merge compressive sensing, for image/signal reconstruction, with GPU computation, using NVIDIA GPUs to enhance CS technology.
This talk will cover basic concepts in compressive sensing and how readily they can be adapted to run on the GPU, in particular working with Jacket (by AccelerEyes). We will then cover some of our numerical experiments, which encompass the use of different flavors of algorithms.
Nabor Reyna Rice University Imaging Watch now     PDF
2007 Folding@home: Petaflops on the Cheap Today; Exaflops Soon? Learn how Folding@home has used petascale computing with GPUs to make fundamental breakthroughs in computational biology and how this technology can make an impact in your work. Vijay Pande Stanford University Life Sciences Watch now FLV    
2030 High-Throughput Cell Signaling Network Learning with GPUs Explore how GPUs are being used to enable high-throughput cell signaling network discovery and data-intensive computational systems biology more generally. Systems biology is transitioning from a largely reductive discipline to one focused on building predictive models of large-scale biological systems. New instrumentation will provide the necessary raw data for such an approach; the key challenge now is building the hardware and software tools to efficiently and interactively build these models. This session will describe how GPUs can and will play a key role in these efforts. Michael Linderman Stanford University Life Sciences Watch now FLV MP4
2034 Reformulating Algorithms for the GPU  Important applications in signal processing, data processing and bioinformatics that use dynamic programming are difficult to parallelize due to intrinsic data dependencies.  We demonstrate a novel technique to extract parallelism out of data-dependent algorithms and reformulate them for GPUs. This simple technique breaks the dependencies and resolves them at an optimal point later in time, thus obtaining remarkable speedups on GPUs. We present a case study from computational biology, namely protein motif finding. We also present how the same technique can be extended and applied to other relevant problems such as gene prediction and phylogenetics.
Narayan Ganesan, Michela Taufer University of Delaware Life Sciences Watch now FLV MP4  
2055 Application of Fermi GPU to Flow Cytometry and Cancer Detection Learn how a Tesla C2050 enabled scientists to explore cancer data sets 400 times faster than a PC-only implementation.  Discusses how the results of this work may lead to better diagnostics for detecting leukemia in blood cells. Robert Zigon Beckman Coulter Life Sciences Watch now FLV MP4 PDF
2088 Nucleotide String Matching Using CUDA-Accelerated Agrep Dive deep into the intelligent use of the various CUDA memory spaces to remarkably speed up an approximate DNA/RNA nucleotide sequence matching algorithm in bioinformatics, by a factor of 67 compared to a multi-threaded quad-core CPU counterpart. Our talk provides a good example of how to use an indexable array to keep frequently updated variables in GPU registers, how to organize shared memory as a 2D array to avoid bank conflicts, and how to arrange the data structure to satisfy the requirements for coalesced global memory access. Our CUDA implementation employs an online approach and can be applied in real time. Hongjian Li The Chinese University of Hong Kong Life Sciences Watch now
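The three memory-space tricks listed in session 2088 can be sketched generically (hypothetical names, not the authors' kernel): a small per-thread array with unrolled, constant indexing that the compiler can keep entirely in registers, a shared-memory tile declared as a 2D array with a padding column so that column accesses do not collide on a bank, and a staging loop in which consecutive threads read consecutive addresses so global loads coalesce. The matching logic itself is omitted.

#define TILE_ROWS 32
#define TILE_COLS 32

__global__ void match_kernel(const char *text, int n)
{
    // Per-thread state in a small fixed-size array; with unrolled constant indexing
    // the compiler can place it in registers rather than spilling to local memory.
    unsigned int state[4];
    #pragma unroll
    for (int k = 0; k < 4; ++k) state[k] = 0u;

    // Shared-memory staging area declared as 2D with one padding column so that
    // warp accesses down a column fall into different banks.
    __shared__ char tile[TILE_ROWS][TILE_COLS + 1];

    // Coalesced load: threads with consecutive threadIdx.x read consecutive bytes.
    int base = blockIdx.x * TILE_ROWS * TILE_COLS;
    for (int r = 0; r < TILE_ROWS; ++r) {
        int idx = base + r * TILE_COLS + threadIdx.x;
        if (threadIdx.x < TILE_COLS && idx < n)
            tile[r][threadIdx.x] = text[idx];
    }
    __syncthreads();

    // ... approximate-matching logic over 'tile', updating 'state', would follow here ...
}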
2105 CUDA-FRESCO: An Efficient Algorithm for Mapping Short Reads Learn about CUDA-FRESCO and how it addresses issues with MUMmerGPU.  We will detail how CUDA-FRESCO overcomes MUMmerGPU's problems processing reads with errors or mismatches and delivers additional performance beyond MUMmerGPU's 5-12x speedup with less than 100bp query length. Chun-Yuan Lin Department of CSIE, Chang Gung University Life Sciences Watch now FLV MP4  
2115 Modified Smith-Waterman-Gotoh Algorithm for CUDA Implementation It is axiomatic that computational throughput can be increased by exploiting the parallelism of GPU hardware, but what if the computational algorithm is not easy to implement in parallel?  We have modified one such algorithm, the Smith-Waterman-Gotoh dynamic programming algorithm for local sequence alignment, so as to make it more amenable to data-parallel computation.  The result is a successful CUDA implementation that fully exploits GPU parallelism. Richard Wilton The Johns Hopkins University Life Sciences Watch now FLV MP4
2172 Unveiling Cellular & Molecular Events of Cardiac Arrhythmias  George Mason University is using CUDA technology to get a 20x speed-up in simulations of intracellular calcium dynamics, thought to play a major role in the generation of cardiac arrhythmias.  We will discuss the novel algorithms we have developed for Markov Chain Monte Carlo Simulation and their use in investigating elementary events of calcium release in the cardiac myocyte.  The resulting extremely fast simulation time has generated new insights into how defects in the control of intracellular calcium may lead to cardiac arrhythmia. Tuan Hoang-Trong George Mason University Life Sciences Watch now FLV MP4 PDF
2203 Modeling Evolution: Computing the Tree of Life Learn how GPUs are being used to accelerate our understanding of the tree of life. This session will cover BEAGLE, which is an open API and library for evaluating phylogenetic likelihoods of biomolecular sequence evolution. BEAGLE uses novel algorithms and methods for evaluating phylogenies under arbitrary molecular evolutionary models on GPUs, making use of the large number of processing cores to efficiently parallelize calculations. Daniel Ayres University of Maryland Life Sciences Watch now FLV MP4
2046 Efficient Automatic Speech Recognition on the GPU Learn how the GPU is able to meet the challenges of implementing automatic speech recognition (ASR), and gain insights into data-parallel implementation techniques that can provide 10x faster performance compared to sequential ASR on a CPU. The state-of-the-art algorithm for ASR performs a graph traversal on a large, irregular graph with millions of states and arcs, guided by speech input known only at runtime. We present four generalizable techniques: a dynamic data-gather buffer, find-unique, lock-free data structures using atomics, and hybrid global/local task queues. Used together, these techniques can effectively resolve ASR implementation challenges on a GPU. Jike Chong Parasians, LLC Machine Learning & Artificial Intelligence Watch now FLV MP4 PDF
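One of the four techniques named in session 2046, a lock-free structure built on atomics, can be sketched as follows (a simplified, hypothetical version, not the presenter's code): producers reserve a slot with atomicAdd on a shared tail counter and write their item there, so many threads can emit a variable number of active states without any locks.

// Simplified lock-free output queue: each thread appends items by atomically
// reserving slots in a preallocated global buffer.
struct Queue {
    int          *items;     // preallocated storage, 'capacity' elements
    unsigned int *tail;      // running count of claimed slots
    unsigned int  capacity;
};

__device__ void queue_push(Queue q, int item)
{
    unsigned int slot = atomicAdd(q.tail, 1u);   // reserve a unique slot, lock-free
    if (slot < q.capacity)
        q.items[slot] = item;                    // write without any critical section
}

__global__ void expand_states(Queue q, const int *activeStates, int nActive)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nActive) return;

    int state = activeStates[tid];
    // In a real decoder each state would emit the destinations of its outgoing arcs;
    // here we simply emit the state itself as a placeholder.
    queue_push(q, state);
}

The hybrid global/local variant mentioned in the abstract typically accumulates items in shared memory per block first and claims one global slot range per block, reducing contention on the single global counter.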
2091 The GPU in the Reactive Control of Industrial Robots Universal Robotics is using GPUs for real-time visual sensing in the reactive control of industrial robots.   Working in a complex, dynamic environment to achieve a loosely specified goal, such as moving arbitrary boxes from a pallet to a conveyor, requires reactivity.  Reactive control requires intensive, concurrent, low-latency computation for motion planning, exception handling, and sensing.  We describe and demonstrate how GPU-based computation enables visual servoing and box moving.  We also discuss the potential of the GPU to solve more difficult sensory problems such as multi-robot cooperation, multimodal sensor binding, attention, sensitization, and habituation. Dr. Alan Peters Universal Robotics, Inc. Machine Learning & Artificial Intelligence Watch now FLV MP4
2207 Playing Zero-Sum Games on the GPU A zero-sum game is a match in which one player's gain is the other's loss. Tic-Tac-Toe, Checkers and Chess are examples of zero-sum board games. To find the best move, the game is abstracted as a tree, often quite deep, consisting of all possible configurations. We present an efficient GPU implementation of the Mini-Max search algorithm, enhanced with Alpha-Beta pruning. We highlight the challenges of deploying the non-tail recursion of a highly irregular algorithm on GPUs, proposing a hybrid of compiler- and user-managed stacks. We demonstrate superior performance when running many thousands of 3D Tic-Tac-Toe matches simultaneously. Avi Bleiweiss NVIDIA Corporation Machine Learning & Artificial Intelligence Watch now FLV PDF
2001 Acceleration of the Freesurfer Suite for Neuroimaging Analysis See how GPU technology has dramatically accelerated the Freesurfer suite of tools used by thousands of researchers for the analysis of neuroimaging data. Richard Edgar Mass. General Hospital Medical Imaging & Visualization Watch now FLV MP4 PDF
2009 4D Visualization and Analysis of Flow 4D flow or vector data is now common in CFD simulations as well as in acquisition techniques like 4D flow MRI used to study abnormal blood flow patterns. We show how, by mixing compute and graphics combined with stereo, we are now able to interactively analyze and visualize the resulting data to understand abnormal flow patterns. Topics include flow field rendering, computing derived quantities, merging volumetric rendering with computed geometry such as particles and surfaces, and integration of 3D Vision stereo. Shalini Venkataraman NVIDIA Medical Imaging & Visualization Watch now FLV MP4 PDF
2036 Algorithms for Automated Segmentation of Medical Imaging Studies Utilizing CUDA Discover how GPU computing can help doctors make sense of modern imaging studies.  This session is intended for a general audience as well as medical informatics specialists.  The focus will be on algorithmic approaches to segmentation as it pertains to CTA (computed tomography angiography) studies.  Topics covered will include specialized optimization algorithms and novel lumen tracking methodologies.   Supratik Moulik University of Pennsylvania Medical Imaging & Visualization Watch now FLV MP4 PDF
2094 Nearly Instantaneous Reconstruction for MRIs GE’s Autocalibrating Reconstruction for Cartesian Imaging (ARC) is a computationally intensive, widely used algorithm in MRI Reconstruction using Parallel Imaging.  We demonstrate that an optimized CUDA implementation of ARC on a GPU can enable nearly instantaneous reconstruction and speedups of up to 10x over an optimized dual socket QuadCore CPU implementation. We will discuss challenges both with computational intensity and data read/write efficiency. We will also compare the Fermi C2050 with the C1060. Srihari Narasimhan GE Global Research Medical Imaging & Visualization Watch now FLV MP4  
2096 High-Speed CT Reconstruction in Medical Diagnosis & Industrial NDT Applications We present the software platform CERA developed by Siemens, which utilizes (multiple) graphics processing units (GPUs) in order to deliver high-speed CT reconstructions, and describe its implementation challenges using CUDA and OpenCL. We further show how GPU acceleration enables the utilization of reconstruction approaches which provide highly improved reconstruction quality in NDT applications.  Holger Scherl Siemens AG Medical Imaging & Visualization Watch now FLV MP4  
2139 Interactive Histology of Large-Scale Biomedical Image Stacks  Get the latest information on leveraging GPU computing to process and visualize large-scale biomedical image stacks.  We will discuss both display-aware processing and GPU-accelerated texture compression for histology applications on the GPU. Won-Ki Jeong, Jens Schneider Harvard University, King Abdullah University of Science and Technology Medical Imaging & Visualization Watch now FLV MP4  
2144 Large-Scale Visualization Using A GPU Cluster Learn how to visualize extremely large-scale scientific data using GPGPU techniques on a GPU-accelerated visualization cluster. Recent advances in general-purpose GPU (GPGPU) computing provide a promising solution to compute-intensive scientific visualization. However, the largest scientific simulations produce datasets that are orders of magnitude larger than the memory available on current GPUs. Many distributed GPUs must be used in parallel. We present Longhorn, currently the world's largest GPU-enhanced cluster dedicated for visualization and data analysis, and describe the distributed HW/SW architecture to interactively visualize massive datasets. Furthermore, we discuss the techniques to optimize a CUDA isosurfacer and to accelerate isosurface extraction of extremely large-scale data using preprocessed metadata. Byungil Jeong, Paul Navratil TACC / UT-Austin, Texas Advanced Computing Center Medical Imaging & Visualization Watch now FLV    
2146 Virtual Surgery  Come see how 3D Vision technology is used in virtual surgery training for medical education.  BioDigital Systems, in conjunction with the University of California, San Francisco (UCSF), has developed a dental injection simulator to teach students of dentistry the mechanics of nerve block injection.  3D Vision technology has added a new dimension of realism by providing users with a unique immersive experience. Aaron Oliker BioDigital Medical Imaging & Visualization Watch now FLV
2169 Real-time Volumetric Medical Ultrasound Applications for GPU Computing Real-time volumetric medical ultrasound requires computationally intensive, rapid processing for visualization of acquired acoustic data. Clinical applications of GPU-based technologies in obstetrics and cardiology will be discussed. Roee Lazebnik Siemens Healthcare Medical Imaging & Visualization Watch now FLV
2201 A Case Study of Accelerating Matlab Based Applications using GPUs Learn how to accelerate MATLAB-based applications using GPUs. We cover a popular neuro-imaging software package called SPM and show how to use CUDA and Jacket to speed up computationally intensive MATLAB applications.   Aniruddha Dasgupta Georgia Institute of Technology Medical Imaging & Visualization Watch now
2211 Modern Architecture for Massively Parallel Medical Tomographic Image Reconstruction on a GPU Cluster Learn how to combine GPU and Cluster Programming with a real-world example. Many aspects of medical tomographic image reconstruction are embarrassingly parallel, but require massive compute power. We distribute the load onto a cluster of multi-GPU equipped nodes using Message Passing Interface (MPI) and CUDA. The Thrust library allows for a modern object-oriented approach. Sven Prevrhal, Jingyu Cui Philips, Stanford University Medical Imaging & Visualization Watch now      
2235 Advanced Medical Volume Rendering and Segmentation on the GPU Learn how to speed up your interactive medical visualization pipeline by an order of magnitude and dramatically improve rendering quality at the same time. Leading researchers in medical imaging informatics describe recent advances in volume visualization and interactive segmentation.

Emphasis is on the underlying parallel GPU algorithms and acceleration data structures.
Mike Roberts, Eric Penner Hotchkiss Brain Institute, University of Calgary, Canada Medical Imaging & Visualization Watch now FLV MP4  
2236 A Work-Efficient GPU Algorithm for Level Set Segmentation Explore a novel GPU level set segmentation algorithm that is both work-efficient and step-efficient.  Our algorithm has O(logn) step-complexity, in contrast to previous GPU algorithms which have O(n) step-complexity. We apply our algorithm to 3D medical images and we show that in typical clinical scenarios, our algorithm reduces the total number of processed level set field elements by 16x and is 14x faster than previous GPU algorithms with no reduction in segmentation accuracy. Mike Roberts Hotchkiss Brain Institute, University of Calgary, Canada Medical Imaging & Visualization Watch now      
2282 GPU-Enabled Biomedical Imaging The purpose of this presentation is to describe several novel biomedical imaging applications which make extensive use of GPUs.  In CT iterative reconstructions, for example, high performance computing is allowing us to see details and structures we previously were not able to discern. Homer Pien MGH / HMS Medical Imaging & Visualization Watch now      
2006 Short-Range Molecular Dynamics on GPU Learn how to accelerate short-range molecular dynamics using CUDA C. We will cover building the neighbor list and calculating the forces on the GPU. To handle the case where a few particles have significantly more neighbors than most other particles, we propose a hybrid data structure for the neighbor list that can achieve a good balance between performance and storage efficiency. A CUDA C implementation of the technique for Lennard-Jones forces can be found in the LAMMPS molecular dynamics open source code. Peng Wang NVIDIA Molecular Dynamics Watch now FLV MP4 PDF
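To make the approach in session 2006 concrete, here is a minimal sketch of a neighbor-list force kernel of the kind the abstract describes. It assumes a simple dense neighbor list laid out so that neighbor k of particle i sits at nbr[k * n + i], which keeps loads coalesced across consecutive threads; the names, layout and truncated Lennard-Jones force expression are illustrative and are not taken from the LAMMPS implementation.

    // Illustrative neighbor-list Lennard-Jones force kernel (not LAMMPS code).
    // Neighbor k of particle i is stored at nbr[k * n + i] so that a warp of
    // consecutive particles reads consecutive addresses.
    __global__ void lj_forces(const float4* pos, const int* nbr, const int* nbr_count,
                              float4* force, int n, float cutoff2,
                              float epsilon, float sigma2)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;

        float4 pi = pos[i];
        float3 f = make_float3(0.f, 0.f, 0.f);
        int count = nbr_count[i];

        for (int k = 0; k < count; ++k) {
            int j = nbr[k * n + i];
            float4 pj = pos[j];
            float dx = pi.x - pj.x, dy = pi.y - pj.y, dz = pi.z - pj.z;
            float r2 = dx * dx + dy * dy + dz * dz;
            if (r2 < cutoff2) {
                float s2 = sigma2 / r2;                 // (sigma/r)^2
                float s6 = s2 * s2 * s2;                // (sigma/r)^6
                float fscale = 24.f * epsilon * s6 * (2.f * s6 - 1.f) / r2;
                f.x += fscale * dx; f.y += fscale * dy; f.z += fscale * dz;
            }
        }
        force[i] = make_float4(f.x, f.y, f.z, 0.f);
    }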
2035 Simulations of Large Membrane Regions Learn how to study membrane-bound protein receptors by moving beyond the current state-of-the-art simulations that only consider small patches of physiological membranes. Towards this end, this session presents how to apply large-scale GPU-enabled computations of extended phospholipid bilayer membranes using a GPU code based on the CHARMM force field for MD simulations. Our code enables fast simulations of large membrane regions in NVT and NVE ensembles and includes different methods for the representation of the electrostatic interactions, i.e., reaction force field and Ewald summation (PME) methods. Performance and scientific results for dimyristoylphosphatidylcholine (PC) based lipid bilayers are presented.

Narayan Ganesan, Michela Taufer, Sandeep Patel University of Delaware Molecular Dynamics Watch now FLV MP4  
2054 NAMD, CUDA, and Clusters: Taking GPU Molecular Dynamics Beyond the Desktop A supercomputer is only as fast as its weakest link.  The highly parallel molecular dynamics code NAMD was one of the first codes to run on a GPU cluster when G80 and CUDA were introduced in 2007.  Now, after three short years, the Fermi architecture opens the possibility of new algorithms, simpler code, and easier optimization.  Come learn the opportunities and pitfalls of taking GPU computing to the petascale. James Phillips University of Illinois Molecular Dynamics Watch now FLV MP4  
2062 HOOMD-blue: Fast and Flexible Many-Particle Dynamics  See the newest capabilities and performance enhancements in HOOMD-blue, a general-purpose many-particle dynamics application written for GPUs. Speedups of 80-100x are attained for a wide range of simulation types. Topics for this presentation include an overview of HOOMD-blue, design and implementation details of the underlying algorithms, and a discussion on how generality is maintained without sacrificing performance. Joshua Anderson University of Michigan Molecular Dynamics Watch now FLV MP4 PDF
2073 High Performance Molecular Simulation, Visualization, and Analysis on GPUs This talk will present recent successes in the use of GPUs to accelerate interactive visualization and analysis tasks on desktop computers, and batch-mode simulation and analysis jobs on GPU-accelerated HPC clusters.  We'll present Fermi-specific algorithms and optimizations and compare with those for other devices. We'll also present performance and performance/watt results for NAMD molecular dynamics simulations and VMD analysis calculations on GPU clusters, and conclude with a discussion of ongoing work and future opportunities for GPU acceleration, particularly as applied to the analysis of petascale simulations of large biomolecular complexes and long simulation timescales.

John Stone University of Illinois at Urbana-Champaign Molecular Dynamics Watch now FLV MP4 PDF
2086 GPGPU DL_POLY Discover DL_POLY, an MD code that ICHEC has ported to CUDA. The presentation focuses especially on the auto-tuning of the work distribution between CPU and GPU.
Gilles Civario ICHEC Molecular Dynamics Watch now FLV MP4  
2168 Interactive Molecular Dynamics for Nanomechanical and Nanochemical Experiments Hear how the combination of GPU accelerated molecular dynamics simulation software, 3D TV displays, affordable haptic game controllers, and high performance molecular visualization is leading to new ways to study materials and objects on the nanoscale.  We will present the concept of an appliance for integrated virtual nanoscale experiments and challenges related to software and hardware. Axel Kohlmeyer Institute for Computational Molecular Science, Temple University Molecular Dynamics Watch now     PDF
2218 Redesigning Molecular Dynamics for GPUs and GPU Clusters Generalized Born and Particle Mesh Ewald (PME) molecular dynamics are two computationally intensive algorithms for simulating biological molecules.  While several adaptations of Generalized Born have attained excellent speedup on GPUs, high performance Particle Mesh Ewald has been more elusive.  Here we describe in detail a recent port of PME implemented within AMBER 11 that has achieved performance on par with up to 128 nodes of a top ten supercomputer. Scott Le Grand NVIDIA Molecular Dynamics Watch now      
2269 Bringing GPUs to Mainstream Molecular Dynamics Packages Recent work in close collaboration with NVIDIA has produced a GPU-accelerated version of the AMBER molecular dynamics code PMEMD that runs at between 20 and 130 times the speed of a single 2.8 GHz Intel Nehalem processor, with even higher performance on multiple GPUs, and it does so without sacrificing the accuracy or validity of the calculations. The GPU-accelerated version supports both explicit solvent Particle Mesh Ewald (PME) and implicit solvent simulations and is available as part of the new AMBER 11 package. This talk will provide an overview of the AMBER software, the background behind this GPU work, benchmarks, the impact that GPU-accelerated MD can have on the field, the techniques used to achieve this performance without sacrificing accuracy, and finally the validation methods used to ensure simulations are directly equivalent to CPU-based calculations. Ensuring that a GPU implementation of an MD package provides results that are indistinguishable from the CPU code is extremely tricky, and the desire to take shortcuts to boost performance can affect accuracy with unpredictable results. We have developed a comprehensive validation suite that can be used to perform the detailed testing required to ensure that the approximations necessary for GPU performance do not impact the scientific results. Additionally, we will discuss how we have made careful use of mixed single and double precision arithmetic in the AMBER implementation to achieve equivalence in the results without excessively compromising performance. Finally, we provide examples of recent breakthrough simulations conducted using GPU-enabled AMBER 11. Ross Walker San Diego Supercomputer Center Molecular Dynamics Watch now
2122 Using GPUs for Real-Time Brain-Computer Interfaces Learn how GPU processing can provide researchers with an inexpensive and versatile alternative to dedicated signal processing hardware for real-time neural prosthetics. Topics will include an overview of algorithms, current state-of-the-art hardware, GPU processing in a real-time environment, multi-platform processing, and future directions in BCIs using GPU processing. Adam Wilson University of Cincinnati Neuroscience Watch now FLV MP4  
2252 Simulating Housefly Vision Elements Using OpenCL An OpenCL GPU-based computer simulation of a biologically motivated model, based on the anatomy of the housefly's first optic ganglion, the lamina ganglionaris (the lamina layer), is presented. Specific to GPU technology, the computer model demonstrates: the implementation of a second-order Runge-Kutta method to approximate coupled differential equations on GPU hardware; and the mapping of a non-Cartesian coordinate system onto the Cartesian layout of the threads.  Testing examined usage and access across device memory spaces to determine the optimal usage/access method for the ANN. This result was generalized for OpenCL GPU devices, using the capabilities of OpenCL.   Karen Haines WASP/The University of Western Australia Neuroscience Watch now
2066 Accelerating System Level Signal Integrity Simulation  Discuss how GPU acceleration for key parts of the ANSYS Nexxim Simulator resulted in significant speedup over multi-core processors.  We will cover time consumption and data parallelism exposure considerations, and focus on key areas where GPU acceleration was applied including convolution and Eye rendering. Danil Kirsanov, Ekanathan Palamadai ANSYS Physics Simulation Watch now FLV MP4  
2080 Tackling Multi-Gigabit Design Challenges with a Practical Virtual EMI/ESD Lab Learn about efficient methodologies for performant and cost-effective EMI and ESD suppression techniques by means of massively parallel GPU simulation.  We will discuss solving ever more complicated EMI and ESD challenges very early in the design process using a so-called ‘Virtual EMI/ESD lab’. Davy Pissoort, Amolak Badesha, Hany Fahmy KHBO-FMEC, Agilent Technologies, NVIDIA Physics Simulation Watch now FLV MP4
2090 Developing Highly Scalable Particle-Mesh Codes for GPUs: A Generic Approach Dive deep into a multi-parallel Particle in Cell code that utilizes MPI, pthreads, and CUDA. Around this specific application a general C++ framework for transparent data transfers between GPUs has been developed and will be presented. Further techniques employed include interleaving of communication and computation, particle tiling and a study of how well CUDA performance can be transferred to OpenCL. Guido Juckeland, Michael Bussmann TU Dresden - ZIH, Forschungszentrum Dresden-Rossendorf Physics Simulation Watch now FLV MP4  
2102 Evacuate Now?  Faster-than-real-time Shallow Water Simulation on GPUs Learn how to simulate a half-hour dam break in 27 seconds! We present how shallow water simulation with interactive visualization is successfully mapped to modern graphics hardware. Featuring a live demo, we will present interactive shallow water simulations running on a standard laptop. The implementation has been verified against analytical and experimental data, supports multi-GPU simulation, and can run domain sizes of up to 6300x6300 at 320 million cells per second on the GTX 480. André Rigland Brodtkorb SINTEF ICT Physics Simulation Watch now FLV MP4
2112 The Heisenberg Spin Glass Model on GPU:  Myth versus Fact Dive into implementations of the 3D Heisenberg spin glass model for GPUs.  We will discuss results showing that fast shared memory gives better performance with respect to slow global memory only under certain conditions.  Covers careful kernel tuning to achieve significant speedup with respect to a state-of-the-art high-end multicore processor. Massimo Bernaschi Istituto Applicazioni del Calcolo - C.N.R. Physics Simulation Watch now FLV MP4
2137 CUDA for Real-Time Multigrid Finite Element Simulation of Soft Tissue Deformations The take-away of this presentation is an efficient CUDA implementation of a finite hexahedra multigrid solver for simulating elastic deformable models in real time. Due to the regular shape of the numerical stencil induced by the hexahedral regime, computations and data layout can be restructured to avoid execution divergence and to support memory access patterns that enable the hardware to coalesce multiple memory accesses into single memory transactions. This makes it possible to effectively exploit the GPU's parallel processing units and high memory bandwidth. Performance gains of up to a factor of 12 compared to a highly optimized CPU implementation are demonstrated. Christian Dick, Joachim Georgii Technische Universität München Physics Simulation Watch now FLV MP4 PDF
2155 GPGPU in the Real World: The ABAQUS Experience We describe the ABAQUS experience of integrating GPGPU acceleration into complex, high-performance commercial engineering software. In particular, we discuss the trade-offs we had to make and the benefits we obtained from this technology. Luis Crivelli Dassault Systèmes Simulia Corporation Physics Simulation Watch now FLV MP4
2231 Driving on Mars, Redux: System Level Simulation of Dynamic Systems Learn how GPU and HPC computing are used to predict through simulation the dynamics of large complex mechanical systems such as tracked vehicles including the Mars Rover.  The presentation outlines the physics based approach and numerical solution methods that enabled the simulation of dynamic systems with millions of bodies on the GPU.  The presentation will also explain how a HPC cluster is used to effectively render scenes with tens of thousands of bodies for generating animations that can be used by Engineers in the design process. Dan Negrut University of Wisconsin-Madison Physics Simulation Watch now      
2246 The challenges of integrating CUDA engines into an existing package, yet not sinking the boat Based on a true story, come listen to a daring tale about integrating a large CUDA component (a physics engine) into an existing product (a 3D engine), replacing some of its functionality. We will cover the architectural difficulties and finer points that needed to be addressed, and the tuning and testing of such a large system, all without affecting the stability of the original system. Eri Rubin OptiTex Physics Simulation Watch now
2005 Porting Large-Scale Legacy Fortran Codes Explore a new automatic Fortran translator which has been developed and used to port the numerical subroutines of FEFLO , a general-purpose legacy Computational Fluid Dynamics code operating on unstructured grids, to run on the GPU.  Data transfer to the CPU is minimized throughout the course of a CFD run.  Benchmarks of large-scale production runs will be presented. Andrew Corrigan, Rainald Löhner Naval Research Laboratory & George Mason University, George Mason University Programming Languages & Techniques Watch now FLV MP4 PDF
2011 Fundamental Performance Optimizations for GPUs This presentation covers the major CUDA optimizations.  Topics will include: maximizing memory throughput, kernel launch configuration, using shared memory, and improving GPU/CPU interaction.  While C for CUDA is used for illustration, the concepts covered will apply equally to programs written with OpenCL and DirectCompute APIs. Paulius Micikevicius NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
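As a concrete illustration of two of the topics session 2011 lists (maximizing memory throughput and using shared memory), the sketch below shows the widely used shared-memory matrix transpose idiom, in which each tile is staged in shared memory so that both the global loads and the global stores are coalesced. It is a generic example, not material from the session, and assumes a launch with 32x32 thread blocks covering the matrix.

    // Coalesced transpose via shared memory staging: a naive transpose reads rows
    // and writes columns, so one side is uncoalesced; staging a tile in shared
    // memory keeps both the loads and the stores coalesced.
    #define TILE 32

    __global__ void transpose_coalesced(const float* in, float* out, int width, int height)
    {
        __shared__ float tile[TILE][TILE + 1];   // +1 padding avoids bank conflicts

        int x = blockIdx.x * TILE + threadIdx.x;
        int y = blockIdx.y * TILE + threadIdx.y;
        if (x < width && y < height)
            tile[threadIdx.y][threadIdx.x] = in[y * width + x];

        __syncthreads();

        // Swap block indices so the store is also coalesced.
        x = blockIdx.y * TILE + threadIdx.x;     // column in the transposed matrix
        y = blockIdx.x * TILE + threadIdx.y;     // row in the transposed matrix
        if (x < height && y < width)
            out[y * height + x] = tile[threadIdx.x][threadIdx.y];
    }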
2023 Processing Device Arrays with C++ Metaprogramming I will describe tricks for building APIs using C++ metaprogramming that generate custom kernels for complex manipulation of device-side arrays in CUDA.  Using a variation of Expression Templates, multiple operations can be fused into a single kernel that executes with reasonable efficiency. Jonathan Cohen NVIDIA Research Programming Languages & Techniques Watch now FLV MP4  
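The core idea behind session 2023 can be sketched in a few lines: composing device arrays with overloaded operators builds a lightweight expression object, and a single templated kernel evaluates the whole expression per element, fusing several operations into one pass over memory. The types below (Leaf, Add, assign) are hypothetical stand-ins for illustration only, not the API presented in the talk.

    // Minimal expression-template sketch: a + b + c builds Add<Add<Leaf,Leaf>,Leaf>
    // on the host, and one kernel evaluates it element-by-element with no temporaries.
    struct Leaf {
        const float* p;
        __device__ float eval(int i) const { return p[i]; }
    };

    template <class L, class R>
    struct Add {
        L l; R r;
        __device__ float eval(int i) const { return l.eval(i) + r.eval(i); }
    };

    template <class L, class R>
    Add<L, R> operator+(L l, R r) { return Add<L, R>{l, r}; }

    template <class Expr>
    __global__ void assign(float* out, Expr e, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = e.eval(i);   // whole expression fused into one kernel
    }

    // Hypothetical host-side usage, given device pointers d_out, d_a, d_b, d_c:
    //   assign<<<(n + 255) / 256, 256>>>(d_out, Leaf{d_a} + Leaf{d_b} + Leaf{d_c}, n);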
2028 Mathematica for GPU Programming Mathematica is widely used in scientific, engineering, mathematical fields and education.  In this session, new tools for general GPU programming in the next release of Mathematica are presented.  These tools build on top of Mathematica’s technology which provides a simple, yet powerful, interface to the large base of compiling tools. Applications of CUDA and OpenCL from within Mathematica will be presented. These examples will provide a general overview of the powerful development environment for GPU programming that Mathematica can offer not just for researchers but for anybody with basic knowledge of Mathematica and GPU programming. Ulises Cervantes-Pimentel Wolfram Research Programming Languages & Techniques Watch now     PDF
2067 Experiences with Code Optimizations for High Performance GPGPU Programs Attend this session to learn and share code optimizations to achieve high performance GPU computing.  We will cover code transformations for memory coalescing, workload management at both thread and thread-block levels, and different ways to handle memory partition conflicts.  We will also discuss the integration of these code optimizations into a compiler. Huiyang Zhou, Yi Yang North Carolina State University Programming Languages & Techniques Watch now FLV MP4 PDF
2124 Operating System Abstractions for GPU Programming GPGPU frameworks such as CUDA improve programmability, but GPU parallelism remains inaccessible in many application domains. This session argues that poor OS support causes this problem.  OSes do not provide the kind of high-level abstractions for GPUs that applications expect for other resources like CPUs and file systems.  We advocate reorganizing kernel abstractions to support GPUs as first-class computing resources, with traditional guarantees such as fairness and isolation. We demonstrate shortcomings in Windows 7 GPU support, and show that better OS abstractions can accelerate interactive workloads like gesture recognition by a factor of 10X over a CUDA implementation.
Christopher Rossbach, Emmett Witchel Microsoft Research, University of Texas at Austin Programming Languages & Techniques Watch now FLV MP4 PDF
2167 Designing a Geoscience Accelerator Library Accessible from High Level Languages Explore a library for geoscience applications on CUDA and OpenCL platforms. Target applications span atmosphere, ocean, geomorphology and porous media flows.  These areas are linked by common numerical techniques encapsulated in our library.  We will review the scope of the library, its meta-programming approaches, and its key design attributes.  We will also demonstrate its support for multi-GPU parallelism within and across address spaces and provide examples of its use from high-level languages including C, Fortran, and Python. Chris Hill, Alan Richardson M.I.T Programming Languages & Techniques Watch now
2212 Parallel Nsight for Accelerated DirectX 11 Development [Advanced] Parallel Nsight is NVIDIA's new development environment for graphics and GPU computing.  In this advanced session, you will learn how Parallel Nsight can accelerate debugging and profiling of Direct3D 11 applications.



Attendees will learn how to debug Direct3D frames and HLSL shaders using Parallel Nsight's powerful Graphics Inspector and Debugger which allows developers to inspect Direct3D resources and state, set breakpoints in HLSL shaders, examine shader variables, and see which graphics primitives are live on the GPU.



Attendees will also learn how to use the Frame Profiler to capture and mine performance information, and easily pinpoint bottlenecked GPU units.
Simon Barrett NVIDIA Programming Languages & Techniques Watch now FLV MP4 PDF
2278 Strategies for Code Encapsulation in GPU Implementations  Code encapsulation is a common technique used to reduce code complexity that a given programmer has to understand. It allows the use of increasingly complex systems of hardware, software, and algorithms to tackle increasingly difficult scientific problems. Unfortunately, code encapsulation is not easily attainable in current GPU environments.  We will share our OpenCL development experiences for achieving partial encapsulation in GPU implementations, and discuss best practices in this area. Brian Cole OpenEye Scientific Software Programming Languages & Techniques Watch now      
2281 Domain-Specific Languages Computer graphics has introduced several domain-specific languages (DSLs) that enable high performance and parallelism for narrow problem domains - RenderMan, Cg, GLSL, and recently OpenRL and OptiX. We think that similar approaches can benefit other areas of GPU computing - visualization, animation, physics simulation, or scientific data analysis. In this talk, we present Shadie, a domain-specific shading language for rapid development of complex custom volume visualizations in radiation oncology. The shaders are written in a high-level Python-like language and translated to CUDA for efficiency. We will explain how you can develop your own DSLs using source-to-source translation and a suitable backend library.  Milos  Hasan, Hanspeter Pfister Harvard University Programming Languages & Techniques Watch now      
2294 GPU.NET with TidePowerd Join TidePowerd for a demonstration of GPU.NET, our innovative new product which dramatically cuts the time needed to develop and maintain a GPU-based application by extending Microsoft's .NET Framework onto GPUs. With GPU.NET, your device-accelerated code can be written in any .NET-supported language (e.g., C#, F#, IronPython) and called like any other method - so it's easy to create new GPU-based applications without having to retrain your developers. You'll learn how to use GPU.NET to quickly develop a financial calculator in C#, use the built-in Visual Studio unit-testing tools to ensure the correctness of the code, and seamlessly deploy the application into a mixed Windows / Linux environment. We'll also discuss how GPU.NET expands the frontiers of GPU computing into lucrative new markets such as business intelligence, database processing, and data visualization. Jack Pappas TidePowerd Programming Languages & Techniques Watch now     PDF
2296 CUDA Optimization for Ninjas: A Case Study of High-Performance Sorting In this presentation, we use our implementation for high performance radix sorting as a case study for illustrating advanced design patterns and idioms. These techniques have allowed us to demonstrate Fermi sorting rates that exceed 1.0 billion 32-bit keys per second (and over 770 million key-value pairs per second), making it the fastest fully-programmable micro-architecture for this genre of sorting problems.



Although the CUDA programming model is elegantly decoupled from any particular hardware configuration, we present techniques for exploiting knowledge of the NVIDIA GPU machine model in order to produce more efficient implementations.  Our design patterns enable the compiler to specialize a single program text for a variety of architectures, resulting in target code that “fits” the underlying hardware significantly better than more general approaches.  In particular, we discuss strategies for kernel fusion, warp-synchronous programming, flexible granularity via meta-programming, algorithm serialization, and data-movement tuning. 

Duane Merrill University of Virginia Programming Languages & Techniques Watch now      
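One of the idioms session 2296 names, warp-synchronous programming, is easiest to see in the tail of a block-wide reduction: once only 32 active threads remain they belong to a single warp, so implementations of that era dropped __syncthreads() and relied on lockstep execution within the warp, using a volatile pointer to keep intermediate values in shared memory. The sketch below illustrates that idiom generically; it is not taken from the sorting implementation, and on current GPUs the same pattern additionally requires explicit __syncwarp() calls.

    // Classic block reduction with a warp-synchronous tail (2010-era idiom).
    __device__ void warp_reduce(volatile float* sdata, int tid)
    {
        sdata[tid] += sdata[tid + 32];
        sdata[tid] += sdata[tid + 16];
        sdata[tid] += sdata[tid + 8];
        sdata[tid] += sdata[tid + 4];
        sdata[tid] += sdata[tid + 2];
        sdata[tid] += sdata[tid + 1];
    }

    __global__ void block_sum(const float* in, float* out, int n)   // blockDim.x == 256
    {
        __shared__ float sdata[256];
        int tid = threadIdx.x;
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        sdata[tid] = (i < n) ? in[i] : 0.0f;
        __syncthreads();

        for (int s = blockDim.x / 2; s > 32; s >>= 1) {
            if (tid < s) sdata[tid] += sdata[tid + s];
            __syncthreads();
        }
        if (tid < 32) warp_reduce(sdata, tid);        // warp-synchronous tail
        if (tid == 0) out[blockIdx.x] = sdata[0];
    }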
2047 Bridging Ray and Raster Processing on GPUs Explore new techniques in real time rendering.  We will discuss a system for ray traced global illumination (GI) carefully integrated with a traditional raster renderer using an incremental irradiance cache.  Covers novel GPU methods for spawning secondary GI rays on only visible cells, smoothly sampling the visible 3D cache into 2D, and incrementally ray traced spherical harmonics basis.  Details applying a range of optimizations to achieve real-time frame rates with the OptiX ray tracing engine. Kenny Mitchell Black Rock Studio   Not Available FLV MP4  
2074 Driving a Product from Rasterization to Ray Tracing: The Developer Experience Learn from the challenges encountered while using DirectX to update the Bunkspeed Move rasterization engine to work with Mental Images' iRay.  This work was part of the creation of Bunkspeed Shot, which allows the user to leverage both the high quality image generation of iRay and a highly interactive, good quality rasterization engine (used for quick setup of a scene).  Covers major differences between a ray tracing based interactive system, including GPU based ray tracing, and a traditional GPU rasterization engine. Nicolas Gebbie Bunkspeed Ray Tracing Watch now FLV MP4  
2250 GPU Ray Tracing Exposed: Under the Hood of the NVIDIA OptiX Ray Tracing Engine Take a deep dive into many of the design choices and implementation details of the NVIDIA OptiX ray tracing engine.  Learn how domain specific compilation, a unique execution model and a general object model, are combined into a flexible and powerful API. Steve Parker, Austin Robison, Phillip Miller NVIDIA Ray Tracing Watch now FLV MP4  
2003 Using CUDA to Accelerate Radar Image Processing Come see how current GPU technology provides the means for the first portable real-time radar image processing algorithm. This session will outline how the GPU has afforded nearly three orders of magnitude improvement in performance for Synthetic Aperture Radar's (SAR) hallmark image processing algorithm.  We will present algorithm details and further improvements. Aaron Rogan Neva Ridge Technologies Signal processing Watch now FLV MP4 PDF
2126 Accelerating Signal Processing: Introduction to GPU VSIPL Learn how to use the Vector Signal Image Processing Library to accelerate signal processing applications without needing to understand platform-specific programming and optimization techniques.  We will discuss how GPU VSIPL implements the VSIPL API and uses CUDA-capable GPUs to maximize performance of several example applications. Dan Campbell Georgia Tech Research Institute Signal processing Watch now FLV MP4 PDF
2043 Disparity Map Generation  Explore the algorithms and implementation of disparity maps on the GPU.  We will discuss how a disparity map facilitates stereoscopic content creation, applications and approaches tried, and final results of real time calculations on GPUs. Henry Gu GIC Stereoscopic 3D Watch now FLV MP4  
2107 Accelerating Stereographic and Multi-View Images Using Layered Rendering Explore applications of geometry shaders in improving the performance of stereo pair or multi-viewer image generation. This session will cover the basic approach of single-pass stereo-pair creation and provides guidelines for when layered rendering can be used to increase performance. A particular emphasis will be placed on virtual reality and scientific visualization, but the techniques discussed apply to a wide range of rendering environments. Results will be shown for three GPU architectures, including the new GF100 GPU. Jonathan  Marbach  TerraSpark Geosciences, LLC Stereoscopic 3D Watch now FLV MP4  
2241 Standing Out: Implementing a Great Stereo UI Learn how to make S3D compatible user interfaces, HUDs, and in-game menus. The first part of this session will outline the common problems users encounter when displaying traditional 2D UI in stereoscopic 3D.  The second part will focus on the different techniques, tips/tricks, and best practices developers can use to create high-quality S3D interfaces.  The presentation will highlight examples from several shipped titles, as well as showcase a complete 3D UI game demo running in S3D on multiple devices including PC and mobile. Brendan Iribe Scaleform Stereoscopic 3D Watch now      
2002 CUDA Debugging on Linux and MacOS with cuda-gdb Boost your development speed by mastering the CUDA debugging tools NVIDIA provides. In this session you will learn the basics of cuda-gdb and cuda-memcheck, as well as their more advanced features with live demonstrations on Linux and MacOS. Satish Salian NVIDIA Tools & Libraries Watch now FLV MP4  
2008 OpenCL Optimization Learn how to optimize your OpenCL application to achieve maximum performance on NVIDIA GPUs. We will first briefly discuss how the OpenCL programming model maps onto NVIDIA GPU’s architecture. We will then talk about memory, instruction, and NDRange optimization techniques, illustrating each with small code samples. Peng Wang NVIDIA Tools & Libraries Watch now     PDF
2012 Analysis-Driven Performance Optimization The goal of this session is to demystify performance optimization by transforming it into an analysis-driven process.  There are three fundamental limiters to kernel performance: instruction throughput, memory throughput, and latency.  In this session we will describe:

• how to use profiling tools and source code instrumentation to assess the significance of each limiter;

• what optimizations to apply for each limiter;

• how to determine when hardware limits are reached.

Concepts will be illustrated with some examples and are equally applicable to both CUDA and OpenCL development.  It is assumed that attendees are already familiar with the fundamental optimization techniques.

Paulius Micikevicius NVIDIA Tools & Libraries Watch now     PDF
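One simple form of the source-code instrumentation referred to in session 2012 is timing a kernel with CUDA events and converting the elapsed time into achieved memory throughput, which can then be compared with the board's peak to judge whether the memory-throughput limit has been reached. The kernel, names and byte counts below are illustrative assumptions, not an excerpt from the session.

    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical memory-bound kernel: one read and one write per element.
    __global__ void scale(const float* in, float* out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 2.0f * in[i];
    }

    void profile_scale(const float* d_in, float* d_out, int n)
    {
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        scale<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);

        // 2 * n * sizeof(float) bytes moved; compare against peak bandwidth to see
        // how close this kernel runs to the memory throughput limit.
        double gbps = 2.0 * n * sizeof(float) / (ms * 1.0e6);
        printf("scale: %.3f ms, %.1f GB/s achieved\n", ms, gbps);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }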
2039 GPU Debugging with Allinea DDT Discover how a debugger can help you fix those hard to find bugs in your GPU software, with this introduction to the special CUDA features in Allinea DDT. David Lecomber Allinea Software Tools & Libraries Watch now FLV MP4  
2041 PyCUDA: Even Simpler GPU Programming with Python Explore PyCUDA, a robust, open-source toolkit that lets you control your GPU from the comfort of Python, a Matlab-like scripting language.   Learn about Fermi tuning with PyCUDA, the new interfaces for CUBLAS and CUFFT, the ecosystem of third-party libraries built on PyCUDA, and examples illustrating PyCUDA's benefits to large-scale applications. Andreas Kloeckner Courant Institute, NYU Tools & Libraries Watch now FLV MP4 PDF
2050 Copperhead: Data-Parallel Python for the GPU Learn how to write Python programs that execute highly efficiently on GPUs using Copperhead, a data-parallel Python runtime.  Using standard Python constructs like map and reduce, we will see how to construct data-parallel computations and embed them in Python programs that interoperate with numerical and visualization libraries such as NumPy, SciPy and Matplotlib.  We will examine how to express computations using Copperhead, explore the performance of Copperhead programs running on GPUs, and discuss Copperhead's runtime model, which enables data-parallel execution from within Python. Bryan Catanzaro University of California, Berkeley Tools & Libraries Watch now FLV MP4  
2053 Pixel Bender: Building a Domain Specific Language on the GPU Examine the challenges and advantages of building the Pixel Bender domain specific language for image processing for the GPU.  We will examine how Pixel Bender was made to work within several Adobe applications across a wide range of hardware systems and platforms. Bob Archer Adobe Systems Inc Tools & Libraries Watch now FLV MP4 PDF
2070 CUSPARSE Library: A Set of Basic Linear Algebra Subroutines for Sparse Matrices The CUSPARSE library can impact and enable software solutions for computational science and engineering problems in the fields of energy exploration, physical simulations and life sciences, among many others. It provides sparse linear algebra primitives that can be used to implement iterative linear system and eigenvalue solvers, and can also serve as a building block for state-of-the-art sparse direct solvers. The CUSPARSE library is implemented using the CUDA parallel programming model and provides sparse analogs to BLAS level-1, 2 and 3 operations, such as matrix-vector multiplication, triangular solve and format conversion routines.  Maxim Naumov NVIDIA Tools & Libraries Watch now FLV MP4 PDF
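For readers unfamiliar with the operations listed in the CUSPARSE abstract, the sketch below shows what a sparse matrix-vector multiply over the CSR (compressed sparse row) format looks like when written directly in CUDA C, with one thread per matrix row. It is a minimal illustration of the operation itself, not the CUSPARSE implementation or its API.

    // Scalar CSR sparse matrix-vector multiply, y = A*x, one thread per row.
    __global__ void csr_spmv(int num_rows, const int* row_ptr, const int* col_idx,
                             const float* vals, const float* x, float* y)
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float sum = 0.0f;
            for (int k = row_ptr[row]; k < row_ptr[row + 1]; ++k)
                sum += vals[k] * x[col_idx[k]];
            y[row] = sum;
        }
    }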
2109 Migration of a Complete 3D Poisson Solver from Legacy Fortran to CUDA We describe our journey of migrating a legacy direct solver library for Poisson equations written in Fortran77 to CUDA in order to harness the computational power provided by the Tesla device (“Fermi”). This legacy library is still widely used today as it is the most complete library that can deal with three different boundary conditions (Dirichlet, Neumann and Cyclic) and two grid configurations (staggered and centered) independently in any of the three dimensions (x, y, z), giving a total of over 200 configurations. Huynh Phung Huynh A*STAR Institute of High Performance Computing Tools & Libraries Watch now FLV MP4
2111 Using R for High-Performance Data Analysis  Data analysis is the art and the science of getting the correct quantitative models and their numerical parameters from the observed data. In this talk, we report on a project to integrate CUDA into the open source data analysis environment R. The combined use of CPU and GPU resources can efficiently exploit the significant amount of data parallelism inherent in most data analysis problems and methods. This makes interactive analysis possible even for large, compute-intensive problems. The implementation and the achievable performance gains will be demonstrated on a concrete example from quantitative finance. Domokos Vermes Worcester Polytechnic Institute Tools & Libraries Watch now FLV MP4
2143 CUDA Fortran Programming for NVIDIA GPUs An introduction to programming NVIDIA GPUs using CUDA Fortran. Suitable for expert Fortran or CUDA C programmers who need to extract maximum performance from GPUs using an explicit GPU Fortran programming model. Introduces the CUDA Fortran language, and through examples, illustrates how to explicitly program GPUs in native Fortran 95/03 through creation of GPU kernel subroutines, management of host and device memory, definition of CUDA grids and thread blocks, launching kernels, and use of the CUDA Fortran runtime API. This talk includes a live component with a Windows laptop containing an NVIDIA GPU and the PGI CUDA Fortran compiler.
Brent Leback The Portland Group Tools & Libraries Watch now      
2148 Rapid Prototyping and Visualization with OpenCL Studio Learn about OpenCL Studio, an integrated OpenCL and OpenGL development environment for parallel programming and visualization.  We will discuss building end user applications and using its integrated visualization capabilities to better understand the output and internal structure of parallel algorithms.  We will also demonstrate its capabilities using several sample applications including particle systems, volumetric rendering, and image processing. Jochen Stier Geist Software Labs Tools & Libraries Watch now   MP4  
2149 Overview of Parallel Nsight for Visual Studio NVIDIA Parallel Nsight provides access to the power of the GPU from within the familiar environment of Microsoft Visual Studio. This session is an entry level overview of the GPU computing and graphics development features of Parallel Nsight as well as a glimpse into the future of this powerful tool. Kumar Iyer NVIDIA Tools & Libraries Watch now FLV MP4  
2150 Parallel Nsight: Debugging Massively Parallel Applications [Advanced] Data parallel algorithms that provide real-time financial options pricing or identification of hidden oil reserves are utilizing the massively parallel nature of the GPU for industry changing performance gains. Developers require industry standard development tools to create the software that accomplishes these parallel tasks.

NVIDIA Parallel Nsight delivers the power of the GPU within the familiar environment of Microsoft Visual Studio. In this session, you will learn advanced techniques for debugging CUDA C/C++ and DirectCompute code using Parallel Nsight, including conditional and data breakpoints as well as out of bound GPU memory access detection.

Sebastien Domine NVIDIA Tools & Libraries Watch now FLV MP4 PDF
2151 Parallel Nsight: Analyzing and Optimizing Massively Parallel Applications [Advanced] Life altering products that provide early detection of breast cancer or simulate molecular behavior, accelerating drug discovery, are becoming reality thanks to the power of the GPU. As these technologies become mainstream, mainstream tools are required to support these development efforts.

NVIDIA Parallel Nsight delivers the power of the GPU within the familiar environment of Microsoft Visual Studio. In this session, you will learn advanced techniques for visualizing your application's workloads and performance characteristics across the CPU, GPU, and operating system, and explore the depths of Parallel Nsight profilers, including GPU performance counters and how to use them.

Sebastien Domine NVIDIA Tools & Libraries Watch now FLV MP4 PDF
2156 GMAC: Global Memory For Accelerators Learn how to use GMAC, a novel run-time for CUDA GPUs. GMAC unifies the host and device memories into a single virtual address space, enabling the host code to directly access the device memory and removing the need for data transfers between host and device memories. Moreover, GMAC allows the same pointers to be used by both host and device code.

This session will present the GMAC run-time and show how to use it in current applications, covering everything from the basics of GMAC to multi-threaded applications using POSIX threads, OpenMP and MPI.
Isaac Gelado Universitat Politecnica de Catalunya Tools & Libraries Watch now FLV MP4  
2160 StarPU: a Runtime System for Scheduling Tasks  See how StarPU provides task scheduling facilities for a hybrid platform and a powerful data management library that transparently takes care of data across the entire machine.  We will discuss the significant performance improvements resulting from its flexible scheduler as well as its ability to mix parallel CPU kernels (eg. written in OpenMP or TBB) with CUDA/OpenCL and MPI. Cedric Augonnet INRIA Tools & Libraries Watch now     PDF
2164 Analytical Performance Models to Improve the Efficiency of GPU Computing Dive deep into a simple analytical model that provides insight into performance bottlenecks of parallel applications on GPU architectures.  We will discuss how the model estimates the execution time of massively parallel programs.  We will also cover how to optimize applications based on our developed performance analysis models. Hyesoon Kim Georgia Tech Tools & Libraries Watch now      
2176 Easy GPU Meta-programming: A Case Study in Biologically-Inspired Computer Vision Learn how to let the computer optimize your CUDA and OpenCL code for you with easy GPU Meta-programming and Scripting (e.g. PyCUDA). We will present a case study in which we consider the step-wise optimization of a 3D filter bank convolution, using a suite of open-source tools.

Nicolas Pinto, David Cox MIT, Harvard University Tools & Libraries Watch now FLV MP4
2177 Simplifying Parallel Programming with Domain Specific Languages Explore a new approach to parallel programming which leverages Domain Specific Languages (DSLs) to simplify programming heterogeneous systems (multi-core processors and GPUs). This approach allows DSL users to take advantage of the power of GPUs without having working knowledge of lower level programming models such as CUDA. Topics will cover the advantages of the DSL approach to parallel programming, and the runtime implementation details and optimizations that deliver the performance benefits of GPUs. HyoukJoong Lee, Hassan Chafi Stanford University Tools & Libraries Watch now     PDF
2179 GPU - An R Library for Native GPU Objects Come learn about the GPU R package. R is the widely popular open source statistical programming language.  The GPU package extends R by providing GPU-based types, classes and methods implementing GPU versions of R vectors, matrices, lists and data frames.  Subsequent operations with these are executed on the GPU. Users are not required to create special bindings or implement special syntax, nor do they need to copy objects between CPU and GPU.  The GPU package allows programmers to access the computational power of GPUs with little modification to existing code.   Christopher Brown Decision Patterns Tools & Libraries Watch now FLV MP4
2202 A Programming Model and Tool for Automatic High-Performance C to CUDA Mapping Discover our automatic C-to-CUDA mapper prototype, and how it optimizes execution and data movement for a broad class of loop codes. Coupled with our powerful mapper, C as an input language offers not only portability but also performance and performance portability. Learn about our optimizations and some of the performance obtained through different uses of the mapper. Benoit Meister Reservoir Labs Tools & Libraries Watch now FLV MP4 PDF
2210 GPU-Ocelot: An Open Source Debugging and Compilation Framework for CUDA Learn how to debug and profile CUDA applications using GPU-Ocelot.  Ocelot is a compilation and emulation framework for CUDA that includes debugging and profiling tools as well as backend compilers for NVIDIA GPUs and x86 CPUs.  We will present examples of applications developed on x86 CPUs and deployed on NVIDIA GPUs.  We will also discuss memory checking, race detection, and deadlock detection tools available within Ocelot. Gregory Diamos, Andrew Kerr, Sudhakar Yalamanchili Georgia Institute of Technology Tools & Libraries Watch now FLV MP4  
2213 BCSLIB-GPU:  Significant Performance Gains for CAE Hear product architects and developers describe the algorithmic depth and high-level breadth of the use of GPUs in creating BCSLIB-GPU, the GPU enablement of the industry standard sparse matrix software suite, BCSLIB-EXT.  We provide a range of comparison data for Tesla and Fermi versus multi-core CPU-only systems, across a wide range of realistic, demanding real-world test problems. Danl Pierce Access Analytics Int'l, LLC Tools & Libraries Watch now FLV MP4
2216 CUDA Libraries Open House Learn about NVIDIA’s CUDA libraries and meet the engineers that develop them.  Lead developers will cover the capabilities, performance and future directions for NVIDIA’s CUFFT, CUBLAS, CURAND, and NPP libraries (other libraries such as CUSPARSE and open source Thrust are covered in other talks).  After the presentation, NVIDIA developers will remain in the room to chat and answer questions during the lunch break.

Ujval Kapasi, Philippe Vandermersch, Elif Albuz, Nathan Whitehead, Frank Jargstorff NVIDIA Tools & Libraries Watch now     PDF
2219 High-Productivity CUDA Development with the Thrust Template Library Thrust is a parallel template library for developing CUDA applications. Modeled after the C++ Standard Template Library (STL), Thrust brings a familiar abstraction layer to the realm of GPU computing. Thrust provides host and device variants of the STL vector container to simplify memory management and facilitate data transfers. These containers are complemented with a large collection of generic data-parallel algorithms and a suite of useful iterator adaptors. Together, these features form a flexible high-level interface for GPU programming that greatly enhances developer productivity.



In this session we'll discuss Thrust's features and explain the basic design philosophy of the library.
Nathan Bell NVIDIA Research Tools & Libraries Watch now FLV MP4 PDF
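A small, self-contained example in the spirit of the abstract above: the STL-style host and device vectors take care of allocation and transfers, and the generic sort and reduce algorithms execute on the GPU. This is ordinary public Thrust usage, not code taken from the session.

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/reduce.h>
    #include <thrust/copy.h>
    #include <cstdio>
    #include <cstdlib>

    int main()
    {
        thrust::host_vector<int> h(1 << 20);
        for (size_t i = 0; i < h.size(); ++i)
            h[i] = rand();

        thrust::device_vector<int> d = h;                 // host -> device copy

        thrust::sort(d.begin(), d.end());                 // parallel sort on the GPU
        int sum = thrust::reduce(d.begin(), d.end(), 0);  // parallel reduction

        thrust::copy(d.begin(), d.end(), h.begin());      // device -> host copy
        printf("sum = %d\n", sum);
        return 0;
    }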
2220 Thrust by Example: Advanced Features and Techniques Thrust is a parallel template library for developing CUDA applications which is modeled after the C++ Standard Template Library (STL).  In this session we'll show how to decompose problems into the algorithms provided by Thrust.  We'll also discuss the performance implications of "kernel fusion" and "array of structs" vs. "structure of arrays" memory layouts and how they relate to Thrust. Lastly, we'll present evidence that Thrust implementations are fast, while remaining concise and readable.  Jared Hoberock NVIDIA Tools & Libraries Watch now     PDF
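Two of the techniques named in session 2220 can be sketched with standard Thrust components: kernel fusion via transform_reduce, which squares and sums in a single pass without a temporary array, and a structure-of-arrays layout traversed through a zip_iterator so that no array of structs is ever materialized. These are generic illustrations rather than the session's own code.

    #include <thrust/device_vector.h>
    #include <thrust/transform_reduce.h>
    #include <thrust/functional.h>
    #include <thrust/tuple.h>
    #include <thrust/iterator/zip_iterator.h>

    struct square { __host__ __device__ float operator()(float x) const { return x * x; } };

    // Kernel fusion: squaring and summing happen in one pass, with no temporary array.
    float sum_of_squares(const thrust::device_vector<float>& v)
    {
        return thrust::transform_reduce(v.begin(), v.end(), square(), 0.0f,
                                        thrust::plus<float>());
    }

    // Structure of arrays: x, y, z live in separate vectors; a zip_iterator presents
    // them as tuples without ever building an array of structs in memory.
    typedef thrust::device_vector<float>::iterator FloatIt;
    typedef thrust::zip_iterator<thrust::tuple<FloatIt, FloatIt, FloatIt> > PointIt;

    PointIt make_point_iterator(thrust::device_vector<float>& x,
                                thrust::device_vector<float>& y,
                                thrust::device_vector<float>& z)
    {
        return thrust::make_zip_iterator(thrust::make_tuple(x.begin(), y.begin(), z.begin()));
    }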
2225 Tools for Managing Clusters of NVIDIA GPUs Learn about the suite of tools NVIDIA provides to manage large installations of GPUs from the NVIDIA Tesla series. The presentation will cover cluster management tools and libraries, as well as the GPUDirect technology that enables GPUs to communicate faster across the network. Peter Buckingham, Andrew Iles NVIDIA Tools & Libraries Watch now FLV MP4 PDF
2249 New Programming Tools for GPU Computing This session will focus on new parallel programming tools for GPU computing. The types of tools that fit into the session include (1) planning tools for porting legacy applications to use GPU computing, (2) high-level programming and scripting tools for GPU computing, (3) automation of common performance optimizations for GPU computing, (4) performance analysis and diagnosis tools for GPU computing, and (5) tools that simplify heterogeneous parallel computing. Andrew Schuh, Wen-mei Hwu University of Illinois Tools & Libraries Watch now
2251 TotalView Debugger for CUDA Hear how the TotalView debugger is being extended to support GPU computation with CUDA. In addition to the basic challenges associated with debugging parallel programming, CUDA programming introduces a number of new concepts for which developers need visibility in debugging: a hierarchical memory, near-SIMD warps, streams, and kernels, among others. How do we create a tool that handles it all?

We'll be discussing the status of our work and the challenges encountered in bringing this all together into a single package, TotalView for CUDA.
Chris Gottbrath TotalView Technologies, Inc., a Rogue Wave Software company Tools & Libraries Watch now      
2267 GPU Computing with MATLAB®  MATLAB is a widely used tool for scientific, engineering and financial applications.  As the popularity of GPUs has grown, there is strong interest from engineers and scientists who solve computationally intensive problems in being able to leverage GPUs within MATLAB and other products from MathWorks. This talk will discuss how MathWorks tools can help engineers and scientists take advantage of GPU resources while continuing to work in the familiar MATLAB environment.  A range of capabilities will be discussed and demonstrated. Loren Dean MathWorks Tools & Libraries Watch now FLV MP4 PDF
2271 Seven Tricks We Learned to Get Top Performance Many people find the performance of a first-draft CUDA implementation unimpressive.  We'll show seven techniques we used to push Jacket's convolution and matrix multiply code from vanilla to record performance. Included is rapid-fire coverage of various topics: scaling from mobile cards to Fermi, balancing register pressure, counting FLOPS, loop unrolling, efficient memory addressing, guarded memory access, and more.

We'll also debut Jacket's new C/C++ interface aimed at providing the same performance and functionality already available in Jacket's GPU support for MATLAB.
James Malcolm AccelerEyes Tools & Libraries Watch now      
2272 GStream: A General-Purpose Data Streaming  Framework on GPUs We present GStream, a general-purpose, scalable C++ template run-time framework amenable to both the streaming problem and GPU architectures. GStream offers transparent streaming data transmissions and automatic memory synchronization over a rich collection of computing resources that are transparently allocated and reused.

Various problems other than streaming applications, such as scientific computing, numerical codes and text processing, can be easily expressed using GStream and subsequently integrated with our GStream library. GStream's ease of use, combined with efficient exploitation of GPU resources, has the potential to lead to higher coding productivity and application performance through our data-centric specification paradigm.

Xing Wu, Frank Mueller North Carolina State University Tools & Libraries Watch now      
2297 Developing CUDA Accelerated .NET Plugins for Microsoft Excel Quantifi will demo its xLDevelopment environment, which provides developers with an easy-to-use development environment for exposing CUDA functionality in Microsoft Excel.  With as little as four lines of code, a developer can also select the position of the function in the menu bar, have XML markup displayed in the Excel help functionality, and easily add objects to the object cache.  These objects can then be inspected by the end user or developer, and performance information can also be displayed in the object cache. The environment lets the developer focus on developing high-performance functionality, while all intermediate layers of interface are taken care of by the environment. Peter Decrem Quantifi Tools & Libraries Watch now FLV MP4 PDF
2299 Integrating CUDA BLAS with IMSL Fortran  As GPU hardware becomes more prevalent in both research and commercial institutions, software that takes advantage of this specialized hardware is growing in demand. In many cases, it is infeasible or impossible to rewrite an existing program to run entirely on the GPU, so the goal is often to offload as much work as possible. As the IMSL Library team at Rogue Wave Software considers how best to tackle the GPU realm with a general mathematical library, the IMSL Fortran Library takes an initial step where the CUDA BLAS library is utilized to offload CPU work to GPU hardware. This presentation will discuss the approach and architecture of the solution. Benchmark results will show where success has been found. Plans for future products will also be covered. Chris Gottbrath TotalView Technologies, Inc., a Rogue Wave Software company Tools & Libraries Watch now FLV MP4  
2016 VDPAU: PureVideo on Unix Learn about VDPAU (Video Decode and Presentation API for Unix). VDPAU provides GPU-accelerated video decoding, post-processing, UI compositing, and display on Unix. VDPAU also supports sharing surfaces with OpenGL and CUDA ("interop"). This allows developers to implement their own post-processing algorithms or scene analysis, or to use decoded video surfaces as part of a scene rendered using OpenGL. Stephen Warren NVIDIA Video Processing Watch now FLV MP4 PDF
2027 GPU-Based Image Processing in Military Applications There are more than 6000 Unmanned Aerial Vehicles (UAVs) in use in the US Military. The US Army alone has flown more than 1 million UAV flight hours. Every UAV captures at least one stream of video; some as many as 9. All this video needs to be processed and analyzed both during the mission, and post-mission. Traditionally, custom ASICs, and FPGAs were required for even the most rudimentary image processing tasks. Now, GPUs provide orders of magnitude more compute at a fraction of the cost. Hear how MotionDSP uses GPUs to provide previously impossible capabilities to military imaging.  Sean Varah MotionDSP Inc. Video Processing Watch now FLV MP4  
2048 H.264/AVC Video Encoding with CUDA and OpenCL Join experts from MainConcept, a leading provider of video codecs to the professional market, as they demonstrate the latest version of their CUDA-based H.264/AVC Encoder. Thomas Kramer MainConcept Video Processing Watch now FLV MP4  
2075 GPU-Accelerated Video Encoding Learn how to accelerate video encoding using the GPU. We will give an overview of the typical video encoding pipeline and discuss how different parts of the pipeline can be ported to the GPU using various approaches. We will focus on block-based motion estimation in particular, as it is the cornerstone of video encoding algorithms. The efficiency of its implementation on the GPU is crucial to the speed and quality of the encoder. Anton Obukhov NVIDIA Video Processing Watch now FLV MP4 PDF
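To illustrate why block-based motion estimation dominates encoder cost, the device function below computes the sum of absolute differences (SAD) between a 16x16 macroblock of the current frame and one candidate block in the reference frame; a search kernel would evaluate many candidate displacements per macroblock in parallel and keep the best. This is a textbook-style sketch, not MainConcept's or NVIDIA's encoder code.

    // SAD between a 16x16 current-frame macroblock at (cur_x, cur_y) and a
    // reference-frame candidate block at (ref_x, ref_y); pitch is the row stride
    // in bytes of both luma planes (assumed equal here for simplicity).
    __device__ unsigned int sad_16x16(const unsigned char* cur, const unsigned char* ref,
                                      int cur_x, int cur_y, int ref_x, int ref_y, int pitch)
    {
        unsigned int sad = 0;
        for (int y = 0; y < 16; ++y)
            for (int x = 0; x < 16; ++x) {
                int a = cur[(cur_y + y) * pitch + cur_x + x];
                int b = ref[(ref_y + y) * pitch + ref_x + x];
                int d = a - b;
                sad += (d < 0) ? -d : d;
            }
        return sad;
    }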
2087 Fast High-Quality Panorama Stitching  We present a panorama stitching application implemented with CUDA C on the GPU. The image processing pipeline consists of SIFT feature detection and matching and Graphcut image stitching to achieve high-quality results. We demonstrate live panorama creation with a webcam. Timo Stich NVIDIA Video Processing Watch now FLV MP4 PDF
2095 Building High Density Real-Time Video Processing Systems Learn how GPUDirect can be used to effectively build real-time, high-performance, cost-effective video processing products.  We will focus especially on how to optimize bus throughput while keeping CPU load and latency minimal. Ronny Dewaele Barco Video Processing Watch now FLV MP4
2121 Maximizing Throughput of Barco's GPU-Enabled Video Processing Server  Find out how Imec middleware realizes the full potential of GPU-enabled video processing servers to manage multiple video processing pipelines.  We will discuss how the middleware monitors GPU and CPU execution to best balance the load.  Covers how we achieved a 30% increase in throughput with only a minimal 0.05% overhead on Barco's GPU-enabled video processing server. Maja D'Hondt imec Video Processing Watch now FLV MP4 PDF
2224 GPU Acceleration in Adobe Creative Tools Hear experts explain how Adobe Creative Suite 5 harnesses the power of CUDA technology in several of its core software applications.  We will focus on the complete redesign of the core video playback and rendering engine in Adobe Premiere Pro CS5 and how it uses the power of GPUs to deliver superior performance and change the game for Adobe in professional video production. Paul Young, Steve Hoeg, Al Mooney Adobe Video Processing Watch now FLV MP4  
  Emerging Companies Summit Presentations More Files
Coming Soon
 
ID Title Abstract Speakers Affiliation Topic Area(s) Downloads
4000 Emerging Companies Summit Opening Address The Emerging Companies Summit is a unique forum for startup companies to showcase innovative applications that leverage the GPU to solve visual and compute-intensive problems. The Opening Address includes an overview of NVIDIA’s GPU ecosystem development activities and an interaction on stage with selected companies building groundbreaking applications on top of the GPU platform.
The ECS is a great opportunity to discover new players in the GPU ecosystem, find great investments, explore partnership opportunities, network/ build relationships, and discuss the future of an industry that is reshaping computing.
Jeff Herbst NVIDIA General Interest FLV   PDF
4001A Emerging Companies: CEO on Stage featuring Elemental Technologies See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Elemental Technologies - covering the field of video processing. Find this session at 5 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft),  Drew Lanza (Partner, Morgenthaler), and Jon Peddie (President, JPR) & Jeff Herbst (VP of Business Development, NVIDIA).
Sam Blackman Elemental Technologies, Inc. General Interest FLV   PDF
4001B Emerging Companies: CEO on Stage featuring Mersive See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Mersive - covering the field of imaging. Find this session at 20 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Rob Balgley Mersive General Interest FLV   PDF
4001C Emerging Companies: CEO on Stage featuring Geomerics See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Geomerics - covering the field of computer graphics. Find this session at 35 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Chris Doran Geomerics General Interest FLV   PDF
4002A Emerging Companies:  CEO on Stage featuring miGenius See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features miGenius - covering the field of cloud computing. Find this session at 5 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Chris Blewitt miGenius General Interest FLV   PDF
4002B Emerging Companies:  CEO on Stage featuring Allegorithmic See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Allegorithmic - covering the field of mobile devices. Find this session at 20 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Dr Sébastien Deguy Allegorithmic General Interest FLV   PDF
4002C Emerging Companies:  CEO on Stage featuring Bunkspeed See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Bunkspeed - covering the field of computer graphics. Find this session at 35 minutes into the video.
Panelists for this session include Dan’l Lewin (Corporate VP, Microsoft), Drew Lanza (Partner, Morgenthaler), Jon Peddie (President, JPR), and Jeff Herbst (VP of Business Development, NVIDIA).

Philip Lunn Bunkspeed General Interest FLV   PDF
4003 Emerging Companies Summit Panel:  GPUs for Computer Vision Moderated by Jon Peddie (President, Jon Peddie Research)

The GPU (graphics processing unit) runs advanced applications that are transforming existing industries and creating new ones. Join our panel of leading industry experts as they discuss the latest technology advances in the use of GPUs for computer vision, covering facial, gesture, human-motion, and biometric recognition, augmented reality, robotic computing, and more.

Panelists:
Joe Stam (Sr. Applications Engineer, NVIDIA)
Yoram Yaacovi (CTO & General Manager, Technologies at Microsoft Israel, R&D Center)
Sam Cox (CEO, Milabra)
Janko Mrsic-Flogel (CTO, Mirriad)
Tom Dean (Research Scientist, Google)
Jon Peddie Jon Peddie Research General Interest FLV   PDF
4004A Emerging Companies:  CEO on Stage featuring empulse GmbH See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features empulse GmbH - covering the field of databases & data mining. Find this session at 5 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Michael Hummel empulse GmbH General Interest FLV   PDF
4004B Emerging Companies:  CEO on Stage featuring Playcast Media Systems See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Playcast Media Systems - covering the field of video processing. Find this session at 20 minutes into the video.
Panelists will include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Natan Peterfreund Playcast Media Systems General Interest FLV   PDF
4004C Emerging Companies:  CEO on Stage featuring Cooliris See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Cooliris - covering the field of computer graphics. Find this session at 35 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Austin Shoemaker Cooliris General Interest FLV   PDF
4005A Emerging Companies:  CEO on Stage featuring Softkinetic See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Softkinetic - covering the field of computer vision. Find this session at 5 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Michel Tombroff Softkinetic General Interest FLV   PDF
4005B Emerging Companies:  CEO on Stage featuring Rocketick See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Rocketick - covering the field of high performance computing. Find this session at 20 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Uri Tal Rocketick General Interest FLV   PDF
4005C Emerging Companies:  CEO on Stage featuring Jedox AG See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Jedox AG - covering the field of databases & data mining. Find this session at 35 minutes into the video.
Panelists include Flip Gianos (Partner, Interwest), Charles Carmel (VP of Corporate Business Development, Cisco), Nathan Brookwood (Principal Analyst, Insight64) and Jeff Herbst (VP of Business Development, NVIDIA).
Kristian Raue Jedox AG General Interest FLV   PDF
4007A Emerging Companies:  CEO on Stage featuring Scalable Display Technologies See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Scalable Display Technologies - covering the field of imaging. Find this session at 5 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Andrew Jamison Scalable Display Technologies General Interest FLV   PDF
4007B Emerging Companies:  CEO on Stage featuring RTT See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features RTT - covering the field of computer graphics. Find this session at 20 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Jeroen Snepvangers RTT General Interest FLV   PDF
4007C Emerging Companies:  CEO on Stage featuring Aqumin See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Aqumin - covering the field of finance. Find this session at 35 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Michael Zeitlin Aqumin General Interest FLV   PDF
4008A Emerging Companies: CEO on Stage featuring OTOY See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features OTOY - covering the field of cloud computing. Find this session at 5 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
Jules Urbach OTOY General Interest FLV   PDF
4008B Emerging Companies: CEO on Stage featuring Universal Robotics See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Universal Robotics - covering the field of machine learning & artificial intelligence. Find this session at 20 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
David Peters Universal Robotics General Interest FLV   PDF
4008C Emerging Companies: CEO on Stage featuring ICD See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features ICD - covering the field of mobile devices. Find this session at 35 minutes into the video.
Panelists include Norman Winarsky (VP of Ventures, Licensing & Strategic Programs, SRI), Savitha Srinivasan (Corporate Venture Partner, IBM), Rob Enderle (Analyst, Enderle Group) and Jeff Herbst (VP of Business Development, NVIDIA).
David Hayes ICD General Interest FLV   PDF
4009 Emerging Companies Summit Panel:  The "New Normal" For Building Emerging Companies Based On Disruptive Technologies Moderated by Jeff Herbst (Vice President of Business Development, NVIDIA)

Start-ups are facing unique challenges as a result of the current economic and business environment. Not only is the venture funding environment very difficult, but small companies are finding it increasingly difficult to “break out” of the pack through IPOs and attractive M&A exits. This panel of experts (which includes VC and corporate investors) attempts to assess the current state of both the public and private markets, and explores various strategies and options for building successful companies in this “new” environment. Topics include traditional forms of equity and debt, angel financing, as well as other creative/strategic financing options (e.g., NRE arrangements, strategic partnerships, etc.). The discussion promises to be both lively and provocative.

Panelists:
Garrett Herbert (Partner, M&A Transaction Services, Deloitte & Touche LLP)
Peter Kidder (Division Risk Manager, Silicon Valley Bank)
Michael Tedesco (Managing Director, Citigroup Global Markets)
Andrew T. Sheehan (Managing Director, Sutter Hill Ventures)
Eric Jensen (Partner, Business Department Chair, Cooley LLP)
Jeff Herbst NVIDIA General Interest FLV   PDF
4010A Emerging Companies: CEO on Stage featuring OptiTex See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features OptiTex - covering the field of physics simulation. Find this session at 5 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Yoram Burg OptiTex USA Inc. General Interest FLV   PDF
4010B Emerging Companies: CEO on Stage featuring Useful Progress See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Useful Progress - covering the field of medical imaging & visualization. Find this session at 20 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Sylvain Ordureau UsefulProgress General Interest FLV   PDF
4010C Emerging Companies: CEO on Stage featuring NaturalMotion Limited See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features NaturalMotion Limited - covering the field of computer graphics. Find this session at 35 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Torsten Reil NaturalMotion Limited General Interest FLV   PDF
4011A Emerging Companies: CEO on Stage featuring Perceptive Pixel See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Perceptive Pixel - covering the field of imaging. Find this session at 5 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Jeff Han Perceptive Pixel General Interest FLV   PDF
4011B Emerging Companies: CEO on Stage featuring Cinnafilm See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Cinnafilm - covering the field of film. Find this session at 20 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Lance Maurer Cinnafilm, Inc. General Interest FLV   PDF
4011C Emerging Companies: CEO on Stage featuring Total Immersion See the hottest new technologies from startups that could transform computing. In a lively and fast-paced exchange, the “Emerging Companies Summit - CEO on Stage” sessions feature CEOs from three startups who have 7 minutes and 30 seconds to introduce their companies and 7 minutes and 30 seconds to interact with a panel of industry analysts, investors and technology leaders.
This CEO on Stage session features Total Immersion - covering the field of computer vision. Find this session at 35 minutes into the video.

Panelists include Bill Tai (General Partner, Charles River Ventures), Paul Weiskopf (Sr. VP of Corporate Development, Adobe), Tim Bajarin (President, Creative Strategies) and Jeff Herbst (VP of Business Development, NVIDIA).
Bruno Uzzan Total Immersion General Interest FLV   PDF