Particle Systems on GPU

Youngho Kim
CIS665: GPU Programming
Building a Million Particle System: Lutz Latta
UberFlow - A GPU-based Particle Engine:
Peter Kipfer et al.
Real-Time Particle Systems on the GPU in
Dynamic Environments: Shannon Drone
GPU-based Particle Systems for Illustrative
Volume Rendering: R.F.P. van Pelt et al.
Nvidia DirectX10,
Soft Particle
Building a Million Particle System:
Lutz Latta
One of the original uses was in the movie Star Trek II
William Reeves (implementor)
Each Particle Had:
• Position
• Velocity
• Color
• Lifetime
• Age
• Shape
• Size
• Transparency
Particle system from Star Trek II: Wrath of Khan
Spacewar: 1962
◦ Uses pixel clouds as explosions
◦ Random motion
Asteroids: 1978
◦ Uses short moving vectors for explosions
◦ Probably the first “physical” particle simulation in a video game
Stateless Particle System
◦ Particle data is computed from birth to death
by a closed-form function defined by a set of start
values and the current time.
◦ Does not react to a dynamic environment
◦ Fast
State Preserving Particle System
◦ Uses numerical iterative integration methods to
compute particle data from previous values and
changing environmental descriptions.
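A minimal sketch of the stateless approach, assuming constant gravity is the only influence on the particle. The function name and the `GRAVITY` constant are illustrative and do not come from the slides. The position at any time depends only on the start values and the current time, so nothing has to be stored between frames.

```python
# Assumed constant acceleration (gravity along -y); not specified in the slides.
GRAVITY = (0.0, -9.81, 0.0)

def stateless_position(p0, v0, birth_time, now):
    """Closed-form position p(t) = p0 + v0*t + 0.5*g*t^2, with t = now - birth_time."""
    t = now - birth_time
    return tuple(p + v * t + 0.5 * g * t * t
                 for p, v, g in zip(p0, v0, GRAVITY))

# The same start values always give the same answer for a given time.
pos = stateless_position((0.0, 0.0, 0.0), (1.0, 2.0, 0.0), birth_time=0.0, now=2.0)
```

A state-preserving system would instead keep the previous position and velocity and advance them every frame, which is what allows it to react to a changing environment.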
Stateless Simulation – Particle data computed by closed-form functions
◦ No reaction to a dynamically changing environment
◦ No storage of varying data (simulation runs in the vertex program)
Particle Birth
◦ Time of birth and start values are written to a dynamic vertex buffer
Set global function parameters as vertex program constants
Render point sprites/triangles/quads with the particle system vertex program
Generation – Particles are generated randomly within a
predetermined location
Particle Dynamics – The attributes of a particle may vary
over time, based upon equations that depend on the particle's attributes
Extinction:
◦ Age – Time the particle has been alive
◦ Lifetime – Maximum amount of time the particle can live
Premature Extinction:
◦ Running out of bounds
◦ Hitting an object (ground)
◦ Attribute reaches a threshold (particle becomes transparent)
Rendered as a graphics primitive (blob)
Particles that map to the same pixels are additive
◦ Sum the colors together
◦ No hidden surface removal
◦ Motion blur is rendered by streaking based on the
particle's position and velocity
CPU-based particle systems are limited to
about 10,000 particles per frame in most
game applications
Limiting Factor is CPU-GPU communication
◦ PCI Express transfer rate is about 3 gigabytes/sec
Rendering Passes:
Process Birth and Deaths
Update Velocities
Update Positions
Sort Particles (optional, takes multiple passes)
Transfer particle positions from pixel to
vertex memory
Render particles
Two Textures (position and velocity)
◦ Each holds an x,y,z component
◦ Conceptually a 1d array
◦ Stored in a 2d texture
Use texture pair and double buffering to compute new data
from previous data
Storage Types
◦ Velocity can be stored using 16bit floats
Size, Color, Opacity, etc.
◦ Simple attributes, can be added later, usually computed
using the stateless method
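The "conceptually a 1D array, stored in a 2D texture" layout can be sketched as an index mapping. Row-major order and the function names are assumptions made for illustration.

```python
def index_to_texel(index, width):
    """Map a particle index to (x, y) texel coordinates, assuming row-major order."""
    return index % width, index // width

def texel_to_index(x, y, width):
    """Inverse mapping from texel coordinates back to the particle index."""
    return y * width + x

# With a 1024-wide texture, particle 1025 lands in the second row, second column.
x, y = index_to_texel(1025, 1024)
```

With this layout a 1024 x 1024 texture has room for 1,048,576 particles, and the position and velocity textures can share the same index.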
Birth = allocation of a particle
◦ Associate new data with an available index in the
attribute textures
◦ Serial process – offloaded to CPU
◦ Initial particle data also determined on CPU
Death = deallocation of a particle
◦ Must be processed on both CPU and GPU
▪ CPU – frees the index associated with the particle
▪ GPU – extra pass to move any dead particles to unseen
areas (i.e. infinity, or behind the camera)
▪ In practice particles fade out or fall out of view
(clean-up rarely needs to be done)
Optimize heap to always return
smallest available index
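A hedged sketch of the CPU-side allocator, assuming a binary min-heap of free indices. The class and method names are hypothetical. Returning the smallest free index keeps live particles packed toward the start of the attribute textures.

```python
import heapq

class IndexAllocator:
    """Tracks free particle slots; always hands out the smallest free index."""

    def __init__(self, capacity):
        # A sorted list already satisfies the min-heap invariant.
        self._free = list(range(capacity))

    def allocate(self):
        """Birth: return the smallest free index, or None if the system is full."""
        if not self._free:
            return None
        return heapq.heappop(self._free)

    def free(self, index):
        """Death: make the index available again."""
        heapq.heappush(self._free, index)

alloc = IndexAllocator(4)
first, second = alloc.allocate(), alloc.allocate()
alloc.free(first)              # particle 0 dies
reused = alloc.allocate()      # the freed slot is handed out again
```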
Velocity Operations
Global Forces
◦ Wind
◦ Gravity
Local Forces
◦ Attraction
◦ Repulsion
Velocity Damping
Collision Detection
F = Σ f0 ... fn
F = ma
a = F/m
If m = 1, then a = F
Stokes' law of drag
force on a sphere
Fd = 6πηr(v − vfl)
η = viscosity
r = radius of sphere
C = 6πηr (constant)
v = particle velocity
vfl = flow velocity
Sample Flow Field
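The force accumulation and the Stokes drag term can be sketched together. This is an illustration under stated assumptions: vectors are 3-tuples, the function names are made up here, and the drag is applied with the sign that pulls the particle toward the flow velocity.

```python
import math

def stokes_drag(v, v_flow, viscosity, radius):
    """Drag with magnitude 6*pi*eta*r*|v - v_flow|, directed toward the flow velocity."""
    c = 6.0 * math.pi * viscosity * radius      # the constant C = 6*pi*eta*r
    return tuple(c * (vf - vi) for vi, vf in zip(v, v_flow))

def acceleration(forces, mass=1.0):
    """Sum all forces (F = f0 + ... + fn) and apply a = F / m."""
    total = [0.0, 0.0, 0.0]
    for f in forces:
        for axis in range(3):
            total[axis] += f[axis]
    return tuple(t / mass for t in total)

gravity = (0.0, -9.81, 0.0)
drag = stokes_drag(v=(2.0, 0.0, 0.0), v_flow=(0.0, 0.0, 0.0), viscosity=0.1, radius=0.5)
a = acceleration([gravity, drag])    # with m = 1 the acceleration equals the total force
```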
Damping
◦ Imitates viscous materials or air resistance
◦ Implement by downward scaling of velocity
Un-damping
◦ Self-propelled objects (bee swarms)
◦ Implement by upward scaling of velocity
Collisions against simple objects
◦ Walls
◦ Bounding Spheres
Collision against complex objects
◦ Terrain
◦ Complex objects
◦ Terrain is usually modeled as a texture-based
height field
vn = (vbc ⋅ n)n
vt = vbc − vn
vbc = velocity before collision
n = unit surface normal at the collision point
vn = normal component of velocity
vt = tangential component of velocity
v = (1 − μ)vt − εvn
μ = dynamic friction (affects tangent velocity)
ε = resilience (affects normal velocity)
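A small sketch of the collision reaction, taking the normal component as the projection of the incoming velocity onto a unit surface normal `n`. Function and parameter names are illustrative.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def collision_response(v_bc, n, friction, resilience):
    """Return the velocity after collision: (1 - mu) * v_t - eps * v_n.

    v_bc is the velocity before collision, n a unit surface normal.
    """
    d = dot(v_bc, n)
    v_n = tuple(d * ni for ni in n)                     # normal component
    v_t = tuple(vi - vni for vi, vni in zip(v_bc, v_n)) # tangential component
    return tuple((1.0 - friction) * t - resilience * nn
                 for t, nn in zip(v_t, v_n))

# A particle moving down and to the right hits a floor whose normal points up.
v_after = collision_response((1.0, -2.0, 0.0), (0.0, 1.0, 0.0),
                             friction=0.5, resilience=0.5)
```

With zero friction and a resilience of one, the result is a perfect mirror reflection. Larger friction slows the sliding motion and smaller resilience weakens the bounce.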
Euler integration
Integrate acceleration to velocity:
◦ v = vp + a⋅∆t
Integrate velocity to position:
◦ p = pp + v⋅∆t
◦ vp, pp: velocity and position of the previous time step
Computationally simple
Needs storage of particle position and velocity
Verlet integration
Integrate acceleration to position:
◦ p = 2pp − ppp + a⋅∆t²
◦ ppp: position two time steps before
Needs no storage of particle velocity
Time step needs to be (almost) constant
Explicit manipulations of velocity
(e.g. for collision) impossible
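The two integration schemes can be compared side by side in a short sketch. Vectors are 3-tuples and the function names are illustrative. The Euler step needs the previous velocity, while the Verlet step needs the two previous positions instead.

```python
def euler_step(p_prev, v_prev, a, dt):
    """Euler: v = vp + a*dt, then p = pp + v*dt. Returns (p, v)."""
    v = tuple(vp + ai * dt for vp, ai in zip(v_prev, a))
    p = tuple(pp + vi * dt for pp, vi in zip(p_prev, v))
    return p, v

def verlet_step(p_prev, p_prev2, a, dt):
    """Verlet: p = 2*pp - ppp + a*dt^2. Velocity is implicit in the two positions."""
    return tuple(2.0 * pp - ppp + ai * dt * dt
                 for pp, ppp, ai in zip(p_prev, p_prev2, a))

a = (0.0, -10.0, 0.0)
p, v = euler_step((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), a, dt=0.5)
p_verlet = verlet_step((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), a, dt=0.5)
```

Because the Verlet form encodes velocity as the difference of two positions, a changing time step distorts that implicit velocity, which is why the step needs to stay (almost) constant.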
Odd-even merge sort
◦ Good for GPU because it always runs in constant
time for a given data size
▪ Inefficient to check whether sort is done on GPU
◦ Guarantees that with each iteration sortedness
never decreases
▪ Full sort can be distributed over 20-50 frames so it
doesn’t slow down the system too much (acceptable visual errors)
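A sequential sketch of an odd-even merge sort network, assuming the number of keys is a power of two. Each yielded pass is a set of independent compare-exchange pairs, which is what would map to one GPU render pass. The function names are illustrative, and the sequential loop here only stands in for the parallel hardware.

```python
def odd_even_merge_passes(n):
    """Yield the comparator pairs of Batcher's odd-even merge sort, one pass at a time.

    The schedule depends only on n, never on the data, so the number of
    passes is fixed for a given size. Assumes n is a power of two.
    """
    p = 1
    while p < n:
        k = p
        while k >= 1:
            pairs = []
            for j in range(k % p, n - k, 2 * k):
                for i in range(min(k, n - j - k)):
                    a, b = i + j, i + j + k
                    if a // (2 * p) == b // (2 * p):
                        pairs.append((a, b))
            yield pairs
            k //= 2
        p *= 2

def apply_pass(keys, pairs):
    """Compare-exchange every pair in place; pairs within a pass never overlap."""
    for a, b in pairs:
        if keys[a] > keys[b]:
            keys[a], keys[b] = keys[b], keys[a]

# Hypothetical per-particle sort keys, e.g. distances to the viewer.
keys = [7.5, 1.0, 9.0, 3.5, 8.0, 2.0, 6.0, 4.0]
for pairs in odd_even_merge_passes(len(keys)):
    apply_pass(keys, pairs)   # a real system could run only a few passes per frame
```

Since the pass sequence is known in advance, the work can be spread over several frames by running a fixed number of passes per frame and resuming where the previous frame stopped.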
UberFlow - A GPU-based Particle Engine:
Peter Kipfer et al.
Major bottleneck
◦ Transfer of geometry to graphics card
Process on GPU if transfer is to be avoided
◦ Need to avoid intermediate read-back also
Requires dedicated GPU implementations
Perform geometry handling for rendering on the GPU
Particle advection
◦ Motion according to external forces and 3D force fields
Sorting
◦ Depth-test and transparent rendering
◦ Spatial relations for collision detection
Rendering
◦ Individually colored points
◦ Point sprites
Simple two-pass method using two vertex
arrays in double-buffer mode
◦ Render quad covering entire buffer
◦ Apply forces in fragment shader
Pass 1: integrate
◦ Buffer 0: bind to texture
◦ Buffer 1: bind to render target
Pass 2: render
◦ Buffer 1: bind to vertex array
Additional buffers for stateful particles
Store velocity per particle (Euler integration)
Keep last two positions (Verlet integration)
Simple: Collision with height-field stored as a 2D texture
RGB = [x,y,z] surface normal
A = [w] height
Compute reflection vector
Force particle to field height
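A sketch of the height-field collision step, assuming the texel under the particle has already been fetched as `(nx, ny, nz, height)` and that y is the up axis. Both are assumptions for illustration, as is the function name.

```python
def collide_with_heightfield(p, v, texel):
    """Reflect the velocity and snap the particle to the surface if it is below it.

    texel = (nx, ny, nz, height): unit surface normal in RGB, height in A.
    """
    nx, ny, nz, height = texel
    x, y, z = p
    if y >= height:
        return p, v                         # above the surface: no collision
    n = (nx, ny, nz)
    d = sum(vi * ni for vi, ni in zip(v, n))
    reflected = tuple(vi - 2.0 * d * ni for vi, ni in zip(v, n))  # r = v - 2(v.n)n
    return (x, height, z), reflected        # force particle to field height

# A particle that sank below a flat floor at height 0 bounces back up.
p, v = collide_with_heightfield((0.0, -1.0, 0.0), (1.0, -1.0, 0.0),
                                (0.0, 1.0, 0.0, 0.0))
```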
Essential for natural behavior
◦ Full search is O(n²), not practicable
◦ Approximate solution by considering only neighbors
◦ Sort particles into spatial structure
▪ Staggered grid misses only a few combinations
Check m neighbors to the left/right
◦ Collision resolution with first collider
(time sequential)
◦ Only if velocity is not excessively larger than
integration step size
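The neighbor-limited search can be illustrated in one dimension. This is a simplification: the scalar position stands in for whatever spatial sort key a real system uses, and all names are hypothetical. After sorting, each particle is tested only against its `m` neighbors on either side, and the first collider found is kept.

```python
def find_collisions(positions, radius, m):
    """Return {particle: first collider} using only m sorted neighbors per side.

    positions are 1D coordinates used directly as the sort key (an assumption).
    Cost is O(n log n) for the sort plus O(n * m) for the checks, not O(n^2).
    """
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    first_collider = {}
    for rank, i in enumerate(order):
        lo = max(0, rank - m)
        hi = min(len(order), rank + m + 1)
        for other_rank in range(lo, hi):
            j = order[other_rank]
            if j != i and abs(positions[i] - positions[j]) < 2.0 * radius:
                first_collider[i] = j       # resolve with the first collider only
                break
    return first_collider

hits = find_collisions([0.0, 5.0, 0.3, 9.0], radius=0.2, m=1)
```

The search is approximate by design: a collision partner that sits more than `m` places away in the sorted order is missed.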
Real-Time Particle Systems on the
GPU in Dynamic Environments:
Shannon Drone
Storage requirements
Integrating the equations of motion
Saving particle states
Changing behaviors
Need to store immediate particle state (position,
velocity, etc)
Option 1: Store this data in a vertex buffer
◦ Each vertex represents the immediate state of the particle
◦ Particles are stored linearly in the buffer
Option 2: Store this data in a floating point
texture array
◦ First array slice stores positions for all particles.
◦ Next array slice stores velocities, etc.
Accuracy depends on the length of time step
Use Euler integration for these samples
Runge-Kutta based schemes can be more
accurate at the expense of needing to store
more particle state
Integration on the CPU can use read-modify-write operations to update the state in-place
This is illegal on the GPU (except for
non-programmable blending hardware)
Use double buffering
Ping-pong between them
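Double buffering can be sketched with two plain lists standing in for the two GPU buffers. Positions are 1D scalars to keep the example short, and the function name is illustrative. Each step reads only from the source buffer and writes only to the destination, then the roles swap.

```python
def simulate(positions, velocities, dt, steps):
    """Advance positions with Euler steps, ping-ponging between two buffers."""
    buffers = [list(positions), [0.0] * len(positions)]
    src = 0
    for _ in range(steps):
        dst = 1 - src
        for i, p in enumerate(buffers[src]):      # read from src only
            buffers[dst][i] = p + velocities[i] * dt   # write to dst only
        src = dst                                 # swap roles for the next step
    return buffers[src]

final = simulate([0.0, 1.0], [1.0, -1.0], dt=0.5, steps=2)
```

No element is ever read and written in the same pass, which is the constraint the GPU imposes.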
Particles are no longer affixed to a
predestined path of motion as in parametric systems
Changing an individual velocity or position
will change the behavior of that particle.
This is the basis of every technique that follows
N-Body problems
Force splatting for N² interactions
Gravity simulation
Flocking particles on the GPU
Every part of the system has the possibility of
affecting every other part of the system
Space partitioning can help
Treating multiple parts of a system at a
distance as one individual part can also help
Parallelization can also help (GPU)
Our goal is to project the force from one
particle onto all other particles in the system
Create a texture buffer large enough to hold
forces for all particles in the simulation
Each texel holds the accumulated forces
acting upon a single particle
It is O(N²), but it exploits the fast rasterization,
blending, and SIMD hardware of the GPU
It also avoids the need to create any space
partitioning structures on the GPU
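The accumulation that force splatting performs can be written out as a CPU loop. In this sketch the `forces` list stands in for the force texture, one entry per particle, and `+=` plays the role of additive blending. The gravitational force law, the `softening` term, and all names are assumptions for illustration.

```python
def accumulate_forces(positions, masses, g=1.0, softening=1e-9):
    """O(N^2) accumulation: every particle adds its pull to every other particle."""
    n = len(positions)
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]   # one "texel" per particle
    for src in range(n):                 # the particle being splatted
        for dst in range(n):             # the texel receiving the force
            if src == dst:
                continue
            d = [ps - pd for ps, pd in zip(positions[src], positions[dst])]
            r2 = sum(c * c for c in d) + softening   # avoids division by zero
            scale = g * masses[src] * masses[dst] * r2 ** -1.5
            for axis in range(3):
                forces[dst][axis] += scale * d[axis]  # additive blend
    return forces

forces = accumulate_forces([(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)], [1.0, 1.0])
```

On the GPU the inner loop runs in parallel across all texels, which is why the quadratic cost remains workable without a spatial partitioning structure.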
Uses force splatting to
project the gravitational
attraction of each
particle onto every
other particle
Handled on the GPU
◦ CPU only sends time-step
information to the GPU
◦ CPU also handles high-level switching of render
targets and issuing draw calls
Uses basic boids behaviors
[Reynolds87,Reynolds99] to simulate
thousands of flocking spaceships on the GPU
Cohesion and Alignment
Unlike N-Body problems, Cohesion and
Alignment need the average of all particle states
◦ Cohesion steers ships toward the common center.
◦ Alignment steers ships toward the common velocity
We could average all of the positions and
velocities by repeatedly down-sampling the
particle state buffer / texture
However, the graphics system can do this for us
Graphics infrastructure can already perform
this quick down-sampling by generating mip maps
The smallest mip in the chain is the average
of all values in the original texture
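A small sketch of why the last mip level equals the average, assuming a square texture whose side is a power of two and a box filter over each 2x2 block. Function names are illustrative, and scalar texels are used for brevity.

```python
def downsample(tex):
    """Produce the next mip level by averaging each 2x2 block of texels."""
    half = len(tex) // 2
    return [[(tex[2 * y][2 * x] + tex[2 * y][2 * x + 1] +
              tex[2 * y + 1][2 * x] + tex[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(half)]
            for y in range(half)]

def mip_average(tex):
    """Walk the mip chain down to 1x1; the remaining texel is the mean of all texels."""
    while len(tex) > 1:
        tex = downsample(tex)
    return tex[0][0]

# A 4x4 "particle state" texture holding the values 0..15.
state = [[float(4 * y + x) for x in range(4)] for y in range(4)]
average = mip_average(state)
```

For flocking, the same reduction applied to the position and velocity textures would yield the common center and the common velocity.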
Extend 2D partitioned grid [Lutz04] to 3D by
rendering the scene into a volume texture
Instance the geometry across all slices
For each instance we clip the geometry to the slice
Each voxel in the volume texture stores
the plane equation and velocity of the scene
Once the position buffer is available, paint splotches
can be gathered
Use a gather pixel shader to traverse the position buffer
For each particle in the buffer, the shader
samples its position and determines if it can
affect the current position in the position buffer
If it can, the color of the particle is sent to the
output render target, which doubles as the mesh
GPU-based Particle Systems for
Illustrative Volume Rendering:
IEEE/EG Symposium on Volume and Point-Based Graphics 2008
R.F.P. van Pelt et al.
Interactive GPU-based illustrative framework,
called VolFlies-GPU, for rendering volume data,
exploiting parallelism in both graphics hardware
and particle systems
User-configurable particle systems to produce
stylized renderings from the volume data,
imitating traditional pen and ink drawings
Achieve real-time interaction and prompt
parametrization of the illustrative styles, using an
intuitive GPGPU paradigm
GPGPU paradigm using
transform feedback
Intensive use of both vertex
shaders and geometry shaders,
while fragment shaders are
completely discarded
VolFlies-GPU framework
comprises an interactive
illustrative visualization
framework for real-time pen-and-ink style rendering of
volume data
Stippling: Surface shading by changing the stipple scale
Hatching: Conveying shape by tracing hatch stroke in one
or two directions.
Curvature-based hatching with contours
Direction-based hatching on cone-splatted bone tissue,
combined with a scale-based stippled skin iso-surface
with contours.
Similar styles, with added contours on the bone surface
Interactive framerates
Particle systems are useful for graphic
effects in movies, games, and medical visualization
Particle systems can run effectively
on the GPU
