概述
Acceleration Structures可以使得ray tracing复杂度从$O(n)$减少到$O(logn)$.加速比率是非常之高的,但是萦绕而来的有两大问题——高效的构建和高效的tracing,这两个问题必须平衡,以似的每一帧的渲染时间达到最小。
对于静态的场景,可以采用预先构建并且储存Acceleration Structures的方案来削减Acceleration Structures构建的时间,从而每一帧的渲染时间取决于Acceleration Structures的遍历。但是对于动态场景来说,几乎每一帧都会有object的移动或者变形,每次操作都将会导致Acceleration Structures无效化,以至于必须花时间去处理Acceleration Structures。
有两种划分方法——按照空间划分和按照object划分。
按照划分方法来分类,加速结构可分为两类:
- Adaptive结构(object划分):一般是tree,以高效tracing为目标实现良好的orinitive空间排序。例如:Kd-trees,BVHs,bound ing interval hierarchies[1],bounded kd-trees[2].
- Uniform结构(空间划分):选择划分位置是由结构自身决定的,虽然可能会有启发函数来决定划分到什么粒度。例如:grid,octree.
构建Adaptive结构的方法分为自顶向下和自底向上的。前者递归划分,可以根据整个数据集和附加信息来划分;后者则无此优点,他成组构建primitives集,然后对其分层。
递归构建一般有4中操作:
- 终结条件,比如最深深度,目标primitive个数
- 选择划分点,根据不同的启发函数,有不同的划分方式,最简单的比如根据当前volume的信息计算,复杂的比如SAH将会考虑primitive分布并且一般会基于某个cost模型。
- 节点创建
- 后处理,比如重排节点以获取更好的cache效率,或者类似基于Rope的kdtree,构建和优化rope。
SAH( surface area heuristic) SAH是一种概率模型,目标是找到最小化cost函数的分割点:
假设ray均匀分布,那么$p_l$,$p_r$等于ray通过父节点,且通过本节点的概率,使用几何概率进行计算,也就是包围盒的面积比上父节点包围盒的面积。 自顶向下构建近似地贪婪计算cost函数即可找到分割点。 使用SAH还可作为一个终结条件:
静态场景
构建KD-Tree
自顶向下递归构建kdtree的算法复杂度为$O(nlogn)$($n+\frac{n}{2}+\frac{n}{2}+4* \frac{n}{4}+..+n*\frac{n}{n}$).n是primitive个数。
具体算法参考:
class OLPBRKDTreeNode
{
public:
OLPBRKDTreeNode(bool val):leaf(val),left(nullptr),right(nullptr),root(false),state(-1){}
void init_as_leaf(std::vector<int> index);
std::vector<int> index;
OLPBRKDTreeNode* left;
OLPBRKDTreeNode* right;
RBAABB bound;
int split_axis;
float split_pos;
//debug
int depth;
//序列化根节点标记
bool root;
bool leaf;
};
//bd当前节点的AABB,prim图元储存实际储存,cur_index当前节点所属图元的index
static void build_kdtree(const RBAABB& bd, const std::vector<OLPBRGeometry*>& prim, std::vector<int>& cur_index, int depth, OLPBRKDTreeNode*& toparent)
{
if (cur_index.size() < 1)
{
toparent = nullptr;
return;
}
if (depth == 0 || cur_index.size() < single_max_n)
{
toparent = alloc_node(true);
toparent->init_as_leaf(cur_index);
toparent->bound = bd;
toparent->depth = 0;
toparent->split_axis = -1;
toparent->split_pos = -1.f;
return;
}
int axis = 0;
std::vector<int> oa;
std::vector<int> ob;
f32 best_point;
int best_aixs = -1;
RBAABB bda, bdb;
//寻找分割平面
{
float old_cost = MAX_F32;
int best_index = -1;
float inv_ta = 1.f / bd.get_surface_area();
auto d = bd.max - bd.min;
std::vector<split_point_t> tobesplitx;
{
//x
for (int i = 0; i < cur_index.size(); ++i)
{
auto bdi = prim[cur_index[i]]->bound();
tobesplitx.push_back(split_point_t(cur_index[i], true, bdi.min.x - bd.min.x));
tobesplitx.push_back(split_point_t(cur_index[i], false, bdi.max.x - bd.min.x));
}
if (tobesplitx.size() > 0)
std::sort(tobesplitx.begin(), tobesplitx.end());
f32 aa = 0;
f32 ab = 0;
int na = 0, nb = tobesplitx.size()*0.5;
for (int i = 0; i < tobesplitx.size(); ++i)
{
if (tobesplitx[i].low == false) nb--;
aa = ((d.y*d.z) + tobesplitx[i].point*(d.z + d.y)) * 2;
ab = ((d.y*d.z) + (d.x - tobesplitx[i].point)*(d.z + d.y)) * 2;
f32 pa = aa * inv_ta;
f32 pb = ab * inv_ta;
f32 eb_ = (na == 0 || nb == 0) ? eb : 0.f;
eb_ = 0;
f32 cost = trav_t + isc_t*(1.f - eb_)*(pa*na + pb*nb);
if (cost < old_cost)
{
old_cost = cost;
best_aixs = 0;
best_index = i;
}
if (tobesplitx[i].low == true) na++;
}
}
std::vector<split_point_t> tobesplity;
{
//y
for (int i = 0; i < cur_index.size(); ++i)
{
auto bdi = prim[cur_index[i]]->bound();
tobesplity.push_back(split_point_t(cur_index[i], true, bdi.min.y - bd.min.y));
tobesplity.push_back(split_point_t(cur_index[i], false, bdi.max.y - bd.min.y));
}
if (tobesplity.size() > 0)
std::sort(tobesplity.begin(), tobesplity.end());
f32 aa = 0;
f32 ab = 0;
int na = 0, nb = tobesplity.size()*0.5;
for (int i = 0; i < tobesplity.size(); ++i)
{
if (tobesplity[i].low == false) nb--;
aa = ((d.x*d.z) + tobesplity[i].point*(d.z + d.x)) * 2;
ab = ((d.x*d.z) + (d.y - tobesplity[i].point)*(d.z + d.x)) * 2;
f32 pa = aa * inv_ta;
f32 pb = ab * inv_ta;
f32 eb_ = (na == 0 || nb == 0) ? eb : 0.f;
eb_ = 0;
f32 cost = trav_t + isc_t*(1.f - eb_)*(pa*na + pb*nb);
if (cost < old_cost)
{
old_cost = cost;
best_aixs = 1;
best_index = i;
}
if (tobesplity[i].low == true) na++;
}
}
std::vector<split_point_t> tobesplitz;
{
//z
for (int i = 0; i < cur_index.size(); ++i)
{
auto bdi = prim[cur_index[i]]->bound();
tobesplitz.push_back(split_point_t(cur_index[i], true, bdi.min.z - bd.min.z));
tobesplitz.push_back(split_point_t(cur_index[i], false, bdi.max.z - bd.min.z));
}
if (tobesplitz.size() > 0)
std::sort(tobesplitz.begin(), tobesplitz.end());
f32 aa = 0;
f32 ab = 0;
int na = 0, nb = tobesplitz.size()*0.5;
for (int i = 0; i < tobesplitz.size(); ++i)
{
if (tobesplitz[i].low == false) nb--;
aa = ((d.y*d.x) + tobesplitz[i].point*(d.x + d.y)) * 2;
ab = ((d.y*d.x) + (d.z - tobesplitz[i].point)*(d.x + d.y)) * 2;
f32 pa = aa * inv_ta;
f32 pb = ab * inv_ta;
f32 eb_ = (na == 0 || nb == 0) ? eb : 0.f;
eb_ = 0;
f32 cost = trav_t + isc_t*(1.f - eb_)*(pa*na + pb*nb);
if (cost < old_cost)
{
old_cost = cost;
best_aixs = 2;
best_index = i;
}
if (tobesplitz[i].low == true) na++;
}
}
f32 t;
switch (best_aixs)
{
case 0:
for (int i = 0; i < best_index; ++i)
if (tobesplitx[i].low)
oa.push_back(tobesplitx[i].prim_index);
for (int i = best_index + 1; i < tobesplitx.size(); ++i)
if (!tobesplitx[i].low)
ob.push_back(tobesplitx[i].prim_index);
t = tobesplitx[best_index].point / d.x;
bd.split(0, t, bda, bdb);
best_point = tobesplitx[best_index].point;
break;
case 1:
for (int i = 0; i < best_index; ++i)
if (tobesplity[i].low)
oa.push_back(tobesplity[i].prim_index);
for (int i = best_index + 1; i < tobesplity.size(); ++i)
if (!tobesplity[i].low)
ob.push_back(tobesplity[i].prim_index);
t = tobesplity[best_index].point / d.y;
bd.split(1, t, bda, bdb);
best_point = tobesplity[best_index].point;
break;
case 2:
for (int i = 0; i < best_index; ++i)
if (tobesplitz[i].low)
oa.push_back(tobesplitz[i].prim_index);
for (int i = best_index + 1; i < tobesplitz.size(); ++i)
if (!tobesplitz[i].low)
ob.push_back(tobesplitz[i].prim_index);
t = tobesplitz[best_index].point / d.z;
bd.split(2, t, bda, bdb);
best_point = tobesplitz[best_index].point;
break;
default:
break;
}
}
OLPBRKDTreeNode* a, *b;
//处理扁平的bounding box
if (bda.max[axis] - bda.min[axis] <= SMALL_F)
{
a = nullptr;
b = alloc_node(true);
b->init_as_leaf(cur_index);
b->bound = bd;
b->depth = 0;
b->split_axis = -1;
b->split_pos = -1.f;
}
else
{
build_kdtree(bda, prim, oa, depth - 1, a);
build_kdtree(bdb, prim, ob, depth - 1, b);
}
toparent = alloc_node(false);
toparent->bound = bd;
toparent->left = a;
toparent->right = b;
toparent->depth = depth;
toparent->split_axis = best_aixs;
toparent->split_pos = best_point + bd.min[best_aixs];
}
上面实现,大部分代码在使用SAH寻找分割平面,为了使得整个kdtree分布在一块连续的内存中,并以便于后处理排序以及序列化,使用了alloc_node
自定义分配内存。
这种算法最大弊端在于每一次都要去扫描primitive来决定最终的分割面,有一种采样SAH cost的方法是在轴上均匀分布的位置计算cost,跨越分割点的primitive按照计入两边.对于给定的$(N_L,N_R,S_L,S_R)$,可以通过线性插值primitive数量和表面积来计算cost函数的二次逼近。对于变化很大的区间,亦可以通过采样固定数量的位置来提高近似质量。当primitive的数量少于指定的采样数之后,再变回基本的遍历primitive的方法来计算cost。 这种采样方法有几大优点:
- 两种算法都具有复杂度$O(nlogn)$,但采样方法可以使用simd加速来减小复杂度常数。
- 采样算法将排序等更多的工作推迟到了叶节点(也就是primitive少于指定采样数的时候),此时排序的个数变小。考虑在构造期间可能会忽略整个子树。
- 采样在数据集储存不连续(cache locality低)的时候也可以工作的比较高效,而排序在此时则相反。
缓存KD-Tree
对于复杂的场景来说,构建kdtree可能会耗费相当多的时间。然而,对于静态场景而言,一旦构造好一次kdtree,就可以把kdtree永久保存下来,在需要的时候再直接读入即可,如此则节约了大量的构造时间,对于大场景,暂存kdtree也是重要的。
如果kdtree被分配到了一块连续的内存之内,那序列化起来则会方便得多,而将节点集中分配并不是一件难事。序列化kdtree,主要要做两件事:
- 替换子节点偏移和重定位子节点地址
- 正确序列化和反序列化primitive索引
剩下的只需要把整块连续的内存直接二进制dump到本地文件就好了,读入的时候,将整块内存直接分配出来,然后重定位内存。
初步的实现如下:
class OLPBRKDTreeNodeWrite
{
public:
struct
{
u64 size;
//相对于vec_mem的偏移
u64 offset;
u64 pad;
#ifdef _DEBUG
u64 pad1;
#endif
} index;
u64 left_offset;
u64 right_offset;
RBAABB bound;
int split_axis;
float split_pos;
//debug
int depth;
int state;
//序列化根节点标记
bool root;
bool leaf;
};
struct HeadWrite
{
//总尺寸
u64 total_size;
//单个节点大小
u64 single_block_size;
//vector偏移量
u64 vector_offset;
//节点个数
u64 node_num;
};
//序列化kdtree
static void serialize(const char* filename)
{
size_t block_size = linear_mem.frame.get_alloc_size(sizeof(OLPBRKDTreeNode));
f32 tt = (f32)linear_mem.frame.get_allocated_memory() / block_size;
int block_num = (int)tt;
printf("serialize %f nodes!\n node size:%d\ntotal size:%d\n", tt, block_size, linear_mem.frame.get_allocated_memory());
size_t node_size = linear_mem.frame.get_allocated_memory();
size_t total_size = 0;
for (int k = 0; k < block_num;++k)
{
OLPBRKDTreeNode* node = ((OLPBRKDTreeNode*)&linear_mem.mf.memory_ptr[k*block_size]);
total_size += node->index.size()*sizeof(int);
}
size_t head_size = sizeof(HeadWrite);
char* mem = new char[head_size+node_size+total_size];
HeadWrite* head_mem = (HeadWrite*)mem;
char* node_mem = &mem[head_size];
char* vec_mem = &mem[node_size+head_size];
head_mem->node_num = block_num;
head_mem->single_block_size = block_size;
head_mem->total_size = head_size + node_size + total_size;
head_mem->vector_offset = node_size + head_size;
if (sizeof(OLPBRKDTreeNodeWrite) != sizeof(OLPBRKDTreeNode))
printf("write:%d~origin:%d\n", sizeof(OLPBRKDTreeNodeWrite) , sizeof(OLPBRKDTreeNode));
CHECK(sizeof(OLPBRKDTreeNodeWrite) == sizeof(OLPBRKDTreeNode));
CHECK(sizeof(OLPBRKDTreeNodeWrite) == block_size);
OLPBRKDTreeNodeWrite* cur_save_node = (OLPBRKDTreeNodeWrite*)node_mem;
size_t cur_vec_offset = 0;
for (int k = 0; k < block_num;++k)
{
OLPBRKDTreeNode* node = ((OLPBRKDTreeNode*)&linear_mem.mf.memory_ptr[k*block_size]);
memcpy(cur_save_node, node, block_size);
size_t vsize = node->index.size()*sizeof(int);
cur_save_node->index.size = vsize;
cur_save_node->index.offset = cur_vec_offset;
cur_save_node->index.pad = 0;
#ifdef _DEBUG
cur_save_node->index.pad1 = 0;
#endif
memcpy(vec_mem + cur_vec_offset, node->index.data(), vsize);
cur_vec_offset += vsize;
if (node->left)
{
cur_save_node->left_offset = (u64)((u8*)node->left - (u8*)linear_mem.mf.memory_ptr);
}
else
cur_save_node->left_offset = 0xffffffffffffffff;
if (node->right)
{
cur_save_node->right_offset = (u64)((u8*)node->right - (u8*)linear_mem.mf.memory_ptr);
}
else
cur_save_node->right_offset = 0xffffffffffffffff;
cur_save_node++;
}
std::ofstream fout(filename, std::ios::binary);
fout.write((char*)mem, head_size + node_size + total_size);
fout.close();
delete[] mem;
}
static void* deserialize(const char* filename)
{
release_frame();
std::ifstream fin(filename, std::ios::binary);
if (!fin)
{
printf("read %s failed!\n", filename);
return nullptr;
}
fin.seekg(0, fin.end);
int read_size = fin.tellg();
fin.seekg(0, fin.beg);
char* mem = new char[read_size];
fin.read(mem, read_size);
fin.close();
HeadWrite* head = (HeadWrite*)mem;
u64 node_num = head->node_num;
u64 block_size = head->single_block_size;
u64 total_szie = head->total_size;
u64 vector_off = head->vector_offset;
CHECK(total_szie==read_size);
char* vec_mem = &mem[vector_off];
OLPBRKDTreeNodeWrite* cur_read_node = (OLPBRKDTreeNodeWrite*)&mem[sizeof(HeadWrite)];
void* p = linear_mem.frame.alloc(block_size*node_num,false);
memcpy(p, cur_read_node, block_size*node_num);
OLPBRKDTreeNode* node = (OLPBRKDTreeNode*)p;
OLPBRKDTreeNode* ret = nullptr;
for (int k = 0; k < node_num;++k)
{
u32 icount = cur_read_node->index.size / sizeof(int);
u32 ioff = cur_read_node->index.offset;
new(&node->index) std::vector<int>();
node->index.resize(icount);
memcpy((node->index.data()), vec_mem + ioff, icount*sizeof(int));
ret = node->root ? node : ret;
if (cur_read_node->left_offset != 0xffffffffffffffff)
node->left = (OLPBRKDTreeNode*)((u64)cur_read_node->left_offset + (u64)p);
else
node->left = nullptr;
if (cur_read_node->right_offset != 0xffffffffffffffff)
node->right = (OLPBRKDTreeNode*)((u64)cur_read_node->right_offset + (u64)p);
else
node->right = nullptr;
cur_read_node++;
node++;
}
delete[] mem;
return ret;
}
Ray遍历KD-Tree
最简单的kdtree遍历就是以任何一种顺序遍历二叉树,不过这种方法的复杂度为$O(n)$,没什么意义。
标准的遍历是沿着光线遍历,分为3种情况:
- $t_{split}<t_{min}$ or $t_{split}>t_{max}$,仅相交一个child
- $t_{min}<t_{split}<t_{max}$,相交两个child
- 不相交
值得注意的时候,沿ray遍历找到一个交点之后不能停止,因为一个primitive可能横跨很多个bounding box,如果遇到以下情形,停止遍历将会导致错误:
可以简单地使用递归来直接实现遍历:
static bool intersection_res(const OLPBRRay& ray, const OLPBRKDTreeNode* node, OLPBRItersc* isc, const std::vector<OLPBRGeometry*>& prims)
{
if (!node)
{
return false;
}
float tmin, tmax;
if (!node->bound.intersection(ray.o, ray.d, tmin, tmax))
return false;
if (node->leaf)
{
bool ret = false;
for (int i = 0; i < node->index.size();++i)
{
ret |= prims[node->index[i]]->intersect(ray, *isc);
}
return ret;
}
int axis = node->split_axis;
RBVector3 inv_dir(1.f / ray.d.x, 1.f / ray.d.y, 1.f / ray.d.z);
OLPBRKDTreeNode* first, *second;
float tplane = (node->split_pos - ray.o[axis])*inv_dir[axis];
if (inv_dir[axis] > 0)
{
if (tplane<=0)
{
first = node->right;
second = nullptr;
}
else
{
first = node->left;
second = node->right;
}
}
else
{
if (tplane <= 0)
{
first = node->left;
second = nullptr;
}
else
{
first = node->right;
second = node->left;
}
}
//blow for debug
// if (first || second)
{
bool a = intersection_res(ray,first,isc,prims);
//如果包含了交点在aabb中才算,预防相交的交点不再本aabb中的情形
if (a)
{
RBVector3 sp = ray.o + ray.d*(isc->dist_ - isc->epsilon_);
/*
也可以看上图
| | __B_ |
| A | |~~| |
|____|___|__|___C_|
光线首先穿过Bound A与C平面相交,此时返回true,然而这个交点可能不是最近的,甚至可能根本不在Bound A中!
只要检测到的最近点一定满足:宿主object一定与A有交;一定是最近的点。
*/
if (first->bound.is_contain(sp))
return a;
}
bool b = intersection_res(ray,second, isc, prims);
return a||b;
}
//printf("kdtree intersc should not to be here!%d,%d\n",first,second);
//return false;
}
上述算法将 条件$t_{split}<t_{min}$ or $t_{split}>t_{max}$归纳为两个child都相交的情形,仅仅把$t_{split}<0$的情景记为仅相交一个child。这种归纳方案并不会有什么问题,只是相对低效一些。
另外,上述算法做了提前的遍历终结,条件为相交的最近点包含在本bounding box中:
if (a)
{
RBVector3 sp = ray.o + ray.d*(isc->dist_ - isc->epsilon_);
if (first->bound.is_contain(sp))
return a;
}
很多时候,递归由于函数自身的调用会产生很多问题,比如内存占用,函数call overhead,GPU不友好等等。
所以通常会使用迭代算法来进行遍历,对于要访问两个child的情形,使用一个栈来储存第二个child一遍第一个访问完成后继续访问。
#define MAX_TODON 64
static bool intersection(const OLPBRRay& ray, const OLPBRKDTreeNode* tree,
OLPBRItersc *isc, const std::vector<OLPBRGeometry*>& prims)
{
float tmin, tmax;
if (!tree) return false;
if (!tree->bound.intersection(ray.o, ray.d, tmin, tmax)) return false;
CHECK(tmin >= 0);
RBVector3 inv_dir(1.f / ray.d.x, 1.f / ray.d.y, 1.f / ray.d.z);
kd_todo_t todo[MAX_TODON];
int todo_pos = 0;
bool hit = false;
const OLPBRKDTreeNode*node = tree;
while (node != nullptr || todo_pos != 0)
{
if (ray.max_t < tmin) break;
//有可能某个节点是空节点,但是此时栈中依然有节点需要处理
if (!node)
{
--todo_pos;
node = todo[todo_pos].node;
tmin = todo[todo_pos].tmin;
tmax = todo[todo_pos].tmax;
}
if (!node)
continue;
if (!node->leaf)
{
int axis = node->split_axis;
float tplane = (node->split_pos - ray.o[axis])*inv_dir[axis];
const OLPBRKDTreeNode* first_child, *second_child;
int below_first = ((ray.o[axis] < node->split_pos) || (ray.o[axis] == node->split_pos&&ray.d[axis] <= 0));
if (below_first)
{
first_child = node->left;
second_child = node->right;
}
else
{
first_child = node->right;
second_child = node->left;
}
if (tplane > tmax || tplane <= 0) node = first_child;
else if (tplane < tmin) node = second_child;
else
{
todo[todo_pos].node = second_child;
todo[todo_pos].tmin = tplane;
todo[todo_pos].tmax = tmax;
++todo_pos;
node = first_child;
tmax = tplane;
}
}
else
for (auto obj_i : node->index) hit |= prims[obj_i]->intersect(ray, *isc);
if (todo_pos > 0)
{
--todo_pos;
node = todo[todo_pos].node;
tmin = todo[todo_pos].tmin;
tmax = todo[todo_pos].tmax;
}
else
break;
}
return hit;
}
基于Rope的kdtree
有一种更加GPU友好的遍历方法,使用一个“Rope”结构,使叶节点bounding box每一个面都指向与其相邻的包含所有节点的最小子树。这样就不需要一个附加的栈来记录遍历轨迹了。
使用一个后期处理可以很简单地构建“Rope”,同时使用一个优化函数,将“Rope” push到最小的子树上去。
Rope等待实现…
Packet ray遍历
当ray表现出一些公共属性时,某些计算仅需要做一次,就可以在所有ray之间共享。这通常会用在同原点的ray相交测试primitive上。这时候可以将这些点使用SIMD同时遍历kdtree,算法上基本和单个ray遍历是一样的,但是要使用mask来屏蔽那些不需要结算的ray,注意这种方案在靠近root的地方很有效,会随着越来越远离root节点越来越分散而丢失其优点,在这种方案下必须所有的ray都找到交点后才能退出。
动态场景
kdtree对于动态场景一般是重建。可以考虑使用采样SAH和binning of primitives[3] 来提升重建效率。另外还有一种“fuzzy kdtree”[4],允许primitive有限制地运动。 还有就是分层次的kdtree,首先是以primitive集为单位分离地构建bottom-level kdtree并储存变换矩阵,然后再以这些primitive集构建一个top-level kdtree。遍历的时候达到叶节点就进入bottom-level kdtree,将ray使用矩阵转换进去,然后继续遍历,但这种方案仅支持层次动画。
高效构建KDTree
BVH
解决动态场景最好的是使用BVH。【Real–time Ray Tracing of Dynamic Scenes 2.4.3】
参考文献
[1] C. Wächter and A. Keller. Instant Ray Tracing: The Bounding Interval Hierarchy. In T. Akenine-Möller and W. Heidrich, editors, Rendering Techniques 2006 (Proc. of 17th Eurographics Symposium on Rendering), pages 139–149, 2006.
[2] S. Woop, G. Marmitt, and P. Slusallek. B-KD trees for hardware accelerated ray tracing of dynamic scenes. In GH ’06: Proceedings of the 21st ACM SIGGRAPH/Eu- rographics symposium on Graphics hardware, pages 67–77. ACM, 2006.
[3] S. Popov, J. Günther, H.-P. Seidel, and P. Slusallek. Experiences with streaming con- struction of SAH KD-trees. In Proceedings of the 2006 IEEE Symposium on Interac- tive Ray Tracing, pages 89–94. IEEE, 2006.
[4] J. Günther, H. Friedrich, I. Wald, H.-P. Seidel, and P. Slusallek. Ray tracing ani- mated scenes using motion decomposition. Computer Graphics Forum, 25(3):517– 525, 2006. (Proceedings of Eurographics).