Learning from Projections

There we are. Welcome to this final part of a four-part series on learning from 3D data. After racking our brains to understand deep learning techniques in various 3D representations (point clouds, voxel grids, graphs and meshes), in this last episode we dial it back a notch, more precisely from three to two dimensions.

Why (not) to project

As always, the first question is: why should we want to do this? Before we can answer it, let’s first define what is meant by projection in this context¹. In general, a projection is a mapping that transforms something into something else. In geometric settings, i.e. when the mapping takes data from one dimensionality to another, this is often referred to as projecting data from one representation to another. Depending on your background, you might think of dimensionality reduction techniques like PCA, which project data from a high- to a lower-dimensional space; of neural networks, which project (relatively) low-dimensional inputs like images into a higher-dimensional feature space; or of photography, which projects our three-dimensional perception of the world onto the two-dimensional image plane.

Coincidentally, this last example, photography, is the basis for many of the projection-based learning algorithms in this article. To understand why, simply think about the history of computer vision. For the longest time, the field was confined to two dimensions, as the only commodity sensor capturing visual information was the camera. Only in recent years have 3D scanning devices become more affordable and commonplace, driven by applications in augmented reality (gaming consoles) and autonomous driving. There is also another reason why using two-dimensional data feels natural: it's what humans do by default. Yes, we have two eyes, so there is some stereo and thus 3D processing going on, but even if you close one eye, you can still understand your surroundings perfectly well, even though you are only working with 2D projections of 3D objects onto your retina.
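To make the photographic kind of projection a bit more concrete, here is a minimal sketch of an idealized pinhole camera in plain Python. It is an illustration under simplifying assumptions, not the method of any particular algorithm discussed in this series: the points are assumed to be given in camera coordinates with the z-axis pointing forward, there is no lens distortion, and the names `project_pinhole` and `focal_length` are made up for this example.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]


def project_pinhole(points: List[Point3D], focal_length: float = 1.0) -> List[Point2D]:
    """Project 3D points onto the image plane of an idealized pinhole camera.

    Assumes camera coordinates: the camera sits at the origin and looks
    along the positive z-axis. Each point (x, y, z) is mapped to
    (f * x / z, f * y / z).
    """
    projected = []
    for x, y, z in points:
        if z <= 0:
            # Points at or behind the camera centre have no valid image.
            continue
        projected.append((focal_length * x / z, focal_length * y / z))
    return projected


# Two points on the same viewing ray, one twice as far away as the other.
near = (1.0, 2.0, 2.0)
far = (2.0, 4.0, 4.0)
image_points = project_pinhole([near, far])
print(image_points)  # [(0.5, 1.0), (0.5, 1.0)]
```

Both points land on the same image coordinates. Dividing by z discards depth, which is exactly the information that goes missing when a 3D scene is flattened into a 2D image.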