a digital scan of a 35mm film image of a processing sketch running on an LCD

Research on Relative Motion Tracking

Project Context
My thesis project has undergone a major shift in the last week. I’m moving away from the post-apocalyptic pirate internet and towards something completely different: a means of projecting content onto surfaces so that the projection appears intrinsic to the surface.

Imagine a hand-held projector that you can sweep across a room, a bit like a flashlight. As it moves, the projected content appears stuck to the wall, the floor, and so on. For example, you could add something to the scene in a particular location, a bit of text, perhaps. After adding the text, you could sweep the projector to a different part of the wall. The text would go out of view once it left the throw area of the projector, but if you moved the projector back towards the spot where you initially added it, the words would come back into view. The words are stuck to their environment; the projection is just an incidental way of exploring the space and revealing its content.

Two recent technologies make this a particularly ripe time for the project. The Kinect gives cheap 3D scene information, which can improve the quality of motion tracking and automate the projection mapping process. And new pico-projectors that run on battery power and weigh significantly less than their conference-table counterparts mean that carrying a projector around and using it to explore a space is no longer an entirely ridiculous proposition. This whole idea, which I’m currently calling Thesis II (for personal reasons), will be written up in more detail soon.

Fronts of Inquiry
The creative challenge for the next twelve weeks is to conceive of and build an application that demonstrates the usefulness and creative possibilities of this tool.

The technical challenges are twofold. First, I need a way to track the relative motion between the projector and the projection surface (generally a wall) — I’ll refer to this as relative motion tracking. Second, I need a way to dynamically distort the projected image to match the geometry of the projection surface. This is similar in concept to projection mapping, except the projection surface isn’t static. I’ll call this dynamic projection mapping. The calculations for both of these steps need to happen in less than 20 milliseconds if the effect is going to work and feel fluid.
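
To keep that 20 millisecond constraint honest during development, a trivial per-frame timing check goes a long way. The sketch below is only illustrative; trackRelativeMotion and remapProjection are hypothetical stand-ins for the two steps described above.

    // Sketch of a per-frame budget check: both steps must fit in ~20 ms (about 50 fps).
    #include <chrono>
    #include <iostream>

    void trackRelativeMotion() { /* hypothetical: estimate projector/surface motion */ }
    void remapProjection()     { /* hypothetical: warp output to the surface geometry */ }

    int main() {
        const double budgetMs = 20.0;
        for (int frame = 0; frame < 1000; frame++) {
            auto start = std::chrono::steady_clock::now();
            trackRelativeMotion();
            remapProjection();
            auto end = std::chrono::steady_clock::now();
            double ms = std::chrono::duration<double, std::milli>(end - start).count();
            if (ms > budgetMs) {
                std::cerr << "Frame " << frame << " over budget: " << ms << " ms\n";
            }
        }
        return 0;
    }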

Other people are already working on dynamic projection mapping, and from a technical standpoint it’s both more familiar ground and less essential to the final project than relative motion tracking. Dynamic projection mapping is “nice to have” and will contribute significantly to the quality of the project, but the technology the project depends on to work at all is relative motion tracking. So this paper will focus on research into means of relative motion tracking, and on which (if any) existing open-source projects could be adapted for this application.

Similar Projects
At the most basic level, I need to find a way to take a camera feed and determine how content in the scene is moving. Traditionally this is called camera tracking, a form of deriving structure from motion. The process goes something like this: first, the software identifies feature points within each frame; these are generally areas of high contrast, which are relatively easy to pick out algorithmically. On the next frame, the software finds another batch of feature points, and then runs a correspondence analysis between the feature points in the most recent frame and those in the previous frame. From this information, the movement of the camera can be inferred. (For example, if a feature point is at pixel [5, 100] in frame one and moves to pixel [10, 80] in frame two, that [5, -20] displacement gives a rough estimate of how the camera shifted between frames, since features in the image move opposite to the camera.) It’s a bit more complicated than that because of parallax: points closer to the camera appear to move more than points further away. The software can take this into account and build a rough point cloud of the scene.
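
To make that pipeline concrete, here is a minimal sketch of the detect-and-track loop using OpenCV (an assumption on my part; PTAM has its own feature machinery). It finds corner features in one frame, tracks them into the next with pyramidal Lucas-Kanade optical flow, and takes the median displacement as a crude, outlier-resistant estimate of the frame-to-frame shift.

    // Sketch: frame-to-frame feature tracking with OpenCV (illustrative, not PTAM's code).
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    cv::Point2f estimateShift(const cv::Mat& prevGray, const cv::Mat& currGray) {
        // 1. Find high-contrast feature points (corners) in the previous frame.
        std::vector<cv::Point2f> prevPts;
        cv::goodFeaturesToTrack(prevGray, prevPts, 500, 0.01, 8);
        if (prevPts.empty()) return cv::Point2f(0, 0);

        // 2. Track those points into the current frame (pyramidal Lucas-Kanade).
        std::vector<cv::Point2f> currPts;
        std::vector<uchar> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

        // 3. Collect displacements of the successfully tracked points.
        std::vector<float> dx, dy;
        for (size_t i = 0; i < status.size(); i++) {
            if (!status[i]) continue;
            dx.push_back(currPts[i].x - prevPts[i].x);
            dy.push_back(currPts[i].y - prevPts[i].y);
        }
        if (dx.empty()) return cv::Point2f(0, 0);

        // 4. Median displacement; negate because features move opposite to the camera.
        std::nth_element(dx.begin(), dx.begin() + dx.size() / 2, dx.end());
        std::nth_element(dy.begin(), dy.begin() + dy.size() / 2, dy.end());
        return cv::Point2f(-dx[dx.size() / 2], -dy[dy.size() / 2]);
    }

A real implementation also has to account for rotation and parallax, which is where the full structure-from-motion machinery comes in; this only illustrates the correspondence step.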

This process has applications in special effects and film / post-production. If you have a shot with a lot of camera movement, and you need to add an explosion to the scene, camera tracking gives exactly the information you need to position the explosion in a believable way from frame to frame. Because of this demand, there are a few über-expensive closed-source software packages designed to perform camera tracking reliably. Boujou, for example, sets you back about $10,000. There is, however, a free and open-source option called PTAM (Parallel Tracking and Mapping for Small AR Workspaces) which can perform similar tracking.

Caveats
The PTAM code seems like the right starting point for my own adaptation of this concept, but there are a few caveats that make me nervous about just how much of a head start the code will give me. First, PTAM and similar camera tracking software is designed for use on high-contrast two-dimensional RGB bitmaps, basically still film frames. In contrast, the grayscale depth map coming from the Kinect is relatively low contrast, and areas of high contrast are probably best avoided in the feature detection process, since they represent noisy edges between depths. I probably will not be able to use the Kinect’s RGB data, because it’s going to be filled with artifacts from the projection. Also, since the Kinect already gives us a point cloud, I don’t need any of the depth-calculation features from PTAM. Because of these issues, I will probably start by skimming through the PTAM source code to get an idea of their approach to the implementation, and then seeing how PTAM behaves when fed the grayscale depth map from a Kinect. From there, I will probably start experimenting with simpler feature extraction and tracking algorithms in Processing that make the most of the Kinect’s depth data. (This code would be destined for an eventual port to C++.)
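
As a starting point for that experiment, here is a rough sketch of preparing the depth map for feature detection, assuming the Kinect depth frame arrives as a 16-bit cv::Mat (e.g. via libfreenect or OpenNI): normalize it to 8-bit grayscale, then mask out the noisy depth discontinuities so the feature detector steers clear of them.

    // Sketch: prep a Kinect depth map for feature detection (assumptions noted above).
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Point2f> depthFeatures(const cv::Mat& depth16) {
        // Squash the raw 11-bit Kinect depth range into an 8-bit grayscale image.
        cv::Mat gray;
        depth16.convertTo(gray, CV_8U, 255.0 / 2047.0);

        // Find strong depth discontinuities (the noisy edges between surfaces)
        // and dilate them into a mask of regions to avoid.
        cv::Mat edges, avoid;
        cv::Canny(gray, edges, 40, 120);
        cv::dilate(edges, avoid, cv::Mat(), cv::Point(-1, -1), 3);

        // Detect features only where the mask is clear, i.e. away from depth edges.
        cv::Mat mask;
        cv::bitwise_not(avoid, mask);
        std::vector<cv::Point2f> features;
        cv::goodFeaturesToTrack(gray, features, 300, 0.01, 8, mask);
        return features;
    }

The thresholds and kernel sizes here are guesses; the point is just that the depth image needs different preprocessing than the RGB frames PTAM expects.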

February 9 2011 at 12 PM

Shawn:

Dude, honestly, download the new Black Eyed Peas music video to an iPhone and play with it (totally worthless band/music, but I'm serious, check the tech in the app). I think using the iPod's accelerometer could achieve your goal here. I'm picturing some kind of unit with a computer, iPhone, and projector, all attached. Info here: http://itunes.apple.com/us/app/bep360/id410003781?mt=8

February 17 2011 at 9 PM

Julio:

Eric, this project sounds awesome. If anyone can pull it off, it's got to be you. Can't wait to see how your project evolves.

March 7 2011 at 11 PM

Eric Mika:

@Shawn: Thanks for pointing me to the app, I hadn’t seen it. Very cool. Unfortunately an accelerometer or even a gyro alone is not going to give me enough info, since I need a depth map to figure out how to correct the projection. However, it’s likely that the accelerometer inside the Kinect can be put to use for improving the accuracy of whatever feature tracking / SLAM algorithm I end up using.

That said, it’s probably a matter of time (a decade or so?) until our phones have 3D sensor / camera combos similar to what we have today with the Kinect. And yes, as far as form factor goes, I’m imagining a Kinect + pico projector combo, plus something to process the camera data and provide the video source. (And for now, that’s going to have to be a laptop.) At this point, I’m more interested in getting the proof of concept working than worrying about practical packaging / form factors.

@Julio: Thanks! Let me know what you’re working on some time.

March 8 2011 at 1 AM

Toby:

Using the Kinect camera you can make this implementation really nice.

Make sure to check this prior work; they came up with a lot of good AR interaction methods considering the technology they were using:
http://www.youtube.com/watch?v=8-AJnLMzE0k

April 3 2011 at 9 PM

Eric Mika:

Wow thanks Toby. I hadn’t seen that, it’s basically the same idea with a different hardware / software implementation. Hmm. From five years ago. Hmmm…

This discovery almost destroys my thesis’ raison d’être. I certainly thought I had something original, but I guess I’ll need to emphasize a few planned differences to preserve my sanity:

  • Removal of external hardware dependencies. Looks like they used an external camera + markers for tracking. My plan is to use the Kinect + SLAM / ICP algorithms interleaved with positioning inferences from an IMU to keep things going in real time. (The point cloud alignment algorithms are extremely heavy; there’s a rough sketch of that alignment step at the end of this list.)

  • Surface ingestion… e.g. using the Kinect’s RGB camera to grab chunks of the environment for processing / re-projection. (The “real world photoshop” application.)

  • Portability. My entire design is one unit slung over a shoulder. Can’t quite make it wireless yet (pico projectors are too dim), but I’m trying to get down to a single power cord. It won’t be as compact as Toronto’s hand-held projector since the Kinect is pretty huge, and I need a powerful laptop to run the mapping algorithms. But removal of external hardware dependencies means the system can work anywhere, without prior calibration / mapping. Even outside. (As long as it’s nighttime, at least, on account of the IR problem.)

  • Location tagging / persistent content. Again, because of portability and the relative lack of calibration required for extracting positioning info from a point cloud history, it would be possible to save layers of projected content that are linked to a specific location and could be edited / overwritten on subsequent visits to the space.

  • Open Source. Commodity hardware + public code.

  • Platform. My hope is to expose a layer to developers / creative coders, probably through an openFrameworks addon, or maybe with a stand-alone app designed to function in a Syphon pipeline. I hope this can make building apps on top of this interaction model a creative process rather than a technical one.
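
To give a sense of the point cloud alignment step mentioned in the first bullet, here’s a minimal sketch using PCL’s ICP implementation (an assumption on my part; the real pipeline would interleave this with IMU readings to stay real-time, since ICP alone is too slow to run every frame).

    // Sketch: align two successive Kinect point clouds with ICP (PCL) to estimate
    // how the sensor moved between frames. Illustrative only.
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/registration/icp.h>

    Eigen::Matrix4f alignFrames(pcl::PointCloud<pcl::PointXYZ>::Ptr previous,
                                pcl::PointCloud<pcl::PointXYZ>::Ptr current) {
        pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
        icp.setInputSource(current);
        icp.setInputTarget(previous);
        icp.setMaximumIterations(20);  // keep it cheap; the IMU fills in between runs

        pcl::PointCloud<pcl::PointXYZ> aligned;
        icp.align(aligned);

        // Rigid transform mapping the current frame onto the previous one,
        // i.e. an estimate of the sensor's motion between frames.
        return icp.getFinalTransformation();
    }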

April 3 2011 at 10 PM

vasilis:

http://www.youtube.com/watch?v=Jd3-eiid-Uw

Have you considered using Johnny's head-tracking ideas and code? Instead of having the infrared beams moving and the Wiimote steady, you could have the Wiimote on the moving projector and the infrared LEDs on specific spots in a room. That way you can map the space efficiently and fast enough.

As for a way to distort the image depending on the spot you are projecting from: if the image is 3D, for example 3D letters, you could get that effect using a similar approach to what Johnny did.

May 17 2011 at 2 AM

Julian:

Hey Eric,

What were your results of trying to get PTAM to run on the Kinect? Did you find any reason why it shouldn't work? I'm thinking about giving it a try.

Cheers, Julian

August 1 2012 at 8 AM
