a digital scan of a 35mm film image of a processing sketch running on an LCD

Research on Relative Motion Tracking

Project Context
My thesis project has undergone a major shift in the last week. I’m moving away from the post-apocalyptic pirate internet and towards something completely different: a means of projecting content onto surfaces so that the projection appears intrinsic to the surface.

Imagine a hand-held projector that you can sweep across a room, a bit like a flashlight. As it moves, the projected content appears stuck to the wall, the floor, and so on. For example, you could add something to the scene in a particular location, a bit of text, perhaps. After adding the text, you could sweep the projector to a different part of the wall. The text would go out of view once it left the throw area of the projector, but if you moved the projector back towards the spot where you initially added it, the words would come back into view. The words are stuck to their environment; the projection is just an incidental way of exploring the space and revealing its content.

Two recent technologies make this a particularly ripe time for the project. The Kinect gives cheap 3D scene information, which can improve the quality of motion tracking and automate the projection mapping process. And new pico-projectors that run on battery power and weigh significantly less than their conference-table counterparts mean that carrying a projector around and using it to explore a space is no longer an entirely ridiculous proposition. This whole idea, which I’m currently calling Thesis II (for personal reasons), will be written up in more detail soon.

Fronts of Inquiry
The creative challenge for the next twelve weeks is to conceive of and build an application that demonstrates the usefulness and creative possibilities of this tool.

The technical challenges are twofold. First, I need a way to track the relative motion between the projector and the projection surface (generally a wall) — I’ll refer to this as relative motion tracking. Second, I need a way to dynamically distort the projected image to match the geometry of the projection surface. This is similar in concept to projection mapping, except the projection surface isn’t static. I’ll call this dynamic projection mapping. The calculations for both of these steps need to happen in less than 20 milliseconds if the effect is going to work and feel fluid.
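
To keep that 20 millisecond constraint honest during development, a trivial per-frame timing check goes a long way. The sketch below is only illustrative; trackRelativeMotion and remapProjection are hypothetical stand-ins for the two steps described above.

    // Sketch of a per-frame budget check: both steps must fit in ~20 ms (about 50 fps).
    #include <chrono>
    #include <iostream>

    void trackRelativeMotion() { /* hypothetical: estimate projector/surface motion */ }
    void remapProjection()     { /* hypothetical: warp output to the surface geometry */ }

    int main() {
        const double budgetMs = 20.0;
        for (int frame = 0; frame < 1000; frame++) {
            auto start = std::chrono::steady_clock::now();
            trackRelativeMotion();
            remapProjection();
            auto end = std::chrono::steady_clock::now();
            double ms = std::chrono::duration<double, std::milli>(end - start).count();
            if (ms > budgetMs) {
                std::cerr << "Frame " << frame << " over budget: " << ms << " ms\n";
            }
        }
        return 0;
    }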

Other people are already working on dynamic projection mapping, and from a technical standpoint it’s both more familiar ground and less essential to the final project than relative motion tracking. Dynamic projection mapping is “nice to have” and will contribute significantly to the quality of the project, but the technology the project depends on to work at all is relative motion tracking. So this paper will focus on research into means of relative motion tracking, and on which (if any) existing open-source projects could be adapted for this application.

Similar Projects
At the most basic level, I need to find a way to take a camera feed and determine how content in the scene is moving. Traditionally this is called camera tracking, a form of deriving structure from motion. The process goes something like this: first, the software identifies feature points within each frame; these are generally areas of high contrast, which are relatively easy to pick out algorithmically. On the next frame, the software finds another batch of feature points, and then runs a correspondence analysis between the feature points in the most recent frame and those in the previous frame. From this information, the movement of the camera can be inferred. (For example, if a feature point is at pixel [5, 100] in frame one and moves to pixel [10, 80] in frame two, that [5, -20] displacement gives a rough estimate of how the camera shifted between frames, since features in the image move opposite to the camera.) It’s a bit more complicated than that because of parallax: points closer to the camera appear to move more than points further away. The software can take this into account and build a rough point cloud of the scene.
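
To make that pipeline concrete, here is a minimal sketch of the detect-and-track loop using OpenCV (an assumption on my part; PTAM has its own feature machinery). It finds corner features in one frame, tracks them into the next with pyramidal Lucas-Kanade optical flow, and takes the median displacement as a crude, outlier-resistant estimate of the frame-to-frame shift.

    // Sketch: frame-to-frame feature tracking with OpenCV (illustrative, not PTAM's code).
    #include <opencv2/opencv.hpp>
    #include <algorithm>
    #include <vector>

    cv::Point2f estimateShift(const cv::Mat& prevGray, const cv::Mat& currGray) {
        // 1. Find high-contrast feature points (corners) in the previous frame.
        std::vector<cv::Point2f> prevPts;
        cv::goodFeaturesToTrack(prevGray, prevPts, 500, 0.01, 8);
        if (prevPts.empty()) return cv::Point2f(0, 0);

        // 2. Track those points into the current frame (pyramidal Lucas-Kanade).
        std::vector<cv::Point2f> currPts;
        std::vector<uchar> status;
        std::vector<float> err;
        cv::calcOpticalFlowPyrLK(prevGray, currGray, prevPts, currPts, status, err);

        // 3. Collect displacements of the successfully tracked points.
        std::vector<float> dx, dy;
        for (size_t i = 0; i < status.size(); i++) {
            if (!status[i]) continue;
            dx.push_back(currPts[i].x - prevPts[i].x);
            dy.push_back(currPts[i].y - prevPts[i].y);
        }
        if (dx.empty()) return cv::Point2f(0, 0);

        // 4. Median displacement; negate because features move opposite to the camera.
        std::nth_element(dx.begin(), dx.begin() + dx.size() / 2, dx.end());
        std::nth_element(dy.begin(), dy.begin() + dy.size() / 2, dy.end());
        return cv::Point2f(-dx[dx.size() / 2], -dy[dy.size() / 2]);
    }

A real implementation also has to account for rotation and parallax, which is where the full structure-from-motion machinery comes in; this only illustrates the correspondence step.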

This process has applications in special effects and film / post-production. If you have a shot with a lot of camera movement, and you need to add an explosion to the scene, camera tracking gives exactly the information you need to position the explosion in a believable way from frame to frame. Because of this demand, there are a few über-expensive closed-source software packages designed to perform camera tracking reliably. Boujou, for example, sets you back about $10,000. There is, however, a free and open-source option called PTAM (Parallel Tracking and Mapping for Small AR Workspaces) which can perform similar tracking.

Caveats
The PTAM code seems like the right starting point for my own adaptation of this concept, but there are a few caveats that make me nervous about just how much of a head start the code will give me. First, PTAM and similar camera tracking software is designed for use on high-contrast two-dimensional RGB bitmaps, basically still film frames. In contrast, the grayscale depth map coming from the Kinect is relatively low contrast, and areas of high contrast are probably best avoided in the feature detection process, since they represent noisy edges between depths. I probably will not be able to use the Kinect’s RGB data, because it’s going to be filled with artifacts from the projection. Also, since the Kinect already gives us a point cloud, I don’t need any of the depth-calculation features from PTAM. Because of these issues, I will probably start by skimming through the PTAM source code to get an idea of their approach to the implementation, and then seeing how PTAM behaves when fed the grayscale depth map from a Kinect. From there, I will probably start experimenting with simpler feature extraction and tracking algorithms in Processing that make the most of the Kinect’s depth data. (This code would be destined for an eventual port to C++.)
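
As a starting point for that experiment, here is a rough sketch of preparing the depth map for feature detection, assuming the Kinect depth frame arrives as a 16-bit cv::Mat (e.g. via libfreenect or OpenNI): normalize it to 8-bit grayscale, then mask out the noisy depth discontinuities so the feature detector steers clear of them.

    // Sketch: prep a Kinect depth map for feature detection (assumptions noted above).
    #include <opencv2/opencv.hpp>
    #include <vector>

    std::vector<cv::Point2f> depthFeatures(const cv::Mat& depth16) {
        // Squash the raw 11-bit Kinect depth range into an 8-bit grayscale image.
        cv::Mat gray;
        depth16.convertTo(gray, CV_8U, 255.0 / 2047.0);

        // Find strong depth discontinuities (the noisy edges between surfaces)
        // and dilate them into a mask of regions to avoid.
        cv::Mat edges, avoid;
        cv::Canny(gray, edges, 40, 120);
        cv::dilate(edges, avoid, cv::Mat(), cv::Point(-1, -1), 3);

        // Detect features only where the mask is clear, i.e. away from depth edges.
        cv::Mat mask;
        cv::bitwise_not(avoid, mask);
        std::vector<cv::Point2f> features;
        cv::goodFeaturesToTrack(gray, features, 300, 0.01, 8, mask);
        return features;
    }

The thresholds and kernel sizes here are guesses; the point is just that the depth image needs different preprocessing than the RGB frames PTAM expects.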

February 9 2011 at 12 PM

Shawn:

Dude, honestly, download the new Black Eyed Peas music video to an iPhone and play with it (totally worthless band/music, but I'm serious, check the tech in the app). I think using the iPod's accelerometer could achieve your goal here. I'm picturing some kind of unit with a computer, iPhone, and projector, all attached. Info here: http://itunes.apple.com/us/app/bep360/id410003781?mt=8

February 17 2011 at 9 PM

Julio:

Eric, this project sounds awesome. If anyone can pull it off, it's got to be you. Can't wait to see how your project evolves.

March 7 2011 at 11 PM

Eric Mika:

@Shawn: Thanks for pointing me to the app, I hadn’t seen it. Very cool. Unfortunately an accelerometer or even a gyro alone is not going to give me enough info, since I need a depth map to figure out how to correct the projection. However, it’s likely that the accelerometer inside the Kinect can be put to use for improving the accuracy of whatever feature tracking / SLAM algorithm I end up using.

That said, it’s probably a matter of time (a decade or so?) until our phones have 3D sensor / camera combos similar to what we have today with the Kinect. And yes, as far as form factor goes, I’m imagining a Kinect + pico projector combo, plus something to process the camera data and provide the video source. (And for now, that’s going to have to be a laptop.) At this point, I’m more interested in getting the proof of concept working than worrying about practical packaging / form factors.

@Julio: Thanks! Let me know what you’re working on some time.

March 8 2011 at 1 AM

Toby:

Using the Kinect camera you can make this implementation really nice.

Make sure to check this prior work; they came up with a lot of good AR interaction methods considering the technology they were using:
http://www.youtube.com/watch?v=8-AJnLMzE0k

April 3 2011 at 9 PM

Eric Mika:

Wow thanks Toby. I hadn’t seen that, it’s basically the same idea with a different hardware / software implementation. Hmm. From five years ago. Hmmm…

This discovery almost destroys my thesis’ raison d’être. I certainly thought I had something original, but I guess I’ll need to emphasize a few planned differences to preserve my sanity:

  • Removal of external hardware dependencies. Looks like they used an external camera + markers for tracking. My plan is to use the Kinect + SLAM / ICP algorithms interleaved with positioning inferences from an IMU to keep things going in real time. (The point cloud alignment algorithms are extremely heavy; there’s a rough sketch of that alignment step at the end of this list.)

  • Surface ingestion… e.g. using the Kinect’s RGB camera to grab chunks of the environment for processing / re-projection. (The “real world photoshop” application.)

  • Portability. My entire design is one unit slung over a shoulder. Can’t quite make it wireless yet (pico projectors are too dim), but I’m trying to get down to a single power cord. It won’t be as compact as Toronto’s hand-held projector since the Kinect is pretty huge, and I need a powerful laptop to run the mapping algorithms. But removal of external hardware dependencies means the system can work anywhere, without prior calibration / mapping. Even outside. (As long as it’s nighttime, at least, on account of the IR problem.)

  • Location tagging / persistent content. Again, because of portability and the relative lack of calibration required for extracting positioning info from a point cloud history, it would be possible to save layers of projected content that are linked to a specific location and could be edited / overwritten on subsequent visits to the space.

  • Open Source. Commodity hardware + public code.

  • Platform. My hope is to expose a layer to developers / creative coders, probably through an openFrameworks addon, or maybe with a stand-alone app designed to function in a Syphon pipeline. I hope this can make building apps on top of this interaction model a creative process rather than a technical one.
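
To give a sense of the point cloud alignment step mentioned in the first bullet, here’s a minimal sketch using PCL’s ICP implementation (an assumption on my part; the real pipeline would interleave this with IMU readings to stay real-time, since ICP alone is too slow to run every frame).

    // Sketch: align two successive Kinect point clouds with ICP (PCL) to estimate
    // how the sensor moved between frames. Illustrative only.
    #include <pcl/point_cloud.h>
    #include <pcl/point_types.h>
    #include <pcl/registration/icp.h>

    Eigen::Matrix4f alignFrames(pcl::PointCloud<pcl::PointXYZ>::Ptr previous,
                                pcl::PointCloud<pcl::PointXYZ>::Ptr current) {
        pcl::IterativeClosestPoint<pcl::PointXYZ, pcl::PointXYZ> icp;
        icp.setInputSource(current);
        icp.setInputTarget(previous);
        icp.setMaximumIterations(20);  // keep it cheap; the IMU fills in between runs

        pcl::PointCloud<pcl::PointXYZ> aligned;
        icp.align(aligned);

        // Rigid transform mapping the current frame onto the previous one,
        // i.e. an estimate of the sensor's motion between frames.
        return icp.getFinalTransformation();
    }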

April 3 2011 at 10 PM

vasilis:

http://www.youtube.com/watch?v=Jd3-eiid-Uw

Have you considered using Johnny's head-tracking ideas and code? Instead of having the infrared beams moving and the Wiimote steady, you could have the Wiimote on the moving projector and the infrared LEDs on specific spots in a room. That way you can map the space efficiently and fast enough.

As for a way to distort the image depending on the spot you are projecting from: if the image is 3D, for example 3D letters, you could get that effect using a similar approach to what Johnny did.

May 17 2011 at 2 AM

Julian:

Hey Eric,

What were your results of trying to get PTAM to run on the Kinect? Did you find any reason why it shouldn't work? I'm thinking about giving it a try.

Cheers, Julian

August 1 2012 at 8 AM
