Yesterday we released Hyperlapse from Instagram—a new app that lets you capture and share moving time lapse videos. Time lapse photography is a technique in which frames are played back at a much faster rate than that at which they’re captured. This allows you to experience a sunset in 15 seconds or see fog roll over hills like a stream of water flowing over rocks. Time lapses are mesmerizing to watch because they reveal patterns and motions in our daily lives that are otherwise invisible.

Hyperlapses are a special kind of time lapse where the camera is also moving. Capturing hyperlapses has traditionally been a laborious process that involves meticulous planning, a variety of camera mounts and professional video editing software. With Hyperlapse, our goal was to simplify this process. We landed on a single record button and a post-capture screen where you select the playback rate. To achieve fluid camera motion we incorporated a video stabilization algorithm called Cinema (which is already used in Video on Instagram) into Hyperlapse.

In this post, we’ll describe our stabilization algorithm and the engineering challenges that we encountered while trying to distill the complex process of moving time lapse photography into a simple and interactive user interface.

Cinema Stabilization

Video stabilization is instrumental in capturing beautiful fluid videos. In the movie industry, this is achieved by having the camera operator wear a harness that separates the motion of the camera from the motion of the operator’s body. Since we can’t expect Instagrammers to wear a body harness to capture the world’s moments, we instead developed Cinema, which uses the phone’s built-in gyroscope to measure and remove unwanted hand shake.
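
We don't show Cinema's actual input format here, but as a minimal sketch, the gyroscope samples could be collected on iOS with Core Motion along the lines below. The GyroSample container, the recorder class and the sampling rate are our own illustrative assumptions, not Hyperlapse's code.

```swift
import CoreMotion

/// Illustrative container for one gyroscope reading (not Hyperlapse's actual type).
struct GyroSample {
    let timestamp: TimeInterval      // seconds, on the device uptime clock
    let rotationRate: CMRotationRate // radians per second around x, y and z
}

final class GyroRecorder {
    private let motionManager = CMMotionManager()
    private let queue = OperationQueue()
    private(set) var samples: [GyroSample] = []

    func start() {
        guard motionManager.isGyroAvailable else { return }
        // A high sample rate (an assumption here) makes it easier to line up
        // gyro data with individual video frames later on.
        motionManager.gyroUpdateInterval = 1.0 / 200.0
        motionManager.startGyroUpdates(to: queue) { [weak self] data, _ in
            guard let data = data else { return }
            self?.samples.append(GyroSample(timestamp: data.timestamp,
                                            rotationRate: data.rotationRate))
        }
    }

    func stop() {
        motionManager.stopGyroUpdates()
    }
}
```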

The diagram below shows the pipeline of the Cinema stabilization algorithm. We feed gyroscope samples and frames into the stabilizer and obtain a new set of camera orientations as output. These camera orientations correspond to a smooth “synthetic” camera motion with all the unwanted kinks and bumps removed.
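
Cinema's actual smoother is an optimization that balances smoothness against the crop constraint described below. As a simplified stand-in, the sketch here (building on the GyroSample type above) integrates the gyro rotation rates into one orientation per frame and then low-pass filters those orientations with spherical interpolation; the filter constant is an arbitrary illustrative choice.

```swift
import Foundation
import CoreMotion
import simd

/// Integrate gyroscope rotation rates into a camera orientation at each frame
/// timestamp. This is a simplified stand-in for Cinema's front end.
func integrateOrientations(samples: [GyroSample],
                           frameTimes: [TimeInterval]) -> [simd_quatd] {
    var orientation = simd_quatd(angle: 0, axis: SIMD3<Double>(0, 0, 1)) // identity
    var result: [simd_quatd] = []
    var frameIndex = 0

    for (i, sample) in samples.enumerated() where i > 0 {
        let dt = sample.timestamp - samples[i - 1].timestamp
        let omega = SIMD3<Double>(sample.rotationRate.x,
                                  sample.rotationRate.y,
                                  sample.rotationRate.z)
        let angle = simd_length(omega) * dt
        if angle > 0 {
            let delta = simd_quatd(angle: angle, axis: simd_normalize(omega))
            orientation = simd_normalize(orientation * delta)
        }
        // Record the integrated orientation at every frame timestamp we pass.
        while frameIndex < frameTimes.count, frameTimes[frameIndex] <= sample.timestamp {
            result.append(orientation)
            frameIndex += 1
        }
    }
    return result
}

/// Exponential smoothing of the camera path using spherical interpolation.
/// Cinema's real smoother is more sophisticated; this just removes jitter.
func smoothOrientations(_ input: [simd_quatd], alpha: Double = 0.1) -> [simd_quatd] {
    guard var smoothed = input.first else { return [] }
    return input.map { q in
        smoothed = simd_slerp(smoothed, q, alpha)
        return smoothed
    }
}
```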

These orientations are then fed into our video filtering pipeline, shown below. The IGStabilizationFilter warps each input frame according to the desired synthetic camera orientation.
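
Concretely, the warp can be expressed as a homography built from the difference between the recorded orientation and the synthetic one. The sketch below shows one way to compute that matrix; the pinhole intrinsics and the focal length parameter are illustrative assumptions, not values taken from the app.

```swift
import simd

/// Pinhole camera intrinsics for a frame of the given size. The focal length
/// is a placeholder; in practice it comes from the camera's field of view.
func intrinsics(width: Double, height: Double, focalLength: Double) -> simd_double3x3 {
    simd_double3x3(columns: (
        SIMD3<Double>(focalLength, 0, 0),
        SIMD3<Double>(0, focalLength, 0),
        SIMD3<Double>(width / 2, height / 2, 1)
    ))
}

/// Homography that maps pixels of the recorded frame (orientation `actual`)
/// into the stabilized output frame (orientation `synthetic`):
///   H = K * R_synthetic * R_actual^T * K^-1
func stabilizationHomography(actual: simd_quatd,
                             synthetic: simd_quatd,
                             K: simd_double3x3) -> simd_double3x3 {
    let correction = simd_double3x3(synthetic * actual.conjugate)
    return K * correction * K.inverse
}
```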

The video below shows how the Cinema algorithm warps the frames to counteract camera shake. The region inside the white outline is the visible area in the output video. Notice that the edges of the warped frames never cross the white outline. That's because our stabilization algorithm computes the smoothest camera motion possible while also ensuring that a frame is never warped in a way that makes regions outside it visible in the final video. Notice also that this means we need to crop, or zoom in, to leave a buffer around the visible area. That buffer lets us warp the frame to counteract handshake without introducing empty regions into the output video. More on this later.
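
Here's a sketch of that constraint check, reusing the hypothetical homography above: map each corner of the visible crop back into the recorded frame and verify that it lands inside the frame's bounds. If any corner falls outside, that synthetic orientation would expose pixels we never captured.

```swift
import simd

/// Returns true if the visible crop rectangle, mapped back into the recorded
/// frame through the inverse of the stabilization homography, stays entirely
/// inside the frame.
func cropStaysInsideFrame(homography H: simd_double3x3,
                          frameSize: SIMD2<Double>,
                          cropRect: (origin: SIMD2<Double>, size: SIMD2<Double>)) -> Bool {
    let Hinv = H.inverse
    let corners = [
        cropRect.origin,
        cropRect.origin + SIMD2<Double>(cropRect.size.x, 0),
        cropRect.origin + SIMD2<Double>(0, cropRect.size.y),
        cropRect.origin + cropRect.size
    ]
    for corner in corners {
        // Homogeneous transform of the corner back into input-frame pixels.
        let p = Hinv * SIMD3<Double>(corner.x, corner.y, 1)
        let x = p.x / p.z
        let y = p.y / p.z
        if x < 0 || y < 0 || x > frameSize.x || y > frameSize.y {
            return false
        }
    }
    return true
}
```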

The orientations are computed offline, while the stabilization filter is applied on the fly at 30 fps during video playback. The filtering pipeline, called FilterKit, comes from Instagram, where we use it for all photo and video processing. FilterKit is built on top of OpenGL and is optimized for real-time performance. Most notably, it is the engine that drives our recently launched creative tools.

Hyperlapse Stabilization

In Hyperlapse, you can drag a slider to select the time lapse level after you’ve recorded a video. A time lapse level of 6x corresponds to picking every 6th frame in the input video and playing those frames back at 30 fps. The result is a video that is 6 times faster than the original.

We modified the Cinema algorithm to compute orientations only for the frames we keep, which means the empty region constraint is enforced only for those frames. As a result, we can output a smooth camera motion even though the unstabilized input video becomes increasingly shaky at higher time lapse levels. See the video below for an illustration.
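
As a concrete illustration (the function names here are ours, not Hyperlapse's), picking every Nth frame and handing only those frames to the stabilizer from the earlier sketches looks roughly like this:

```swift
import Foundation
import simd

/// Indices of the frames kept at a given time lapse level; level 6 keeps
/// frames 0, 6, 12, ... which are then played back at 30 fps.
func keptFrameIndices(totalFrames: Int, level: Int) -> [Int] {
    Array(stride(from: 0, to: totalFrames, by: level))
}

/// Hypothetical entry point: stabilize only the kept frames. Because the
/// empty region constraint is enforced only at these timestamps, the smoother
/// has more freedom and can stay smooth even when consecutive kept frames
/// differ a lot, i.e. at high time lapse levels.
func stabilize(frameTimes: [TimeInterval],
               samples: [GyroSample],
               level: Int) -> [simd_quatd] {
    let kept = keptFrameIndices(totalFrames: frameTimes.count, level: level)
    let keptTimes = kept.map { frameTimes[$0] }
    let raw = integrateOrientations(samples: samples, frameTimes: keptTimes)
    return smoothOrientations(raw)
}
```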

Adaptive Zoom

As previously noted, we need to zoom in to give ourselves room to counteract handshake without introducing empty regions into the output video (i.e., regions outside the input frame for which there is no pixel data). All digital video stabilization algorithms trade resolution for stability. However, Cinema picks the zoom intelligently based on the amount of shake in the recorded video. See the videos below for an illustration.

The video on the left has only a small amount of handshake because it was captured while standing still. In this case, we only zoom in slightly because we do not need much room to counteract the small amount of camera shake. The video on the right was captured while walking, so the camera is much shakier. We zoom in more to give ourselves enough room to smooth out even the larger bumps. Since zooming in reduces the field of view, there is a tradeoff between effective resolution and the smoothness of the camera motion. Our adaptive zoom algorithm is fine-tuned to minimize camera shake while maximizing the effective resolution on a per-video basis. Since motion, such as a slow pan, becomes more rapid at higher time lapse levels (e.g., 12x), we compute the optimal zoom at each speedup factor.
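
A crude stand-in for that search, building on the earlier sketches: try increasing zoom factors and keep the smallest one at which the chosen camera path never exposes an empty region. Cinema actually solves for the path and the zoom together, so treat this purely as an illustration; the candidate zoom range is an arbitrary assumption.

```swift
import simd

/// Smallest zoom factor (from a list of candidates) at which a fixed smooth
/// camera path never exposes an empty region. A simplification of adaptive
/// zoom: the real algorithm optimizes the path and the zoom jointly.
func adaptiveZoom(actual: [simd_quatd],
                  synthetic: [simd_quatd],
                  frameSize: SIMD2<Double>,
                  K: simd_double3x3,
                  candidateZooms: [Double] = Array(stride(from: 1.05, through: 1.5, by: 0.05))) -> Double {
    for zoom in candidateZooms {
        // The visible crop shrinks as the zoom grows, leaving a larger buffer.
        let cropSize = frameSize / zoom
        let cropOrigin = (frameSize - cropSize) / 2
        let feasible = zip(actual, synthetic).allSatisfy { real, smooth in
            let H = stabilizationHomography(actual: real, synthetic: smooth, K: K)
            return cropStaysInsideFrame(homography: H,
                                        frameSize: frameSize,
                                        cropRect: (origin: cropOrigin, size: cropSize))
        }
        if feasible { return zoom }
    }
    // Even the largest candidate can't absorb the shake; settle for it anyway.
    return candidateZooms.last ?? 1.0
}
```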

Putting It All Together

“The first 90 percent of the code accounts for the first 90 percent of the development time. The remaining 10 percent of the code accounts for the other 90 percent of the development time.” –Tom Cargill, Bell Labs

Very early on in the development process of Hyperlapse, we decided that we wanted an interactive slider for selecting the level of time lapse. We wanted to provide instant feedback that encouraged experimentation and felt effortless, even when complex calculations were being performed under the hood. Every time you move the slider, we perform the following operations:

  1. We request frames from the decoder at the new playback rate.
  2. We simultaneously kick off the Cinema stabilizer on a background thread to compute the optimal zoom and a new set of orientations for that zoom and time lapse amount.
  3. We continue to play the video while we wait for the new stabilization data to come in. In the meantime, we use the orientations computed at the previous time lapse amount, along with spherical interpolation (slerp), to produce orientations for the frames we're about to display (see the sketch after this list).
  4. Once the new orientations come in from the stabilizer, we atomically swap out the old set and replace it with the new one.
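
A rough sketch of steps 3 and 4 is below; the OrientationStore type and the index math are illustrative, not the app's actual code. Reads during playback and the swap coming from the background recompute both go through a lock, so the swap appears atomic to the renderer, and interim orientations are derived from the previous level's set with slerp.

```swift
import Foundation
import simd

/// Illustrative store for the current set of stabilized orientations.
final class OrientationStore {
    private let lock = NSLock()
    private var orientations: [simd_quatd] = []
    private var level = 1   // time lapse level the stored orientations were computed for

    /// Step 4: atomically replace the old orientations with the new set.
    func replace(with newOrientations: [simd_quatd], level newLevel: Int) {
        lock.lock(); defer { lock.unlock() }
        orientations = newOrientations
        level = newLevel
    }

    /// Step 3: while the recompute is still running, derive an orientation for
    /// a frame at the requested level by slerping between the two nearest
    /// orientations computed at the previous level.
    func orientation(forFrame frameIndex: Int, atLevel requestedLevel: Int) -> simd_quatd? {
        lock.lock(); defer { lock.unlock() }
        guard !orientations.isEmpty else { return nil }
        // Position of this output frame expressed on the old level's grid.
        let position = Double(frameIndex * requestedLevel) / Double(level)
        let lower = min(Int(position), orientations.count - 1)
        let upper = min(lower + 1, orientations.count - 1)
        let t = min(max(position - Double(lower), 0), 1)
        return simd_slerp(orientations[lower], orientations[upper], t)
    }
}
```

In this sketch, the background recompute would call something like `replace(with:level:)` once the stabilizer finishes, while playback keeps reading orientations in the meantime.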

We perform the above steps every time you scrub the slider without interrupting video playback or stalling the UI. The end result is an app that feels light and responsive. We can’t wait to see the creativity that Hyperlapse unlocks for our community now that you can capture a hyperlapse with the tap of a button.

By Alex Karpenko