Bouncing Heads

I've been asked to find a way to programmatically render a video with an image overlaid on top (to tell the truth: lots of videos, with only the image changing), moving around along a predefined path. Challenge accepted!

The first idea was to programmatically generate all the overlay images frame by frame with ImageMagick, render them into a video, and mix everything together with ffmpeg. Of course, this was dropped after 5 minutes.

The second idea was to use Processing. I've never used it, but I know its reputation as a popular programming environment for designers and artists, so I figured it would be possible to do what was required with little effort. In fact, after some googling I was able to glue together the following script, which does what was required: move a static image on top of a video:

// Sources:
// https://processing.org/discourse/beta/num_1197673835.html

import processing.video.*;

Movie myMovie;
PImage myPhoto;
float posX;
float posY;
int directionX;
int directionY;

void setup()
{
    size(640, 480);
    myMovie = new Movie(this, "/tmp/happy.flv");
    myPhoto = loadImage("/tmp/leonida.png");
    posX = 0;
    directionX = 0;
    posY = 0;
    directionY = 0;
    myMovie.loop();
}

void movieEvent(Movie myMovie)
{
    // Called by the video library whenever a new frame is available
    myMovie.read();
}

void draw()
{
    // Paint the current video frame, scaled to the sketch size
    image(myMovie, 0,0, width, height);

    // Bounce horizontally: invert the direction at the edges
    // (100 is the size of the overlaid image)
    if (posX >= width - 100)
        directionX = 1;
    else if (posX <= 0)
        directionX = 0;

    if (directionX == 0)
        posX += 1;
    else
        posX -= 1;

    // Same bounce, vertically
    if (posY >= height - 100)
        directionY = 1;
    else if (posY <= 0)
        directionY = 0;

    if (directionY == 0)
        posY += 1;
    else
        posY -= 1;

    // Paint the overlay at its new position
    image(myPhoto, posX, posY, 100, 100);
}

Fair enough, but the only way I've found to render it again as a new video was... using the saveFrame() function, generating all the frames, and mixing everything together with ffmpeg. Not a viable solution, due to the long time required to perform all the steps.
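For reference, the stitching step I gave up on would look roughly like this: Processing's saveFrame() dumps numbered PNGs, and ffmpeg reassembles them. The paths, frame rate, and filename pattern below are my assumptions for illustration, not the original setup:

```python
import subprocess

def ffmpeg_command(pattern, fps, output):
    # Build the ffmpeg invocation that turns a numbered image sequence
    # (e.g. saved by Processing's saveFrame("frames/f-####.png")) into a video.
    return [
        "ffmpeg",
        "-framerate", str(fps),   # input frame rate
        "-i", pattern,            # e.g. "frames/f-%04d.png"
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",    # widest player compatibility
        output,
    ]

cmd = ffmpeg_command("frames/f-%04d.png", 24, "/tmp/output.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```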

But this experiment did light up a possibly effective solution: unpack the video into its individual frames, stick the image at the expected coordinates on each one, and render the output video again at the same time.

I ended up discovering the Python bindings for OpenCV, a famous framework for advanced video and image manipulation, and again by googling and attaching pieces together I obtained an (almost) good script:

# Mostly inspired by:
# http://stackoverflow.com/questions/18954889/how-to-process-images-of-a-video-frame-by-frame-in-video-streaming-using-opencv
# http://docs.opencv.org/trunk/doc/py_tutorials/py_core/py_image_arithmetics/py_image_arithmetics.html

import cv2

cap = cv2.VideoCapture("/tmp/happy.flv")
while not cap.isOpened():
    # The capture may need a moment before the video header is parsed
    cap = cv2.VideoCapture("/tmp/happy.flv")
    cv2.waitKey(1000)
    print "Wait for the header"

image = cv2.imread('/tmp/leonida.png')
# Build a binary mask from the overlay: black pixels are treated as transparent
image2 = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
ret, mask = cv2.threshold(image2, 0, 255, 0)  # 255 where the overlay is not black
mask_inv = cv2.bitwise_not(mask)
img2_fg = cv2.bitwise_and(image, image, mask = mask)  # overlay foreground only

# XVID codec, 24 fps; the size must match the frames written later
fourcc = cv2.cv.CV_FOURCC(*"XVID")
out = cv2.VideoWriter('/tmp/output.avi', fourcc, 24, (426, 240))

posX = posY = directionX = directionY = 0

pos_frame = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
while True:
    flag, frame = cap.read()
    if flag:
        # Remember the last frame read, so a failed read can be retried below
        pos_frame = cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES)
        if posX >= (frame.shape[1] - image.shape[1] - 1):
            directionX = 1
        elif posX <= 0:
            directionX = 0

        if posY >= (frame.shape[0] - image.shape[0] - 1):
            directionY = 1
        elif posY <= 0:
            directionY = 0

        if directionX == 0:
            posX += 1
        else:
            posX -= 1

        if directionY == 0:
            posY += 1
        else:
            posY -= 1

        # Cut the region of interest, black out the overlay shape, paste the overlay
        roi = frame[posY:posY + image.shape[0], posX:posX + image.shape[1]]
        img1_bg = cv2.bitwise_and(roi, roi, mask = mask_inv)  # frame with a "hole"
        dst = cv2.add(img1_bg, img2_fg)                       # hole filled by the overlay
        frame[posY:posY + image.shape[0], posX:posX + image.shape[1]] = dst

        # This is to view the video in realtime
        # cv2.imshow('video', frame)
        out.write(frame)
    else:
        cap.set(cv2.cv.CV_CAP_PROP_POS_FRAMES, pos_frame-1)

    if cv2.waitKey(10) == 27:
        break
    if cap.get(cv2.cv.CV_CAP_PROP_POS_FRAMES) == cap.get(cv2.cv.CV_CAP_PROP_FRAME_COUNT):
        break

cap.release()
out.release()

(Sorry for being so unpythonic, but I will never quit the old-school C style...)
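The masking dance in the script (threshold, invert, bitwise_and, add) can be reproduced with plain NumPy, which may make the trick easier to follow. This is just an illustrative sketch with synthetic arrays, not part of the original script:

```python
import numpy as np

def overlay_opaque(frame, overlay, x, y):
    """Paste the non-black pixels of `overlay` onto `frame` at (x, y).

    Same idea as the OpenCV threshold/bitwise_and/add sequence:
    black pixels in the overlay are treated as transparent.
    """
    h, w = overlay.shape[:2]
    roi = frame[y:y + h, x:x + w]           # view into the frame
    mask = overlay.any(axis=2)              # True where any channel is non-zero
    roi[mask] = overlay[mask]               # writes through to `frame`
    return frame

# Tiny synthetic example: 4x4 black frame, 2x2 overlay with one black pixel
frame = np.zeros((4, 4, 3), dtype=np.uint8)
overlay = np.array([[[255, 0, 0], [0, 0, 0]],
                    [[0, 255, 0], [0, 0, 255]]], dtype=np.uint8)
overlay_opaque(frame, overlay, 1, 1)
```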

The result of mixing up the first 30 seconds of a very popular videoclip and an angry face is that... now even Leonida is quite happy!

There are still big issues, mostly due to the fact that OpenCV does not support alpha channels, so the final rendering is not that good. But this small program takes 9.5 seconds to render the full 30-second video, and despite the room for improvement in the final result, it still seems a good path to follow.
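For what it's worth, the alpha limitation can be worked around: cv2.imread(path, -1) keeps the PNG's alpha plane, and the per-pixel blend can then be done by hand. Here is a NumPy-only sketch of that blend, with synthetic data standing in for the real files:

```python
import numpy as np

def alpha_blend(frame_roi, overlay_bgra):
    """Blend a BGRA overlay onto a BGR region using the overlay's alpha.

    Per pixel: result = alpha * overlay + (1 - alpha) * background.
    The overlay would come from cv2.imread(path, -1); here it is a
    synthetic array for illustration.
    """
    alpha = overlay_bgra[:, :, 3:4].astype(np.float32) / 255.0
    fg = overlay_bgra[:, :, :3].astype(np.float32)
    bg = frame_roi.astype(np.float32)
    blended = alpha * fg + (1.0 - alpha) * bg
    return blended.astype(np.uint8)

# Synthetic 1x2 example: one fully opaque pixel, one at ~50% alpha
bg = np.full((1, 2, 3), 100, dtype=np.uint8)
fg = np.zeros((1, 2, 4), dtype=np.uint8)
fg[0, 0] = (200, 200, 200, 255)  # opaque: overlay replaces background
fg[0, 1] = (200, 200, 200, 128)  # ~50% alpha: halfway between the two
out = alpha_blend(bg, fg)
```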