This project will allow people in a videoconference to easily focus on whoever is speaking or presenting. We aim to accomplish this by locating the current speaker using face recognition and audio trilateration, and adjusting the video camera towards the speaker using motors.

With many people in a videoconference, it is often difficult to determine who is currently speaking. The few existing solutions are expensive and offer little customization. We plan to make the camera system compact enough to transport easily and set up in any room the user chooses.

  • Identify faces using a video camera and a face detection algorithm
  • Approximately locate someone speaking around the device in 2-D space, relative to the device
  • Rotate to the person speaking and focus in on them using facial recognition software
  • Allow a 360-degree pan and 20-degree tilt range of motion
  • Work with generic video chat software and video cameras
  • Inexpensive
  • Portable
  • Responsive

Over-The-Monitor Webcams

Logitech HD Pro Webcam C920

Over-the-monitor webcams are generally meant for individual video conferencing, and usually are attached to the monitor at a fixed angle. Swivel will be able to include multiple people in the field of vision by autorotating to focus on whoever is speaking.

Telepresence Cameras

Sony EVIHD1 10x High Definition Color Pan/Tilt/Zoom Camera

Telepresence cameras are designed for medium to large conference rooms, and allow remote control of pan/tilt angle and zoom on individual people. Swivel will be able to focus on people automatically, and at a lower price than the average telepresence camera.

Auto Pan/Tilt Cameras

Logitech QuickCam Orbit AF

The Logitech QuickCam Orbit AF automatically rotates and adjusts its height using face detection to track a person as he or she moves around. Swivel will use not only face tracking but also audio tracking to focus on stationary people seated around a conference table.

Auto Pan/Tilt Base for Mobile Devices

Motrr Galileo

The Motrr Galileo allows users to mount an iPhone and use an iOS app on the iPhone to track people using face detection, motion detection, or color detection. Swivel adds another dimension of tracking by using audio to track whoever is speaking. Additionally, no app needs to be installed on the host device; Swivel will handle the tracking itself, regardless of the host camera.

Google Hangout

Videoconference in Google Hangout

Google Hangout analyzes the audio of each participant to detect whoever is speaking, and maximizes the speaker's video feed for everyone else in the conference. Swivel will offer users a similar feature, but for multiple individuals in one room, rather than remote individuals with their own cameras. Additionally, Swivel will be able to track the current speaker as he or she moves around the room.

Hardware

Item                              Description     Est. Cost
Camera                            Link            $38.84
Servo Motor                       Link            $9.69
Stepper Motor                     Link            $14.95
Unidirectional Microphone (x4)    Link            $10
Udoo                              Link            $99-135
MIPI 5MP AF Camera                Link            $38.84
Circuit Board                     (Our Design)    $50
Casing                            (Our Design)    $100

Software

Protocols

  • UART (Serial)

Week of 3/16/15:

Installed and set up the Udoo.

Week of 3/23/15:

Wrote a program that uses facial recognition to find faces and turns two servo motors to follow them.
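As a rough illustration of how a detected face could drive the two servos, the sketch below converts the face's offset from the frame center into small pan and tilt corrections. The function name `servo_step`, the 5-degree maximum step, the deadband, and the sign convention are all assumptions for illustration; the actual program's control logic is not described here.

```python
def servo_step(face_center, frame_size, max_step_deg=5.0, deadband=0.1):
    """Return (pan_delta, tilt_delta) in degrees that nudge a face toward the frame center.

    face_center: (x, y) pixel position of the detected face's center.
    frame_size:  (width, height) of the video frame in pixels.
    All parameter names and defaults are hypothetical.
    """
    (cx, cy), (width, height) = face_center, frame_size
    # Normalized offset from the frame center, in the range [-1, 1].
    err_x = (cx - width / 2) / (width / 2)
    err_y = (cy - height / 2) / (height / 2)
    # Ignore small offsets so the servos do not jitter around the center.
    pan = max_step_deg * err_x if abs(err_x) > deadband else 0.0
    tilt = max_step_deg * err_y if abs(err_y) > deadband else 0.0
    return pan, tilt
```

Calling this once per frame and adding the result to the current servo angles gives a simple proportional controller; whether a positive value means right or left (or up or down) depends on how the servos are mounted.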

Week of 3/30/15:

Set up a basic stepper motor circuit that can turn to specific angles using the quickest path.
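One common way to pick the quickest path between two headings is to wrap the difference into the range [-180, 180) degrees. The sketch below shows that calculation; the function names and the 200 steps-per-revolution figure are assumptions, not values taken from our hardware.

```python
def shortest_rotation(current_deg, target_deg):
    """Signed rotation in degrees, in [-180, 180), that moves current to target."""
    return (target_deg - current_deg + 180) % 360 - 180


def steps_for(delta_deg, steps_per_rev=200):
    """Convert a rotation in degrees to a whole number of motor steps.

    steps_per_rev=200 (1.8 degrees per step) is an assumed value.
    """
    return round(delta_deg * steps_per_rev / 360)
```

For example, going from a heading of 350 degrees to 10 degrees yields a rotation of +20 degrees rather than -340.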

Week of 4/6/15:

Set up the microphone circuits and wrote a program to display recorded amplitudes for debugging purposes.

Week of 4/13/15:

We first wrote a program to compare the amplitudes of each microphone; if the average volume went over a certain threshold, the stepper motor would turn the camera to the microphone with the largest amplitude.

We realized that the program was too sensitive to random background noise, so we modified it to sample the microphone values every 50 ms and average the samples over 1.5 s.
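Sampling every 50 ms and averaging over 1.5 s works out to a window of 30 samples per microphone. The sketch below shows one way to keep such a window and pick the loudest microphone only when the average volume exceeds a threshold. The class name, the threshold value, and the amplitude units are assumptions for illustration.

```python
from collections import deque

SAMPLE_PERIOD_S = 0.05                                 # one sample every 50 ms
WINDOW_S = 1.5                                         # averaging window
WINDOW_SAMPLES = round(WINDOW_S / SAMPLE_PERIOD_S)     # 30 samples


class LoudestMicDetector:
    """Averages recent samples per microphone (hypothetical helper)."""

    def __init__(self, num_mics=4, threshold=100.0):
        # threshold is an assumed value in raw amplitude units.
        self.threshold = threshold
        self.windows = [deque(maxlen=WINDOW_SAMPLES) for _ in range(num_mics)]

    def add_sample(self, amplitudes):
        """Record one amplitude reading per microphone."""
        for window, amplitude in zip(self.windows, amplitudes):
            window.append(amplitude)

    def loudest(self):
        """Index of the loudest microphone, or None if the window is not
        full yet or the average volume is below the threshold."""
        if len(self.windows[0]) < WINDOW_SAMPLES:
            return None
        means = [sum(w) / len(w) for w in self.windows]
        if sum(means) / len(means) < self.threshold:
            return None
        return max(range(len(means)), key=means.__getitem__)
```

Because each reading contributes only 1/30 of the average, a single loud click barely moves the result, while sustained speech from one direction does.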

Finally, we updated the stepper motor code to limit how far the motor can turn in either direction, to prevent wires from wrapping around the motor and getting tangled.
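A rotation limit interacts with the quickest-path logic: when the short way around would pass the limit, the motor has to take the long way instead. The sketch below illustrates that idea under the assumption that the position is tracked as a cumulative angle from a home position and that the limit is 180 degrees in either direction; both the function name and the limit value are hypothetical.

```python
def limited_rotation(position_deg, target_deg, limit_deg=180.0):
    """Signed rotation that reaches target_deg without leaving [-limit, +limit].

    position_deg is the cumulative rotation from the home position, so it
    grows past 180 instead of wrapping. limit_deg must be at least 180 for
    every heading to be reachable (assumed default).
    """
    # Start with the quickest path, in [-180, 180).
    delta = (target_deg - position_deg + 180) % 360 - 180
    # If that path would wind the cables past the limit, go the other way.
    if position_deg + delta > limit_deg:
        delta -= 360
    elif position_deg + delta < -limit_deg:
        delta += 360
    return delta
```

For example, from a cumulative position of 170 degrees, a target heading of 190 degrees is reached by turning -340 degrees, which unwinds the cables instead of twisting them further.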

Week of 4/20/15:

Until this point, we had two separate programs: one that controlled the motors based on input from a face tracking algorithm, and one that turned the base motor in a certain direction based on audio input. This week, we successfully merged the two programs and ran the combined program on the Udoo. We can now listen for a speaker, turn the camera towards the speaker, then follow the speaker using face tracking. See the video below.

Week of 4/27/15:

We first improved our signal processing algorithm for turning the webcam in the direction of a different person speaking. We now continuously calibrate the microphones by keeping track of each one's baseline input amplitude and comparing the ratio between the current input value and the baseline value. This provides more accurate readings and amplifies the differences between the microphones.
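The baseline calibration can be sketched as follows. This version tracks each microphone's baseline with an exponential moving average, which is an assumption: the exact method of tracking the baseline is not specified here. The class name and the smoothing factor `alpha` are likewise hypothetical.

```python
class BaselineTracker:
    """Tracks a slowly moving baseline per microphone (hypothetical helper)."""

    def __init__(self, num_mics=4, alpha=0.01):
        # alpha controls how quickly the baseline follows the input (assumed value).
        self.alpha = alpha
        self.baseline = [None] * num_mics

    def update(self, amplitudes):
        """Fold one reading per microphone into the baselines and return
        the ratio of each reading to its updated baseline."""
        ratios = []
        for i, amplitude in enumerate(amplitudes):
            if self.baseline[i] is None:
                # The first reading seeds the baseline.
                self.baseline[i] = float(amplitude)
            else:
                self.baseline[i] += self.alpha * (amplitude - self.baseline[i])
            base = self.baseline[i]
            ratios.append(amplitude / base if base > 0 else 0.0)
        return ratios
```

Comparing ratios rather than raw amplitudes means a microphone that always reads higher than the others, for example because of its gain or placement, does not win by default.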

We have also created a CAD drawing for an enclosure for all our parts, with holes for the microphones, cable ports, and the motor. We are currently in the process of acquiring access to a 3D printer to print this enclosure.

Week of 5/4/15:

Until now, we have been developing two separate microcontroller programs: one that moves the motors according to the face tracking algorithm, and another that turns the camera based on the microphone sensor input. After some final tweaking of each program, we successfully merged the two into one program. The system now continuously moves the camera up and down depending on the location of the face, and when it picks up a new speaker, it pauses the face tracking, turns to the new speaker, and looks for a new face to track.
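The merged behavior can be thought of as a small state machine: track a face, turn when a new speaker is heard, then search for a face again. The sketch below is one way to express that; the mode names, the `next_mode` function, and its boolean inputs are hypothetical and not taken from the actual microcontroller program.

```python
from enum import Enum


class Mode(Enum):
    TRACKING = "tracking"    # following a face with the servos
    TURNING = "turning"      # face tracking paused, base rotating to a new speaker
    SEARCHING = "searching"  # turn finished, looking for a face to track


def next_mode(mode, new_speaker, turn_done, face_found):
    """Return the next mode given the current mode and sensor events."""
    # A new speaker interrupts tracking or searching and starts a turn.
    if new_speaker and mode is not Mode.TURNING:
        return Mode.TURNING
    if mode is Mode.TURNING:
        return Mode.SEARCHING if turn_done else Mode.TURNING
    if mode is Mode.SEARCHING:
        return Mode.TRACKING if face_found else Mode.SEARCHING
    return Mode.TRACKING
```

In this sketch a turn in progress always runs to completion before another speaker is considered, which keeps the camera from oscillating between two people talking at once.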

Project Proposal Presentation 2/18 Slides

Kedar "Kederps" Amladi

If Kedar was in the Wu-Tang Clan, he would be called "Lazy-Assed Assassin"

Nate "Frog" Appleson

Nate is the team record holder for most hugs.

Francisco "Franny" Delgado

Franny can cook 2-Minute Noodles in 1:58

Ben "Beanie" Siegel

BEANIE IS THE ONE WHO KNOCKS