BEDA: de-jargoning my PhD

If you missed it, I am blogging every day in April now.

My PhD title is “Online video annotation and metadata browsing.” The first thing you need to understand when you start a PhD is that no-one really knows what you’re doing or trying to achieve. That title is a research direction, a topic you might get at the top of a GCSE Religious Studies exam followed by the word “…Discuss.” Except I’m going to be “discussing” it for the next three and a half years of my life. Let’s break the title down word-for-word:

“online” – This means that my research needs to use machine learning techniques to learn and improve as it is used over time. Think of something like predictive text, which “learns” the words and phrases you type on your phone most often and, over time, gets better at suggesting personalised words based on what it thinks you are trying to type, or will type next. How this applies to video will become clear in a moment. I hope.
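If you prefer code to analogies, here’s a tiny toy sketch in Python of that predictive-text idea, just to show what “online” learning means in practice. It’s purely illustrative and has nothing to do with my actual research code: the point is only that the model updates itself after every word it sees, rather than being trained once and then frozen.

```python
# Toy illustration of "online" learning (not my research code): a predictive-text
# model that updates itself after every single word it sees.
from collections import defaultdict, Counter

class TinyPredictiveText:
    def __init__(self):
        # For each word, count which words have followed it so far.
        self.next_word_counts = defaultdict(Counter)
        self.previous_word = None

    def observe(self, word):
        """Called each time a word is typed: update the counts immediately."""
        if self.previous_word is not None:
            self.next_word_counts[self.previous_word][word] += 1
        self.previous_word = word

    def suggest(self, word, n=3):
        """Suggest the n words most often seen after `word` so far."""
        return [w for w, _ in self.next_word_counts[word].most_common(n)]

model = TinyPredictiveText()
for w in "see you at the pub see you at the cinema".split():
    model.observe(w)
print(model.suggest("at"))   # ['the'], learned only from what has been typed so far
```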

“video” – Now this isn’t referring to camera-phone video or the videos you see on YouTube; it’s talking about broadcast video. My PhD work is being part-funded by Sony Research in Basingstoke, a department that researches new concepts to help professional film-makers/broadcasters and anyone else who uses Sony’s professional camera equipment for their work. In this case “video” refers to the stuff a cameraman records with their Sony camera. The difference between this stuff and the videos you watch on YouTube is the quality, both in terms of the picture’s HD resolution and the shot composition, lighting, etc.

“annotation” – Information about what the video is showing, or its relevance to the production. Think about the way commentators annotate a football match with circles and arrows to show what the player was doing. You might be interested in whether the camera is on a tripod or not, or if it’s panning/zooming. You might want to know what objects (people, cars, buildings, trees, etc) are in the frame, whether they’re on the left or right of the screen, whether they’re moving or not, and whether they’re the focus of the shot or just background noise. You might want to know who was talking, and what they were saying. Maybe you want to know the GPS co-ordinates of where it was shot, or if it was indoors or outdoors, day or night, or which scene in the script it refers to. You get the idea.
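To make that a little more concrete, an annotation is really just a bundle of structured facts attached to a clip. Here’s a completely made-up example in Python; every field and value is invented for illustration and isn’t a real format from my project or from Sony.

```python
# A made-up example of what one clip's annotations might look like as data.
# Every field and value here is illustrative, not a real format I actually use.
clip_annotation = {
    "clip_id": "A037_C012",
    "camera": {"on_tripod": False, "motion": "pan-left", "zooming": False},
    "objects": [
        {"label": "person", "position": "left",  "moving": True,  "in_focus": True},
        {"label": "car",    "position": "right", "moving": False, "in_focus": False},
    ],
    "speech": [{"speaker": "presenter", "transcript": "and over here you can see the old town hall"}],
    "location": {"gps": (51.27, -1.09), "indoors": False, "time_of_day": "day"},
    "script_scene": 14,
}
```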

The research is about exploring Computer Vision methods that can automate the annotation of broadcast video, and that learn from what they’re doing so they get better in the future. Sounds vague? That’s the point.

So what about the “metadata browsing” bit? Well, that’s about what happens once you have a computer system that can annotate all this footage for you: how do you then use those annotations to find what you’re looking for? Instead of having a load of video clips in a folder on your computer with really unhelpful names and useless thumbnails, maybe you can use these annotations to show clips in a more intuitive way and make finding what you want easier.
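Again, purely as a toy sketch and not my actual system, “browsing by metadata” might boil down to filtering clips on their annotations rather than on their filenames. The clip IDs and fields below are all invented for the example.

```python
# Toy sketch of "metadata browsing": find clips by what's in them,
# not by what the files happen to be called. All data here is invented.
def find_clips(annotations, **criteria):
    """Return the clips whose annotations match every given key/value pair."""
    return [a for a in annotations if all(a.get(k) == v for k, v in criteria.items())]

clips = [
    {"clip_id": "A037_C012", "indoors": False, "camera_motion": "pan-left", "has_person": True},
    {"clip_id": "A037_C013", "indoors": True,  "camera_motion": "static",   "has_person": False},
    {"clip_id": "A038_C001", "indoors": False, "camera_motion": "static",   "has_person": True},
]

# "Show me outdoor shots with a person in them" instead of squinting at thumbnails.
for clip in find_clips(clips, indoors=False, has_person=True):
    print(clip["clip_id"])   # prints A037_C012 and A038_C001
```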

So, yeah. That’s my bread and butter for the next three years. I’d like to show you what I’ve been doing for the last six months, but unfortunately most of it is confidential :(

Until tomorrow, take care x