The second post about my final-year Electronic Engineering project: identifying landmarks, in seconds, from photographs taken on a mobile phone.
Imagine you are walking around London and come across a building that you want to know more about. You whip out your mobile phone and open the ELROND app. Take a picture of the building, and ELROND returns relevant information in just a few seconds: the name and purpose of the facility, whether it is open to the public, the opening times, the phone number of reception, a short history, appropriate web links… wouldn’t that be useful? Well, that’s just one potential application of my project.
It could equally be used to display information about portraits in a gallery just by holding your smartphone in front of a painting, with relevant information overlaid on the screen in real time…
Or just as an alternative to GPS to locate you when GPS is not available… such as indoors…
My project, titled Elrond, aims to provide the backbone, or infrastructure, that lets such apps be written much more quickly. But how does it work? How does that picture of a building turn into information?
The diagram below gives an overview of the process.
- Once the Android application has extracted the features from the image of the building using a feature detection algorithm (more on that in a later post) the extracted information is packaged into an XML format and transmitted over a data network (3G or Wi-Fi) to a Linux web server.
- The web server then parses the XML file into a format it can understand.
- Each extracted feature is then compared against all the features Elrond knows about, building a shortlist of which buildings this is most likely to be a picture of. Because the number of known features is likely to be on the order of millions, an efficient way of searching the set is needed: a KD-Tree.
- The search returns a shortlist of images that most probably match the query image. The items at the top of the shortlist are the most likely matches, so the top 100 results are examined in more detail. Elrond looks at all the features in both images, finds the ones that match, and stores their locations. Then a homography is calculated to see whether there is a way to map the features from the query image onto the stored image. A homography is a matrix that describes the best way to map one set of points onto another. If the homography can map lots of points between the images, that image is given a high match score.
- After all the homographies have been calculated, the best-matching image can be determined; if no match is found, Elrond cannot return any information. Assuming a reasonable match was found, Elrond now knows what the building is! Relevant information about that building is maintained in a database, so it can be fetched.
- The database information is packaged into another XML file and sent to the mobile device.
- The mobile device interprets the XML and graphically displays the information on the screen. Voilà!
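To make the first step a bit more concrete, here is a sketch of how extracted feature vectors could be packaged into XML before being sent to the server. The element and attribute names here are purely illustrative assumptions, not Elrond's actual schema:

```python
import xml.etree.ElementTree as ET

def package_features(features):
    # Wrap each feature vector in a hypothetical <feature> element.
    # The tag names ("query", "feature") are placeholders for illustration.
    root = ET.Element("query")
    for i, vec in enumerate(features):
        feat = ET.SubElement(root, "feature", id=str(i))
        # Serialise the vector as space-separated numbers in the element body.
        feat.text = " ".join(f"{v:.4f}" for v in vec)
    return ET.tostring(root, encoding="unicode")

# e.g. package_features([[0.1, 0.2], [0.3, 0.4]]) produces a small
# <query> document with one <feature> element per vector.
```

On the server side, the same `xml.etree.ElementTree` module can parse the document back into feature vectors, which is essentially the "parses the XML file into a format it can understand" step above.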
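The KD-Tree search mentioned above can be sketched in a few lines. This is a minimal toy version for 2-D points, not Elrond's production search (real feature descriptors are high-dimensional, and libraries such as SciPy or FLANN would normally be used), but it shows the core idea: split the space along alternating axes, then prune whole branches that cannot contain a closer neighbour.

```python
import math

def build_kdtree(points, depth=0):
    # Recursively build a k-d tree: at each level, split the point set
    # along one coordinate axis at the median point.
    if not points:
        return None
    k = len(points[0])
    axis = depth % k
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(node, target, depth=0, best=None):
    # Descend toward the target, then unwind, searching the far branch
    # only if the splitting plane is closer than the best match so far.
    if node is None:
        return best
    axis = depth % len(target)
    point = node["point"]
    if best is None or dist(point, target) < dist(best, target):
        best = point
    diff = target[axis] - point[axis]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, target, depth + 1, best)
    if abs(diff) < dist(best, target):
        best = nearest(far, target, depth + 1, best)
    return best
```

The pruning step is what makes this so much faster than comparing against every stored feature: on average, large parts of the tree are never visited at all.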
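The homography scoring step can also be illustrated. A homography is a 3×3 matrix applied in homogeneous coordinates; the sketch below maps query-image points through a candidate homography and counts how many land close to their matched stored-image points (the "inliers" that make up the match score). The function names and the pixel tolerance are my own illustrative choices, and estimating the matrix itself (typically with RANSAC) is left out:

```python
import math

def apply_homography(H, pt):
    # Map a 2-D point through a 3x3 homography in homogeneous coordinates:
    # (x, y, 1) -> (x', y', w), then divide through by w.
    x, y = pt
    xp = H[0][0] * x + H[0][1] * y + H[0][2]
    yp = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xp / w, yp / w)

def match_score(H, pairs, tol=3.0):
    # Count feature pairs whose query-image location, mapped through H,
    # lands within `tol` pixels of the stored-image location.
    inliers = 0
    for query_pt, stored_pt in pairs:
        px, py = apply_homography(H, query_pt)
        if math.hypot(px - stored_pt[0], py - stored_pt[1]) <= tol:
            inliers += 1
    return inliers
```

An image whose candidate homography maps many feature pairs within tolerance gets a high score, which is exactly the "if the homography can map lots of points between the images" test described above.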
I know this post will leave some readers gormless, but for those who are interested in knowing more, I want to write further posts about how specific parts of the application work and perform. So if you have any thoughts or suggestions, let me know.
The next post will be about less techy things…