Through the eyes of a machine

One thing most people from my generation probably have in common is that we’ve all had an elder give us a lecture about how “things were much harder back in my day”, and you know what? They’re probably right.

I blame the way technology has evolved for that. Seriously, machines can do so much these days that it’s hard to keep up! They can listen, talk and, in some sense, even think for themselves. And it’s not like you need to break into Area 51 to get access to this technology: it’s present in most modern smartphones!

One thing humans have been trying to do for years is give machines the ability to see, which has been quite a challenge since, well, machines don’t actually have eyes. But what does it mean to “see”? Some would argue that it’s the ability to extract information from the real world simply by looking at it. Machines might not have eyes, but they do have access to vast amounts of videos and images, so all that’s left is to interpret that information.

Computer Vision

There is actually a scientific field that focuses on this issue, and it is called “Computer Vision”. I had the opportunity to work on this topic for my master’s degree and I ended up really liking what I learned.

I developed a mobile solution that extracts the information written on presence sheets, eliminating the need to type that information into a laptop by hand.

I won’t really go into detail on how I programmed my solution, but I will describe the different steps that make up its process:

  1. Data acquisition – the user will use a mobile app to capture an image of the presence sheet
  2. Table extraction – the table in the image will be isolated, discarding everything around it that isn’t needed
  3. Image segmentation – the now isolated table will have its cells separated and grouped logically
  4. Name identification – the names of the individuals on the presence sheet are extracted, giving an identity to each group of cells
  5. Presence identification – the signatures of each individual are interpreted and registered
  6. Returning the results – the user will receive a list of the individuals on the presence sheet, along with the presence information for each of them
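
To make steps 3, 5 and 6 a little more concrete, here is a minimal Python sketch. It is not the actual solution described in this article, just a toy illustration under loud assumptions: the table has already been isolated (step 2), it is a regular grid, the image is a tiny list of 0/1 pixels instead of a real photo, the names are handed in directly instead of being read from the image (step 4), and a cell counts as "signed" when enough of its pixels contain ink. Every function name, parameter and threshold below is hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a photo: 1 = ink, 0 = blank paper (assumption).
Image = List[List[int]]


def split_into_cells(table: Image, n_rows: int, n_cols: int) -> List[List[Image]]:
    """Step 3 (assumed form): cut an isolated table into an n_rows x n_cols grid,
    keeping the cells of each table row grouped together."""
    cell_h = len(table) // n_rows
    cell_w = len(table[0]) // n_cols
    grid = []
    for r in range(n_rows):
        row_cells = []
        for c in range(n_cols):
            cell = [line[c * cell_w:(c + 1) * cell_w]
                    for line in table[r * cell_h:(r + 1) * cell_h]]
            row_cells.append(cell)
        grid.append(row_cells)
    return grid


def ink_ratio(cell: Image) -> float:
    """Fraction of pixels in a cell that contain ink."""
    total = sum(len(line) for line in cell)
    return sum(sum(line) for line in cell) / total if total else 0.0


def read_presences(table: Image, names: List[str], n_cols: int,
                   signature_col: int = 1, threshold: float = 0.1) -> Dict[str, bool]:
    """Steps 5 and 6 (assumed form): a person is marked present when their
    signature cell has at least `threshold` of its pixels inked."""
    grid = split_into_cells(table, len(names), n_cols)
    return {name: ink_ratio(row[signature_col]) >= threshold
            for name, row in zip(names, grid)}


# Two table rows, two columns (name, signature); each cell is 2x4 pixels.
toy_table = [
    [1, 1, 0, 0,  0, 1, 1, 0],
    [0, 1, 1, 0,  1, 1, 0, 1],
    [1, 0, 1, 0,  0, 0, 0, 0],
    [0, 1, 0, 1,  0, 0, 0, 0],
]
print(read_presences(toy_table, ["Ana", "Rui"], n_cols=2))
# {'Ana': True, 'Rui': False}
```

A real presence sheet would need much more than this: the photo is rarely perfectly aligned, cell borders have to be detected rather than assumed, and reading the names requires some form of text recognition. The sketch only shows how the later steps fit together once the table has been cleanly isolated.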

With solutions like mine, it’s possible to make our everyday lives easier. Computer vision is being used for things like recognizing our faces in photos, understanding our emotions and making our cars drive themselves. It’s truly amazing the way technology has evolved.

So yes, things are probably easier for us these days. And although I have access to every recipe in the world, don’t worry, grandma, Google will never replace you.

This article was written by Rodrigo Sá Pessoa, Software Engineer @ md3.

Rodrigo is a developer at MD3 who focuses mostly on the mobile aspects of our projects. He recently completed a master’s degree in computer engineering and developed a passion for hybrid development and computer vision. He considers himself a creative person and spends his free time writing, drawing, playing music and learning new things.