I was sitting at my desk the other day, looking at two coffee cups, when a question popped into my head: “How does my brain recognize that both of those are coffee cups? They are different in size, shape, and color. Interesting…”
I’ve programmed quite a bit, so I know how difficult that problem could be. You couldn’t literally record every image of a coffee cup and say “if new image matches old image, coffee cup, otherwise ???”. That would be ridiculously inefficient. So I thought about how my brain would do it. Bring up an image of a coffee cup in your head. What can you tell me about it? If you’re like me, the answer is not much. I could think of the general shape, the general color, and that’s about it. If it had a handle and was ceramic, it was definitely a coffee cup.
Therein lie two key observations that could be translated into code. I recognized shape, which would be a set of dimensions or, better yet, ratios if we are not concerned with size. Even then, “general” implied that the ratios were very rough. They were not exact by any means. To me, a lack of exactness equates to fewer computational resources required. Neat.
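Here is a minimal sketch of what that might look like in Python. Everything in it is an assumption made for illustration: the function names `shape_ratios` and `roughly_same_shape`, the choice of width as the reference dimension, the 25% tolerance, and the measurements themselves.

```python
import math


def shape_ratios(width, height, depth):
    """Reduce raw dimensions to ratios relative to width, so size drops out."""
    return (height / width, depth / width)


def roughly_same_shape(a, b, tolerance=0.25):
    """True if each pair of ratios agrees within a relative tolerance.

    The 25% default is an arbitrary illustrative value, not a tuned one.
    """
    return all(math.isclose(x, y, rel_tol=tolerance) for x, y in zip(a, b))


# Made-up measurements in centimeters: (width, height, depth).
small_cup = shape_ratios(8.0, 9.5, 8.0)
large_cup = shape_ratios(10.0, 13.0, 10.0)
monitor = shape_ratios(60.0, 35.0, 5.0)

print(roughly_same_shape(small_cup, large_cup))  # the two cups
print(roughly_same_shape(small_cup, monitor))    # cup vs. monitor
```

Because only ratios are stored, a cup twice the size produces the same numbers, and the loose tolerance is what lets two differently proportioned cups still count as the same general shape.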
The next question I asked myself was “how do I know that is a coffee cup and not a computer monitor?”. Well, for one, the dimension ratios do not match up and aren’t anywhere near each other. They aren’t similar in shape at all. I can surmise, then, that the two aren’t similar objects. “What if I had never seen a coffee cup before?”. Well, if I had seen a cup before, I’d be in OK shape. I would know the ratios for a cup, and since the ratios for a coffee cup are similar, I could say that it’s “like a cup”. Pattern recognition. I could really set up a sensor in a room and start scanning things and learning. I could even go off just pictures. The terms coffee cup and cup are just packets of meaning assigned to those objects, which the program would learn. See my post on mathematics of language for more on that.
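As a rough sketch of that idea in Python: keep a small table of shapes seen so far, and describe a new object by whichever known shape is nearest, provided it is near enough. The names `learn` and `describe`, the stored ratio values, and the `max_distance` cutoff are all hypothetical choices for illustration. Plain Euclidean distance between ratio pairs is just one simple way to measure "similar".

```python
import math


def learn(known, label, ratios):
    """Attach a label (a 'packet of meaning') to a set of shape ratios."""
    known[label] = ratios


def describe(known, ratios, max_distance=0.3):
    """Name the nearest known shape, or admit the object is unfamiliar."""
    label, distance = min(
        ((name, math.dist(ratios, ref)) for name, ref in known.items()),
        key=lambda pair: pair[1],
    )
    if distance <= max_distance:
        return f"like a {label}"
    return "unknown object"


# Illustrative (height/width, depth/width) ratios, not real measurements.
known = {}
learn(known, "cup", (1.2, 1.0))
learn(known, "computer monitor", (0.6, 0.1))

print(describe(known, (1.3, 1.0)))  # a coffee cup it has never seen
print(describe(known, (3.0, 3.0)))  # something unlike anything learned
```

The program never needs to have seen a coffee cup. Having learned "cup" is enough for it to report that the new object is like one.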
The last question I asked before I got involved in something else was: “How can I tell there is something here in the first place?”. I can see color and can distinguish lines. I can see the lines that outline the coffee cup and see the color of it. Since I can see the lines, there must be a different object there. That’s how I know it’s there: the lines. Not so fast. My eye cannot detect lines. I know it can detect color though. Lines must be in the software. After thinking about it, I surmised that lines are actually just sharp differences in color. My way to explain this would be to think of a person dressed in green in front of a green screen. You can barely distinguish them, and the only way you can is that the light reflects differently around the edges of their body, so the light you see in those areas is slightly off in tint from the rest of their suit.
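To make "lines are sharp differences in color" concrete, here is a small Python sketch that marks a pixel as part of a line when its right or lower neighbor differs sharply in color. The function names, the tiny hand-made image, and the threshold of 60 are all assumptions for illustration. Real edge detectors are considerably more refined than this.

```python
def color_difference(a, b):
    """Sum of absolute per-channel differences between two RGB pixels."""
    return sum(abs(x - y) for x, y in zip(a, b))


def find_edges(image, threshold=60):
    """Mark pixels whose right or lower neighbor differs sharply in color."""
    rows, cols = len(image), len(image[0])
    edges = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            here = image[r][c]
            if c + 1 < cols and color_difference(here, image[r][c + 1]) > threshold:
                edges[r][c] = True
            if r + 1 < rows and color_difference(here, image[r + 1][c]) > threshold:
                edges[r][c] = True
    return edges


GREEN = (0, 180, 0)      # the green screen
SUIT = (10, 170, 10)     # a green suit, only slightly off in tint
WHITE = (255, 255, 255)  # a white cup

scene = [
    [GREEN, GREEN, GREEN, GREEN],
    [GREEN, SUIT, SUIT, GREEN],
    [GREEN, WHITE, WHITE, GREEN],
]

for row in find_edges(scene):
    print("".join("#" if is_edge else "." for is_edge in row))
```

With the default threshold, the white cup produces lines while the suit blends into the screen. Lowering the threshold (say, `find_edges(scene, threshold=20)`) picks up the suit as well, which is the green screen situation: the outline is there, but only as a faint difference in tint.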
Now I need to go get a cup of tea.