Photo by Ramiz Dedaković on Unsplash
Nina: Nana, I’m curious ...
Nana: Hmph, when are you not?
Nina: You make it sound like it’s a bad thing.
Nana: It could be. Didn’t you hear that curiosity killed the cat?
Nina: Well, so what? The cat has nine lives. And it would have learned something, so it’s a net gain for the cat, isn’t it, Nana?
Nana: You’re being annoyingly flippant.
Nina: I suppose so. But nine lives and curiosity is how machines are going to finally become intelligent.
Nana: Must you be so obtuse?
Nina: Ah, is it so hard for you to admit you are curious, Nana?
Nana: I hate gimmicks. They’re manipulative.
Nina: They’re supposed to be. And you know what, Nana? They prey on human curiosity.
Nana: Fine, I’ll bite. What have the nine lives of a curious cat got to do with artificial intelligence?
Nina: Well, scientists have had some success with making AI learn things, but it usually has to be told what to learn. They use the carrot and the stick to force useful, efficient learning on the software, similar to what you would like to do with me. But unfortunately, from what I understand, that does not help create intelligent software.
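(An aside for readers who like to see the machinery: the “carrot and the stick” Nina describes is an extrinsic reward signal handed out by the environment. Below is a minimal, illustrative sketch of tabular Q-learning in Python; the toy corridor environment, states, and numbers are my own invention, not from any particular paper.)

```python
import random

# Minimal tabular Q-learning: the environment hands out the "carrot and
# stick" as an external reward, and the agent learns to chase it.
# Everything here (corridor, rewards, hyperparameters) is a toy example.
N_STATES, ACTIONS = 5, [0, 1]          # a tiny corridor; 0 = left, 1 = right
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    next_state = max(0, min(N_STATES - 1, state + (1 if action else -1)))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0  # carrot at the end
    return next_state, reward

for episode in range(200):
    state = 0
    for _ in range(20):
        # Mostly follow the carrot, occasionally act randomly.
        action = random.choice(ACTIONS) if random.random() < epsilon \
                 else max(ACTIONS, key=lambda a: Q[(state, a)])
        next_state, reward = step(state, action)
        best_next = max(Q[(next_state, a)] for a in ACTIONS)
        Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
        state = next_state
```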
Nana: Are you saying humans aren’t intelligent? If it works for us, why shouldn’t it work for software?
Nina: Humans are intelligent, Nana, and to a degree, so are animals. But they are born with intelligence. It’s not indoctrinated into them with a carrot-and-stick policy.
Nana: So what spineless mollycoddling method are these hippie scientists using to cajole their software into being intelligent?
Nina: Nana, you sound resentful. Did your parents never coddle you? Is that why you are so against it? Do you feel left out? Anyway, the scientists are using the video game Mario as a platform to develop software with intrinsic motivation to learn. Every time the software player dies, it becomes even more curious. What’s the point of nine lives if one isn’t curious?
Nana: And how do they implement this motivation?
Nina: They encourage curiosity by incentivizing surprise instead of rewarding some predetermined ‘right behavior’. The software is rewarded for encountering an unexpected outcome of its actions, like moving forward and dying. Such an approach encourages it to explore the space around it to find more unexpected behavior and learn what it can, much like curious humans, like yours truly.
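(Another aside: “rewarding surprise” is often implemented as the prediction error of a learned forward-dynamics model, in the spirit of curiosity-driven exploration. Here is a hedged sketch of that idea; the class and variable names are illustrative, not taken from the researchers’ code.)

```python
import numpy as np

# Sketch of curiosity as intrinsic reward: a forward model predicts the
# next state from (state, action), and the agent is paid in proportion
# to how wrong that prediction was. All names here are illustrative.
class ForwardModel:
    def __init__(self, state_dim, action_dim, lr=0.01):
        self.W = np.zeros((state_dim, state_dim + action_dim))  # linear model
        self.lr = lr

    def surprise(self, state, action, next_state):
        """Update the model on one transition; return squared prediction error."""
        x = np.concatenate([state, action])
        error = next_state - self.W @ x
        self.W += self.lr * np.outer(error, x)  # one online gradient step
        return float(error @ error)             # this is the intrinsic reward

# Usage: the surprise replaces (or supplements) the extrinsic reward.
model = ForwardModel(state_dim=4, action_dim=2)
state, action, next_state = np.zeros(4), np.array([1.0, 0.0]), np.ones(4)
intrinsic_reward = model.surprise(state, action, next_state)
```

As the model gets better at predicting familiar transitions, their surprise, and hence their reward, shrinks, which is exactly what pushes the agent toward territory it has not yet understood.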
Nana: You mean it blunders around and might accidentally learn something.
Nina: Yes, Nana. It’s the only way it can learn without being taught. Learning what you are taught can never help you make discoveries. It is limited by existing knowledge.
Nana: On the other hand, blundering around is an inefficient way to acquire the knowledge that already exists. Besides, one can push the limits of existing knowledge incrementally, in a disciplined manner.
Nina: True, Nana, but that rarely leads to startling and revolutionary discoveries. So there has to be a balance, with both humans and software. One must be taught a little and then allowed to explore, and if one is stuck in a novelty trap, a little guidance can nudge the learner out of the trap and into a more useful space of exploration.
Nana: A novelty trap? What's that?
Nina: Well, the problem with incentivizing surprise or unpredictability is that the software could get stuck forever exploring a noisy space. Noise is by nature unpredictable, but of little use to anyone trying to learn or discover something new.
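(One last aside: the novelty trap Nina mentions is often called the noisy-TV problem. This toy sketch, with made-up signals, shows why it happens: a simple predictor’s surprise decays on a learnable signal but stays high on pure noise, so an agent paid only for surprise keeps watching the noise.)

```python
import numpy as np

rng = np.random.default_rng(0)

def surprise_over_time(observations, lr=0.5):
    """Track the prediction error of a simple running-mean predictor."""
    prediction, errors = 0.0, []
    for obs in observations:
        errors.append(abs(obs - prediction))   # surprise = prediction error
        prediction += lr * (obs - prediction)  # learn from the observation
    return errors

steps = 200
structured = np.ones(steps)            # a learnable, constant signal
noise = rng.uniform(0, 2, size=steps)  # pure noise: mean 1, but unlearnable

print("structured, mean surprise over last 10 steps:",
      np.mean(surprise_over_time(structured)[-10:]))
print("noise,      mean surprise over last 10 steps:",
      np.mean(surprise_over_time(noise)[-10:]))
# The structured signal's surprise decays toward zero; the noise stays
# surprising forever, so a purely surprise-seeking agent gets trapped there.
```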
Nana: Ah so now I see what the problem is with you. You’re always getting stuck in noisy novelty traps, and they are noisy in every sense of the word.
Nina: Am not!
Nana: Are too! It’s only my constant prodding you out of them that helps you learn anything at all. This has been an unusually useful exchange. Now enough novelty for today. Go practice long division.
This post is a part of the #NinaAndNana series I co-host with Lavanya Srinivasan. Her posts can be found here.