Sunday, August 19, 2012

SHADOWS OF THE MIND TO reCAPTCHA

Knowledge is infectious. When I started reading ‘Shadows of the Mind: A Search for the Missing Science of Consciousness’ by Roger Penrose, the reading itself took me on an interesting journey through various spheres of knowledge related to artificial intelligence.

Penrose is renowned for his work in mathematical physics. ‘Shadows of the Mind’ is a sequel to his previous book, ‘The Emperor’s New Mind’. In both books, Penrose explains, inter alia, the concept of the ‘Turing test’. It is a test that can be used to find out whether a given machine thinks like a human. For example, assume that a machine and a human are placed in front of a human judge. The machine, the human and the judge are separated from each other by non-transparent, sound-proof screens. The judge knows that one human and one machine are sitting in front of him, but he does not know which is which. The judge has to engage in a natural language conversation with both of them and find out which one is the machine. The machine will try to imitate a human in its answers. The conversation is limited to a text-only format (like chatting in a messenger). If the judge is not able to reliably distinguish the machine from the human, the machine is said to have passed the test. It has gained artificial intelligence!

I was curious about the Turing test and started reading other articles on the subject. That is how I was introduced to another test, called the ‘reverse Turing test’. It is a reversed version of the Turing test in which the human judge is replaced with a machine judge. The machine judge has to find out which participant is a machine and which is a human. You may not believe it, but we undergo this reverse Turing test quite often when we spend time in front of computers, especially those of us who work in intellectual property (IP). Before we can access a file wrapper on USPTO PAIR, the site asks us to type the words shown in a ‘reCAPTCHA’ box. I have pasted an example below:

[Image: example reCAPTCHA challenge]

reCAPTCHA appears on many sites, including social networking sites. Through reCAPTCHA, the back-end machine determines whether the visitor attempting to access the web site is a human or a software program (i.e. a machine), and it admits only humans, because it is believed that only humans can interpret and type those distorted words correctly. Present-day machines are not intelligent enough to interpret such distorted words. Thus, each time we access such web sites, we unknowingly go through a reverse Turing test!

There is an interesting story behind the reCAPTCHA project. Apart from making us undergo a reverse Turing test, reCAPTCHA also puts us to work on a job we did not even know we had. A non-profit group called the Open Content Alliance, operating from the University of Toronto, is engaged in a massive project of scanning the world's out-of-copyright books.

Once the text is scanned, the file is sent to a server in California, where it is run through optical character recognition (OCR) software. Many of us have used the OCR option in Adobe Acrobat to convert an image into text. In the same way, the software produces a digital full-text version of those out-of-copyright books. However, old books often contain illegible words that are difficult for the software to recognize. These troublesome scans are sent on to the reCAPTCHA servers at Carnegie Mellon University in Pittsburgh.

These are the words distributed to us when we enter sites such as the USPTO's. As a control, the reCAPTCHA program distorts a known word and pairs it with a word the scanning software has failed to decipher. That is why we find two words in a reCAPTCHA. However, we do not know which one is the control word and which one is the original word from the book. If we decipher the control word correctly, the computer provisionally accepts our reading of the book word as correct too. It then reconfirms that reading by circulating the same word to other users a few more times. It is reported that the system is now correcting over 10 million words a day, with 99.1 percent accuracy. Thus, every time we enter a web site by passing through reCAPTCHA, we actually do a job for a not-for-profit cause!
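The control-word idea described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not reCAPTCHA's actual code: the class name `ChallengeChecker`, the word identifier `scan-17`, the sample words and the agreement threshold of three matching answers are all hypothetical.

```python
from collections import Counter, defaultdict


class ChallengeChecker:
    """Toy model of a two-word challenge: one control word, one unknown word."""

    def __init__(self, threshold=3):
        # Hypothetical rule: accept a reading once this many users agree on it.
        self.threshold = threshold
        # unknown word id -> tally of readings typed by users who passed
        self.votes = defaultdict(Counter)
        # unknown word id -> reading accepted so far
        self.resolved = {}

    def submit(self, control_answer, typed_control, unknown_id, typed_unknown):
        """Return True if the visitor is treated as human."""
        # The visitor is judged only on the control word, whose answer is known.
        if typed_control.strip().lower() != control_answer.strip().lower():
            return False

        # The reading of the unknown word counts as one vote.
        guess = typed_unknown.strip().lower()
        self.votes[unknown_id][guess] += 1

        # Enough matching votes: treat the word as deciphered.
        if self.votes[unknown_id][guess] >= self.threshold:
            self.resolved[unknown_id] = guess
        return True


if __name__ == "__main__":
    checker = ChallengeChecker(threshold=3)
    for _ in range(3):
        checker.submit("morning", "morning", "scan-17", "overlooks")
    print(checker.resolved)
```

Note that a visitor who gets the control word wrong is rejected and his reading of the book word is thrown away, so a careless or automated visitor cannot spoil the digitized text.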

It was a long and interesting journey from ‘Shadows of the Mind’ to reCAPTCHA.
