This is what you call an exciting development. A new computer program developed by researchers at Cornell University has accomplished an astonishing feat – deducing laws of physics using nothing but simple mathematical analysis of large data sets. No, it didn't discover anything path-breaking, but this is a huge step forward nevertheless.
All the program had to work with were a few basic mathematical operators. It was designed to look for relationships between different parts of the dataset fed to it and to distill equations from them. The datasets in question came from simple pendulums and spring oscillators. From these, the program spat out equations – Newton's Second Law and the law of conservation of momentum – all in just one day. Here's how Wired described the concept behind the software in a nutshell:
Initially, the equations generated by the program failed to explain the data, but some failures were slightly less wrong than others. Using a genetic algorithm, the program modified the most promising failures, tested them again, chose the best, and repeated the process until a set of equations evolved to describe the systems.
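The evolutionary loop described above can be sketched in a few dozen lines. This is a toy illustration, not the Cornell team's code: the operator set, the mutate-the-least-wrong-half strategy, and the free-fall dataset (d = 4.9·t²) are my own assumptions chosen to keep the example small.

```python
import math
import random

# Operator 'building blocks' the search is allowed to combine.
OPS = {'+': lambda a, b: a + b,
       '-': lambda a, b: a - b,
       '*': lambda a, b: a * b}

def random_tree(depth=3):
    """Build a random expression tree over x, constants, and OPS."""
    if depth == 0 or random.random() < 0.3:
        return 'x' if random.random() < 0.5 else random.uniform(-5, 5)
    op = random.choice(sorted(OPS))
    return (op, random_tree(depth - 1), random_tree(depth - 1))

def evaluate(tree, x):
    """Recursively evaluate an expression tree at point x."""
    if tree == 'x':
        return x
    if isinstance(tree, float):
        return tree
    op, left, right = tree
    return OPS[op](evaluate(left, x), evaluate(right, x))

def fitness(tree, data):
    """Mean squared error on (x, y) pairs; degenerate trees score inf."""
    err = 0.0
    for x, y in data:
        try:
            err += (evaluate(tree, x) - y) ** 2
        except OverflowError:
            return math.inf
    return err / len(data) if math.isfinite(err) else math.inf

def mutate(tree):
    """Replace one randomly chosen subtree with a fresh random one."""
    if not isinstance(tree, tuple) or random.random() < 0.2:
        return random_tree(2)
    op, left, right = tree
    if random.random() < 0.5:
        return (op, mutate(left), right)
    return (op, left, mutate(right))

def evolve(data, pop_size=100, generations=50):
    """Keep the least-wrong half each generation; mutate it to refill."""
    population = [random_tree() for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda t: fitness(t, data))
        survivors = population[: pop_size // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop_size - len(survivors))]
    return min(population, key=lambda t: fitness(t, data))

# Observations of a 'law' the program is never told: d = 4.9 * t^2.
random.seed(0)
data = [(t, 4.9 * t * t) for t in (0.0, 0.5, 1.0, 1.5, 2.0)]
best = evolve(data)
print('best error:', fitness(best, data))
```

The key design choice mirrors the quote: failed equations aren't discarded outright – the "slightly less wrong" ones survive and breed the next generation.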
To be fair to previous researchers in the field, it is only now – with access to cheap, plentiful storage and processing power – that such a thing has become possible. Even with serious number-crunching power, it took the program considerable time to come up with these 'easy' laws. Security researchers have already harnessed farms of PlayStation 3 consoles to carry out brute-force attacks against SSL. The point to note here is that this kind of power is finally attainable within a reasonable budget. Progressing to a stage where the software can actually dazzle us with a new discovery may be quite a daunting task given the amount of processing it would take. Still, this particular iteration was more 'proof of concept' than production model – a demonstration that it could be done. Equipping it with more complex mathematical 'building blocks' might allow it to achieve results faster.
photo credit: Home Biss

Another recent development in a somewhat similar field is the search engine Wolfram Alpha. It is not yet publicly accessible, but certain media outlets were given a preview. Going by what the reviewers have been allowed to say, it would seem that Wolfram has gone where Powerset hasn't: Wolfram Alpha actually generates fact-sheets in response to a user's query – albeit limited to certain topics at the moment – in contrast to Powerset, which simply returns existing data. Present-generation search engines such as Google and Yahoo! are not going out of the picture any time soon unless per-page indexing costs (comparatively high for Powerset) come down. Processing cost is no trivial factor: although it was a dud, one of the major 'advantages' Cuil's promoters touted was that their system could index much more data than Google at a lower processing cost. Eventually, Cuil's ranking algorithm was written off by reviewers as flawed (maybe because it showed "random pornography" – NSFW!).
That brings us to the larger ideal of the Semantic Web – a vision where computers can understand what currently human-readable data means. The top two search engine providers are going different ways on this. Google is trying to use large-scale data analysis to find patterns – basically, NLP on a scale never attempted before; they are well prepared for this, too, as they own one of the largest indexes of information in the world. Yahoo! Search, on the other hand, is rooting for microformats. Microformats use existing XHTML markup to 'mark' what the tagged data means. This does require effort on the part of the publisher, but given the craze around search engine optimization, I'm sure it will take off quickly if the Google monolith throws in the towel and decides to support microformats. You see, although Google's ideal of unbiased machine analysis is sci-fi and 'futuristic', it is also too rosy and impractical – for now. Powerset succeeded because it stuck to analysing Wikipedia; Wolfram Alpha is aimed (right now) at providing answers to known scientific facts. In both cases, this was possible due to the availability of structured data. With microformats, descriptions can easily be added without resorting to OWL.
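To see why microformats make a machine's life easier, consider the hCard format, which reuses ordinary XHTML class attributes (vcard, fn, org) to label what a chunk of text means. The snippet and extractor below are an illustrative sketch of the idea, not any search engine's actual pipeline; the name and company in the markup are made up.

```python
from html.parser import HTMLParser

# A hypothetical hCard: plain XHTML whose class names carry the semantics.
HCARD = """
<div class="vcard">
  <span class="fn">Ada Lovelace</span>,
  <span class="org">Analytical Engine Co.</span>
</div>
"""

class HCardExtractor(HTMLParser):
    """Collect the text inside elements bearing hCard class names."""
    FIELDS = {'fn', 'org'}

    def __init__(self):
        super().__init__()
        self.current = None   # field name we are currently inside, if any
        self.card = {}        # extracted field -> value mapping

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get('class', '').split()
        hit = self.FIELDS.intersection(classes)
        if hit:
            self.current = hit.pop()

    def handle_data(self, data):
        if self.current:
            self.card[self.current] = data.strip()
            self.current = None

parser = HCardExtractor()
parser.feed(HCARD)
print(parser.card)  # {'fn': 'Ada Lovelace', 'org': 'Analytical Engine Co.'}
```

No statistical analysis, no NLP – a trivially simple parser recovers structured facts, because the publisher did the labelling up front. That is the bargain microformats strike.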
photo credit: ~inky

The 'fun' thing about computer science as a field is that it is not limited by physical laws in any major way. True, Moore's Law has been validated at every single milestone, with faster and/or cheaper processing power becoming available. (Equally important is cheap storage – can you even imagine analysing terabytes and petabytes of data without current storage solutions?) In this case it is true that hardware limitations have stymied reasonable attempts at quests such as these – until now. Software truly is virtual reality. There is no limit to how fast a computer language and its capabilities can mutate with successive generations. There is a limit to the efficiency of an internal combustion engine: forget practical figures of maximum efficiency hovering around 25%; there are theoretical limits dictated by thermodynamics. There are limits to how tall you can make a building – limits imposed by construction materials. Computers, on the other hand, will see processing power increase manifold over the next few years. Will humankind innovate quickly enough to keep in step with this raw power and unlock new secrets? Time will tell, but it certainly is an interesting and exciting challenge for those stepping into the field.