While fighting information overload at SciFoo, I was really intrigued by a comment Tim O'Reilly made about metadata. He pointed out that search was going on all the time, well before Google emerged (this is self-evident!). However, he added that "all Larry Page and Sergey Brin did was create PageRank" and implement an algorithm that actually ranks pages in order of importance, i.e. by the links pointing to them.
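For anyone who has not seen the idea written down, here is a minimal sketch of link-based ranking in the spirit of PageRank. It is a textbook power iteration, not Google's actual implementation; the `toy_web` graph, the `damping` value of 0.85, and the iteration count are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Rank pages by the links pointing to them.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
best = max(ranks, key=ranks.get)
```

In this toy graph the page with the most incoming links, `c`, ends up ranked highest, and `d`, which nobody links to, ends up lowest.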
I already knew this, but the next comment Tim made hit home: "We need to allow users to submit data to us in ANY format and catalog it in some automated fashion by finding the metadata associated with the data itself. More importantly, we need to allow users to query this metadata and tag it according to relevance."
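To make that concrete, here is a toy sketch of what such a catalog might look like: pull whatever metadata can be guessed from a submission, index it, and let users query and tag it. Everything here is an assumption for illustration, including the `Catalog` class, the `extract_metadata` helper, and the choice to treat CSV headers and top-level JSON keys as "metadata". A real system would need far richer extractors and a proper search index.

```python
import csv
import io
import json
from collections import defaultdict


def extract_metadata(name, text):
    """Guess a few metadata fields from a file name and its text content."""
    fmt = name.rsplit(".", 1)[-1].lower() if "." in name else "unknown"
    meta = {"name": name, "format": fmt, "fields": []}
    if fmt == "csv":
        # Treat the header row as the list of field names.
        meta["fields"] = next(csv.reader(io.StringIO(text)), [])
    elif fmt == "json":
        try:
            doc = json.loads(text)
        except json.JSONDecodeError:
            doc = None
        # Treat top-level keys of a JSON object as field names.
        if isinstance(doc, dict):
            meta["fields"] = sorted(doc)
    return meta


class Catalog:
    """In-memory catalog with a simple inverted index over metadata terms."""

    def __init__(self):
        self.records = {}
        self.index = defaultdict(set)

    def _add_terms(self, name, terms):
        for term in terms:
            self.index[term.strip().lower()].add(name)

    def submit(self, name, text):
        """Accept a submission, extract its metadata, and index it."""
        meta = extract_metadata(name, text)
        self.records[name] = meta
        self._add_terms(name, [meta["format"], *meta["fields"]])
        return meta

    def tag(self, name, label):
        """Let a user attach a tag, which becomes searchable too."""
        self.records[name].setdefault("tags", []).append(label)
        self._add_terms(name, [label])

    def query(self, term):
        """Return the names of all submissions whose metadata matches a term."""
        return sorted(self.index.get(term.strip().lower(), set()))


# Hypothetical submissions in two different formats.
catalog = Catalog()
catalog.submit("assay.csv", "sample_id,temperature,ph\n1,37.0,7.4\n")
catalog.submit("run42.json", '{"temperature": 25, "instrument": "NMR"}')
catalog.tag("run42.json", "calibration")
hits = catalog.query("temperature")
```

Here a query for `temperature` finds both submissions even though they arrived in different formats, which is the point of indexing the metadata rather than the raw data.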
Can we do this automatically for scientific data? If so, how? Lucene? Anyone?