A few weeks ago Stuart at Astronomy Blog produced a very provocative and interesting graph he called the "H-R diagram of Astronomers".
His original work is reproduced below, and you can find his post at Astronomy Blog.
When I saw the plot I immediately thought: what would the real data look like in this ADS-Google space?
I then wrote a simple service that uses the Google Search API to retrieve my "absolute fame" and coupled it with a simple call to ADS for the number of refereed papers. Once the services were in place, all I needed was a dataset of astronomers. I happen to have a list of all the members of the American Astronomical Society (as of 2008, I believe), and I decided that this was a good start. I ingested this list into a database, which also contains another "dimension": the astronomer's institution (but this might be the subject of a different post). I then tied the services to the database and was ready to run my initial tests.
The Google searches took almost no time, but the ADS ones turned out to be a little more involved. Even though I had originally threaded the requests so that each was issued at regular intervals, ADS simply blacklisted my machine's IP. However, after a few emails with the folks at ADS - who incidentally expressed a lot of interest - I was able to resume the counting.
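For readers who want to try something similar, here is a minimal sketch of spacing out requests so that a remote service sees at most one query every few seconds. It is not the code used for this post. The function name `throttled_counts`, the `fetch_count` callable (a stand-in for whatever actually asks the service for a count) and the 2-second default interval are all assumptions made for illustration, and a polite interval does not guarantee that a service will accept bulk queries, so it is worth contacting the service operators first.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def throttled_counts(
    names: Iterable[str],
    fetch_count: Callable[[str], int],  # hypothetical: returns a count for one name
    min_interval: float = 2.0,          # assumed spacing in seconds, not a documented limit
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, int]:
    """Call fetch_count once per name, leaving at least min_interval seconds
    between the start of consecutive calls."""
    results: Dict[str, int] = {}
    last_call: Optional[float] = None
    for name in names:
        if last_call is not None:
            # Wait out whatever is left of the interval since the previous call.
            wait = min_interval - (clock() - last_call)
            if wait > 0:
                sleep(wait)
        last_call = clock()
        results[name] = fetch_count(name)
    return results
```

The `sleep` and `clock` parameters exist only so the pacing can be checked without real waiting. In normal use you would pass just the list of names and your own `fetch_count` function.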
The current dataset contains ~6500 astronomers who are AAS members and you can find the plot below.
In the original post on Astronomy Blog, Stuart did mention that Google search results depend on the servers you query. In this case, given that the dataset consists entirely of AAS members, Google's main US servers seemed like the appropriate choice. However, as many of the comments on the original post highlighted, there are potentially large errors in the Google numbers, given that what we are measuring is the number of results returned for a given name. To address this issue, as Stuart pointed out, one has to take into account how celebrity is measured. Leslie Lamport at Microsoft Research looked at this specific issue and defined a Celebrity Index in this paper (PDF).
I have not had the time to look at how to implement a Celebrity Index (CI) for the H-R diagram of Astronomers, but I believe that if extracting the CI data automatically is indeed feasible, this plot might become much more representative of reality. If we are then able to directly query the Google server that is most appropriate for each person, we might even have something useful here!
After that we should be able to have an interactive version of this plot, where one can ask "Where am I?" or "Where is Carl Sagan?". I am planning on adding some of the famous astronomers Stuart had in his post, as well as search and interactivity, but only if I can resolve the CI issue.