2008/03/08

Finding Data Using the Virtual Observatory

I did not work on science at all today, but I realized that even simple questions are not trivially answered using VO collections. For example, Marc wanted to know "what is the cluster of galaxies in literature with the most number of member redshifts".

So I went to the registry and looked at what I get if I search for clusters of galaxies. It seems to be mostly the Vizier catalogs. Instead of using code to call Vizier, I went directly to it and selected all the cluster-of-galaxies catalogs in its holdings that have UCDs with the keyword REDSHIFT.
I found 165 cluster catalogs. I wanted to query all of them, but if I select them all I get a strange error page with:

\Beg{DIV}{class='error'} \Beg{DIV}{class='explainerror'} \centerline{{\LARGE\bf Too many catalogues selected}}\par {\em The following unexpected problem occured in} {\bf VizieR}:\thickrule\par {VizieR isn't able to query simultaneously so many catalogs,there is a limit of approximatively 30 tables.

I need to down-select. What is the maximum number of catalogs I can query at once? The error says approximately 30 tables, but the exact number is nowhere to be found: trial and error... grrr.... Five seems to work, but then I am taken to a form that I can only describe as REALLY SCARY: see photo and notice the scrollbar on the left!

I don't want to look at each of the 5 cluster catalogs' columns... so I do the lazy thing and select all columns.
The result can be generated in several formats and I pick text and KML. The KML is visually OK (the icons are too big), and one can see all the clusters' members, even if it's not at all obvious which of the galaxies have redshifts.... Ultimately this is not very useful, as I don't want to search and save 5 catalogs at a time....
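
If I did script this instead of clicking, the batching part would at least be easy. A minimal sketch, assuming the 165 catalog identifiers are already in a list (the names below are placeholders, not real Vizier identifiers) and that 5 per query is a safe batch size, which is only what my trial and error suggested:

```python
from typing import Iterator, List, Sequence


def batches(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `size` entries."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Placeholder identifiers standing in for the 165 cluster catalogs.
catalogs = [f"catalog_{i:03d}" for i in range(165)]

# 5 per query worked by trial and error; the real limit is not documented.
groups = list(batches(catalogs, 5))
print(len(groups))  # 33 separate queries
```

Each group would then go into one query. How to submit the query itself is not shown here, since I did everything through the web form.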

Back to Vizier. How about all galaxies with redshift? 260 matching catalogs... and then? Same output options and no way to run a query on redshift on each of them, or at least not obvious to me.

Going to NED. Sending email to Joe Mazzarella to see if I can somehow download all of the 1.4M redshifts NED has and then search around the centers of the sample of 560 clusters that Marc and I have. Joe responded with a super-detailed email and a way to download the NED redshifts in chunks of 50K at a time. Not ideal, but doable. I will work on this, particularly if I can download and ingest into a SQL database directly from the wire! However, this will NOT answer the question I am posing....
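
To make the ingest-and-search idea concrete, here is a rough sketch using SQLite. Everything specific in it is an assumption on my part: the table layout (name, ra, dec, z), the idea that each chunk arrives as a list of rows, the search radius, and the three made-up test rows. The actual NED download format is not shown.

```python
import math
import sqlite3
from typing import Iterable, Tuple

# Assumed row layout: (object name, RA in degrees, Dec in degrees, redshift).
Row = Tuple[str, float, float, float]


def ingest(conn: sqlite3.Connection, chunks: Iterable[Iterable[Row]]) -> int:
    """Insert rows chunk by chunk, committing after each chunk."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS redshifts "
        "(name TEXT, ra REAL, dec REAL, z REAL)"
    )
    total = 0
    for chunk in chunks:
        rows = list(chunk)
        with conn:  # one transaction per chunk
            conn.executemany("INSERT INTO redshifts VALUES (?, ?, ?, ?)", rows)
        total += len(rows)
    return total


def angular_sep_deg(ra1: float, dec1: float, ra2: float, dec2: float) -> float:
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def count_near(conn: sqlite3.Connection, ra: float, dec: float,
               radius_deg: float) -> int:
    """Count redshifts within radius_deg of (ra, dec).

    SQL does a coarse Dec-band cut; the exact test is done in Python.
    """
    cur = conn.execute(
        "SELECT ra, dec FROM redshifts WHERE dec BETWEEN ? AND ?",
        (dec - radius_deg, dec + radius_deg),
    )
    return sum(1 for r, d in cur if angular_sep_deg(ra, dec, r, d) <= radius_deg)


# Made-up rows in two "chunks", only to exercise the functions.
conn = sqlite3.connect(":memory:")
chunks = [
    [("g1", 10.0, 20.0, 0.050), ("g2", 10.1, 20.05, 0.051)],
    [("g3", 150.0, -5.0, 0.200)],
]
n_rows = ingest(conn, chunks)
n_near = count_near(conn, 10.0, 20.0, 0.5)  # hypothetical cluster center
print(n_rows, n_near)
```

Running count_near once per cluster center would give a number of redshifts per cluster for our sample of 560, which is a different thing from the question about the literature as a whole.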

Further work is needed. Looking at VO and ADS.