How to (hopefully) not drown in data
Posted: November 15, 2011
More is better, right? Bigger telescopes and bigger surveys are both undoubtedly good things, but to make the best use of these advances we need to be able to handle the corresponding increase in data flow, and the resulting pressure on the astronomical archives that will have to cope with it.
This is a cross-posting with the Astronomy Twitter Journal Club, who are going to be discussing this topic on Twitter (search for the #astrojc hashtag) this Thursday at 20:10 GMT. If you’re interested, please come and join in.
This ‘data tsunami’ is almost upon us, according to a new paper by G. Bruce Berriman and Steven Groom. The recent addition of large datasets from the Spitzer and WISE telescopes has massively increased queries to the online Infrared Science Archive (IRSA), and, unsurprisingly, slowed down the response time of the database. This is only going to get worse as the archive’s growth is expected to accelerate over the next few years.
The paper also points out that how astronomers use archives is going to change. At the moment, raw datasets are typically downloaded and then reduced on a user’s own computer. However, once data reach petabyte scales it’s likely that they’ll have to be handled in situ, if only to avoid breaking the internet.
So what can be done? And, more importantly, can we do whatever we’re going to do in as cheap a way as possible? Firstly, we need better ways to search multiple online datasets efficiently – the excellent Virtual Observatory is already developing techniques to help here.
Next, we need to explore new technologies like cloud computing. The Square Kilometre Array (which will generate 10 gigabytes per second) will have theSkyNet, the (worryingly named) community-based cloud which will harness the power of volunteers’ computers to process its data.
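To get a feel for what 10 gigabytes per second means, here is a rough back-of-the-envelope sketch in Python. It takes the rate quoted above at face value and assumes decimal units (1 GB = 10^9 bytes, 1 PB = 10^15 bytes); the function names are just illustrative, and nothing here comes from the paper itself.

```python
# Back-of-the-envelope arithmetic for a sustained data rate.
# Assumption: decimal units (1 GB = 1e9 bytes, 1 PB = 1e15 bytes).
GB = 10**9
PB = 10**15
SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_day(rate_bytes_per_second):
    """Total bytes produced in one day at a constant rate."""
    return rate_bytes_per_second * SECONDS_PER_DAY


def hours_to_accumulate(total_bytes, rate_bytes_per_second):
    """Hours needed to pile up total_bytes at a constant rate."""
    return total_bytes / rate_bytes_per_second / 3600


ska_rate = 10 * GB  # the 10 gigabytes per second quoted in the text

print("Petabytes per day:", bytes_per_day(ska_rate) / PB)
print("Hours to reach one petabyte:", hours_to_accumulate(PB, ska_rate))
```

At that rate the data pile up at a little under a petabyte every day, which is why shipping raw data around to individual users stops looking realistic.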
Finally, we need to talk more, especially to IT experts in computer infrastructure, and then share what we’ve learned in the authors’ proposed new journal dedicated to information technology in astronomy. We then need to properly reward the effort people put into this area, as well as give young astronomers a grounding in software engineering to better prepare them for this data-heavy future.
If we do all that then, the authors suggest, we’ll be able to survive the coming data flood. Fingers crossed.
G. Bruce Berriman & Steven L. Groom (2011). How Will Astronomy Archives Survive The Data Tsunami? ACM Queue. arXiv:1111.0075v1