Agile & Open Source concepts applied in a large scale scientific project

The massive amount of data that the Square Kilometre Array (SKA) will produce requires innovations in both software and hardware and changes in the way scientists use astronomical data. This is so important that the International Centre for Radio Astronomy Research (ICRAR) selected ThoughtWorks to collaborate in an exploration of how science and the technology industry can work productively together on a global scale.
Highlights

Key techniques and processes

ThoughtWorks’ exploratory venture with ICRAR introduced general agile and Open Source concepts that hold promise for making the SKA a reality. The following techniques and processes were used:
- Automation of routine activities (environment setup, build, test, deploy)
- Use of Open Source techniques to allow global collaboration:
The Square Kilometre Array (SKA) is an international science programme to build the world’s largest and most sensitive radio telescope, an instrument that will be 50 times more sensitive than today’s most powerful radio telescopes. The SKA will allow scientists to investigate the evolution of galaxies, dark energy, gravity and cosmic magnetism, and even to seek signs of biological life on other planets. Its central computer will have the processing power of about one hundred million PCs. The array will use enough optical fibre to wrap twice around the Earth, and will generate enough raw data to fill 15 million 64 GB iPods every day! The SKA is so advanced that its development will drive the creation of a new generation of technologies. Spin-off innovations will benefit other systems that process large volumes of data from geographically dispersed sources, while the power requirements of the SKA can accelerate technology development in scalable renewable energy generation, distribution, storage and demand reduction.
Since team members come and go over the duration of such long multi-part and multi-phase projects, specific steps are taken to minimise impact and enable new members to be productive as quickly as possible:
- When planning work, plan some (or many!) pieces that can be easily completed by someone new to the project
- Automate development environment setup so new people can be productive quickly
- Create just enough documentation to allow new people to be productive, and update it based on feedback from those people.
The SKA organisation: Globally diverse, singularly focused

On the 25th of May 2012, the SKA Organisation announced that Australia-New Zealand, together with South Africa, would share hosting of the SKA. It will be built on two continents, spanning from Australia to New Zealand and from South Africa to Ghana and Mauritius. The massive amount of data that the SKA will produce will require innovations in both software and hardware, and changes in the way scientists use astronomical data. Science organisations from the 8 founding members and at least as many associated countries will have to work closely with the software and hardware industry to achieve their aims. This is so important, and groundbreaking in scope, that the International Centre for Radio Astronomy Research (ICRAR) selected ThoughtWorks as industry software experts to collaborate in an exploration of how science and the technology industry can work productively together on a global scale.
ThoughtWorks enables ICRAR and the scientific community to build a framework for reliable and maintainable software

Researchers from the International Centre for Radio Astronomy Research (ICRAR), based in Perth, have been heavily involved in Australia’s precursor programmes for the SKA. The partnership with ThoughtWorks was designed to identify software techniques that will help manage and distribute large pieces of work, demonstrate how Agile principles and the scientific method are complementary disciplines, and benefit both the scientific and IT communities by cross-pollinating ideas and approaches.
www.thoughtworks.com
Science and research organisations already use iterative development and validation to quickly investigate and develop solutions to new situations. However, the focus on science problems rather than software problems can lead to code that is hard to maintain. As the SKA will be pushing computational limits on every front (data transfer volume, data storage, and computational complexity), the efficacy of the principles and techniques used will be tested at extreme levels, providing new insights into how the process of developing and delivering software and systems can be continually improved.
“We at ICRAR believe that collaboration with the software industry, particularly companies that follow Agile principles, will lead to software with higher levels of maintainability and quality. Our partnership with ThoughtWorks has been established in order to introduce techniques to increase productivity into a project as complex and with as many contributors as the SKA. Such techniques can make a huge difference and could be key to the success of the SKA software development effort.”
Professor Andreas Wicenec, Information and Communication Technologies Lead at ICRAR.
As the first step in the process of identifying and demonstrating current software methods and technology that will be needed for the SKA, several threads of inquiry were addressed:
- Managing large scopes of work distributed among many people and locations
- Designing for high data throughput and processing
- Distributed computing
- Data visualisation.
Managing large, dependent scopes of work among dispersed teams

With agencies and scientific organisations from about 20 countries involved in the SKA and a great amount of work to be coordinated, a prime concern is the need for tools and processes that maintain focus and allow large but dispersed teams to work together productively with the least possible overhead and waste. Based on experience with distributed commercial software projects, ThoughtWorks advised on how to scale practices and processes when larger numbers of people and locations are involved, and how to break down work most effectively among teams in multiple locations.
Designing for high data throughput and processing

The collection of SKA antennas will produce a data stream of up to 9 Pb/s. By comparison, the Large Hadron Collider produces 15 petabytes of data per year, an amount that the SKA will produce in less than 14 seconds. Although the stream will be processed to reduce the amount that needs to be stored to about 100 Gb/s (a 1/10,000 reduction), this amount of data is still so large as to make it unfeasible to fetch and manipulate the entire data set at once for processing or visualisation. ThoughtWorks advised on:
- Datacentre design
- Considerations for designing a parallel processing system
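The comparison between the SKA data stream and the Large Hadron Collider's yearly output can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on assumptions the text does not state: that "Pb/s" means petabits per second, that a petabyte is 10^15 bytes, and that the stored rate may have been meant in either gigabits or gigabytes per second, so the reduction ratio is computed under both readings. All variable and function names are made up for this example.

```python
# Back-of-the-envelope check of the data-rate figures quoted in the text.
# Assumptions (not stated in the source): "Pb/s" is petabits per second,
# prefixes are decimal (peta = 10**15, giga = 10**9), and 1 byte = 8 bits.

BITS_PER_BYTE = 8
PETA = 10**15
GIGA = 10**9

ska_raw_bits_per_s = 9 * PETA    # "up to 9 Pb/s" from the antennas
lhc_bytes_per_year = 15 * PETA   # "15 petabytes of data per year"

# Time for the SKA raw stream to equal one year of LHC output.
ska_raw_bytes_per_s = ska_raw_bits_per_s / BITS_PER_BYTE
seconds_to_match_lhc_year = lhc_bytes_per_year / ska_raw_bytes_per_s


def reduction_factor(raw_bits_per_s, stored_bits_per_s):
    """Return how many times smaller the stored stream is than the raw one."""
    return raw_bits_per_s / stored_bits_per_s


# The stored rate is quoted as "about 100 Gb/s"; both readings are shown
# because the unit (bits or bytes) is an assumption either way.
stored_if_gigabits = 100 * GIGA
stored_if_gigabytes = 100 * GIGA * BITS_PER_BYTE

factor_if_gigabits = reduction_factor(ska_raw_bits_per_s, stored_if_gigabits)
factor_if_gigabytes = reduction_factor(ska_raw_bits_per_s, stored_if_gigabytes)

print(f"Seconds to match one LHC year: {seconds_to_match_lhc_year:.1f}")
print(f"Reduction factor (100 gigabits/s):  {factor_if_gigabits:,.0f}")
print(f"Reduction factor (100 gigabytes/s): {factor_if_gigabytes:,.0f}")
```

Under these assumptions the first figure comes out at a little over 13 seconds, in line with the "less than 14 seconds" quoted above, and the gigabytes-per-second reading of the stored rate gives a reduction factor close to the quoted 1/10,000.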
ThoughtWorks demonstrated processes and tools that have proven useful on a very large commercial scale:
- Using GitHub for collaboration (shared version control, issue tracking and wiki)
- Continuous integration
- Automated regression testing
- Using a build and release server
- Using daily stand-up meetings, video and voice calls, IM and email to maintain high levels of communication among people working on a project in multiple locations and time zones (in this case Melbourne, New York, and Perth).
Distributed Computing

ICRAR realises the potential of distributed computing to process large amounts of data. They already have theSkyNet, which harnesses spare computing power to process radio astronomy data. They also have projects that use the BOINC Open Source grid-computing platform. ThoughtWorks contributed to a BOINC-based project to calculate spectral energy distributions for radio astronomy images.

ThoughtWorks helped create:
- A BOINC project which wraps the MagPhys module for use in BOINC clients
- Scripts to set up a BOINC server on AWS infrastructure

Data Visualisation

Stellarium is planetarium software, which can be used to show you the ‘night sky’ as it would be seen from earth at a place and time of your choosing. A plugin called Survey Monitoring Tool (SVMT) allows you to map out and monitor sky surveys that are carried out on optical telescopes.

In partnership with ICRAR, ThoughtWorks made progress towards adapting the SVMT plugin, originally developed by the European Southern Observatory for optical astronomy, to represent radio astronomy sky survey data so that it can be used for the SKA precursors and the SKA.

ThoughtWorks helped create:
- SVMT data files for the Wallaby and Dingo survey fields
- Scripts to create a SVMT server
- Scripts to create a Stellarium/SVMT development environment

What Next?

The SKA is still in its early development. The site selection in May 2012 marked the end of the preparation stage. The next couple of years will be dedicated to the design and architecture of the SKA. The precursor and pathfinder projects will continue through this phase and inform the process, while building partnerships with industry that will support the construction phase toward the end of 2015. There are many software and computing hardware challenges that will provide opportunities for growth and innovation.

Sources:
http://www.skads-eu.org/p/loska2006/B_35.pdf
http://www.skatelescope.org/the-organisation/history-of-theorganisation/participating-countries
https://tnc2012.terena.org/core/presentation/44

Useful links:
http://proceedings.spiedigitallibrary.org/proceeding.aspx?articleid=1363053
http://www.stellarium.org