Science Gateways and their tremendous potential for science

Author: Jean Payne

1 Science Gateways and their tremendous potential for science
Nancy Wilkins-Diehr, TeraGrid Area Director for Science Gateways, San Diego Supercomputer Center

2 Overview
What are Science Gateways?
What is TeraGrid?
Why TeraGrid and Gateways?
Examples of Success
How Does This Help Me?

3 Phenomenal Impact of the Internet on Scientific Research
Only 15 years since the release of Mosaic! Very rapid changes in how science is conducted.
1988: National Center for Biotechnology Information BLAST server; search results sent by email; still a working portal today
1992: Mosaic web browser developed
1995: “International Protein Data Bank Enhanced by Computer Browser”
2004: TeraGrid project director Rick Stevens recognized growth in scientific portal development and proposed the Science Gateway Program
Ensuing explosion of digital information
Need for analysis in a growing number of scientific areas

4 Very Rapid Changes in Web Usability
First generation: static Web pages
Second generation: dynamic, database interfaces, CGI. Lacked the ease of use of desktop applications.
Third generation: true networked and internetworked applications that enable dynamic two-way, even multi-way, communication and collaboration on the Web. These new applications will enable remarkable new uses of the Web in the organizational workplace and on the Internet.
Fourth generation: Web 2.0
Source: Screen Porch White Paper, The University of Western Ontario (1998)

5 Gateways are a Natural Extension of Internet Developments
3 common types of gateway:
  Web portal with users in front and services in back
  Client-server model, where application programs run on users' machines (i.e., workstations and desktops) and access services
  Bridges across multiple grids, allowing communities to utilize both community-developed grids and shared grids
Continued rapid changes ahead: must be adaptable; gateways can provide some nimbleness.
Scientific gateways can have varying goals and implementations. Some expose specific sets of community codes so that anonymous scientists can run them. Others may serve as a community portal that brings a broad range of new services and applications to the community. Some may provide access to data collections or the ability to create data products by analyzing data in a collection. Some provide remote visualization. A common trait of all gateways is their interaction with the TeraGrid through the various service interfaces that TeraGrid provides.

6 Arden Bement Senate Testimony, April 19, 2007
“Virtual environments have the potential to enhance collaboration, education, and experimentation in ways that we are just beginning to explore.”
“In every discipline, we need new techniques that can help scientists and engineers uncover fresh knowledge from vast amounts of data generated by sensors, telescopes, satellites, or even the media and the Internet.”
Gateways are a terrific example of interfaces that can support transformative science.

7 Gateway Idea Resonates with Scientists
Capabilities provided by the Web are easy to envision because we use them in everyday life.
Researchers can imagine scientific capabilities provided through a familiar interface.
Groups resonate with the fact that gateways are designed by communities and provide interfaces understood by those communities, but also provide access to greater capabilities on the back end without the user needing to understand the details of those capabilities.
Scientists know they can undertake more complex analyses, and that’s all they want to focus on.
But this seamless access doesn’t come for free. It all hinges on very capable developers.

8 Tremendous Opportunities Using the Largest Shared Resources - Challenges too!
What’s different when the resource doesn’t belong just to me?
  Resource discovery
  Accounting
  Security
  Proposal-based requests for resources (peer-reviewed access)
  Code scaling and performance numbers
  Detailed justification of resource request
  Citations, metrics of success
Tremendous benefits at the high end, but even more work for the developers
Potential impact on science is huge
Small number of developers can impact thousands of scientists
But need a way to train and fund those developers and provide them with appropriate tools

9 What is the TeraGrid?
NSF-funded facility to offer high-end compute, data and visualization resources to the nation’s academic researchers
300+ Teraflops Computation
Visualization
20+ Petabytes Storage
Dedicated cross-country network

10 TeraGrid Resources Available to Academic Researchers at No Cost
TeraGrid creates integrated, persistent, and pioneering computational resources that significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems.
Proposal-based access; researchers can use resources at no cost
Targeted support available as well

11 Implementing Common Gateway Requirements
Web Services: GT4 deployment, identification of remaining capabilities; information services, WebMDS
Auditing: need to retrieve job usage info on production resources; GRAM audit deployed in test mode in September, inclusion in CTSSv4
Community Accounts: policy finalized, security approaches being tested by RPs; attribute-based authentication testing
Allocations: changes in allocation procedures, the mechanisms used to evaluate science impact, and models for identity management, authentication and authorization that are more tuned to virtual organizations
Scheduling: Metascheduling RAT; on-demand via SPRUCE framework
Outreach: talks, schools/workshops (NVO, GISolve), major project demonstrations (LEAD); SURA, HASTAC, GEON, CI-Channel, SC, Grace Hopper, MSI-CI2, Lariat, Science Workflows and On Demand Computing for Geosciences Workshop
Primer: living document in wiki; provides up-to-date overview and instructions for new gateway developers (“how to make your portal a TeraGrid science gateway”)
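The auditing and community-account items above imply that a gateway submitting jobs under one shared account needs its own record of which portal user asked for each job, so that usage reported later can be attributed. The following is a minimal sketch of that bookkeeping only. The class name, method names, and the job-id and CPU-hour inputs are hypothetical; it does not call GRAM audit or any other TeraGrid service.

```python
from collections import defaultdict
from typing import Dict


class CommunityAccountLedger:
    """Hypothetical ledger mapping jobs run under a shared account to portal users."""

    def __init__(self, community_account: str) -> None:
        # Name of the shared account that all gateway jobs run under (assumed).
        self.community_account = community_account
        self._job_owner: Dict[str, str] = {}

    def record_submission(self, job_id: str, portal_user: str) -> None:
        """Remember which portal user requested a job at submission time."""
        if job_id in self._job_owner:
            raise ValueError(f"job {job_id!r} already recorded")
        self._job_owner[job_id] = portal_user

    def attribute_usage(self, cpu_hours_by_job: Dict[str, float]) -> Dict[str, float]:
        """Sum CPU hours per portal user; unknown jobs go to '<unattributed>'."""
        totals: Dict[str, float] = defaultdict(float)
        for job_id, hours in cpu_hours_by_job.items():
            owner = self._job_owner.get(job_id, "<unattributed>")
            totals[owner] += hours
        return dict(totals)


if __name__ == "__main__":
    ledger = CommunityAccountLedger("gateway_community")
    ledger.record_submission("j1", "alice")
    ledger.record_submission("j2", "bob")
    print(ledger.attribute_usage({"j1": 2.0, "j2": 3.5}))
```

Keeping the mapping on the gateway side is one possible design; the usage figures themselves would come from whatever audit interface the resource provider exposes.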

12 Gateways are growing in numbers
Success in a variety of domains
10 initial projects as part of TG proposal
>20 Gateway projects today
No limit on how many gateways can use TG resources
Prepare services and documentation so developers can work independently
  Open Science Grid (OSG)
  Special PRiority and Urgent Computing Environment (SPRUCE)
  National Virtual Observatory (NVO)
  Linked Environments for Atmospheric Discovery (LEAD)
  Computational Chemistry Grid (GridChem)
  Computational Science and Engineering Online (CSE-Online)
  GEON (GEOsciences Network)
  Network for Earthquake Engineering Simulation (NEES)
  SCEC Earthworks Project
  Network for Computational Nanotechnology and nanoHUB
  GIScience Gateway (GISolve)
  Biology and Biomedicine Science Gateway
  Open Life Sciences Gateway
  The Telescience Project
  Grid Analysis Environment (GAE)
  Neutron Science Instrument Gateway
  TeraGrid Visualization Gateway
  BIRN
  Gridblast Bioinformatics Gateway
  Earth Systems Grid
  Astrophysical Data Repository (Cornell)
Many others interested: SID Grid, HASTAC

13 Mapping Tool Used on Large Data Sets to Spot Brain Disorders
Large Deformation Diffeomorphic Metric Mapping (LDDMM), developed at the Center for Imaging Science at Johns Hopkins
Computes a mathematical description of which shapes are similar and different by computing metric distances in the space of anatomical images
"Using TeraGrid resources at multiple sites, this research has been able to successfully distinguish diagnostic categories such as Alzheimer's and Semantic Dementia from control subjects," said Anthony Kolasny, JHU. "This can potentially lead to a powerful new cyberinfrastructure tool clinicians can use to make earlier, more accurate diagnoses."
Source: SDSC Headlines, Paul Tooby

14 BIRN uses SSHFS to mount TeraGrid filesystems locally
CIS has 87TB of local storage.
/cis/net lists network drives.
220TB through CIS portal using autofs, samba, smbwebclient.
Source: Anthony Kolasny, Johns Hopkins University
August 2005, Charlie Catlett

15 What is SSHFS and how can it help?
SSHFS allows you to mount data through an ssh connection.
Simple command line: sshfs local_dir
Performance is as fast as your ssh connection. Performance tuning possible.
Allows you to use local applications on remote data, e.g. using Paraview to look at data processed on the TeraGrid and stored on the GPFS-WAN.
Directly accessing the remote file. Your changes are seen by everyone.
Source: Anthony Kolasny, Johns Hopkins University
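As a hedged illustration of the SSHFS usage described on this slide, the sketch below assembles a command in the general form `sshfs [user@]host:[dir] mountpoint [options]`. The user name, host, remote directory, and mount point are placeholder assumptions, not values from the talk, and the helper names are hypothetical. The `reconnect` option and `fusermount -u` (Linux) are standard parts of sshfs and FUSE.

```python
import shlex
from typing import List, Optional, Sequence


def build_sshfs_command(
    user: str,
    host: str,
    remote_dir: str,
    mount_point: str,
    options: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argv list of the form: sshfs user@host:remote_dir mount_point [-o opt]..."""
    cmd = ["sshfs", f"{user}@{host}:{remote_dir}", mount_point]
    for opt in options or []:
        cmd += ["-o", opt]
    return cmd


def build_unmount_command(mount_point: str) -> List[str]:
    """Build the Linux FUSE unmount command for a mounted directory."""
    return ["fusermount", "-u", mount_point]


if __name__ == "__main__":
    # All values below are placeholders for illustration only.
    mount = build_sshfs_command(
        "alice", "login.example.org", "/remote/data", "/mnt/remote", ["reconnect"]
    )
    print(shlex.join(mount))
    print(shlex.join(build_unmount_command("/mnt/remote")))
```

Building the argument list separately from running it keeps the sketch safe to execute on a machine without sshfs installed; passing the list to `subprocess.run` would perform the actual mount.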

16 TeraGrid Life Science Gateway
Application services for bioinformaticians
Ability for end users to apply the large-scale resources of the TeraGrid to their problems, while leveraging local resources
Featured apps:
  InterProScan, version 4.2
  InterProScan Data, version 12.0
  hmmr, version 2.3.2
  Blastall (from InterProScan), version 2.2.6
Plans to engage Bioinformatics Research Centers (BRC)
  Eight BRCs sponsored by the National Institute of Allergy and Infectious Diseases (NIAID)
  Funded to display sequencing and annotation data, comparative analysis, genome polymorphisms, gene expression, proteomics, host/pathogen interactions and pathways for the NIAID list of Category A-C priority pathogens and other pathogens causing emerging and re-emerging diseases.

17 TeraGrid Bioportal
Access to over 140 computational tools and many biological data sets
Collaborative workspace, simplified access to diverse set of tools
Database searching, alignment and phylogeny, pattern searching, DNA/RNA analysis, and protein analysis
Tools include EMBOSS (European Molecular Biology Open Software Suite), GLIMMER (Gene Locator and Interpolated Markov Modeler), HMMER (Hidden Markov Modeler), the NCBI (National Center for Biotechnology Information) toolkit and PHYLIP (PHYLogeny Inference Package).
Standard databases include NCBI Aggregate, PDB, Prints, RepBase, UniProt, PFam, ProSite, and TransFac

18 GEON
Developing cyberinfrastructure in support of an environment for integrative geoscience research
IT advances can significantly impact how geoscientists conduct their daily research activities
  Web/grid services, TeraGrid
  Semantic data integration
  Information management and ontologies
Tremendous opportunities to conduct novel and efficient research in many areas of the geosciences
SYNSEIS (SYNthetic SEISmogram generation tool)
  Helps seismologists calculate synthetic 3D regional seismic waveforms
  Accesses distributed data centers and large computational clusters
  Users only need to have access to the Internet and a browser. The entire system is web-based and is accessible from the GEONgrid portal web page.

19 GEON: LiDAR (Light Detection And Ranging) data
Capable of generating digital elevation models (DEMs) more than an order of magnitude more accurate than those currently available
Opportunity for geologists to study the processes that shape the earth’s surface at resolutions not previously possible.
Distribution, interpolation and analysis of large LiDAR datasets, which frequently exceed a billion data points, present significant computational challenges.
GEON tools begin with a user-defined subset of data and end with download and visualization of interpolated surfaces and derived products.
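To make the subset-then-interpolate workflow on this slide concrete, here is a small sketch that selects points inside a bounding box and bins them into a regular grid, taking the mean elevation per cell. This is an assumption-laden stand-in: the slide does not say which interpolation method the GEON tools use, cell-mean gridding is swapped in only because it is simple, and the function names and sample points are hypothetical.

```python
import math
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, elevation)


def subset(points: Iterable[Point], xmin: float, ymin: float,
           xmax: float, ymax: float) -> List[Point]:
    """Keep only points inside the half-open box [xmin, xmax) x [ymin, ymax)."""
    return [p for p in points if xmin <= p[0] < xmax and ymin <= p[1] < ymax]


def grid_mean_dem(points: Iterable[Point], xmin: float, ymin: float,
                  xmax: float, ymax: float, cell: float) -> List[List[Optional[float]]]:
    """Return rows (south to north) of mean elevation per cell; None where a cell is empty."""
    ncols = math.ceil((xmax - xmin) / cell)
    nrows = math.ceil((ymax - ymin) / cell)
    sums = [[0.0] * ncols for _ in range(nrows)]
    counts = [[0] * ncols for _ in range(nrows)]
    for x, y, z in subset(points, xmin, ymin, xmax, ymax):
        # Clamp indices to guard against floating-point edge cases.
        col = min(int((x - xmin) // cell), ncols - 1)
        row = min(int((y - ymin) // cell), nrows - 1)
        sums[row][col] += z
        counts[row][col] += 1
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else None for c in range(ncols)]
        for r in range(nrows)
    ]


if __name__ == "__main__":
    cloud = [(0.5, 0.5, 10.0), (0.2, 0.8, 20.0), (1.5, 0.5, 30.0), (5.0, 5.0, 99.0)]
    for row in grid_mean_dem(cloud, 0.0, 0.0, 2.0, 2.0, 1.0):
        print(row)
```

With datasets exceeding a billion points, the same per-cell accumulation would have to be streamed or distributed, which is where the large computational resources come in.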

20 Linked Environments for Atmospheric Discovery (LEAD)
Providing tools that are needed to make accurate predictions of tornadoes and hurricanes
  Meteorological data
  Forecast models
  Analysis and visualization tools
  Data exploration and Grid workflow
"Better than real-time" prediction of mesoscale weather events such as tornadoes
LEAD makes meteorological data, forecast models, and analysis and visualization tools available to anyone who wants to interactively explore the weather as it evolves. The LEAD Portal brings together all the necessary resources at one convenient access point.
NSF TeraGrid Review, January 10, 2006, Charlie Catlett

21 LEAD Inspires Students
“Dr. Sikora: Attached is a display of 2-m T and wind depicting the WRF's interpretation of the coastal front on 14 February. It's interesting that I found an example using IDV that parallels our discussion of mesoscale boundaries in class. It illustrates very nicely the transition to a coastal low and the strong baroclinic zone with a location very similar to Markowski's depiction. I created this image in IDV after running a 5-km WRF run (initialized with NAM output) via the LEAD Portal. This simple 1-level plot is just a precursor of the many capabilities IDV will eventually offer to visualize high-res WRF output. Enjoy!”
Eric ( , March 2007)

22 nanoHUB Explosive User Growth
nanoHUB is used to complete coursework by undergraduate and graduate students in dozens of courses at 10 universities.
nanoHUB attracts thousands of users
  Over 2M hits in last month
  In past 12 months: over 21,000 users, almost 175,000 simulation runs
Very full-featured
  Simulation tools
  Research proceedings
  Curricula content
  Collaboration spaces

23 GridChem - a desktop application gateway
Computational Chemistry Grid (CCG) science gateway
GridChem has been using TeraGrid in production since April 2006
Currently serves over 100 users and has delivered hundreds of thousands of CPU hours
Many published papers resulting from GridChem use

24 CReSIS (Center for Remote Sensing of Ice Sheets)
Awarded CI-TEAM funding to build a Polar Gateway
International Polar Year
CReSISGrid
Build a TeraGrid Science Gateway
Provide broad-based educational and training activity in cyberinfrastructure for remote sensing and ice sheet dynamics
MSI impact through leadership of Linda Hayden, Elizabeth City State University

25 Tremendous Potential for Gateways
In only 15 years, the Web has fundamentally changed human communication.
Science Gateways can leverage this amazingly powerful tool to:
  transform the way scientists collaborate
  tackle the toughest problems independent of location
  impact the amount of science that can result from each project
  influence the public’s perception of science
High-end resources can have a profound impact.
The future is very exciting!
  Web 2.0
  Application hosting
  Gateway-in-a-box

26 Would development of a gateway help your research?
Researchers using defined sets of tools in different ways
  Same executables, different input
  Datasets
  Workflow creation
Common data formats
Large shared datasets
mailing list in body
Biweekly telecons to get advice from others
Details about current gateways
Materials from June full-day tutorial at TG07
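The "same executables, different input" pattern on this slide is the kind of work a gateway can automate. The sketch below shows one possible way to expand a single executable and a list of input files into one job description per input. The `JobSpec` type, the `build_jobs` helper, and the paths are hypothetical and not tied to any TeraGrid submission interface.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import List, Sequence


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical description of one run: a label plus the command line."""
    name: str
    argv: List[str]


def build_jobs(executable: str, inputs: Sequence[str],
               fixed_args: Sequence[str] = ()) -> List[JobSpec]:
    """Create one job per input file, reusing the same executable and fixed arguments."""
    exe_name = PurePosixPath(executable).name
    jobs = []
    for path in inputs:
        stem = PurePosixPath(path).stem  # input file name without its extension
        jobs.append(JobSpec(name=f"{exe_name}-{stem}",
                            argv=[executable, *fixed_args, path]))
    return jobs


if __name__ == "__main__":
    for job in build_jobs("/opt/apps/simulate", ["data/a.dat", "data/b.dat"], ["--steps", "10"]):
        print(job.name, job.argv)
```

A real gateway would hand each `JobSpec` to its scheduler or grid middleware; that step is deliberately left out here.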

27 Thank you for your attention
Any questions?
Nancy Wilkins-Diehr