IU Networking at Indiana University
International Networking
Thursday 22nd March
Workshop - day 1

7.30 - 9.00

Breakfast

9.00 - 10.30

Welcome

US Welcome – James Williams, Indiana University, and PI, TransPAC3 and ACE programs – 5 mins
India Welcome – N. Mohan Ram, Director General, ERNET – 5 mins
Address by Dr. Machi F. Dilworth, Director, Office of International Science and Engineering, NSF - 10 mins
Address by Ambassador Arun K. Singh, Deputy Chief of Mission, Embassy of India - 10 mins

Introduction to the Workshop: background, objectives and what we want to achieve from the workshop – James Williams – 10 mins

Opening Plenary / demonstration – 40 mins

Network-enabled access to globally distributed data repositories
Moderated by George McLaughlin

Astronomy with Cutting-Edge ICT: Making Sense of Transients Using Geographically Dispersed Resources
Prof Ajit Kembhavi, Director, IUCAA, Pune, India (remote) (Presentation)
Dr Ashish Mahabal, Astronomy Department, Caltech (Presentation)
Dr. Roy Williams, LIGO, Caltech (Presentation)

Abstract: Massive amounts of data are coming online from new-generation sky surveys. Combined with geographically distributed archives and processing pipelines that must run in real time, these programs are a prime example of the need for high bandwidth across continents. The presentation and demonstration will emphasize the critical need for high-capacity networks and advanced communication services to exploit US-India collaboration in astroinformatics and astrophysics in current and near-future programs.

10.30 - 11.00

Morning Tea

11.00 - 12.30

Research Collaboration
Moderated by N. Mohan Ram

How the US and India Network Organizations support Research Collaboration

This session will provide updates, since the last workshop, on the network infrastructure that is available to researchers today or will be available to them in the relatively near term (1-2 years), and will describe what the organizations that manage these networks do to facilitate research collaboration.

Current and evolving R&E network infrastructure and research support structures in India

India's National Knowledge Network (NKN) - Dr. P.S. Dhekne, Raja Ramanna Fellow at BARC (remote) (Presentation) – 10 mins

Abstract: The National Knowledge Network (NKN) is a state-of-the-art, multi-gigabit, pan-India network providing a unified high-speed network backbone for all knowledge-related institutions in the country. The purpose of such a knowledge network goes to the very core of the country's quest to build quality institutions with the requisite research facilities and to create a pool of highly trained professionals. The presentation will cover the key highlights of the NKN project, a management overview, a technical overview, NKN connectivity status and the key services offered.

ERNET - Mr. Dipak Singh, Director of Network Operations, ERNET (Presentation) – 10 mins

Current and evolving R&E network infrastructure and research support structures in the US and between US and India

TransPAC3 - James Williams, Principal Investigator, TransPAC3 project, Indiana University (Presentation) – 10 mins

Internet2 – H. David Lambert, CEO, Internet2 (Presentation) – 10 mins

TEIN3’s role in facilitating US-India Collaboration - David West, DANTE (Presentation) – 10 mins (remote)

Abstract: The talk will provide a brief update of the TEIN project focussing on the connectivity it provides, project plans, and its current and potential future support for global science applications.

The Energy Sciences Network (ESnet) Support for Network-enabled Research Collaboration - Eli Dart (Presentation) - 15 mins

Infrastructure Q/A session – 25 mins

12.45 - 14.00

Lunch

14.00 - 14.45

Network-enabled Access to Distant and High Cost Instruments - 45 mins
Moderated by George McLaughlin 

Remote Access to and Control of Berkeley Synchrotron Beamlines by Homi Bhabha National Institute (HBNI), Anushaktinagar, Mumbai
Dr. M V Hosur, HBNI (Presentation)

Abstract: Third-generation synchrotrons are mega-facilities that push research frontiers in a variety of science, arts and engineering disciplines. However, they cost large amounts of money to build and maintain. They also use very sophisticated technologies, and therefore only a few are available in the world. In biology, crystallographers are among the most active users of the hard x-rays generated at these synchrotrons. Two unique properties of synchrotrons are particularly significant for crystallographers: 1) the high brilliance of the almost continuous-energy x-ray beam produced, and 2) the time structure of the x-ray beam. The former allows the use of very small single crystals and experimental ‘phase’ determination, while the latter enables structural mapping of biochemical reactions. The accuracy of structures determined using synchrotron data is normally much higher, and approaches the accuracy needed for structure-based drug design. These benefits are now within reach of protein crystallographers around the world, thanks to completely automated protein crystallography beamlines and to the development of reliable networks for very fast transfer of commands and data. The National Knowledge Network in India is linked to international networks, and through it we have been able to remotely operate the following beamlines: 1) BM30 (or FIP) at the ESRF in Grenoble, France, and 2) BM14 at the ESRF in Grenoble, France. A number of high-resolution data sets (1.6 – 2.0 Å) have been collected on drug-resistant HIV-1 protease mutant/drug complexes. These structures have been refined to low crystallographic R-factors, comparable to those obtained when the diffraction data are collected by operating the beamline on site. The molecular models derived are also stereochemically very accurate. Several protein crystallography beamlines are available on synchrotrons in the USA (APS, ALS, NSLS, etc.). Providing Indian protein crystallographers with remote access to these beamlines would be a good example of network-enabled collaboration in scientific research. Using the remote data collection facility set up at Homi Bhabha National Institute, Anushaktinagar, Mumbai, the feasibility of remotely operating protein crystallography beamline 5.0.2 at the ALS has been established. Actual data collection will be carried out once remote access is formalised.

14.45 - 15.30

Network-enabling of Global Classrooms – 45 mins
Moderated by George McLaughlin 

NKN - A Gateway to a Global Classroom
A collaboration between Amrita University and the University at Buffalo (State University of New York)
Prof Kamal Bijlani, Head of E-Learning Research Lab, Amrita University (Presentation)
Prof Bharat Jayaraman, Department of Computer Science and Engineering, University at Buffalo, SUNY (Presentation)

15.30 - 16.00

Afternoon Tea

16.00 - 16.45

Opportunities for new collaboration - 45 mins
Moderated by James Williams

A Framework for Persistent Collaborations: PRAGMA Overview, Future, Lessons Learned, and Opportunities for US-India Collaborations
Peter Arzberger, Director of the National Biomedical Computation Resource (Presentation)

Abstract: PRAGMA, a 30-institution, international, grass-roots organization, explores and evaluates practical approaches to how cyberinfrastructure software can be used to enable and enhance scientific collaboration among small and medium-sized groups. Scientific "expeditions" are used to define which software components, available from the PRAGMA partnership and elsewhere, need development and experimental evaluation prior to deployment on larger production infrastructures. Regular face-to-face meetings enable the group as a whole to support new science areas; gain timely insight into cyber developments; support and sustain experimental testbeds across multiple administrative domains; create training, education and network-building activities; and provide the persistent interactions that engender the trust needed as the foundation of international scientific and infrastructure development collaborations.

In this presentation we will introduce PRAGMA as an example of a framework for persistent collaborations that rely on both physical and human networks. We will discuss future directions, lessons learned about collaborations, and present opportunities for collaboration between US and India researchers.

Role of HPC in cyberinfrastructure and some experiences in US-India Collaborations
Radha Nandkumar, Emeritus Director of NCSA's International and Campus Relations (Presentation)  

Abstract: This presentation will describe opportunities for institutional and individual collaborations in defining the leading edge in high-end computing, information technologies and cyberinfrastructure. The talk will highlight the role of high-end computing in enabling breakthrough science and engineering in general, as well as some of the challenges associated with large-scale simulations. It will also outline several significant education and outreach activities and collaborative international projects, and will conclude by discussing the impact of these initiatives on society at large.

Fostering Indo-US computational science collaborations
Suresh Marru, Indiana University, Program Manager, XSEDE Science Gateways (Presentation)

Abstract: A major hindrance in academic collaborations is intellectual property sharing. This talk will discuss open source development across multiple collaborating institutions, paying particular attention to Science Gateways, which provide Web-based environments in which scientists and students perform computational experiments online through Web interfaces built on Web services and computational workflows. We believe important steps should be taken to go beyond basic open source and address the requirements for building open software communities. In addition to licensing and support tools, open communities must have open processes for making design decisions, accepting code contributions, adding new project members, reporting and resolving problems, and making well-packaged and properly licensed software releases. The Apache Software Foundation provides the infrastructure and mentoring experience to help open source communities address these project governance issues. Apache also has distinctive requirements (such as developer diversity) that are designed to emphasize the neutrality of the code base (giving competitors a safe place to cooperate) and to help sustain projects through leadership turnover and funding cycles. I would like to discuss how forums such as Apache can help US and Indian counterparts share code and collaborate without worrying about IP and cross-country funding issues.

Afternoon Q/A - 15 mins

17.00 - 17.15

Sum-up from first day – George McLaughlin, TransPAC3

Dinner on your own


Friday 23rd March
Workshop - day 2

7.30 - 9.00

Breakfast

9.00 - 10.30

Welcome to day 2 and review of day 1 – James Williams – 10 mins

Network-enabling of Medical Research and Drug Discovery Collaboration
Moderated by N. Mohan Ram


Cheminformatics and Open Source Drug Discovery: a case study in academic collaboration between the U.S. and India – 40 mins
Abhik Seal presenting for Dr. David Wild, Indiana University
Anshu Bhardwaj presenting for Dr. U.C.A. Jaleel, OSDD, Malabar Christian College
(Presentation)

Abstract: Indiana University (IU) has an internationally renowned and unique research and education program in cheminformatics, the use of advanced informatics techniques for chemistry, biology and drug discovery. Open Source Drug Discovery (OSDD) is a CSIR Team India Consortium with Global Partnership, with a vision of providing affordable healthcare to the developing world through the discovery of novel therapies for neglected tropical diseases such as malaria, tuberculosis and leishmaniasis. IU and OSDD are engaged in a multi-year educational and research collaboration aimed at (i) applying the latest research in cheminformatics techniques to the search for treatments for neglected diseases and (ii) bilaterally enhancing cheminformatics learning by engaging Indian students with research and teaching programs at IU and by developing shared learning resources. In this talk we will describe the project and highlight the many challenges that have arisen, including infrastructure harmonization, funding, different time zones, and the need for advanced distance collaboration technologies. We will recommend steps that can be taken to foster stronger, mutually beneficial collaboration between the two countries.

Protein Structure Modelling on the IUCRG: A BRAF-caBIG® collaboration – 40 mins
Rajendra Joshi, CDAC/BRAF (remote) (Presentation)

Abstract: The last few decades have witnessed the evolution of biology from a purely experimental field into a high-end computational domain, where sustained computational power is required to turn the data generated by high-throughput techniques into information that will help answer many mysteries of life. To generate knowledge from the oceans of genomic data, enabling technologies such as High Performance Computing, Grid Computing and Cloud Computing are the latest weapons in the hands of the modern biologist.

The importance of protein structures follows from the fact that the function of any protein is directly correlated with its structure: the three-dimensional structure of a protein directs its function within a cellular environment. A mutation in the protein sequence can change its structure, which in turn may render the protein non-functional or even give it adverse functions that lead to diseases such as cancer. Over the decades cancer has become one of the most prevalent diseases, with deaths estimated by the World Health Organization to exceed 12 million in 2030. Proteins from almost 1% of the human genome have been identified as being involved in oncogenesis. Because resolved structural data are scarce (the RCSB database has 73,974 resolved protein structures, compared with 534,695 sequence entries in UniProtKB), one has to resort to computational techniques to obtain the 3D structures of proteins in order to properly understand their functions.

The Bioinformatics Group at the Centre for Development of Advanced Computing (C-DAC), in collaboration with the cancer Biomedical Informatics Grid (caBIG®), has developed a grid-enabled, web-based automated pipeline for ab initio as well as homology-based prediction of protein structures, with an emphasis on cancer-related proteins. The pipeline has been deployed on the Bioinformatics Resources & Applications Facility (BRAF) hosted at C-DAC, Pune, India. The upstream component of the pipeline retrieves a protein sequence (according to user input) from the gridPIR service of caBIG®, which provides a data resource of high-quality annotated information on all protein sequences supported by UniProtKB. The retrieved sequence, in FASTA format, is then fed to the prediction pipeline. At its core the pipeline consists of two prediction engines for determining the 3D structures: an ab initio engine that uses the ROSETTA prediction algorithm and a homology modelling engine that uses the MODPIPE program. The graphical user interface of the pipeline lets the user choose various control parameters, such as which secondary structure prediction algorithms to use, the number of iterations, the number of output structures, NMR constraint files to upload, the e-value, etc. Once submitted, the jobs are distributed over multiple processors on the Biogene supercomputing system at BRAF, which significantly reduces the prediction time. The output consists of predicted structures in PDB format and parsed energy log files, which the user can download. All file transfers over the network are secured by SFTP. JMol has been integrated into the pipeline to allow visual inspection of the predicted models. Test cases have been run through the pipeline with a few cancer-related proteins downloaded from The Cancer Genome Atlas (TCGA), where sequence data from various mutated proteins of affected patients are stored and made available in various data formats. Some of these results will be discussed during the presentation.

Indo-US Cooperation in biomedical informatics
George A. Komatsoulis, Ph.D., Interim Director, Center for Biomedical Informatics and Information Technology, and CIO, National Cancer Institute (NCI), NIH (Presentation)

Abstract:

10.30 - 11.00

Morning Tea

11.00 - 12.00

Evolving areas of Network-enabled Collaboration
Moderated by Dipak Singh

Geosciences, Environmental Networks & Cloud Services, and PRAGMA

A Knowledge R&D Networked Indo-US Collaboration: A case study in Earth Sciences – 20 mins
Prof Arun Agarwal, Dept of Computer & Information Sciences, University of Hyderabad (Presentation)

Abstract: First, we will cover the role of the GEON/PRAGMA projects, initiated primarily by SDSC-UCSD through NSF, in developing cyberinfrastructure in a wide range of Earth Science disciplines in India since 2005, and how this “IT head start” helped in the data fusion and visualization of a variety of earth science data sets. Second, we will highlight significant achievements in producing a new breed of hybrid students, developing innovative manpower through the cross-fertilization of different science streams with IT. Third, we will review the data sets that are being generated in India by various organizations and their applications. In conclusion, we will discuss how large data sets create the need to shift from grid-middleware-based cyberinfrastructure to cloud cyberinfrastructure for the geosciences.

Big Data and Cloud Benchmarking - 20 mins
Prof Chaitan Baru, Director, Center for Large-scale Data Systems Research (CLDS), SDSC (Presentation)

Abstract: As science collaborations become data-centric, even moving in the direction of joint analysis of large datasets, there is an increasing need for cyberinfrastructure to support data-intensive computing, and an opportunity for collaboration in benchmarking "big data" applications at global scale. Can we build environments, using distributed computing and the cloud-based paradigm, in which researchers in the US can easily access and analyze scientific data from data archives in India, and vice versa? What system and network performance is required to sustain such applications? Is there an opportunity for Indo-US collaborations to study performance issues related to such data-intensive applications, and to develop related benchmarks?
In this session, we will discuss a new effort to develop "reference benchmarks" for big data and "probe benchmarks" for data-intensive clouds, which we refer to as the Cloud Weather Service (TM). The goal of these efforts is to provide clear, objective information on hardware, software and system performance for data-intensive applications on dedicated clusters as well as in cloud-based environments. We will discuss how we might structure new Indo-US collaborations in this area.

Morning Session Q/A - 20 mins

12.00 - 13.30

Lunch

13.30 - 14.45

Sustainable US-India Network Enabled Research Collaborations - Where to from here?
Moderated by James Williams

A panel comprising representatives of the US and Indian governments and scientists will take into account the contributions made during the workshop and debate ways to significantly enhance Indo-US network-enabled collaboration. In doing so, the panel and participants will try to identify the key issues, challenges, obstacles and opportunities relevant to developing action plans, and identify next steps and future deliverables. This session will be followed by a wide-ranging discussion among the participants, which will help shape the final workshop recommendations.

14.45 - 14.55

Summing up of the workshop - James Williams & N. Mohan Ram, DG, ERNET India

14.55 - 15.00
Closing Remarks - James Williams