Information Technology and Electronic Networking

in the

University of California Natural Reserve System

 

Provisional Assessment of Basic Capabilities and Needs

Rudolf Nottrott, May 2000

 

 

 

 

 

 

NRS Mission Statement

 

The mission of the Natural Reserve System is
to contribute to the understanding and wise management of
the Earth and its natural systems by supporting
university-level teaching, research, and public service at
protected natural areas throughout California.

 

Contents

Executive Summary

Introduction

The UC NRS -- an Important State, National and International Resource

From Individual Stations to a Coherent Research Network -- Challenges and Opportunities

Networking and Information Management in the Wider Ecological Research Community

Survey Results

Network Connectivity (Internet and Local Area Network)

Computing Equipment

Information Management Budgets

Data Collection at the Sites

Acknowledgements

Appendices

Appendix A - Glossary

Appendix B - Survey Forms

 

 

 

 

Executive Summary

This report summarizes the results of a survey of the basic capabilities of the sites in the NRS, in terms of electronic networking, information technology and data management.

The survey shows a wide range of those capabilities across the NRS sites, and indicates the need for upgrades and improvements in the near- to medium-term future. In particular, the following areas need special attention.

 

Introduction

The University of California Natural Reserve System was formed over thirty years ago in response to a growing need for a system of protected sites that would broadly represent California's rich ecological diversity, biomes and environments, protect them and make them available for scientific study, academic teaching, and public outreach.

The UC NRS -- an Important State, National and International Resource

The system has since been enormously successful in creating a set of outdoor classrooms and laboratories for long-term studies, available to researchers, students and teachers at all University of California campuses, and to any qualified user from any institution, public or private, throughout the world. The NRS is not only a unique state-wide resource, but also has the proud distinction of being the largest university-operated system of natural reserves in the world. Figure 1 provides an overview of the sites in the UC NRS. (For more detail see http://nrs.ucop.edu.)

UC Berkeley: 1. Angelo Coast 2. Chickering American River 3. Hastings Natural History Reservation 4. Jenny Pygmy Forest

UC Davis: 5. Bodega Marine Reserve 6. Eagle Lake 7. Jepson Prairie 8. McLaughlin 9. Quail Ridge 10. Stebbins Cold Canyon

UC Irvine: 11. Burns Piñon Ridge 12. San Joaquin Freshwater Marsh

UC Los Angeles: 13. Stunt Ranch

UC Riverside: 14. Box Springs 15. Boyd Deep Canyon 16. Emerson Oaks 17. James San Jacinto Mountains 18. Motte Rimrock 19. Sweeney Granite Mountains

UC San Diego: 20. Dawson Los Monos Canyon 21. Elliott Chaparral 22. Kendall-Frost Mission Bay Marsh 23. Scripps Coastal Reserve

UC Santa Barbara: 24. Coal Oil Point 25. Carpinteria Salt Marsh 26. Santa Cruz Island 27. Sedgwick Reserve 28. Sierra Nevada Aquatic Research Lab (SNARL) 29. Valentine Camp

UC Santa Cruz: 30. Año Nuevo Island 31. Fort Ord 32. Landels-Hill Big Creek 33. Younger Lagoon

 

Figure 1. Map of reserves in the University of California Natural Reserve System (as of May 2000). Note that the site numbers in this map are used for reference throughout this report.

From Individual Stations to a Coherent Research Network -- Challenges and Opportunities

Most NRS field sites were chosen on the basis of the quality and type of ecosystems they represent, attributes that often favor sites in remote locations with poor infrastructure. While this situation is not in itself an obstacle to continued use by dedicated researchers and students, it does complicate communication and collaborative work among individuals and groups working at different sites, often for long periods during field seasons.

In addition, the cross-site and cross-institutional dispersion of the hundreds of researchers, students and members of the public who use the NRS individually or in small groups enormously complicates the task of integrating the individual site facilities into a coherent system with the capability of data and information discovery, retrieval and exchange.

To meet these challenges, the reserve managers at their 1999 annual meeting focused on a set of medium- and long-term goals for the development of system-wide information management capabilities. These goals include a system-wide data policy, metadata conventions and standards, a long-term data-archiving strategy, an inter-site user database and an all-site catalog of data sets.

The NRS information technology developer is leading the efforts in planning and implementing solutions that will help the NRS meet these goals.

 

Networking and Information Management in the Wider Ecological Research Community

The situation of field researchers and students widely dispersed both geographically and over multiple institutions is not unique to the NRS, and has long been recognized as a crucial factor capable of impeding cross-site and interdisciplinary collaboration. To overcome these obstacles, the National Science Foundation in 1980 began funding the Long-Term Ecological Research (LTER) Network, which has now grown to 20 sites in the continental U.S., Alaska, Antarctica and Puerto Rico (see http://LTERnet.edu). Although data management at the network level was not a priority in the early phase of LTER development, this area has gained increasing prominence since the late 1980s, and information technology has played a crucial role in transforming the original set of individual LTER sites into a coherent research network.

The success of the U.S. LTER has caught the attention of the international ecological research community and has led to the formation in 1993 of the International LTER network, an association of 19 national LTER networks world-wide (as of December 1999).

Other groups in the ecological research community are also making efforts to improve their capabilities to exchange data and information among their researchers nation-wide (e.g. the Organization of Biological Field Stations, http://www.obfs.org/).

 

Survey Results

This report summarizes a survey in which the NRS sites were requested to answer a set of questions about the basic infrastructure critical to site and system-wide information management (Appendix B). The questionnaire was kept minimal so as not to unduly burden the reserve personnel, and thus allow a short turn-around time. This was intended to be an initial assessment to be followed by a more finely tuned survey aimed at obtaining more detail and information on site capabilities, such as detailed descriptions of site data (metadata), software and data archiving capabilities.

 

Network Connectivity (Internet and Local Area Network)

Because the NRS comprises hundreds of researchers and students that are widely distributed geographically and across institutional boundaries, system-wide data and information exchange requires the existence of a functional network infrastructure. Internet technology (TCP/IP) has been widely used in academic and engineering environments since the mid-1980s, both for local-area (LAN) and wide-area (WAN) networking, and has reached mainstream commercial and consumer markets since the late 1990s. Until recently, however, the remoteness of many NRS field locations has made it too difficult or impossible for NRS sites to establish Internet connectivity at a reasonable cost. This situation is reflected in Figure 2.

Figure 2. Percentage of sites having a particular type of Internet connection.

 

Nearly half of the NRS sites have no Internet connection at all, and an additional quarter of the sites only have a simple modem connection. Modem connections over phone lines are slow (max. 56 Kbps), have a single, session-dependent computer address (IP address), and are cumbersome to establish.

ISDN, while marginally faster (64 Kbps per channel, or 2x64 = 128 Kbps with both channels) and somewhat more convenient in its auto-dial capabilities, has the same limitations of a single address per link and a session-dependent computer address.

(I did not ask about the more recent cable modems, but subsequent communication with the sites shows that no site uses this technology).

Only 11%, or three sites, have either a T1 or frame relay connection. T1 is capable of data rates of 1.544 Mbps (for TCP/IP, or for 24 high-quality phone channels at 64 Kbps each), remains always connected, and can be used with readily available multiplexer devices to transmit Internet packets and voice calls simultaneously.

Frame relay connections use a high-speed packet switching protocol for wide area networking (WAN). Like T1 connections, they are direct connections (permanent virtual circuits that don't require dial-up). In addition, they provide the flexibility of a granular service from 128 Kbps up to 45 Mbps, which has made them very popular for LAN-to-LAN connections over long distances. Services are offered by all the major carriers. Neither T1 nor frame relay connections impose any restrictions on the number of computer addresses available on the LAN of the field station.
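To put the data rates quoted above in perspective, the following minimal sketch estimates how long a single large file would take to move over each type of link. The 300 MByte file size is a hypothetical stand-in for one remotely sensed image, and the function names are illustrative only. The calculation assumes the full nominal line rate is available and ignores protocol overhead, so real transfers would be slower.

```python
# Nominal line rates in bits per second, as quoted in this report.
LINE_RATES_BPS = {
    "modem (56 Kbps)": 56_000,
    "ISDN (128 Kbps)": 128_000,
    "T1 (1.544 Mbps)": 1_544_000,
    "frame relay, top tier (45 Mbps)": 45_000_000,
}


def transfer_seconds(size_megabytes: float, rate_bps: int) -> float:
    """Ideal transfer time: file size in bits divided by line rate in bits/s."""
    size_bits = size_megabytes * 1_000_000 * 8  # decimal megabytes, 8 bits per byte
    return size_bits / rate_bps


def format_duration(seconds: float) -> str:
    """Render a duration in seconds as hours, minutes and seconds."""
    hours, remainder = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:d}h {minutes:02d}m {secs:02d}s"


if __name__ == "__main__":
    image_mb = 300  # hypothetical size of one remotely sensed image
    for name, rate in LINE_RATES_BPS.items():
        duration = format_duration(transfer_seconds(image_mb, rate))
        print(f"{name:32s} {duration}")
```

Under these idealized assumptions, the hypothetical image needs roughly twelve hours over a modem but under half an hour over a T1 line, which illustrates why dial-up links are impractical for exchanging larger data sets.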

Figure 3 juxtaposes the number of overnight visitors and the type of connection for the NRS sites. Of the five sites with the highest number of overnight visitors, two have a T1 connection, two have only a simple modem connection, and one has no connection at all.

 

Figure 3. Type of Internet connection and overnight visitors.

(For site numbers see Fig. 1)

Clearly, the sites with high overnight visitor numbers would greatly benefit from upgraded direct connections to the Internet. If the connections are of sufficient capacity, those visitors may barely notice any difference between field site and university campus when using network services (such as file transfers, remote database access, Web access, and e-mail).

Sites with lower overnight user numbers will also see benefits from better Internet connections by attracting new users, or repeat users for longer-term studies.

Approximately 30% of sites report that they have a Local Area Network (LAN) for use within or between site buildings. While LANs are useful in themselves for sharing on-site resources, they become a necessity with direct Internet connections: a single gateway computer links the LAN to the outside world, so that all site computers and other resources can be connected to the Internet simultaneously. Thus, in addition to our efforts to improve Internet connectivity of the sites, we also need to target resources toward improvement of internal site connectivity.

 

 

Computing Equipment

The number of computers at the sites, as shown in Fig. 4, appears to be low for most sites with substantial numbers of overnight visitors. A partial solution to this bottleneck, other than merely increasing the number of on-site machines, can be provided by a LAN connected to the Internet. Visitors can then bring their own computing equipment, most likely in the form of laptop computers, and thus use site network resources as well as network resources at their home institutions and elsewhere on the global Internet.

To the extent that the relatively low number of on-site computers (especially high-end systems in terms of disk capacity, disk access speed, memory and CPU performance) limits essential site capabilities, we must be aware that these capabilities are central to any robust information management system with long-term viability, and we must find appropriate resources to improve the situation.

 

Figure 4. Number and type of on-site computers and overnight visitors

(For site numbers see Fig. 1)

 

Presently the most powerful machines at those sites with on-site computers are either PCs (Windows 98/NT) or Macs (MacOS), as shown in Figure 5. The need for other systems (UNIX and Linux, a form of UNIX) is likely to increase as sites connect to the Internet via high-speed direct connections, because until recently reliable server software for common Internet services (email, domain name and routing services, etc.) has been available mostly for UNIX systems.

Also, with increased numbers of overnight users, and the corresponding increase in the number of research projects centered on or near a site that overlap in space and time, local site needs for geo-spatial data management become more acute. This is usually accompanied by an increased demand for more powerful on-site computing capabilities, including higher CPU speeds, increased file space, specialized peripherals, etc.

Figure 5. Type of most powerful computer on site

Geographic Information Systems (GIS), specialized software packages for management of geographically explicit data, have become more common at a number of biological field stations. In the NRS, we need to be conscious of this trend and seriously consider GIS-based geospatial data management capabilities where multiple research projects exist that overlap in space and time.

Information Management Budgets

Estimated site budgets for NRS site information management (IM) range from $0 to a maximum of $25,000. While a zero budget may appear justifiable in the case of sites that have zero or very few overnight visitors, even the highest amount of $25K per annum does not seem adequate to cover the typical IM costs of an ecological field station with multiple research projects and hundreds of overnight users. For comparison, the average LTER site spends 15% of the total site budget on information management; for current LTER sites that would be about $85,000 to $105,000 (including overhead and fringe). While typical NRS sites in the near future cannot realistically expect to have this level of IM funding, we must be aware of the limitations this situation imposes and we must attempt to find new ways of increasing the resources available to NRS sites for information management.
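The comparison above rests on simple arithmetic, sketched below. The 15% share and the $25,000 maximum come from this section, and the approximate $600K annual LTER site budget comes from the glossary entry for LTER. The function names are illustrative only, and the sketch is a rough check rather than a budget model.

```python
LTER_IM_SHARE = 0.15            # share of the site budget spent on IM, per this report
LTER_SITE_BUDGET = 600_000      # approximate annual LTER site budget (see glossary)
NRS_MAX_IM_BUDGET = 25_000      # highest NRS site IM budget reported in the survey


def im_budget(total_site_budget: float, share: float = LTER_IM_SHARE) -> float:
    """Information management spending implied by a total budget and a share."""
    return total_site_budget * share


def implied_total_budget(im_amount: float, share: float = LTER_IM_SHARE) -> float:
    """Total site budget implied by a given IM amount at the same share."""
    return im_amount / share


if __name__ == "__main__":
    typical_lter_im = im_budget(LTER_SITE_BUDGET)
    print(f"Typical LTER IM budget: ${typical_lter_im:,.0f}")
    print(f"Top NRS IM budget as a fraction of that: "
          f"{NRS_MAX_IM_BUDGET / typical_lter_im:.0%}")
    for amount in (85_000, 105_000):
        print(f"${amount:,} at 15% implies a total of "
              f"${implied_total_budget(amount):,.0f}")
```

At a 15% share, a $600K site budget corresponds to about $90,000 for information management, which falls within the $85,000 to $105,000 range cited above and is several times the largest NRS site IM budget.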

Figure 6. Site information management budgets.

(For site numbers see Fig. 1)

 

Data Collection at the Sites

The subject of what data are collected by resident personnel and visitors at NRS sites is the focus of the NRS Catalog of Data Sets (in planning stage). Therefore, the survey questionnaire had only one question relating to this topic, asking about the amount of data stored at the sites. I received responses to this question from half of the sites only, and the answers ranged from zero to 100,000 GByte (the latter figure including remotely sensed images).

Unfortunately these figures are not very revealing, considering that a single remotely sensed image may occupy several hundred MBytes (e.g. Landsat thematic mapper), and that the data volume of simple point data sets may be as small as a few tens of KBytes.

The descriptions of the data collected at all sites that will be available with the completion of the Catalog of Data Sets will provide much more detailed and useful information.

Most sites do keep a list of research projects and a site bibliography in a variety of formats. These formats are summarized in Figs. 7 and 8.

Figure 7. Formats in which sites keep the site bibliography data.

 

 

Figure 8. Format in which sites keep their list of research projects.

Presently most sites maintain their site bibliography in a software package specialized for this purpose (Endnote) or keep it in a word processor (MS Word) file, while they keep their project lists in a word processor (MS Word) file or in hardcopy form.

I also asked the sites for information on how their actual research data are accessed (if data are maintained at the site). A majority of sites have access to hardcopies only, and only one site makes data accessible via the Web. Fig. 9 summarizes the survey results for this question.

Figure 9. Method by which sites access data.

 

Acknowledgements

This report would not have been possible without the help of the reserve managers, directors and numerous other people at the NRS sites.

Norman Wang, while at NCEAS under the NSF Research Experiences for Undergraduates program, helped compile the material for this report, design the database containing the results and produce the graphs.

I express my thanks and appreciation to all who helped and participated in making this survey possible.

 

Appendices

Appendix A - Glossary

(To be completed)

Backbone (as in campus network backbone) The central "cable" of a network. Branch cables off the backbone connect to computers or other networks (LANs).

Frame Relay A high-speed packet switching protocol used in wide area networks (WANs). It is a telecommunication service designed for cost-efficient data transmission for intermittent traffic between local area networks (LANs) and between end-points in a wide area network (WAN).

GIS Geographic Information System, any digital mapping system used for management and manipulation of spatially explicit data. Local site data can be combined and "overlaid" with digital maps obtainable from the U.S. Geological Survey and other organizations covering most of the world.

Internet The collection of networks world-wide that use the TCP/IP protocols and operate as a single cooperative wide-area network (WAN).

IP address Internet Protocol address. The address of a computer attached to a TCP/IP network. Every client and server station on the global Internet (or, if not connected to the Internet, on the LAN) must have a unique IP address. Client workstations have either a permanent address or one that is dynamically assigned to them for each dial-up session (modem or ISDN). IP addresses are written as four sets of numbers separated by periods; for example, 128.48.5.249 is the address of the NRS Web server computer. Most often domain names are used instead of IP addresses, e.g. nrs.ucop.edu instead of 128.48.5.249, and programs called name servers translate between the two.

ISDN Integrated Services Digital Network is a set of international standards for digital transmission over ordinary telephone copper wire as well as over other media. Home and business users who install ISDN adapters (in place of their modems) can establish a higher-speed Internet connection (up to 128 Kbps). ISDN requires adapters at both ends of the transmission, so your access provider also needs an ISDN adapter. ISDN is generally available from your phone company in most urban areas in the United States and Europe.

Kbps Kilobits per second. One thousand bits per second. Kbps is used as a rating of slower transmission speeds. Upper case "B" in KBps means kilobytes per second, but the convention of "b" for bit and "B" for byte is not always followed and the two are often misprinted. KBps or KB/s would be used for earlier disk and tape transfer ratings, as data is transferred in parallel, not serially.

LAN Local Area Network, a communications network that serves users within a confined geographical area, for example a building or a reserve site. It is made up of servers, workstations, a network operating system and a communications link. If a LAN "speaks" the TCP/IP protocol, then connecting it to the Internet through a frame relay, T1, ISDN or other connection can extend the LAN services over wide geographic areas (global WAN).

Linux A version of UNIX that runs on x86, Alpha and PowerPC machines. Linux is open source software, which is freely available; however, the full distribution of Linux along with technical support and training are available for a fee from vendors such as Red Hat Software (www.redhat.com) and Caldera (www.caldera.com). The distribution CD-ROMs include the complete source code as well as hundreds of tools, applets and utilities.

LTER Long-Term Ecological Research Network. A network of 20 ecological research sites, many of them field stations, funded by the National Science Foundation for renewable periods of 6 years with an approximate budget of $600K per year (as of 1999). See http://lternet.edu.

Mbps Megabits per second. One million bits per second. Mbps is commonly used as a rating of transmission speed. Upper case "B" in MBps means megabytes per second, but the convention of "b" for bit and "B" for byte is not always followed and the two are often misprinted. MBps or MB/s would be used for disk and tape transfer ratings, as data is transferred in parallel, not serially.

 

modem (as in simple modem connection) MOdulator-DEModulator, a device that adapts a terminal or computer to an analog telephone line by converting digital pulses to audio frequencies and vice versa. The term usually refers to 56 Kbps modems (V.90), the current top speed, or to older 28.8 Kbps modems (V.34). The term may also refer to higher-speed cable or DSL modems or to ISDN terminal adapters, which are all digital and technically not modems.

T1 A 1.544 Mbps point-to-point dedicated, digital circuit provided by the telephone companies. The monthly cost is typically based on distance. T1 lines are widely used for private networks as well as interconnections between an organization's PBX or LAN and the telco. The first T1 line was tariffed by AT&T in January 1983. However, starting in the early 1960s, T1 was deployed in intercity trunks by AT&T to improve signal quality and make more efficient use of the network.

TCP/IP Transmission Control Protocol / Internet Protocol is the basic communication language or protocol of the Internet, and today it is also used as the major communications protocol in the private networks called intranets. When you are set up with direct access to the Internet, your computer is provided with a copy of the TCP/IP program just as every other computer that you may send messages to or get information from also has a copy of TCP/IP.

UNIX A multi-user, multitasking operating system that is widely used as the master control program in workstations and especially servers. A myriad of commercial applications run on UNIX servers, and most Web sites run under UNIX. There are many versions of UNIX, and, except for the PC world, where Windows dominates, almost every hardware vendor offers it either as its primary or secondary operating system. Sun has been singularly instrumental in commercializing UNIX with its Solaris OS (formerly SunOS). HP, SCO, IBM and Digital have also been major UNIX vendors and promoters.

WAN Wide Area Network. A communications network that covers a wide geographic area, such as a state or country. A wide area network may be privately owned or rented, but the term usually connotes the inclusion of public (shared user) networks. The Internet is the "ultimate" WAN, comprising multiple WANs, which in turn comprise multiple LANs.
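Two of the glossary entries above lend themselves to a short worked example: the bit/byte distinction described under Kbps and Mbps, and the dotted notation described under IP address. The sketch below uses only Python's standard ipaddress module; the function names are illustrative, and the sample address is the one quoted in the IP address entry.

```python
import ipaddress


def bits_to_bytes_rate(rate_in_bits: float) -> float:
    """Convert a bit-based rate (Kbps, Mbps) to the byte-based one (KBps, MBps)."""
    return rate_in_bits / 8  # 8 bits per byte


def describe_ipv4(dotted: str) -> dict:
    """Break a dotted IPv4 address into its four numbers and its integer value."""
    addr = ipaddress.IPv4Address(dotted)  # raises ValueError if malformed
    return {
        "octets": list(addr.packed),     # the four numbers, each in 0-255
        "as_integer": int(addr),         # the same address as one 32-bit number
        "bits": addr.max_prefixlen,      # address length in bits (32 for IPv4)
    }


if __name__ == "__main__":
    print("56 Kbps modem   =", bits_to_bytes_rate(56), "KBps")
    print("1.544 Mbps T1   =", bits_to_bytes_rate(1.544), "MBps")
    print(describe_ipv4("128.48.5.249"))
```

The conversion shows why the lower-case and upper-case forms must not be confused: a rate quoted in bits per second is eight times the same rate quoted in bytes per second.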

 

 

Appendix B - Survey Forms

The form was kept deliberately short so that providing the information would not be an undue burden on the sites, and returns would be available in a relatively short time. Part 1 of the form deals with general questions relating to information technology and connectivity. Part 2 asks additional details about resources available both on site and off. The survey was intended to provide baseline information only. The intent is to follow up this basic assessment periodically with more finely tuned, very short questionnaires aimed at obtaining more detail and information on specific site capabilities, e.g. software and data archiving capabilities. Also, a separate survey of what kind of data is collected at the sites, presently in the planning stage, will provide detailed descriptions of site data for an all-site catalog of data sets.

 

PART 1

1) NRS site name and institutional affiliation:

2) Name, e-mail address, job title and phone number of the person completing the survey:

3) Approximately how many permanent/seasonal staff use these NRS sites?

4) What proportion of these staff is targeted to computing or information support?

5) Roughly how many overnight visitors do you have annually at the site?

 

INFORMATION TECHNOLOGY

6) Please briefly describe what you perceive as the most critical issues involving information technology at your NRS site (network connectivity, database management, etc.).

7) Please provide a rough estimate of the annual budget for information technology at your site (new purchases, maintenance and support of computer and networking hardware, software; management of research and administrative data)

RESEARCH DATA (and other data)

8) Do you currently store research data at your site? ___ Yes ___No

If Yes,

9) Can you provide a rough estimate of the total amount of data stored at your site (MBytes)?

10 a) Does your site have a bibliography (publications)? If so, what format is it in (e.g. Endnote, plain text, Microsoft Word, etc.)?

10 b) Does your site have a list of research projects (ongoing and past)? If so, what format is it in (e.g. plain text, Microsoft Word, etc.)?

 

INTERNET CONNECTIVITY

11) Does your site have a connection to the Internet? ___Yes ___No

If Yes,

___ simple modem connection over phone lines

___ cable modem

___ DSL

___ frame relay

___ T1

___ ISDN

___ Other or don't know

12) Does your site have a local network connection (LAN) connecting its computers?

13) Does your site have personnel with specialized skills in data management and/or electronic networking (data manager, computer/network manager)? If so, please provide the contact addresses (incl. e-mail address, phone and FAX numbers, if available)

14) Does your site have a reliable phone line that could be used for dial-up connections?

___Yes ___No

15) Does your site have grid power? ___Yes ___No

 

 

PART 2

The following questions relating to Internet connectivity might best be answered by the specialized personnel (data manager, computer/network manager), if existent:

16) Is there a network support contact person

___ at your affiliated institution (UC campus)? Please provide the name, phone number and address.

___at your site? Please provide the name, phone number and address.

 

___elsewhere off your site? Please provide the name, phone number and address.

 

17) Who is the computing services contact at your affiliated institution (UC campus)? Please provide the name, phone number and address.

18) Is there computing system administration support available in your institution?

Please check the following:

Support is available for

___ Windows 98

___ Windows NT

___ Apple/Mac

___ UNIX

19) Estimate the total number of computers at your research site:

___ PC

___ Mac

___ UNIX

___ other

20) What is the most powerful system at your site: PC, Mac, UNIX (or other)? Please briefly describe it (is it a file server, email server, database server, Web server, etc.).