Web Caching Workshop

The Web Caching Workshop was held at the National Center for Atmospheric Research (NCAR) in Boulder, CO, June 9-10, 1997. The event was sponsored by the National Laboratory for Applied Network Research (NLANR), with funding from the National Science Foundation (NSF). Approximately 45 individuals attended, representing 13 countries.

Key topics explored during the workshop:

* status of cache deployment globally
* topology and configuration alternatives
* available cache software
* economics/pricing considerations
* research requirements
* policy and related issues

Caching: International Use

The factors driving use of caches include the high cost of intercontinental bandwidth and the high latency of retrieving data from the U.S. Statistics for European users indicate that more than 60% of hits are for .COM sites, and that more than 75% of .COM sites are based in the U.S. Caching hierarchies are well developed in several European countries including Poland, Germany, Armenia ... and are viewed as critical for countries where low-bandwidth connections are the norm, e.g., Russia with its numerous 19.2 kbps links. The Terena COM-MESH Experiment was initiated among participating European R&D networks in response to concerns about the high latency and high cost of retrieving data from the U.S. (see www.terena.nl).

In Asia, caching plays a vital role in improving the performance of intercontinental traffic links for New Zealand and Australian users. Caching hierarchies are active in Korea and Singapore and are planned for Japan. The Asia Pacific Advanced Networking (APAN) initiative includes plans to deploy a caching hierarchy throughout the Asia Pacific region by 1998, with the root caches based in Japan and Korea.

Caching in the US

NLANR Global Cache Hierarchy Project - This project is supported both as a task under NSF's National Laboratory for Applied Network Research (NLANR) and through a stand-alone grant from NSF and equipment donations from Digital.
It includes the operation of six root caches in the U.S., at the five NSF-supported supercomputing centers and at FIX-West, each with 24 GB of disk and 256 MB of RAM. Usage is around 3 million HTTP requests daily (35 GB), with hit rates on the order of 20-30%. These rates are low compared to institutional caches, such as Japan's cach.imnet.ad.jp, which often experience hit rates on the order of 75%. Currently there are 150-200 (mostly foreign) client caches using these root caches.

NLANR also leads the development of the public Squid cache software. Squid users worldwide contribute to the development effort with critical feedback and actual code/patches. Workshop presentations included comparisons of Squid's performance with alternatives such as NetCache, Apache, CERN's HTTPD, and a new European caching service offered by Mirror Image. Participants conceded that all of these products are in their early stages of development. Most caching software as yet lacks sufficient features for authentication/security, hit metering, and HTTP 1.1 protocol conformance.

Educational Uses - Caching significantly improves the ability of teachers to use the Internet as an educational tool, by addressing what are typically low or intermittent bandwidth environments at schools and by allowing content filtering. The Tennessee Project is developing inexpensive ($1,250) turnkey cache boxes for K-12 schools. Washington State is using a caching hierarchy as part of its Internet expansion to 296 school districts, with six parent caches at the Seattle hub and two caches per end site.

Commercial Environment - Caching has been integrated into the topologies of major U.S. providers such as @Home, Microsoft, and AOL, and is of growing importance for providers specializing in high-bandwidth static (e.g., GIF or other graphics-based) traffic. Caching is also critical in the delivery of services such as the Alta Vista search engine.
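
As a rough back-of-envelope check on the NLANR root-cache figures quoted earlier in this section (about 3 million requests and 35 GB per day, with hit rates of 20-30%), the short Python sketch below derives the implied average object size and the number of requests served from cache. It assumes decimal gigabytes (1 GB = 10^9 bytes), which the text does not specify, and the function names are illustrative only. It makes no attempt to estimate bytes saved, since a request hit rate does not directly give a byte hit rate.

```python
# Figures as reported in the text; the decimal-GB convention is an assumption.
REQUESTS_PER_DAY = 3_000_000
BYTES_PER_DAY = 35 * 10**9
HIT_RATE_RANGE = (0.20, 0.30)


def average_object_size(total_bytes: int, requests: int) -> float:
    """Mean bytes transferred per request."""
    return total_bytes / requests


def hits_per_day(requests: int, hit_rate: float) -> int:
    """Requests answered from the cache at a given request hit rate."""
    return round(requests * hit_rate)


if __name__ == "__main__":
    size = average_object_size(BYTES_PER_DAY, REQUESTS_PER_DAY)
    low, high = (hits_per_day(REQUESTS_PER_DAY, r) for r in HIT_RATE_RANGE)
    print(f"average object size: about {size / 1000:.1f} kB")
    print(f"requests served from cache per day: {low} to {high}")
```

Under these assumptions the average works out to a little under 12 kB per request, and between 600,000 and 900,000 requests per day are answered from the root caches.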

Economics/Pricing

Quantifying the costs and benefits of cache usage is still difficult, particularly in the U.S., where bandwidth is relatively inexpensive. Factors influencing the economics of deploying caches include bandwidth costs, administrative costs, user support issues, topology and cache hierarchy considerations, request rates, user base (e.g., a cache is more useful if its users are homogeneous), and occurrences of hot sets and thrashing. Generally, participants agreed that charging schemes should be based on resource consumption costs and should provide incentives for use of the cache. However, with the exception of New Zealand, there are few examples of actual usage-based charging for cache services. In New Zealand, changes since January 1996, with services becoming available from multiple international link operators and flat-rate bandwidth pricing structures becoming common, have undermined the NZ Internet Exchange's ability to bill for cache services on a per-megabyte basis.

Research, Operational & Policy Issues

Research challenges addressed by workshop presenters include: delta encoding for transmission of changed material (vs. retransmission of entire content); building PICS awareness into proxies; developing multicast communications among clustered caches; automating cache discovery/selection; benchmarking proxy servers; and issues relating to the impact of persistent connections, the effects of ICP and TCP implementations on cache performance, and the implications of HTTP 1.1 for cache consistency.
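
To make the delta encoding idea mentioned above concrete, here is a minimal Python sketch. It is not the scheme presented at the workshop and not any HTTP mechanism: it uses the standard library's difflib.SequenceMatcher as a stand-in differ, and the names make_delta and apply_delta are hypothetical. The point is only that a cache already holding an old copy can rebuild the new copy from "copy this range" and "insert this text" instructions, so unchanged material need not be retransmitted.

```python
from difflib import SequenceMatcher


def make_delta(old: str, new: str) -> list:
    """Describe `new` as copy/insert instructions against `old`."""
    delta = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged range: send only the offsets into the old copy.
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            # Changed or new material: send the literal text.
            delta.append(("insert", new[j1:j2]))
        # "delete" needs no instruction; the old range is simply not copied.
    return delta


def apply_delta(old: str, delta: list) -> str:
    """Rebuild the new document from the cached old copy plus the delta."""
    parts = []
    for op in delta:
        if op[0] == "copy":
            _, start, end = op
            parts.append(old[start:end])
        else:
            parts.append(op[1])
    return "".join(parts)


if __name__ == "__main__":
    cached = "<html><body>Temperature: 18 C</body></html>"
    current = "<html><body>Temperature: 21 C</body></html>"
    delta = make_delta(cached, current)
    literal_chars = sum(len(op[1]) for op in delta if op[0] == "insert")
    print(delta)
    print(f"literal characters sent: {literal_chars} of {len(current)}")
    assert apply_delta(cached, delta) == current
```

A real deployment would also need a way for the client and server to agree on which cached version the delta applies to; that negotiation is outside the scope of this sketch.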

Additional areas where research is needed include:

* push technologies
* locating the nearest copy of an object or resource
* easing the burden of configuration while maintaining security
* economic models
* better replacement policies

Operational considerations were discussed throughout the meeting, focusing on optimal topology configurations, i.e., levels of hierarchy, placement in the infrastructure, load balancing, clustering, and requisite redundancy; preferred cache configurations, i.e., RAM, disk space, refresh rates, and link speeds; and usage/performance statistics.

Policy issues covered included: cache busting and stripping of cookies; privacy considerations associated with cache logs (particularly customer usage patterns); copyright and related issues of 'implied consent' for caching by virtue of posting materials on the Internet; and content filtering and PICS labeling. There were also pleas by participants for organizations to standardize their cache log formats and make these logs available to researchers. NLANR currently posts logs for the most recent seven days on its website (www.nlanr.net/Cache) and Digital makes some logs available to researchers. Participants expressed concern, however, about the possibility of optimizing caches based on such a limited sampling. There was also consensus on the need for a single repository of public trace information along the lines of the Internet Traffic Archive, www.ita.org.

The Boulder workshop was a follow-on to the 1996 Caching Workshop held in Warsaw, Poland (http://w3cache.icm.edu.pl/workshop/). Additional details on this year's workshop are available at www.nlanr.net/Cache/Workshop97/minutes.html or from the NLANR Cache PI, Duane Wessels, at wessels@nlanr.net.