Nation-wide Caching Project in Korea - Design and Experiment

Jaeyeon Jung and Kilnam Chon
Korea Advanced Institute of Science and Technology

1. Introduction

Information provisioning mechanisms such as Web caching and replication have become increasingly important in computer networking. This has been driven by the explosive growth of Web traffic and the unbearable wait for Web pages. An effective caching system can reduce retrieval latency, network load, and server load by avoiding pulling the same document several times over the same connection.[DESIRE97] In addition, a Web caching system that can cooperate with other servers has great benefits from a scalability point of view. Although cooperating cache servers suffer from configuration difficulties, such as selecting neighbors or agreeing on the lifetimes of cached objects, they can share load and increase the hit ratio.

For these reasons, demand for Web caching has grown rapidly and has led to many caching projects over the past few years. NLANR, starting in 1995, has been experimenting with and developing a scalable caching system to deploy a prototype information provisioning infrastructure.[NLANR97] The high cost of international bandwidth pushed New Zealand to adopt Web caching earlier than most other countries.[Neal96] The large disparity between international and domestic bandwidth has made an effective caching system more important in the United Kingdom.[Smith96] In Korea, we are planning a scalable, high-performance caching system across strategic network locations, which will serve as a "Nation-wide Caching Infrastructure". By establishing our own infrastructure and linking it with those of other countries (AP/Global cache coordination), we expect to significantly reduce the network bandwidth consumption and latency we have suffered, especially on international links.

The current caching system configuration, including the interconnection topology and the hardware and software platforms, is described in section 2. Cache server statistics, including the hourly connection request rate, the average connection request count and amount of bytes transferred, the hit ratio, and the content-type distribution, are presented in section 3. Section 4 concludes the paper by presenting some design issues and problems of caching systems.

2. Cache configuration in Korea

2.1 Interconnection topology of cache servers

Figure 1: caching interconnection topology in Korea

The root cache, cache.kix.net, is located on KIX/KT, which is one of the major exchange points in Korea. It has a mutual parent relationship with sv.cache.nlanr.net in the U.S. and uses it.cache.nlanr.net for specific domains in Europe. The institutions and ISPs with direct links to cache.kix.net configure it as a parent. Neighbor relationships, however, depend on the management policy of each cache server. Some ISPs do not join the caching hierarchy because a root cache could be a single point of failure.
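
The choice between the two overseas parents can be illustrated with a small sketch. It is only an illustration under stated assumptions: the paper does not list which European domains are routed to it.cache.nlanr.net, so the set of top-level domains below is a placeholder, and the function name select_parent is hypothetical. The sketch covers only the choice of overseas parent, not how domestic requests are handled.

```python
from urllib.parse import urlsplit

# Assumption: placeholder list; the paper does not name the European domains.
EUROPEAN_TLDS = {"uk", "de", "fr", "it", "nl"}

# Parent caches named in the paper.
DEFAULT_PARENT = "sv.cache.nlanr.net"
EUROPEAN_PARENT = "it.cache.nlanr.net"


def select_parent(url: str) -> str:
    """Return the overseas parent cache to ask for this URL (hypothetical helper)."""
    host = urlsplit(url).hostname or ""
    # The top-level domain is the label after the last dot of the host name.
    tld = host.rsplit(".", 1)[-1].lower()
    return EUROPEAN_PARENT if tld in EUROPEAN_TLDS else DEFAULT_PARENT
```

In a real deployment this mapping would normally live in the cache server's own configuration rather than in application code.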

2.2 Hardware and software platform of cache servers

All the institutions and ISPs in the hierarchy operate cache servers using Squid, which offers fast and efficient Web caching. The Internet Cache Protocol (ICP) used between caches provides a good mechanism for creating a cooperative caching system. The hardware of each cache server is listed in Appendix A.
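
To make the role of ICP more concrete, the following sketch builds the query message that one cache sends to a neighbor to ask whether it holds a URL. The paper does not describe the message layout; the layout here follows ICP version 2 as later documented in RFC 2186 (a 20-byte header followed by the requester address and a null-terminated URL). The function name build_icp_query is hypothetical, and the address fields are simply left as zeros in this sketch.

```python
import socket
import struct

ICP_OP_QUERY = 1   # opcode of a query message in ICP version 2
ICP_VERSION = 2
ICP_HEADER_LEN = 20


def build_icp_query(request_number: int, url: str,
                    requester_ip: str = "0.0.0.0") -> bytes:
    """Pack an ICP query for `url` (hypothetical helper, not Squid's code)."""
    # Query payload: 4-byte requester address, then the null-terminated URL.
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = ICP_HEADER_LEN + len(payload)
    # Header in network byte order: opcode, version, message length,
    # request number, options, option data, sender host address.
    header = struct.pack("!BBHIII4s", ICP_OP_QUERY, ICP_VERSION, length,
                         request_number, 0, 0, b"\x00" * 4)
    return header + payload
```

The resulting bytes would be sent as a single UDP datagram to the neighbor's ICP port; the neighbor answers with a hit or miss message carrying the same request number.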

3. Data collection and analysis

3.1 Data sets

data sets           starting date   duration   bytes served
cache.kix.net       6-Apr-97        7 days     24.7 G bytes
cache.kaist.ac.kr   6-Apr-97        7 days     19.8 G bytes

Table 1: data sets of WWW object requests to the cache server

Table 1 lists the data sets used in this study, including the starting date, duration and amount of data collected at each site. We gathered log files from two cache servers, one (cache.kix.net) at the top level of the national hierarchy and the other (cache.kaist.ac.kr) at the second level. The root cache (cache.kix.net), operated by Korea Telecom, serves 6 child caches as well as individual users. cache.kaist.ac.kr is at KAIST, a research university located in Taejon.

These data sets help us compare the behavior of a top-level cache and a lower-level cache. We can also analyze them to get performance data such as the number of connection requests per hour, the count-based hit ratio, and the distribution of requested content types.

3.2 Analysis

3.2.1 Daily cache server usage

                    protocol   cache.kix.net   cache.kaist.ac.kr
requests            HTTP       283,569         139,030
                    ICP        198,559         166,008
bytes transferred   HTTP       3,532,562       2,827,907
(in KB)             ICP        16,741          115,639

Table 2: daily statistics for number of requests and bytes transferred

Table 2 provides daily usage statistics. cache.kaist.ac.kr serves about 650 users (counted by IP address), handles 305 K requests, and transfers 2.9 G bytes a day. cache.kix.net does not have as many users as cache.kaist.ac.kr, but it processes about 1.3 times more requests because 6 cache servers are connected to it as children.

3.2.2 Hourly request pattern

The connection request pattern is dominated by a 24-hour cycle, as has been widely observed before.[Paxon93] Figure 2 plots the hourly connection requests in the cache.kaist.ac.kr data set. It shows that connection requests occur mostly during office hours and pick up again substantially in the evening hours, when most students are actively doing research. This load often causes performance degradation at peak time.

Figure 2: mean hourly connection requests (cache.kaist.ac.kr)
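
A plot such as Figure 2 can be derived from request timestamps by counting requests per hour of day and dividing by the number of days observed. The sketch below shows one way to do this; it assumes the timestamps are seconds since the Unix epoch, as in Squid's native log, and that hours are reported in Korea Standard Time (UTC+9). The function name mean_hourly_requests is hypothetical and this is not necessarily the procedure used for the figure.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Assumption: hours of day are reported in Korea Standard Time (UTC+9).
KST = timezone(timedelta(hours=9))


def mean_hourly_requests(timestamps, days):
    """Return 24 values: mean requests per day for each hour of day.

    `timestamps` are seconds since the Unix epoch; `days` is the length
    of the observation period (7 for the data sets in Table 1).
    """
    counts = Counter(datetime.fromtimestamp(ts, KST).hour for ts in timestamps)
    return [counts.get(hour, 0) / days for hour in range(24)]
```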

3.2.3 Daily hit ratio

Figure 3: hit ratio over HTTP port

Figure 4: hit ratio over ICP port

Figure 3 indicates that the total savings in the number of connections to the origin server are 53% and 48% for requests on the HTTP port of cache.kix.net and cache.kaist.ac.kr respectively. IMS_HIT and REFRESH_HIT denote successful staleness checks made with a short If-Modified-Since request; these checks avoid redundant bandwidth consumption for about 20% of the total connection requests a day. However, figure 4 shows a quite different hit ratio for requests on the ICP port. Because cache.kix.net is at the top level of the hierarchy and its 6 child caches resolve misses through it, it holds copies of the objects stored in the child caches. Caches at the top level showed a higher hit ratio for ICP requests than those at a lower level. This may be partly because objects that have already been removed from the child caches are still kept in the top-level cache, which has much more disk storage.
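
The count-based hit ratio on each port can be computed from a cache log along the following lines. This is a minimal sketch under several assumptions: the log is in Squid's native access log format, where the fourth whitespace-separated field has the form code/status; result codes beginning with TCP_ belong to the HTTP port and those beginning with UDP_ to the ICP port; and every TCP_ code ending in _HIT (including TCP_IMS_HIT and TCP_REFRESH_HIT) counts as a hit. The function name hit_ratios is hypothetical.

```python
def hit_ratios(log_lines):
    """Return the count-based hit ratio for the HTTP and ICP ports."""
    totals = {"HTTP": 0, "ICP": 0}
    hits = {"HTTP": 0, "ICP": 0}
    for line in log_lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        code = fields[3].split("/")[0]  # e.g. "TCP_IMS_HIT" from "TCP_IMS_HIT/304"
        if code.startswith("TCP_"):
            port, is_hit = "HTTP", code.endswith("_HIT")
        elif code.startswith("UDP_"):
            port, is_hit = "ICP", code.startswith("UDP_HIT")
        else:
            continue  # unknown result code
        totals[port] += 1
        hits[port] += int(is_hit)
    return {port: (hits[port] / totals[port] if totals[port] else 0.0)
            for port in totals}
```

Whether stale-served or refresh-related codes should count as hits is a policy choice; changing the two is_hit tests is enough to apply a different definition.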

3.2.4 Distribution by Types

Because multimedia data is easy to incorporate into Web documents, a high degree of variation in object type is observed in Web caches.[Williams96][Smith96] Figure 5 displays the content-type distribution. As it shows, large video and audio files (audio/x-mpeg, video/mpeg) account for less than 1% of requests but represent 45% of bytes transferred because of their large sizes. Large video and audio files could thus purge many smaller objects from a cache of limited size. Therefore, we need to compromise between a high hit ratio and fast response time to get better performance.


Figure 5: distribution by Types (cache.kaist.ac.kr)
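
The contrast between the share of requests and the share of bytes for each content type can be computed as in the following sketch. It assumes each log record has already been reduced to a (content type, size in bytes) pair; the function name type_distribution is hypothetical.

```python
from collections import defaultdict


def type_distribution(records):
    """Map each content type to (share of requests, share of bytes).

    `records` is an iterable of (content_type, size_in_bytes) pairs.
    """
    requests = defaultdict(int)
    volume = defaultdict(int)
    for content_type, size in records:
        requests[content_type] += 1
        volume[content_type] += size
    total_requests = sum(requests.values())
    total_bytes = sum(volume.values())
    return {
        ctype: (requests[ctype] / total_requests,
                volume[ctype] / total_bytes if total_bytes else 0.0)
        for ctype in requests
    }
```

A type whose byte share far exceeds its request share, as reported above for video and audio files, is a candidate for special treatment in the removal policy.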


4. Concluding Remarks

At present, we are greatly interested in developing national and global Web caching systems to reduce client latency, server load and network load. However, meeting these ends requires much more than just building an infrastructure of Web cache meshes, hierarchies or single servers. It needs an efficient Web caching system that provides less stale cached objects with a short response time.

The national caching system in Korea that we describe in this paper addresses these requirements through the following configuration and management issues.

We will investigate these issues in the national caching system and plan to coordinate with APAN. APAN will promote the deployment of caching hierarchies in the Asia-Pacific region and link these hierarchies to the rest of the world.[APAN97]

References

[APAN97]
APAN Network Design Memo, version 0.5. Available from http://www.apan.net

[Arlitt96]
M. Arlitt, C. Williamson, Web Server Workload Characterization: The Search for Invariants, Proceedings of SIGMETRICS 96, 1996.

[DESIRE97]
DESIRE, Web Caching Architecture. Available from http://www.uninett.no/prosjekt/desire/arneberg/altsammen.html

[Dingle96]
A. Dingle, Web Cache Coherence, Proceedings of the 5th WWW Conference, 1996.

[KAIST97]
Nation-wide Caching Project in Korea, http://cache.kaist.ac.kr

[KIX97]
Korea Cache Project Homepage, http://cache.kix.net

[Nabeshima97]
M. Nabeshima, The Japan Cache Project: An Experiment on Domain Cache, Proceedings of the 6th WWW Conference, 1997.

[Neal96]
D. Neal, The Harvest Object Cache in New Zealand, Computer Networks and ISDN Systems, Vol. 28, p. 1415, 1996.

[NLANR97]
National Laboratory for Applied Network Research, A Distributed Testbed for National Information Provisioning, 1997, http://nlanr.net/Cache

[Paxon93]
V. Paxson, Empirically-derived analytic models of wide-area TCP connections: extended report, Technical report LBL-34086, Lawrence Berkeley Laboratory, May 1993.

[Smith96]
N. Smith, The UK National Web Cache - A State of the Art, Computer Networks and ISDN Systems, Vol. 28, p. 1407, 1996.

[Williams96]
S. Williams, M. Abrams, Removal Policies in Network Caches for World-Wide Web Documents, Proceedings of SIGCOMM'96, 1996.

Appendix A. Hardware and software equipment of Web cache servers in Korea

(As of 1997.4.24)

cache server         machine               main memory/disk   software
cache.kix.net        Digital Alpha 4100    1G/20G             squid/1.1.2
cache.nic.or.kr      SunServer 1000        512M/48G           squid/1.1.2
cache.kaist.ac.kr    Pentium/Linux 2.07    128M/8G            squid/1.1.5
spm.kotel.co.kr      Digital Alpha 2100    128M/15G           squid/1.1.8
proxy.kren.nm.kr     R6000 AIX 4.1.3       128M/8G            squid/1.0.8
kiwi.interpia.net    Sun SPARC 20          256M/1G            squid/1.0.22
cache.hansol.net     Digital Alpha 200     128M/3G            squid/1.0.18
proxy.elim.net       Pentium               128M/8G            squid/1.1.5
proxy.nuri.net       Sun Ultra-1           128M/8G            netscape proxy
proxy.nowcom.co.kr   Sun Ultra-2           512M/20G           squid/1.0.20