1. Introduction
Information provisioning mechanisms such as web
caching and replication have become increasingly important
in computer networks. This has been driven by the
explosive growth of web traffic and the long waits for Web pages.
An effective caching system can reduce retrieval latency, network
load, and server load by avoiding fetching the same document several
times over the same connection.[DESIRE97] In addition, a Web caching
system that can cooperate with other servers offers
great benefits from a scalability point of view. Although cooperating
cache servers face configuration difficulties,
such as selecting neighbors or agreeing on the lifetimes
of cached objects, they can share load and increase the hit
ratio.
For these reasons, demand for Web caching has grown rapidly
and has led to many caching projects
over the past few years. NLANR has been experimenting with and
developing a scalable caching system since 1995 in order to deploy
a prototype information provisioning infrastructure.[NLANR97]
The high cost of international bandwidth pushed New Zealand
to adopt Web caching earlier than most other countries.[Neal96]
The large disparity between international and domestic bandwidth
has made an effective caching system more important in the United
Kingdom.[Smith96] In Korea, we are planning a scalable,
high-performance caching system across strategic network locations,
which will serve as a "Nation-wide Caching Infrastructure".
By establishing our own infrastructure and linking it with those of
other countries (AP/Global cache coordination), we expect the network
bandwidth consumption and latency we have suffered, especially on
international links, to decrease significantly.
Section 2 describes the current caching system configuration,
including the interconnection topology and the hardware and software
platform. Section 3 presents cache server statistics, including the
hourly connection request rate, the average number of connection
requests and bytes transferred, the hit ratio, and the content-type
distribution. Section 4 concludes the paper by presenting some
design issues and problems of caching systems.
2. Cache configuration in Korea
2.1 Interconnection topology of cache servers
The root cache, cache.kix.net, is located at KIX/KT, one of the major exchange points in Korea. It has a mutual parent relationship with sv.cache.nlanr.net in the U.S. and uses it.cache.nlanr.net for specific domains in Europe. Institutions and ISPs with direct links to cache.kix.net configure it as a parent. Neighbor relationships, however, depend on the management policy of each cache server. Some ISPs do not join the caching hierarchy because a root cache could be a single point of failure.
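The domain-based choice of parent described above can be illustrated with a minimal sketch. The two parent host names come from the text; the list of European domains is hypothetical, since the text does not say which domains are routed to it.cache.nlanr.net, and the function name is ours.

```python
from urllib.parse import urlsplit

# Parent caches of the root cache, as named in the text.
DEFAULT_PARENT = "sv.cache.nlanr.net"
EUROPE_PARENT = "it.cache.nlanr.net"

# Hypothetical list: the actual domains sent to the European parent are not given.
EUROPE_DOMAINS = (".it", ".de", ".fr")


def choose_parent(url: str) -> str:
    """Return the parent cache the root would ask for this URL on a miss."""
    host = (urlsplit(url).hostname or "").lower()
    if host.endswith(EUROPE_DOMAINS):
        return EUROPE_PARENT
    return DEFAULT_PARENT
```

For example, `choose_parent("http://www.example.it/index.html")` selects the European parent under these assumed domains, while any other host falls through to the U.S. parent.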
2.2 Hardware and software platform of cache servers
All the institutions and ISPs in the hierarchy operate
cache servers using Squid, which offers fast and efficient Web
caching. The Internet Cache Protocol (ICP), used between caches,
provides a good mechanism for building a cooperative caching system.
The hardware of each cache server is listed in Appendix A.
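To make the role of ICP more concrete, the following sketch builds a single ICP query message of the kind one cache sends to a neighbor to ask whether it holds a URL. It assumes the ICP version 2 layout (a 20-byte header followed by the requester address and a null-terminated URL, as later documented in RFC 2186); the function name and default values are illustrative and not taken from Squid.

```python
import socket
import struct

ICP_OP_QUERY = 1  # opcode of a query message in ICP version 2
ICP_VERSION = 2

# opcode, version, message length, request number, options, option data,
# sender host address (network byte order, 20 bytes in total)
ICP_HEADER = struct.Struct("!BBHIII4s")


def build_icp_query(url: str, request_number: int,
                    requester_ip: str = "0.0.0.0") -> bytes:
    """Build an ICP query: header, requester address, null-terminated URL."""
    payload = socket.inet_aton(requester_ip) + url.encode("ascii") + b"\x00"
    length = ICP_HEADER.size + len(payload)
    header = ICP_HEADER.pack(ICP_OP_QUERY, ICP_VERSION, length,
                             request_number, 0, 0, b"\x00" * 4)
    return header + payload
```

A cache would send such a message over UDP to each neighbor and fetch the object from the first neighbor that answers with a hit; that exchange is omitted here.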
3. Data collection and analysis
3.1 Data sets
Table 1 lists the data sets used in this study, including
the starting date, duration, and amount of data collected at each
site. We gathered log files from two cache servers: one (cache.kix.net)
at the top level of the national hierarchy and the other (cache.kaist.ac.kr)
at the second level. The root cache (cache.kix.net), operated
by Korea Telecom, serves 6 child caches as well as individual users.
cache.kaist.ac.kr is at KAIST, a research university
located in Taejon.
These data sets let us compare the behavior of
a top-level cache and a lower-level cache. We can also analyze
them to obtain performance data such as the number of connection
requests per hour, the count-based hit ratio, and the distribution
of requested content types.
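As a rough illustration of how such measures can be extracted from the log files, the sketch below parses one access log line into the fields used in the analysis. It assumes Squid's native ten-field log format (time, elapsed, client, code/status, bytes, method, URL, ident, peer status/host, content type); the sample line is made up, and `LogEntry` and `parse_line` are names introduced here.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    timestamp: float   # seconds since the Unix epoch
    client: str        # client IP address
    result_code: str   # e.g. TCP_HIT, TCP_MISS, UDP_HIT
    size: int          # bytes delivered to the client
    url: str
    content_type: str


def parse_line(line: str) -> LogEntry:
    """Parse one line of an access log in the assumed ten-field format."""
    fields = line.split()
    if len(fields) < 10:
        raise ValueError("expected at least 10 fields, got %d" % len(fields))
    result_code = fields[3].split("/")[0]  # drop the HTTP status part
    return LogEntry(float(fields[0]), fields[2], result_code,
                    int(fields[4]), fields[6], fields[9])


# Made-up sample line, for illustration only.
SAMPLE = ("861234567.123 1200 192.0.2.10 TCP_MISS/200 4312 GET "
          "http://www.example.com/ - DIRECT/www.example.com text/html")
```

Calling `parse_line(SAMPLE)` yields an entry whose result code is `TCP_MISS` and whose content type is `text/html`.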
3.2 Analysis
3.2.1 Daily cache server usage
Table 2 provides daily usage statistics. cache.kaist.ac.kr
serves about 650 users (counted by IP address), handles 305 K requests,
and transfers 2.9 G bytes a day. Although cache.kix.net does not have
as many users as cache.kaist.ac.kr, it processes about
1.3 times as many requests because
6 cache servers are connected to it as children.
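The daily figures above (users counted by IP address, requests, and bytes) amount to a simple aggregation over one day of log records. A minimal sketch, assuming each record has already been reduced to a (client IP, bytes) pair, might look as follows; the function name and record shape are ours.

```python
def daily_usage(records):
    """Summarize one day of (client_ip, bytes_transferred) records."""
    clients = set()
    requests = 0
    total_bytes = 0
    for client_ip, size in records:
        clients.add(client_ip)   # a "user" is a distinct client IP address
        requests += 1
        total_bytes += size
    return {"users": len(clients), "requests": requests, "bytes": total_bytes}
```

Counting users by IP address treats a child cache as a single user, which is consistent with the root cache having fewer users but more requests than a second-level cache.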
3.2.2 Hourly request pattern
The connection request pattern is dominated by a 24-hour cycle, as has been widely observed before.[Paxon93] Figure 2 plots the hourly connection requests in the cache.kaist.ac.kr data sets. It shows that connection requests occur mostly during office hours, with a substantial resurgence in the evening hours, when most students are actively doing research. This load often causes performance degradation at peak times.
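An hourly profile of this kind can be obtained by bucketing request timestamps by hour of day. The sketch below assumes the timestamps are Unix epoch seconds and that hours are reported in Korea Standard Time (UTC+9); both are assumptions for illustration, not details given in the text.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

KST = timezone(timedelta(hours=9))  # Korea Standard Time (assumed for the logs)


def requests_per_hour(timestamps):
    """Return a 24-element list: number of requests in each hour of the day."""
    counts = Counter(datetime.fromtimestamp(ts, tz=KST).hour
                     for ts in timestamps)
    return [counts.get(hour, 0) for hour in range(24)]
```

Summing the profile over several days, or averaging it, would smooth out day-to-day variation before plotting.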
3.2.3 Daily hit ratio
Figure 3 indicates that the total savings in the number
of connections to the origin server are 53% and 48% for
requests on the HTTP port of cache.kix.net and cache.kaist.ac.kr,
respectively. Since IMS_HIT and REFRESH_HIT denote
a successful staleness check using a short If-Modified-Since
header, the caches avoid redundant bandwidth consumption for about 20%
of the total connection requests a day. However, Figure 4 shows
a quite different hit ratio for requests over the ICP port. cache.kix.net
holds copies of objects cached in its child caches, since
it is at the top level of the hierarchy and the 6 child caches resolve
misses through it. Caches at the top level showed a higher hit ratio
for ICP requests than those at a lower level. This might be partly
because objects that have already been removed from
child caches are still kept in the top-level cache, which has
much more disk storage.
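A count-based hit ratio of the kind reported here can be computed separately for the HTTP port and the ICP port from the result codes in the log. The sketch below assumes Squid-style codes in which HTTP results start with TCP_ and ICP results with UDP_, and it counts every code containing HIT (including IMS and refresh hits) as a hit; this classification is our simplification.

```python
def hit_ratios(result_codes):
    """Return count-based hit ratios for HTTP (TCP_*) and ICP (UDP_*) requests."""
    stats = {"TCP": [0, 0], "UDP": [0, 0]}  # prefix -> [hits, total]
    for code in result_codes:
        prefix = code.split("_", 1)[0]
        if prefix not in stats:
            continue  # ignore codes that belong to neither port
        stats[prefix][1] += 1
        if "HIT" in code:
            stats[prefix][0] += 1
    return {prefix: (hits / total if total else 0.0)
            for prefix, (hits, total) in stats.items()}
```

Keeping the two prefixes apart matters because, as noted above, the ratio over ICP requests can differ considerably from the ratio over HTTP requests on the same server.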
3.2.4 Distribution by Types
Because multimedia data are easy to incorporate
into Web documents, a high degree of variation in object type
is observed in Web caches.[Williams96][Smith96] Figure 5 displays
the content-type distribution. As shown in Figure 5, large video and
audio files (audio/x-mpeg, video/mpeg) account for less than 1% of requests
but represent 45% of bytes transferred because of their large file
sizes. Such files could therefore purge many smaller
objects from a cache of limited size. We thus need to
compromise between a high hit ratio and fast response time to get
better performance.
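The purging effect can be illustrated with a toy cache whose capacity is measured in bytes. The sketch below uses least-recently-used replacement and an optional per-object size limit; both are assumptions chosen for illustration and do not describe the replacement policy of the servers studied here.

```python
from collections import OrderedDict


class ByteLRUCache:
    """Toy LRU cache with a byte capacity and an optional object size limit."""

    def __init__(self, capacity_bytes, max_object_bytes=None):
        self.capacity = capacity_bytes
        self.max_object = max_object_bytes
        self.used = 0
        self.objects = OrderedDict()  # url -> size, least recently used first

    def get(self, url):
        """Return True on a hit and mark the object as recently used."""
        if url in self.objects:
            self.objects.move_to_end(url)
            return True
        return False

    def put(self, url, size):
        """Store an object; return the URLs evicted to make room for it."""
        limit = self.capacity
        if self.max_object is not None:
            limit = min(limit, self.max_object)
        if size > limit:
            return []  # too large: not admitted, nothing evicted
        if url in self.objects:
            self.used -= self.objects.pop(url)
        evicted = []
        while self.used + size > self.capacity:
            old_url, old_size = self.objects.popitem(last=False)
            self.used -= old_size
            evicted.append(old_url)
        self.objects[url] = size
        self.used += size
        return evicted
```

With a capacity of 100 bytes holding ten 10-byte objects, storing one 95-byte object evicts all ten; with `max_object_bytes=50` the large object is not admitted and the small objects remain, at the cost of a miss on every request for the large one.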
4. Concluding Remarks
At present, we are greatly interested in developing national and global Web caching systems to reduce client latency, server load, and network load. However, meeting these ends requires much more than just building an infrastructure of Web cache meshes, hierarchies, or single servers. It needs an efficient Web caching system that delivers cached objects that are less stale, with a short response time.
The national caching system in Korea described in this paper addresses these requirements through the following configuration and management issues.
We will investigate these issues in the national
caching system and plan to coordinate with APAN. APAN will promote
the deployment of caching hierarchies in the Asia-Pacific region and
link these hierarchies to the rest of the world.[APAN97]
Reference
Appendix A. Hardware and software equipment of Web cache servers in Korea
(As of 1997. 4.24)
| host | machine | main memory/disk | software |
| cache.kix.net | Digital Alpha 4100 | 1G/20G | squid/1.1.2 |
| cache.nic.or.kr | SunServer 1000 | 512M/48G | squid/1.1.2 |
| cache.kaist.ac.kr | Pentium/Linux 2.07 | 128M/8G | squid/1.1.5 |
| spm.kotel.co.kr | Digital Alpha 2100 | 128M/15G | squid/1.1.8 |
| proxy.kren.nm.kr | R6000 AIX 4.1.3 | 128M/8G | squid/1.0.8 |
| kiwi.interpia.net | Sun SPARC 20 | 256M/1G | squid/1.0.22 |
| cache.hansol.net | Digital Alpha 200 | 128M/3G | squid/1.0.18 |
| proxy.elim.net | Pentium | 128M/8G | squid/1.1.5 |
| proxy.nuri.net | Sun Ultra-1 | 128M/8G | netscape proxy |
| proxy.nowcom.co.kr | Sun Ultra-2 | 512M/20G | squid/1.0.20 |