Cachemesh: A Distributed Cache System For World Wide Web Zheng Wang and Jon Crowcroft Department of Computer Science University College London Gower Street, London WC1E 6BT, UK zwang@cs.ucl.ac.uk, jon@cs.ucl.ac.uk Extended Abstract Cachemesh is a distributed cache system that builds upon the work of Harvest cache. Like the Harvest cache, a Cachemesh is comprised of several co-operating cache servers. Nevertheless, Cachemesh achieves co-operating caching in a way different from the Harvest cache. Central to the Cachemesh system is the idea of co-operative cache placement and cache routing. By integrating cache resolution and cache placement, Cachemesh servers can use the cache routing mechanism to forward requests among them in an efficient way. Cache Resolution Cachemesh does not resolve cache location by sending query messages on-demand to neighbor cache servers. Instead, it uses what we call cache routing. Cache servers periodically exchange cache routing information and pre-compute a cache routing table for cache resolution. When a cache server receives a request for an object not in its cache, it simply looks up the responsible server in the cache routing table and forwards the request to that server as the next-hop. Cache routing is, in many respects, similar to the hop-by-hop packet routing, where a routing table is constructed by routing information exchange for packet forwarding. In the case of cache routing, the connections between servers are built with virtual links, thus the servers can be considered to be fully connected with one-hop away between any two servers. Compared with the request/response type of cache resolution in the Harvest cache, cache routing introduces very little latency during cache resolution as the process only involves two lookups. Combined with co-operative cache placement, Cachemesh has extreme low overheads in terms of message exchanges and search. Cache Placement Cachemesh servers co-operate both over cache resolution and cache placement. In fact, cache resolution and cache placement are integrated in Cachemesh. The cache routing table governs how cache location is resolved, but also decides where the cache should be placed. In Cachemesh, we try to put objects from the same Web site on the same cache server as indicated by the cache routing table. To some extent, the co-operative cache placement can be viewed as an intelligent mirroring system, where the Web sites and its corresponding mirror servers are totally self-configuring and objects are updated when requested by the users. Co-operative cache placement is crucial for cache routing to work. When there is no explicit cache placement control, as in the Harvest cache, objects from a same Web site can spread over all cache servers. In such a system, cache routing would have a scalability problem as the cache routing table must have an entry for each object, and must be updated each time when an object is added or deleted from the cache. With co-operative cache placement, objects on the cache servers are grouped by their Web sites. Such aggregation of information allows the size of cache routing table and the frequency of updating to be reduced by two or three orders of magnitude. The scalability of cache routing can be further improved by applying the default route technique widely used in packet routing. A cache server can be designated as the default cache server for all Web sites that do not have explicit entries in the cache routing table. The default cache server can be used to cater all less popular Web sites. Cache Topology The Harvest cache establishes a hierarchy along the network topology. Such a cache topology is most bandwidth efficient, particularly when some co-operating cache servers do not have high speed connectivity. However, to set up such a hierarchy, cache servers need to be placed at the key access points in the network. This often requires significant coordination and administration (which is what NLANR is funded to do). The hierarchical cache topology also has its disadvantages. For example, root cache servers may become overloaded, and there is heavy traffic concentration around the paths to root servers. Cachemesh is intended to be used for cache servers that are well connected by underlying networks. Under such circumstances, the difference between a hierarchical cache topology and a flat one is not that significant. Therefore, in Cachemesh, there is no hierarchical structure. All cache servers operate at the same level. Such cache topology has the advantages of better load sharing, less traffic concentration and robust behaviors in the face of failures, Autonomous The co-operation in Cachemesh is very flexible; each server can decide the level of participation. The services that a server offers to the Cachemesh are only what has been announced through cache routing information exchanges. Cachemesh allows a server to have a private cache which is invisible to other servers, and hence does not have to follow the Cachemesh cache resolution and placement rules. A server can simply cache objects on its local private cache but only announce them when it chooses to do so or when some conditions are met. A cache server can also has implement local policies such as lists of Web sites for which it prefers to act as the designated server. Cache Resolution and Placement Algorithm The cache resolution and placement in Cachemesh are as follows. When a cache server receives a request for an object from a user, it first checks its own cache for the object. If it has the object in cache and the object is still valid, it simply returns the object to the user. If the search returns with a miss, the server looks up in the cache routing table for the cache server responsible for the Web site of the object. If it is itself the server responsible for the Web site, it fetches directly from the Web site and caches the object. Otherwise, it fowards the request to that caching server responsible for the Web site. When a cache server receives a request for an object forwarded from another cache server, it checks its cache for the object. If it has the object in cache and is still valid, it simply returns the object. If the search returns with a miss, it fetches directly the object from the Web site, and caches it. It should be noted that whether an object can be cached or not is also subject to a number of other considerations such as object's cache control directives and the cache replacement policy, which are not addressed in this paper. Cache Routing Exchange and Maintenance One of the key components in the Cachemesh system is the the Cache routing exchange and maintenance. The component includes an information exchange protocol and policy for adding and deleting entries, mechanisms for initializing cache routing table, collision resolution and local private caching. We discuss each pf the issues in detail.