used in parent relationships for requests from non-NLANR caches
NLANR caches fetch directly
More info: ircache-request@nlanr.net and http://ircache.nlanr.net/Cache
TF-cache COM-MESH Experiment,
(John Martin)
Who: European R&D per-country networks,
connected by Europanet, Ebone, or TEN-34.
CHOICE: co-operative hierarchical object indexing and caching for europe
Motivation:
(1) >75% .com domains in US; (2) >60% of requests
for .com; (3) high latency to US
Solution: move part of .com to europe
HITS: from cache 40% requests, 35% volume; siblings (6%, 15%);
direct (40%; 50%)
SERVICE TIME: direct 20-30s; from cache <1s; sibling 2-3s
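As a rough check on what these figures imply, the sketch below computes a request-weighted mean service time. The representative times (25 s direct, 2.5 s sibling, 1 s as an upper bound for cache hits) are assumptions picked from the ranges quoted above, and since the quoted request shares sum to 86%, they are renormalized before averaging.

```python
# Request shares and representative service times based on the notes above.
# The midpoints and the 1 s upper bound are assumptions, not measured values.
SHARES = {"cache": 0.40, "sibling": 0.06, "direct": 0.40}
SERVICE_TIME_S = {"cache": 1.0, "sibling": 2.5, "direct": 25.0}


def mean_service_time(shares, times):
    """Request-weighted mean service time, renormalizing the shares to 1."""
    total = sum(shares.values())
    return sum(shares[k] * times[k] for k in shares) / total


if __name__ == "__main__":
    mean = mean_service_time(SHARES, SERVICE_TIME_S)
    print(f"mean service time: {mean:.1f} s")
```

Under these assumptions the mean is dominated by the direct fetches, which is consistent with the motivation for moving part of .com closer to the clients.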
Current Problems: current stats insufficient
ICP, ICP_OP_HIT_OBJ traffic can swamp a
heavily loaded line - situation gets worse
Statistics: traffic in/out, ratio cacheable/non-cacheable, efficiency
Next: improve statistics, extend to other domains
(.org),
impact of new itlds/DNS, HTTP1.1 deployment, cache busting guidelines
multicast
Joining: next meeting 9/5, Amsterdam
mailing list
mailto:tf-cache@terena.nl,
contacting terena: tech-staff@terena.nl
cache admins survey:
http://w3cache.icm.edu.pl/survey/results,
cache busting document
Japan cache project and web caching in Japan
(Masaaki Nabeshima)
What: public cache for .jp sites for caches outside Japan:
www-ntt.nttam.com, in California, whose primary server is www.ntt.co.jp
(Tokyo), hit ratio around 75%.
cache.imnet.ad.jp
NTT-operated, on IMnet (in Japan), 500k TCP queries/day
Background:
Many ISP/organizational caches; no nation-wide hierarchy.
Performance:
Hit ratio around 58%, higher than usual cache server operation;
shows benefit of a cache server for access to a specific domain name.
Prefetching not effective for their workloads.
Coherence: Cooperation from primary .jp servers,
asking news company for refresh notifications,
permission to replicate (hard)
Reference: Japan Cache Project Home Page
http://cache.jwindow.net/
Future: shorten path from .jp cache server to KIX
(direct link to US)
Internet object caching from ICTP
(E. Canessa)
What: Int'l UNESCO/ISEA research institution,
advanced physics/math training. Trieste, Italy
2000-2500 users/yr, various machines
Getting on the internet: only 1 connection to US
response times ~.05 - .1 kb/sec from .us, European links 1-2
orders of magnitude faster
national cache would help
Problems: ICTP Squid setup: need clearer idea for relationships
better monitoring of parent/child perf differences
(dynamic nature of squid is a problem)
may add a commercial parent cache (~35mb bandwidth to us).
Questions:
anticipate retrieval times per domain
predict future perf of parents (to optimize child perf)
transmit such data as part of the ICP MISS reply
crude approach: adjust weight per domain of parent based on such data
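The crude approach in the last question could look roughly like the sketch below: keep a smoothed retrieval time per (domain, parent) pair and derive parent weights from it. The class name, the smoothing factor, and the inverse-time weighting are all assumptions made for illustration; none of this is part of Squid or ICP.

```python
from collections import defaultdict


class ParentWeights:
    """Per-domain parent weights from smoothed retrieval times (hypothetical)."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha  # smoothing factor of the moving average (assumed)
        self.avg = defaultdict(dict)  # domain -> {parent: smoothed seconds}

    def record(self, domain, parent, seconds):
        """Fold one observed retrieval time into the moving average."""
        prev = self.avg[domain].get(parent)
        if prev is None:
            self.avg[domain][parent] = seconds
        else:
            self.avg[domain][parent] = self.alpha * seconds + (1 - self.alpha) * prev

    def weights(self, domain):
        """Weights inversely proportional to smoothed time, summing to 1."""
        times = self.avg.get(domain, {})
        inverse = {p: 1.0 / t for p, t in times.items() if t > 0}
        total = sum(inverse.values())
        return {p: v / total for p, v in inverse.items()} if total else {}

    def best_parent(self, domain):
        """Parent with the highest weight, or None if nothing is known."""
        w = self.weights(domain)
        return max(w, key=w.get) if w else None
```

A child cache would call `record` after each fetch through a parent and consult `best_parent` (or the full weights) when choosing where to send the next request for that domain.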
FREEnet web cache development,
(Serge Krashakov)
Largest R&D network in Russia (est. '91), 350 scientific institutions,
many at 19kbps
regional servers can get hit rates 40-50% /day
Papers/Krashakov/krashakov.html
Caching in Armenia
(Stephanian, Samvel)
Background:
ex-USSR: Internet attached in 9/95; 1st proxy 10/95
Current Status: 10 Pentium Pro proxy servers, 128RAM, 4GB.
3 ISPs: 128kbps to MCI ($10k/month) (proxy.aic.net, dialup)
115kbps (prox-2aic.net, info, with 32kbps from there to Infocom),
19.2kbps satellite to Moscow from primary server. total b/w > 300 kbps.
20,000 request/day, over 100MB web data directly w/30% hit ratio.
Problems (technical, organizational)
IP-based authentication fails, tuning document freshness,
routing with hierarchy ID
squid specific needs: SSL, better ftpget, better crash recovery, round robin
hard to establish peering agreements with other networks
may look at content filtering since logs show much porn.
Future: promote hierarchies in Armenia, Russia, Ukraine, etc.
Caching infrastructure in the German research network
(Grimm)
Background:
U. Hanover w/ support by DFN-Verein, Berlin w/funds from BMBF
Frankfurt exchange point; 90 Mbps link (MCI) to DC,
1 Mbps to Moscow; 45MB TEN-34
started as stand-alone cache mesh
Goals :
infrastructure/coordination
maintain top level caches, support others in configuring
(10 Sun UltraSPARCs w/ 2 167MHz CPUs; 256MB; 20GB disk (6GB cache))
collect and analyze stats
set up local caching testbed
examine and improve new configurations/conceptions
investigate use of multicast
2 levels of caches, with domain dependence among parents
stable. ~35GB/day from caches
DESIRE Web Caching project
(Henny Bekker)
European funded w/ 22 partners
Stage 1: assess functionality, effectiveness of web caching,
requirements and recommendations, write architecture document,
cost/benefit analysis
Stage 2: expand mesh, workshops, integrate w/indexing,
autoconfiguration
costs/benefits:
different expectations/objectives for different levels of mesh:
(hit rates, server load, bandwidth/latency savings)
architecture document:
for system administrators: describes design, checklist
http://www.desire.org/caching,
mail:
cache-disire@uninett.no
Performance of CERN's httpd vs Squid
(Carlos Maltzahn)
Overview: focus, architectures, methodology,
results, conclusions
Enterprise-level web proxies: focus on workloads, utilization, not meshes, replacement strategies
CERN's architecture (goal: simplicity)
new process/request, URL-derived cache structure,
entirely implemented on disk
Squid's architecture: (goal: performance, portability)
1 process (except ftp and dns), fingerprint-derived cache structure,
meta-data in RAM
Measurement framework:
2 dec alpha 250 4/266, 512MB, 8GB digital unix
split load via round-robin dns
busy day: 24 hours uninterrupted, per-15 minute logs (system stats, service times)
each proxy w/ and without 8 gb cache
Results: memory utilization :
CERN depends on # connections, susceptible to net latency, similar w/ or w/out cache
Squid: cache-size dependent, disk cache expensive
Results: disk i/o : similar
CERN's url based cache structure preserves spatial locality, better use of file system
Squid's fingerprint-based cache structure seems to destroy it
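To make the locality argument concrete, the sketch below contrasts a URL-derived on-disk path with a fingerprint-derived one. Both layouts, the `cache/` prefix, and the use of MD5 are assumptions for illustration only; they are not the actual directory schemes of CERN httpd or Squid.

```python
import hashlib
from urllib.parse import urlsplit


def url_derived_path(url):
    """Path that mirrors the URL, so pages from one site share a directory."""
    parts = urlsplit(url)
    return f"cache/{parts.scheme}/{parts.netloc}{parts.path}"


def fingerprint_path(url):
    """Path derived from a hash of the URL; neighbors in URL space scatter."""
    digest = hashlib.md5(url.encode("utf-8")).hexdigest()
    return f"cache/{digest[:2]}/{digest[2:4]}/{digest}"


if __name__ == "__main__":
    for u in ("http://example.org/docs/a.html", "http://example.org/docs/b.html"):
        print(url_derived_path(u), fingerprint_path(u))
```

With the URL-derived layout the two example pages land in the same directory, while the hash spreads them over unrelated directories, which is one plausible reading of why the file system is used less effectively.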
Results: cpu utilization:
depends on load (open connections), state (meta-data), OS
CERN processes have little state (but forks processes)
Squid manages large cache state, so running cacheless is a lose
(CERN 16.0 cacheless; 19.8 cache) (Squid 22.6/9.4, needs more because stateless)
kernel (CERN 87.9 cacheless; 128.6) (Squid 89.7; 55.4 )
kernel cycles/request (by process, memory, network management)
....CERN 2.6 cacheless; 3.5 caching; Squid 28.1/13.1
conclusion: squid's non-blocking select loop is expensive
Conclusions:
web proxies need to be more OS aware
non-blocking I/O can be expensive
in-memory meta-data can make cache size more expensive
Squid's cache structure might make ineffective use of file system
web proxies need to be more robust
for high b/w sites: hit rate has higher impact on
resource consumption than on service time
..and architecture has more impact on service time than hit rate
future may see special caching OS (at least quite tuned)
Cache Characteristics of Web Proxies
(David Marwood)
Motivation:
effect of web data access patterns on caches is poorly understood
user population, degree of sharing, resource requirements
dynamic effects: hot sets; thrashing
Question: how is hit rate impacted by cache size?
Web Proxy Simulator (SPA): replacement and coherence based on squid
input is proxy request log data, varies cache size and request rates
traces from DEC, NLANR-sv, au, GMCC (104,494 -537,595) req/day, (67-16,204) clients
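A trace-driven simulation of this kind can be sketched in a few lines. The version below replays (url, size) requests through a plain LRU cache and reports the hit rate for several cache sizes. Plain LRU, the function name, and the toy trace are assumptions made here; SPA itself bases replacement and coherence on Squid.

```python
from collections import OrderedDict


def simulate_lru(trace, capacity_bytes):
    """Replay (url, size) requests through an LRU cache; return the hit rate."""
    cache = OrderedDict()  # url -> size, least recently used first
    used = 0
    hits = 0
    for url, size in trace:
        if url in cache:
            hits += 1
            cache.move_to_end(url)  # mark as most recently used
            continue
        if size > capacity_bytes:
            continue  # object can never fit; counts as a miss
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[url] = size
        used += size
    return hits / len(trace) if trace else 0.0


if __name__ == "__main__":
    # Toy trace (hypothetical): a cyclic scan over four 1-byte objects.
    trace = [(u, 1) for u in "abcd" * 3]
    for capacity in (2, 4, 8):
        print(capacity, simulate_lru(trace, capacity))
```

On the toy trace a cache smaller than the working set gets no hits at all, and growing it beyond the working set adds nothing, a small-scale picture of thrashing and hot-set effects.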
Conclusions:
minimum request rate required for effectiveness
thrashing places lower limit on cache size
hot sets place an upper limit on required cache size
Limitations:
original proxy cache size is upper bound on SPA cache size
some user locality info lost in the anonymizing process
browser cache filters accesses
only 4 data sets
Future: user sharing, skew toward popular URLs, impact of other
resources, user profiles, maybe
correlation between (#unique clients, hit rates)
and between URLs of alternative traces
http://www.cs.ubc.ca/spider/marwood/Projects/SPA/Report/Report.html
Observations from a Trans-Pacific Cache Pair
(Kathy Richardson)
objective: (1) evaluate HTTP perf w/ and w/out persistent connections
(2) Evaluate the impact of ICP, TCP on cache performance
Experiment: 93 unique urls repeated 5 times (465 total requests)
Results: persistence saves RTT for cache-cache transfer,
importance depends on RTT..
Caveats: elapsed time not reliable, heavy tails so median deceives
Connection Inefficiencies:
TCP window size effects (determined through sniffers)
window too small: must wait for ACKs after few pkts (standard TCP ACKs after 2 pkts)
ICP and failover need separate mechanisms
tuned HTTP persistent connections: response time 22% improved
eliminating ICP for parent: 61% improvement
Ignored: internal cache delays, TCP/cache service scheduling, persHTTP implementation quirks
Recommendations: persistentHTTP, evaluate hierarchy settings: connections, windows, levels
ACM CCR June 97: John Heidemann article on cache configuration
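The window-size effect noted under Connection Inefficiencies follows from a standard bound: a sender that can have at most one window in flight per round trip cannot exceed window/RTT. The sketch below evaluates that bound; the window sizes and the 200 ms round-trip time are hypothetical values, not measurements from the experiment.

```python
def max_throughput_bytes_per_s(window_bytes, rtt_s):
    """Upper bound on throughput when one window is sent per round trip."""
    if rtt_s <= 0:
        raise ValueError("rtt_s must be positive")
    return window_bytes / rtt_s


if __name__ == "__main__":
    rtt_s = 0.2  # hypothetical trans-Pacific round-trip time
    for window in (4096, 8192, 65535):
        rate = max_throughput_bytes_per_s(window, rtt_s)
        print(f"window {window:>5} B -> at most {rate / 1024:.0f} KB/s")
```

The bound scales linearly with the window, which is why window tuning matters more as the RTT between the cache pair grows.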
Simple Usage-based Charging of Web Cache Services
(Anagnostakis)
Why charge?: info on resource consumption, encourage caching
Impact: benefits all 3: content providers, end users, ISPs
Access patterns:
Classify objects based on #requests:
denotes level of sharing
# of objects per class drops very fast in higher classes
# of bytes drops less fast, w/ a hill representing popular docs
can't predict object class
Goals of charging scheme:
reflect costs, thus actual resource consumption by users
reflect service quality, provide incentive to use cache
cost recovery, make caching fair and profitable business
minimize cross-subsidies
Why not charge at the network level?
no details on cache semantics (hit, miss, object locations)
complex correlation of flows to/from, due to higher layer interactions
can use service usage semantics, service access patterns
How?: per-object charging: assign ID to each object; divide cost among #users accessing that object
first user that suffers a miss to pay less relative to later hits
cache users should get some reward, either fixed or proportional to object's class
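A minimal sketch of the per-object idea, in the spirit of the log analyzer form: each object's cost is split among the users that accessed it, and a user whose first access was the MISS carries a smaller weight. The function name, the log tuple layout, the 0.5 miss weight, and the per-object costs are assumptions; the reward for cache users mentioned above is not modeled.

```python
from collections import defaultdict


def per_object_charges(log, object_cost, miss_weight=0.5):
    """Split each object's cost among its users.

    log: iterable of (user, object_id, "HIT" or "MISS") entries.
    object_cost: object_id -> cost to be recovered for that object.
    A user's first access to an object fixes that user's weight.
    """
    weights = defaultdict(dict)  # object_id -> {user: weight}
    for user, obj, result in log:
        if user not in weights[obj]:
            weights[obj][user] = miss_weight if result == "MISS" else 1.0

    bills = defaultdict(float)
    for obj, users in weights.items():
        total = sum(users.values())
        for user, weight in users.items():
            bills[user] += object_cost[obj] * weight / total
    return dict(bills)


if __name__ == "__main__":
    log = [("alice", "o1", "MISS"), ("bob", "o1", "HIT"), ("carol", "o1", "HIT")]
    print(per_object_charges(log, {"o1": 10.0}))
```

The more users share an object, the less each pays for it, and the charges for an object always add up to its cost.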
Results: users do pay less when using caches (ie 10% off regular bill)
`netsurfers' pay less than `conservative' users
parameter selection: class 1 objects: w=~1. MISSes pay less.
Future:
explore `willingness' to pay, e.g., cost sensitivity vs quality
currently implemented as a log analyzer
how to charge in an environment w/different policies, QOS's, competition
involve content-providers
Web Cache Charging Policies:
(Kozinski)
Charging models:
based on costs of service (materials, energy, admin costs, amortization)
does not stimulate development of services, since no need to reduce costs
profit-based: traffic reduction gives measurable savings
Non-ISPs caches: immeasurable profits
(alter external/internal traffic rate, customers comfort)
measurable: external links bandwidth savings
agreement between web cache provider and isp for web cache services
Content provider charges ISP rather than clients
clients do not pay for traffic between themselves and cache server
ISP and content provider share profit from traffic reduction
Selling Caching for Real Money
(Donald Neal)
national caches generally do not work because national
telcos do not cooperate
1990-95: single international link operator
int'l link costs dwarf all other costs
int'l traffic charged for per "NZGate MB"
80% discount off-peak (8pm - 9am), 50% discount outgoing
bills based on consumption bands, not pure counts
traffic-based charges funded rapid bandwidth expansion
note: only some traffic costs money
1995: parent cache operated by int'l link operator
(service only provided to customers w/ caches)
int'l link costs still dwarf all other costs
customer int'l traffic divided into
cache int'l traffic, giving MB figure added to NZGate figures
what happened: rapid traffic growth
Why such growth/success?:
topology favored, easy cost comparison
....no choice in charging structure, no billing costs,
single ISP so no bad debts
1996: multiple int'l link operators (esp. telcos)
most universities buy bandwidth for real $$/MB
fixed-pipe pricing, flat rate retail charge popular
some customers sacrifice perf for predictable costs
NZ internet exchange (NZIX) continues to operate
NZIX cache charges in $$/MB, fixed-pipe pricing puts NZIX outside NZ
Conclusions:
cache viability depends on bandwidth pricing structures available to customers
cache location should depend on combination of topology and pricing structure
customers lost or not gained because flat-rate pricing not offered
only some traffic costs money
implementing pipes inside an exchange may not be pretty
1 june 97 - the Virtual Pipe
universities stop paying $$/MB, now based on 1/3 quartile day, calculated from 5min samples
all risk borne by customer; but no predictability (find out costs at end of month)
cache is bandwidth provider, parents
compete w/alternative services/pricing structures
problems for existing software: no raw IP logging, no bandwidth control
Comments:
can't impose caching on users since some don't want:
public usage patterns, added latency
proxy logs reliability: so far no complaints
copyright: implied consent by placing something on the web
concern over NZ's stripping of cookies and caching images
need for reporting hits to content providers
Piggyback Cache Validation
(Wills)
Goal: solve cache coherency problem for resources without
expiration time.
Strong consistency: validation by client (IMS each time) or server (tracks every client)
Weak consistency:
piggyback cache state information (e.g., cached URLs) onto HTTP requests
time to live (ttl): client heuristic to determine how
long cached resources are 'good'
avoid communication w/ server for up to date info (IMS queries)
server piggies [in]validation of [stale] documents in reply
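The exchange can be sketched as a toy model: when the proxy has to contact the server anyway, it attaches the last-modified times of its other cached resources, and the server's reply names the stale ones. The class, the function name, and the dict-based "server" are hypothetical; a real implementation would carry this information in HTTP headers.

```python
def piggyback_validate(cached, server_mtime):
    """Server side: given the piggybacked {url: last_modified} list,
    return the URLs whose copy on the server is newer (stale at the proxy)."""
    return sorted(
        url for url, mtime in cached.items()
        if server_mtime.get(url, mtime) > mtime
    )


class Proxy:
    def __init__(self):
        self.cache = {}  # url -> last_modified of the cached copy

    def fetch(self, url, server_mtime):
        """Fetch url from the server, piggybacking a validation list."""
        piggyback = {u: m for u, m in self.cache.items() if u != url}
        stale = piggyback_validate(piggyback, server_mtime)  # in the reply
        for u in stale:
            del self.cache[u]  # invalidated without a separate IMS query
        self.cache[url] = server_mtime[url]
        return stale
```

For example, after `/a` changes on the server, the next fetch of any other resource from that server returns `["/a"]` and drops it from the proxy cache.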
Evaluation Criteria: cost, latency, server bandwidth/request
document staleness.
%requested resources returned as stale from cache
combination of costs and staleness: 50/50 weighting
PCV cost: normalized to average cost of GET. e.g.,
want policy to generate fewest unneeded checks
Results: PCVadapt policy yields lowest staleness value
Future work: implementation, server-generated validators,
use PCV info in replacement decisions
Summary: PCV yields fewer requests and connections between proxy and server
best combination of cache consistency and costs
can be implemented with HTTP 1.1
PCV has other uses: server-generated validators, replacement policies.
Comments:
periodic validation likely more important for larger caches
goodness factor might benefit from being broken into its 3 components
1GB cache in experiment might artificially show minimal staleness,
distorting results
Delta Encoding...
(Jeff Mogul)
goal: information-theoretic limit on how much has been seen
getting past all-or-nothing: delta encoding
if IMS true, current HTTP requires re-sending entire value
delta-encoding: send only the changes
research problems: how common are small-delta changes? is it perturbable?
which delta-encoding schemes?
connection between delta-encoding and compression: aiming at information-theoretic limit
minimizing delta implies data compression
since we're compressing deltas, why not compress all
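The idea of sending only the changes, and the link to compression, can be illustrated with a line-based delta built on Python's `difflib`. This is a sketch under assumptions: the copy/data delta format and the function names are invented here, and it is neither the diff -e nor the vdelta scheme discussed in the talk.

```python
import difflib
import json
import zlib


def make_delta(old, new):
    """Describe `new` as copies of line ranges of `old` plus literal new text."""
    old_lines = old.splitlines(keepends=True)
    new_lines = new.splitlines(keepends=True)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    delta = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(["copy", i1, i2])  # reuse old lines i1..i2
        elif tag in ("replace", "insert"):
            delta.append(["data", "".join(new_lines[j1:j2])])
        # "delete" needs no entry: the old lines are simply not copied
    return delta


def apply_delta(old, delta):
    """Rebuild the new instance from the cached old instance and a delta."""
    old_lines = old.splitlines(keepends=True)
    out = []
    for op in delta:
        if op[0] == "copy":
            out.extend(old_lines[op[1]:op[2]])
        else:
            out.append(op[1])
    return "".join(out)


if __name__ == "__main__":
    old = "".join(f"line {i}: some static text\n" for i in range(100))
    new = old.replace("line 42: some static text", "line 42: updated text")
    delta = json.dumps(make_delta(old, new)).encode()
    print("full body:", len(new.encode()), "bytes")
    print("gzip-style compressed body:", len(zlib.compress(new.encode())), "bytes")
    print("delta:", len(delta), "bytes")
```

The client must still hold the exact base instance the delta was computed against, which is where the management issues for base instances come from.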
Trace-experiment design full-content traces of web traffic: Digital and ATT
Determine change rate:
almost 50% of the URLs referenced more than once
only 10% of instances referenced more than once
web stuff changes too fast for simple caching to help
note: doesn't include images
Possible delta-encoding schemes
Unix diff - e command:
ubiquitous; speed (encoding fast, decoding moderate)
compress result
vdelta algorithm (another study shows best)
Delta-encoding Results vdelta of delta-eligible responses
99% got smaller; 83% of the bytes saved; 39% transfer time saved
of all full-body responses
38% got smaller; 31% of bytes saved; 10% of transfer time saved
caveats: few images, time-saving estimate is conservative
Compression results using gzip of delta-eligible responses
full-body responses:
75% got smaller; 39% bytes saved; 22% transfer time saved
overall: 58% got smaller; 28% bytes saved; 18% transfer time saved
Delta-e vs compression: need both
Protocol design issues: how does client request delta-e, how does server transmit?
Research issues:
deltas: mgmt of base instances (server), obsolete objects
old deltas at server
images: majority of bytes, segregate line art?
jpeg compression? does css1 obviate?
compression: server caching compressed values (esp. dynamic content),
modem interaction
PICS labels
(Wayne Salamonsen, IRDU/NUS, Singapore)
W3C developed
allows URL classification through PICS labels
3rd party PICS bureaus, or on web servers
appearing in browsers (in HTML), servers and now proxies
....
(database of sources using labels via a specific PICS scheme)
Building PICS awareness into proxies advantages:
opportunity for (hierarchical) caching of PICS labels
allows wide scale content filtering and selection
increases proxy functionality, promoting increased use
But: getting labels from bureau represents potential perf bottlenecks
solution: cache labels just like web pages
squid changes to support labels: concurrent page and label fetch
ICP needs to distinguish between label and URL fetches
how to measure performance benefits of label caching?
built on PEP (protocol enhancement protocol), not yet defined by w3c
Summary/Future
investigate mechanisms for PICS label caching at hierarchical proxy level
provide initial Squid-based prototype for experiments
....(suggestion for cgi script in lieu of modifying squid)
evaluate utility
Adaptive Web Caching
(Sally Floyd)
Overview: vision, components
Not covering: caching decisions,
data integrity, unique object names, incremental deployment
Vision: popular pages (demand-driven) diffuse from origin to neighboring caches
new caches easily added to infrastructure in self-configuring manner
data integrity property of data itself, does not rely on cache authentication
self organized data flow: caches as autonomous agents (no manual config)
requests forwarded among caches based on local info, e.g. cache routing table
is adaptive to changes in topology, load, etc.
Forwarding URLs: metrics for evaluating neighboring caches or cache groups:
distance in hops (hop cache) or sec (ping/src_rtt) from origin server
distance in cache group hops from origin server
in general direction of the origin server (using nearby routing table?): less interesting now
past success rate/declared willingness to resolve requests, of this type or in general
PICS for metric selection
Overlapping Multicast Groups of caches:
how caches talk to other caches
if first cache doesn't have URL, forward to group
if no one in group has, one will forward to different group
if HIT, members use randomized algorithm to inform the group
if no one reports HIT, members have to decide how to respond
....(1) cache multicasts URL data to group
....(2) caches in group independently decide whether to cache URL
cache that forwarded request unicasts reply to client
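The forwarding steps above can be sketched as a search over overlapping groups: a request is offered to every group the first cache belongs to, and if nobody in a group holds the URL, members that also belong to other groups pass it on. The function name and the data layout are hypothetical, and the randomized HIT announcement, the forwarding metrics, and the multicast of data back into the group are not modeled.

```python
from collections import deque


def resolve(url, first_cache, groups, contents):
    """Find a cache holding url by forwarding through overlapping groups.

    groups: group name -> set of member caches.
    contents: cache name -> set of URLs it holds.
    Returns (cache holding the URL or None, list of groups queried in order).
    """
    if url in contents.get(first_cache, set()):
        return first_cache, []

    queried = []
    seen = set()
    queue = deque(sorted(g for g, members in groups.items() if first_cache in members))
    while queue:
        group = queue.popleft()
        if group in seen:
            continue
        seen.add(group)
        queried.append(group)
        holders = sorted(c for c in groups[group] if url in contents.get(c, set()))
        if holders:
            return holders[0], queried
        # No member has it: members of other groups forward the request there.
        for cache in sorted(groups[group]):
            for other, members in sorted(groups.items()):
                if cache in members and other not in seen:
                    queue.append(other)
    return None, queried  # nobody has it; the request would go to the origin
```

With groups A = {c1, c2}, B = {c2, c3}, C = {c3, c4} and the URL held only by c4, a request entering at c1 is resolved after querying A, B, and C in turn.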
Incremental deployment: New algorithms for forwarding requests in unicast cache hierarchy
multicast w/in manually-config groups
long-term research: self organization of multicast groups