To make a significant impact on the congestion problems on the Internet, especially outside the US, nothing less than a 75% cache hit rate is sufficient, and more than 75% would be very useful.
Mirror Image Internet is a company devoted entirely to the distributed storage of popular web content, primarily through caching. Reaching the 75% goal requires significant spending, and therefore a revenue stream that matches those expenditures. As we see it, this cannot be demonstrated in a controlled experiment; it has to be proven in a real operating environment and on an intercontinental scale. That said, we are developing many subsystems, algorithms and methods within the overall framework.
To reach the goal of a 75%+ hit rate, our hypothesis is that we must deploy a system that meets the following conditions:
1) Local Distribution of Content. Maximum caching efficiency is achieved when content is successfully distributed to the end user. Caches must cooperate such that only one copy of the file is ever downloaded into a given local system.
2) Economies of Scale. High cache hit rates are a function of cache size, which in turn is a function of the number of users connected to that cache. Thus our design goal is to maximize the number of users accessing a given cache system. To achieve this, caching must be made transparent to users, so that they are automatically pointed to the cache; end users cannot be relied on to configure their browsers correctly. In addition, local caches must be linked into a larger system to ensure that all users in a given country, region or other area can utilize the scale benefits of a large multi-tiered caching system.
3) Optimal Cache Size. The optimum size of each caching system should be determined only by the cost of bandwidth versus the cost of storage media. This, together with condition 2) above, results in a need for very large numbers of end users in each caching system.
4) Freshness of Content. Files should be purged only when they are known to be obsolete, rather than on the basis of predictive methods. If this is done properly, it will be possible to operate caches which are many times the size of current caches and which are guaranteed to contain only the current information. Purging of obsolete files should be done with a minimum consumption of bandwidth.
5) Uncacheable Files. Currently uncacheable files must be made cacheable by working with content providers. To get this done in real life means accepting their business situation and working with them on their terms.
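As a minimal sketch of condition 4, the following Python fragment shows a cache whose entries are never aged out and are purged only when an explicit invalidation notice arrives. The class name `InvalidationCache`, the `fetch_from_origin` callable and the in-memory `origin` dictionary are hypothetical names chosen for illustration; they do not describe our actual software or protocol.

```python
class InvalidationCache:
    """Cache whose entries are purged only by explicit notice, never by age."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin  # hypothetical callable: url -> bytes
        self._store = {}                 # url -> cached body
        self.origin_fetches = 0          # how many times the origin was contacted

    def get(self, url):
        # A miss costs one origin fetch; every later request is served locally.
        if url not in self._store:
            self._store[url] = self._fetch(url)
            self.origin_fetches += 1
        return self._store[url]

    def invalidate(self, url):
        # A purge notice carries only the URL, not the file body,
        # so purging consumes very little bandwidth.
        return self._store.pop(url, None) is not None


# Stand-in for a content provider's web server (illustrative data only).
origin = {"/index.html": b"v1"}
cache = InvalidationCache(lambda url: origin[url])

first = cache.get("/index.html")          # miss: fetched from the origin
second = cache.get("/index.html")         # hit: served from the cache
origin["/index.html"] = b"v2"             # the provider publishes a new version
purged = cache.invalidate("/index.html")  # the provider tells the cache
third = cache.get("/index.html")          # miss again: the new version is fetched
```

Because nothing is discarded on a timer, the cache can grow as large as storage allows while still holding only current files, provided the content provider reliably sends the notices.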
Conditions 1, 2 and 3 above call for a national or regional caching system, in which all academic and commercial providers of web access participate. Conditions 4 and 5 above call for a global operation which performs these functions for national and regional caching systems everywhere.
The biggest problem in caching, as we see it, is that it makes very little economic sense in California. This is due to a combination of a large concentration of web servers, very cheap bandwidth and a relative lack of interest by California users in content from the rest of the world. As a result, most California web sites are cache-unfriendly, and most web site software written in California is not very cache-friendly either. This matters because most web servers are in California and most web server software is written there. There is, however, some hope: caching makes some sense for operators of large LAN environments, where the cost of external bandwidth is still noticeable.
We have created a company called Mirror Image Internet and have raised nearly US $6 million in a proper IPO in Sweden. We have taken the following steps:
1) Raise enough cash to establish the global operation plus a large number of national caching systems.
2) Develop a cache server/router called an "Interceptor", which is transparent to the user and capable of serving 10 million requests per day. This takes away the voluntary element of most caching; we think removing it is necessary to reach the critical mass needed for a high hit rate. This box has been in production since January and has now been ordered by over 20 ISPs in over 15 countries.
3) Develop software for an Exchange Point Cache, the focal point of a two-tier caching hierarchy that maximizes scale. It talks ICP to anyone else, but it also talks a more direct and efficient protocol to the Interceptors. One of these machines has been in operation in Sweden since March, and by June 9th it will have over 100,000 end users pointing to it through Interceptors or other proxy servers.
4) Order hardware for another three Exchange Point Caches, to be placed in London, Amsterdam and Palo Alto. The purpose of the Palo Alto site is to make Californians more interested in caching, by offering a free service to large LAN users.
5) Develop relationships with some of the largest content providers. In many cases they have agreed to make their entire sites more cache friendly. We have started to recruit "an army of men-in-suits" to spread the gospel of cache friendliness in the web content community. Some of this is for our profit, but most will be for the benefit of the entire caching community.
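The two-tier hierarchy in steps 2 and 3 can be illustrated with a small Python sketch. It assumes, purely for illustration, that each Interceptor has a single parent Exchange Point Cache and that an in-memory dictionary stands in for the web servers; the class and attribute names are hypothetical, and plain method calls replace both ICP and the more direct Interceptor protocol, neither of which is modelled here.

```python
class ExchangePointCache:
    """Second tier: shared by every Interceptor in a country or region."""

    def __init__(self, origin):
        self.origin = origin     # dict url -> body, stands in for the web servers
        self.store = {}
        self.origin_fetches = 0  # downloads over the expensive external link

    def get(self, url):
        if url not in self.store:
            self.store[url] = self.origin[url]
            self.origin_fetches += 1
        return self.store[url]


class Interceptor:
    """First tier: sits at an ISP and answers its users transparently."""

    def __init__(self, parent):
        self.parent = parent
        self.store = {}
        self.requests = 0
        self.hits = 0

    def get(self, url):
        self.requests += 1
        if url in self.store:
            self.hits += 1
        else:
            # Local miss: ask the shared second tier, never the origin directly.
            self.store[url] = self.parent.get(url)
        return self.store[url]


origin = {"/a": b"A", "/b": b"B"}  # illustrative content only
epc = ExchangePointCache(origin)
isp1, isp2 = Interceptor(epc), Interceptor(epc)

for url in ["/a", "/a", "/b"]:
    isp1.get(url)
for url in ["/a", "/b", "/b"]:
    isp2.get(url)

total_requests = isp1.requests + isp2.requests
```

In this toy run the six user requests cause only two downloads from the origin, because the second ISP's misses are absorbed by the Exchange Point Cache. This is the sense in which each file is downloaded only once into the local system.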
We estimate that by taking these steps, and by rapidly expanding the operation to wherever web users are found, we will be able to deliver "virtual bandwidth" with hit rates over 75% and for less than 1 cent (US) per megabyte. In the US and in other deregulated areas, the cost may be even lower.
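The arithmetic behind "virtual bandwidth" can be sketched as follows. The sketch assumes that the hit rate is measured in bytes and that a hit consumes no external bandwidth at all; both are simplifications, and the function name `virtual_bandwidth` is ours for illustration only.

```python
def virtual_bandwidth(link_mbps, byte_hit_rate):
    """Traffic the users can receive when only misses cross the external link.

    If a fraction h of all bytes is served from the cache, only (1 - h) of the
    delivered traffic uses the link, so the link supports link / (1 - h).
    """
    if not 0.0 <= byte_hit_rate < 1.0:
        raise ValueError("byte_hit_rate must be in the range [0, 1)")
    return link_mbps / (1.0 - byte_hit_rate)


multiplier_50 = virtual_bandwidth(1.0, 0.50)  # 2.0
multiplier_75 = virtual_bandwidth(1.0, 0.75)  # 4.0
```

Under these assumptions a 75% hit rate makes an external link behave like one four times its size, compared with a factor of two at 50%, which is why the last few points of hit rate matter so much.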
While we will need on-going revenue to support the operations, we are always keen to cooperate with the academic communities everywhere, and we will offer free or heavily subsidized services to schools and universities. We will also need talent, both to do research in the field, and to help us operate the systems.