Background

At Dilato we test a lot of websites. At any given time we have hundreds of tests being executed simultaneously for our clients. Some of these tests are automated; others are performed manually by test engineers. We always prefer to test against servers on our internal network: this eliminates bandwidth and latency bottlenecks, decreases testing time and prevents strange, irreproducible errors caused by degraded network conditions. In the real world this is not always possible, as many of the applications we test are cloud-based, and replicating the infrastructure they are hosted on is expensive and time-consuming.

Latency

Our main office is located in Beijing. Round-trip latency to the US is between 150 and 250 ms, while a round trip to Europe takes 180 to 280 ms. Increased latency and congested intercontinental links can lead to packet loss and lower throughput. All of this is bad news if we want to execute our tests quickly and reliably.
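To get a feel for what these numbers mean for a single page load, here is a rough back-of-the-envelope sketch. It assumes a hypothetical page that needs 40 sequential round trips and an intranet round trip of 1 ms; both figures are illustrative assumptions, not measurements, and the calculation ignores bandwidth and packet loss entirely.

```python
def round_trip_wait_seconds(sequential_round_trips: int, rtt_ms: float) -> float:
    """Lower bound on time spent waiting for the network, ignoring bandwidth."""
    return sequential_round_trips * rtt_ms / 1000.0


# Hypothetical page: 40 round trips that cannot be parallelised.
ROUND_TRIPS = 40

# The 1 ms intranet figure is an assumption; the others are the ranges quoted above.
scenarios = [
    ("intranet (assumed 1 ms)", 1),
    ("US, best case", 150),
    ("Europe, worst case", 280),
]

for label, rtt_ms in scenarios:
    wait = round_trip_wait_seconds(ROUND_TRIPS, rtt_ms)
    print(f"{label}: {wait:.2f} s")
```

Under these assumptions the pure waiting time grows from a few hundredths of a second on the intranet to between 6 and roughly 11 seconds over an intercontinental link, and that is per page, per test.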

Deciding what to cache

Luckily there are a few things that we can do. When you use Fiddler to analyze the webpages that you visit or test, you will find that for most pages the majority of the content is static. In theory you only need to fetch this content from the internet once. Any subsequent request for the same file can be served from a caching proxy placed in your intranet. A caching proxy like Squid is more efficient than the built-in cache found in most modern browsers because it benefits all clients on your intranet. This matters all the more because most tests, in particular automated ones, start with a fresh session and therefore an empty browser cache.
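One way to check the "mostly static" claim for a particular site is to export the captured sessions as an HTTP Archive (HAR) file and total the bytes by content type. The sketch below assumes the standard HAR layout (log.entries[].response.content with mimeType and size fields); the list of MIME type prefixes treated as static is our own illustrative assumption and should be adapted to the site under test.

```python
# Assumption: MIME type prefixes we treat as "static" for this estimate.
STATIC_MIME_PREFIXES = (
    "image/",
    "font/",
    "text/css",
    "text/javascript",
    "application/javascript",
)


def static_share(har: dict) -> float:
    """Return the fraction of response bytes whose MIME type looks static."""
    static_bytes = 0
    total_bytes = 0
    for entry in har["log"]["entries"]:
        content = entry["response"]["content"]
        # Guard against missing, null or negative placeholder sizes.
        size = max(content.get("size") or 0, 0)
        mime = (content.get("mimeType") or "").lower()
        total_bytes += size
        if mime.startswith(STATIC_MIME_PREFIXES):
            static_bytes += size
    return static_bytes / total_bytes if total_bytes else 0.0


# Tiny hand-written sample instead of a real capture.
sample_har = {
    "log": {
        "entries": [
            {"response": {"content": {"mimeType": "image/png", "size": 900}}},
            {"response": {"content": {"mimeType": "text/html", "size": 100}}},
        ]
    }
}

print(f"static share: {static_share(sample_har):.0%}")
```

For a real capture, load the exported file with json.load and pass the resulting dictionary to static_share.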

We can divide the web content that is downloaded during our web tests into three categories:

  • Static content: e.g. JavaScript libraries, images, fonts
  • Dynamic content: e.g. the results of your search query
  • Resources that are not available in China: e.g. Twitter / Google / Facebook

Static content is generally safe to cache. Think of libraries such as jQuery and AngularJS, background images, custom controls and font files.

Dynamic content is generated by the server and relies on user input. Caching this content is not advisable.

Blocked or unstable resources are an interesting category, and a phenomenon that is not exclusive to China. They can severely slow down the loading of a webpage: a browser will generally wait 60 to 120 seconds for a request to time out, during which it may not be able to load other resources on the page. We deal with these either by redirecting them to localhost (resulting in a 404 Not Found) or by routing the specific request over a gateway that is connected to a VPN.
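The three categories above can be approximated with a simple URL classifier. This is only a sketch: the extension list and the blocked-host list are illustrative assumptions, and treating every URL with a query string as dynamic is a deliberately conservative choice rather than a rule taken from any tool.

```python
from urllib.parse import urlsplit

# Assumptions for illustration; tune both lists per project.
STATIC_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".svg",
                     ".woff", ".woff2", ".ttf")
BLOCKED_HOSTS = ("twitter.com", "google.com", "facebook.com")


def classify(url: str) -> str:
    """Return 'blocked', 'static' or 'dynamic' for a request URL."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    # Match the listed domain itself and any of its subdomains.
    if any(host == h or host.endswith("." + h) for h in BLOCKED_HOSTS):
        return "blocked"

    # A query string suggests user input, so do not cache it.
    if parts.query:
        return "dynamic"

    if parts.path.lower().endswith(STATIC_EXTENSIONS):
        return "static"

    # When in doubt, treat the content as dynamic.
    return "dynamic"


for url in (
    "https://example.com/js/jquery.min.js",
    "https://example.com/search?q=squid",
    "https://platform.twitter.com/widgets.js",
):
    print(classify(url), url)
```

Requests classified as blocked are the candidates for the localhost redirect or the VPN gateway; only those classified as static should be considered for caching.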

Best practices when using Squid Cache

Some of the guidelines we used when implementing Squid Caching:

  • 1. Start with an empty cache every morning.
  • 2. Cache both HTTP and HTTPS connections. To intercept HTTPS you need to generate your own root certificate, install its public certificate on the clients in your network and use the private key to re-sign any HTTPS sessions on your Squid caching server.
  • 3. Do not route your e-mail client, Skype sessions and similar traffic through the Squid cache.
  • 4. Analyze the websites you want to cache with Fiddler and, if needed, manually configure filters for which files or file patterns to cache.
  • 5. If you’re in doubt whether content is safe to cache: don’t cache it.
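As an illustration of guideline 4, the file patterns found during analysis can be turned into a Squid refresh_pattern line (the directive takes a regular expression followed by a minimum age in minutes, a percentage and a maximum age in minutes). The helper below is a hypothetical sketch; the default values of 60 minutes, 20% and 720 minutes are assumptions chosen to fit a cache that is emptied every morning, not recommended settings.

```python
import re


def refresh_pattern(extensions, min_minutes=60, percent=20, max_minutes=720):
    """Build one Squid refresh_pattern line for the given file extensions.

    The numeric defaults are illustrative assumptions only.
    """
    # Escape each extension so it is matched literally inside the regex.
    alternatives = "|".join(re.escape(ext) for ext in extensions)
    # -i makes the match case-insensitive; "$" limits it to URLs that end
    # in the extension, so URLs carrying a query string are left alone.
    return (
        rf"refresh_pattern -i \.({alternatives})$ "
        f"{min_minutes} {percent}% {max_minutes}"
    )


print(refresh_pattern(["js", "css", "png", "woff2"]))
```

The generated line is only a starting point for squid.conf. Squid evaluates refresh_pattern lines in order, so check the result against the Squid documentation and your existing rules before relying on it.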

We found that caching frequently used resources saves us precious time, which we can use to execute more and better tests!