- Everything You Know About Fitness is a Lie – The author talks to various trainers and puts forward the theory that basic weights and other strength-building exercises are the real key to fitness
- Radio NZ’s New Hosting – Short writeup of Radio New Zealand moving their hosting from in-house to a remote data centre
- Architecting Scalable Applications in the Cloud – First of a short series of articles.
- Philip Roth and Wikipedia – A Wikipedian responds to Philip Roth’s article in The New Yorker. I’m afraid Roth doesn’t come off very well.
Speeding up Varnish Cache
TL;DR: Don’t send HTTP requests through more tests than you need to, especially 100+ item regexp comparisons.
At work we run Varnish Cache in front of our websites. It caches web pages so that when people request the same page it is served quickly out of cache rather than being generated by the application each time. Since pages only change every few minutes (at most) and things like pictures barely ever change, we can serve 98-99% of requests out of cache and run the website with a lot fewer servers.
Last week I noticed that one of our Varnish servers (we have several) was using about 60% of CPU (60% on each core) to serve just 2,600 hits/second. In the past we’ve seen the servers get a little overloaded at around 4,000-5,000 hits/second, while people on the Varnish mailing list report getting over 10,000 hits/second easily. I decided to spend a few hours playing with our Varnish config to see if I could speed things up.
I suspected the problem was with some regexps we had in vcl_recv, which is run for every request received by the cache. But first I set up a test environment to help me trace the problem.
- Install Varnish on a test VM ( I’ll call it “server1” ) with our production config
- Run varnishncsa on a production box for a while and copy over the logs ( I copied 1.6 million lines ) to another VM “client1”
- Install some http benchmarking software on client1
- Both server1 and client1 had a single CPU ( single core ), with 1GB of RAM on server1 and 750MB on client1.
I actually found the benchmarking software to be a pain. I tried httperf, ab, and siege and found they all had their limitations. The hardest bit was that we run multiple domains and it was hard to tell a program to “run through this list of URLs and send them all to this IP”, so I ended up just creating about 30 hosts file entries.
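The hosts file entries can be generated from the URL list rather than typed by hand. This is a minimal Python sketch, not what I actually ran: the function name `hosts_entries` and the address 192.0.2.10 are made up for illustration.

```python
from urllib.parse import urlsplit


def hosts_entries(server_ip, urls):
    """Build one hosts-file line per distinct hostname found in `urls`."""
    hostnames = set()
    for url in urls:
        hostname = urlsplit(url).hostname
        if hostname:  # skip lines that are not absolute URLs
            hostnames.add(hostname)
    # Every hostname is pointed at the test Varnish server.
    return [f"{server_ip}\t{name}" for name in sorted(hostnames)]


if __name__ == "__main__":
    sample = [
        "http://www.example.com/",
        "http://media.example.com/logo.png",
        "http://www.example.com/news.cfm",
    ]
    # 192.0.2.10 is a placeholder address for server1.
    print("\n".join(hosts_entries("192.0.2.10", sample)))
```

The printed lines can be appended to the hosts file on client1 so that every production hostname resolves to the test server.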
After a bit of playing around I ended up using siege for the testing, with the command line “siege -c 500 -d 1”, which generated 1000 requests/second, and the option “internet = true” (to pick random URLs from the file). I used a list of 100,000 URLs from production. I found that sending this many requests used about 68% of the CPU on server1, while sending more requests or using a larger list of URLs tended to overload client1. For the back-end I just used our production servers.
To test, I ran siege for at least a minute ( to get the cache full ) and then ran vmstat and varnishstat to get the CPU usage and hit rate once things settled down. The CPU usage jumps around by a couple of percent but the trends are usually obvious.
Now for some actual testing. First I tested our existing config:
Original Config CPU: 70% Hit rate: 98%
Now I switched back to the default Varnish config. The hit rate drops a huge amount since the default config doesn’t do the things our config does, like removing “?_=1354844750363” from the end of URLs.
Default Config CPU: 41% Hit rate: 86%
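To see why the hit rate drops, note that a cache-busting parameter makes every request look like a brand new URL. The cleanup can be sketched in Python roughly as follows. The function name and the regexp are assumptions for illustration, not our actual VCL.

```python
import re

# Matches a cache-busting parameter such as "_=1354844750363".
# The exact pattern is an assumption for illustration.
_CACHE_BUSTER = re.compile(r"(?<=[?&])_=\d+&?")


def normalise_url(url: str) -> str:
    """Strip the cache-busting parameter so equivalent URLs share a cache key."""
    cleaned = _CACHE_BUSTER.sub("", url)
    # Remove a dangling "?" or "&" left behind when the parameter came last.
    return cleaned.rstrip("?&")
```

With this in place “/news.cfm?_=1354844750363” and “/news.cfm” map to the same cached object instead of two separate misses.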
Now I started adding back bits from our config to see how load increased.
Default + vcl_fetch + vcl_miss + vcl_error CPU: 39% Hit rate: 83%
Above + vcl_deliver CPU: 44% Hit rate: 82%
Above + production backend configuration CPU: 41% Hit rate: 82%
Above + vcl_hit and expire config CPU: 43% Hit rate: 82%
Above + vcl_recv CPU: 68% Hit rate: 98%
So it appears that the only bit of the config that makes a serious difference is the vcl_recv.
Our vcl_recv is 500 lines long and looks a bit like VCLExampleAlex from the Varnish website. It included:
- 6 separate blocks of “mobile redirection” code. 2 of these or’d about 120 different brands in the User-Agent header. 3 of these applied to production and there were staging copies of each of them too.
- About 20 groups of URL cleanup routines, many with “if ( req.http.host ==” so they applied to only one domain.
- Several other per-domain routines ( which did things like set all requests for some domain to “pass” through the cache )
- Most of the “if” statements were fuzzy regexp matches like “if ( req.url ~ "utm_" )”
Overall there were 32 “if” statements that every request went through.
I decided to try to reduce the number of lookups the average request would have to go through. I did this by rearranging the config so that the tests that applied to all requests came first, and then I split the rest of the config by domain:
if ( req.url ~ "utm_" ) {
    # regsub() replaces only the first match, so it is called twice
    set req.url = regsub(req.url, "\&utm_[^&]+","");
    set req.url = regsub(req.url, "\&utm_[^&]+","");
}
if ( req.http.host == "www.example.com" ) {
    set req.url = regsub(req.url, "&ref=[^\&]+","");
} else if ( req.http.host == "media.example.com" ) {
    # Nothing to do for media domain
} else if ( req.http.host == "www.example.net" ) {
    return (pass);
} else {
    # Obscure domains
    if ( req.http.host == "staging.example.com" ) {
        return (pass);
    }
}
Specific bits I did included:
- Put the most popular domains at the top of the config so they would be matched first
- Put domains that got very few hits into the default “else” rather than wasting a test on their own “else if”
- The media domain got the 2nd highest number of hits but had no special configs, so I gave it its own “empty” routine rather than letting it fall through to the default “else”
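If you want to experiment with the ordering outside Varnish, the same control flow can be mirrored in Python. This is only a sketch of the example config above, not our production rules. The “lookup” action name is my own label for “carry on to the normal cache lookup”, and re.sub() removes every match where regsub() removes only the first.

```python
import re


def clean_request(host: str, url: str):
    """Return (action, url): global rules first, then one branch per domain."""
    # Rules that apply to every domain run first.
    if "utm_" in url:
        url = re.sub(r"&utm_[^&]+", "", url)

    # Busiest domains are tested first so most requests exit the chain early.
    if host == "www.example.com":
        url = re.sub(r"&ref=[^&]+", "", url)
    elif host == "media.example.com":
        pass  # nothing to do, but stops the request reaching the final else
    elif host == "www.example.net":
        return ("pass", url)
    else:
        # Obscure, low-traffic domains share the default branch.
        if host == "staging.example.com":
            return ("pass", url)
    return ("lookup", url)
```

A request for the media domain now costs one cheap substring test and two string comparisons, no matter how many rules the other domains accumulate.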
So, recalling what I had previously, here is the improvement:
Original Config CPU: 70% Hit rate: 98%
Split vcl_recv by domain CPU: 44% Hit rate: 98%
I then removed the per-rule domain tests since the rules were now within a single test for that domain and got:
Don't check domain in each rule CPU: 42% Hit rate: 98%
After some more testing I deployed this to production.
The next step was to update the mobile redirect rules so that instead of running:
if ( req.url ~ "/news.cfm" || req.url ~ "/article.cfm" || req.url == "/" ) && req.http.user-agent ~ ".*(Sagem|SAGEM|Sendo|SonyEricsson|plus another 100 terms
for each request to the domain, I wrapped the following around them:
if ( req.url == "/" || req.url ~ ".cfm" ) { }
so that only a small percentage of requests would need to be processed by the “Giant regexp of Doom™”.
I tested this and got:
Wrapper around mobile redirects CPU: 36% Hit rate: 98%
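The idea behind the wrapper is just “cheap test first, expensive test only if needed”. A minimal Python sketch of it follows. The function name is hypothetical, the brand list is cut down to the three names quoted above, and the inner per-page checks are left out for brevity.

```python
import re

# Stand-in for the long alternation of handset brands. Only the names
# quoted above are included; the real list had over 100 terms.
MOBILE_UA = re.compile(r"Sagem|SAGEM|Sendo|SonyEricsson")


def needs_mobile_redirect(url: str, user_agent: str) -> bool:
    """Run the expensive User-Agent regexp only for pages that can redirect."""
    # Cheap test first: most requests (images, CSS, JS) stop here.
    # A plain substring check approximates the ".cfm" regexp in the wrapper.
    if not (url == "/" or ".cfm" in url):
        return False
    return MOBILE_UA.search(user_agent) is not None
```

Since the bulk of requests are for images and other static files, very few ever reach the User-Agent regexp.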
On an actual production server I got the following with around 600 hits/second
Original Config CPU: 25%
Split by Domains CPU: 18%
Split by domains + mobile wrapper CPU: 11%
So overall a better than 50% reduction in CPU usage.
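For anyone checking the arithmetic, the relative reduction is the drop divided by the starting figure. A small Python sketch using the CPU numbers quoted above (the helper name is my own):

```python
def relative_reduction(before: float, after: float) -> float:
    """Return the percentage drop going from `before` to `after`."""
    return (before - after) / before * 100.0


# Production server: 25% CPU down to 11% CPU.
production = relative_reduction(25, 11)
# Test VM: 70% CPU down to 36% CPU.
test_vm = relative_reduction(70, 36)

if __name__ == "__main__":
    print(f"production: {production:.1f}%  test VM: {test_vm:.1f}%")
```

That works out to roughly 56% on the production server and just under 49% on the test VM.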
Links: A/B testing, Road safety, Where IT goes to die, The Cheapest Generation
- 23 Tips on How to A/B Test Like a Badass – Really great article, mostly applicable to ecommerce sites but plenty of general ideas. I’d like to say I do this at work but unfortunately the culture isn’t there.
- What an RAF pilot can teach us about being safe on the road – Interesting view on how easy it is for motorists just “not to notice” cyclists. I commonly see cyclists these days with flashing lights (front and back) turned on during the day to help improve their visibility.
- The Cheapest Generation – How the consumer wants of young adults differ from those of the previous generation(s). Phones and walkable neighbourhoods are in; cars and big houses in the suburbs are no longer as important.
- Where IT goes to die – How large company/Enterprise IT works from an author with experience at smaller companies. He is accurate and it is not very pretty.
Links: PIN numbers, Military leadership, Eggs, The Web stack
- Analysis of PIN Numbers – What are the most and least common 4-digit PIN numbers?
- General Failure – Does the US military no longer punish or demote bad generals? “A culture of mediocrity has taken hold within the Army’s leadership rank—if it is not uprooted, the country’s next war is unlikely to unfold any better than the last two.”
- Why American Eggs Would Be Illegal In A British Supermarket, And Vice Versa – Different approaches to food safety in the US and Britain.
- An Overview of the Web – What happens when your browser requests a web page. A step-by-step walk through the various layers of computers, protocols and programs. Understandable by someone a little technical.
Sysadmin Miniconf proposals close in 2 days for linux.conf.au 2013
Once again I’m helping to organise the Sysadmin Miniconf at Linux.conf.au. This time we’ll be in Canberra in the last week of January 2013.
This is a big reminder that proposals for presentations at the Miniconf close at the end of October. If you have a proposal you need to submit it now.
Even if you’ve not 100% finalised your idea let us know now and we can work with you. If we don’t know about it then it is very hard for us to accept it.
We have several proposals that have already been accepted but are very keen to get more.
Links: Mars!, The cdbaby model, Robots, the rest of the world on the web
- Comparison – Mars Curiosity Descent – Ultra HD 30fps Smooth-Motion (YouTube) – Comparison of the original and a greatly improved version.
- If newspapers were run like CDBaby.com – An alternative business/operating model for online news.
- New Wave of Deft Robots Is Changing Global Industry – Robots move beyond the traditional car-assembly type jobs.
- 10 lessons for uncultured web developers – A few little things that trip up web developers, especially those who think the valley is the world.
Links: Weather Prediction, Advertising, 3rd world mobile web and Database failover
- Handling Database Failover at Craigslist – Jeremy Zawodny links to a few recent articles on automated database failover and outlines what he does at Craigslist. The consensus seems to be that the added complexity of automating failover results in less reliability than improving the process of doing it manually.
- The Weatherman is not a Moron – An overview of modern weather prediction and how it has improved. In just 25 years the error in the 3-day landfall estimate for Gulf of Mexico hurricanes has dropped from 350 to 100 miles.
- Stop Advertising in Photo Magazines – Head West to the Web – Is magazine advertising a waste of time? In this article the results are mixed but mostly “No”.
- See your site like the rest of the world does. On the Nokia X2-01 – Fascinating survey of the mobile web browser stats in the BRIC and Next 11 countries.
Squatters hit .kiwi.nz
A couple of days ago the .kiwi.nz second level domain was opened up. Within a day over 1000 domains were registered.
But I was wondering who is registering the domains, so I thought I’d have a quick look through some top brands and domains:
- telecom.kiwi.nz – Squatter
- vodafone.kiwi.nz – Squatter
- 2degrees.kiwi.nz – Squatter
- google.kiwi.nz – Squatter
- yahoo.kiwi.nz – Squatter
- bing.kiwi.nz – Squatter
- youtube.kiwi.nz – Squatter
- facebook.kiwi.nz – Squatter
- trademe.kiwi.nz – Squatter
- stuff.kiwi.nz – Squatter
- nzherald.kiwi.nz – Squatter
- msn.kiwi.nz – Available
- wikipedia.kiwi.nz – Available
- asb.kiwi.nz – Squatter
- bnz.kiwi.nz – Squatter
- westpac.kiwi.nz – Squatter
- kiwibank.kiwi.nz – Available
- nationalbank.kiwi.nz – Squatter
- tv3.kiwi.nz – Squatter
- tvnz.kiwi.nz – Squatter
- sky.kiwi.nz – Legit Owner
- airnz.kiwi.nz – Legit Owner
- skykiwi.kiwi.nz – Squatter
- coke.kiwi.nz – Squatter
- pepsi.kiwi.nz – Squatter
Several in the list above (and I assume other domains) have been registered by the same few people. Overall not a good look, but I assume things will calm down after a few lawyers’ letters and dollars are exchanged.
Where the NZ eyeballs are
So I was wondering what the market share of New Zealand ISPs is these days. Do Telecom and TelstraClear still completely dominate the market, or have the smaller ISPs caught up?
Earlier this week I grabbed a sample of the weblogs from a large New Zealand website and checked to see which networks the readers came from.
Using the tools at Team Cymru I looked up the origin ASN ( roughly, the ISP ) for each network. This enabled me to work out what percentage of the traffic came from which ISPs.
- I’ve included the data from 2 times below, a daytime one for the “business traffic” and an evening sample for “home traffic”
- Data during both periods is over 20Mb/s and from a general interest New Zealand website (anonymous as a condition of releasing the data)
- Only requests that originate from New Zealand are included.
- There may be a bias of traffic towards Auckland
- Note that only the largest sites will have their own networks and AS numbers and run BGP. Thus very large companies and other organisations will sometimes be included as part of their ISP’s total.
- Stats below are sorted by bandwidth used
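The percentage calculation described above can be sketched in Python roughly as follows. This is not the script I used. It assumes the weblogs have already been reduced to (client IP, bytes sent) pairs and that the ASN lookups are available as a dict; the function name and both inputs are hypothetical.

```python
from collections import defaultdict


def share_by_asn(requests, ip_to_asn):
    """Return [(percent, asn), ...] sorted by bandwidth, largest first.

    requests: iterable of (client_ip, bytes_sent) pairs from the weblogs.
    ip_to_asn: dict mapping client IP to origin ASN from a separate lookup.
    """
    totals = defaultdict(int)
    for ip, size in requests:
        asn = ip_to_asn.get(ip)
        if asn is None:
            continue  # skip addresses with no known origin ASN
        totals[asn] += size

    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    shares = [
        (round(100.0 * size / grand_total, 2), asn)
        for asn, size in totals.items()
    ]
    return sorted(shares, reverse=True)
```

For example, three requests of 600, 300 and 100 bytes, where the first two map to AS4771 and the last to AS4768, give 90% and 10% respectively.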
I was interested to see how much Telecom (including the Netgate brand which is probably Chorus) still dominates. I was actually surprised to see that Telecom dominates the home market much more than the business market whereas I would have expected the other way around.
Between Noon and 1pm on Thursday the 6th of Sept.
Percent  ASN    ASN Description
36.82    4771   Telecom New Zealand Ltd.
15.10    4768   TelstraClear Ltd
7.41     4648   Netgate
5.60     17746  Orcon Internet
3.03     7657   Vodafone NZ Ltd.
2.69     9889   Maxnet / Vocus
2.33     9503   FX Networks Limited
1.91     38793  NZCOMMS. Mobile phone Company. New Zealand
1.90     9790   CallPlus Services Limited
1.73     10022  Internet access for Datacom Systems Auckland
1.41     23655  Snap Internet Limited
1.36     9431   The University of Auckland
1.28     9325   Telecom XTRA, Auckland
1.05     4770   ICONZ Ltd
0.92     17492  Vector Communications LTD.,
0.88     24183  DTS LTD
0.79     17435  WorldxChange Communications LTD
0.73     18353  Revera NZ Limited
0.71     9245   COMPASS NZ
0.66     18021  Unisys NZ, IT Outsourcer,
0.56     55454  Orcon Internet Ltd
0.48     9872   ITNet Ltd
0.41     17412  Woosh Wireless
0.39     45946  Air New Zealand Limited
0.36     9432   University of Canterbury
0.36     38305  The University of Otago
0.34     2570   Telecom New Zealand Ltd
0.32     9303   KC Computer Service Ltd.,
0.30     24324  Kordia Limited
0.29     17649  DMZGlobal Ltd
0.28     55872  BayCity Communications Limited
0.26     9345   Paradise Net
0.25     23838  Solarix Networks Limited
0.25     17705  InSPire Net Ltd
0.25     10200  Web hosting provider and ISP connectivity.
0.24     9876   Airnet
0.23     24318  Ministry of Education
0.23     17663  Housing New Zealand Corporation Internet
0.21     2687   AT&T Global Network Services - AP
Between 8pm and 10pm on Wednesday the 5th of Sept.
Percent  ASN    ASN Description
62.71    4771   Telecom New Zealand Ltd.
12.97    4768   TelstraClear Ltd
6.34     17746  Orcon Internet
2.89     7657   Vodafone NZ Ltd.
2.74     9790   CallPlus Services Limited
1.47     17435  WorldxChange Communications LTD
1.26     17412  Woosh Wireless
0.96     38793  NZCOMMS. Mobile phone Company. New Zealand
0.92     23655  Snap Internet Limited
0.81     4648   Netgate
0.63     9889   Maxnet / Vocus
0.49     55872  BayCity Communications Limited
0.46     4770   ICONZ Ltd
0.42     17705  InSPire Net Ltd
0.38     17994  Appserv Limited
0.37     9872   ITNet Ltd
0.35     45267  Lightwire LTD
0.32     9325   Telecom XTRA, Auckland
0.27     9245   COMPASS NZ
0.25     45946  Air New Zealand Limited
0.20     10200  Web hosting provider and ISP connectivity.
0.17     9303   KC Computer Service Ltd.,
0.16     9345   Paradise Net
0.16     45177  Layer2.co.nz
0.16     38305  The University of Otago
0.15     23735  Enternet Online Ltd
0.13     9431   The University of Auckland
0.13     24324  Kordia Limited
0.11     17492  Vector Communications LTD.,
0.10     9876   Airnet
0.10     55853  Megatel
Links: 8bit T2, IQ, Baby names, Low Power computers
- How low (power) can you go? – Charlie Stross looks into the future (as Sci-Fi authors often do) and predicts what 2032 could look like with one square millimetre computers everywhere.
- American Baby Names Are Somehow Getting Even Worse – Bastian, Sincere, Copelia, Luxx
- Race, IQ, and Wealth – Is IQ just a function of GDP/Wealth?
- Terminator 2 – 20 years ( July 3 1991 – July 3 2011) (YouTube) – You may remember “8-bit trip” by Rymdreglage a couple of years ago. Here is their stop-motion tribute to T2 ( which I’ve seen 5 times in the theatre).