lca2013 – Day 2 – Session 3

Getting your talk accepted: write a convincing talk proposal – Jacinta Richardson

Background

  • On a lot of papers committees
  • LCA, SAGE-AU, OSDC

Pick Conference you want to speak at

  • Some easier to get into than others
  • SAGE-AU 50%
  • OSDC 50%
  • YAPC easy
  • OS Bridge – harder
  • OSCON – 30-50% chance
  • everywhere else – medium
  • LCA – really hard. Receives 5x as many proposals as it accepts

Speaker rewards

  • Free entry

Call for Proposals

  • Not always widely distributed
  • Join mailing lists, watch websites, ask

Write Abstract

  • The hard bit
  • Some confs narrow or wide on talk topics
  • Audience 1 – The programme Committee
    • Doesn’t know if you are a good speaker
    • Look for link to video of you speaking
    • If no video then they assume your speaking is as good/bad as your writing
    • Check spelling and writing style
    • Tell a story but not too long
    • Not academia, avoid insane amounts of jargon
    • Paragraphs good. Might only read the first
    • “read first sentence of each paragraph”
  • Audience 2 – The attendees
    • Why your talk?
    • Against other options
    • Good title
    • Skip over – “X for fun and profit”, “making X sexy”, “What I did on my X holidays”
    • 5 or fewer words for title
    • Convincing first paragraph or even the first sentence

Ask for help

  • From usergroups
  • Or people you have met at LCA
  • People on the papers committee

Enabling Compute Clusters atop OpenStack – Enis Afgan

cloudman – usecloudman.org

  • People want a ready to use service, something they can just sit down and start using
  • Bridge between SaaS and IaaS
  • Allows somebody to create a pre-configured compute cluster

Deploy

  • Start with Cloud account
  • Start master instance
  • Use Cloudman web interface
  • Multiple types of clusters available

Galaxy Cluster

  • Used for Genomic Science
  • Web based platform

Value Added features

  • Customise your instance, add tools, add image, snapshot images, share images
  • Auto scaling
  • Flexible architecture (OpenStack, Amazon, etc)

Open Programming Lightning Talks

Adam Harvey

  • Not all sites are Facebook
  • Big frameworks are overkill for some people
  • Microframework – “Under 1000 lines of code”
  • Silex
    • autoloads in lots of extra code
    • 33,086 lines of php code being pulled in
    • Not very micro 🙁
  • Slim
    • Autoload – 6000 lines
  • Flight
    • Autoload – 800 lines of code
  • Maybe just use raw PHP instead of a framework

Paul L – The Poor Man’s Sandbox

  • Allow people to enter python code into program
  • Way to stop them doing bad stuff was over my head

Dave Boucher – Yak Shaving

  • Transactional memory – red/black tree insertion
  • Graphically show how RB tree inserts something
  • Use SVG library in python
  • SVG has animations
  • Pretty!

Tom Sutton – Safe Strings in Haskell APIs

  • Turn on OverloadedStrings
  • Create custom datatype
  • Can create special string types that don’t support things like concatenation if you don’t want them to (eg a special type for SQL commands)
  • andhetalkedsofastattheendIcouldn’tunderstandhissolution

Roger Barnes – poker, packets, pipes, and python

  • Wanted a poker buddy
  • packet capture between Android app and server while playing online poker
  • ngrep
  • Hack your router to get Linux on it
  • Grab stream of info – all plaintext!
  • ipython notebook
  • parsing game, map card values
  • Need live capture data
  • Solution: ssh + ngrep + pipes
  • watch out for buffering
  • grab poker values and hints into lookup tables
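
The "ssh + ngrep + pipes" idea with the buffering warning can be sketched roughly as below. This is only my own illustration, not Roger's code: the child process is a stand-in for the real ssh/ngrep command, and the "card ..." lines are made-up sample output. The point is to read the pipe line by line so each line is handled as soon as it arrives instead of waiting for a big buffer to fill.

```python
import subprocess
import sys


def stream_lines(command):
    """Yield each line of a child's stdout as soon as it is emitted."""
    proc = subprocess.Popen(
        command,
        stdout=subprocess.PIPE,
        bufsize=1,   # line-buffered on our side of the pipe
        text=True,
    )
    try:
        for line in proc.stdout:
            yield line.rstrip("\n")
    finally:
        proc.stdout.close()
        proc.wait()


# Stand-in for the real capture command (hypothetical), e.g. ssh to the
# router running ngrep. "-u" keeps the child's own stdout unbuffered.
demo_command = [
    sys.executable, "-u", "-c",
    "print('card 7H'); print('card KD')",
]
captured = list(stream_lines(demo_command))
```

The sender has to avoid buffering too, otherwise nothing arrives until its buffer fills no matter how the reader is written.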

Benner Leslie

  • Python and Haskell
  • Embed Haskell code into python
  • Wanted to keep writing python most of the time and only use Haskell where it was needed
  • Combine using foreign C-types

Nico – LatProc and Clockwork

  • Library of tools for process control
  • Controls machine that gets wool samples from bales of wool
  • latproc on github

Duncan Rowe – Some commands I’ve developed over the years

  • pd – keeps recent dirs in stack, allows you to skip to them
  • sfl – searches for strings in multiple paths
  • bak – backup a file, just renames to filename.bak, various other options
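
As a rough idea of what the simplest form of "bak" does, here is a minimal sketch in Python. The function name and behaviour are my assumptions from the one-line description above (the real command has other options that I didn't note down).

```python
import os
import tempfile


def bak(path):
    """Rename path to path + '.bak' and return the new name."""
    backup = path + ".bak"
    os.rename(path, backup)
    return backup


# Try it out on a throwaway file
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "notes.txt")
    with open(original, "w") as handle:
        handle.write("hello")
    backup_name = os.path.basename(bak(original))
    original_still_exists = os.path.exists(original)
```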

Russell Stuart – PAMPython

  • PAM in python
  • PAM modules normally require C
  • Can do various PAM functions in python
  • Good for one-shot commands
  • Sneaks in under all the programs that depend on PAM

Peter Chubb – When Arduino is not enough

  • Stellaris launchpad just $12
  • RaspberryPi $35
  • Odroid U: quad core 2GB RAM, 3W – PC like performance $69

 


lca2013 – Day 2 – Session 2

Open Govt Miniconf – Open data panel

  • Cassie Findlay – State records – NSW
    • Mostly hard copies records but digital archives initiative
    • Started Open Data Project, making metadata as open data
    • API to catalogue
    • OpenGov NSW website
      • Mostly Annual Reports from Govt Agencies
      • Place for other docs to be released in future
      • Copies of last 20 years of Govt Gazettes
      • New API
  • Christen Normal – ACT Govt
    • Even when govt data available, often in hard to use formats
    • Needs to be more in line (format wise) with what the community expects. APIs not PDFs
    • Variety of reasons why people want data, journalists, public, many with short time
    • New website, web servers API, download data
    • Technology not problem, people just have different attitudes
    • Expensive, not my job, people will find something was done wrong
  • John Billia – Aus Govt ICT management Office
    • Covers emerging technologies
    • Looking at Big Data over last 6 months
    • If used effectively will lead to better govt
    • Concerns about privacy, wishes of citizens as to if/with whom their data is shared
    • Problems with governance of projects, better data might improve outcomes
    • Example was open source policy a few years ago
      • Checks every govt tender to make sure complies with Open Source policy
    • Also managing Whole of Govt transition to IPv6
      • Just under half Govt Agencies have already enabled IPv6 on external facing websites etc
      • More than half of rest will be up by end of Q1 2013 and rest by Q2 or Q3 2013
  • Julian Carver – Christchurch Earthquake recovery agency
    • NZ Declaration on Open and Transparent data
      • Active programme of release of data
    • Example “Charities register” , data via API
    • Example “ASB Property Guide” – brings up property data
    • Example “Info Connect” , transport data, used by 12 apps, eg traffic delays, looking at traffic flows to predict economic data
    • Best way to make it happen was to build it ourselves and embarrass govt
    • Ask people what they want released
    • Compulsory for agencies to release data
    • data.govt.nz ahead of Aus Central and State Govt totals

 


lca2013 – Day 2 – Session 1

Today I thought I’d go between miniconfs a little, looking for talks I’m interested in. I was originally going to go mostly to OpenStack but some of the talks seem a little specialised and there are some good talks elsewhere. Hopefully everybody will keep close to schedule so switching will be easy.

Introduction to OpenStack – Joshua McKenty

Worked at NASA

Components increasing at a rapid rate but mostly compute, networking and storage

Almost all committers are paid to work on OpenStack

Quantum/Networking: Lots of paid plugins alongside free options

Generally a bit of an intro to the openstack organisation rather than the technology that I was expecting.

The WebKit Browser Engine – An Overview – Dirk Schulze

What is webkit?

  • Just a browser engine
  • Used in Safari, Chrome and others
  • High market share on mobile, lower penetration on desktop

Components

  • Webcore – triggering load pages, actually drawing, calculating layout
  • Javascript engine (alternative used in Chrome is V8)
  • “Webkit” – platform dependent stuff, access graphic libraries (gtk, qt)

Webcore

  • html document + CSS + Javascript
  • Parse HTML docs, get dom elements, create a DOM Tree
  • Everything in DOM tree can be accessed from javascript
  • Render Tree just has stuff needed to render the page
  • Eg <head> element doesn’t get into render tree but is in DOM tree, same with “display: none” elements
  • Some elements are in render tree but not Dom tree (eg anonymous blocks)
  • Renderlayer – render elements above/below other elements ( eg using – style=”z-index: 1;” ), stacking context ( also via opacity, filter or mask )
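
To make the DOM tree vs render tree difference concrete, here is a small Python sketch. It is nothing like WebKit's real code; the class names and the tiny rule set are my own assumptions. It only shows the two cases from the talk: <head> and “display: none” elements stay in the DOM tree but never get a render object.

```python
from dataclasses import dataclass, field


@dataclass
class DomNode:
    tag: str
    style: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


@dataclass
class RenderNode:
    tag: str
    children: list = field(default_factory=list)


NON_RENDERED_TAGS = {"head"}  # simplified: only the example from the talk


def build_render_tree(node):
    """Return a render tree for node, or None if it is not rendered."""
    if node.tag in NON_RENDERED_TAGS or node.style.get("display") == "none":
        return None
    render = RenderNode(node.tag)
    for child in node.children:
        child_render = build_render_tree(child)
        if child_render is not None:
            render.children.append(child_render)
    return render


def tags(node):
    """Flatten a tree into a list of tag names, parents first."""
    found = [node.tag]
    for child in node.children:
        found.extend(tags(child))
    return found


dom = DomNode("html", children=[
    DomNode("head", children=[DomNode("title")]),
    DomNode("body", children=[
        DomNode("p"),
        DomNode("div", style={"display": "none"}),
    ]),
])
dom_tags = tags(dom)
render_tags = tags(build_render_tree(dom))
```

Here dom_tags still contains head, title and the hidden div, while render_tags only has html, body and p. The reverse case (anonymous blocks that exist only in the render tree) is not covered by this sketch.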

Render Object

  • Layout – dimension of the element ( height, width, plus borders, positioning )
    • So if updated only need to repaint affected area
  • Paint – Draw element on screen
  • Multiple paint phases called from RenderLayer
    • Background, borders
    • floating content
    • inline content
    • Based on CSS box model, code follows spec
  • Hit testing
    • Pointer events – hover, onmouseover
  • SVG – different from other renderers
    • One element – one renderer
    • No CSS box model, no anonymous blocks, different handling of transformations

Implementing new elements and Interfaces

  • Look at the specification

 


Linux.conf.au – Day 2 Keynote: Radia Perlman

Tuesday’s keynote was by Radia Perlman, on Network Protocol Folklore

General points about protocol design

  • Need more people in the field that hate computers
  • Autoconfiguration
  • Knobs if you want them
  • Be evolutionary if possible

Networking is taught as if TCP/IP arrived from the sky. As if nothing else ever existed

She teaches by looking at a problem and looking at how different protocols solve it

Comparing technologies

  • Nobody knows both of them
  • Everybody is partisan
  • Both moving targets
  • Hard to compare via benchmarks since people are just comparing implementations rather than actual technology

The Story of Ethernet

  • ISO 7 layer model
  • Ethernet was intended to be layer 2, neighbour to neighbour, packets were not meant to be forwarded
  • Ethernet physical was a new type of link, multiple nodes on single link
  • But we haven’t done CSMA/CD networks for years
  • No hopcount field in Ethernet since it never occurred to designers that people would be forwarding the packets
  • People started building networks layer-2 only without layer-3
  • Needed to forward Ethernet between networks, but had to work with existing ethernet packets -> Bridge
  • Spanning tree created a loop-free subset of the topology
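
As a toy illustration of “loop-free subset of the topology”, here is a Python sketch. It is not the real spanning tree protocol (no root election by bridge ID, no path costs, no BPDUs); it just does a breadth-first walk from a root bridge I picked by hand and keeps the links it used. The bridge names are made up.

```python
from collections import deque


def spanning_tree(links, root):
    """Return the links kept active; every other link would be blocked."""
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    visited = {root}
    active = []
    queue = deque([root])
    while queue:
        bridge = queue.popleft()
        for peer in sorted(neighbours.get(bridge, ())):
            if peer not in visited:
                visited.add(peer)
                active.append((bridge, peer))
                queue.append(peer)
    return active


# Three bridges in a loop (A-B-C) plus D hanging off C
links = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")]
tree = spanning_tree(links, "A")
```

With A as root the B–C link is left out, which breaks the loop but also means B and C can no longer talk to each other directly.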

What is wrong with IP as L3 protocol

  • Every link must have own address block
  • Configuration intensive
  • In 1992 Internet could have adopted CLNP but NIH
  • Also things like DHCP and NAT came along, so the advantages of CLNP were not as obvious

TRILL

  • Switches run routing protocol between themselves
  • Replaces spanning tree (switch by switch basis)
  • Wraps ethernet packets in trill headers, forwards to other trill switch and then unwraps
  • Various ways to link which end devices are behind which trill switch
  • Link state routing between trill switches to create shortest paths
  • Can upgrade switches to trill one by one and it “just starts working better”
  • Anything can do the trill encapsulate/decapsulate
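
The “link state routing to create shortest paths” part can be sketched in Python as below. This is a generic Dijkstra calculation, which is what a link state protocol runs once every switch knows the full topology. It is not TRILL-specific code, and the switch names and link costs are invented for the example.

```python
import heapq


def shortest_paths(graph, source):
    """Return the lowest total cost from source to every reachable switch."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        cost_so_far, node = heapq.heappop(heap)
        if cost_so_far > dist.get(node, float("inf")):
            continue  # stale heap entry, a cheaper path was already found
        for peer, link_cost in graph[node].items():
            new_cost = cost_so_far + link_cost
            if new_cost < dist.get(peer, float("inf")):
                dist[peer] = new_cost
                heapq.heappush(heap, (new_cost, peer))
    return dist


# Hypothetical topology: three switches in a triangle with unequal links
topology = {
    "RB1": {"RB2": 1, "RB3": 4},
    "RB2": {"RB1": 1, "RB3": 1},
    "RB3": {"RB1": 4, "RB2": 1},
}
costs = shortest_paths(topology, "RB1")
```

Unlike spanning tree, no link is switched off: RB1 reaches RB3 via RB2 at cost 2, and the direct RB1–RB3 link stays available for other traffic.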

Similar to TRILL

  • VXLAN / NVGRE
  • Wrap IP rather than ethernet

Protocol Folklore

  • Version number
    • What is the purpose?
    • What is the new protocol vs the old protocol?
    • Envelope says how to parse the header (how to parse the packet)
    • Need to define what node does when it sees a different version number
  • Parameters
    • minimise these
  • Latency
    • cut-through – forward before you have received the whole thing
    • Destination should be near the start of the header
    • tcp has checksum so need to see the whole header before you forward

 


Linux.conf.au – Day 1 – Keynote

The Future of the Linux Desktop – Bdale Garbee

General career update, retired from HP
Doing rocket electronics business – Altus Metrum
Involved with Freedom Box

Is 2013 finally the year of the Linux Desktop?

Percentage of people whose main desktop is on a desk is dropping (although desks more common at LCA than possibly elsewhere)

Not everything needs a “desktop” interface (eg fridges, TVs)
Desktop is interface to Universal computer environment

  • Email, web, design, software development, accounting, managing a small business, presentations
  • User is completely in charge

Will Linux ever displace Windows?

  • Some big deployments
  • Cost of change can be high, re-education of users, people know applications instead of concepts
  • OEMs have strong dis-incentives
    • Offering to “reduce their software expense” is a non starter
    • Pre-loaded Windows does not cost OEMs large money, can be net-revenue source
    • Joint marketing opportunities with software vendors

Will Linux ever displace Apple?

  • Walled Gardens can be very beautiful, alluring… captivating
  • Mac OS X
    • Credible technical base
    • Plausible additional target for free software applications
  • iOS
    • Oh please! World’s most proprietary operating environment
    • Hard to ship free software
    • Hostile to hardware devs

Many desktop devs have been lured to mobile

  • Core technology elements certainly relevant
  • So much effort applied to lot of things that didn’t make it
  • Android consumes open source, uses lot of open source but ecosystems aren’t really open
  • They are not universal computing environments

Is this work on mobile useful to us?

  • Can one UI really span all things? The idea is certainly appealing…
  • Interface capabilities vary widely
    • keyboard centric vs touch centric
    • Screen size

Personal computers with Free Software were meant to empower!

  • Any user *can* become a developer, every dev is a user
  • Expanding the user base by reaching more people is laudable
    • Accessibility, multi-lingual, appealing to non-geeks

Feeling abandoned by Linux desktop developers

  • Confusion over target audience
  • Not eating their own dogfood
  • Huge piles of software that interface in complex ways make it hard for users to become developers
  • Was with bunch of Gnome devs, none of them uses evolution to read email
  • Not scratching our own itches

Tradeoffs associated with encompassing apps, system functions

  • eg Gnome desktop relationship with network manager

XFCE4 as Debian Wheezy’s default

  • Gnome too big to fit on single OS install CD
  • Most distributions have moved to DVD image but Debian wants to stay with credible single CD option

Why can Debian easily change desktop without hurting users?
What really matters: Applications

  • Desktop doesn’t really matter, it just gets in the way
  • Want to use any application with any desktop
  • Linux gives us the ability to multitask, don’t take it away

What really matters: Efficiency

  • Buying a faster computer should mean applications run faster
  • For most modern computing, battery life is a really big deal
    • Compositing is expensive
    • Shiny can be fun, but is all the “bling” really worth the cost?
    • Oh, and because my laptop is my desk, please don’t cook my legs

What really matters: Customizing

  • Users want to customise
    • Personalisation is part of taking ownership
    • Investing time is okay as the returned value persists
  • Ability to automate things that are repetitive
    • Scripting is valuable part of Unix heritage
    • Don’t hide access to text interfaces too deeply
  • Coping with the industry infatuation with 1366×768 displays
    • Waste as few pixels as possible on “decorations”
    • Vertical “panel” support

What really Matters: Hackable

  • The real reason I run free software
    • I’ve known since I was a kid I was a “tool maker”
    • Immense gratification from fixing and sharing the fixes
  • I want to be able to understand and fix the software I use
    • Gave up trying to get evolution to build
    • Complexity gets in the way of “casual contribution”, killing the long tail effect!
    • Linux kernel has many devs that just submit single patch
  • I want to be able to share easily with others
    • Any app should work on any desktop
    • Ability to push patches upstream

What does all this mean?

  • Feel good about how Linux is winning in the mobile space!
  • Pick realistic goals… can’t easily convert OEMs from Windows
  • We should build the kind of systems *we* want to use!
  • Collaborative development model is awesomely powerful
    • Differentiate in interoperable ways!
    • Empower users to be developers so we can get long tail effect

Linux.conf.au 2013 – Day 0 Sunday

So I am off to my 10th Linux.conf.au ( every year from 2004 ) in Canberra Australia.

To get there I flew over to Sydney ( leave 7am Sunday, arrive 8am ) and then took the bus down to Canberra (leave 10:15 , arrive 13:30 ) and then the LCA people organised a bus for us from the bus station to the halls.

I’m staying at John XXIII Hall on campus. As you may guess from the name it is a Catholic College which means there are photos of the Pope on the condom machines in the toilets. No aircon in the rooms but that is pretty common, wired Internet was working though.

Signup was pretty efficient this time around (apart from a big queue at start ), we just gave over a piece of paper with our name on it, they typed our name into the software and printed out our badge.

This year the bags were pre-placed in our room with our stuff in it, which is a good idea since it speeds up registration. Stuff in my bag was a t-shirt and a hygiene pack ( with shampoo, conditioner, soap and sunblock ).

I went for a walk with Devdas Bhaget to look for some lunch, it was pretty hot so hard going. We got a little lost too and were unable to find the pub so just ended up getting some snacks for lunch and heading back. Later I went with a group of people out to the pub for dinner.


Links: Parody trailers, Obama fundraising, Japan!, American academic jobs


Links: Exercise, Radio NZ Hosting, Scale in the Cloud, Philip Roth


Speeding up Varnish Cache

TLDR; Don’t send http requests through more tests than you need to, especially 100+ item regexp comparisons.

At work we run Varnish Cache in front of our websites. It caches web pages so that when people request the same page it gets served quickly out of cache rather than having to be generated by the application each time. Since pages only change every few minutes (at most) and things like pictures barely ever change, we can serve 98-99% of requests out of cache and handle the website with a lot fewer servers.

Last week I noticed that one of our varnish servers (we have several) was using about 60% of CPU (60% on each core) to serve just 2600 hits/second. In the past we’ve seen the servers get a little overloaded at around 4-5000 hits/second while people on the varnish mailing list report getting over 10,000 hits/second easily. I decided to spend a few hours playing with our varnish config to see if I could speed things up.

I suspected the problem was with some regexps we had in vcl_recv, which is run for every request received by the cache. But first I set up a test environment to help me trace the problem.

  1. Install Varnish on a test VM ( I’ll call it “server1” ) with our production config
  2. Run varnishncsa on a production box for a while and copy over the logs  ( I copied 1.6 million lines ) to another VM “client1”
  3. Install some http benchmarking software on client1
  4. Both server1 and client1 were single CPU ( single core ) with 1GB of RAM on server1 and 750MB on client1.

I actually found the benchmarking software to be a pain. I tried httperf, ab, and siege and found they all had their limitations. The hardest bit was we run multiple domains and it was hard to tell a program to “run through this list of URLs and send them all to this IP” so I ended up just creating about 30 hosts file entries.

After a bit of playing around I ended up using siege for the testing and using the command line “siege -c 500 -d 1” which generated 1000 requests/second and the options “internet = true” (to pick random urls from the file). I used a list of 100,000 urls from production. I found that sending this many requests used about 68% of the CPU on my server1 while sending more requests or using a larger list of urls tended to overload client1. For the back-end I just used our production servers.

To test I ran siege for at least a minute ( to get the cache full ) and then ran vmstat and varnishstat to get the CPU and hitrate once this settled down. The CPU usage jumps around by a couple of percent but the trends are usually obvious.

Now for some actual testing. First I tested our existing config:

Original Config                              CPU: 70%    Hit rate: 98%

Now I switched back to the default varnish config. The hitrate drops a huge amount since the default config doesn’t do things that our config does, like removing “?_=1354844750363” from the end of URLs.

Default Config                               CPU: 41%    Hit rate: 86%
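
To show why that kind of parameter hurts the hitrate, here is a small Python sketch of the cleanup. The regexp is my own guess at a reasonable pattern, not the rule from our VCL: every request with a different timestamp on the end looks like a different page to the cache unless it is stripped first.

```python
import re

# Assumed pattern: a trailing "_=<digits>" cache-busting parameter
CACHE_BUSTER = re.compile(r"[?&]_=\d+$")


def normalise(url):
    """Strip a trailing cache-busting parameter so URLs share one cache key."""
    return CACHE_BUSTER.sub("", url)


first = normalise("/news.cfm?_=1354844750363")
second = normalise("/news.cfm?_=1354844750999")
with_other_args = normalise("/news.cfm?id=5&_=1354844750363")
```

After cleanup first and second are the same URL, so the second request is a cache hit instead of a miss.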

Now I started adding back bits from our config to see how load increased.

Default + vcl_fetch + vcl_miss + vcl_error   CPU: 39%    Hit rate: 83%
Above   + vcl_deliver                        CPU: 44%    Hit rate: 82%
Above   + production backend configuration   CPU: 41%    Hit rate: 82%
Above   + vcl_hit and expire config          CPU: 43%    Hit rate: 82%
Above   + vcl_recv                           CPU: 68%    Hit rate: 98%

So it appears that the only bit of the config that makes a serious difference is the vcl_recv.

Our vcl_recv is 500 lines long and looks a bit like VCLExampleAlex from the Varnish website. It included:

  • 6 separate blocks of “mobile redirection” code. 2 of these or’d about 120 different brands in User-Agent header. 3 of these applied to production and there were staging copies of each of them too.
  • About 20 groups of URL cleanup routines, many wrapped in “if ( req.http.host ==” so they applied to only one domain.
  • Several other per-domain routines ( which did things like set all requests for some domain to “pass” through the cache )
  • Most of the “if” statements were fuzzy regexp matches like “if ( req.url ~ "utm_" )”

Overall there were 32 “if” statements that every request went through.

I decided to try and reduce the number of lookups the average request would have to go through. I did this by rearranging the config so that the tests that applied to all requests were first and then I split the rest of the config by domain:

# Tests that apply to every domain come first
if ( req.url ~ "utm_" ) {
  # regsub replaces only the first match, so this strips up to two utm_ parameters
  set req.url = regsub(req.url, "\&utm_[^&]+","");
  set req.url = regsub(req.url, "\&utm_[^&]+","");
}
# Then one branch per domain, most popular first
if ( req.http.host == "www.example.com" ) {
  set req.url = regsub(req.url, "&ref=[^\&]+","");
} else if ( req.http.host == "media.example.com" ) {
  # Nothing to do for media domain
} else if ( req.http.host == "www.example.net" ) {
  return (pass);
} else {
  # Obscure domains
  if ( req.http.host == "staging.example.com" ) {
    return (pass);
  }
}

Specific bits I did included:

  • Put the most popular domains at the top of the config so they would be matched first
  • Put domains that got very few hits into the default “else” rather than wasting tests on their own “else if”
  • The media domain got the 2nd highest number of hits but had no special configs so I gave it its own “empty” routine rather than letting it fall through to the default “else”

So recalling what I previously had here is the improvement:

Original Config                              CPU: 70%    Hit rate: 98%
Split vcl_recv by domain                     CPU: 44%    Hit rate: 98%

I then removed the per-rule domain tests since the rules were now within a single test for that domain and got:

Don't check domain in each rule              CPU: 42%    Hit rate: 98%

After some more testing I deployed in production

The next step I did was update the mobile redirect rules so that instead of them going:

if ( ( req.url ~ "/news.cfm" || req.url ~ "/article.cfm" || req.url == "/" )
  && req.http.user-agent ~ ".*(Sagem|SAGEM|Sendo|SonyEricsson|plus another 100 terms

for each request to the domain, I wrapped the following around them:

if ( req.url == "/" || req.url ~ ".cfm" ) {
}

so that only a small percentage of requests would need to be processed by the ” Giant regexp of Doom™ ”
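
The same trick, sketched in Python rather than VCL so it is easy to play with. This is illustration only: the user-agent pattern contains just the four brands shown above (the real one has 100+), the substring test only approximates the VCL match, and the sample requests are made up. A cheap test on the URL goes first so the expensive regexp only runs for the few requests that could be redirected at all.

```python
import re

# Only the four brands quoted in the post; the production list is far longer
MOBILE_UA = re.compile(r"Sagem|SAGEM|Sendo|SonyEricsson")


def needs_mobile_redirect(url, user_agent, counter):
    """Cheap URL check first; count how often the big regexp actually runs."""
    if not (url == "/" or ".cfm" in url):
        return False
    counter["ua_checks"] += 1
    return MOBILE_UA.search(user_agent) is not None


counter = {"ua_checks": 0}
requests = [
    ("/", "SonyEricssonK800"),
    ("/img/logo.png", "Mozilla"),
    ("/css/site.css", "SonyEricssonK800"),
    ("/news.cfm?id=1", "Mozilla"),
]
results = [needs_mobile_redirect(url, ua, counter) for url, ua in requests]
```

Out of the four sample requests only two reach the user-agent regexp; images and stylesheets skip it entirely.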

I tested this and got:

Wrapper around mobile redirects             CPU: 36%    Hit rate: 98%

On an actual production server I got the following with around 600 hits/second

Original Config                      CPU: 25%
Split by Domains                     CPU: 18%
Split by domains + mobile wrapper    CPU: 11%

So overall a better than 50% reduction in CPU usage.
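
For anyone wanting to check the arithmetic, this is the calculation, using the CPU figures from the tables above (the function name is just mine):

```python
def relative_reduction(before, after):
    """Percentage drop from before to after, rounded to one decimal place."""
    return round(100 * (before - after) / before, 1)


production = relative_reduction(25, 11)  # production server, ~600 hits/second
test_vm = relative_reduction(70, 36)     # test VM, original config vs mobile wrapper
```

That gives 56.0 for production and 48.6 for the test VM.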


Links: A/B testing, Road safety, Where IT goes to die, The Cheapest Generation

  • 23 Tips on How to A/B Test Like a Badass – Really great article, mostly applicable to ecommerce sites but plenty of general ideas. I’d like to say I do this at work but unfortunately the culture isn’t there.
  • What an RAF pilot can teach us about being safe on the road – Interesting view on how easy it is for motorists just “not to notice” cyclists. I commonly see cyclists these days with flashing lights (front and back) turned on during the day to help improve their visibility.
  • The Cheapest Generation – How the consumer wants of young adults differ from those of the previous generation(s). Phones & walkable neighbourhoods are in, cars and big houses in the suburbs are no longer as important.
  • Where IT goes to die – How large company/Enterprise IT works from an author with experience at smaller companies. He is accurate and it is not very pretty.