Linux.conf.au 2015 – Day 5 – Session 2

When Everything Falls Apart: Stories of Version Control System Scaling – Ben Kero

  • Sysadmin at Mozilla looking after VCS
  • Primarily covering Mercurial
  • Background
    • Primarily Mercurial
    • 3445 repos (1223 unique)
    • 32 million commits
    • 2TB+ transfer per day
    • 1000+ clones per day
    • Biggest customer = ourselves
    • More than 12 tested platforms
  • Also use Git (a lot) and a bit of Subversion, CVS, Bazaar, and RCS
  • 2 SSH servers; 10 machines mirroring HTTP traffic behind a load balancer
  • 1st story – know what you are hosting
    • A big Git repo (1.7 GB) that somebody asked to move off GitHub
    • Turned out to be the Mozilla Git mirror, so it was important to take on
    • Plenty of spare resources
    • But high load straight away
    • Turned out the Mercurial-to-Git converter was hitting it, causing huge load
    • Ran garbage collection – took several hours
    • Tweaked some other settings
  • 2nd story
    • 2003: the “Try” CI system
    • A simple CI system (before the term existed or such systems were common)
    • Flicks a push off to the build servers, sends status back to the dev
    • Mercurial history was immutable up until v2.1, and Mozilla was stuck on an old version
    • Ended up with 29,000 branches in the repo
    • At around 10,000 heads, some operations just start to fail
    • Wait times for pushes were over 45 minutes; this needed manual fixes
    • The process was just “hg serve” freezing up, without any debug info
    • Had to attach debugging: the process was trying to update the cache
    • The cache got nuked on each push, and rebuilding it was a long process
    • Mercurial bug 4255 is in the process of being looked at; no fix yet
  • The new system
    • A more web-scalable system to replace the old one
    • Closer to the pull-request model
    • multi-homing
    • leverage Mercurial bundles
    • stores bundles in scalable object store
    • hopefully minimal retooling from other groups (lots of weird systems supported)
  • Planet Release Engineering @ Mozilla

SL[AUO]B: Kernel memory allocator design and philosophy – Christopher Lameter

  • NOTE: I don’t do kernel stuff so much of this is over my head.
  • Role of the allocator
    • The page allocator only works in full page sizes (4 KB) and is fairly slow
    • A slab allocator handles smaller allocations
    • SLAB is one of the “slab allocators”
  • kmem_cache, NUMA aware, etc. (a minimal usage sketch follows below)
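
A minimal sketch of how a driver or subsystem might use the kmem_cache API. The struct foo type and the foo_* function names are hypothetical, but the kmem_cache_* calls are the real slab interface:

      #include <linux/module.h>
      #include <linux/slab.h>

      /* Hypothetical object type that gets allocated and freed often. */
      struct foo {
          int id;
          char name[32];
      };

      static struct kmem_cache *foo_cache;

      static int __init foo_init(void)
      {
          /* One cache per object type; the allocator packs many
           * struct foo objects into each slab page. */
          foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
                                        0, SLAB_HWCACHE_ALIGN, NULL);
          return foo_cache ? 0 : -ENOMEM;
      }

      static void __exit foo_exit(void)
      {
          kmem_cache_destroy(foo_cache);
      }

      /* Fast-path allocation; NUMA-aware code can instead call
       * kmem_cache_alloc_node() to request memory on a given node. */
      static struct foo *foo_new(int id)
      {
          struct foo *f = kmem_cache_alloc(foo_cache, GFP_KERNEL);
          if (f)
              f->id = id;
          return f;
      }

      static void foo_delete(struct foo *f)
      {
          kmem_cache_free(foo_cache, f);
      }

      module_init(foo_init);
      module_exit(foo_exit);
      MODULE_LICENSE("GPL");
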
  • History
    • SLOB: K&R-style, 1991-1999, compact
    • SLAB: Solaris-style, 1999-2008, cache friendly, benchmark friendly
    • SLUB: 2008-today, simple (instruction costs count), better debugging, defragmentation, execution-time friendly
  • 2013 – work to split out common code for allocators
  • SLOB
    • Manages the list of free objects within the free space itself
    • Has to traverse the list to find a block of sufficient size (see the toy sketch below)
    • Rapid fragmentation of memory
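
To see why that hurts, here is a userspace toy sketch (not the actual SLOB code) of a K&R-style first-fit free list: the bookkeeping lives inside the free memory itself, every allocation walks the list, and freed blocks are not coalesced, so churn fragments the heap:

      /* toy_first_fit.c - build with: cc -std=c11 toy_first_fit.c */
      #include <stdio.h>
      #include <stddef.h>

      #define HEAP_SIZE 4096

      struct block {                /* header lives inside the free space */
          size_t size;              /* payload bytes in this block */
          struct block *next;       /* next free block */
      };

      static _Alignas(16) unsigned char heap[HEAP_SIZE];
      static struct block *free_list;

      static void heap_init(void)
      {
          free_list = (struct block *)heap;
          free_list->size = HEAP_SIZE - sizeof(struct block);
          free_list->next = NULL;
      }

      /* First fit: walk the list until some block is big enough, so
       * every allocation is O(number of free blocks). */
      static void *toy_alloc(size_t want)
      {
          want = (want + 15) & ~(size_t)15;          /* keep alignment */
          struct block **prev = &free_list;
          for (struct block *b = free_list; b; prev = &b->next, b = b->next) {
              if (b->size < want)
                  continue;
              if (b->size >= want + sizeof(struct block) + 16) {
                  /* split: carve the tail into a smaller free block */
                  struct block *rest = (struct block *)
                          ((unsigned char *)(b + 1) + want);
                  rest->size = b->size - want - sizeof(struct block);
                  rest->next = b->next;
                  b->size = want;
                  *prev = rest;
              } else {
                  *prev = b->next;                   /* take whole block */
              }
              return b + 1;                          /* payload after header */
          }
          return NULL;
      }

      /* Free just pushes the block back on the list; with no coalescing,
       * alloc/free churn leaves many small blocks: fragmentation. */
      static void toy_free(void *p)
      {
          struct block *b = (struct block *)p - 1;
          b->next = free_list;
          free_list = b;
      }

      int main(void)
      {
          heap_init();
          void *a = toy_alloc(100), *b = toy_alloc(200);
          toy_free(a);
          void *c = toy_alloc(50);   /* first fit reuses a's old block */
          printf("a=%p b=%p c=%p\n", a, b, c);
          return 0;
      }
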
  • SLAB
    • Queues per CPU and per node to track cache hotness
    • Queues for each remote node
    • Complex data structures
    • Cold object expiration: runs every 2 seconds on each CPU
    • Large systems with LOTS of CPUs end up with huge amounts of memory trapped in these queues, and spend lots of time cleaning caches (the queue structure is sketched below)
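
Roughly what one of those queues looks like; this is modelled on struct array_cache in mm/slab.c (a sketch from memory, not authoritative). Each CPU keeps a LIFO array of recently freed objects, so the most recently freed, cache-hot object is handed out first; multiply this by per-node queues and queues for every remote node, and the trapped-memory problem on big machines follows:

      /* Per-CPU object queue, roughly as in mm/slab.c (struct array_cache). */
      struct array_cache {
          unsigned int avail;       /* objects currently held in entry[] */
          unsigned int limit;       /* capacity before the queue is flushed */
          unsigned int batchcount;  /* objects moved per refill or flush */
          unsigned int touched;     /* recently used? checked by the 2s reaper */
          void *entry[];            /* LIFO stack of pointers to free objects */
      };
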
  • SLUB
    • A lot less queueing
    • Pages are associated with each CPU, for increased locality (see the sketch below)
    • Page-based policies and interleaving
    • De-fragmentation on multiple levels
    • Current default in the kernel
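
For contrast, the SLUB per-CPU state (modelled on struct kmem_cache_cpu from mm/slub.c of roughly that era; again a sketch, not authoritative) is just a pointer into the current slab page's freelist rather than arrays of queued objects:

      /* Per-CPU state in SLUB, roughly as in mm/slub.c (struct kmem_cache_cpu). */
      struct kmem_cache_cpu {
          void **freelist;       /* next free object in the active slab page */
          unsigned long tid;     /* transaction id for the lockless fast path */
          struct page *page;     /* slab page this CPU allocates from */
          struct page *partial;  /* per-CPU list of partly used slab pages */
      };
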
  • The slabinfo tool for SLUB can tune, modify, query, and control objects and settings
  • SLUB can be asked to go into debug mode (e.g. via the slub_debug boot parameter) even when debugging is not enabled in the rest of the kernel
  • Comparing
    • SLUB is faster (SLAB is good at benchmarks)
    • SLOB is slow
    • SLOB has less memory overhead, but only for small/simple systems; it doesn’t handle lots of reallocations, which fragment memory
  • Roadmap
    • More common framework
    • Various other speedups and features
