When Everything Falls Apart: Stories of Version Control System Scaling – Ben Kero
- Sysadmin at Mozilla looking after VCS
- Primarily covering mercurial
- Background
- Primarily mercurial
- 3445 repos (1223 unique)
- 32 million commits
- 2TB+ transfer per day
- 1000+ clones per day
- Biggest customer = ourselves
- More than 12 tested platforms
- Also use git (a lot) and a bit of: subversion, CVS, Bazaar, RCS
- 2 SSH servers; 10 machines mirroring HTTP traffic behind a load balancer
- 1st story – know what you are hosting
- Big git repo (1.7 GB) that somebody asked to have moved off GitHub
- Turned out to be the Mozilla git mirror, so it was important to move
- plenty of spare resources
- But high load straight away
- Turned out the Mercurial-to-git converter was causing huge load
- Ran garbage collection – took several hours
- tweaked some other settings
- 2nd story
- 2003: the “Try” CI system
- Simple CI system (before the term existed or they were common)
- Pushes get flicked off to a build server, which sends status back to the developer
- Mercurial history was immutable up until v2.1, and Mozilla was stuck on an older version
- Ended up with 29,000 branches in the repo
- At around 10,000 heads some operations just start to fail
- Wait times for pushes grew to over 45 minutes; manual fixes were needed for this
- The process was “hg serve”, which was just freezing up without any debug info
- Had to attach a debugger; it turned out to be trying to update the cache
- The cache got nuked by each push, and rebuilding it was a long process
- Mercurial bug 4255 is in the process of being looked at; no fix yet
- The new system
- More web-scalable, to replace the old system
- Closer to the pull-request model
- multi-homing
- leverage mercurial bundles
- stores bundles in scalable object store
- hopefully minimal retooling from other groups (lots of weird systems supported)
- Planet release engineering @ mozilla
SL[AUO]B: Kernel memory allocator design and philosophy – Christopher Lameter
- NOTE: I don’t do kernel stuff so much of this is over my head.
- Role of the allocator
- The page allocator only works in full page sizes (4 KB) and is fairly slow
- The slab allocator is for smaller allocations
- SLAB is one of the “slab allocators”
- kmem_cache, NUMA aware, etc. (see the kmem_cache sketch below)
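A minimal sketch of the kmem_cache interface referred to above, as a kernel module (my own illustrative example, not from the talk; the demo_object struct and cache name are invented). Whichever of SLOB/SLAB/SLUB is compiled in sits behind these same calls.

```c
/* Illustrative only: one kmem_cache per object type; the slab allocator
 * packs objects of this size into slab pages behind the scenes. */
#include <linux/module.h>
#include <linux/init.h>
#include <linux/slab.h>

struct demo_object {
	int id;
	char payload[60];
};

static struct kmem_cache *demo_cache;
static struct demo_object *obj;

static int __init demo_init(void)
{
	demo_cache = kmem_cache_create("demo_cache", sizeof(struct demo_object),
				       0, SLAB_HWCACHE_ALIGN, NULL);
	if (!demo_cache)
		return -ENOMEM;

	/* NUMA-aware variants such as kmem_cache_alloc_node() also exist. */
	obj = kmem_cache_alloc(demo_cache, GFP_KERNEL);
	if (!obj) {
		kmem_cache_destroy(demo_cache);
		return -ENOMEM;
	}
	return 0;
}

static void __exit demo_exit(void)
{
	kmem_cache_free(demo_cache, obj);
	kmem_cache_destroy(demo_cache);
}

module_init(demo_init);
module_exit(demo_exit);
MODULE_LICENSE("GPL");
```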
- History
- SLOB: the K&R-style allocator, 1991-1999; compact
- SLAB: the Solaris-style allocator, 1999-2008; cache friendly, benchmark friendly
- SLUB: 2008-today; simple, instruction cost counts, better debugging, defragmentation, execution-time friendly
- 2013 – work to split out common code for allocators
- SLOB
- Manages the list of free objects within the space of the free objects themselves (see the free-list sketch below)
- Has to traverse the list to find a block of sufficient size
- Rapid fragmentation of memory
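A tiny userspace sketch of that free-list idea (purely illustrative, not SLOB's actual code; freeing and alignment handling are left out): the list of free blocks lives inside the free memory itself, and allocation walks it until a block of sufficient size turns up.

```c
#include <stddef.h>

/* Free-list node stored directly in the free space it describes. */
struct free_block {
	size_t size;             /* bytes available in this free block */
	struct free_block *next; /* next free block */
};

static struct free_block *free_list;

/* Hand the allocator a chunk of memory to manage. */
static void pool_init(void *mem, size_t size)
{
	struct free_block *b = mem;

	b->size = size;
	b->next = NULL;
	free_list = b;
}

/* Walk the list until a block large enough is found (first fit). */
static void *pool_alloc(size_t size)
{
	for (struct free_block *b = free_list; b; b = b->next) {
		if (b->size >= size + sizeof(struct free_block)) {
			/* Carve the allocation off the tail of this block. */
			b->size -= size;
			return (char *)b + b->size;
		}
	}
	return NULL; /* nothing big enough: this is where fragmentation bites */
}

int main(void)
{
	static unsigned char pool[4096];

	pool_init(pool, sizeof(pool));
	return (pool_alloc(64) && pool_alloc(128)) ? 0 : 1;
}
```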
- SLAB
- Queues per CPU and per node to track cache hotness (see the per-CPU queue sketch below)
- Queues for each remote node
- Complex data structures
- Cold object expiration every 2 seconds on each CPU
- Large systems with LOTS of CPUs end up with a huge amount of memory trapped in queues and spend lots of time cleaning caches
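A rough userspace sketch of the per-CPU queue idea (illustrative only, not SLAB's real data structures; the queue size, CPU count, and slow path are arbitrary placeholders): recently freed objects are parked in a small per-CPU array and handed back out while still cache-hot.

```c
#include <stddef.h>

#define QUEUE_SIZE 16

struct cpu_queue {
	void *objects[QUEUE_SIZE]; /* recently freed, likely still in cache */
	int count;
};

/* One queue per CPU; a real allocator indexes by the current CPU id. */
static struct cpu_queue queues[4];

/* Placeholder for the shared slow path (refilling from slab pages). */
static void *slow_path_alloc(void)
{
	return NULL;
}

void *queue_alloc(int cpu)
{
	struct cpu_queue *q = &queues[cpu];

	if (q->count > 0)
		return q->objects[--q->count]; /* LIFO: hand back the hottest */
	return slow_path_alloc();
}

void queue_free(int cpu, void *obj)
{
	struct cpu_queue *q = &queues[cpu];

	if (q->count < QUEUE_SIZE)
		q->objects[q->count++] = obj;
	/* else: a real allocator would flush a batch back to a shared pool */
}

int main(void)
{
	int dummy;

	queue_free(0, &dummy);                   /* goes into CPU 0's queue */
	return queue_alloc(0) == &dummy ? 0 : 1; /* comes straight back out */
}
```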
- SLUB
- A lot less queuing
- Pages are associated with each CPU, for increased locality
- Page-based policies and interleaving
- Defragmentation on multiple levels
- current default in the kernel
- The slabinfo tool for SLUB can tune, modify, query, and control objects and settings
- SLUB can be asked to go into debug mode even when debugging is not enabled for the rest of the kernel
- Comparing
- SLUB is faster (SLAB does well on benchmarks)
- SLOB is slow
- SLOB has less memory overhead, but only for small/simple systems; it doesn't handle lots of reallocations, which fragment memory
- Roadmap
- More common framework
- Various other speedups and features