- How Linux Containers and Docker will change how we develop and deploy apps (pdf) – Interesting Talk from Scale12x conference
- What Linux can learn from Solaris performance and vice-versa ( slides, also on youtube) – Another good Scale12x talk
- The Littlest Boy – Backpack nukes and the US troops trained to deliver them
- The New York Times’ List of Potential New Countries, and Others As Well – Interesting list
Linux.conf.au 2014 – Day 5 – Finish
Winner Rusty Wrench Award: Andrew Tridgell
Host of LCA2015: Auckland!!
Website: lcauckland.org.nz
Lightning Talks part 2
- My toothbrush has a serial number
- after sales support
- can they find it for me?
- In the post-Snowden world this should be investigated
- DIY Book Scanning for Fun
- Scanned book useful for good reasons
- diybookscanner.org
- Freedom Box project update
- Almost ready for 0.2 release which will be pretty good
- OneRNG
- Open Hardware, Random number generator
- Trustable, see raw or AES whitened
- trying various options
- onerng.info
- Central Coast LUG
- cclugtmp@gmail.com
- Bitcoin Myths
- Anonymous – Nope, all transactions records
- Bubble – nope, infrastructure
- Giant Ponzi scheme – Not sold as investment, no claims
- dlect – Lecture recording downloader
- uqlectures.sf.net
- Looking for help and to extend to other universities
- Debian in Australia
- Trying to get Debian Australia mailing list started
- Bitcoin architecture applied to capital markets
- Learning Opportunities in Rocketry Software
- Maths makes my head hurt
- Electronic Frontiers Australia
- Would like to invite you to volunteer and drink beer
- LA does other things
- pycon AU in August
- Drupal camps
- Barcamp
- Join a user group
- hacker space
- add your blog to our planet
Linux.conf.au 2014 – Day 5 – Session 2
Provisioning Bare Metal with OpenStack by Devananda van der Veen
- Tried to use the existing NOVA tool (which was used for VM provisioning) but all sorts of limitations
- Hacked it a lot and then gave up
- Created a new project “Ironic” for bare-metal provisioning in May 2013
- Status
- Being worked on by lots of devs
- Not in main release yet
- Some push to simplify openstack installation – Triple-O (Openstack on Openstack)
- “Openstack is not a virtualisation layer”
- Security not there yet
- Driver Interface
- 3 classes of interfaces: core, common, vendor
- core: power management, deploy,
- common: console, rescue
- vendor: firmware? boot-from-volume? something-else?
- Architecture
- REST API
- DB
- conductor services
- RPC
- ( the slide makes more sense )
- Many conductors, Many drivers
- If the cluster changes
- take-over hooks
- consistent hash updates
- node(s) re-mapped to conductors
- Feature equiv to bare-metal in a few months
- Usable by that point, eventually to replace bare-metal, may require re-deployment
Talk only took 20 minutes. That was quick
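The "consistent hash updates" and "node(s) re-mapped to conductors" points can be illustrated with a toy consistent hash ring. This is only a sketch of the general technique, not Ironic's code: the `HashRing` class, the conductor names and the replica count are all made up for illustration.

```python
import bisect
import hashlib


def _position(key: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring mapping node IDs to conductor names (hypothetical)."""

    def __init__(self, conductors, replicas=64):
        self._replicas = replicas
        self._ring = []  # sorted list of (position, conductor)
        for conductor in conductors:
            self.add(conductor)

    def add(self, conductor):
        # Each conductor gets several points on the ring to even out the load.
        for i in range(self._replicas):
            bisect.insort(self._ring, (_position(f"{conductor}-{i}"), conductor))

    def remove(self, conductor):
        self._ring = [entry for entry in self._ring if entry[1] != conductor]

    def lookup(self, node_id):
        """Return the conductor owning the first ring point after node_id."""
        if not self._ring:
            raise LookupError("no conductors registered")
        idx = bisect.bisect(self._ring, (_position(node_id), ""))
        return self._ring[idx % len(self._ring)][1]


nodes = [f"node-{n}" for n in range(1000)]
ring = HashRing(["conductor-a", "conductor-b", "conductor-c"])
before = {n: ring.lookup(n) for n in nodes}

# The cluster changes: one conductor goes away.
ring.remove("conductor-c")
after = {n: ring.lookup(n) for n in nodes}
moved = [n for n in nodes if before[n] != after[n]]
```

Only the nodes that were owned by the removed conductor end up in `moved`; everything else stays where it was, which is the property that makes take-over cheap when the cluster changes.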
Linux.conf.au 2014 – Day 5 – Session 1
D-Bus in the kernel by Lennart Poettering
- kdbus
- Most newer OS designs started around powerful IPC
- Linux only had IPC primitives ( sockets, fifos, shared memory)
- DBUS is powerful IPC
- Method Call Transactions, Signals, Properties, OO, Broadcasting, Discovery, Introspection, Policy, etc, etc etc
- D-Bus has its limitations
- Suitable only for control, not payload
- It’s inefficient ( 10 copies, 4 complete validations, 4 context switches per duplex method call transaction)
- Can we optimise that rather than moving it into the kernel? – “We are, We are moving it into the kernel with kdbus”
- Credentials one can send/recv are limited
- No implicit timestamping
- Not available in early boot, initrd, late boot
- Hookup for security frameworks happens in userspace
- Activatable bus services are independent from other system services
- Codebase is a bit too baroque, XML
- No race-free exit-on-idle bus activated services
- D-Bus is fantastic, solves real problems
- Right approach: good concepts, generic, comprehensive, covers all areas
- Established, it is the single most used local high-level IPC system on linux, bindings for most languages
- Used in init systems, desktop, etc
- kdbus
- Suitable for large data (GiB), zero-copy, optionally reusable
- Efficient ( 2 or fewer copies, 2 validations, 2 context switches, per duplex method call transaction)
- Comprehensive credentials sent along ( uid, pid, gid, selinux, cgroup, caps, audit, etc etc)
- Implicit timestamping
- Always available from earliest boot to latest shutdown
- Open for LSMs to hook into from the kernel side
- Activation is identical to activation of other services
- Userspace is much simpler, No XML
- Race-free exit-on-idle
- Overview
- Intention to replace current D-Bus implementation
- Should be drop-in, so programs using the bindings should work without changing anything
- No performance tests yet
- Receiver buffers
- Single copy to destinations(s)
- method call Windows
- Name registry
- memfds
- File descriptors for memory regions
- Zero Copy
- Sealing
- At 512k and above zero copy is faster than single copy, below that the mapping overhead is too expensive
- a bit like Android ashmem
- Signal Broadcasting
- Bloom filters
- Userspace
- proxy provides compatibility with dbus1 sockets
- When?
- Code all there, pretty much implemented, policy still to do
- Compile-time switch in systemd
- We hope to get in the kernel in 2014
- gdbus support coming soon
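The notes only say "Bloom filters" under signal broadcasting. As a reminder of what a bloom filter does, here is a generic sketch: a compact bit set that can say "definitely not interested" or "possibly interested" without storing the full list of names. The sizes, the hash choice and the signal names are assumptions for illustration and are not kdbus's actual layout.

```python
import hashlib


class BloomFilter:
    """Generic bloom filter; parameters are illustrative, not kdbus's."""

    def __init__(self, size_bits=512, num_hashes=4):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, item: str):
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "possibly added".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Hypothetical subscriber interested in two signal names.
subscriptions = BloomFilter()
subscriptions.add("org.example.Network.StateChanged")
subscriptions.add("org.example.Power.Suspend")

deliver = subscriptions.might_contain("org.example.Power.Suspend")
```

A false positive just means a receiver gets woken for a signal it then ignores; a bloom filter never produces a false negative, so no subscribed signal is dropped.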
VisualEditor: Wikipedia’s new editing system, and how you can use it too by James Forrester
- Why a visual editor
- Drop off in new contributors is major issue for wikipedia
- MediaWiki is based on the very rich wikitext syntax
- Wikitext stops people becoming contributors or drives them away
- Not an appropriate way to ask people to contribute things in the 21st century
- Prevents us from providing other helpful editing tools
- Flexibility and modularity
- Modular
- support by 300 languages
- replace or extend existing tools
- expand it for editing new content
- What?
- Client-side JavaScript contentEditable HTML editor
- Stand-alone, integrate into any platform
- Mediawiki integration (parsoid converts HTML+RDFa to/from wikitext)
- 3 Main layers
- Data Model
- We can’t edit the input HTML+RDFa directly
- Reliably convert from HTML to things we can edit and back without corruption
- Synchronise with ContentEditable (both ways)
- Needs to allow real-time collaborative editing
- All sorts of problems with splitting characters in various languages
- Browser contentEditable
- Messes up HTML, inconsistent key handling
- Native text insertion, spell-check, selection and cursor
- Equal rendering
- Programmatic cursor handling
- Intercept the browser’s native cursor handling
- allows us to skip over non-editable elements
- Thought this would allow us to avoid UX issues
- but this breaks internationalisation
- Move to “observe and fixup” of native browser action
- Input Method Editors
- Lots of languages rely on IMEs
- Mainly East Asian and Indic languages
- Each IME has its own unique way of changing content
- But can change by OS, browser, IME version
- SurfaceObserver
- Poll DOM for changes and notify model
- Lots of actions emit no or few events (eg spellcheck)
- Possibly use MutationObserver in the future
- Re-render if really necessary
- Actual rich editing of stuff
- Annotation (bold, underline)
- Generated content (references, tables )
- Tool UX types – Dialogs
- Future
- Patches welcome
- Maybe can be integrated in wordpress or something ( but might be hard)
Linux.conf.au 2014 – Day 5 – Keynote – Jonathan Oxer
- How many people have wanted to work on something that went into space? – Everybody
- How many people have actually done that? – about a dozen
- How about if every kid could answer “yes” to that question
- Cubesats – Small standard way to launch tiny sats – 10cm cubes
- ArduSat – Kickstarter – June 2012
- Feb 2013 – Revised design of board
- Mar 2013 – New design built
- June 2013 – Hardware integrated and built – ArduSat-1 and ArduSat-X
- 3 Cubesats launched to ISS including the 2 ArduSats
- Need to upgrade/update Experiments in flight
- Browser based updates
- Code sent via Internet to california, run on duplicate of satellite
- Transmitted to satellite
- Distributed to one of the nodes
- CubeSat System bus
- Supervisor node controls smaller nodes
- Stores stuff on Sd card
- Allocates experiment to node, powers up the node
- sends it via “serial port” to node
- Supervisor controls Power and reset and serial for each node
- node thinks it is just a stand-alone off the shelf Arduino
- Satellite
- 4-6 month life
- 12-15 experiments
- If allocated in 1 week blocks we can sell those for $300 each
- So much backlog and interest that the plan is to launch 30 ArduSats/year for next 5 years
- Make Satellite available
- Flyable version costs approx $50k
- Replace flying modules with open-source parts
- TODO
- Release of designs without attracting the black helicopters
- Flight computer design
- Terrestrial sensor suite design
- Development of education material
- Wait there is more…
- Water powered cube-sat
- 5kg total mass, 2.5kg of water propellant
- On kickstarter
Linux.conf.au 2014 – Day 4 – Session 3
Disaster Recovery Lessons I Hoped I’d Never Have to Learn by Bdale Garbee
- Got backups?
- Do you keep a copy of your essential data off-site
- If someone called you and said your house was gone: completely devastating or just a really bad day?
- 11th June 2013
- Got no warnings of fire, saw it one km away and then got instant evacuation order
- Had 20m-1h to pack (unsure of time due to heat of situation)
- Went to leave and fire was on driveway, unable to get car out.
- Had to park car and firefighters walked them out.
- The fire
- Started 1pm 11th of June 2013
- 13,000 homes evacuated
- 94,000 acres evacuated
- 500 fire fighters involved
- Destroyed – 14,290 acres ( 57.8 km²) , 511 homes, 2 people
- Let back into property on 21 June
- Aftermath
- House pretty destroyed
- Normally 2000-2200 degrees, His house got to 3000 degrees plus Fahrenheit
- Most metals melted
- Ready to go – what would you grab?
- Usual pocket contents.. mobile phone, wallet, keys
- Photo albums and scrapbooks
- Jewelry
- Computers
- Stuff you haven’t thought about
- Mobile phone charger
- Identity documents
- Safe deposit box key
- Account numbers and contact info
- How’s your insurance?
- Types and levels of coverage
- Structure
- Contents
- Loss of use
- Outbuildings
- Landscaping
- Replacement cost vs actual cash value
- Personal vs Business Property – Have you ever taken a business tax deduction for any of these items?
- Costed out how much it would cost to rebuild old house, got that much money to build new house (not the same)
- Rapidly recovering
- Take care of human needs first
- Find a place to live.. being homeless sucks
- Minimise the amount of “throw away” stuff purchased ( don’t buy stuff you don’t want to keep, eg apartment-only stuff)
- Pick one or two things to “put back to normal” ASAP
- Mobile phone for my son
- Big-screen TV with cable service
- 3d printer
- art supplies
- Registries and wish lists
- Cleaning up
- Lots of metal extracted from the ashes – copper, silver
- Found very few intact items – a Japanese doll, a few other ceramics
- Since house taken out by embers around 2/3s of trees intact
- Recovering Hobbies and Small Business
- Back on the net quickly but not back in business
- Commitment to AJ and Mike Beatie for rocket launch
A web page in seven syscalls by Tollef Fog Heen
- “Most of us, well at least the ones of us who have CS degrees”
- Modern Computers – Cores, Caches, 64 bit, Virtual memory
- Massively multi-threaded – usually 1000 threads or more
- Relative performance of CPU -> system calls -> Disk access
- Varnish from HTTP’s point of view is a origin server not a “http cache”
- Two processes
- manager process – starts child, compiles config, watchdog on child and restarts if dies
- Child process, handles all connections, handles storage
- params shared via shared memory
- VCL
- Domain specific language
- Compiled into C when loaded
- Can escape to C from within VCL
- Can import plugins
- Optimisations
- Avoid syscalls
- Memory workspaces
- Length-counted strings
- Threads
- Don’t fight the VM, use hints
- Don’t copy data
- LIFO schedule threads
- Preallocate
- Accept filters
Linux.conf.au 2014 – Day 4 – Session 2
Is it safe to mosh? by Jim Cheetham
- Replacement for ssh remote terminal connectivity, uses udp
- http://mosh.mit.edu/
- Remote terminal applications, changing IPs, intermittent connectivity, more robust and responsive than ssh
- Is it safe? It depends…
- Use cases differ, requirements differ
- Highpoints
- No “disconnect” when roaming/sleeping
- SSP remains responsive; Control-C works when cat’ing a large file or big “find”
- Instant predictive local echo
- Very clean UTF-8 terminal
- In all the main distros
- Run from userspace
- Demo “Luckily one of the things I need is an unreliable network”
- Cloud at cost – cloudatcost.com – $35 VM for life
- connect via ssh, run mosh-server, disconnects and reconnects back via mosh
- Problems
- Terminal scrollback is not yet implemented
- “If you want scrollback, go get tmux. If you’ve got screen, go get tmux”
- Logging is not mature
- Server may live after client has died
- SSP transport
- diff and patch are the two main methods
- RTT times are tracked
- delayed acks reduce traffic requirements
- 3s heartbeats keep the session alive
- SSP Datagram
- Payload from transport layer is opaque
- AES-128 protects the payload
- UDP – receives packets from anywhere
- Timestamps everything – maintain RTT estimates
- SSP authentication
- 63 bit monotonically increasing, unencrypted
- out of order packets discarded
- at 2PB the session dies
- Payload must decrypt – not realistic to brute-force
- SSP allows roaming
- The server knows where the client was
- But doesn’t care – utmp is updated though
- Other protocols are “protected” by having fixed network endpoints – which can be spoofed
- Roaming
- IP shouldn’t have tied IPs to location, but too late now
- SSP is designed to ignore IP address
- What is safety
- Risk = Likelihood * damage
- If client or server is compromised then session can always be taken over
- What is unsafe
- Connections from known-bad locations – known in advance
- Connections from known-compromised users – detected by behaviour
- Connections to insecure software – Prohibited by administrator
- Good and bad habits
- ssh password vs keys
- Detached terminal sessions with privilege
- YES for home users and Small business
- POSSIBLY for Enterprise users
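The SSP points above (sequence numbers that only go up, out-of-order packets discarded, the server not caring which IP the client comes from) can be sketched as a toy receiver. This is not mosh's code: the `Datagram` and `Session` classes are hypothetical, and the `authentic` flag is a stand-in for "the payload decrypted and verified", since no real cryptography is done here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Address = Tuple[str, int]
MAX_SEQ = 2**63 - 1  # 63-bit counter, as in the notes


@dataclass
class Datagram:
    seq: int         # sent unencrypted, monotonically increasing
    payload: bytes
    authentic: bool  # stand-in for "payload decrypted and verified"


class Session:
    """Toy receiver that follows a roaming client (illustrative only)."""

    def __init__(self) -> None:
        self.highest_seq = -1
        self.peer: Optional[Address] = None

    def receive(self, dgram: Datagram, source: Address) -> Optional[bytes]:
        # Anything that fails to decrypt is dropped, whatever its source.
        if not dgram.authentic:
            return None
        # Old, duplicate or out-of-range sequence numbers are discarded.
        if dgram.seq <= self.highest_seq or dgram.seq > MAX_SEQ:
            return None
        self.highest_seq = dgram.seq
        # Roaming: reply to wherever the newest valid packet came from.
        self.peer = source
        return dgram.payload


session = Session()
session.receive(Datagram(1, b"ls\n", True), ("203.0.113.7", 60001))
# Client moves to a different network; the session simply follows it.
session.receive(Datagram(2, b"pwd\n", True), ("198.51.100.20", 60001))
```

The source address is never used to decide whether a packet is accepted, only where replies go, which is why a spoofed address alone gains an attacker nothing without the session key.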
Below The Line: Fixing The Voting Process With Technology by Benno Rice
- Australian Senate
- So many people vote above the line because it is only one tick, below the line up to 100 separate votes
- If you vote above the line then you accept the order of preferences from the people you voted for
- Can get party preference lists from Australian Electoral Commission
- Create a custom “how to vote card”
- Site ideas
- Store nothing
- Just do it
- First site 2010
- Python
- javascript, jquery, sortable
- ballot renderer – python, reportlab, WSGI, truly awful code
- Hosted on dreamhost
- Melted on polling day
- Typed in the data by hand, it was not fun
- 2013 version of site
- Got data in csv from AEC
- Also did lower house (Geo lookup to find electorate)
- Store and share ballots
- Can shuffle parties as well as candidates
- Links to party websites
- Ruby
- Javascript – Angular , ui.sortable
- Ballot renderer – Python – reportlab
- Geolocation – AEC has division boundaries mapped and available
- PostGIS, Python, Google Maps API
- Storing and sharing – python, redis
- Ballot rendering in html – ruby, Haml, Reactive via bootstrap
- Ballots stored under a random identifier that was never reused
- Rackspace hosting – free hosting
- Cloudflare as CDN
- 2600 concurrent users
- 165,000 unique visitors
- 34,000 PDFs
- Conclusion
- The senate voting system is broken
- You too can change the world
- Just do it
- 20+ people in the room used the site to vote below the line
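The "ballots stored under a random identifier that was never reused" idea can be sketched as below. The talk's site used Python and redis; this sketch swaps in an in-memory dict so it is self-contained, and the `BallotStore` class and its method names are invented for illustration.

```python
import secrets


class BallotStore:
    """In-memory stand-in for the ballot store (the real site used redis)."""

    def __init__(self):
        self._ballots = {}
        self._used_ids = set()  # remembered forever so IDs are never reused

    def save(self, preferences):
        # Draw unguessable IDs until we get one that has never been issued.
        while True:
            ballot_id = secrets.token_urlsafe(16)
            if ballot_id not in self._used_ids:
                break
        self._used_ids.add(ballot_id)
        self._ballots[ballot_id] = list(preferences)
        return ballot_id

    def load(self, ballot_id):
        return self._ballots.get(ballot_id)

    def delete(self, ballot_id):
        # The ID stays in _used_ids, so a deleted ballot's link can never
        # start pointing at somebody else's preferences.
        self._ballots.pop(ballot_id, None)


store = BallotStore()
share_id = store.save(["Party C", "Party A", "Party B"])
ballot = store.load(share_id)
```

Because the identifier is random rather than sequential, knowing one shared ballot's link tells you nothing about anyone else's.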
Linux.conf.au 2014 – Day 4 – Session 1
Programming Diversity by Ashe Dryden
- What is diversity
- More than gender
- backgrounds experiences and lifestyles
- not always visible
- sexuality, age, language, class, race, ability
- Terms
- Intersectionality
- The interaction of traits (race, sex, etc) and how people treat you because of that
- Privilege
- unearned advantages a person gets for a perceived trait
- Education, access to technology, higher pay, assumed competency, quality of network
- Seen as a skill-set instead of traits
- Easily fit/identify with subculture
- Stereotype Threat
- Worry you will confirm the stereotype that is applied to you
- Lots of pressure
- Imposter Syndrome
- Unable to internalise their accomplishments
- almost anyone can suffer
- less likely to apply for jobs, apply to talk at conferences or even attend conferences
- Marginalised
- Doesn’t fit into the default groups
- their needs or desires being ignored
- Even marginalised groups marginalise others, nobody is trait blind
- Women are about 20% of tech
- Maybe Women aren’t into programming
- Women like Grace Hopper prominent in field early
- No physical or biological difference in race or gender affecting programming ability
- Bulgaria
- 73% of CS Students are women
- teach children in schools that STEM is important to everybody, push everybody towards it
- Diversity matters –
- Companies that are more diverse tend to have better sales, profits, etc
- Diverse teams:
- solve complex problems faster
- more creative and stimulated
- get better decisions and generate for something
- financial viability and success
- Why lack of diversity?
- Pipeline
- Difference in toys and games for boys and girls
- no famous role models that represent them
- Access to technology. On average boys get first computer at age 11, girls at age 14. Early teens are the best age to learn and retain skills
- Geek stereotypes
- people who don’t identify and aren’t represented by the geek stereotype are turned off by those who do
- Attrition
- 56% of women leave tech in 10 years
- twice the rate of men
- our Grandmothers more likely to be programmers than our granddaughters are
- Why attrition?
- Harassment
- People in marginalised groups twice as likely to report being harassed or mistreated
- men 2.7 times more likely to be promoted to higher ranking positions
- What can I do about this stuff?
- Change starts with us
- educate people who don’t understand this problem
- Get to know people different from us – talk to people wearing a specific color that day
- Follow people on twitter that are different from you
- bias & discrimination are often subtle
- learn to apologize
- Talk about these issues openly ” That’s not cool 🙁 “
- increase education and access
- Facilitate event for marginalised groups
- work with colleges and universities to remove bias
- “have you programmed before?”
- Think about what the “about” page of your website looks like
- Think about the company culture
- Job listing language and requirements – joblint.org
- Interviewing
- equal pay
- mentoring and career goal attainment
From Kookaburra to the Cloud: where to now for copyright in Australia by Ben Powell
- Several recent cases
- Australian law deals by exception, under copyright except where “fair dealing” , “fair use” etc allowed specifically by law
- ALRC Review
- More exceptions or general “fair use”
- Report not yet tabled, but interim discussion paper released
- Kookaburra
- Song from 1932
- “Down under” 1981
- Nobody noticed till 2007 when on TV Quiz show
- Court decided infringing
- Two culturally significant songs
- Fair Use vs Fair dealing
- Fair dealing has specific exceptions
- Things are not fair dealing
- Sampling
- non commercial use of incidental music
- memes
- commercial services to allow recording in the cloud
- storage of copyright material
- copying DVD to other devices
- search engines (thumbnails)
- digital archiving
- More exceptions?
- Quotations
- in the Berne Convention
- anachronistic term
- doesn’t cover transformation, implies verbatim use
- Transformation
- not a substitute for the original work
- low threshold – undermines creators rights
- high threshold – confusing, how much change needed
- How does commercial use fit?
- Hard for court to decide
- Private and Domestic use
- Format shifting and time shifting exists already (VHS only, not DVD)
- doesn’t cover the cloud
- not technology neutral
- Canadian more technology neutral but “non-commercial” bit hard to define
- Fair Use
- See US Copyright Act
- Fair Use in Australia
- Fairness factors
- illustrative uses (non-exhaustive)
- flexible defence, weighing up the factors
- Advantages
- Balance
- Flexible
- aligns with community expectations
- Against Fair Use
- Uncertainty ( parliament vs law)
- Requires litigation
- Originated from different legal environment
- The reply to objections
- Uncertainty – See normal consumer law with terms like “unfair contracts” , “misleading and deceptive conduct”
- Different legal env – same common law roots, AUSFTA meant to “harmonise” copyright law.
- International Law , 3 step test – The US gets away with it, never challenged
- Govt unlikely to go forward with fair use based on their leanings
- The introduction of a Fair Use defence would encourage Australian innovation
- “Generally the US likes to export the ‘bad parts’ of its copyright law, not the ‘good bits’ “
Linux.conf.au 2014 – Day 4 – Keynote – Matthew Garrett
Matthew Garrett
Security in the modern world
- 2013 was an interesting year
- UEFI Secure boot was deployed to the masses. On most PCs by default
- ..and vendor implementations promptly broken
- Snowden revelations
- First highly visible hypervisor related compromise?
- ..no it turns out
- Who are we protecting against?
- The NSA?
- Our Hosting providers?
- Opportunistic attackers?
- Imperfect security is better than no security
- NSA
- Leaked material is from 2007-2008 so we don’t know how advanced things are now
- No evidence that the entire stack has been subverted
- Leaked material describes model-specific (rather than vendor-specific) exploits
- Plausible that the vendors aren’t actively involved
- although passive involvement is likely
- Would it be in anyone’s interest to have a generic exploit?
- Intelligence agencies are probably not your biggest concern
- Most security compromises are either political or profit driven
- But that doesn’t make the user feel better
- What can we do to protect users
- Protecting the entire chain
- Boot verification is an absolute requirement
- OS’s are just too big to be perfect
- Persistent infections (of boot process) make recovery impractical
- …but so is user freedom
- Stopping users from building/booting their own kernels is not a good long term situation
- …ideally including the firmware
- Where do we stand
- UEFI Secure boot on x86 systems
- Guaranteed that user can replace keys
- No guarantee that the user can replace the firmware
- Android
- Primary compute device for many
- Some permit the user to replace the OS
- No ability to replace keys or firmware – cannot boot your own signed kernels
- Need to push vendors to provide replacement of OS and keys
- Apple
- No ability to replace OS, keys or firmware
- How much can I trust my system
- OS backdoors
- Doesn’t really seem necessary, too many holes already
- Firmware backdoors
- Why has nobody audited the Jetway (leaked BIOS and firmware) leak?
- Lower Level?
- AMT, CPU microcode
- AMT has a lot of access to running and even turned off systems, Intel would be greatly embarrassed
- CPU Microcode – could be updated by OS-level exploit
- It’s fine, all my data is in the cloud
- What even *is* the cloud
- If you are giving your data to someone else you are trusting them not to lose it or steal it
- …History suggests this is not a good idea
- But this is still a spectrum
- Running your own server means you trust all your software
- Running a VM means you need to trust the hypervisor and other guests
- ..do you trust those guests? ..do you trust that those guests will be unable to compromise the hypervisor?
- Questions to ask your cloud providers
- What security isolates guests? selinux over kvm perhaps?
- How do you manage hypervisor updates in response to security issues?
- What mechanisms do you have to detect compromises to the hypervisor?
- What is your response to finding a compromised device?
- Can you trust them at all?
- Introspection of the bare metal is hard
- Introspection of VMs is trivial
- Virtualisation requires different security considerations than bare metal requirements, more attacks
- Security in 2014
- Be more aggressive about securing every layer of systems
- .. but do so in a way that ensures users don’t have to choose between freedom and security
- Start asking cloud vendors hard questions
- … and their customers, too
- Security and freedom are two sides of the same coin
- Don’t buy into any narrative that asks you to give up one for the other
Linux.conf.au 2014 – Day 3 – Session 3
Continuous Integration for your database migrations by Michael Still
- Running unit and integration tests on all patches
- Terminology
- sqlalchemy – The database ORM that OpenStack Nova uses
- Schema version: a single database schema, represented by a number
- Database migration: the process of moving between schema versions
- Motivation
- Tests weren’t testing upgrades on real large production data
- We found the following things
- Schema drift – some deployments had schemas that weren’t possible to upgrade because they didn’t match current tools
- Performance issues – Some upgrades took too long
- Broken downgrades – didn’t work for non-trivial downgrades
- Are downgrades important?
- Turbo-hipster is a test runner
- A series of test plugins
- Register with Zuul
- Runs task plugins when requested, return results
- Task Plugin
- Python Plugin
- The DB upgrade plugin
- Upgrade to trunk
- Upgrade to the patch
- Downgrade to the 1st migration in the release
- Upgrade again
- Pass / fail based on analysis of the logs from the shell script
- Let’s go mad with plugins
- Email people when code they worked on is changed by others
- Cause doc bugs to be created
- Cause dependent patches to be created when a patch requires changes to “flow down” repos.
- Much already in Gerrit but does it need to be?
- OMG Security
- This is a bit scary
- We’re running code on our workers provided by 3rd parties
- Mitigation
- Limited access to nodes
- untrusted code tested with network turned off
- checks logs for suspicious data
- We’re working on dataset anonymisation
- Running a process with networking turned off
- Explored LXC (containers)
- netns is much simpler
- Interesting Bugs
- Slow upgrade -> Dev iterated on his code multiple times, running against the test until it was fast enough
- Would be happy to do this with Postgres if Postgres community wants to help get it going
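The DB upgrade plugin's sequence (upgrade to trunk, upgrade to the patch, downgrade to the first migration in the release, upgrade again, then pass or fail) can be sketched with the standard-library `sqlite3` module. This is not turbo-hipster's code: the migrations, version numbers and time limit below are all made up, and the real tests ran against copies of large production datasets rather than an empty in-memory database.

```python
import sqlite3
import time

# Hypothetical migrations as (upgrade SQL, downgrade SQL). Entry i moves
# the schema between version i and version i + 1.
MIGRATIONS = [
    ("CREATE TABLE instances (id INTEGER PRIMARY KEY, name TEXT)",
     "DROP TABLE instances"),
    ("CREATE TABLE flavors (id INTEGER PRIMARY KEY, ram_mb INTEGER)",
     "DROP TABLE flavors"),
    ("CREATE INDEX idx_instances_name ON instances (name)",
     "DROP INDEX idx_instances_name"),
]


def migrate(conn, current, target):
    """Step the schema from version `current` to `target`, one migration at a time."""
    while current < target:
        conn.execute(MIGRATIONS[current][0])
        current += 1
    while current > target:
        current -= 1
        conn.execute(MIGRATIONS[current][1])
    conn.commit()
    return current


def run_upgrade_test(trunk_version, patch_version, release_first, max_seconds=5.0):
    """Run the four steps and fail if any step errors or is too slow."""
    conn = sqlite3.connect(":memory:")
    steps = [
        ("upgrade to trunk", trunk_version),
        ("upgrade to patch", patch_version),
        ("downgrade", release_first),
        ("upgrade again", patch_version),
    ]
    version, timings = 0, []
    for label, target in steps:
        start = time.monotonic()
        version = migrate(conn, version, target)  # raises on broken SQL
        timings.append((label, time.monotonic() - start))
    conn.close()
    passed = all(elapsed <= max_seconds for _, elapsed in timings)
    return passed, timings


passed, timings = run_upgrade_test(trunk_version=2, patch_version=3, release_first=1)
```

Timing each step separately is what lets a check like this catch the "performance issues" case from the notes, where a migration is correct but too slow on real data.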
Live upgrading many thousands of servers from an ancient RedHat 7.1 to a 10 year newer Debian based one by Marc Merlin
- Longer version http://marc.merlins.org/linux/talks/ProdNG-LCA2014/
- Google Started with a Linux CD (in our case Red Hat 6.2)
- Then kickstart
- updates had ssh loops to connect to machines and upgrade
- Any push based method is doomed
- Running from cron will break eventually
- Across thousands of machines a percentage will fail and have to be fixed by hand
- File Level syncing
- makes all your servers the same
- Exclude a few files (resolv.conf, syslog)
- Doesn’t scale well, but you can have rsync-like software that does something similar
- All servers are the same
- for the root partition yes
- per-machine software outside root partition
- static links for libraries
- hundreds of different apps with own dependencies
- How to upgrade root partition
- just security upgrades mainly
- running Redhat 7.1 for a long time
- How to upgrade base packages
- upgrade packages, create and test new master image, slowly push to live
- only two images in prod, current and the old one
- How about pre/post installs?
- removed most of them
- sync daemon has a watch on some files and does something when that file changed
- How did running 7.1 work out?
- It works a long time but not forever
- Very scary
- Oh and preferably not reboot the machines if at all possible
- What new distribution?
- Workstations already moved to debian from redhat
- Debian has more packages
- Ubuntu is better than Debian so started with Ubuntu Dapper
- Init System choice
- Boot time not a big decider
- Consistent Boot order very useful
- systemd a lot of work to convert, upstart a lot too
- systemd option for future
- ProdNG
- self hosting
- Entirely rebuilt from source
- Remove unneeded dependencies
- end distribution 150MB (without google custom bits)
- No complicated upstart, dbus, plymouth
- Small is quicker to sync
- Newer packages not always better, sometimes old is good, new stuff has lots of extra stuff you might not need
- How to push it
- 20k+ files changed
- How to convince people it will work, how to test?
- push hard to do slowly, have to maintain 2 very different systems in prod
- Turned into many smaller jumps
- Take debian packages into rpms and install on existing server one at a time
- Cruft Removal
- Get rid of junk, like X fonts, X server, random locales
- Old libs nothing is using
- No C++ left so libstdc++ removed
- One at a time
- Upgrade libc from 2.2.2 to 2.3.6
- Upgrade small packages and work up
- 150 packages upgraded a few at a time. took just over 2 years
- Convert rpms to debs
- Same packages on both images
- Had to convert internal packages from rpms to debs
- used alien and custom script to convert.
- changelogs have more fixed format in debs than rpms
- Switch live base packages everything back to debs
- Only one major bug
- Lessons learned
- If you maintain a lot of machines and have your own fork, you can remove all the bits you don’t need
- Force server users to use an API you provide and not to write to the root FS
- File level sync recovers from any state and is more reliable than most other methods
- You can do crazy things like distribution switches
- Don’t blindly install upstream updates
- If you don’t need it remove it
- You probably don’t want to run the latest thing, more trouble than it is worth
- Smaller jumps are easier
- the best way to do a huge upgrade is a few packages at a time
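The file-level syncing idea (make every root partition match a master image, excluding a few per-machine files) can be sketched as follows. This is not Google's sync tool: the `sync_tree` function, the hash comparison and the exclusion list are assumptions for illustration, and a real tool would also handle permissions, symlinks and running processes.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical per-machine files that the sync must never touch.
EXCLUDED = {Path("etc/resolv.conf"), Path("etc/syslog.conf")}


def file_digest(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()


def sync_tree(master: Path, target: Path, excluded=EXCLUDED):
    """Make `target` match `master` file by file; return the paths changed."""
    excluded = set(excluded)
    changed = []
    master_files = {p.relative_to(master) for p in master.rglob("*") if p.is_file()}

    # Copy anything missing or different, whatever state the target is in.
    for rel in sorted(master_files - excluded):
        src, dst = master / rel, target / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            changed.append(rel)

    # Remove files the master image no longer ships.
    for path in sorted(target.rglob("*")):
        rel = path.relative_to(target)
        if path.is_file() and rel not in master_files and rel not in excluded:
            path.unlink()
            changed.append(rel)
    return changed
```

Because the result depends only on the master image and not on what was previously installed, running it twice is harmless and a half-upgraded machine converges to the same state as every other one, which matches the "recovers from any state" lesson above.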