Linux.conf.au 2014 – Day 3 – Session 3

Continuous Integration for your database migrations by Michael Still

Running unit and integration tests on all patches
Terminology
- SQLAlchemy – the database ORM that OpenStack Nova uses
- Schema version: a single database schema, represented by a number
- Database migration: the process of moving between schema versions
Motivation
- Tests weren’t testing upgrades on real, large production data
- We found the following things
  - Schema drift – some deployments had schemas that weren’t possible to upgrade because they didn’t match current tools
  - Performance issues – Some upgrades took too long
  - Broken downgrades – didn’t work for non-trivial downgrades
- Are downgrades important?
Turbo-hipster is a test runner
- A series of test plugins
- Register with Zuul
- Runs task plugins when requested, returns results
Task Plugin
- Python Plugin
The DB upgrade plugin
- Upgrade to trunk
- Upgrade to the patch
- Downgrade to the 1st migration in the release
- Upgrade again
- Pass / fail based on analysis of the logs from the shell script
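The four steps above can be sketched roughly as follows. This is a minimal illustration, not the actual turbo-hipster plugin: the `MIGRATIONS` table, the `migrate` and `run_upgrade_cycle` names, and the use of an in-memory SQLite database are all assumptions made for the example. The talk describes pass / fail being decided by analysing logs from a shell script, whereas this sketch simply treats a database error as a failure.

```python
import sqlite3

# Hypothetical migrations: schema version -> (upgrade SQL, downgrade SQL).
MIGRATIONS = {
    1: ("CREATE TABLE instances (id INTEGER PRIMARY KEY)",
        "DROP TABLE instances"),
    2: ("CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT)",
        "DROP TABLE hosts"),
    3: ("CREATE INDEX hosts_name ON hosts (name)",
        "DROP INDEX hosts_name"),
}


def migrate(conn, current, target, migrations=MIGRATIONS):
    """Move the schema from version `current` to `target`, one step at a time."""
    while current < target:
        current += 1
        conn.execute(migrations[current][0])  # apply the upgrade for this version
    while current > target:
        conn.execute(migrations[current][1])  # undo this version
        current -= 1
    return current


def run_upgrade_cycle(conn, start, trunk, patch, first_in_release,
                      migrations=MIGRATIONS):
    """Run the upgrade / downgrade / upgrade cycle and return (passed, log)."""
    steps = [
        ("upgrade to trunk", trunk),
        ("upgrade to patch", patch),
        ("downgrade to first migration in release", first_in_release),
        ("upgrade again", patch),
    ]
    version, log = start, []
    for name, target in steps:
        try:
            version = migrate(conn, version, target, migrations)
            log.append((name, "ok"))
        except sqlite3.Error as exc:
            log.append((name, f"failed: {exc}"))
            return False, log
    return True, log
```

A caller would pass a connection to a copy of a real dataset instead of an empty database, which is the point of the talk: the same cycle run against production-sized data surfaces schema drift and slow migrations.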
Let’s go mad with plugins
- Email people when code they worked on is changed by others
- Cause doc bugs to be created
- Cause dependent patches to be created when a patch requires changes to “flow down” to other repos
- Much already in Gerrit but does it need to be?
OMG Security
- This is a bit scary
- We’re running code on our workers provided by 3rd parties
- Mitigation
  - Limited access to nodes
  - untrusted code tested with network turned off
  - checks logs for suspicious data
  - We’re working on dataset anonymisation
Running a process with networking turned off
- Explored LXC (containers)
- netns is much simpler
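As a rough sketch of the netns approach: a freshly created network namespace contains only a loopback interface that is down, so a process started inside it has no route to anything. The example below builds the `ip netns` commands (from iproute2) for that. The function names and the namespace name are hypothetical, and actually running the commands needs root on a Linux host.

```python
import subprocess


def netns_commands(namespace, argv):
    """Return the commands that run `argv` inside a new network namespace."""
    return [
        ["ip", "netns", "add", namespace],                # create an empty namespace
        ["ip", "netns", "exec", namespace] + list(argv),  # run the untrusted command
        ["ip", "netns", "delete", namespace],             # clean up afterwards
    ]


def run_isolated(namespace, argv):
    """Run `argv` with no network access. Requires root; not called here."""
    add, run, delete = netns_commands(namespace, argv)
    subprocess.run(add, check=True)
    try:
        return subprocess.run(run).returncode
    finally:
        subprocess.run(delete, check=True)
```

For example, `run_isolated("nonet", ["./run_migrations.sh"])` would run a (hypothetical) migration script that cannot reach the network.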
Interesting Bugs
- Slow upgrade -> the dev iterated on his code multiple times, running it against the test until it was fast enough
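A time budget like the one implied by the slow-upgrade case could be checked along these lines. This is only a sketch: the `timed_check` name and the idea of a fixed budget in seconds are assumptions, not something the talk specified.

```python
import time


def timed_check(migration, budget_seconds, clock=time.perf_counter):
    """Run `migration` and report whether it finished within the budget.

    `clock` is injectable so the check can be exercised without real delays.
    """
    start = clock()
    migration()
    elapsed = clock() - start
    return elapsed <= budget_seconds, elapsed
```

A developer can rerun the same check after each revision of a migration until it comes in under the budget.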
Would be happy to do this with Postgres if Postgres community wants to help get it going

Live upgrading many thousands of servers from an ancient RedHat 7.1 to a 10 year newer Debian based one by Marc Merlin

Longer version http://marc.merlins.org/linux/talks/ProdNG-LCA2014/
Google Started with a Linux CD (in our case Red Hat 6.2)
Then kickstart
updates had ssh loops to connect to machines and upgrade
Any push based method is doomed
Running from cron will break eventually
Across thousands of machines a percentage will fail and have to be fixed by hand
File Level syncing
- makes all your servers the same
- Exclude a few files (resolv.conf, syslog)
- Doesn’t scale well, but you can have rsync-like software that does something similar
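The idea of file-level syncing with an exclude list can be illustrated with a small sketch. It is not Google's tool: the `sync_tree` name, the excluded paths, and the whole-file comparison are assumptions for the example, and it ignores symlinks, permissions and ownership that a real sync would have to handle.

```python
import shutil
from pathlib import Path

# Hypothetical per-machine files that the sync must leave alone.
EXCLUDE = frozenset({"etc/resolv.conf", "var/log/syslog"})


def sync_tree(master, target, exclude=EXCLUDE):
    """Make `target` match `master` file by file; return the paths changed."""
    master, target = Path(master), Path(target)
    wanted, changed = set(), []
    for src in master.rglob("*"):
        if not src.is_file():
            continue
        rel = src.relative_to(master).as_posix()
        if rel in exclude:
            continue
        wanted.add(rel)
        dst = target / rel
        # Copy when the file is missing or its content differs.
        if not dst.is_file() or dst.read_bytes() != src.read_bytes():
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)
            changed.append(rel)
    # Remove files that the master image does not contain.
    for dst in list(target.rglob("*")):
        if dst.is_file():
            rel = dst.relative_to(target).as_posix()
            if rel not in wanted and rel not in exclude:
                dst.unlink()
                changed.append(rel)
    return sorted(changed)
```

Because the result depends only on the master image and not on what state the target was in, running it again on an already synced machine changes nothing.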
All servers are the same
- for the root partition yes
- per-machine software outside root partition
- static links for libraries
- hundreds of different apps with own dependencies
How to upgrade root partition
- just security upgrades mainly
- running Red Hat 7.1 for a long time
How to upgrade base packages
- upgrade packages, create and test new master image, slowly push to live
- only two images in prod, current and the old one
How about pre/post installs?
- removed most of them
- sync daemon has a watch on some files and does something when that file changed
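One way to picture replacing pre/post install scripts with file watches: record a digest of each watched file, run the sync, then fire an action for every watched file whose content changed. The sketch below assumes that model; `sync_with_triggers`, the trigger table and the callable-based actions are hypothetical, not details from the talk.

```python
import hashlib
from pathlib import Path


def digest(path):
    """Return a SHA-256 hex digest of the file, or None if it does not exist."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else None


def sync_with_triggers(root, triggers, do_sync):
    """Run `do_sync`, then call the action for each watched file that changed.

    `triggers` maps a path relative to `root` to a zero-argument callable.
    Returns the list of relative paths whose triggers fired.
    """
    root = Path(root)
    before = {rel: digest(root / rel) for rel in triggers}
    do_sync()
    fired = []
    for rel, action in triggers.items():
        if digest(root / rel) != before[rel]:
            action()  # e.g. restart the daemon that reads this file
            fired.append(rel)
    return fired
```

The action runs only when the file really changed, so repeating the sync is harmless, unlike a post-install script that runs on every package install.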
How did running 7.1 work out?
- It works a long time but not forever
- Very scary
- Oh, and preferably don’t reboot the machines if at all possible
What new distribution?
- Workstations already moved to Debian from Red Hat
- Debian has more packages
- Ubuntu is better than Debian so started with Ubuntu Dapper
Init System choice
- Boot time not a big decider
- Consistent Boot order very useful
- systemd a lot of work to convert, upstart a lot too
- systemd option for future
ProdNG
- self hosting
- Entirely rebuilt from source
- Remove unneeded dependencies
- end distribution 150MB (without google custom bits)
- No complicated upstart, dbus, plymouth
- Small is quicker to sync
- Newer packages not always better, sometimes old is good, new stuff has lots of extra stuff you might not need
How to push it
- 20k+ files changed
- How to convince people it will work, how to test?
- Hard to push slowly, since you have to maintain 2 very different systems in prod
Turned into many smaller jumps
- Convert Debian packages into rpms and install them on existing servers one at a time
Cruft Removal
- Get rid of junk, like X fonts, X server, random locales
- Old libs nothing is using
- No C++ left so libstdc++ removed
One at a time
- Upgrade libc from 2.2.2 to 2.3.6
- Upgrade small packages and work up
- 150 packages upgraded a few at a time; it took just over 2 years
Convert rpms to debs
- Same packages on both images
- Had to convert internal packages from rpms to debs
- used alien and a custom script to convert
- changelogs have more fixed format in debs than rpms
Switch the live base packages back to debs
- Only one major bug
Lessons learned
- If you maintain a lot of machines and have your own fork, you can remove all the bits you don’t need
- Force server users to use an API you provide and not to write to the root FS
- File level sync recovers from any state and is more reliable than most other methods
- You can do crazy things like distribution switches
- Don’t blindly install upstream updates
- If you don’t need it remove it
- You probably don’t want to run the latest thing, more trouble than it is worth
- Smaller jumps are easier
- the best way to do a huge upgrade is a few packages at a time