Continuous Integration for your database migrations by Michael Still
- Running unit and integration tests on all patches
- Terminology
- sqlalchemy – the database ORM that OpenStack Nova uses
- Schema version: a single database schema, represented by a number
- Database migration: the process of moving between schema versions
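For context, Nova's migrations at the time were numbered sqlalchemy-migrate scripts with an upgrade and a downgrade function; a minimal sketch of one, with invented table and column names, might look like:

```python
# Minimal sketch of a numbered sqlalchemy-migrate script of the kind
# Nova used; the table and column names are invented. create_column /
# drop_column are added to Table by the sqlalchemy-migrate tooling
# that runs these scripts.
from sqlalchemy import Column, MetaData, String, Table

def upgrade(migrate_engine):
    # Moving up one schema version adds a column...
    meta = MetaData(bind=migrate_engine)
    instances = Table('instances', meta, autoload=True)
    instances.create_column(Column('hostname', String(255)))

def downgrade(migrate_engine):
    # ...and moving back down must undo it exactly.
    meta = MetaData(bind=migrate_engine)
    instances = Table('instances', meta, autoload=True)
    instances.drop_column('hostname')
```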
- Motivation
- Tests weren't testing upgrades against real, large production data
- We found the following things
- Schema drift – some deployments had schemas that weren't possible to upgrade because they didn't match what the current tools produce
- Performance issues – Some upgrades took too long
- Broken downgrades – downgrades didn't work for non-trivial cases
- Are downgrades important?
- Turbo-hipster is a test runner
- A series of test plugins
- Register with Zuul
- Runs task plugins when requested, returns results
- Task Plugin
- Python Plugin
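Zuul distributes work to test runners over Gearman, so the register / run-when-asked / report loop looks roughly like the sketch below, using the gear library (the server name, job name, and result payload are invented for illustration):

```python
# Rough sketch of a worker registering with Zuul over Gearman and
# reporting results back; server, job name and payload are invented.
import json
import gear

worker = gear.Worker('turbo-hipster-worker')
worker.addServer('zuul.example.com')      # hypothetical Gearman server
worker.registerFunction('build:gate-real-db-upgrade')

while True:
    job = worker.getJob()                 # blocks until Zuul hands us work
    # ... run the matching task plugin against the job here ...
    job.sendWorkComplete(json.dumps({'result': 'SUCCESS'}))
```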
- The DB upgrade plugin
- Upgrade to trunk
- Upgrade to the patch
- Downgrade to the 1st migration in the release
- Upgrade again
- Pass / fail based on analysis of the logs from the shell script
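In other words, the plugin drives nova-manage through an upgrade/downgrade/upgrade cycle against a restored production dataset. A rough sketch, with the git checkouts elided and a made-up first-migration number:

```python
# Sketch of the upgrade/downgrade dance; FIRST_MIGRATION is a
# stand-in, and checking out trunk vs. the patch is elided.
import subprocess

FIRST_MIGRATION = '133'  # hypothetical first migration of the release

def db_sync(version=None):
    cmd = ['nova-manage', 'db', 'sync']
    if version is not None:
        cmd.append(version)
    subprocess.check_call(cmd)

db_sync()                  # 1. upgrade the restored dataset to trunk
# ... check out the patch under review ...
db_sync()                  # 2. upgrade to the patch
db_sync(FIRST_MIGRATION)   # 3. downgrade to the 1st migration in the release
db_sync()                  # 4. upgrade again
# pass/fail is then decided by analysing the logs of these runs
```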
- Let's go mad with plugins
- Email people when code they worked on is changed by others
- Cause doc bugs to be created
- Create dependent patches when a patch requires changes to "flow down" repos
- Much of this is already in Gerrit, but does it need to be?
- OMG Security
- This is a bit scary
- We're running code provided by 3rd parties on our workers
- Mitigation
- Limited access to nodes
- Untrusted code is tested with networking turned off
- Logs are checked for suspicious data
- We're working on dataset anonymisation
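The log check can be as simple as scanning for patterns that should never appear; a sketch, with invented examples of what might count as suspicious:

```python
# Sketch of a post-run log scan; the patterns are invented examples
# of output that would be flagged for human review.
import re

SUSPICIOUS = [re.compile(p) for p in (r'\bwget\b', r'\bcurl\b', r'/etc/passwd')]

def log_is_clean(path):
    with open(path) as logfile:
        return not any(pattern.search(line)
                       for line in logfile
                       for pattern in SUSPICIOUS)
```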
- Running a process with networking turned off
- Explored LXC (containers)
- netns is much simpler
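A sketch of the netns approach: create a fresh network namespace (which contains only a down loopback), run the untrusted command inside it, and clean up. The namespace name and the command are made up, and this needs root:

```python
# Run an untrusted command with networking turned off via a fresh
# network namespace; 'sandbox' and run_migrations.py are made up.
import subprocess

subprocess.check_call(['ip', 'netns', 'add', 'sandbox'])
try:
    # A fresh netns has no usable interfaces, so the command
    # has no network access at all.
    subprocess.check_call(['ip', 'netns', 'exec', 'sandbox',
                           'python', 'run_migrations.py'])
finally:
    subprocess.check_call(['ip', 'netns', 'delete', 'sandbox'])
```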
- Interesting Bugs
- Slow upgrade -> the developer iterated on his code multiple times, running it against the test until it was fast enough
- Would be happy to do this with Postgres if the Postgres community wants to help get it going
Live upgrading many thousands of servers from an ancient Red Hat 7.1 to a 10 year newer Debian based one by Marc Merlin
- Longer version http://marc.merlins.org/linux/talks/ProdNG-LCA2014/
- Google started with a Linux CD (in our case Red Hat 6.2)
- Then kickstart
- Updates used ssh loops to connect to machines and upgrade them
- Any push based method is doomed
- Running from cron will break eventually
- Across thousands of machines a percentage will fail and have to be fixed by hand
- File Level syncing
- Makes all your servers the same
- Exclude a few files (resolv.conf, syslog)
- Plain rsync doesn't scale well, but you can write rsync-like software that does something similar
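The shape of the idea in plain rsync terms (Google's actual tool is custom; the host, module, and exclude list here are made up):

```python
# File-level sync of the root partition from a golden master,
# excluding the few per-machine files; names are made up.
import subprocess

EXCLUDES = ['/etc/resolv.conf', '/etc/syslog.conf']  # per-machine files

cmd = ['rsync', '-aH', '--delete']
cmd += ['--exclude=%s' % path for path in EXCLUDES]
cmd += ['goldmaster::root-image/', '/']
subprocess.check_call(cmd)
```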
- All servers are the same
- For the root partition, yes
- Per-machine software lives outside the root partition
- Static linking for libraries
- Hundreds of different apps, each with their own dependencies
- How to upgrade the root partition
- Mainly just security upgrades
- They had been running Red Hat 7.1 for a long time
- How to upgrade base packages
- Upgrade packages, create and test a new master image, slowly push it to live
- Only two images in prod: the current one and the old one
- How about pre/post installs?
- removed most of them
- The sync daemon has a watch on some files and does something when a watched file changes (see the sketch below)
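A minimal sketch of that watch-and-react idea, polling mtimes; the watched file and the action are made up:

```python
# Watch a synced file and run an action when it changes; the
# file path and reload command are invented examples.
import os
import subprocess
import time

WATCH = '/etc/ssh/sshd_config'            # hypothetical watched file
ACTION = ['/etc/init.d/sshd', 'reload']   # hypothetical reaction

last = os.stat(WATCH).st_mtime
while True:
    time.sleep(5)
    mtime = os.stat(WATCH).st_mtime
    if mtime != last:
        last = mtime
        subprocess.call(ACTION)   # react to the newly synced file
```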
- How did running 7.1 work out?
- It worked for a long time, but not forever
- Very scary
- Oh, and preferably don't reboot the machines if at all possible
- What new distribution?
- Workstations had already moved from Red Hat to Debian
- Debian has more packages
- Ubuntu was better than Debian, so they started with Ubuntu Dapper
- Init System choice
- Boot time was not a big decider
- Consistent Boot order very useful
- systemd a lot of work to convert, upstart a lot too
- systemd option for future
- ProdNG
- self hosting
- Entirely rebuilt from source
- Remove unneeded dependencies
- The end distribution is 150MB (without Google's custom bits)
- No complicated upstart, dbus, or plymouth
- Small is quicker to sync
- Newer packages are not always better; sometimes old is good, and new stuff has lots of extras you might not need
- How to push it
- 20k+ files changed
- How to convince people it will work, how to test?
- A single big push is hard to do slowly; you would have to maintain 2 very different systems in prod
- Turned into many smaller jumps
- Take Debian packages, convert them into rpms, and install them on the existing servers one at a time
- Cruft Removal
- Get rid of junk, like X fonts, the X server, and random locales
- Old libs nothing is using
- No C++ left so libstdc++ removed
- One at a time
- Upgrade libc from 2.2.2 to 2.3.6
- Upgrade small packages and work up
- 150 packages upgraded, a few at a time; it took just over 2 years
- Convert rpms to debs
- Same packages on both images
- Had to convert internal packages from rpms to debs
- Used alien and a custom script to convert (sketch below)
- Changelogs have a more fixed format in debs than in rpms
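The alien step of that conversion, roughly (the package name is made up; the custom script's changelog fix-ups are elided):

```python
# Convert an internal rpm to a deb with alien, keeping the version
# number; the package name is invented.
import subprocess

subprocess.check_call(['alien', '--to-deb', '--keep-version',
                       'internal-tool-1.0-3.x86_64.rpm'])
```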
- Switch everything in the live base packages back to debs
- Only one major bug
- Lessons learned
- If you maintain a lot of machines, having your own fork lets you remove all the bits you don't need
- Force server users to use an API you provide, and not to write to the root FS
- File level sync recovers from any state and is more reliable than most other methods
- You can do crazy things like distribution switches
- Don’t blindly install upstream updates
- If you don’t need it remove it
- You probably don't want to run the latest thing; it's more trouble than it is worth
- Smaller jumps are easier
- The best way to do a huge upgrade is a few packages at a time