Linux.conf.au 2016 – Sysadmin Miniconf – Session 3

The life of a Sysadmin in a research environment – Eric Burgueno

Everything must be reproducible
Keeping systems up as long as possible, rather than aiming for an overall uptime percentage
One person needs to cover lots of roles rather than specialise
2 servers with 2TB of RAM each. Others smaller according to need
Lots of varied tools, mostly bioinformatics software
90TB to over 200TB of data over 2 years. Lots of large files. Big files, big servers.
Big job using 2TB of RAM taking 8 days to run.
The 2*2TB servers can be joined together to create a single 4TB server
Have to customize the environment for each tool, which is hard when there are lots of tools and you also want to compare/collaborate with other places where the software is being run.
Reproducible(?) Research

Creating bespoke logging systems and dashboards with Grafana, in fifteen minutes – Andrew McDonnell

Live Demo

Order in the chaos: or lessons learnt on planning in operations – Peter Hall

Lead of an Ops team at REA Group. Looks after dev teams for 10-15 applications
Ops is not a project, but works with many projects
Many sources of work, dev, security, incidents, infrastructure improvement
Understand the work
- Document your work
- Talk about it, 15min standup
Schedule things
- and prepare for the unplanned
- Perhaps 2 weeks
- Leave lots of slack
Interruptions
- Assign team members to each ops team
- Rotating “ops goal keeper”
- Developers on pager
Review Often
Longer term goals for your team
Failure demand vs value demand.
- Make sure [at least some of] what you are doing is adding value to the environment

From Commit to Cloud – Daniel Hall

Deployments should be:
- fast – 10 minutes
- small – only one feature change, and the person deploying should be aware of everything that is changing
- easy – as little human work as possible, simple to understand
We believe this because
- less to break
- devs should focus on dev
- each project should be really easy to learn, so devs can switch between projects easily
- Don’t want anyone to be afraid to deploy
Able to rollback
- 30 microservices
- 2 devs plus some work from others
How to do it
- Microservices arch (optional but helps)
- git, build agent, packaging format with dependencies
- something to run your stuff
code -> git -> build -> auto test -> package -> staging -> test -> deploy to prod
Application build is triggered by git
- build.sh script in each repo
Auto test after build, don’t do end-to-end testing, do that in staging
Package app – they use docker – push to internal docker repo
Deploy to staging – they use curl to push JSON to Mesos/Marathon, which pulls the container. Testing is run there
Single Click approval to deploy to staging
Deploy to prod – should be same as how you deploy to staging.
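
As a rough illustration of the "push JSON to Marathon" step, here is a minimal Python sketch that builds an app definition and prepares the HTTP request. This is not the speaker's actual tooling (the talk mentioned curl); the app id, image name, resource figures and Marathon URLs are all made-up placeholders. It assumes Marathon's `/v2/apps/<id>` endpoint, where a PUT creates or updates an app. The same function is used for staging and prod, with only the base URL changing.

```python
import json
import urllib.request


def marathon_app(app_id, image, instances=1, cpus=0.25, mem=128):
    """Build a minimal Marathon app definition for a Docker container.

    All values here are illustrative defaults, not taken from the talk.
    """
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {
            "type": "DOCKER",
            "docker": {"image": image},
        },
    }


def deploy_request(marathon_url, app):
    """Prepare (but do not send) a PUT of the app definition to Marathon."""
    body = json.dumps(app).encode("utf-8")
    return urllib.request.Request(
        url=f"{marathon_url}/v2/apps/{app['id'].lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical names: the image tag would normally come from the build step.
app = marathon_app("/shop/api", "registry.example.internal/shop-api:abc123")
staging = deploy_request("http://marathon.staging.example:8080", app)
prod = deploy_request("http://marathon.prod.example:8080", app)
# urllib.request.urlopen(staging) would actually send the request.
```

Keeping the payload identical between environments means whatever was tested in staging is exactly what gets pushed to prod.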

LNAV – Paul Wayper

Point it at a dir. Reads all the files. Sorts all the lines together in timestamp order
Colour codes machines and different facilities (daemons). Highlights IP addresses
Error lines in red, warning lines in yellow
Regular expressions highlighted. Fully PCRE compatible
Able to move back and forth an hour or a day at a time with special keys
Histogram of error lines, number per minute etc
More complete (SQL-like) queries
compiles as a static binary
Ability to add your own log file formats
Ability share format filters with others
Doesn’t deal with journald logs
Available for EPEL, Fedora, Debian but under a lot of active development.
Acts like tail -f to spot updates to logs.
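
To make the "sort all the lines together in timestamp order" idea concrete, here is a small Python sketch of merging several logs by timestamp. It is only an illustration of the concept, not how lnav is implemented. It assumes each line starts with an ISO-style timestamp followed by a space, and that each individual log is already in time order; the log names and lines are invented.

```python
import heapq
from datetime import datetime

# Assumed timestamp format for this sketch; real logs vary widely.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def parse_line(source, line):
    """Split 'TIMESTAMP message' into a sortable (time, source, message) tuple."""
    stamp, _, message = line.rstrip("\n").partition(" ")
    return datetime.strptime(stamp, TIMESTAMP_FORMAT), source, message


def parsed(source, lines):
    """Lazily parse every line of one log."""
    for line in lines:
        yield parse_line(source, line)


def merge_logs(logs):
    """Merge already-sorted logs (name -> iterable of lines) into one stream."""
    return heapq.merge(*(parsed(name, lines) for name, lines in logs.items()))


logs = {
    "web": [
        "2016-02-01T10:00:00 GET /",
        "2016-02-01T10:00:05 error: upstream timeout",
    ],
    "db": ["2016-02-01T10:00:02 connection opened"],
}
for when, source, message in merge_logs(logs):
    print(when.isoformat(), source, message)
```

Because `heapq.merge` is lazy, it only holds one pending line per log in memory, which matters when the logs are large.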