2016 – Sysadmin Miniconf – Session 3

The life of a Sysadmin in a research environment – Eric Burgueno

  • Everything must be reproducible
  • Keeping system up as long as possible, not have an overall uptime percentage
  • One person needs to cover lots of roles rather than specialise
  • 2 Servers with 2TB of RAM. Others smaller according to need
  • Lots of varied tools mostly bioinformatics software
  • 90TB to over 200TB of data over 2 years. Lots of large files. Big files, big servers.
  • Big job using 2TB of RAM taking 8 days to run.
  • The 2*2TB servers can be joined togeather to create a single 4TB server
  • Have to customize environment for each tool, hard when there have lots of tools and also want to compare/collaborate against other places where software is being run.
  • Reproducible(?) Research

Creating bespoke logging systems and dashboards with Grafana, in fifteen minutes – Andrew McDonnell

Live Demo

Order in the chaos: or lessons learnt on planning in operations – Peter Hall

  • Lead of an Ops team at REA group. Looks after dev teams for 10-15 applications
  • Ops is not a project, but works with many projects
  • Many sources of work, dev, security, incidents, infrastructure improvement
  • Understand the work
    • Document your work
    • Talk about it, 15min standup
  • Scedule things
    • and prepare for the unplanned
    • Perhaps 2 weeks
    • Leave lots of slack
  • Interruptions
    • Assign team members to each ops teams
    • Rotating “ops goal keeper”
    • Developers on pager
  • Review Often
  • Longer term goals for your team
  • Failure demand vs value demand.
    • Make sure [at least some of] what you are doing is adding value to the environment


From Commit to Cloud – Daniel Hall

  • Deployments should be:
    • fast – 10 minutes
    • small – only one feature change and person doing should be aware of all of what is changing
    • easy – little human work as possible, simple to understand
  • We believe this because
    • less to break
    • devs should focus on dev
    • each project should be really easy to learn, devs can switch between projects easy
    • Don’t want anyone from being afraid to deploy
  • Able to rollback
    • 30 microservices
    • 2 devs plus some work from others
  • How to do it
    • Microservices arch (optional but helps)
    • git , build agent, packaging format with dependencies
    • something to run you stuff
  • code -> git -> built -> auto test -> package -> staging -> test -> deploy to prod
  • Application is built triggere by git
    • script in each repo
  • Auto test after build, don’t do end-to-end testing, do that in staging
  • Package app – they use docker – push to internal docker repo
  • Deploy to staging – they use curl to push json mesos/matathon with pulls container. Testing run there
  • Single Click approval to deploy to staging
  • Deploy to prod – should be same as how you deploy to staging.

LNAV – Paul Wayper

  • Point at a dir. read all the files. sort all the lines together in timestamp order
  • Colour codes, machines, different facilities(daemons). Highlights IPs addresses
  • Errors lines in red, warning lines in yellow
  • Regular expressions highlighted. Fully pcre compatable
  • Able to move back and force and hour or a day at a time with special keys
  • Histograph of error lines, number per minutes etc
  • more complete (SQL like) queries
  • compiles as a static binary
  • Ability to add your own log file formats
  • Ability share format filters with others
  • Doesn’t deal with journald logs
  • Availbale for spel, fedora, debian but under a lot of active development.
  • acts like tail -f to spot updates to logs.