The life of a Sysadmin in a research environment – Eric Burgueno
- Everything must be reproducible
- Aim to keep systems up as long as possible, rather than hitting an overall uptime percentage
- One person needs to cover lots of roles rather than specialise
- 2 Servers with 2TB of RAM. Others smaller according to need
- Lots of varied tools, mostly bioinformatics software
- Data grew from 90TB to over 200TB in 2 years. Lots of large files. Big files, big servers.
- Big job using 2TB of RAM taking 8 days to run.
- The two 2TB servers can be joined together to create a single 4TB server
- Have to customize the environment for each tool, which is hard when there are lots of tools and you also want to compare/collaborate with other places where the software is being run.
- Reproducible(?) Research
Creating bespoke logging systems and dashboards with Grafana, in fifteen minutes – Andrew McDonnell
Live Demo
Order in the chaos: or lessons learnt on planning in operations – Peter Hall
- Lead of an Ops team at REA Group. Looks after dev teams for 10-15 applications
- Ops is not a project, but works with many projects
- Many sources of work: dev, security, incidents, infrastructure improvement
- Understand the work
- Document your work
- Talk about it, 15min standup
- Schedule things
- and prepare for the unplanned
- Perhaps 2 weeks
- Leave lots of slack
- Interruptions
- Assign team members to each ops team
- Rotating “ops goal keeper”
- Developers on pager
- Review Often
- Longer term goals for your team
- Failure demand vs value demand.
- Make sure [at least some of] what you are doing is adding value to the environment
From Commit to Cloud – Daniel Hall
- Deployments should be:
- fast – 10 minutes
- small – only one feature change, and the person deploying should be aware of everything that is changing
- easy – as little human work as possible, simple to understand
- We believe this because
- less to break
- devs should focus on dev
- each project should be really easy to learn, so devs can switch between projects easily
- Don’t want anyone to be afraid to deploy
- Able to rollback
- 30 microservices
- 2 devs plus some work from others
- How to do it
- Microservices arch (optional but helps)
- git , build agent, packaging format with dependencies
- something to run your stuff
- code -> git -> build -> auto test -> package -> staging -> test -> deploy to prod
- Application build is triggered by git
- build.sh script in each repo
- Auto test after build; don’t do end-to-end testing here, do that in staging
- Package app – they use Docker – push to internal Docker repo
- Deploy to staging – they use curl to push JSON to Mesos/Marathon, which pulls the container. Tests run there
- Single Click approval to deploy to staging
- Deploy to prod – should be same as how you deploy to staging.
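
The "push JSON to Marathon" step can be sketched in a few lines. This is only an illustration, not the speaker's actual script: it assumes Marathon's REST endpoint PUT /v2/apps/<id>, and the app id, image name, hostnames and resource numbers are all made up. The same function builds the request for staging and for prod, so the two deploys differ only in the target URL.

```python
import json
import urllib.request


def marathon_app(app_id, image, instances=2, cpus=0.5, mem=256):
    """Build a Marathon app definition that runs one Docker image."""
    return {
        "id": app_id,
        "instances": instances,
        "cpus": cpus,
        "mem": mem,
        "container": {"type": "DOCKER", "docker": {"image": image}},
    }


def deploy_request(marathon_url, app):
    """Prepare (but do not send) the PUT that creates or updates the app."""
    url = marathon_url.rstrip("/") + "/v2/apps/" + app["id"].lstrip("/")
    return urllib.request.Request(
        url,
        data=json.dumps(app).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical hosts and image name, for illustration only.
app = marathon_app("/billing", "registry.internal.example/billing:abc123")
staging = deploy_request("http://marathon.staging.example:8080", app)
prod = deploy_request("http://marathon.prod.example:8080", app)

print(staging.get_method(), staging.full_url)
print(prod.get_method(), prod.full_url)
# Sending is deliberately left out of the sketch:
# urllib.request.urlopen(staging)
```

Because staging and prod receive an identical payload, whatever was tested in staging is what lands in prod.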
LNAV – Paul Wayper
- Point it at a directory. It reads all the files and sorts all the lines together in timestamp order
- Colour codes machines and different facilities (daemons). Highlights IP addresses
- Error lines in red, warning lines in yellow
- Regular expressions highlighted. Fully PCRE compatible
- Able to move back and forth an hour or a day at a time with special keys
- Histogram of error lines, number per minute etc
- More complete (SQL-like) queries
- compiles as a static binary
- Ability to add your own log file formats
- Ability share format filters with others
- Doesn’t deal with journald logs
- Available for EPEL, Fedora and Debian, but under a lot of active development.
- Acts like tail -f to spot updates to logs.
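
The core idea of sorting lines from several files together by timestamp, plus the per-minute error histogram, can be sketched as below. This is not how lnav itself is implemented. It is a minimal Python illustration that assumes each log is already in time order, that every line starts with an ISO-style timestamp, and that errors are marked with the word ERROR. The sample log lines are invented.

```python
import heapq
from collections import Counter
from datetime import datetime

TS_FORMAT = "%Y-%m-%dT%H:%M:%S"  # assumed timestamp layout at start of line


def parse_ts(line):
    """Return the timestamp at the start of a log line."""
    return datetime.strptime(line.split(" ", 1)[0], TS_FORMAT)


def merge_logs(*logs):
    """Merge already-sorted logs into one list in timestamp order."""
    return list(heapq.merge(*logs, key=parse_ts))


def errors_per_minute(lines):
    """Count lines containing ' ERROR ', grouped by HH:MM."""
    counts = Counter()
    for line in lines:
        if " ERROR " in line:
            counts[parse_ts(line).strftime("%H:%M")] += 1
    return dict(counts)


# Invented sample data standing in for two log files.
web = [
    "2016-05-01T10:00:05 web1 nginx ERROR upstream timed out",
    "2016-05-01T10:01:10 web1 nginx INFO request served",
]
db = [
    "2016-05-01T10:00:01 db1 postgres INFO checkpoint complete",
    "2016-05-01T10:00:30 db1 postgres ERROR deadlock detected",
    "2016-05-01T10:01:40 db1 postgres ERROR connection reset",
]

merged = merge_logs(web, db)
for line in merged:
    print(line)
print(errors_per_minute(merged))
```

heapq.merge only keeps one pending line per input in memory, which suits large logs, though it relies on each input already being sorted.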