LCA2010 – Day 3 – Simon Lyall's Blog

Wednesday is the first day of Linux.conf.au proper. I thought that today I’d just keep my notes in a blog post to prevent doubling up.

The keynote was by Benjamin Mako Hill, who talked about various things; the most interesting bit was “antifeatures”: things like DRM, crippling of products etc. The one of these I most hate right now is the way that cheap netbooks have fairly low specs (small resolutions, low RAM, slow CPUs), partially because they have to keep the spec below a certain value in order to qualify for the really cheap Windows license.

The Dreamwidth talk was quite interesting (although the speakers’ pre-rehearsed banter didn’t really work). Lots of practical examples, war stories and good sound advice.

Selena Deckelmann talked about which open source database you should choose. The quick answer is “what problem are you trying to solve?”. She did a survey of the 50-odd databases out there and got 25 replies. She also did her own research and comparisons. She classified DBs into several categories (which I won’t list) such as

General Model – Key-Value, OLTP.
Distribution model (replication, partitioning, sharding).
Memory vs disk (eg keeping everything in memory only like memcached).
HA options, Node failover.
Code dev model – Core + modules, Monolithic, Infrastructure
Community dev model – Dictator, Feature driven, Small group, A mix

Results at http://ossdbsurvey.org

Databases implement each other’s protocols
Need verification that protocols are correctly implemented
Need tools/tests to check things like replication working
More connections between projects/people (eg java separate)

Ted Ts’o – Production-Ready filesystems

Hard to make robust. Many different workloads, lots of state, very parallel
Hard to balance getting it out with getting it stable enough to be fairly safe to use
75-100 person-years for a filesystem to be production ready.
eg ZFS: around a dozen people, started 2001, announced 2005, shipped 2006, people confident with it around 2008-2009
Ext4 renamed from ext4dev at end 2008
Ext4 shipping in some community distributions, soon in some enterprise distributions, widespread adoption 12+ months later
Lots of bugfixes still in ext4, most not real-world and picked up by auto-tools or careful checks in weird conditions.
Ted: “my other preferred term for Dbench is ‘random number generator’”
Paths like online resize and online defrag are not regularly tested by users or testers, so they are the source of many bugs.
Many bugs were in the recently added subsystems and features
Making General purpose file system takes longer and a lot more effort than you might expect. Labour of love, hard to justify from business perspective.
Solid state drives with “flash translation layer” in place are fairly much the same as spinning disks. Extra optimizations for disks don’t help but they don’t hurt

Matthew Garrett on the Linux community

Started by listing things he’s not talked about
The Linux community is “Like the Koreas”
To be a member of the Linux community “you just have to care, just have to turn”
As a community we are very hostile; it’s seen as okay to flame, and it is still being rewarded
Should we stop just cause it’s a nice thing to do or because it’ll stop scaring people off?
Ubuntu code of conduct has meant that users are considered part of the community more than in other distributions
Code of Conduct must be enforced or it’s useless
“We value code above all else… not a good thing”. We need people to feel that by using software they are part of something
Community entirely based on technical excellence, or encompassing everybody who uses, cares about or contributes to projects
Idea for positive examples Wiki with pointers to COPs and best practice examples
Not gained behavior standards normally associated with grown communities

Sage Weil – ceph distributed file system

How different
scalable to 1000s, grow from a few
reliable, HA, replicated data, fast recovery
snapshots, quota-like accounting
Motivation – avoid bottlenecks and symmetrical shared disks
avoid manual workload partition, p3p-like protocols, intell storage agents
POSIX file system, scalable metadata server
metadata (MDS) servers/clusters and object store boxes separate
CRUSH hash function used to distribute objects across devices, works as devices are added. Spread them out explicitly across infrastructure if required
fast (no lookups), reliable, stable
ceph object storage daemon on each node
talks to peers on other node: rep data, detect failures, migrate data
hashing function means nodes don’t have to negotiate with each other, CRUSH says where data is going.
monitors storage nodes, moves data around, makes sure it’s in the right places and up to date. fixes if required.
raw storage API if you don’t need full filesystem fun (dirs etc)
proxy that emulates s3 REST interface
metadata cluster , uses object store for all long term storage, needs memory and fast network for performance.
metadata streamed to journal. large journal (100s MB) flushed now and then
snapshotting on per-directory basis via simple mkdir
snapshot leverages btrfs copy-on-write storage layer
file systems client near-posix
kernel client, FUSE, Hadoop clients
stable but not production ready
client should be in mainline kernel soon
aim to work in multiple datacentres, across unreliable links
http://ceph.newdream.net/
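
To make the "no lookups, nodes don't have to negotiate" point a bit more concrete, here is a rough sketch of hash-based placement. This is NOT the real CRUSH algorithm (which also understands the infrastructure layout and device weights); it uses plain rendezvous (highest-random-weight) hashing as a stand-in, and the function names and device names are made up for illustration. The idea it shows is the same: any client can compute where an object lives from the object name and the device list alone, and adding a device only moves the objects that land on the new device.

```python
import hashlib


def _score(obj_name: str, device: str) -> int:
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj_name}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_name: str, devices: list[str], replicas: int = 2) -> list[str]:
    """Return the devices that should hold obj_name, best score first.

    Hypothetical helper: rendezvous hashing, standing in for CRUSH.
    No central table is consulted; the answer depends only on the inputs.
    """
    ranked = sorted(devices, key=lambda d: _score(obj_name, d), reverse=True)
    return ranked[:replicas]


if __name__ == "__main__":
    devices = ["osd0", "osd1", "osd2"]  # made-up device names
    print(place("photo.jpg", devices))
    # After adding a device, each object either stays put or gains "osd3".
    print(place("photo.jpg", devices + ["osd3"]))
```

Every node that runs the same function on the same device list gets the same answer, which is why nothing has to be looked up or agreed on.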

Paul Fenwick – World’s Worst Inventions

Not really a technical talk. More a few stories about funny inventions. Quite amusing but I’m not sure it fits in with the rest of the conference.