Thursday, May 25, 2006

Ext3 and full data journaling

Ext3 is a stable and mature file system, offering a good balance of speed and reliability. What many people do not realise is that, by default, it journals only metadata, not file contents. Here is the relevant section from 'man tune2fs':

journal_data
When the filesystem is mounted with journalling
enabled, all data (not just metadata) is committed
into the journal prior to being written into the
main filesystem.

journal_data_ordered
When the filesystem is mounted with journalling
enabled, all data is forced directly out to the main
file system prior to its metadata being committed to
the journal.

journal_data_writeback
When the filesystem is mounted with journalling
enabled, data may be written into the main filesystem
after its metadata has been committed to the
journal. This may increase throughput, however, it
may allow old data to appear in files after a crash
and journal recovery.

So the default mode is "journal_data_ordered". This trades some protection for speed: only metadata goes through the journal, so recent file contents may not be fully recoverable after a power outage or crash. (Of the three modes, writeback is usually regarded as the fastest.) You can look at many of the tunable parameters with 'tune2fs -l /dev/hdX' or, in my case, as I'm using LVM:

# tune2fs -l /dev/mapper/VolGroup00-LogVol00
tune2fs 1.38 (30-Jun-2005)
Filesystem volume name:
Last mounted on:
Filesystem UUID: b0c69d9c-234f-444d-ba95-f979a4902f4d
Filesystem magic number: 0xEF53
Filesystem revision #: 1 (dynamic)
Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file
Default mount options:
Filesystem state: clean
Errors behavior: Continue
Filesystem OS type: Linux
Inode count: 19005440
Block count: 19005440
Reserved block count: 950272
Free blocks: 5362926
Free inodes: 18381437
First block: 0
Block size: 4096
Fragment size: 4096
Reserved GDT blocks: 1024
Blocks per group: 32768
Fragments per group: 32768
Inodes per group: 32768
Inode blocks per group: 1024
Filesystem created: Wed Jul 6 20:23:44 2005
Last mount time: Thu May 25 10:03:33 2006
Last write time: Thu May 25 10:03:33 2006
Mount count: 226
Maximum mount count: -1
Last checked: Wed Jul 6 20:23:44 2005
Check interval: 0 ()
Reserved blocks uid: 0 (user root)
Reserved blocks gid: 0 (group root)
First inode: 11
Inode size: 128
Journal inode: 8
First orphan inode: 695882
Default directory hash: tea
Directory Hash Seed: 11dc53e2-545c-4880-a6a6-792557a40a3d
Journal backup: inode blocks


The value 'Default mount options:' is empty, meaning the file system falls back to the default ordered mode and journals only metadata. To set a new value here we run:

# tune2fs -o journal_data /dev/mapper/VolGroup00-LogVol00

NOTE: I've run this command on a mounted file system (in fact on the root file system / ) with no ill effects. However, if you are concerned about your data (and I suggest you always have backups), then only run this command on file systems after they are unmounted; either boot in rescue mode or from a bootable CD like Knoppix.
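
To confirm that the new default was recorded, the 'Default mount options' field can be read back from the superblock. This is only a sketch: default_mount_opts is a helper name made up here, and all it does is filter 'tune2fs -l' output fed to it on stdin.

```shell
# Hypothetical helper (not part of e2fsprogs): print the value of the
# "Default mount options" field from 'tune2fs -l' output read on stdin.
default_mount_opts() {
    sed -n 's/^Default mount options:[[:space:]]*//p'
}

# Example (as root), using the LVM device from this post:
#   tune2fs -l /dev/mapper/VolGroup00-LogVol00 | default_mount_opts
```

After the tune2fs -o command above, this should print journal_data. With the tune2fs version shown earlier, an unset field comes out as an empty line.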

Also, we can edit our /etc/fstab to set the default mount option there by adding the "data=journal" option:

/dev/VolGroup00/LogVol00 / ext3 defaults,noatime,data=journal 1 1
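
A typo in the options column is easy to miss, so it can be worth checking what an fstab entry actually asks for before rebooting. The sketch below assumes the usual whitespace-separated fstab layout with the options in the fourth field; fstab_data_mode is a hypothetical helper name, not a standard tool.

```shell
# Hypothetical helper: print the data= journaling mode requested in fstab
# for a given mount point. Prints nothing if no data= option is set.
#   $1 = mount point, $2 = fstab path (defaults to /etc/fstab)
fstab_data_mode() {
    awk -v mp="$1" '
        # Skip comment lines; match on the mount point in field 2.
        $1 !~ /^#/ && $2 == mp {
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++)
                if (opts[i] ~ /^data=/) {
                    sub(/^data=/, "", opts[i])
                    print opts[i]
                }
        }' "${2:-/etc/fstab}"
}

# Example:
#   fstab_data_mode /
```

For the entry above, this should print journal.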

That's it. We now need to reboot the system (for / ) or unmount and mount again (for any other file system; ext3 will not change the data mode on a plain 'mount -o remount') to begin taking advantage of full data journaling.
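
Once the file system is mounted again, the kernel log is one place to check which mode is really in use. This is a rough sketch: ext3_mount_modes is a made-up helper name, and the message wording it matches is an assumption based on what 2.6-era kernels print, so it may differ on your system.

```shell
# Hypothetical helper: keep only the ext3 "mounted filesystem with ... data
# mode" lines from kernel log text read on stdin.
# The message wording is an assumption and may vary between kernel versions.
ext3_mount_modes() {
    grep 'EXT3-fs.*mounted filesystem with'
}

# Example:
#   dmesg | ext3_mount_modes
```

A line mentioning "journal data mode" would indicate full data journaling; "ordered data mode" would mean the default is still in effect.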

I've not noticed any performance degradation with "journal_data" and have heard reports that it is actually faster in some circumstances.
