XFS filesystem performance tuning falls into four main areas:
1. Logical volume / RAID layout
2. mkfs.xfs options
3. mount options
4. xfsctl tuning
For each of these, it is best to understand the underlying principles first and then apply them to your workload. The man pages cover the fundamentals:
man lvcreate
man xfs
man mkfs.xfs
man mount
man xfsctl
The tuning process is described briefly below:
1.1
Before creating a PV, align the block device: leave the first 1 MB unallocated and start the first partition at sector 2048.
fdisk -c -u /dev/dfa
start 2048
end + (2048*n) - 1
Alternatively, create the partition with parted.
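A minimal parted sketch of the same alignment (the device name /dev/dfa is taken from the fdisk example above; adjust it to your hardware). parted aligns partitions to MiB boundaries when given MiB offsets, which on a 512-byte-sector disk is the same as starting at sector 2048:

```shell
# Partition commands are shown commented out since they modify the device:
# parted -s /dev/dfa mklabel gpt
# parted -s /dev/dfa mkpart primary 1MiB 100%
# parted -s /dev/dfa align-check optimal 1

# Sanity check of the arithmetic: 1 MiB expressed in 512-byte sectors.
echo $(( 1024 * 1024 / 512 ))    # sector 2048
```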
1.2
Two lvcreate parameters are performance-relevant:
1. Stripe count: set it equal to the number of PVs.
-i, --stripes Stripes
Gives the number of stripes. This is equal to the number of physical volumes to scatter the logical volume over.
2. Stripe size: match it to the database block size, e.g. 8 KB by default for PostgreSQL.
-I, --stripesize StripeSize
Gives the number of kilobytes for the granularity of the stripes. StripeSize must be 2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the stripe size may be a larger power of 2, but must not exceed the physical extent size.
3. When creating a snapshot, there is one more parameter: the chunk size, which should also match the database block size (8 KB by default for PostgreSQL).
-c, --chunksize ChunkSize
Power of 2 chunk size for the snapshot logical volume between 4k and 512k.
For example:
#lvcreate -i 3 -I 8 -n lv01 -l 100%VG vgdata01
  Logical volume "lv01" created
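A snapshot with a matching chunk size could look like the sketch below; the snapshot name snap01 and the 10G size are illustrative assumptions, not from the example above:

```shell
# Hypothetical snapshot of lv01 with an 8 KB chunk size to match the
# PostgreSQL block size (shown commented out since it modifies the VG):
# lvcreate -s -c 8k -L 10G -n snap01 /dev/vgdata01/lv01

# The chunk size must be a power of two between 4k and 512k;
# a quick validity check for 8 (KB):
c=8
if [ $(( c & (c - 1) )) -eq 0 ] && [ "$c" -ge 4 ] && [ "$c" -le 512 ]; then
    echo "chunksize ${c}k is valid"
fi
```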
2. mkfs.xfs
An XFS filesystem has three sections: data, log, and realtime.
By default the log lives inside the data section and there is no realtime section. All sections are made up of blocks, the smallest allocation unit; the block size is set with -b when the filesystem is created.
2.1 data
Contains the filesystem metadata (inodes, directories, indirect blocks), the user file data of ordinary (non-realtime) files, and the log if it is internal.
The data section is split into multiple allocation groups; mkfs.xfs lets you choose the number of groups and the size of each group.
The more groups there are, the more file and block allocations can proceed in parallel: roughly, operations within one group are serialized, while different groups operate in parallel.
More groups also consume more CPU, however, so there is a trade-off. For workloads with very high write concurrency (for example, a host running many small but busy databases), use more groups.
2.2 log
Stores the metadata log. Before metadata can be modified, the change must first be written to the log; only then is the metadata in the data section updated.
The log is also used for recovery after a crash.
2.3 realtime
Divided into many small, fixed-size extents. To place a file in the realtime section, an attribute bit must be set on the file via xfsctl, and this must be done before any data is written to it. The size of a file in the realtime section is a multiple of the realtime extent size.
The allocation group count multiplied by the group size equals the size of the block device. How many groups to use depends on the parallelism the workload needs.
The allocation group count is ideally a multiple of the number of underlying PVs; with 3 PVs, for example, use 9 AGs, or 900.
Put the log on the fastest device available, ideally an SSD, and avoid limiting the log device's IOPS with cgroups.
If you do not need the realtime section, do not create one.
-b size=8192 : match the database block size.
-d agcount=9000,sunit=16,swidth=48 :
agcount=9000 : assuming up to 9000 concurrent writers, use 9000 allocation groups.
sunit=16 : stripe unit, in 512-byte sectors; align with the stripe size of the LVM/RAID device (8 KB = 16 sectors).
swidth=48 : stripe width, in 512-byte sectors; align with the full stripe span of the LVM/RAID device, here 3 * 8 KB (matching -i 3 -I 8) = 48 sectors.
Example:
#mkfs.xfs -f -b size=8192 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01
meta-data=/dev/mapper/vgdata01-lv01 isize=256 agcount=9000, agsize=260417 blks
= sectsz=512 attr=2
data = bsize=8192 blocks=2343748608, imaxpct=5
= sunit=1 swidth=3 blks
naming =version 2 bsize=8192 ascii-ci=0
log =internal log bsize=8192 blocks=260413, version=2
= sectsz=512 sunit=1 blks, lazy-count=1
realtime =none extsz=8192 blocks=0, rtextents=0
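The sunit/swidth values used above can be derived from the stripe geometry; a small sketch of the arithmetic, assuming 3 PVs and an 8 KB stripe size as in the lvcreate example:

```shell
# mkfs.xfs expresses sunit/swidth in 512-byte sectors.
stripe_kb=8        # LVM stripe size (-I 8), matching the 8 KB database block
pvs=3              # number of PVs / stripes (-i 3)
sector=512

sunit=$(( stripe_kb * 1024 / sector ))   # one stripe unit in sectors
swidth=$(( sunit * pvs ))                # full stripe width in sectors

echo "sunit=$sunit swidth=$swidth"       # → sunit=16 swidth=48
```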
3. mount
nobarrier
largeio : for applications doing large sequential reads, such as data warehouses and streaming media.
nolargeio : for OLTP.
logbsize=262144 : size of each in-memory log buffer.
logdev= : block device for an external log section; use the fastest SSD available.
noatime,nodiratime
swalloc : stripe-width-aligned allocation.
#mount -t xfs -o nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc /dev/mapper/vgdata01-lv01 /data01
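To make the options persistent, the same mount could go into /etc/fstab; a sketch, assuming the /data01 mount point from the example above:

```
/dev/mapper/vgdata01-lv01  /data01  xfs  nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc  0 0
```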
4. xfsctl
Controls file-open behavior; not covered here.
#mount -o noatime,swalloc /dev/mapper/vgdata01-lv01 /data01
mount: Function not implemented
The cause is a block size that the running kernel does not support; changing it to 4096 fixes it.
[ 5736.642924] XFS (dm-0): File system with blocksize 8192 bytes. Only pagesize (4096) or less will currently work.
[ 5736.695146] XFS (dm-0): SB validate failed with error -38.
Fix:
#mkfs.xfs -f -b size=4096 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01
meta-data=/dev/mapper/vgdata01-lv01 isize=256 agcount=9000, agsize=520834 blks
data = bsize=4096 blocks=4687497216, imaxpct=5
= sunit=2 swidth=6 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=520830, version=2
= sectsz=512 sunit=2 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
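The kernel constraint behind the error above (filesystem block size must not exceed the kernel page size) can be checked before running mkfs.xfs; a small sketch:

```shell
# XFS on Linux requires the filesystem block size to be no larger than
# the kernel page size (4096 on most x86_64 systems).
pagesize=$(getconf PAGESIZE)
bs=4096    # the block size that worked in the example above

if [ "$bs" -le "$pagesize" ]; then
    echo "bs=$bs ok (pagesize=$pagesize)"
else
    echo "bs=$bs too large (pagesize=$pagesize)"
fi
```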
1. XFS(5)
NAME
xfs - layout of the XFS filesystem
DESCRIPTION
An XFS filesystem can reside on a regular disk partition or on a logical volume. An XFS filesystem has up to three parts: a data section, a log section, and a realtime section. Using the default mkfs.xfs(8) options, the realtime section is absent, and the log area is contained within the data section. The log section can be either separate from the data section or contained within it. The filesystem sections are divided into a certain number of blocks, whose size is specified at mkfs.xfs(8) time with the -b option.
The data section contains all the filesystem metadata (inodes, directories, indirect blocks) as well as the user file data for ordinary (non-realtime) files and the log area if the log is internal to the data section. The data section is divided into a number of allocation groups. The number and size of the allocation groups are chosen by mkfs.xfs(8) so that there is normally a small number of equal-sized groups. The number of allocation groups controls the amount of parallelism available in file and block allocation. It should be increased from the default if there is sufficient memory and a lot of allocation activity. The number of allocation groups should not be set very high, since this can cause large amounts of CPU time to be used by the filesystem, especially when the filesystem is nearly full. More allocation groups are added (of the original size) when xfs_growfs(8) is run.
The log section (or area, if it is internal to the data section) is used to store changes to filesystem metadata while the filesystem is running until those changes are made to the data section. It is written sequentially during normal operation and read only during mount. When mounting a filesystem after a crash, the log is read to complete operations that were in progress at the time of the crash.
The realtime section is used to store the data of realtime files. These files had an attribute bit set through xfsctl(3) after file creation, before any data was written to the file. The realtime section is divided into a number of extents of fixed size (specified at mkfs.xfs(8) time). Each file in the realtime section has an extent size that is a multiple of the realtime section extent size.
Each allocation group contains several data structures. The first sector contains the superblock. For allocation groups after the first, the superblock is just a copy and is not updated after mkfs.xfs(8). The next three sectors contain information for block and inode allocation within the allocation group. Also contained within each allocation group are data structures to locate free blocks and inodes; these are located through the header structures.
Each XFS filesystem is labeled with a universal unique identifier (UUID). The UUID is stored in every allocation group header and is used to help distinguish one XFS filesystem from another, therefore you should avoid using dd(1) or other block-by-block copying programs to copy XFS filesystems. If two XFS filesystems on the same machine have the same UUID, xfsdump(8) may become confused when doing incremental and resumed dumps. xfsdump(8) and xfsrestore(8) are recommended for making copies of XFS filesystems.
OPERATIONS
Some functionality specific to the XFS filesystem is accessible to applications through the xfsctl(3) and by-handle (see open_by_handle(3)) interfaces.
MOUNT OPTIONS
Refer to the mount(8) manual entry for descriptions of the individual XFS mount options.
SEE ALSO
xfsctl(3), mount(8), mkfs.xfs(8), xfs_info(8), xfs_admin(8), xfsdump(8), xfsrestore(8).