
PostgreSQL on XFS Performance Tuning - 1

Overview

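XFS filesystem performance tuning falls into four areas:

1. Logical volume / RAID tuning

2. XFS mkfs tuning

3. XFS mount tuning

4. xfsctl tuning

For each area, it is best to understand the underlying principles first and then tune for your specific workload. The man pages explain the principles.

The relevant man pages are: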

man lvcreate

man xfs

man mkfs.xfs

man mount

man xfsctl

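The tuning steps are described below.

1. Logical volume tuning

1.1

Before creating the PV, align the block device: leave the first 1 MiB unallocated and start the partition at sector 2048 (2048 sectors of 512 bytes = 1 MiB).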

fdisk -c -u /dev/dfa

start  2048

end + (2048*n) - 1

Alternatively, create the partition with parted.
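The alignment rule can be checked with plain shell arithmetic. This is a minimal sketch, assuming 512-byte sectors and reading the fdisk note as end = start + (2048*n) - 1; the function names `is_aligned` and `end_sector` are illustrative and not part of fdisk or parted.

```shell
#!/bin/sh
# Assumption: 512-byte sectors, so 2048 sectors = 1 MiB.
ALIGN_SECTORS=2048

# Succeeds if the given start sector falls on a 1 MiB boundary.
is_aligned() {
    [ $(( $1 % ALIGN_SECTORS )) -eq 0 ]
}

# Last sector of a partition that starts at sector $1 and spans $2 MiB,
# i.e. start + (2048 * n) - 1.
end_sector() {
    echo $(( $1 + ALIGN_SECTORS * $2 - 1 ))
}

if is_aligned 2048; then
    echo "start 2048 is aligned; a 1024 MiB partition ends at sector $(end_sector 2048 1024)"
fi
```

A start sector such as 63 (the old DOS default) fails the check, which is the situation the note above is meant to avoid.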

1.2 

Two performance-related parameters need to be specified when creating the logical volume (a third applies when creating snapshots):

1. Number of stripes: set it equal to the number of PVs.

       -i, --stripes Stripes

              Gives the number of stripes.  This is equal to the number of physical volumes to scatter the logical volume.

2. Stripe size: set it equal to the database block size; for example, the PostgreSQL default is 8 KB.

       -I, --stripesize StripeSize

              Gives the number of kilobytes for the granularity of the stripes.

              StripeSize must be 2^n (n = 2 to 9) for metadata in LVM1 format.  For metadata in LVM2 format, the stripe size may be a larger power of 2 but must not exceed the physical extent size.

3. When creating a snapshot, specify:

Chunk size: preferably equal to the database block size; for example, the PostgreSQL default is 8 KB.

       -c, --chunksize ChunkSize

              Power of 2 chunk size for the snapshot logical volume between 4k and 512k.

For example:

#lvcreate -i 3 -I 8 -n lv01 -l 100%VG vgdata01

  Logical volume "lv01" created
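lvcreate takes the stripe count with lowercase -i and the stripe size with uppercase -I, and the two are easy to confuse. The sketch below only prints a command line and never runs lvcreate; the helper names `is_pow2` and `lvcreate_cmd` are hypothetical, and the LV and VG names are the ones used in this article.

```shell
#!/bin/sh
# Succeeds if $1 is a positive power of two (required for the -I stripe size).
is_pow2() {
    n=$1
    [ "$n" -gt 0 ] || return 1
    while [ $(( n % 2 )) -eq 0 ]; do
        n=$(( n / 2 ))
    done
    [ "$n" -eq 1 ]
}

# Print (do not run) an lvcreate command line.
#   $1 = number of PVs, $2 = stripe size in KB, $3 = LV name, $4 = VG name
lvcreate_cmd() {
    is_pow2 "$2" || { echo "stripe size $2 KB is not a power of 2" >&2; return 1; }
    echo "lvcreate -i $1 -I $2 -n $3 -l 100%VG $4"
}

# 3 PVs, 8 KB stripes to match the PostgreSQL default block size.
lvcreate_cmd 3 8 lv01 vgdata01
```

Printing the command first makes it possible to review the stripe settings before anything touches the volume group.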

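2. XFS mkfs tuning

An XFS filesystem has three sections: data, log, and realtime.

By default the log lives inside the data section and there is no realtime section. Every section is made up of blocks, the smallest unit; the block size is set with -b when the filesystem is created.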

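2.1 data

Contains the metadata (inodes, directories, indirect blocks) and the user file data of ordinary (non-realtime) files.

The data section is split into multiple allocation groups. With mkfs.xfs you can specify either the number of groups (agcount) or the size of each group (agsize); the two suboptions are mutually exclusive.

The more groups there are, the more file and block allocations can proceed in parallel. Think of operations within one group as serial and operations across groups as parallel.

More groups also cost more CPU, so this is a trade-off. For workloads with highly concurrent writes (for example, one host running many small, busy databases), use more groups.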

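2.2 log

Stores the metadata log. Before metadata is modified, the change must first be written to the log; only then can the metadata in the data section be modified.

The log is also used for recovery after a crash.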

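2.3 realtime

The realtime section is divided into many small extents. To place a file in the realtime section, set an attribute bit on the file with xfsctl, and do so before any data is written to it. The extent size of a file in the realtime section is a multiple of the realtime section extent size.

The number of allocation groups multiplied by the group size equals the size of the block device. How many groups to use depends on the parallelism the workload needs.

The number of allocation groups should preferably be a multiple of the number of PVs under the logical volume; for example, with 3 PVs the AG count could be 9 or 900.

Put the log on an SSD if possible; the faster the device, the better. Avoid using cgroups to throttle IOPS on the log block device.

If the realtime section is not needed, do not create it.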

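-b size=8192  matches the database block size (the block size must not exceed the kernel page size; see the troubleshooting notes at the end)

-d agcount=9000,sunit=16,swidth=48

   agcount=9000: assuming 9000 concurrent writers, use 9000 allocation groups

   sunit=16: (in 512-byte units) aligned with the stripe size of the LVM or RAID device; 16 * 512 bytes = 8 KB

   swidth=48: (in 512-byte units) aligned with the full stripe width of the LVM or RAID device; 3 stripes * 8 KB = 24 KB, matching lvcreate -i 3 -I 8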
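The sunit and swidth values can be derived from the stripe geometry of the logical volume. This is a minimal sketch, assuming a stripe size given in KB and the 512-byte units mkfs.xfs uses for these suboptions; `xfs_sunit` and `xfs_swidth` are illustrative helper names.

```shell
#!/bin/sh
# mkfs.xfs takes sunit/swidth in 512-byte units.
#   $1 = stripe size in KB, $2 = number of stripes (PVs)
xfs_sunit()  { echo $(( $1 * 1024 / 512 )); }
xfs_swidth() { echo $(( $1 * 1024 / 512 * $2 )); }

stripe_kb=8   # stripe size of the logical volume, in KB
stripes=3     # number of stripes (PVs) in the logical volume
echo "-d sunit=$(xfs_sunit "$stripe_kb"),swidth=$(xfs_swidth "$stripe_kb" "$stripes")"
```

For 3 stripes of 8 KB this yields sunit=16 and swidth=48, the values used in this article.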

Example:

#mkfs.xfs -f -b size=8192 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01 

meta-data=/dev/mapper/vgdata01-lv01 isize=256    agcount=9000, agsize=260417 blks

         =                       sectsz=512   attr=2

data     =                       bsize=8192   blocks=2343748608, imaxpct=5

         =                       sunit=1      swidth=3 blks

naming   =version 2              bsize=8192   ascii-ci=0

log      =internal log           bsize=8192   blocks=260413, version=2

         =                       sectsz=512   sunit=1 blks, lazy-count=1

realtime =none                   extsz=8192   blocks=0, rtextents=0
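The figures in this output can be cross-checked by hand. The sketch below assumes that agsize corresponds to the block count divided by agcount, rounded up; that matches the numbers printed here, but how mkfs.xfs computes it internally is not verified. The helper names are illustrative.

```shell
#!/bin/sh
# Ceiling division: blocks per allocation group.
#   $1 = total data blocks, $2 = agcount
ag_size_blocks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Convert a value in 512-byte units to filesystem blocks.
#   $1 = value in 512-byte units, $2 = block size in bytes
units_to_blocks() {
    echo $(( $1 * 512 / $2 ))
}

echo "agsize=$(ag_size_blocks 2343748608 9000) blks"
echo "sunit=$(units_to_blocks 16 8192) swidth=$(units_to_blocks 48 8192) blks"
```

With an 8192-byte block, sunit=16 and swidth=48 (in 512-byte units) become 1 and 3 blocks, which is why the output reports sunit=1 and swidth=3 blks.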

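3. XFS mount tuning

nobarrier  disables write barriers

largeio  for data warehouse and streaming-media workloads dominated by large sequential reads

nolargeio  for OLTP

logbsize=262144  sets the size of each in-memory log buffer (256 KB here)

logdev=  the block device that holds the log section; use the fastest SSD available

noatime,nodiratime

swalloc  stripe-aligned allocation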

#mount -t xfs -o nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc /dev/mapper/vgdata01-lv01 /data01
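After mounting, it can be useful to confirm which options are in effect. The sketch below is one possible approach: `has_opt` does an exact match against a comma-separated option list, and `mount_opts` (a hypothetical helper) reads the option field for a mount point from /proc/mounts. The kernel may report options in a different form than they were passed, so treat the comparison as a rough check.

```shell
#!/bin/sh
# Succeeds if option $2 appears as a whole entry in the comma-separated list $1.
has_opt() {
    case ",$1," in
        *",$2,"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Hypothetical helper: print the options in effect for mount point $1.
# /proc/mounts fields: device, mount point, type, options, dump, pass.
mount_opts() {
    awk -v mp="$1" '$2 == mp { print $4 }' /proc/mounts
}

# Check the option string used in the mount command above.
wanted="nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc"
for o in noatime swalloc largeio; do
    if has_opt "$wanted" "$o"; then echo "$o: present"; else echo "$o: missing"; fi
done
```

On a live system the same check could be written as `has_opt "$(mount_opts /data01)" noatime`. The exact match matters: largeio is reported as missing even though nolargeio is in the list.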

4. xfsctl tuning

Controls file-open policy; omitted here.

[Troubleshooting]

#mount -o noatime,swalloc /dev/mapper/vgdata01-lv01 /data01

mount: Function not implemented

The cause is a block size that the running kernel does not support; changing it to 4096 fixes the problem. The kernel log shows:

[ 5736.642924] XFS (dm-0): File system with blocksize 8192 bytes. Only pagesize (4096) or less will currently work.

[ 5736.695146] XFS (dm-0): SB validate failed with error -38.

Fix:

#mkfs.xfs -f -b size=4096 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01 

meta-data=/dev/mapper/vgdata01-lv01 isize=256    agcount=9000, agsize=520834 blks

data     =                       bsize=4096   blocks=4687497216, imaxpct=5

         =                       sunit=2      swidth=6 blks

naming   =version 2              bsize=4096   ascii-ci=0

log      =internal log           bsize=4096   blocks=520830, version=2

         =                       sectsz=512   sunit=2 blks, lazy-count=1

realtime =none                   extsz=4096   blocks=0, rtextents=0
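The failure can be anticipated before running mkfs.xfs by comparing the intended block size with the kernel page size. This is a minimal sketch based only on the kernel message quoted above (block sizes up to the page size work); `bsize_mountable` is an illustrative name, and kernels other than the one used in this article may behave differently.

```shell
#!/bin/sh
# Succeeds if an XFS block size ($1, bytes) does not exceed the page size ($2, bytes).
bsize_mountable() {
    [ "$1" -le "$2" ]
}

# Page size of the running kernel, in bytes.
page=$(getconf PAGESIZE)
for bs in 4096 8192; do
    if bsize_mountable "$bs" "$page"; then
        echo "bsize=$bs: fits within page size $page"
    else
        echo "bsize=$bs: larger than page size $page, mount is expected to fail"
    fi
done
```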

[References]

1. xfs(5)

xfs(5)                                                                  xfs(5)

NAME

       xfs - layout of the XFS filesystem

DESCRIPTION

       An XFS filesystem can reside on a regular disk partition or on a logical volume.  An XFS filesystem has up to three parts: a data section, a log section, and a realtime section.  Using the default mkfs.xfs(8) options, the realtime section is absent, and the log area is contained within the data section.  The log section can be either separate from the data section or contained within it.  The filesystem sections are divided into a certain number of blocks, whose size is specified at mkfs.xfs(8) time with the -b option.

       The data section contains all the filesystem metadata (inodes, directories, indirect blocks) as well as the user file data for ordinary (non-realtime) files and the log area if the log is internal to the data section.  The data section is divided into a number of allocation groups.  The number and size of the allocation groups are chosen by mkfs.xfs(8) so that there is normally a small number of equal-sized groups.  The number of allocation groups controls the amount of parallelism available in file and block allocation.  It should be increased from the default if there is sufficient memory and a lot of allocation activity.  The number of allocation groups should not be set very high, since this can cause large amounts of CPU time to be used by the filesystem, especially when the filesystem is nearly full.  More allocation groups are added (of the original size) when xfs_growfs(8) is run.

       The log section (or area, if it is internal to the data section) is used to store changes to filesystem metadata while the filesystem is running until those changes are made to the data section.  It is written sequentially during normal operation and read only during mount.  When mounting a filesystem after a crash, the log is read to complete operations that were in progress at the time of the crash.

       The realtime section is used to store the data of realtime files.  These files had an attribute bit set through xfsctl(3) after file creation, before any data was written to the file.  The realtime section is divided into a number of extents of fixed size (specified at mkfs.xfs(8) time).  Each file in the realtime section has an extent size that is a multiple of the realtime section extent size.

       Each allocation group contains several data structures.  The first sector contains the superblock.  For allocation groups after the first, the superblock is just a copy and is not updated after mkfs.xfs(8).  The next three sectors contain information for block and inode allocation within the allocation group.  Also contained within each allocation group are data structures to locate free blocks and inodes; these are located through the header structures.

       Each XFS filesystem is labeled with a Universal Unique Identifier (UUID).  The UUID is stored in every allocation group header and is used to help distinguish one XFS filesystem from another, therefore you should avoid using dd(1) or other block-by-block copying programs to copy XFS filesystems.  If two XFS filesystems on the same machine have the same UUID, xfsdump(8) may become confused when doing incremental and resumed dumps.  xfsdump(8) and xfsrestore(8) are recommended for making copies of XFS filesystems.

OPERATIONS

       Some functionality specific to the XFS filesystem is accessible to applications through the xfsctl(3) and by-handle (see open_by_handle(3)) interfaces.

MOUNT OPTIONS

       Refer to the mount(8) manual entry for descriptions of the individual XFS mount options.

SEE ALSO

       xfsctl(3), mount(8), mkfs.xfs(8), xfs_info(8), xfs_admin(8), xfsdump(8), xfsrestore(8).

                                                                        xfs(5)