PostgreSQL on XFS performance tuning - 1

Overview

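XFS filesystem performance tuning falls into four areas:

1. Logical volume / RAID tuning

2. XFS mkfs tuning

3. XFS mount tuning

4. xfsctl tuning

For each area, understand how it works first, then tune for your workload. The man pages explain the underlying mechanisms.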

The relevant man pages are:

man lvcreate

man xfs

man mkfs.xfs

man mount

man xfsctl

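The tuning steps are described below.

1. Logical volume tuning

1.1

Before creating the PV, align the block device: leave the first 1 MB unallocated and start the partition at sector 2048.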

fdisk -c -u /dev/dfa

start 2048

end + (2048*n) - 1

Alternatively, create the partition with parted.
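The start/end rule above can be sketched as a small calculation. This is only an illustration under the assumption of 512-byte sectors (so 2048 sectors = 1 MiB); `aligned_end_sector` and `is_mib_aligned` are hypothetical helper names, not part of fdisk or parted.

```shell
#!/bin/sh
# Sketch only: assumes 512-byte sectors, so 2048 sectors = 1 MiB.
# Both helpers are hypothetical and do not touch any device.

# Print the last sector of a partition that starts at sector $1
# and spans $2 MiB: start + 2048*n - 1.
aligned_end_sector() {
    start=$1
    size_mib=$2
    echo $(( start + 2048 * size_mib - 1 ))
}

# Succeed if sector $1 sits on a 1 MiB boundary.
is_mib_aligned() {
    [ $(( $1 % 2048 )) -eq 0 ]
}

# A 1 GiB partition starting at sector 2048.
aligned_end_sector 2048 1024
```

Because the end sector is one less than a multiple of 2048, the next partition also starts on a 1 MiB boundary. With parted, starting the partition at 1MiB (for example `mkpart primary 1MiB 100%`) should give the same alignment.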

1.2

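The performance-related lvcreate parameters are:

1. Number of stripes: set it equal to the number of PVs.

-i, --stripes Stripes

Gives the number of stripes. This is equal to the number of physical volumes to scatter the logical volume.

2. Stripe size: set it equal to the database block size, for example 8 KB, the PostgreSQL default.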

-I, --stripesize StripeSize

Gives the number of kilobytes for the granularity of the stripes.

StripeSize must be 2^n (n = 2 to 9) for metadata in LVM1 format. For metadata in LVM2 format, the stripe size may be a larger power of 2 but must not exceed the physical extent size.

3. Chunk size, specified when creating a snapshot: ideally equal to the database block size, for example 8 KB, the PostgreSQL default.

-c, --chunksize ChunkSize

Power of 2 chunk size for the snapshot logical volume between 4k and 512k.

For example:

#lvcreate -i 3 -I 8 -n lv01 -l 100%VG vgdata01

Logical volume "lv01" created
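lvcreate takes the stripe count with lowercase `-i` and the stripe size with uppercase `-I`, which is easy to mix up. The sketch below assembles the command from the PV count and the database block size and rejects a stripe size that is not a power of 2. `build_lvcreate` and `is_power_of_two` are hypothetical helpers, and the VG and LV names are simply the ones used in this article; the command is printed, not executed.

```shell
#!/bin/sh
# Sketch only: build_lvcreate is a hypothetical helper that prints the
# command instead of running it. lv01 and vgdata01 are the names used
# in this article.

# Succeed if $1 is a power of two (1, 2, 4, 8, ...).
is_power_of_two() {
    n=$1
    [ "$n" -gt 0 ] || return 1
    while [ $(( n % 2 )) -eq 0 ]; do
        n=$(( n / 2 ))
    done
    [ "$n" -eq 1 ]
}

# $1 = number of PVs (stripes), $2 = stripe size in KB (database block size)
build_lvcreate() {
    pvs=$1
    stripe_kb=$2
    if ! is_power_of_two "$stripe_kb"; then
        echo "stripe size must be a power of 2: $stripe_kb" >&2
        return 1
    fi
    echo "lvcreate -i $pvs -I $stripe_kb -n lv01 -l 100%VG vgdata01"
}

build_lvcreate 3 8
```

Printing the command first makes it possible to review the stripe geometry before anything is created.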

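2. XFS mkfs tuning

An XFS filesystem has three sections: data, log, and realtime.

By default the log lives inside the data section and there is no realtime section. Every section is made up of blocks, the smallest unit, whose size is set with -b when the filesystem is created.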

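2.1 data

Holds the metadata (inodes, directories, indirect blocks) and the user file data of ordinary (non-realtime) files.

The data section is split into allocation groups. mkfs.xfs lets you set the number of groups and the size of each group.

More groups allow more file and block allocations to proceed in parallel: think of operations within one group as serial and operations across groups as parallel.

More groups also cost more CPU, so there is a trade-off. For workloads with heavy concurrent writes (for example, one host running many small, busy databases), use more groups.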

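2.2 log

Stores the log of metadata changes. A metadata change must be written to the log before it is applied to the metadata in the data section.

The log is also used for recovery after a crash.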

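2.3 realtime

Divided into many small fixed-size extents. To place a file in the realtime section, set the realtime attribute bit on the file through xfsctl, and do so before any data is written. The extent size of a file in the realtime section is a multiple of the realtime extent size.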

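The number of allocation groups multiplied by the group size equals the size of the block device. Choose the number according to the write parallelism you need.

The number of allocation groups should preferably be a multiple of the number of PVs under the logical volume. For example, with 3 PVs, use 9 or 900 groups.

Put the log on an SSD if possible; the faster the better. Avoid using cgroups to throttle IOPS on the log device.

If you do not need a realtime section, do not create one.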
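The two allocation group rules above (count times size covers the device, and count is a multiple of the PV count) can be checked with simple arithmetic. The helpers below are hypothetical, and the block counts are borrowed from the mkfs.xfs examples in this article. Rounding up is an assumption made here so that the groups always cover the whole device; mkfs.xfs may apply further alignment of its own.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers for sanity-checking an agcount choice.

# Succeed if agcount $1 is a whole multiple of the PV count $2.
agcount_matches_pvs() {
    [ $(( $1 % $2 )) -eq 0 ]
}

# Print blocks per allocation group for $1 total blocks and $2 groups,
# rounding up so that agcount * agsize covers the whole device.
agsize_blocks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# 2343748608 blocks of 8 KB split into 9000 groups.
agsize_blocks 2343748608 9000
```

For the 8 KB example this gives 260417 blocks per group, and for the 4 KB example (4687497216 blocks) it gives 520834, which agrees with the agsize values mkfs.xfs reports.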

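-b size=8192: matches the database block size.

-d agcount=9000,sunit=16,swidth=48

agcount=9000: assuming 9000 concurrent writers, use 9000 allocation groups.

sunit=16: in units of 512 bytes; aligned to the stripe size of the LVM or RAID device (16 * 512 bytes = 8 KB).

swidth=48: in units of 512 bytes; aligned to the stripe width of the LVM or RAID device, here 3 * 8 KB, matching lvcreate -i 3 -I 8.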
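The sunit and swidth values follow directly from the stripe geometry of the logical volume, so they can be derived rather than typed by hand. A minimal sketch, assuming the stripe size is given in KB (as with lvcreate -I) and the stripe count equals the number of PVs (lvcreate -i); `xfs_stripe_opts` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: xfs_stripe_opts is a hypothetical helper.
# $1 = stripe size in KB (lvcreate -I), $2 = number of stripes (lvcreate -i)
# Prints sunit and swidth in 512-byte units, as used by mkfs.xfs -d.
xfs_stripe_opts() {
    sunit=$(( $1 * 1024 / 512 ))   # stripe size expressed in 512-byte units
    swidth=$(( sunit * $2 ))       # full stripe width across all stripes
    echo "sunit=$sunit,swidth=$swidth"
}

xfs_stripe_opts 8 3   # prints sunit=16,swidth=48
```

mkfs.xfs also accepts the same geometry as su=8k,sw=3, where su is the stripe unit in bytes and sw is the number of stripe units per stripe, which avoids the 512-byte conversion.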

Example:

#mkfs.xfs -f -b size=8192 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01

meta-data=/dev/mapper/vgdata01-lv01 isize=256    agcount=9000, agsize=260417 blks
         =                       sectsz=512   attr=2
data     =                       bsize=8192   blocks=2343748608, imaxpct=5
         =                       sunit=1      swidth=3 blks
naming   =version 2              bsize=8192   ascii-ci=0
log      =internal log           bsize=8192   blocks=260413, version=2
         =                       sectsz=512   sunit=1 blks, lazy-count=1
realtime =none                   extsz=8192   blocks=0, rtextents=0

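3. XFS mount tuning

nobarrier: disables write barriers (only safe when the storage has a non-volatile or battery-backed write cache).

largeio: for data warehouses, streaming media, and other workloads dominated by large sequential reads.

nolargeio: for OLTP.

logbsize=262144: sets the size of each in-memory log buffer (256 KB here).

logdev=: the block device holding the log section; use the fastest SSD available.

noatime,nodiratime: do not update access times on files and directories.

swalloc: stripe-width aligned allocation.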

#mount -t xfs -o nobarrier,nolargeio,logbsize=262144,noatime,nodiratime,swalloc /dev/mapper/vgdata01-lv01 /data01
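The only option in the command above that depends on the workload is largeio versus nolargeio. The sketch below builds the option string for either case. `xfs_mount_opts` is a hypothetical helper, the workload labels "oltp" and "dw" are made up for this illustration, and the option values are the ones used in this article rather than universal defaults. The mount command is printed, not executed.

```shell
#!/bin/sh
# Sketch only: xfs_mount_opts is a hypothetical helper; the option
# values are the ones used in this article, not universal defaults.
# $1 = workload: "oltp" or "dw" (data warehouse / streaming)
xfs_mount_opts() {
    case "$1" in
        oltp) io=nolargeio ;;
        dw)   io=largeio ;;
        *)    echo "unknown workload: $1" >&2; return 1 ;;
    esac
    echo "nobarrier,$io,logbsize=262144,noatime,nodiratime,swalloc"
}

# Print the mount command rather than running it.
echo "mount -t xfs -o $(xfs_mount_opts oltp) /dev/mapper/vgdata01-lv01 /data01"
```

If the log is on a separate device, a logdev= entry would need to be appended to the option string as well.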

4. xfsctl tuning

Controls the policy applied to individual open files; omitted here.

[Troubleshooting]

#mount -o noatime,swalloc /dev/mapper/vgdata01-lv01 /data01

mount: Function not implemented

The cause is a block size that the running kernel does not support. The kernel log shows:

[ 5736.642924] XFS (dm-0): File system with blocksize 8192 bytes. Only pagesize (4096) or less will currently work.

[ 5736.695146] XFS (dm-0): SB validate failed with error -38.

Fix: recreate the filesystem with a 4096-byte block size.

#mkfs.xfs -f -b size=4096 -d agcount=9000,sunit=16,swidth=48 /dev/mapper/vgdata01-lv01

meta-data=/dev/mapper/vgdata01-lv01 isize=256    agcount=9000, agsize=520834 blks
data     =                       bsize=4096   blocks=4687497216, imaxpct=5
         =                       sunit=2      swidth=6 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=520830, version=2
         =                       sectsz=512   sunit=2 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
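The mount failure above can be caught before running mkfs.xfs by comparing the intended block size with the kernel page size. A minimal sketch, assuming the limit reported in the kernel log (block size no larger than the page size) applies to the running kernel; `blocksize_supported` is a hypothetical helper, and the 4096 fallback is an assumption used only when getconf is unavailable.

```shell
#!/bin/sh
# Sketch only: blocksize_supported is a hypothetical helper.
# $1 = intended XFS block size in bytes, $2 = kernel page size in bytes
blocksize_supported() {
    [ "$1" -le "$2" ]
}

# Ask the system for its page size; fall back to 4096 (an assumption)
# if getconf is not available.
page_size=$(getconf PAGESIZE 2>/dev/null || echo 4096)

if blocksize_supported 8192 "$page_size"; then
    echo "8192-byte blocks fit within the ${page_size}-byte page size"
else
    echo "8192-byte blocks exceed the ${page_size}-byte page size; use ${page_size} or less"
fi
```

On a system with 4096-byte pages this takes the second branch, which matches the error seen above.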

[Reference]

xfs(5)

NAME

xfs - layout of the XFS filesystem

DESCRIPTION

An XFS filesystem can reside on a regular disk partition or on a logical volume. An XFS filesystem has up to three parts: a data section, a log section, and a realtime section. Using the default mkfs.xfs(8) options, the realtime section is absent, and the log area is contained within the data section. The log section can be either separate from the data section or contained within it. The filesystem sections are divided into a certain number of blocks, whose size is specified at mkfs.xfs(8) time with the -b option.

The data section contains all the filesystem metadata (inodes, directories, indirect blocks) as well as the user file data for ordinary (non-realtime) files and the log area if the log is internal to the data section. The data section is divided into a number of allocation groups. The number and size of the allocation groups are chosen by mkfs.xfs(8) so that there is normally a small number of equal-sized groups. The number of allocation groups controls the amount of parallelism available in file and block allocation. It should be increased from the default if there is sufficient memory and a lot of allocation activity. The number of allocation groups should not be set very high, since this can cause large amounts of CPU time to be used by the filesystem, especially when the filesystem is nearly full. More allocation groups are added (of the original size) when xfs_growfs(8) is run.

The log section (or area, if it is internal to the data section) is used to store changes to filesystem metadata while the filesystem is running until those changes are made to the data section. It is written sequentially during normal operation and read only during mount. When mounting a filesystem after a crash, the log is read to complete operations that were in progress at the time of the crash.

The realtime section is used to store the data of realtime files. These files had an attribute bit set through xfsctl(3) after file creation, before any data was written to the file. The realtime section is divided into a number of extents of fixed size (specified at mkfs.xfs(8) time). Each file in the realtime section has an extent size that is a multiple of the realtime section extent size.

Each allocation group contains several data structures. The first sector contains the superblock. For allocation groups after the first, the superblock is just a copy and is not updated after mkfs.xfs(8). The next three sectors contain information for block and inode allocation within the allocation group. Also contained within each allocation group are data structures to locate free blocks and inodes; these are located through the header structures.

Each XFS filesystem is labeled with a Universal Unique Identifier (UUID). The UUID is stored in every allocation group header and is used to help distinguish one XFS filesystem from another, therefore you should avoid using dd(1) or other block-by-block copying programs to copy XFS filesystems. If two XFS filesystems on the same machine have the same UUID, xfsdump(8) may become confused when doing incremental and resumed dumps. xfsdump(8) and xfsrestore(8) are recommended for making copies of XFS filesystems.

OPERATIONS

Some functionality specific to the XFS filesystem is accessible to applications through the xfsctl(3) and by-handle (see open_by_handle(3)) interfaces.

MOUNT OPTIONS

Refer to the mount(8) manual entry for descriptions of the individual XFS mount options.
