天天看點

MyRocks引擎:入坑須知

我們a 最近釋出了Percona Server with MyRocks的GA版本。本文會讓你了解到為什麼Facebook要在生産環境使用MyRocks了。如果你使用Percona repositories ,你能夠簡單的安裝MyRocks插件并且用ps-admin --enable-rocksdb來啟動它。

Percona recently released Percona Server with MyRocks as GA. You can see how Facebook explains wins they see in production with MyRocks. Now if you use Percona repositories, you can simply install MyRocks plugin and enable it with ps-admin --enable-rocksdb.

将它和典型的InnoDB進行比較時,存在一些主要和次要差別,我想在此強調一下。第一個主要的差別是MyRocks (based on RocksDB) 使用Log Structured Merge Tree資料結構,不是InnoDB的B+ tree資料結構。

There are some major and minor differences when comparing it to typical InnoDB deployments, and I want to highlight them here. The first important difference is that MyRocks (based on RocksDB) uses Log Structured Merge Tree data structure, not a B+ tree like InnoDB.

你能夠在我釋出在DZone的文章 當中了解到更多關于LSM引擎的資訊 。總的來說,LSM引擎更适合寫密集型的應用場景,讀取速度可能會比較慢,全表掃描對于引擎來說負擔會太重。當使用MyRocks作為應用底層時,需要特别注意這一點。MyRocks 不是加強版的InnoDB,也不能在所有應用場景下替換InnoDB。他有自己的優勢/局限,就像InnoDB一樣,你需要根據你資料的存取模式來選擇使用哪一個引擎。

You learn more about the LSM engine in my article for DZone.The summary is that an LSM data structure is good for write-intensive workloads, with the expense that reads might slow down (both point reads and especially range reads) and full table scans might be too heavy for the engine. This is important to keep in mind when designing applications for MyRocks. MyRocks is not an enhanced InnoDB, nor a one-size-fits-all replacement for InnoDB. It has its own pros/cons just like InnoDB. You need to decide which engine to use based on your applications data access patterns.

還有什麼其他需要注意的差別的嗎?

What other differences should you be aware of?

讓我們看一下目錄結構。目前,所有的表和資料庫都是存儲在mysqldir的.rocksdb隐藏目錄當中。名字和位址可以改變,但是所有的資料庫當中的所有表還是存儲在一系列的.sst檔案當中,沒有per-table / per-database的區分。

Let’s look at the directory layout. Right now, all tables and all databases are stored in a hidden .rocksdb directory inside mysqldir. The name and location can be changed, but still all tables from all databases are stored in just a series of .sst files. There is no per-table / per-database separation.

預設情況下,MyRocks 使用LZ4來壓縮所有的表。能夠通過改變rocksdb_default_cf_options當中的變量來改變壓縮的設定。預設值為compression=kLZ4Compression;bottommost_compression=kLZ4Compression。我們選擇 LZ4,是因為它在很小的cpu負載下提供了可接受的壓縮比。其他的壓縮方式包括Zlib 和 ZSTD,或者直接不壓縮。你能夠在Peter和我的文章當中學習到更多關于壓縮比VS速度 的資訊。為了比較裝載了來自我自制路由器軟體的流量統計資料的MyRocks表的實體大小,我使用了為pmacct收集器軟體建立的下表。

By default in Percona Server for MySQL, MyRocks will use LZ4 compression for all tables. You can change compression settings by changing the rocksdb_default_cf_options server variable. By default it set to compression=kLZ4Compression;bottommost_compression=kLZ4Compression. We chose LZ4 compression as it provides acceptable compression level with very little CPU overhead. Other possible compression methods are Zlib and ZSTD, or no compression at all. You can learn more about compression ratio vs. speed in Peter’s and my post.To compare the data size of a MyRocks table loaded with traffic statistic data from my homebrew router, I’ve used the following table created for pmacct collector:

正如你所看見的,表中有大概2000萬條資料。MyRocks (用預設的 LZ4 壓縮)使用了828MB。 InnoDB (預設,未壓縮) 使用了3760MB。

as you can see, there are about 20mln records in this table. MyRocks (with default LZ4 compression) uses 828MB. InnoDB (uncompressed) uses 3760MB.

你能夠在 .rocksdb 目錄的LOG檔案當中找到RocksDB 執行個體的詳細資訊。檢視這些日志,可以進行更詳細的診斷。你也能夠使用SHOW ENGINE ROCKSDB STATUS指令,但是這會比SHOW ENGINE INNODB STATUS傳回的内容更複雜,需要消耗大量的精力和時間去了解。

You can find very verbose information about your RocksDB instance in the LOG file located in .rocksdb directory. Check this file for more diagnostics. You can also try the SHOW ENGINE ROCKSDB STATUS command, but it is even more cryptic than SHOW ENGINE INNODB STATUS. It takes time to parse and to understand it.

注意,現在MyRocks隻支援 READ-COMMITTED 隔離級别。并沒有 REPEATABLE-READ 隔離級别,也沒有像InnoDB裡一樣的gap鎖。理論上, RocksDB 隻支援 SNAPSHOT 隔離級别。然而,MySQL 當中并沒有SNAPSHOT 隔離級别的概念,是以我們沒有實作特殊的文法去支援。如果你對這個感興趣,請聯系我們。

Keep in mind that at this time MyRocks supports only READ-COMMITTED isolation levels. There is no REPEATABLE-READ isolation level and no gap locking like in InnoDB. In theory, RocksDB should support SNAPSHOT isolation level. However, there is no notion of SNAPSHOT isolation in MySQL so we have not implemented the special syntax to support this level. Please let us know if you would be interested in this.

當你試圖加載大量的資料到MyRocks 當中時,你可能會遇到問題(不幸的是這個可能是你使用MyRocks 時的首次工作,當你使用LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table 或者 ALTER TABLE innodb_table ENGINE=ROCKSDB)。假如你的表太大,并且你沒有足夠的記憶體,RocksDB 就會崩潰。在生産環境中,你應該為你加載資料的session設定rocksdb_bulk_load=1。了解更多請檢視文章:https://github.com/facebook/mysql-5.6/wiki/data-loading。

For bulk loads, you may face problems trying to load large amounts of data into MyRocks (and unfortunately this might be the very first operation when you start playing with MyRocks as you try to LOAD DATA, INSERT INTO myrocks_table SELECT * FROM innodb_table or ALTER TABLE innodb_table ENGINE=ROCKSDB). If your table is big enough and you do not have enough memory, RocksDB crashes. As a workaround, you should set rocksdb_bulk_load=1 for the session where you load data. See more on this page: https://github.com/facebook/mysql-5.6/wiki/data-loading.

在MyRocks中的Block cache有點類似于innodb_buffer_pool_size,但是對于MyRocks它主要有利于讀取資料。您可能需要調整rocksdb_block_cache_size設定。另外,它預設使用buffered reads,在這種情況下,作業系統的cache緩存着壓縮的資料,而RockDB block cache 會緩存未壓縮的資料。你可以保持這種兩層的緩存機制,或者你可以修改rocksdb_use_direct_reads=ON關閉緩存,強制block cache直接讀取。LSM樹的本質要求當一層變滿時,有一個合并過程将壓縮資料推到下一層。這個過程可能相當密集并會影響使用者查詢速度。可以将其調整為不那麼密集。

Block cache in MyRocks is somewhat similar to innodb_buffer_pool_size, however for MyRocks it’s mainly beneficial for reads. You may want to tune the rocksdb_block_cache_size setting. Also keep in mind it uses buffered reads by default, and in this case the OS cache contains cached compressed data and RockDB block cache will contain uncompressed data. You may keep this setup to have two levels of cache, or you can disable buffering by forcing block cache to use direct reads with rocksdb_use_direct_reads=ON. The nature of LSM trees requires that when a level becomes full, there is a merge process that pushes compacted data to the next level. This process can be quite intensive and affect user queries. It is possible to tune it to be less intensive.

現在還沒有類似于Percona XtraBackup一樣的熱備軟體來執行MyRocks表的熱備份(我們正在研究這個)。你可以使用mysqldump 來進行邏輯備份,或者使用檔案系統層面的snapshots ,比如LVM 或 ZFS 。

Right now there is no hot backup software like Percona XtraBackup to perform a hot backup of MyRocks tables (we are looking into this). At this time you can use mysqldump for logical backups, or use filesystem-level snapshots like LVM or ZFS.

在我們的官方文檔當中,你可以了解到更多關于MyRocks 的優勢和局限性.

You can find more MyRocks specifics and limitations in our docs at https://www.percona.com/doc/percona-server/LATEST/myrocks/limitations.html.

我們期待大家的回報 。

We are looking for feedback on your MyRocks experience!

##更新(2018-02-12) 在獲得Facebook MyRocks 團隊的回報後,我對原來的文章進行了更新。

UPDATES (12-Feb-2018) Updates to the original post with the feedback provided by Facebook MyRocks team

MyRocks引擎:入坑須知

隔離級别 MyRocks 支援READ COMMITTED 和 REPEATABLE READ隔離級别,不支援 SERIALIZABLE。想了解更詳細的資訊可以閱讀https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation。MyRocks 實作REPETABLE READ的方法和InnoDB不一樣 — MyRocks 使用類似PostgreSQL 的snapshot isolation。 在Percona server 中,不允許在MyRocks 表上使用 REPEATABLE READ 隔離級别,因為REPEATABLE READ 隔離級别在innodb和myrocks db上的處理方式不一樣。

MyRocks引擎:入坑須知

Isolation Levels MyRocks supports READ COMMITTED and REPEATABLE READ. MyRocks does not support SERIALIZABLE. Please read https://github.com/facebook/mysql-5.6/wiki/Transaction-Isolation for details. The way to implement REPETABLE READ was different from MyRocks and InnoDB — MyRocks used PostgreSQL style snapshot isolation. In Percona Server we do not allow REPEATABLE READ for MyRocks tables, as the behavior will be different from InnoDB.

MyRocks引擎:入坑須知

線上二進制備份工具 網上有一個開源的線上二進制備份工具—myrocks_hotabackup:https://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks_hotbackup

MyRocks引擎:入坑須知

Online Binary Backup Tool There is an open source online binary backup tool for MyRocks — myrocks_hotabackuphttps://github.com/facebook/mysql-5.6/blob/fb-mysql-5.6.35/scripts/myrocks_hotbackup

原文釋出時間為:2018-03-27

本文作者:魏新平

繼續閱讀