PostgreSQL as a Graph Database Storage Engine

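cayley is a graph database engine written in Go. It provides a RESTful API, a built-in query editor with visualization, and MQL and JavaScript query interfaces. Supported storage backends include flat files, PostgreSQL, MongoDB, LevelDB, and Bolt. Its modular design makes it easy to add new storage backends.

This article uses PostgreSQL as the backend to demonstrate how to use cayley.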

Install Go:

yum install -y go

Run the following commands to clone cayley and its dependencies, then build it:

mkdir -p ~/cayley && cd ~/cayley
export GOPATH=`pwd`
export PATH=$PATH:~/cayley/bin
mkdir -p bin pkg src/github.com/google
cd src/github.com/google
git clone https://github.com/google/cayley
cd cayley
go get github.com/tools/godep
godep restore
go build ./cmd/cayley

Sample data:

$ ll data
-rw-rw-r--. 1 postgres postgres 26M Jan 17 21:45 30kmoviedata.nq.gz
-rw-rw-r--. 1 postgres postgres 463 Jan 17 21:45 testdata.nq

$ gunzip 30kmoviedata.nq.gz

cayley usage help:

$ ./cayley --help
no command --help

usage:
  cayley command [flags]

commands:
  init      create an empty database.
  load      bulk-load a quad file into the database.
  http      serve an http endpoint on the given host and port.
  dump      bulk-dump the database into a quad file.
  repl      drop into a repl of the given query language.
  version   version information.

flags:
  -alsologtostderr=false: log to standard error as well as files
  -assets="": explicit path to the http assets.
  -config="": path to an explicit configuration file.
  -db="memstore": database backend.
  -dbpath="/tmp/testdb": path to the database.
  -dump="dbdump.nq": quad file to dump the database to (".gz" supported, "-" for stdout).
  -dump_type="quad": quad file format ("json", "quad", "gml", "graphml").
  -format="cquad": quad format to use for loading ("cquad" or "nquad").
  -host="127.0.0.1": host to listen on (defaults to all).
  -ignoredup=false: don't stop loading on duplicated key on add
  -ignoremissing=false: don't stop loading on missing key on delete
  -init=false: initialize the database before using it. equivalent to running `cayley init` followed by the given command.
  -load_size=10000: size of quadsets to load
  -log_backtrace_at=:0: when logging hits line file:n, emit a stack trace
  -log_dir="": if non-empty, write log files in this directory
  -logstashtype="": enable logstash logging and define the type
  -logstashurl="172.17.42.1:5042": logstash url and port
  -logtostderr=false: log to standard error instead of files
  -port="64210": port to listen on.
  -prof="": output profiling file.
  -quads="": quad file to load before going to repl.
  -query_lang="gremlin": use this parser as the query language.
  -read_only=false: disable writing via http.
  -replication="single": replication method.
  -stderrthreshold=0: logs at or above this threshold go to stderr
  -timeout=30s: elapsed time until an individual query times out.
  -v=0: log level for v logs
  -vmodule=: comma-separated list of pattern=n settings for file-filtered logging

Assume a PostgreSQL database already exists with the following connection parameters:

ip : 192.168.150.132

port : 1921

dbname : postgres

user : digoal

pwd : digoal_pwd

Initialize the database:

./cayley init -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"
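The -dbpath value is an ordinary PostgreSQL connection URL assembled from the connection parameters listed above. A minimal sketch of that assembly follows; build_dbpath is a hypothetical helper written for this article, not part of cayley, and it assumes the user name and password contain no characters that need percent-encoding.

```shell
# build_dbpath is a hypothetical helper, not part of cayley.
# Arguments: user, password, host, port, database name.
# Assumes none of the values need percent-encoding.
build_dbpath() {
  printf 'postgres://%s:%s@%s:%s/%s?sslmode=disable\n' "$1" "$2" "$3" "$4" "$5"
}

# Build the URL once and reuse it across cayley commands.
DBPATH="$(build_dbpath digoal digoal_pwd 192.168.150.132 1921 postgres)"
echo "$DBPATH"
```

Keeping the URL in a variable such as DBPATH avoids repeating the credentials in every init, load, repl, and http invocation.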

Load the data:

./cayley load -quads="data/" -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"

5 billion test records occupy roughly 2 TB.
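As a rough sanity check on that figure, the average footprint per record can be worked out with integer arithmetic. This sketch assumes decimal terabytes (1 TB = 10^12 bytes) and that the 2 TB total includes whatever storage overhead was counted in the original measurement; bytes_per_record is a hypothetical helper name.

```shell
# bytes_per_record is a hypothetical helper for this estimate.
# Arguments: total size in bytes, number of records.
# Uses integer division, so the result is rounded down.
bytes_per_record() {
  echo $(( $1 / $2 ))
}

# 2 TB (decimal) spread over 5 billion records.
bytes_per_record 2000000000000 5000000000   # prints 400
```

About 400 bytes per record is a useful number when estimating disk requirements for a larger or smaller data set.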

Start the REPL or the HTTP service:

./cayley repl -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"

or

./cayley http -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"

[Figure: screenshot of the HTTP interface]

[Figure: query shape view]

When the backend is PostgreSQL, cayley automatically translates MQL or JavaScript queries into SQL, runs them against the database, and returns the results.

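When PostgreSQL is the backend, the following optimizations are available:

1. Use GPU acceleration for hash joins and data scans.

2. Use partitioned tables to avoid scanning irrelevant blocks.

3. Apply other general PostgreSQL tuning techniques.

If the data volume grows beyond what the compute and I/O resources of a single database can sustain, Greenplum can be used to run distributed queries.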

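Query interface:

JavaScript/Gremlin API documentation

Graph object

Looks up nodes by ID and returns a path.

Path object

Path intersection, node matching, and so on.

Query path object

Value conversion, and so on.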
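The three kinds of objects chain together in a single query: the graph object selects starting nodes, path methods traverse from them, and a final query method returns the results. The sketch below sends such a query to a running `cayley http` instance. It assumes the HTTP API exposes a /api/v1/query/gremlin route in this cayley version; cayley_query_url and cayley_query are hypothetical helpers, and the node "<alice>" and predicate "<follows>" are made-up names, not taken from the sample data.

```shell
# Hypothetical helpers for illustration; they are not part of cayley.

# Print the query endpoint URL.
# Arguments: host, port, query language (defaults to gremlin).
# Assumes the /api/v1/query/<lang> route of cayley's HTTP API.
cayley_query_url() {
  printf 'http://%s:%s/api/v1/query/%s\n' "$1" "$2" "${3:-gremlin}"
}

# POST a query string to the server and print the response.
# Arguments: host, port, query text.
cayley_query() {
  curl -s -X POST -d "$3" "$(cayley_query_url "$1" "$2")"
}

# Graph object: g.V() starts from a node ID (made-up node "<alice>").
# Path object:  Out() follows the made-up "<follows>" predicate.
# Query object: All() runs the path and returns every result.
QUERY='g.V("<alice>").Out("<follows>").All()'

# Needs a running server, so the call is left commented out:
# cayley_query 192.168.150.132 64210 "$QUERY"
```

The same query text can be pasted into the built-in query editor or typed into the REPL.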

References

1. https://github.com/google/cayley
