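Cayley is a graph database engine written in Go. It provides a RESTful API, a built-in query editor and visualizer, and MQL and JavaScript query interfaces. Supported storage backends include plain files, PostgreSQL, MongoDB, LevelDB, and Bolt. The design is modular, so adding a new storage backend is straightforward.
This article uses PostgreSQL as the backend to demonstrate how to use Cayley.
Install Go: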
yum install -y go
Run the following commands to clone Cayley and fetch its dependencies:
mkdir -p ~/cayley && cd ~/cayley
export GOPATH=`pwd`
export PATH=$PATH:~/cayley/bin
mkdir -p bin pkg src/github.com/google
cd src/github.com/google
git clone https://github.com/google/cayley
cd cayley
go get github.com/tools/godep
godep restore
go build ./cmd/cayley
Sample data:
$ ll data
-rw-rw-r--. 1 postgres postgres 26M Jan 17 21:45 30kmoviedata.nq.gz
-rw-rw-r--. 1 postgres postgres 463 Jan 17 21:45 testdata.nq
$ gunzip data/30kmoviedata.nq.gz
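For orientation, an .nq file holds one quad per line: a subject, a predicate, an object, an optional label, and a terminating period. The sketch below writes three invented quads to a temporary file and inspects them. The names alice, bob, carol, follows, status, and status_graph are illustrative assumptions, not the actual contents of testdata.nq.

```shell
# Write a few illustrative quads to a temporary file.
# These are invented examples, not the contents of data/testdata.nq.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<alice> <follows> <bob> .
<bob> <follows> <carol> .
<bob> <status> "cool" <status_graph> .
EOF

# Each non-empty line is one quad, so counting lines counts statements.
quad_count=$(grep -c . "$sample")

# The predicate is the second whitespace-separated field.
predicates=$(awk '{print $2}' "$sample" | sort -u | paste -sd, -)

echo "quads: $quad_count, predicates: $predicates"
rm -f "$sample"
```

The same two commands (grep -c and awk) can be pointed at the unpacked sample file to get a rough feel for its size and vocabulary before loading it.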
Cayley usage help:
$ ./cayley --help
no command --help
usage:
cayley command [flags]
commands:
init create an empty database.
load bulk-load a quad file into the database.
http serve an http endpoint on the given host and port.
dump bulk-dump the database into a quad file.
repl drop into a repl of the given query language.
version version information.
flags:
-alsologtostderr=false: log to standard error as well as files
-assets="": explicit path to the http assets.
-config="": path to an explicit configuration file.
-db="memstore": database backend.
-dbpath="/tmp/testdb": path to the database.
-dump="dbdump.nq": quad file to dump the database to (".gz" supported, "-" for stdout).
-dump_type="quad": quad file format ("json", "quad", "gml", "graphml").
-format="cquad": quad format to use for loading ("cquad" or "nquad").
-host="127.0.0.1": host to listen on (defaults to all).
-ignoredup=false: don't stop loading on duplicated key on add
-ignoremissing=false: don't stop loading on missing key on delete
-init=false: initialize the database before using it. equivalent to running `cayley init` followed by the given command.
-load_size=10000: size of quadsets to load
-log_backtrace_at=:0: when logging hits line file:n, emit a stack trace
-log_dir="": if non-empty, write log files in this directory
-logstashtype="": enable logstash logging and define the type
-logstashurl="172.17.42.1:5042": logstash url and port
-logtostderr=false: log to standard error instead of files
-port="64210": port to listen on.
-prof="": output profiling file.
-quads="": quad file to load before going to repl.
-query_lang="gremlin": use this parser as the query language.
-read_only=false: disable writing via http.
-replication="single": replication method.
-stderrthreshold=0: logs at or above this threshold go to stderr
-timeout=30s: elapsed time until an individual query times out.
-v=0: log level for v logs
-vmodule=: comma-separated list of pattern=n settings for file-filtered logging
Assume a PostgreSQL database already exists:
ip : 192.168.150.132
port : 1921
dbname : postgres
user : digoal
pwd : digoal_pwd
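The parameters above combine into a single PostgreSQL connection URI, which is what the -dbpath flag receives in the commands that follow. A minimal sketch, in which the variable names (DB_HOST, DB_PORT, and so on) are my own choice and not something Cayley requires:

```shell
# Connection parameters from the list above.
DB_HOST=192.168.150.132
DB_PORT=1921
DB_NAME=postgres
DB_USER=digoal
DB_PASS=digoal_pwd

# Standard PostgreSQL connection URI; sslmode=disable matches the commands in this article.
# A password containing characters such as '@' or '/' would need percent-encoding.
DBPATH="postgres://${DB_USER}:${DB_PASS}@${DB_HOST}:${DB_PORT}/${DB_NAME}?sslmode=disable"
echo "$DBPATH"
```

The resulting string can then be passed as -dbpath="$DBPATH" instead of being repeated in every command.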
Initialize the database:
./cayley init -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"
Load the data:
./cayley load -quads="data/" -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"
5 billion test records occupy about 2 TB.
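The help output lists an -init flag, described as equivalent to running cayley init before the given command, so initialization and loading can in principle be combined. The sketch below assumes this works with load on the sql backend (not verified here) and uses data/30kmoviedata.nq as the unpacked sample file. The run helper and the DRY_RUN variable are hypothetical; by default the command is only printed.

```shell
DBPATH="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable"
QUADS="data/30kmoviedata.nq"

# Hypothetical helper: print the command by default, execute only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    printf '%s\n' "$*"
  fi
}

# -init, -quads, -db, -dbpath and -load_size all appear in `cayley --help`.
run ./cayley load -init -quads="$QUADS" -db=sql -dbpath="$DBPATH" -load_size=10000
```

Setting DRY_RUN=0 before the last line runs the load for real. Printing first is a cheap safeguard, since -init modifies the target database.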
Start the REPL or the HTTP service:
./cayley repl -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"
or
./cayley http -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210"
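Once the HTTP service is running, queries can also be sent without the web UI. The sketch below assumes the server is reachable at 127.0.0.1:64210 and exposes the v1 Gremlin endpoint at /api/v1/query/gremlin, as described in Cayley's HTTP documentation for releases of this era; check the path against the version you built. The cayley_query function name is hypothetical.

```shell
CAYLEY_URL="http://127.0.0.1:64210"
ENDPOINT="${CAYLEY_URL}/api/v1/query/gremlin"   # assumed v1 API path

# Hypothetical helper: POST a Gremlin (JavaScript) query and print the JSON response.
cayley_query() {
  curl -sS --max-time 30 -X POST --data-binary "$1" "$ENDPOINT"
}

# Example query: return up to five vertices from the graph.
QUERY='g.V().GetLimit(5)'
echo "POST $ENDPOINT <- $QUERY"

# Uncomment once the HTTP service is running:
# cayley_query "$QUERY"
```

The 30-second curl limit mirrors the default -timeout=30s shown in the help output, so the client gives up at about the same point the server would.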
Screenshots of the HTTP interface:
[Figure: HTTP query interface]
[Figure: query shape]
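When the backend is PostgreSQL, Cayley automatically translates MQL or JavaScript queries into SQL, runs them against the database, and returns the results.
Optimization options when PostgreSQL is the backend:
1. Use GPU acceleration for hash joins and data scans.
2. Use partitioned tables to avoid scanning irrelevant blocks.
3. Apply other general PostgreSQL tuning techniques.
If the data volume grows beyond what a single database's compute and I/O resources can sustain, Greenplum can be used for distributed queries.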
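Query interface:
JavaScript/Gremlin API documentation
Graph object
Looks up nodes by ID and returns a path.
Path object
Path intersection, node matching, and so on.
Query path object
Value conversion and so on.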
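To make the three object types concrete, the sketch below stores a few Gremlin-style queries as shell strings, one group per category. The node names (alice, carol) and the predicate follows are invented for illustration. The method names (V, Out, And, Has, ToArray, Emit, All) are taken from the Cayley Gremlin API documentation and should be checked against the version in use.

```shell
# Graph object: start a path from a node ID.
Q_GRAPH='g.V("alice").All()'

# Path object: traverse, intersect two paths, and match on a predicate/value pair.
Q_PATH='g.V("alice").Out("follows").And(g.V("carol").Out("follows")).All()'
Q_MATCH='g.V().Has("follows", "carol").All()'

# Query object: convert results to a plain array and emit it in the response.
Q_QUERY='g.Emit(g.V("alice").Out("follows").ToArray())'

# Print the queries; each can be pasted into the query editor or `cayley repl`.
for q in "$Q_GRAPH" "$Q_PATH" "$Q_MATCH" "$Q_QUERY"; do
  printf '%s\n' "$q"
done
```

Q_PATH returns the nodes that both alice and carol follow, which is the path intersection mentioned above. Q_MATCH returns every node that has a follows edge to carol.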
[Reference] 1. https://github.com/google/cayley