Django Development Series - Interacting with Elasticsearch

Building a full-text index application for CSDN blogs

–2018.08.11

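The first problem to solve is the interface through which Python accesses the Elasticsearch database.
On top of the Django web framework, forward user requests to Elasticsearch and return the results.
Every search keyword a user submits needs to be saved.
Concurrency and reliability need to be guaranteed.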
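
For the requirement of saving every search keyword, here is a minimal sketch, assuming each search is recorded as one small document. The helper name `make_search_log` and its field names are hypothetical and do not come from the original project:

```python
from datetime import datetime, timezone


def make_search_log(keyword, when=None):
    """Build a log record for one search request.

    Hypothetical helper: the field names are illustrative only.
    """
    if when is None:
        when = datetime.now(timezone.utc)
    return {
        "keyword": keyword.strip(),
        "searched_at": when.isoformat(),
    }


record = make_search_log("  cluster ", datetime(2018, 8, 11, tzinfo=timezone.utc))
print(record)
```

A record like this could be saved through a Django model or indexed into Elasticsearch; which store fits better depends on how the keywords will be queried later.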

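There are two Python clients for interacting with Elasticsearch:

elasticsearch-py
elasticsearch-dsl

elasticsearch-dsl is built on top of elasticsearch-py and, by comparison, fits the habits of Python users better.

elasticsearch-py is more flexible and easier to extend.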

Since I was using Django, which is written in Python, it was easy to interact with ElasticSearch. There are two client libraries to interact with ElasticSearch with Python. There's elasticsearch-py, which is the official low-level client. And there's elasticsearch-dsl, which is built upon the former but gives a higher-level abstraction with a bit less functionality.

elasticsearch-py is used as follows:

from datetime import datetime
from elasticsearch import Elasticsearch
es = Elasticsearch()

doc = {
    'author': 'kimchy',
    'text': 'Elasticsearch: cool. bonsai cool.',
    'timestamp': datetime.now(),
}
# Index the document under an explicit id (1 here)
res = es.index(index="test-index", doc_type='tweet', id=1, body=doc)
print(res['result'])

# Fetch the same document back by that id
res = es.get(index="test-index", doc_type='tweet', id=1)
print(res['_source'])

es.indices.refresh(index="test-index")

res = es.search(index="test-index", body={"query": {"match_all": {}}})
print("Got %d Hits:" % res['hits']['total'])
for hit in res['hits']['hits']:
    print("%(timestamp)s %(author)s: %(text)s" % hit["_source"])
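
One caveat about `res['hits']['total']`: in Elasticsearch 7 and later it is an object with `value` and `relation` keys rather than a plain integer, so the `%d` format fails there. A small sketch that handles both shapes; `total_hits` is a hypothetical helper name:

```python
def total_hits(response):
    """Return the hit count from a search response, old or new format."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # Elasticsearch 7+: {"value": 3, "relation": "eq"}
        return total["value"]
    # Elasticsearch 6 and earlier: a plain integer
    return total


old_style = {"hits": {"total": 3, "hits": []}}
new_style = {"hits": {"total": {"value": 3, "relation": "eq"}, "hits": []}}
print("Got %d Hits:" % total_hits(old_style))
print("Got %d Hits:" % total_hits(new_style))
```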

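The query part is closer to the raw Elasticsearch DSL syntax.

When debugging Elasticsearch queries in Kibana, we usually use the syntax taught in the Elasticsearch documentation. That syntax can be run directly with elasticsearch-py: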

from elasticsearch import Elasticsearch
client = Elasticsearch()

response = client.search(
    index="my-index",
    body={
      # The "filtered" query was removed in Elasticsearch 5.0; a "bool"
      # query with a "filter" clause is the equivalent form.
      "query": {
        "bool": {
          "must": [{"match": {"title": "python"}}],
          "must_not": [{"match": {"description": "beta"}}],
          "filter": [{"term": {"category": "search"}}]
        }
      },
      "aggs" : {
        "per_tag": {
          "terms": {"field": "tags"},
          "aggs": {
            "max_lines": {"max": {"field": "lines"}}
          }
        }
      }
    }
)

for hit in response['hits']['hits']:
    print(hit['_score'], hit['_source']['title'])

for tag in response['aggregations']['per_tag']['buckets']:
    print(tag['key'], tag['max_lines']['value'])

elasticsearch-dsl is used as follows; it wraps the DSL in a layer of functions:

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search

client = Elasticsearch()

s = Search(using=client, index="my-index") \
    .filter("term", category="search") \
    .query("match", title="python")   \
    .exclude("match", description="beta")

s.aggs.bucket('per_tag', 'terms', field='tags') \
    .metric('max_lines', 'max', field='lines')

response = s.execute()

for hit in response:
    print(hit.meta.score, hit.title)

for tag in response.aggregations.per_tag.buckets:
    print(tag.key, tag.max_lines.value)

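Seen this way, elasticsearch-py provides the full set of infrastructure, including the DSL, while elasticsearch-dsl only wraps the search functionality and itself still depends on the low-level framework provided by elasticsearch-py.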
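
To illustrate what wrapping the DSL in functions means, here is a toy sketch of a chainable builder that produces a `bool` query dictionary. `MiniSearch` is a made-up class for illustration only; it is not how elasticsearch-dsl is implemented, and it covers only three kinds of call.

```python
class MiniSearch:
    """Toy chainable query builder (illustration only)."""

    def __init__(self):
        self._must, self._must_not, self._filter = [], [], []

    def query(self, kind, **fields):
        self._must.append({kind: fields})
        return self  # returning self is what makes chaining possible

    def exclude(self, kind, **fields):
        self._must_not.append({kind: fields})
        return self

    def filter(self, kind, **fields):
        self._filter.append({kind: fields})
        return self

    def to_dict(self):
        # Emit only the clauses that were actually used.
        clauses = {
            "must": self._must,
            "must_not": self._must_not,
            "filter": self._filter,
        }
        return {"query": {"bool": {k: v for k, v in clauses.items() if v}}}


body = (
    MiniSearch()
    .filter("term", category="search")
    .query("match", title="python")
    .exclude("match", description="beta")
    .to_dict()
)
print(body)
```

The resulting dictionary has the same shape as a hand-written request body, so it could be passed as `body=` to the `search` method of an elasticsearch-py client.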

try:
    import _pickle as pickle  # C-accelerated pickle where available
except ImportError:
    import pickle
import os
from elasticsearch import Elasticsearch


class loadJson(object):
    def __init__(self):
        # Create the client once and reuse it for every document.
        self.es = Elasticsearch("http://192.168.1.112:9200")

    def loadAllFiles(self, path):
        # Each file in the directory holds one pickled post.
        for name in os.listdir(path):
            filename = os.path.join(path, name)
            with open(filename, 'rb') as filehandler:
                jsonObj = pickle.load(filehandler)
            self.saveToElasticSearch(jsonObj)
            print(jsonObj)

    def saveToElasticSearch(self, doc):
        # No id is passed, so Elasticsearch generates one for each document.
        self.es.index(index="csdnblog", doc_type="CSDNPost", body=doc)


utlLoader = loadJson()
utlLoader.loadAllFiles("G:\\SideProjects\\CSDN_Blogs\\PostThread\\")

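The code above takes the blog posts I crawled from CSDN and saved as local JSON files (serialized with pickle), deserializes those files, and finally stores them in Elasticsearch for full-text indexing.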
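
When there are many files, sending one index request per document can be slow. Below is a hedged sketch of preparing the documents for the `elasticsearch.helpers.bulk` helper instead. `iter_pickled_docs` and `build_actions` are hypothetical names, the index and type names are simply copied from the loader, and the `_type` key applies to the Elasticsearch 6 era that `doc_type` implies:

```python
import os
import pickle


def iter_pickled_docs(path):
    """Yield the object stored in each pickle file under path."""
    for name in sorted(os.listdir(path)):
        with open(os.path.join(path, name), "rb") as handle:
            yield pickle.load(handle)


def build_actions(docs, index, doc_type):
    """Wrap each document in the action format the bulk helper expects."""
    for doc in docs:
        yield {"_index": index, "_type": doc_type, "_source": doc}


# With a live cluster, the actions could be sent in batches like this:
#   from elasticsearch import Elasticsearch, helpers
#   es = Elasticsearch("http://192.168.1.112:9200")
#   helpers.bulk(es, build_actions(iter_pickled_docs(path), "csdnblog", "CSDNPost"))
```

Because `pickle.load` can execute code embedded in a file, this approach only suits files you produced yourself, as is the case with the crawler output here.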

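Installing the Elasticsearch client

Before implementing the functionality above, we must install the Elasticsearch client in the Django project created under virtualenv.

Go to the virtualenv directory, activate the virtualenv environment, and install the elasticsearch client: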

activate.bat
pip3 install elasticsearch
pip3 list

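Once installation finishes, use pip3 list to view the installed packages.

What gets installed here is the low-level elasticsearch client, the one close to the Elasticsearch DSL syntax; elasticsearch-dsl is a library developed on top of it. To install that one, just add the -dsl suffix to the package name: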

pip3 install elasticsearch-dsl
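
To confirm that both packages are importable from the active virtualenv, a small check can help. This is a sketch using only the standard library; note that the import name of elasticsearch-dsl is `elasticsearch_dsl`, with an underscore:

```python
import importlib.util


def is_installed(module_name):
    """Return True if module_name can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


for name in ("elasticsearch", "elasticsearch_dsl"):
    print(name, "installed" if is_installed(name) else "missing")
```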

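Providing an entry point for accessing elasticsearch

In the earlier Django project, we already got requests wired to view functions under SqlHub. Building on that, add a form to Index.html that points to the view function we are about to create, which returns the results requested from elasticsearch.

The key points are creating the FullTextSearch action in SqlHub\Index.html and configuring the view function fulltextsearch for that action in views.py, so that it can display the results.

Creating the search form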

<form action="/SqlHub/FullTextSearch" method = "post">

{% csrf_token %}

    Search Key Word:<input type = text name = keyword><br>

    <input type = submit>

</form>

Reference article:

https://medium.freecodecamp.org/elasticsearch-with-django-the-easy-way-909375bc16cb

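That article shows how to implement CRUD operations with elasticsearch-dsl. It also shows that the Django project needs no elasticsearch configuration: installing the elasticsearch library and importing it correctly is enough.

I used it only as a reference, so this time I use the plain elasticsearch-py client.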

https://elasticsearch-py.readthedocs.io/en/master/

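This is the official documentation for the elasticsearch Python client. It covers every method for accessing elasticsearch from Python.

Implementing a simple view function for elasticsearch full-text search

This view function receives the request the user submits, hands it to elasticsearch, and once the result arrives, renders the elasticsearch results page (es.html) to display it.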

# render_to_response was unused here and was removed in Django 3.0
from django.shortcuts import render
from SqlHub.models import SqlNew
from django.template import RequestContext
from django.http import HttpResponseRedirect
import time
import datetime
from elasticsearch import Elasticsearch

def archive(request):
    posts = SqlNew.objects.all()
    curtime = datetime.datetime.now()
    context = {"posts": posts, "curtime": curtime}
    return render(request, 'Index.html', context)



def newone(request):
    curtime = datetime.datetime.now()
    oneblog = SqlNew()
    oneblog.title = request.POST["title"]
    oneblog.body = request.POST["body"]
    oneblog.timestamp = curtime
    oneblog.save()
    return HttpResponseRedirect('/SqlHub')



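def fulltextsearch(request):
    # Pass hosts as a list (the documented form) rather than a set literal.
    es = Elasticsearch(["192.168.1.10:9200"])
    # The search term is hard-coded to "cluster" here; the submitted
    # keyword is not read yet.
    ret = es.search(
        index="csdnblog2",
        body={"query": {"term": {"pageContent": "cluster"}}},
    )
    resultback = ret["hits"]["hits"]
    context_rs = {"results": resultback}
    return render(request, 'es.html', context_rs)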
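
The view always searches for the fixed term "cluster". Below is a sketch of building the request body from the submitted form field instead, assuming the `keyword` field of the search form and the `pageContent` field of the index. `build_search_body` is a hypothetical helper, and it swaps the `term` query for a `match` query, which analyzes the input the way full-text search usually wants:

```python
def build_search_body(keyword, field="pageContent"):
    """Return an Elasticsearch request body for a full-text keyword search."""
    keyword = (keyword or "").strip()
    if not keyword:
        # With no keyword, fall back to matching every document.
        return {"query": {"match_all": {}}}
    return {"query": {"match": {field: keyword}}}


print(build_search_body("cluster"))
print(build_search_body(""))
```

Inside the view this could be called as `es.search(index="csdnblog2", body=build_search_body(request.POST.get("keyword")))`.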

Providing a template that displays elasticsearch full-text search results

The template must also let the user submit an elasticsearch request.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>




<form action="/SqlHub/FullTextSearch" method = "post">

{% csrf_token %}

    Search Key Word:<input type = text name = keyword><br>

    <input type = submit>

</form>


{% for item in results %}

{% for key,value in item.items %}
    {% if key == "_source" %}
        {% for key1,value1 in value.items %}
                {% if key1 == "article_url" %}
                    {{ value1 }}<br>
                {% endif %}
        {% endfor %}
    {% endif %}
{% endfor %}

{% endfor %}


</body>
</html>

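Django templates can look up dictionary keys with dot notation, but not keys that begin with an underscore, such as _source, so the nested loops above are a workaround. Alternatively, convert the dictionary into an object before passing it to the template.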
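
As a sketch of an alternative to the nested loops, the view can unpack the hits so that the template receives plain dictionaries without underscore-prefixed keys. `extract_sources` is a hypothetical helper; `article_url` is the field the template prints, and the sample URLs are placeholders:

```python
def extract_sources(hits):
    """Return the _source dictionary of every hit that has one."""
    return [hit["_source"] for hit in hits if "_source" in hit]


hits = [
    {"_id": "1", "_score": 1.2, "_source": {"article_url": "https://example.com/a"}},
    {"_id": "2", "_score": 0.7, "_source": {"article_url": "https://example.com/b"}},
]
results = extract_sources(hits)
print(results)
```

With `context_rs = {"results": extract_sources(resultback)}` in the view, the body of the template loop could be reduced to `{{ item.article_url }}<br>`.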

Finally, configure the mapping between the form action and the view function:

from django.urls import path, include
import SqlHub.views

urlpatterns = [
    path('', SqlHub.views.archive),
    path('New', SqlHub.views.newone),
    path('FullTextSearch', SqlHub.views.fulltextsearch),
]
