Spark2.2+ES6.4.2（三十二）：ES API之index的create/update/delete/open/close（建立index時設定setting，并建立index後根據avro模闆動态設定index的mapping）

要想通過ES API對es的操作，必須擷取到TransportClient對象，讓後根據TransportClient擷取到IndicesAdminClient對象後，方可以根據IndicesAdminClient對象提供的方法對ES的index進行操作：create index,update index(update index settings,update index mapping),delete index,open index,close index。

準備工作（建立TransportClient，IndicesAdminClient）

第一步：導入ES6.4.2的依賴包：

<dependencies>
        <!--Spark -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql-kafka-0-10_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-10_2.11</artifactId>
            <version>2.2.0</version>
        </dependency>
        <dependency>
            <groupId>com.twitter</groupId>
            <artifactId>bijection-avro_2.11</artifactId>
            <version>0.9.5</version>
        </dependency>
        <dependency>
            <groupId>com.databricks</groupId>
            <artifactId>spark-avro_2.11</artifactId>
            <version>3.2.0</version>
            <type>jar</type>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch</groupId>
            <artifactId>elasticsearch-spark-20_2.11</artifactId>
            <version>6.4.2</version>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>transport</artifactId>
            <version>6.4.2</version>
        </dependency>
    </dependencies>

備注：這裡依賴可能有點多，elastricsearch api操作的話就是依賴org.elasticsearch.client。

第二步：擷取TransportClient，IndicesAdminClient對象：

/**
     * 擷取ES Client API對象。
     * */
    public static TransportClient getClient() {
        Map<String, String> esOptionsMap = getSparkESCommonOptions();

        return getClient(esOptionsMap);
    }

    /**
     * 擷取ES Client API對象。
     * */
    public static TransportClient getClient(Map<String, String> esOptionsMap) {
        Settings settings = Settings.builder()//
                .put("cluster.name", esOptionsMap.get("cluster.name")) //
                .put("client.transport.sniff", true)//
                .build();

        PreBuiltTransportClient preBuiltTransportClient = new PreBuiltTransportClient(settings);
        TransportClient client = preBuiltTransportClient;

        // 192.168.1.120,192.168.1.121,192.168.1.122,192.168.1.123
        String esNodeStr = esOptionsMap.get("es.nodes");
        String[] esNodeArr = esNodeStr.split(",");

        try {
            for (String esNode : esNodeArr) {
                client.addTransportAddress(new TransportAddress(InetAddress.getByName(esNode), 9300));
            }
        } catch (UnknownHostException e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }

        return client;
    }

    public static IndicesAdminClient getAdminClient() {
        Map<String, String> esOptionsMap = getSparkESCommonOptions();

        return getAdminClient(esOptionsMap);
    }

    public static IndicesAdminClient getAdminClient(Map<String, String> esOptionsMap) {
        TransportClient client = getClient(esOptionsMap);
        IndicesAdminClient adminClient = client.admin().indices();
        return adminClient;
    }

備注：其中getSparkESCommonOptions()中配置對象包含：

cluster.name=es-application
es.nodes=192.168.1.120,192.168.1.121,192.168.1.122,192.168.1.123
es.port=9200
es.index.auto.create=true
pushdown=true
es.nodes.wan.only=true
es.mapping.date.rich=false #//設定讀取es中date資料類型字段時，把它當做string來讀取。
es.scroll.size=10000

ES API之Exists/Create Index：

建立index之前，需要判斷index及其對應的類型是否存在，使用這個方法：

/**
     * 是否ES包含某個索引類型
     * 
     * @param indexName
     *            index
     * @param indexType
     *            index對應的type
     * */
    public static boolean typeExists(String indexName, String indexType) {
        TypesExistsResponse typeResponse = getAdminClient().prepareTypesExists(indexName).setTypes(indexType).execute().actionGet();
        if (typeResponse.isExists()) {
            return true;
        }
        return false;
    }

    /**
     * 判斷ES中是否存在某個index<br>
     * 是否包含類型，待驗證，看别人調用時是不需要帶類型的。
     * */
    public static boolean indexExists(String... indices) {
        IndicesExistsRequest request = new IndicesExistsRequest(indices);
        IndicesExistsResponse response = getAdminClient().exists(request).actionGet();
        if (response.isExists()) {
            return true;
        }
        return false;
    }

建立index，包含兩種：不指定mapping和isettings隻建立一個空的index；指定mapping和settings建立複雜的index。

建立一個空的index：

/**
     * 建立簡單索引——沒有指定mapping<br>
     * 此時資料插入時，會讀取資料的資料的字段名稱，自動建立mapping字段（但是，存在問題資料類型不能完好的控制，比如double類型可能會被比對為float，date類型的格式消失）
     * */
    public static boolean indexCreate(String indexName) {
        CreateIndexResponse response = getAdminClient().prepareCreate(indexName).get();
        return response.isAcknowledged();
    }

備注：此時資料插入時，會讀取資料的資料的字段名稱，自動建立mapping字段（但是，存在問題資料類型不能完好的控制，比如double類型可能會被比對為float，date類型的格式消失）

建立複雜的index：

/**
     * 建立複雜索引(類型/mapping)，指定索引的setting和mapping，其中mappingSource是一個json資料字元串。
     * 
     * @param indexName
     *            索引名
     * @param indexType
     *            索引類型名
     * @param builder
     *            索引mapping
     */
    public static boolean indexCreate(String indexName, String indexType, XContentBuilder builder) {
        Settings settings = Settings.builder() //
                .put("index.mapping.ignore_malformed", true)//
                .put("index.refresh_interval", "60s") //
                .put("index.number_of_shards", 4)//
                .put("index.number_of_replicas", 0)//
                .put("index.max_result_window", 500000)//

                .put("index.translog.durability", "async")//
                .put("index.translog.sync_interval", "120s")//
                .put("index.translog.flush_threshold_size", "2gb")//

                .put("index.merge.scheduler.max_thread_count", 1)//
                .build();

        return indexCreate(indexName, indexType, builder, settings);
    }

    /**
     * 建立複雜索引(類型/mapping)，指定索引的setting和mapping，其中mappingSource是一個json資料字元串。
     * 
     * @param indexName
     *            索引名
     * @param indexType
     *            索引類型名
     * @param builder
     *            索引mapping
     * @param settings
     *            索引settings<br>
     *            setting http://192.168.1.120:9200/twitter/_settings?pretty<br>
     *            "settings":<br>
     *            {<br>
     *            ----"index":<br>
     *            ----{<br>
     *            --------"mapping":<br>
     *            --------{<br>
     *            ------------"ignore_malformed":"true"<br>
     *            --------},<br>
     *            --------"refresh_interval":"60s",<br>
     *            --------"number_of_shards":"4",<br>
     *            --------"translog":<br>
     *            --------{<br>
     *            ------------"flush_threshold_size":"2048m",<br>
     *            ------------"sync_interval":"120s",<br>
     *            ------------"durability":"async"<br>
     *            --------},<br>
     *            --------"provided_name":"indexName",<br>
     *            --------"merge":{<br>
     *            ------------"scheduler":<br>
     *            ------------{<br>
     *            ----------------"max_thread_count":"1"<br>
     *            ------------}<br>
     *            --------},<br>
     *            --------"max_result_window":"500000",<br>
     *            --------"creation_date":"1540781909323",<br>
     *            --------"number_of_replicas":"0",<br>
     *            --------"uuid":"5c079b5tQrGdX0fF23xtQA",<br>
     *            --------"version":{"created":"6020499"}<br>
     *            ----}<br>
     *            }<br>
     */
    public static boolean indexCreate(String indexName, String indexType, XContentBuilder builder, Settings settings) {
        if (indexExists(indexName)) {
            return false;
        }

        // CreateIndexResponse準備建立索引，增加setSetting()方法可以設定setting參數，否則将會按預設設定
        CreateIndexResponse cIndexResponse = getAdminClient().prepareCreate(indexName)//
                .setSettings(settings)// setting
                .addMapping(indexType, builder)// type,mapping 這種方式也可以，經過測試。
                .get();

        return cIndexResponse.isAcknowledged();
    }

如何根據Avro建立動态生成Mapping呢？

/**
     * 重建index
     * 
     * @throws IOException
     * */
    protected void createIndex(String indexName, String indexType) throws IOException {
        Tuple3<List<String>, Map<String, String>, ExpressionEncoder<Row>> src = getTargetSchema(srcSchemaKey, true);

        Map<String, Map<String, String>> extFields = new HashMap<String, Map<String, String>>();
        Map<String, String> insertDateProperty = new HashMap<String, String>();
        insertDateProperty.put("type", "date");
        insertDateProperty.put("format", "yyyy-MM-dd");
        extFields.put("index_date", insertDateProperty);
        Map<String, String> typeProperty = new HashMap<String, String>();
        typeProperty.put("type", "keyword");
        extFields.put("type", typeProperty);

        XContentBuilder mappingSource = getMapping(indexType, src._2(), extFields);

        if (!indexCreate(indexName, indexType, mappingSource)) {
            throw new RuntimeException("重新建立index" + indexName + "時，設定mapping失敗！");
        }
    }
        /**
     * 
     * @param indexType
     *            index類型
     * @param schemaColVsTypeMap
     *            從*.avsc schema檔案中讀取出的字段，格式:colName vs colType
     * @param extFields
     *            新增擴充字段（在*.avsc schema檔案中沒有包含的字段）<br>
     * @return mapping：<br>
     *         {<br>
     *         ----"mrs_rsrp_d_2018.10.26":<br>
     *         ----{<br>
     *         --------"aliases":{},<br>
     *         --------"mappings":<br>
     *         --------{<br>
     *         -----------"_doc":{<br>
     *         -----------"properties":<br>
     *         -----------{<br>
     *         --------------"cgi":{"type":"text","fields":{"keyword":{"type":"keyword","ignore_above":256}}},<br>
     *         --------------"timestamp":{"type":"long"}<br>
     *         -----------}<br>
     *         --------},<br>
     *         --------"settings":{}<br>
     *         ----}<br>
     *         }<br>
     * @throws 生成XContentBuilder時
     *             ,抛出異常。
     */
    public static XContentBuilder getMapping(String indexType, Map<String, String> schemaColVsTypeMap, Map<String, Map<String, String>> extFields)
            throws IOException {
        XContentBuilder builder = XContentFactory.jsonBuilder()//
                .startObject()//
                .startObject(indexType)//
                .startObject("_all").field("enabled", false).endObject()// 是否包一個row中的所有字段作為一個大的索引字段，支援從所有列中查詢
                // .startObject("_source").field("enabled", false).endObject()// 不可以設為false,否則從es中查不到字段（其屬性決定了那些字段存儲到es，預設所有字段都存儲，也可以通過include,exclude指定特定字段存儲與不存儲）
                // .startObject("_field_names").field("enabled", false).endObject()//
                .startObject("properties");

        for (Map.Entry<String, String> kv : schemaColVsTypeMap.entrySet()) {
            String colName = kv.getKey();
            String colType = kv.getValue();

            // "insert_time":{"type":"date","format":"yyyy-MM-dd HH:mm:ss"},
            // "scan_start_time":{"type":"date","format":"yyyy-MM-dd HH:mm:ss"},
            // "scan_stop_time":{"type":"date","format":"yyyy-MM-dd HH:mm:ss"},
            if (colName.equalsIgnoreCase("scan_start_time")//
                    || colName.equalsIgnoreCase("scan_stop_time")//
                    || colName.equalsIgnoreCase("insert_time")) {
                builder.startObject(colName) //
                        .field("type", "date")//
                        .field("format", "yyyy-MM-dd HH:mm:ss")// 也可以 yyyy/MM/dd||yyyy/MM/dd HH:mm:ss
                        .field("index", "true") // not_analyzed|analyzed
                        .endObject();
            }
            // "city_name":{"type":"text","fields":{"keyword":{"type":"keyword"}}},
            // "province_name":{"type":"text","fields":{"keyword":{"type":"keyword"}}},
            // "region_name":{"type":"text","fields":{"keyword":{"type":"keyword"}}},
            else if (colName.equalsIgnoreCase("city_name")//
                    || colName.equalsIgnoreCase("region_name")//
                    || colName.equalsIgnoreCase("province_name")) {
                builder.startObject(colName).field("type", "keyword").endObject();
            } else {
                if (colType.equalsIgnoreCase("long")) {
                    builder.startObject(colName).field("type", "long").endObject();
                } else if (colType.equalsIgnoreCase("string")) {
                    builder.startObject(colName).field("type", "keyword").endObject();
                } else if (colType.equalsIgnoreCase("double")) {
                    builder.startObject(colName).field("type", "double").endObject();
                } else {
                    builder.startObject(colName).field("type", colType).endObject();
                }
            }
        }

        // 追加擴充字段到mapping字段中
        for (Map.Entry<String, Map<String, String>> kv : extFields.entrySet()) {
            String colName = kv.getKey();
            builder.startObject(colName);

            for (Map.Entry<String, String> kvProperty : kv.getValue().entrySet()) {
                builder.field(kvProperty.getKey(), kvProperty.getValue());
            }
            builder.endObject();
        }

        builder.endObject();// end of properties
        builder.endObject();// end of indexType
        builder.endObject();// end of start

        return builder;
    }
    
    /**
     * 傳回 target columns list,column vs column type map,expression encoder
     * */
    protected Tuple3<List<String>, Map<String, String>, ExpressionEncoder<Row>> getTargetSchema(String schemaFilePath, boolean withTimestamp) {
        Broadcast<String> targetSchemaContent = null;
        try {
            String avroContent = getHdfsFileContent(schemaFilePath);
            targetSchemaContent = JavaSparkContext.fromSparkContext(sparkSession.sparkContext()).broadcast(avroContent);
        } catch (Exception e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        }

        Schema.Parser parser = new Schema.Parser();
        Schema targetSchema = parser.parse(targetSchemaContent.getValue());
        List<String> targetColumns = new ArrayList<String>();
        Map<String, String> targetKeyTypeItems = new LinkedHashMap<String, String>();
        for (Field field : targetSchema.getFields()) {
            targetColumns.add(field.name());
            List<Schema> types = targetSchema.getField(field.name()).schema().getTypes();
            String datatype = types.get(types.size() - 1).getName();
            targetKeyTypeItems.put(field.name(), datatype);
        }

        ExpressionEncoder<Row> encoder = SchemaHelper.createSchemaEncoder(targetSchema, withTimestamp);

        return new Tuple3<List<String>, Map<String, String>, ExpressionEncoder<Row>>(targetColumns, targetKeyTypeItems, encoder);
    }
    
    /**
    * 将schema轉化為Encoder
    */
    protected static ExpressionEncoder<Row> createSchemaEncoder(Schema schema, boolean withTimestamp) {
        StructType type = (StructType) SchemaConverters.toSqlType(schema).dataType();

        if (withTimestamp) {
            List<String> fields = java.util.Arrays.asList(type.fieldNames());
            if (!fields.contains("timestamp")) {
                type = type.add("timestamp", DataTypes.TimestampType);
            } else {
                int index = type.fieldIndex("timestamp");
                StructField field = type.fields()[index];
                type.fields()[index] = new StructField(field.name(), DataTypes.TimestampType, field.nullable(), field.metadata());
            }
        }

        ExpressionEncoder<Row> encoder = RowEncoder.apply(type);

        return encoder;
    }

    /**
    * 讀取hdfs上檔案内容
    */
    protected static String getHdfsFileContent(String filePath){
        String content = "";
        try {
            reader = getHDFSFileReader(filePath);
            String line=null;
            while ((line = reader.readLine()) != null) {
                if (!line.startsWith("#") && line.trim().length() > 0) {
                    content+=line.trim();
                }
            }

            reader.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
            throw new RuntimeException("file not found exception:" + this.avroSchemaPath);
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException("reading file while an error was thrown:" + this.avroSchemaPath);
        } finally {
            if (reader != null) {
                try {
                    reader.close();
                } catch (IOException e1) {
                    e1.printStackTrace();
                }
            }
        }
        
        return content;
    }    
    
    protected static BufferedReader getHDFSFileReader(String hdfsFile) {
        try {
            System.out.println("hdfsfile: " + hdfsFile);
            Path configPath = new Path(hdfsFile);

            FileSystem fs = FileSystem.get(new Configuration());

            if (fs.exists(configPath)) {
                return new BufferedReader(new InputStreamReader(fs.open(configPath)));
            } else {
                throw new FileNotFoundException("file(" + configPath + ") not found.");
            }
        } catch (IOException e) {
            e.printStackTrace();
            throw new RuntimeException(e);
        } finally {
        }
    }

所有代碼都在這裡，具體的不加介紹了。

ES API之Update Index：

/**
     * 修改ES索引的mapping屬性
     * */
    public static boolean indexUpdateMapping(String indexName, String indexType, XContentBuilder builder) {
        org.elasticsearch.action.admin.indices.mapping.put.PutMappingRequest mapping = Requests.putMappingRequest(indexName).type(indexType)
                .source(builder);
        PutMappingResponse pMappingResource = getAdminClient().putMapping(mapping).actionGet();

        return pMappingResource.isAcknowledged();
    }

    /**
     * 修改ES索引的settings屬性<br>
     * 更新索引屬性（更新索引的settings屬性，這是更改已經建立的屬性、但有些一旦建立不能更改，需要按照自己的需求來進行選擇使用）
     * */
    public static boolean indexUpdatSettings(String indexName, Map<String, String> settingsMap) {
        Builder settings = Settings.builder();//
        for (Map.Entry<String, String> kv : settingsMap.entrySet()) {
            settings.put(kv.getKey(), kv.getValue());
        }

        return indexUpdatSettings(indexName, settings);
    }

    /**
     * 修改ES索引的settings屬性<br>
     * 更新索引屬性（更新索引的settings屬性，這是更改已經建立的屬性、但有些一旦建立不能更改，需要按照自己的需求來進行選擇使用）
     * */
    public static boolean indexUpdatSettings(String indexName, Builder settings) {
        UpdateSettingsResponse uIndexResponse = getAdminClient().prepareUpdateSettings(indexName)//
                .setSettings(settings)//
                .execute().actionGet();
        return uIndexResponse.isAcknowledged();
    }

/**
     * 修改索引，修改索引的setting。
     * 
     * @param indexName
     *            索引名稱<br>
     *            如果不需要實時精确的查詢結果，可以把每個索引的index.refresh_interval設定為30s，如果在導入大量的資料，可以把這個值先設定為－1，完成資料導入之後在設定回來<br>
     *            如果在用bulk導入大量的資料，可以考慮不要副本，設定index.number_of_replicas:
     *            0。有副本存在的時候，導入資料需要同步到副本，并且副本也要完成分析，索引和段合并的操作，影響導入性能。可以不設定副本導入資料然後在恢複副本。<br>
     *            <b>注意</b>：<br>
     *            有些屬性一旦建立就不可以修改，比如：index.number_of_shards，修改會抛出異常。
     */
    public static boolean indexUpdateSettings(String indexName) {
        Settings settings = Settings.builder() //
                // .put("index.mapping.ignore_malformed", false)//
                .put("index.refresh_interval", "30s") //
                // .put("index.number_of_shards", 4)//
                .put("index.number_of_replicas", 1)//
                // .put("index.max_result_window", 500000)//
                //
                // .put("index.translog.durability", "async")//
                // .put("index.translog.sync_interval", "120s")//
                // .put("index.translog.flush_threshold_size", "2gb")//
                //
                .put("index.merge.scheduler.max_thread_count", 1)//
                .build();
        return indexUpdatSettings(indexName, settings);
    }

ES API之Delete/Open/Close Index：

/**
     * 删除ES中某個或者多個索引
     * */
    public static boolean indexDelete(String... indices) {
        DeleteIndexResponse dIndexResponse = getAdminClient().prepareDelete(indices).execute().actionGet();
        if (dIndexResponse.isAcknowledged()) {
            System.out.println("删除索引成功");
            return true;
        } else {
            System.out.println("删除索引失敗");
            return false;
        }
    }

    /**
     * 關閉ES中某個或者多個索引<br>
     * curl -XPOST "http://127.0.0.1:9200/indexname/_close"
     * */
    public static boolean indexClose(String... indices) {
        CloseIndexResponse cIndexResponse = getAdminClient().prepareClose(indices).execute().actionGet();
        if (cIndexResponse.isAcknowledged()) {
            System.out.println("關閉索引成功");
            return true;
        }
        return false;
    }

    /**
     * 開啟ES中某個或者多個索引<br>
     * curl -XPOST "http://127.0.0.1:9200/indexname/_open"
     * */
    public static boolean indexOpen(String... indices) {
        OpenIndexResponse oIndexResponse = getAdminClient().prepareOpen(indices).execute().actionGet();
        if (oIndexResponse.isAcknowledged()) {
            System.out.println("開啟索引成功");
            return true;
        }
        return false;
    }

Spark2.2+ES6.4.2（三十二）：ES API之index的create/update/delete/open/close（建立index時設定setting，并建立index後根據avro模闆動态設定index的mapping）

準備工作（建立TransportClient，IndicesAdminClient）

ES API之Exists/Create Index：

如何根據Avro建立動态生成Mapping呢？

ES API之Update Index：

ES API之Delete/Open/Close Index：

繼續閱讀

Spark在windows環境裡跑時報錯找不到org.apache.hadoop.fs.FSDataInputStream

Hadoop FSDataInputStream 和FSDataOutputStream 用法

PHP輔導代做程式設計：CS353 Database System

《Hive權威指南》第八章：HiveQL索引8 HiveQL：索引

Spark流式分析系統實作流式實時日志分析系統

自學Zabbix3.10.2-事件通知Notifications upon events-Actions報警配置點選傳回：自學zabbix集錦

HDU 5678 ztr loves trees

Scala和Java二種方式實戰Spark Streaming開發

拓端tecdat|R語言彈性網絡Elastic Net正則化懲罰回歸模型交叉驗證可視化

二叉樹及其應用--二叉樹建立

Spark基礎:Spark簡介及特點,運作模式,安裝Spark,Driver與Executor,Local模式,Standalone模式,Yarn模式,Mesos模式,WordCount案例,HA配置第1章 Spark概述第2章 Spark運作模式第3章案例實操

Spark實作wordcount

Eclipse運作WordCount（詳細版）相關連接配接Eclipse運作WordCount

大資料排錯SparkSpark叢集啟動時候，JAVA_HOME is not sethadoop叢集，某台伺服器jps無任何輸出IDEAkafkahadoopspark sqlfile permissionsIDEA本地測試 - OutOfMemoryError: GC overhead limit exceededhdfs負載均衡

spark/scala關于【資源檔案】加載方法概述外部檔案加載方案測試資源檔案打包入jar包中小結

詳解STM32單片機的堆棧