Java Cross-Validation (CrossValidation): A Complete Design

2023-04-10 05:50:53

I. Overview

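Cross-validation, sometimes called rotation estimation, is a practical statistical method for partitioning a data sample into smaller subsets. An analysis is first performed on one subset, and the other subsets are then used to confirm and validate that analysis. The initial subset is called the training set; the other subsets are called validation sets or test sets. (Source: Wikipedia)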

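Cross-validation provides strong guidance and validation for research in artificial intelligence, machine learning, pattern recognition, and classifiers.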

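The basic idea is to partition the original dataset into groups in some way: one part serves as the training set and the other as the validation set (or test set). The classifier is first trained on the training set, and the resulting model is then tested on the validation set. The test result serves as a performance metric for evaluating the classifier.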
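The train-then-validate idea above can be sketched with a single hold-out split. This is a minimal illustration, not the design presented later in this post: the class name `HoldOutSketch`, its method names, the 70/30 split ratio, the seed, and the trivial majority-label "classifier" are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch: one train/validation split evaluated with a majority-label baseline.
public class HoldOutSketch {

    /** Returns {trainIndices, validationIndices} for n samples after a seeded shuffle. */
    public static int[][] split(int n, double trainFraction, long seed) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));

        int trainSize = (int) Math.round(n * trainFraction);
        int[] train = new int[trainSize];
        int[] validation = new int[n - trainSize];
        for (int i = 0; i < n; i++) {
            if (i < trainSize) {
                train[i] = order.get(i);
            } else {
                validation[i - trainSize] = order.get(i);
            }
        }
        return new int[][] {train, validation};
    }

    /** "Trains" the baseline: the most frequent 0/1 label among the training rows (ties go to 1). */
    public static int majorityLabel(int[] labels, int[] train) {
        int ones = 0;
        for (int i : train) {
            ones += labels[i];
        }
        return 2 * ones >= train.length ? 1 : 0;
    }

    /** Fraction of validation rows whose label equals the constant prediction. */
    public static double accuracy(int[] labels, int[] validation, int predicted) {
        int correct = 0;
        for (int i : validation) {
            if (labels[i] == predicted) {
                correct++;
            }
        }
        return (double) correct / validation.length;
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 0, 1, 1, 0, 1, 1, 1, 0};   // made-up binary labels
        int[][] parts = split(labels.length, 0.7, 42L);
        int model = majorityLabel(labels, parts[0]);      // fit on the training set only
        System.out.println("validation accuracy = " + accuracy(labels, parts[1], model));
    }
}
```

A single split gives only one estimate, and that estimate depends on which rows happen to land in the validation set. Repeating the split over several rounds, as the design in the next section does, reduces that dependence.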

II. Design

package recomendation;

// Cross-validation
public class CrossValidation {
	
    /**
     * The number of rounds of cross validation.
     */
    public final int k;
    /**
     * The index of training instances.
     */
    public final int[][] train;
    /**
     * The index of testing instances.
     */
    public final int[][] test;

    /**
     * Constructor.
     * @param n the number of samples.
     * @param k the number of rounds of cross validation.
     */
    public CrossValidation(int n, int k) {
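        if (n < 0) {
            throw new IllegalArgumentException("Invalid sample size: " + n); // invalid sample size
        }

        // k must be at least 2: k == 0 would divide by zero below,
        // and k == 1 would leave the training set empty
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("Invalid number of CV rounds: " + k); // invalid number of rounds
        }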

        this.k = k;
        
        int[] index = new int[n];

        // insert integers 0..n-1
        for (int i = 0; i < n; i++)
            index[i] = i;

        // shuffle to create a random permutation of the array
        for (int i = 0; i < n; i++) {
            int r = (int) (Math.random() * (i+1));     // int between 0 and i
            int swap = index[r];
            index[r] = index[i];
            index[i] = swap;
        }
        
        train = new int[k][]; // training set indices for each round
        test = new int[k][];  // test set indices for each round

        // each test fold holds n / k indices; the last fold also takes the remainder
        int chunk = n / k;
        for (int i = 0; i < k; i++) {
            int start = chunk * i;
            int end = chunk * (i + 1);
            if (i == k-1) end = n;

            train[i] = new int[n - end + start];
            test[i] = new int[end - start];
            for (int j = 0, p = 0, q = 0; j < n; j++) {
                if (j >= start && j < end) {
                    test[i][p++] = index[j];
                } else {
                    train[i][q++] = index[j];
                }
            }
        }
    }
}
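To show how index arrays of this kind might be consumed, here is a self-contained sketch that rebuilds only the test folds with the same `n / k` chunking and averages a per-round score. The class `KFoldUsageSketch`, its method names, the seeded `java.util.Random`, and the majority-label baseline are assumptions made for illustration. They are not part of the `CrossValidation` class above, which uses `Math.random()` and also stores the training indices.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical sketch: k-fold test folds plus a mean accuracy over the k rounds.
public class KFoldUsageSketch {

    /** Test-fold indices for k rounds over n samples; the last fold takes the remainder. */
    public static int[][] testFolds(int n, int k, long seed) {
        int[] index = new int[n];
        for (int i = 0; i < n; i++) {
            index[i] = i;
        }

        Random rng = new Random(seed);
        for (int i = n - 1; i > 0; i--) {   // Fisher-Yates shuffle
            int r = rng.nextInt(i + 1);     // int between 0 and i
            int swap = index[r];
            index[r] = index[i];
            index[i] = swap;
        }

        int chunk = n / k;
        int[][] test = new int[k][];
        for (int i = 0; i < k; i++) {
            int start = chunk * i;
            int end = (i == k - 1) ? n : chunk * (i + 1);
            test[i] = Arrays.copyOfRange(index, start, end);
        }
        return test;
    }

    /** Mean accuracy over k rounds of a majority-label baseline on 0/1 labels. */
    public static double crossValidatedAccuracy(int[] labels, int k, long seed) {
        int n = labels.length;
        int totalOnes = 0;
        for (int label : labels) {
            totalOnes += label;
        }

        double sum = 0.0;
        for (int[] fold : testFolds(n, k, seed)) {
            int onesInTest = 0;
            for (int i : fold) {
                onesInTest += labels[i];
            }
            // Everything outside the test fold is the training set for this round.
            int trainOnes = totalOnes - onesInTest;
            int trainSize = n - fold.length;
            int predicted = 2 * trainOnes >= trainSize ? 1 : 0;
            int correct = predicted == 1 ? onesInTest : fold.length - onesInTest;
            sum += (double) correct / fold.length;
        }
        return sum / k;
    }

    public static void main(String[] args) {
        int[] labels = {1, 1, 0, 1, 1, 0, 1, 1, 1, 0};   // made-up binary labels
        System.out.println("mean accuracy = " + crossValidatedAccuracy(labels, 3, 7L));
    }
}
```

Because the folds are contiguous slices of one shuffled permutation, every sample is tested exactly once across the k rounds. With n = 10 and k = 3 the fold sizes are 3, 3, and 4.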
