Java Cross-Validation (CrossValidation): A Complete Design

2023-04-10 05:50:53

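1. Overview

Cross-validation, sometimes called rotation estimation, is a practical statistical method that partitions a data sample into smaller subsets. The analysis is first performed on one subset, and the remaining subsets are then used to confirm and validate that analysis. The initial subset is called the training set; the other subsets are called the validation set or test set. (Source: Wikipedia)

Cross-validation is an important tool for guiding and validating work in artificial intelligence, machine learning, pattern recognition, and classifier research.

The basic idea is to split the original dataset into groups: one part serves as the training set and the other as the validation set (or test set). The classifier is first trained on the training set, and the resulting model is then tested on the validation set. The test result serves as the performance metric for evaluating the classifier.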
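The simplest form of this idea, a single train/validation split, can be sketched as follows. This is only an illustrative sketch: the class name `HoldoutSplit`, the method `split`, and the parameters `trainFraction` and `seed` are assumptions made for this example and are not part of the design in the next section.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper: one random train/validation split over sample indices.
public class HoldoutSplit {

    /**
     * Returns {trainIndices, testIndices} for n samples.
     * trainFraction and seed are illustrative parameters (assumptions).
     */
    public static int[][] split(int n, double trainFraction, long seed) {
        if (n <= 0 || trainFraction <= 0.0 || trainFraction >= 1.0) {
            throw new IllegalArgumentException("Invalid arguments");
        }

        // Build the index list 0..n-1 and shuffle it reproducibly.
        List<Integer> index = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            index.add(i);
        }
        Collections.shuffle(index, new Random(seed));

        // The first trainSize shuffled indices train the model, the rest validate it.
        int trainSize = (int) Math.round(n * trainFraction);
        int[] train = new int[trainSize];
        int[] test = new int[n - trainSize];
        for (int i = 0; i < n; i++) {
            if (i < trainSize) {
                train[i] = index.get(i);
            } else {
                test[i - trainSize] = index.get(i);
            }
        }
        return new int[][] {train, test};
    }

    public static void main(String[] args) {
        int[][] parts = split(10, 0.8, 42L);
        System.out.println("train size = " + parts[0].length);
        System.out.println("test size  = " + parts[1].length);
    }
}
```

A single split like this gives only one performance estimate, and that estimate depends on which samples happen to land in the validation set. Repeating the split over several folds, as in the design below, reduces that dependence.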

2. Design

package recomendation;

// Cross-validation: precomputes the training and test indices for each of the k rounds.
public class CrossValidation {

    /**
     * The number of rounds of cross validation.
     */
    public final int k;
    /**
     * The index of training instances, one array per round.
     */
    public final int[][] train;
    /**
     * The index of testing instances, one array per round.
     */
    public final int[][] test;

    /**
     * Constructor. Shuffles the indices 0..n-1 and splits them into k folds.
     * @param n the number of samples.
     * @param k the number of rounds of cross validation; must satisfy 2 <= k <= n.
     */
    public CrossValidation(int n, int k) {
        if (n <= 0) {
            throw new IllegalArgumentException("Invalid sample size: " + n);
        }

        // k == 0 would divide by zero below, and k == 1 would leave the training set empty.
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("Invalid number of CV rounds: " + k);
        }

        this.k = k;
        
        int[] index = new int[n];

        // fill with the integers 0..n-1
        for (int i = 0; i < n; i++)
            index[i] = i;

        // shuffle to create a random permutation of the indices
        for (int i = 0; i < n; i++) {
            int r = (int) (Math.random() * (i+1));     // random int between 0 and i, inclusive
            int swap = index[r];
            index[r] = index[i];
            index[i] = swap;
        }

        train = new int[k][]; // training indices for each round
        test = new int[k][];  // test indices for each round

        // each round tests on n / k samples; the last round also takes the remainder
        int chunk = n / k;
        for (int i = 0; i < k; i++) {
            int start = chunk * i;
            int end = chunk * (i + 1);
            if (i == k-1) end = n;

            train[i] = new int[n - end + start];
            test[i] = new int[end - start];
            for (int j = 0, p = 0, q = 0; j < n; j++) {
                if (j >= start && j < end) {
                    test[i][p++] = index[j];
                } else {
                    train[i][q++] = index[j];
                }
            }
        }
    }
}
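One way the `train` and `test` index arrays might be consumed is sketched below. This is a hypothetical usage example, not part of the original design: the class `CrossValidationUsage`, the method `meanBaselineMse`, and the "model" (predicting the mean of the training targets) are all assumptions chosen to keep the example short. To keep the sketch compilable on its own, `main` uses a hand-written 2-fold split in place of a `CrossValidation` instance.

```java
// Hypothetical usage sketch: evaluates a trivial baseline over precomputed folds.
public class CrossValidationUsage {

    /**
     * Averages the per-fold mean squared error of a baseline that always
     * predicts the mean of the training targets.
     * train[i] and test[i] hold the sample indices of round i, in the same
     * layout as the public fields of CrossValidation.
     */
    public static double meanBaselineMse(double[] y, int[][] train, int[][] test) {
        double total = 0.0;
        for (int fold = 0; fold < train.length; fold++) {
            // "Training": the model is just the mean of the training targets.
            double mean = 0.0;
            for (int idx : train[fold]) {
                mean += y[idx];
            }
            mean /= train[fold].length;

            // "Testing": mean squared error on the held-out samples of this fold.
            double squaredError = 0.0;
            for (int idx : test[fold]) {
                double err = y[idx] - mean;
                squaredError += err * err;
            }
            total += squaredError / test[fold].length;
        }
        return total / train.length;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 3.0, 4.0};

        // Hand-written 2-fold split so this sketch runs by itself. With the
        // class above, one would instead write:
        //   CrossValidation cv = new CrossValidation(y.length, 2);
        // and pass cv.train and cv.test.
        int[][] train = {{2, 3}, {0, 1}};
        int[][] test = {{0, 1}, {2, 3}};

        System.out.println("cross-validated MSE = " + meanBaselineMse(y, train, test));
    }
}
```

Because the class stores indices rather than data, the same folds can be reused to compare several models on identical splits.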
