cut_sample
Code description:
Description: Edits (prunes) a sample set using a two-way split. The idea of the editing method is to divide the sample set into a training set and a test set, classify each test-set sample with a nearest-neighbor rule built on the training set, and delete any test-set sample that is misclassified. This function uses the nearest-neighbor (1-NN) rule and performs a single editing pass: it exits once every sample in the test set has been visited. There is also a repeated editing method, suited to cases with many samples: the samples are randomly divided into several subsets, and for each pair of adjacent subsets the earlier one serves as the test set and the later one as the training set for a call to the two-way editing routine. Once all subsets have been edited, the procedure is called recursively until no sample is removed. Repeated editing should give better results than a single pass.
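The procedure described above can be sketched roughly as follows. This is a minimal illustration in Python, not the packaged cut_sample code: the names `nearest_label`, `edit_once`, and `multi_edit`, the `(features, label)` tuple layout, the squared Euclidean distance, and the wrap-around pairing of the last subset with the first are all assumptions made for the example. The recursion is written as a loop.

```python
import random

def nearest_label(x, train):
    """Label of the training sample closest to x (squared Euclidean distance)."""
    best = min(train, key=lambda s: sum((a - b) ** 2 for a, b in zip(x, s[0])))
    return best[1]

def edit_once(test, train):
    """Single editing pass: keep the test samples that 1-NN on `train` classifies correctly."""
    if not train:
        # Nothing to classify against, so nothing can be judged wrong.
        return list(test)
    return [(x, y) for x, y in test if nearest_label(x, train) == y]

def multi_edit(samples, n_subsets=3, seed=0):
    """Repeated editing: re-split and edit until a full pass removes no sample."""
    rng = random.Random(seed)
    samples = list(samples)
    while True:
        rng.shuffle(samples)
        # Random partition into n_subsets roughly equal subsets.
        subsets = [samples[i::n_subsets] for i in range(n_subsets)]
        kept = []
        for i, subset in enumerate(subsets):
            # Earlier subset is the test set, the next one the training set.
            # Assumption: the last subset wraps around to the first.
            train = subsets[(i + 1) % n_subsets]
            kept.extend(edit_once(subset, train))
        if len(kept) == len(samples):
            return kept
        samples = kept

if __name__ == "__main__":
    train = [((0.0, 0.0), 0), ((10.0, 10.0), 1)]
    test = [((0.5, 0.5), 0), ((9.5, 9.5), 1), ((0.2, 0.1), 1)]
    # The third test sample lies next to class 0 but carries label 1.
    print(edit_once(test, train))
```

In this sketch a single pass only removes samples from the test set; the training set is left untouched. The repeated version terminates because each pass either removes at least one sample or stops.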