
raw

Published 2021-01-06

Description:

Ten Chinese word segmentation datasets for training Chinese word segmentation models.

File list (path, size in bytes, date):

raw, 0 , 2019-02-10
raw\other, 0 , 2019-02-10
raw\other\zx, 0 , 2019-02-10
raw\other\zx\test.zhuxian.wordpos, 280885 , 2019-02-10
raw\other\zx\train.zhuxian.wordpos, 559793 , 2019-02-10
raw\other\zx\dev.zhuxian.wordpos, 166113 , 2019-02-10
raw\other\cnc, 0 , 2019-02-10
raw\other\cnc\dev.txt, 5581923 , 2019-02-10
raw\other\cnc\train.txt, 44824963 , 2019-02-10
raw\other\cnc\test.txt, 5571735 , 2019-02-10
raw\other\udc, 0 , 2019-02-10
raw\other\udc\dev.conll, 422116 , 2019-02-10
raw\other\udc\test.conll, 400684 , 2019-02-10
raw\other\udc\train.conll, 3282103 , 2019-02-10
raw\other\wtb, 0 , 2019-02-10
raw\other\wtb\dev.conll, 49336 , 2019-02-10
raw\other\wtb\test.conll, 49702 , 2019-02-10
raw\other\wtb\train.conll, 393054 , 2019-02-10
raw\other\sxu, 0 , 2019-02-10
raw\other\sxu\train.txt, 3600697 , 2019-02-10
raw\other\sxu\test.txt, 776035 , 2019-02-10
raw\other\ctb, 0 , 2019-02-10
raw\other\ctb\ctb6.dev.seg, 300375 , 2019-02-10
raw\other\ctb\ctb6.train.seg, 4030528 , 2019-02-10
raw\other\ctb\ctb6.test.seg, 312025 , 2019-02-10
raw\sighan2005, 0 , 2019-02-10
raw\sighan2005\cityu_test_gold.utf8, 239427 , 2019-02-10
raw\sighan2005\msr_training.utf8, 16804586 , 2019-02-10
raw\sighan2005\cityu_training.utf8, 8499903 , 2019-02-10
raw\sighan2005\as_test_gold.utf8, 711891 , 2019-02-10
raw\sighan2005\pku_test_gold.utf8, 716386 , 2019-02-10
raw\sighan2005\as_training.utf8, 30558193 , 2019-02-10
raw\sighan2005\msr_test_gold.utf8, 762801 , 2019-02-10
raw\sighan2005\pku_training.utf8, 7709182 , 2019-02-10
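
As a rough illustration of how such files are typically prepared for training, the sketch below converts a word-segmented line into per-character B/M/E/S tags (begin, middle, end, single). It assumes the SIGHAN 2005 `.utf8` files hold one sentence per line with words separated by whitespace; the other corpora in this archive (`.conll`, `.wordpos`, `.seg`) may use different layouts and are not handled here. The function names `words_to_bmes` and `read_segmented` are hypothetical helpers, not part of the archive.

```python
from typing import Iterator, List, Tuple


def words_to_bmes(words: List[str]) -> List[Tuple[str, str]]:
    """Map a list of words to (character, tag) pairs using the BMES scheme."""
    pairs: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            # A one-character word is tagged S (single).
            pairs.append((word, "S"))
        else:
            # First character B, inner characters M, last character E.
            pairs.append((word[0], "B"))
            for ch in word[1:-1]:
                pairs.append((ch, "M"))
            pairs.append((word[-1], "E"))
    return pairs


def read_segmented(path: str, encoding: str = "utf-8") -> Iterator[List[Tuple[str, str]]]:
    """Yield one tagged sentence per non-empty line.

    Assumes whitespace-separated words, one sentence per line.
    """
    with open(path, encoding=encoding) as f:
        for line in f:
            words = line.split()
            if words:
                yield words_to_bmes(words)
```

For example, `words_to_bmes(["中文", "分词", "的", "数据集"])` tags `中` as B, `文` as E, `的` as S, and `数据集` as B, M, E. Calling `read_segmented` on a file such as `raw\sighan2005\pku_training.utf8` would then stream tagged sentences, provided the file matches the assumed layout.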

