raw

Published 2021-01-06

Description:

Ten Chinese word segmentation datasets for training Chinese word segmentation models.

File list (path, size, date):

raw, 0 , 2019-02-10
raw\other, 0 , 2019-02-10
raw\other\zx, 0 , 2019-02-10
raw\other\zx\test.zhuxian.wordpos, 280885 , 2019-02-10
raw\other\zx\train.zhuxian.wordpos, 559793 , 2019-02-10
raw\other\zx\dev.zhuxian.wordpos, 166113 , 2019-02-10
raw\other\cnc, 0 , 2019-02-10
raw\other\cnc\dev.txt, 5581923 , 2019-02-10
raw\other\cnc\train.txt, 44824963 , 2019-02-10
raw\other\cnc\test.txt, 5571735 , 2019-02-10
raw\other\udc, 0 , 2019-02-10
raw\other\udc\dev.conll, 422116 , 2019-02-10
raw\other\udc\test.conll, 400684 , 2019-02-10
raw\other\udc\train.conll, 3282103 , 2019-02-10
raw\other\wtb, 0 , 2019-02-10
raw\other\wtb\dev.conll, 49336 , 2019-02-10
raw\other\wtb\test.conll, 49702 , 2019-02-10
raw\other\wtb\train.conll, 393054 , 2019-02-10
raw\other\sxu, 0 , 2019-02-10
raw\other\sxu\train.txt, 3600697 , 2019-02-10
raw\other\sxu\test.txt, 776035 , 2019-02-10
raw\other\ctb, 0 , 2019-02-10
raw\other\ctb\ctb6.dev.seg, 300375 , 2019-02-10
raw\other\ctb\ctb6.train.seg, 4030528 , 2019-02-10
raw\other\ctb\ctb6.test.seg, 312025 , 2019-02-10
raw\sighan2005, 0 , 2019-02-10
raw\sighan2005\cityu_test_gold.utf8, 239427 , 2019-02-10
raw\sighan2005\msr_training.utf8, 16804586 , 2019-02-10
raw\sighan2005\cityu_training.utf8, 8499903 , 2019-02-10
raw\sighan2005\as_test_gold.utf8, 711891 , 2019-02-10
raw\sighan2005\pku_test_gold.utf8, 716386 , 2019-02-10
raw\sighan2005\as_training.utf8, 30558193 , 2019-02-10
raw\sighan2005\msr_test_gold.utf8, 762801 , 2019-02-10
raw\sighan2005\pku_training.utf8, 7709182 , 2019-02-10
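
Segmentation corpora such as the sighan2005 `*_training.utf8` files are usually fed to a model as per-character labels. The block below is a minimal sketch of that conversion, assuming each line holds one sentence with words separated by whitespace (the usual layout of the SIGHAN 2005 files; the other files in this archive may use different layouts and are not covered here). The function names `words_to_bmes` and `read_segmented` and the B/M/E/S tag scheme are illustrative choices, not part of the archive.

```python
from typing import Iterator, List, Tuple

def words_to_bmes(words: List[str]) -> List[Tuple[str, str]]:
    """Tag every character as B (begin), M (middle), E (end) or S (single)."""
    tagged: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "S"))
        else:
            tagged.append((word[0], "B"))
            tagged.extend((ch, "M") for ch in word[1:-1])
            tagged.append((word[-1], "E"))
    return tagged

def read_segmented(path: str, encoding: str = "utf-8") -> Iterator[List[Tuple[str, str]]]:
    """Yield one tagged sentence per non-empty line of a segmented file.

    Assumes whitespace-delimited words; str.split() with no argument
    also splits on the full-width (ideographic) space.
    """
    with open(path, encoding=encoding) as handle:
        for line in handle:
            words = line.split()
            if words:
                yield words_to_bmes(words)
```

With the archive unpacked, `read_segmented("raw/sighan2005/pku_training.utf8")` would yield one list of (character, tag) pairs per sentence, provided the file follows the assumed layout.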


  • CIPP_JSsetup
    Provides automatic word segmentation and supports automatic indexing; a good tool for processing Chinese natural language.
    2020-09-24 19:27:48 Download
    Points: 1
  • HanLP-master
    NamedEntityRecognition github
    2018-01-31 01:47:04 Download
    Points: 1
  • tranditionized
    Simplified/Traditional Chinese conversion plug-in for GreenBrowser/TheWorld2.0.
    2010-02-24 19:20:05 Download
    Points: 1
  • wordsegmentation
    An automaton-based word segmentation method that can perform Chinese word segmentation and word statistics.
    2011-09-21 11:38:57 Download
    Points: 1
  • Leza
    It's good code for the troias project.
    2009-06-04 06:50:59 Download
    Points: 1
  • 4305685
    A Chinese word segmentation source program that, combined with the Easy Language (易语言) Comet HTTP application module (彗星HTTP应用模块.ec), performs Chinese word segmentation.
    2017-01-11 23:13:31 Download
    Points: 1
  • ViewPage
    While the contact list is dragged, dynamically displays the first Pinyin letter of the position scrolled to.
    2014-01-11 18:14:24 Download
    Points: 1
  • pipe
    This is the APS (advanced planning and scheduling) optimization engine from ILog, a world-famous IT company; even the material requirements planning and production planning algorithms in ERP systems such as SAP and Oracle are derived from ILog. I studied it for a long time, and the linear solving algorithm inside it is truly difficult.
    2008-04-27 23:08:23 Download
    Points: 1
  • 201411149222244
    Download any Chinese text document; this program segments the document into words and also counts how many times each word occurs.
    2015-10-23 10:53:54 Download
    Points: 1
  • GB2312
    Lists every character in GB2312 and gives its corresponding code number.
    2012-07-04 16:07:46 Download
    Points: 1