A Comparison of Chinese Word Segmentation Tools

A comparison of five Chinese word segmentation tools: jieba, SnowNLP, thulac (from the Natural Language Processing and Social Humanities Computing Lab at Tsinghua University), StanfordCoreNLP, and pyltp (HIT's LTP Language Cloud). The test environment is Windows 10 with Anaconda (Python 3.7).

Installation

- jieba: pip install jieba
- SnowNLP: pip install snownlp
- thulac: pip install thulac
- StanfordCoreNLP: pip install stanfordcorenlp, then download and unpack CoreNLP, and download and unpack the Chinese model package into the CoreNLP folder
- pyltp: pip install pyltp failed with a "C++ 14 missing" error; compiling it by hand failed, and switching to CentOS did not help either. Installation was troublesome enough that I eventually gave up on pyltp.

Running

The test sentence is a = 'Jimmy你怎么看' ("Jimmy, what do you think?").

    a = 'Jimmy你怎么看'

    # jieba: segmentation with part-of-speech tagging
    import jieba.posseg as pseg
    ws = pseg.cut(a)
    for i in ws:
        print(i)

    # thulac: the default model segments and tags parts of speech
    import thulac
    thu1 = thulac.thulac()
    text = thu1.cut(a)
    print(text)

    from stanfordcorenlp import StanfordCoreNLP
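The archived excerpt breaks off here, mid-way through the StanfordCoreNLP example. Based on the stanfordcorenlp package's documented API, it presumably continues along these lines; the CoreNLP folder path below is a placeholder for wherever the unpacked distribution lives, and this is a sketch rather than the original post's code:

    # Sketch of the likely continuation (not the original post's code),
    # reusing the test sentence a defined above.
    # The path is a placeholder; lang='zh' selects the Chinese models
    # unpacked into the CoreNLP folder during installation.
    nlp = StanfordCoreNLP(r'./stanford-corenlp-full', lang='zh')
    print(nlp.word_tokenize(a))  # segmented words
    print(nlp.pos_tag(a))        # (word, part-of-speech) pairs
    nlp.close()                  # shut down the background Java server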
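SnowNLP is installed above but its usage does not appear in the excerpt. For completeness, a minimal sketch using SnowNLP's standard API on the same test sentence:

    # SnowNLP: minimal sketch (not from the original excerpt),
    # reusing the test sentence a defined above.
    from snownlp import SnowNLP
    s = SnowNLP(a)
    print(s.words)       # segmented words
    print(list(s.tags))  # (word, part-of-speech) pairs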