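First, installation. The tutorials I found online say to install numpy, scipy, and smart_open in that order, and only then gensim; one blogger also says numpy has to be the MKL build.
My machine already had all the required libraries, though, so a plain pip install gensim was enough.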
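I ran into two problems along the way:
① The model training call rejected the "size" argument. For now I simply dropped it. (gensim 4.0 renamed this argument to vector_size, so passing vector_size instead should also work.)
② Importing gensim raised a scipy error: cannot import name '_ccallback_c' from 'scipy._lib'. Repeated uninstalling and reinstalling did not help. What finally worked was uninstalling all three of the libraries above from the E: drive (where Python is installed) while keeping the copies in the virtual environment of the Python project; after that everything ran properly.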
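One way to check whether a stray global install is shadowing the virtual environment's copy, as in problem ②, is to print which interpreter is running and where each package would be loaded from. The sketch below uses only the standard library. The helper name describe_package and the package list are my own assumptions, based on the libraries named above.

```python
import sys
from importlib import metadata, util


def describe_package(name):
    """Return (version, location) for an importable package, or (None, None)."""
    try:
        spec = util.find_spec(name)  # locates the package without importing it
    except (ImportError, ValueError):
        spec = None
    if spec is None:
        return None, None
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = "unknown"  # importable, but no installed distribution metadata
    return version, spec.origin


print("interpreter:", sys.executable)
for pkg in ("numpy", "scipy", "smart_open", "gensim"):
    version, location = describe_package(pkg)
    print(pkg, version, location)
```

If a location points outside the project's virtual environment, that copy is the one Python would import, which is probably the kind of mix-up behind the scipy error.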
Basic usage:
from gensim.models import word2vec

# Raw string so the backslashes in the Windows path are not treated as escapes
sentences = word2vec.LineSentence(r"E:\文档\研二\小论文\医药制造业\医药制造业年报gensim整合.txt")
model = word2vec.Word2Vec(sentences, hs=1, min_count=1, window=3)
model.save('model')  # save the model
model = word2vec.Word2Vec.load('model')  # load the model
# Print the 100 words most similar to "化工企业", one (word, similarity) pair per line
for val in model.wv.similar_by_word("化工企业", topn=100):
    print([val])
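LineSentence expects a text file with one sentence per line and tokens separated by whitespace, so Chinese text has to be segmented into words first. Here is a minimal sketch of writing such a file and training on it. The toy corpus, the helper names, and vector_size=50 are all made up for illustration; vector_size is the gensim 4.x name for the old size argument.

```python
import os
import tempfile


def write_line_sentence_file(tokenized_sentences, path):
    """Write one sentence per line with tokens separated by single spaces."""
    with open(path, "w", encoding="utf-8") as handle:
        for tokens in tokenized_sentences:
            handle.write(" ".join(tokens) + "\n")


def train_and_query(corpus_path, query, topn=5):
    """Train a small model on the corpus file and return the neighbours of `query`."""
    # Imported here so the corpus helper above also works without gensim installed.
    from gensim.models import Word2Vec
    from gensim.models.word2vec import LineSentence

    model = Word2Vec(
        LineSentence(corpus_path),
        vector_size=50,  # gensim 4.x name for the old `size` argument
        window=3,
        min_count=1,
        hs=1,
    )
    return model.wv.most_similar(query, topn=topn)


# Hypothetical toy corpus, already segmented into words.
toy_sentences = [
    ["化工", "企业", "生产", "原料药"],
    ["医药", "企业", "生产", "片剂"],
    ["化工", "企业", "出口", "原料药"],
]
corpus_path = os.path.join(tempfile.mkdtemp(), "toy_corpus.txt")
write_line_sentence_file(toy_sentences, corpus_path)
# With gensim installed: print(train_and_query(corpus_path, "企业"))
```

On a corpus this small the similarity scores mean nothing; the point is only the file format and the argument names.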
This produced the result I was after: the 100 words most similar to "化工企业" (chemical enterprise):
[('具体意见', 0.6623771786689758)]
[('钙胺', 0.6078303456306458)]
[('环丙沙星', 0.5947628021240234)]
[('齐飞', 0.5804009437561035)]
[('或甲氧苄', 0.5790262222290039)]
[('安乃近', 0.5770139098167419)]
[('母仔', 0.5748052597045898)]
[('恒大', 0.5634693503379822)]
[('肌松药', 0.5609520077705383)]
[('地瑞', 0.5579310059547424)]
[('有赖于', 0.5578207969665527)]
[('.%.%.%.%.%', 0.546883761882782)]
[('相互合作', 0.5424500107765198)]
[('新颖', 0.5307881832122803)]
[('西莱美片', 0.530393660068512)]
[('内多', 0.5262453556060791)]
[('工作思路', 0.5248793959617615)]
[('宝贵财富', 0.5232816338539124)]
[('芙朴', 0.5214694738388062)]
[('吴以', 0.5199228525161743)]
[('右佐匹', 0.5179560780525208)]
[('样板工程', 0.5154135227203369)]
[('内外科', 0.5133787393569946)]
[('铬', 0.5131563544273376)]
[('矢志不移', 0.5130568146705627)]
[('明白', 0.5120072364807129)]
[('活酶', 0.5114578008651733)]
[('转折点', 0.5108917355537415)]
[('创收', 0.5102124810218811)]
[('推力', 0.5097854137420654)]
[('以商', 0.5085864067077637)]
[('重报', 0.5077459812164307)]
[('引进技术', 0.5050455927848816)]
[('车间主任', 0.5039330720901489)]
[('百余年', 0.5008108019828796)]
[('肌松', 0.5000325441360474)]
[('立足点', 0.49821069836616516)]
[('装车', 0.4976477026939392)]
[('吡嗪', 0.49611279368400574)]
[('天济嘉鑫', 0.4932640492916107)]
[('证明文件', 0.4918726086616516)]
[('重要文件', 0.4908128082752228)]
[('卡马西平', 0.49078837037086487)]
[('片未', 0.4895298480987549)]
[('发粒', 0.48796600103378296)]
[('肝贝科能', 0.4866732954978943)]
[('进他', 0.4864782691001892)]
[('前三大', 0.4860941469669342)]
[('孕中', 0.48556211590766907)]
[('响水', 0.4852599501609802)]
[('胃肠炎', 0.4847312867641449)]
[('韦仑', 0.48326951265335083)]
[('长期性', 0.48318877816200256)]
[('原名', 0.4831697642803192)]
[('糖衣', 0.48189377784729004)]
[('救人', 0.4817085266113281)]
[('不以', 0.48155736923217773)]
[('招股', 0.48098814487457275)]
[('大禹', 0.48022300004959106)]
[('公楼', 0.4799407422542572)]
[('皮肤科', 0.47966402769088745)]
[('AG', 0.4795415997505188)]
[('脉冲', 0.4790913164615631)]
[('文飞', 0.4769313633441925)]
[('五官科', 0.47676241397857666)]
[('抗艾', 0.47615835070610046)]
[('奥通', 0.4759964942932129)]
[('OneStepOvulationUrineTest', 0.47592708468437195)]
[('妇', 0.474235475063324)]
[('代言人', 0.4742341935634613)]
[('止损', 0.47192856669425964)]
[('硫唑嘌呤', 0.4717518985271454)]
[('交房', 0.47154587507247925)]
[('围着', 0.4698885977268219)]
[('东指', 0.46862363815307617)]
[('版起', 0.46847641468048096)]
[('战略意义', 0.4684240520000458)]
[('壅', 0.46834033727645874)]
[('天伟', 0.4683018922805786)]
[('推介会', 0.4680977463722229)]
[('苯丙氨酸', 0.4680188000202179)]
[('比中', 0.4679011404514313)]
[('天利应', 0.4678548574447632)]
[('司太立', 0.4677242338657379)]
[('附加税', 0.4671434164047241)]
[('天舒片', 0.46707549691200256)]
[('紧紧抓住', 0.46669015288352966)]
[('或服', 0.4663790464401245)]
[('同防', 0.4661034047603607)]
[('比较突出', 0.46559906005859375)]
[('兴医', 0.4654051959514618)]
[('WondfoCocaineUrine', 0.4652007222175598)]
[('出口商', 0.46511998772621155)]
[('昔洛', 0.4649103581905365)]
[('阵列', 0.46476802229881287)]
[('恋康', 0.4641551077365875)]
[('优良传统', 0.46265724301338196)]
[('兴钱', 0.4626224637031555)]
[('尼群地平', 0.4624794125556946)]
[('已登记', 0.4621630012989044)]
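Each line above is a one-element list because the loop prints [val]. If a numbered, rounded listing is easier to read, something like the following sketch could be used instead. format_neighbours is a hypothetical helper of mine, not part of gensim; the sample pairs are the first three from the output above.

```python
def format_neighbours(pairs, digits=4):
    """Turn (word, similarity) pairs into numbered, fixed-precision lines."""
    return [
        f"{rank:>3}. {word}\t{score:.{digits}f}"
        for rank, (word, score) in enumerate(pairs, start=1)
    ]


# First three pairs copied from the output above.
sample = [
    ("具体意见", 0.6623771786689758),
    ("钙胺", 0.6078303456306458),
    ("环丙沙星", 0.5947628021240234),
]
print("\n".join(format_neighbours(sample)))
```

The same function can take the return value of model.wv.similar_by_word directly, since that is already a list of (word, similarity) tuples.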