
Generating Word Clouds of Renrendai Loan Reasons with Python


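Contents

1. Introduction
2. About the Code
3. Some Word Clouds
3.1 Filter: none
3.2 Filter: gender = male
3.3 Filter: gender = female
3.4 Filter: education level = postgraduate and above
3.5 Filter: education level = bachelor's degree
3.6 Filter: native place = Fujian
3.7 Filter: native place = Guangdong
3.8 Filter: loan reason contains "苹果" (Apple)
4. Code
4.1 Importing libraries
4.2 Loading the data
4.3 Setting the stop words
4.4 Generating the word cloud
5. Closing Thoughts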


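1. Introduction

Earlier posts on Renrendai (all on the 小zhan柯基 CSDN blog):
"A Renrendai Loan-Listing Crawler Example"
"Renrendai Loan-Listing Crawler, Advanced: Using Async IO"
"Processing 280,000 Renrendai Records with Python: A Detailed Look at the Borrower Profile"

The previous Renrendai post suggested three follow-ups: first, mine the data further, for example by analyzing the distribution of education levels within each age group; second, use the Renrendai data to train a neural-network credit-scoring model; third, generate word clouds from the loan-reason column.

I have been busy with research on blockchain and supply chain finance lately, so this time I am picking the low-hanging fruit: a word cloud is quick to generate.

Finally, if you need the Renrendai loan data, send me a private message!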

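2. About the Code

I won't go over how to generate a word cloud here; tutorials are easy to find online, for example:
"Making Cool Word Clouds with Python (Including Stop Words and Word-Frequency Statistics)" (gjgfjgy's blog, CSDN)
"EDG Wins the Championship: A Python Analysis of the Fan Reaction" (数据分析与统计学之美, CSDN)

One thing worth mentioning is a fairly common pandas idiom: filtering the rows (or columns) that contain a given keyword.

The data (shown in the figure above in the original post) contains 284,316 loan-reason records. How would I pick out the records whose loan reason contains the word "苹果" (Apple)?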

conciseData[conciseData["借款理由"].str.contains("苹果",na=False)]["借款理由"]

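As the figure above (in the original post) shows, only 646 records mention borrowing to buy an Apple phone, about 0.23% of the total, so it seems not many people borrow to buy an iPhone, haha.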
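To see what the `str.contains` filter does, here is a minimal sketch on a made-up five-row table; the sample rows and the name `toy` are illustrative assumptions, not the Renrendai data. `na=False` treats missing loan reasons as non-matches, so the mask contains only True/False and can be used for indexing. By default `str.contains` interprets the pattern as a regular expression; `regex=False` asks for a plain substring match.

```python
import pandas as pd

# Made-up sample rows (assumption: not taken from the real data set)
toy = pd.DataFrame({"借款理由": ["买苹果手机", "装修房子", None, "资金周转", "想换苹果电脑"]})

# Boolean mask: True where the loan reason contains the keyword.
# na=False turns missing values into False instead of NaN.
mask = toy["借款理由"].str.contains("苹果", na=False, regex=False)

matches = toy[mask]["借款理由"]
share = len(matches) / len(toy) * 100  # percentage of matching rows

print(matches.tolist())
print(f"{len(matches)} of {len(toy)} rows ({share:.2f}%)")

# The same arithmetic with the counts quoted in the post
post_share = f"{646 / 284316:.2%}"
print(post_share)
```

The last line only re-derives the percentage from the two counts given in the post.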

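3. Some Word Clouds

3.1 Filter: none

3.2 Filter: gender = male

3.3 Filter: gender = female

3.4 Filter: education level = postgraduate and above

3.5 Filter: education level = bachelor's degree

3.6 Filter: native place = Fujian

3.7 Filter: native place = Guangdong

3.8 Filter: loan reason contains "苹果" (Apple)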
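Each filter condition above corresponds to a boolean mask on one column. A rough sketch on made-up rows follows; the sample values, including category labels such as "研究生及以上", and the dictionary keys are assumptions for illustration and may not match the labels in the real data.

```python
import pandas as pd

# Made-up rows (assumption: not the real Renrendai data)
toy = pd.DataFrame({
    "性别": ["男", "女", "男", "女"],
    "教育程度": ["本科", "研究生及以上", "大专", "本科"],
    "籍贯": ["福建", "广东", "广东", "福建"],
    "借款理由": ["资金周转", "买苹果手机", "装修", None],
})

# One boolean mask per filter condition (keys and labels are hypothetical)
filters = {
    "all": pd.Series(True, index=toy.index),
    "male": toy["性别"] == "男",
    "bachelor": toy["教育程度"] == "本科",
    "guangdong": toy["籍贯"] == "广东",
    "apple": toy["借款理由"].str.contains("苹果", na=False, regex=False),
}

for name, mask in filters.items():
    # Drop missing reasons before building the text for a word cloud
    reasons = toy.loc[mask, "借款理由"].dropna().tolist()
    print(name, reasons)

# Masks combine with & (and) and | (or)
male_guangdong = toy[filters["male"] & filters["guangdong"]]
```

Swapping the mask is then the only change needed to produce a different cloud.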

4. Code

4.1 Importing libraries

import matplotlib.pyplot as plt
import pandas as pd

import jieba
from wordcloud import WordCloud

# Use the SimHei font so matplotlib can display Chinese text
plt.rcParams['font.sans-serif'] = ['SimHei']
# Make the minus sign render correctly with that font
plt.rcParams['axes.unicode_minus'] = False

4.2 Loading the data

# all.csv has no header row, so the column names are assigned by hand
data = pd.read_csv("all.csv", encoding="gbk", header=None, parse_dates=True)
data.columns = ["id","借款时间(月)","剩余还款时间(月)","借款金额","notPayInterest","productRepayType",
                "贷款类型","利率","性别","籍贯","出生日期","教育程度","工作单位","行业","公司规模","职位","收入",
                "车贷","汽车数量","婚姻状况","房贷","房子数量","信用等级","none1","none2","none3","借款理由"]

# Keep only the columns used in the analysis
conciseData = data[["id","借款时间(月)","剩余还款时间(月)","借款金额","贷款类型","利率","性别","籍贯","出生日期","教育程度","工作单位","行业","公司规模","职位","收入",
                    "车贷","汽车数量","婚姻状况","房贷","房子数量","信用等级","借款理由"]]
conciseData = conciseData.set_index("id")
# Drop rows in which every remaining field is missing
conciseData = conciseData.dropna(how="all")
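The loading pattern (no header row, names assigned afterwards, `id` as index, fully empty rows dropped) can be tried on an in-memory CSV. This is a small sketch under assumptions: the three-column text below is made up and stands in for `all.csv`, to which the post assigns 27 column names, and `encoding="gbk"` is omitted because the text is already a Python string.

```python
import io

import pandas as pd

# Made-up stand-in for all.csv: three columns, last line entirely empty
raw = "1,男,资金周转\n2,女,\n,,\n"

toy = pd.read_csv(io.StringIO(raw), header=None)
toy.columns = ["id", "性别", "借款理由"]

toy = toy.set_index("id")
# how="all" drops a row only if every column is missing,
# so row 2 (missing reason only) survives
toy = toy.dropna(how="all")

print(toy)
```

With the default `how="any"`, row 2 would be dropped as well, which is why the `how="all"` argument matters here.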

4.3 Setting the stop words

# A set makes the "not in stopWords" test fast; duplicate entries are harmless
stopWords = {
    "人人","真实有效","同时","符合","借款人","提供","上述","考察","实地",
    "已经","希望","大家","认证","审核","此次","公司","众信","借款","谢谢","比较","第一次","压力",
    "贷","的","标准","方友","业","还款","收入","用于","信息","以上","问题","好","一下","通过",
    "稳定","全国","企业","位于","该","为","自己","现居","工作","单位","但","高","一些","还清",
    "行业","主要","从事","有","无","良好","贷款","累计","自","放心","家里","吱吱","为了","放款",
    "多","在","年","所","抵押","无担保","服务","本人","多多","小额贷款","想","与","借","给","建立",
    "支持","至今","安信","良好","最","多","探索","大","小","证大速贷","成立","于","信用","成立",
    "每月","流水","一家","因为","我","和","是","做","所以","迅速","以来","需",
    "快速","简便","可以","专门","资料","经","了","也","现在","由于",
    "测试","需要","元","也","还","个","月","人","申请","等",
    "能","了","及","没有","现在","就","进行","都","各位","急急",
    "每个","准备","有限公司","目前","保证","按时","因","可","持续","一个",
    "上","到","万","要","现","来","想","个人","左右","不","年底","能力",
}
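One pitfall with long hand-written word lists: Python concatenates adjacent string literals, so a missing comma at the end of a line silently merges two entries into one, and neither word is filtered. The short sketch below illustrates this; the token list is a made-up stand-in for jieba's output.

```python
# Missing comma: "建立" "支持" becomes the single entry "建立支持"
broken = {"建立" "支持", "至今"}
fixed = {"建立", "支持", "至今"}

# Hypothetical, already-segmented tokens standing in for jieba's output
tokens = ["建立", "信用", "支持", "装修", "至今"]

kept_broken = [t for t in tokens if t not in broken]
kept_fixed = [t for t in tokens if t not in fixed]

print(len(broken), kept_broken)  # "建立" and "支持" slip through
print(len(fixed), kept_fixed)
```

Comparing the number of entries with the number expected is a quick way to spot such a merge.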

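4.4 Generating the word cloud

Because 280,000-odd records are too many to process at once, the data is sliced with a step of 3 here, so only every third record is used.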
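As a quick illustration of slicing with a step of 3, here is a sketch on a plain list; pandas Series accept the same `[::3]` syntax. The ten-element list is a stand-in, and the final count applies the step to the full record count from the post, whereas the code below slices after filtering by gender, so it keeps fewer rows.

```python
records = list(range(10))   # stand-in for ten rows

sampled = records[::3]      # index 0, then every third element
print(sampled)

# Slicing n items with step 3 keeps ceil(n / 3) of them
n = 284316                  # total record count given in the post
kept = len(range(0, n, 3))
print(kept)
```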

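# Loan reasons of male borrowers, taking every third row
reasons = conciseData[conciseData["性别"] == "男"]["借款理由"][::3]

# Skip missing values (NaN is a float, not a str) and join into one text
txt = "  ".join(each for each in reasons if isinstance(each, str))

words = jieba.cut(txt)  # word segmentation

# Drop the stop words and separate the remaining words with spaces
result = " ".join(each for each in words if each not in stopWords)

wordshow = WordCloud(background_color='black',
                     width=800,
                     height=800,
                     max_words=800,
                     max_font_size=100,
                     font_path="msyh.ttc",  # a font that contains Chinese glyphs
                     ).generate(result)

wordshow.to_file('男.png')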
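To check which words dominate a cloud, it can help to count frequencies directly. A minimal sketch, assuming the text has already been segmented and filtered into a space-separated string like `result` above; the sample string and the name `sample_result` are made up.

```python
from collections import Counter

# Made-up stand-in for the space-separated `result` string
sample_result = "装修 资金周转 装修 买车 资金周转 装修"

# split() with no argument also swallows repeated spaces
freq = Counter(sample_result.split())

# Most frequent words first
for word, count in freq.most_common(2):
    print(word, count)
```

`WordCloud` also provides `generate_from_frequencies`, which accepts a word-to-count mapping such as `freq` in place of raw text.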

5. Closing Thoughts

All living beings suffer, and you are not the only one; letting go is freedom.

Added: 2021-11-17 12:45:44  Updated: 2021-11-17 12:46:05
 