
Weibo Data Processing: Processing the Zombie User Dataset (Part 2)

For how the zombie user dataset was obtained, see the previous post: https://blog.csdn.net/weixin_43906500/article/details/116447858

1. The Dataset

The zombie user dataset looks like this:

[Figure: sample of the zombie user dataset]
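The processing script reads the keys below from every record, so each entry in the JSON files is assumed to be an object containing at least these fields. A minimal sketch that checks a record before processing it; `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the example record uses invented placeholder values, not real data from the dataset.

```python
# Keys read by the processing script (the JSON files are assumed to hold
# a list of objects that contain at least these fields)
REQUIRED_KEYS = ("uid", "follow_num", "fun_num", "post_num", "brief",
                 "like_picture_num", "like_music_num", "like_movie_num",
                 "like_post_num", "mark")


def missing_keys(record):
    """Return the required keys absent from one user record (hypothetical helper)."""
    return [key for key in REQUIRED_KEYS if key not in record]


# Placeholder record with invented values, for illustration only
example = {"uid": "0000000000", "follow_num": 10, "fun_num": 2, "post_num": 0,
           "brief": "", "like_picture_num": 0, "like_music_num": 0,
           "like_movie_num": 0, "like_post_num": 0, "mark": 1}

print(missing_keys(example))  # → []
```

Running this over every record before writing the CSV surfaces malformed entries as a readable list instead of a `KeyError` halfway through the file.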

2. Processing the Dataset

Next, write a script that extracts the features needed for training and saves them in CSV format.

The code is as follows:

import csv
import json

# Labeled user records produced in the previous post
input_files = [
    "../users_data/dic_marked_5737286648.json",
    "../users_data/dic_marked_6878691599.json",
]

# Merge the records from both JSON files into one list
# (utf-8 is assumed; change it if the files were saved in another encoding)
data = []
for path in input_files:
    with open(path, "r", encoding="utf-8") as f:
        data.extend(json.load(f))

# One column name per cell: "fun_num" and "post_num" must be separate entries
columns = ["uid", "follow_num", "fun_num", "post_num", "is_brief",
           "like_picture_num", "like_music_num", "like_movie_num",
           "like_post_num", "mark"]

with open("csv_test.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(columns)
    for item in data:
        # is_brief: 1 if the user has a non-empty profile bio, otherwise 0
        is_brief = 0 if item["brief"] == "" else 1
        writer.writerow([item["uid"], item["follow_num"], item["fun_num"],
                         item["post_num"], is_brief,
                         item["like_picture_num"], item["like_music_num"],
                         item["like_movie_num"], item["like_post_num"],
                         item["mark"]])
        print(item)  # progress output: echo each processed record
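The per-record logic can also be pulled into small functions so that the header and every data row are built from the same list of column names. When the header is typed out by hand it can drift from the rows; for example, writing `"fun_num,post_num"` as a single string yields 9 header cells for 10 data cells. A minimal sketch; `COLUMNS`, `record_to_row` and `write_csv` are hypothetical names, and an in-memory `io.StringIO` buffer stands in for `csv_test.csv`.

```python
import csv
import io

# Column order shared by the header and every data row
COLUMNS = ["uid", "follow_num", "fun_num", "post_num", "is_brief",
           "like_picture_num", "like_music_num", "like_movie_num",
           "like_post_num", "mark"]


def record_to_row(item):
    """Convert one user record into a CSV row in COLUMNS order."""
    # is_brief: 1 if the user has a non-empty profile bio, otherwise 0
    derived = dict(item, is_brief=0 if item["brief"] == "" else 1)
    return [derived[name] for name in COLUMNS]


def write_csv(records, fileobj):
    """Write the header plus one row per record to an open text file."""
    writer = csv.writer(fileobj)
    writer.writerow(COLUMNS)
    for item in records:
        writer.writerow(record_to_row(item))


# Usage with an in-memory buffer and an invented placeholder record
buf = io.StringIO()
write_csv([{"uid": "0000000000", "follow_num": 10, "fun_num": 2,
            "post_num": 0, "brief": "", "like_picture_num": 0,
            "like_music_num": 0, "like_movie_num": 0,
            "like_post_num": 0, "mark": 1}], buf)
print(buf.getvalue())
```

For the real file, pass `open("csv_test.csv", "w", newline="")` instead of the buffer; `newline=""` stops the `csv` module's `\r\n` line endings from being doubled on Windows.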

3. Results

The processed result looks like this:

[Figure: contents of the generated csv_test.csv]
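To sanity-check the generated file, it can be read back with `csv.reader`, confirming that every row has as many cells as the header and counting how many rows carry each `mark` value. A sketch assuming the file has a `mark` column as written by the script; `check_csv` is a hypothetical helper, and the usage runs on a small invented in-memory sample rather than the real `csv_test.csv`.

```python
import csv
import io
from collections import Counter


def check_csv(fileobj):
    """Return (row_count, label_counts); raise ValueError on a row/header length mismatch."""
    reader = csv.reader(fileobj)
    header = next(reader)
    mark_idx = header.index("mark")  # raises ValueError if there is no mark column
    labels = Counter()
    count = 0
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: {len(row)} cells, header has {len(header)}")
        labels[row[mark_idx]] += 1
        count += 1
    return count, labels


# Usage on a small invented sample; for the real file use
# open("csv_test.csv", newline="") instead of io.StringIO
sample = io.StringIO("uid,mark\r\n1,0\r\n2,1\r\n3,1\r\n")
print(check_csv(sample))
```

Values read back through `csv.reader` are strings, so the label counts are keyed by `"0"`, `"1"` and so on rather than integers.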
