WeChat-Vote-Result-Walker

A Python crawler that scrapes real-time data from the voting plugin built into WeChat official-account articles.

#Before Use

Download and install PhantomJS, then start it with:

$ phantomjs phantomjs_fetcher.js [port]

BTW, if you are on macOS, you can simply unzip phantomjs-2.1.1-macosx.zip in the current directory; the executable is in the bin sub-directory.

Install the Python dependencies:

tornado:

$ pip install tornado

pycurl:

$ pip install pycurl

I really recommend using virtualenv to install the dependencies: easy, effective and clean!

#How to use

First, run PhantomJS:

$ phantomjs phantomjs_fetcher.js [port]

You can use a port like '12306'. When you run this command in a terminal, you may see a dialog asking for permission to connect to the internet. Just allow it.

Then change the port in test.py:

On line 11 of test.py, update the proxy URL so that it uses the port you set in step 1.

Finally, run the main Python file, test.py, in another terminal window:

$ python test.py

All Done!
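
For step 2, the sketch below shows one way to derive the proxy URL from the port and to check that something is listening on that port before running test.py. The helper names `build_proxy_url` and `is_listening`, and the use of `localhost`, are assumptions for illustration; the actual variable on line 11 of test.py may look different.

```python
import socket


def build_proxy_url(port, host="localhost"):
    """Return the URL of the local PhantomJS fetcher for the given port."""
    return "http://%s:%d" % (host, port)


def is_listening(port, host="localhost", timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timeout, unreachable host, and so on.
        return False


if __name__ == "__main__":
    port = 12306  # the example port from step 1
    print(build_proxy_url(port))
    print("PhantomJS reachable:", is_listening(port))
```

If the second line prints False, PhantomJS is probably not running yet or was started on a different port.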

#Docs

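What need does this address? WeChat official-account articles now come with a built-in voting plugin, but it has some problems. For example, when someone stuffs the ballot, you cannot tell who did it, because the admin backend only shows some vote statistics and no growth curve for the votes. Even for obvious manipulation, such as a surge of 1000 votes within 10 minutes, you have no real way to judge it: either you monitor the numbers all the time and take screenshots to track how they grow, or you just sit there helplessly.

So this repo tackles real-time crawling of the data behind the vote plugin built into WeChat official-account posts. Mainly, though, I want to record the reasoning that went into solving the problem.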

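1. First, decide what data we need to crawl. I'll use the article "投票 | GamTalk演讲大赛作品公示欢迎投票！" as the example.

It turns out that the URL http://mp.weixin.qq.com/s?__biz=MzIwMjcxMDkzNQ==&mid=2247484332&idx=1&sn=ee27b7ae03c9d881418615c96069c3fe&chksm=96dbc954a1ac40421777f00571982fb7b9ceab7c0a2fab6b9e470e9ef423ae33aa39702d196b&mpshare=1&scene=1&srcid=0312k3JRDNp5tZYLH3ZtbVGQ#rd

can be opened directly in Chrome. However, because we are not inside the WeChat browser, voting is disabled.

But don't panic: it looks as if we already have the vote data and WeChat has merely hidden it. Right-click, choose Inspect, and view the source.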

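Awkwardly, though, the responses under the Network tab show that the URL above does not actually return the data. Sure enough, WeChat loads the voting plugin asynchronously and renders it later with JS.

So I looked through the long, time-consuming stretch of later requests and easily found the plugin's real URL.

The real URL of the voting plugin is therefore http://mp.weixin.qq.com/mp/newappmsgvote?action=show&__biz=MzIwMjcxMDkzNQ==&supervoteid=444598550&uin=&key=&pass_ticket=&wxtoken=&mid=2247484332&idx=1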
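
As a rough illustration of how the plugin URL relates to the article URL, the sketch below copies `__biz`, `mid` and `idx` from the article URL and combines them with a `supervoteid`. The function name `build_vote_url` is hypothetical, the article URL is trimmed to a few of its parameters, and the `supervoteid` is assumed to have been read from the Network tab, since it does not appear in the article URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

VOTE_ENDPOINT = "http://mp.weixin.qq.com/mp/newappmsgvote"

# Trimmed version of the article URL used in this walkthrough.
EXAMPLE_ARTICLE_URL = (
    "http://mp.weixin.qq.com/s?__biz=MzIwMjcxMDkzNQ==&mid=2247484332"
    "&idx=1&sn=ee27b7ae03c9d881418615c96069c3fe#rd"
)


def build_vote_url(article_url, supervoteid):
    """Build the vote-plugin URL from an article URL and a supervoteid."""
    query = parse_qs(urlsplit(article_url).query)
    params = [
        ("action", "show"),
        ("__biz", query["__biz"][0]),
        ("supervoteid", str(supervoteid)),
        # These stay empty when browsing outside WeChat.
        ("uin", ""),
        ("key", ""),
        ("pass_ticket", ""),
        ("wxtoken", ""),
        ("mid", query["mid"][0]),
        ("idx", query["idx"][0]),
    ]
    # safe="=" keeps the base64 padding in __biz unescaped.
    return VOTE_ENDPOINT + "?" + urlencode(params, safe="=")


if __name__ == "__main__":
    print(build_vote_url(EXAMPLE_ARTICLE_URL, 444598550))
```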

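But then another problem appears: although this URL is correct, its response is not data either, just another big pile of JS. Yet when you open this URL directly in a browser, the voting plugin does render.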

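So the content is rendered by JS. I was lazy and didn't feel like reading the JS code, so I went straight for a package that can render it to HTML (although this laziness later turned out to cause plenty of trouble).

Rendering JS means a plain urllib GET of the response is no longer enough. Then I discovered PhantomJS, and it just so happened that someone had combined tornado and PhantomJS into a small module that fetches pages directly, which made things much easier.

I followed the article "Python利用Phantomjs抓取渲染JS后的网页" and its repo PhantomjsFetcher, and had the code written in minutes: that is test.py.

On top of the original author's demo, I added a timer that calls my crawl function in a loop and stores each result in time order (currently about one call every 10 s), and I changed the UA as a crude disguise. Finally it was done.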
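
The polling idea can be sketched roughly as follows. This is not the code in test.py: it uses a plain loop instead of a tornado timer, the `fetch` callable stands in for whatever actually talks to PhantomJS, and the User-Agent string is only a placeholder. The file-name layout is modeled on snapshot names such as vote_2017_03_12__11_17_25.html in the repo.

```python
import time
from datetime import datetime

# Placeholder UA; any browser-like string could be substituted here.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def snapshot_filename(moment):
    """Name a snapshot after the time it was taken."""
    return moment.strftime("vote_%Y_%m_%d__%H_%M_%S.html")


def poll(fetch, url, interval=10.0, rounds=3):
    """Call fetch(url, headers) `rounds` times and save each result.

    `fetch` is a hypothetical callable returning the rendered HTML as str.
    Returns the list of file names that were written.
    """
    headers = {"User-Agent": USER_AGENT}
    saved = []
    for i in range(rounds):
        html = fetch(url, headers)
        name = snapshot_filename(datetime.now())
        with open(name, "w", encoding="utf-8") as handle:
            handle.write(html)
        saved.append(name)
        if i + 1 < rounds:
            time.sleep(interval)  # wait before the next snapshot
    return saved
```

Passing the fetcher in as an argument keeps the loop independent of PhantomJS, so it can be tried with a stub such as `lambda url, headers: "<html></html>"`.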

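I thought the job was done. Nope, what a trap!

Looking carefully again at the real URL we grabbed, I suddenly noticed: 1. WeChat, didn't you remove the requirement to follow the account before voting? Why is the check still there (we want to gain followers!)? 2. This voteInfo... contains all of the vote result data...

What a trap. If only I had read the code carefully from the start!

Summary: 1. The biggest gain was finding a Python wheel that renders JS automatically. 2. Not reading the code really bites 😢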
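
Since the raw response already carries a voteInfo object, one could in principle read it without rendering any JS. The sketch below is only a guess at how that might look: it assumes the page contains an assignment of the form `voteInfo = {...}` whose right-hand side is valid JSON, and the sample string and its field names are made up for the demonstration.

```python
import json
import re


def extract_vote_info(page_source):
    """Return the object assigned to voteInfo, or None if it is not found.

    Assumes the right-hand side of the assignment is valid JSON.
    """
    match = re.search(r"voteInfo\s*=\s*", page_source)
    if match is None:
        return None
    try:
        # raw_decode parses one JSON value starting at the given offset
        # and ignores whatever follows it (for example the trailing ';').
        value, _end = json.JSONDecoder().raw_decode(page_source, match.end())
    except ValueError:
        return None
    return value


if __name__ == "__main__":
    # Made-up sample; the real field names are not documented here.
    sample = '<script>var voteInfo = {"options": [{"name": "A", "cnt": 3}]};</script>'
    print(extract_vote_info(sample))
```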

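Update

I added plotting code that parses all the collected HTML files into a dict and stores it in result.txt in the current directory, then plots the data with matplotlib. The numpy import is unused and can be removed.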
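
A minimal sketch of the collection step, under stated assumptions: it is not the code in fetch_data.py, the `parse_counts` callable that turns one HTML snapshot into vote counts is hypothetical and left to the caller, and the result is written as JSON, which may differ from the format the real script uses for result.txt.

```python
import glob
import json
import os
from datetime import datetime


def timestamp_from_name(path):
    """Recover the snapshot time from a name like vote_2017_03_12__11_17_25.html."""
    name = os.path.basename(path)
    return datetime.strptime(name, "vote_%Y_%m_%d__%H_%M_%S.html")


def collect_results(directory, parse_counts):
    """Map each snapshot's ISO timestamp to the counts parsed from it.

    Files matching vote_*.html whose names do not follow the timestamp
    layout will raise ValueError.
    """
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, "vote_*.html"))):
        with open(path, encoding="utf-8") as handle:
            counts = parse_counts(handle.read())
        results[timestamp_from_name(path).isoformat()] = counts
    return results


def save_results(results, path="result.txt"):
    """Write the collected dict to disk as JSON, keeping Chinese text readable."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle, ensure_ascii=False, indent=2)
```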

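Install the dependency:

$ pip install matplotlib

Run:

$ python fetch_data.py

One pitfall is that matplotlib handles Chinese poorly, so all strings are set to unicode. Text still does not show up in images exported by matplotlib, apparently because no font is set, so the final result is shown as a screenshot instead.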
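
One common way around the missing text is to hand matplotlib an explicit font file. The sketch below assumes a CJK-capable font such as SimHei.ttf sits in the current directory and that the data has the shape {timestamp: {option: count}}; both the data shape and the function names are assumptions, not what fetch_data.py does.

```python
def to_series(results):
    """Turn {timestamp: {option: count}} into (timestamps, {option: [counts]})."""
    times = sorted(results)
    options = sorted({name for counts in results.values() for name in counts})
    series = {name: [results[t].get(name, 0) for t in times] for name in options}
    return times, series


def plot_series(times, series, font_path="SimHei.ttf", out_path="vote.png"):
    """Plot one line per option and label it with the given font file."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from matplotlib.font_manager import FontProperties

    font = FontProperties(fname=font_path)
    fig, ax = plt.subplots()
    x = list(range(len(times)))
    for name, counts in series.items():
        ax.plot(x, counts, label=name)
    ax.set_xticks(x)
    ax.set_xticklabels(times, rotation=45, ha="right")
    ax.set_ylabel("votes")
    ax.legend(prop=font)  # option names may be Chinese
    fig.tight_layout()
    fig.savefig(out_path)
    plt.close(fig)
```

matplotlib is imported inside `plot_series` so that `to_series` can be used and checked without it installed.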

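Files:

fetch_data.py : processes files in the current directory whose names match "vote_*.html", extracts the vote information, and plots it

FileUtl.py : a utility class that scans and lists all files and folders in the current directory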

#Thanks to

License

LICENSE.MIT

MollyBa/WeChat-Vote-Result-Walker
