Preface
Recently, I’ve been working on surveillance-related packages and found that many of the scripts are written in Python. I had heard of its reputation long ago; “Life is short, I use Python” is no joke. With the rise of artificial intelligence, machine learning, and deep learning, most of the AI code out there is written in Python. So in the age of artificial intelligence, it’s time to learn some Python.
**Advancement Guide**
For those without any programming experience, it is recommended to learn the language systematically from the beginning, whether from a book, a video, or a written tutorial.
For those with development experience in another language, it is recommended to start from a small case study, such as crawling a set of images from a website.
Once you have mastered one language, the grammar of another largely carries over; with a little feel for the language you can read eighty or ninety percent of the code.
So experienced developers are not advised to start from scratch with a video or a book; that takes far too long for a first pass at a new language.
Of course, once you go deeper you will still need to study it systematically, but that can come later.
**Software tools**
Python3
The latest version at the time of writing, Python 3.7.1, is used here.
Recommended installation tutorial:
http://www.runoob.com/python3/python3-install.html
Windows download address:
https://www.python.org/downloads/windows
Linux download address:
https://www.python.org/downloads/source
PyCharm
Visualization development tools:
http://www.jetbrains.com/pycharm
Cases
Realization steps
Taking the girl-picture site as an example, the process is actually very simple and breaks down into the following four steps:
- Get the number of pages on the home page, and create a folder for each page number
- Get the album (column) addresses on each page
- Enter each album and get its page count (each album displays its pictures across multiple pages)
- Get the image tags on each album page and download the pictures
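The four steps above can be sketched as a few small parsing functions. Everything site-specific here (the CSS classes and regexes) is an illustrative assumption; the real page markup will differ:

```python
import re

# NOTE: the HTML markers below are assumptions for illustration,
# not the target site's real markup.

def parse_page_count(html):
    # Step 1 (and step 3, per album): largest page number linked on a page
    nums = re.findall(r'class="page-num">(\d+)<', html)
    return max(int(n) for n in nums) if nums else 1

def parse_album_links(html):
    # Step 2: album (column) addresses on one listing page
    return re.findall(r'<a class="album" href="([^"]+)">', html)

def parse_image_urls(html):
    # Step 4: image addresses inside one album page
    return re.findall(r'<img src="([^"]+)"', html)
```

In the real script each of these would be fed the HTML fetched from the site, and the image URLs would then be downloaded into the per-page folders.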
**Note**
During crawling you also need to pay attention to the following points, which may be helpful:
- Importing libraries: much like frameworks or tools in Java, the low-level details are already encapsulated
Installation of third-party libraries
```python
# On Windows, installed directly under Python 3
```
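For example, the requests library used below can be installed with pip (assuming pip ships with your Python 3 install):

```shell
# Install the requests HTTP library for Python 3
pip3 install requests
# Equivalent, and unambiguous when several Pythons coexist:
python3 -m pip install requests
```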
Importing third-party libraries
```python
# Import the requests library
import requests
```
Define method functions: a crawler may run to several hundred lines, so try not to write it all as one big block
```python
def download(page_no, file_path):
    # code logic goes here
```
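A minimal sketch of such a step function (the per-page folder layout is an assumption; the real function would also fetch and save the images):

```python
import os

def download(page_no, file_path):
    # Create one folder per page number; the real script would then
    # fetch every album on that page and save its images here.
    page_dir = os.path.join(file_path, str(page_no))
    os.makedirs(page_dir, exist_ok=True)
    return page_dir
```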
- Define global variables
```python
# Give the request a header that mimics the Chrome browser
```
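Filled in, such a global header might look like this (the User-Agent string below is just one sample Chrome value, an assumption):

```python
# A global request header that mimics the Chrome browser
headers = {
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/70.0.3538.77 Safari/537.36')  # sample Chrome UA
}
```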
- Anti-hotlinking
Some sites protect their images with anti-hotlink checks, but the all-powerful Python has a solution:
```python
headers = {'Referer': href}
img = requests.get(url, headers=headers)
```
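The same idea using only the standard library, for readers without requests installed (the URLs are placeholders): the album page the image was found on is sent as the `Referer` header so the server's anti-hotlink check passes.

```python
from urllib.request import Request

href = "https://example.com/album/1"   # page where the image appears (placeholder)
url = "https://example.com/img/1.jpg"  # the image itself (placeholder)

# Send the album page as Referer so the anti-hotlink check passes
req = Request(url, headers={'Referer': href})
# urlopen(req) would then fetch the image bytes
```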
- Switching Versions
The Linux box is an Alibaba Cloud server, which ships with Python 2 by default; Python 3 has to be installed manually.
```
[root@AY140216131049Z mzitu]# python2 -V
Python 2.7.5
```
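After the manual install you can check which interpreter each command maps to (the exact version strings will differ per machine):

```shell
# Check which interpreter each command runs (python2 may be absent on newer systems)
command -v python2 >/dev/null && python2 -V 2>&1 || echo "python2 not installed"
python3 -V
```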
- Exception capture
Exception pages may turn up during crawling; we catch the exception here so it does not affect subsequent operations.
```python
try:
    ...  # business logic
except Exception as e:
    print(e)
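Put the try/except inside the crawl loop so one bad page is logged and skipped rather than stopping the whole run. A self-contained sketch (`fetch_one` is a hypothetical stand-in for the real page download):

```python
def fetch_one(url):
    # stand-in for the real page download; raises on a bad URL
    if not url.startswith("http"):
        raise ValueError(f"bad url: {url}")
    return url

def fetch_all(urls):
    """Catch per-page exceptions so one bad page does not stop the crawl."""
    results = []
    for url in urls:
        try:
            results.append(fetch_one(url))
        except Exception as e:
            print(e)  # log and continue with the next page
    return results
```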
Code implementation
Edit script: vi mzitu.py
```python
# coding=utf-8
```
Run the script on the Linux server with the following command:
```shell
python3 mzitu.py
```
So far only one column of photo sets has been crawled: 17 GB in total, 5332 pictures.
```
[root@itstyle mzitu]# du -sh .
17G	.
```
Now, keep your eyes peeled for the photo-set moment.
**Summary**
As a beginner's script it surely has some problems or room for optimization; if any Python veterans come across it, guidance is very welcome.
In fact, the script is very simple. From configuring the environment and installing the IDE to writing the script and getting it to run smoothly took about four or five hours, and in the end it executed without a hitch. Limited by the server's bandwidth and configuration, the 17 GB of images took almost three or four hours to download; as for the remaining 83 GB, download them yourselves.
—END—