Using requests

requests

GET requests

import requests

r = requests.get('http://www.baidu.com')
print(r.status_code)
print(r.text)

The code above sends a GET request and prints the status code and the response body.

Building the request URL: the params argument

import requests

data = {
    'name': 'germey',
    'age': '22'
}

r = requests.get('http://httpbin.org/get', params=data)
print(r.text)

# Output
{
  "args": {
    "age": "22", 
    "name": "germey"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "origin": "171.9.47.36", 
  "url": "http://httpbin.org/get?name=germey&age=22"
}
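
To see how params ends up in the URL without sending anything, the request can be built and prepared locally. This is a minimal sketch that reuses the same data dict as above; it relies on requests.Request and its prepare() method rather than on requests.get.

```python
import requests

data = {'name': 'germey', 'age': '22'}

# Build the request without sending it, then inspect the final URL.
prepared = requests.Request('GET', 'http://httpbin.org/get', params=data).prepare()
print(prepared.url)
```

The printed URL should match the "url" field in the output above.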

r.text returns the response body as a str. Here the body is JSON, so call the json() method to get a dict instead.

r = r.json()
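
The difference between the str and the dict can be sketched offline with the standard json module. The body string below is a shortened copy of the output shown earlier, not a live response; for a valid JSON body, r.json() should give the same result as json.loads(r.text).

```python
import json

# Shortened copy of the JSON body returned by http://httpbin.org/get above.
body = '{"args": {"age": "22", "name": "germey"}, "url": "http://httpbin.org/get?name=germey&age=22"}'

parsed = json.loads(body)  # str -> dict, which is what r.json() does with the body

print(type(body).__name__, type(parsed).__name__)
print(parsed['args']['name'])
```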

Fetching binary data

r.content  # the response body as bytes
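
A typical use of r.content is saving an image or other file to disk. The helper below is a sketch under assumptions: the name download, the 10-second timeout and the example URL in the comment are illustrative and do not come from the original notes.

```python
import requests


def download(url, path, timeout=10):
    """Fetch url and write the raw response bytes to path; return the byte count."""
    r = requests.get(url, timeout=timeout)
    r.raise_for_status()  # raise an HTTPError on 4xx/5xx instead of saving an error page
    with open(path, 'wb') as f:  # 'wb' because r.content is bytes, not str
        f.write(r.content)
    return len(r.content)


# Example call (needs network access; the URL is only an illustration):
# download('http://httpbin.org/image/png', 'image.png')
```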

Adding headers: the headers argument

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; rv:61.0) Gecko/20100101 Firefox/61.0'
}

r = requests.get('https://www.zhihu.com/explore', headers=headers)

print(r.text)

POST requests

import requests

data = {'name': 'gtf', 'age': 12}
r = requests.post('http://httpbin.org/post', data)
print(r.text)

# Output
{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "age": "12", 
    "name": "gtf"
  }, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "15", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "171.9.47.36", 
  "url": "http://httpbin.org/post"
}
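
The "form" and "json" fields in the output reflect how the body was encoded. As a sketch, preparing the request locally shows what data= produces. The json= variant is added here for comparison and is not used in the original notes.

```python
import json

import requests

data = {'name': 'gtf', 'age': 12}
url = 'http://httpbin.org/post'

# data= sends a URL-encoded form body.
form_req = requests.Request('POST', url, data=data).prepare()
print(form_req.headers['Content-Type'], form_req.body)

# json= serializes the dict to a JSON body instead.
json_req = requests.Request('POST', url, json=data).prepare()
print(json_req.headers['Content-Type'], json.loads(json_req.body))
```

The form body is 15 characters long, which agrees with the Content-Length header in the output above.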

Response

import requests

data = {'name': 'gtf', 'age': 12}
r = requests.post('http://httpbin.org/post', data)
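print(r.status_code)  # status code
print(r.headers)  # response headers
print(r.headers.get('flag'))  # value of the 'flag' response header, or None if it is absent
print(r.cookies)  # cookies
print(r.url)  # final URL
print(r.history)  # redirect history
print(r.encoding)  # encoding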
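
Two details of the response object are easy to check offline. r.headers is a case-insensitive dictionary, and requests.codes provides named status codes to compare r.status_code against. The header values below are made up for illustration.

```python
import requests
from requests.structures import CaseInsensitiveDict

# r.headers is a CaseInsensitiveDict; this one is built by hand for the demo.
headers = CaseInsensitiveDict({'Content-Type': 'application/json'})

content_type = headers['content-type']  # lookup ignores case
flag = headers.get('flag')              # missing header: None instead of KeyError

print(content_type, flag)
print(requests.codes.ok, requests.codes.not_found)
```

In real code the usual check is r.status_code == requests.codes.ok.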

Handling encoding

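apparent_encoding returns the encoding detected from the response body itself, which helps when the declared encoding is missing or wrong and the text comes out garbled.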

r.encoding = r.apparent_encoding
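
Why the encoding matters can be sketched without a network call. The sample text is an assumption chosen for the demo. It mimics a UTF-8 page decoded as ISO-8859-1, which requests falls back to for text content when the Content-Type header names no charset.

```python
# Bytes of a UTF-8 encoded page body (sample text chosen for illustration).
raw = '百度一下'.encode('utf-8')

wrong = raw.decode('ISO-8859-1')  # what r.text looks like with the wrong encoding
right = raw.decode('utf-8')       # what r.text looks like once r.encoding is corrected

print(repr(wrong))
print(right)
```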

File upload

import requests

files = {'files': open('genie.jpg', 'rb')}
r = requests.post('http://httpbin.org/post', files=files)

print(r.text)

# Output
{
  "args": {}, 
  "data": "", 
  "files": {
    "files": "data:application/octet-stream;base64,/9j/4AAQSkZJRgABAQEAYABgAAD/2wBDAAgGBgcGBQ
  }, 
  "form": {}, 
  "headers": {
    "Accept": "*/*", 
    "Accept-Encoding": "gzip, deflate", 
    "Connection": "close", 
    "Content-Length": "69575", 
    "Content-Type": "multipart/form-data; boundary=e2791ecb1baa47f3a17ea98b876a9978", 
    "Host": "httpbin.org", 
    "User-Agent": "python-requests/2.18.4"
  }, 
  "json": null, 
  "origin": "171.9.47.36", 
  "url": "http://httpbin.org/post"
}
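
The multipart encoding seen in the Content-Type header above can be inspected locally. This sketch passes the file as a (filename, content, content type) tuple with a few placeholder bytes instead of opening genie.jpg, so it runs without the image; the byte string and the image/jpeg type are assumptions.

```python
import requests

# (filename, content, content_type); the content is placeholder bytes, not a real JPEG.
files = {'files': ('genie.jpg', b'\xff\xd8\xff\xe0 placeholder', 'image/jpeg')}

prepared = requests.Request('POST', 'http://httpbin.org/post', files=files).prepare()

print(prepared.headers['Content-Type'])  # multipart/form-data with a random boundary
print(b'filename="genie.jpg"' in prepared.body)
```

When uploading a real file, opening it in a with block (with open('genie.jpg', 'rb') as f) closes the handle once the request has been sent.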

Proxy settings

import requests

proxies = {
    'http': 'http://10.10.1.10:3128'
}
requests.get('http://www.taobao.com', proxies=proxies)
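
The proxies dict is keyed by URL scheme, so an 'http' entry alone does not cover https:// URLs. The sketch below uses requests.utils.select_proxy to show which proxy would be chosen. The 'https' entry and its address are placeholders added for illustration.

```python
from requests.utils import select_proxy

http_only = {'http': 'http://10.10.1.10:3128'}
both = {
    'http': 'http://10.10.1.10:3128',
    'https': 'http://10.10.1.10:1080',  # placeholder address, not from the original notes
}

print(select_proxy('http://www.taobao.com', http_only))   # matches the 'http' key
print(select_proxy('https://www.taobao.com', http_only))  # no 'https' key, so no proxy
print(select_proxy('https://www.taobao.com', both))
```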

timeout

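You can pass a timeout argument to get or post. If no response arrives within that many seconds, requests stops waiting and raises an exception (requests.exceptions.Timeout).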
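
One way to handle the timeout is to catch the exception and fall back to a default. This is a minimal sketch; the function name fetch_status, the 3-second default and the choice to return None are assumptions, not part of the original notes.

```python
import requests


def fetch_status(url, timeout=3):
    """Return the HTTP status code, or None if the server did not respond in time."""
    try:
        r = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # Raised for both connect and read timeouts.
        return None
    return r.status_code


# Example call (needs network access):
# print(fetch_status('http://httpbin.org/get', timeout=1))
```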

HTTPS requests

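If a request fails SSL certificate verification, you can skip the check by passing verify=False. This disables certificate validation, so use it only for hosts you trust.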
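
A sketch of what that looks like in code. The helper name get_unverified and the 10-second timeout are assumptions. The urllib3.disable_warnings call is optional and only silences the InsecureRequestWarning that unverified HTTPS requests otherwise trigger.

```python
import requests
import urllib3


def get_unverified(url, timeout=10):
    """GET url without checking the server's SSL certificate (insecure)."""
    # Optional: suppress the warning emitted for unverified HTTPS requests.
    urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)
    return requests.get(url, verify=False, timeout=timeout)


# Example call (needs network access; substitute the host you are working with):
# r = get_unverified('https://httpbin.org/get')
# print(r.status_code)
```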