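I. Introduction
# Introduction: requests lets you simulate the requests a browser sends. Compared with urllib, which we used earlier, its API is much more convenient (under the hood it wraps urllib3).
# Note: requests only downloads the page content; it does not execute JavaScript. For content loaded by scripts, we have to analyse the target site ourselves and issue the additional requests.
# Install: pip3 install requests
# Request methods: the most commonly used are requests.get() and requests.post()
>>> import requests
>>> r = requests.get('https://api.github.com/events')
>>> r = requests.post('http://httpbin.org/post', data = {'key':'value'})
>>> r = requests.put('http://httpbin.org/put', data = {'key':'value'})
>>> r = requests.delete('http://httpbin.org/delete')
>>> r = requests.head('http://httpbin.org/get')
>>> r = requests.options('http://httpbin.org/get')
# Before studying requests in earnest, it is worth getting familiar with the HTTP protocol.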
II. GET requests
1. Basic request
import requests

response = requests.get('http://dig.chouti.com/')
print(response.text)
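2. GET request with parameters -> params
import requests

# Option 1: write the query string into the URL by hand
response = requests.get('https://s.taobao.com/search?q=手機')
# Option 2: pass a dict via params and let requests build and URL-encode the query string
response = requests.get('https://s.taobao.com/search', params={'q': '美女'})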
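To make the effect of params more concrete, here is a minimal standard-library sketch of the kind of URL encoding involved. It does not call requests or touch the network; build_url is a hypothetical helper introduced only for illustration, and requests' own implementation may differ in detail.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_url(base, params):
    """Append a URL-encoded query string to base (hypothetical helper)."""
    return f"{base}?{urlencode(params)}"


url = build_url("https://s.taobao.com/search", {"q": "手機"})
print(url)  # non-ASCII characters appear as percent-encoded UTF-8 bytes

# Decoding the query string recovers the original value
print(parse_qs(urlsplit(url).query))
```

Passing a dict rather than concatenating strings by hand means the escaping of non-ASCII and reserved characters is handled for you.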
3. GET request with parameters -> headers
# Requests usually need to carry request headers; they are the key to making the client look like a browser. The most useful ones are:
Host
Referer     # large sites often use this to check where a request came from
User-Agent  # identifies the client
Cookie      # cookies travel in the request headers, but requests has a dedicated cookies parameter for them, so leave them out of headers={}

# Adding headers (servers inspect the request headers and may reject requests that lack them, for example https://www.zhihu.com/explore)
import requests

response = requests.get('https://www.zhihu.com/explore')
response.status_code  # 500

# Custom headers
headers = {
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.76 Mobile Safari/537.36',
}
response = requests.get('https://www.zhihu.com/explore', headers=headers)
print(response.status_code)  # 200
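4. GET request with parameters -> cookies
import uuid
import requests

url = 'http://httpbin.org/cookies'
cookies = dict(sbid=str(uuid.uuid4()))  # a random value used as a demo cookie

res = requests.get(url, cookies=cookies)
print(res.json())  # httpbin echoes back the cookies it received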
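As a rough illustration of what travels on the wire, the sketch below turns a cookies dict into a Cookie request header and parses a Set-Cookie response header back into a dict. It uses only the standard library; cookie_header and parse_set_cookie are hypothetical helpers, and requests itself manages cookies through a cookie jar rather than this exact code.

```python
from http.cookies import SimpleCookie


def cookie_header(cookies):
    """Join name=value pairs the way a Cookie request header lists them."""
    return "; ".join(f"{name}={value}" for name, value in cookies.items())


def parse_set_cookie(header_value):
    """Extract cookie names and values from a Set-Cookie header value."""
    jar = SimpleCookie()
    jar.load(header_value)
    # Attributes such as Path belong to the cookie and are not cookies themselves
    return {name: morsel.value for name, morsel in jar.items()}


print(cookie_header({"sbid": "abc123", "lang": "en"}))
print(parse_set_cookie("sbid=abc123; Path=/"))
```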
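III. POST requests
1. Introduction
# GET requests
GET is the default HTTP request method
    * Normally carries no request body
    * The data travels in the URL, so its size is bounded by the URL length limits that browsers and servers impose (HTTP itself does not set a 1K limit)
    * The request data is exposed in the browser's address bar
Common ways a GET request is issued:
    1. Typing a URL directly into the browser's address bar always sends a GET request
    2. Clicking a hyperlink on a page also sends a GET request
    3. Submitting a form uses GET by default, but the form can be set to POST

# POST requests
    (1). The data does not appear in the address bar
    (2). The protocol sets no upper limit on the data size (servers may still impose one)
    (3). Has a request body
    (4). In a form-encoded request body, Chinese characters are URL-encoded

# !!! requests.post() is used the same way as requests.get(); the main difference is the data parameter of requests.post(), which holds the request body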
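The difference in where the data travels can be sketched by building the raw request text by hand. This is a simplified illustration using only the standard library; build_get and build_post are hypothetical helpers, and a real client such as requests sends more headers than shown here.

```python
from urllib.parse import urlencode


def build_get(host, path, params):
    """GET: the data is URL-encoded into the request line; there is no body."""
    return f"GET {path}?{urlencode(params)} HTTP/1.1\r\nHost: {host}\r\n\r\n"


def build_post(host, path, data):
    """POST: the same data is URL-encoded into the body after the blank line."""
    body = urlencode(data)  # pure ASCII after encoding, so len() equals the byte count
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
        f"{body}"
    )


print(build_get("httpbin.org", "/get", {"q": "手機"}))
print(build_post("httpbin.org", "/post", {"q": "手機"}))
```

In both cases the Chinese characters are percent-encoded; only their position in the message changes.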
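2. Sending a POST request to simulate a browser login
# When analysing a login, enter a wrong username or password and then study the captured traffic. If you enter the correct credentials, the browser redirects straight away and the login request becomes very hard to find in the capture.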
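'''
I. Analysing the target site
    Open https://github.com/login in a browser, enter a wrong username and password, and capture the traffic
    The login is submitted as a POST to: https://github.com/session
    The request headers include a cookie
    The request body contains:
        commit: Sign in
        utf8: ✓
        authenticity_token: lbI8IJCwGslZS8qJPnof5e7ZkCoSoMn6jmDTsL1r/m06NLyIbw7vCrpwrFAPzHMep3Tmf/TSJVoXWrvDZaVwxQ==
        login: egonlin
        password: 123

II. Analysing the flow
    First GET https://github.com/login to obtain the initial cookie and the authenticity_token
    Then POST to https://github.com/session with the initial cookie and the request body (authenticity_token, username, password, etc.)
    Finally obtain the logged-in cookie

    PS: if a site sends the password in encrypted form, enter a wrong username with the correct password and copy the encrypted password from the browser's capture. GitHub sends the password in plain text.
'''
import requests
import re

# First request: fetch the login page
r1 = requests.get('https://github.com/login')
r1_cookie = r1.cookies.get_dict()  # initial cookie (not yet authorised)
# Extract the CSRF token from the page
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text, re.S)[0]

# Second request: POST to the login endpoint with the initial cookie, the token, and the credentials
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': '317828332@qq.com',
    'password': 'alex3714'
}
r2 = requests.post('https://github.com/session',
                   data=data,
                   cookies=r1_cookie
                   )
login_cookie = r2.cookies.get_dict()

# Third request: from now on login_cookie is all we need, for example to open a personal settings page
r3 = requests.get('https://github.com/settings/emails',
                  cookies=login_cookie)
print('317828332@qq.com' in r3.text)  # True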
# The same flow with a Session object, which stores and resends cookies automatically
import requests
import re

session = requests.session()

# First request
r1 = session.get('https://github.com/login')
# Extract the CSRF token from the page
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text, re.S)[0]

# Second request
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': '317828332@qq.com',
    'password': 'alex3714'
}
r2 = session.post('https://github.com/session',
                  data=data,
                  )

# Third request
r3 = session.get('https://github.com/settings/emails')
print('317828332@qq.com' in r3.text)  # True
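The token extraction step can be tried offline. The sketch below runs a regular expression of the same shape against a made-up HTML fragment; SAMPLE_HTML and extract_token are hypothetical, and the markup of the real login page may differ or change over time.

```python
import re

# Made-up fragment standing in for the login page (an assumption, not real GitHub markup)
SAMPLE_HTML = '''<form action="/session" method="post">
  <input type="hidden" name="authenticity_token" value="abc123==" />
  <input type="text" name="login" />
</form>'''

# re.S lets ".*?" cross line breaks in case the attributes are split over lines
TOKEN_RE = re.compile(r'name="authenticity_token".*?value="(.*?)"', re.S)


def extract_token(html):
    """Return the CSRF token, or raise ValueError if the page has none."""
    match = TOKEN_RE.search(html)
    if match is None:
        raise ValueError("authenticity_token not found")
    return match.group(1)


print(extract_token(SAMPLE_HTML))
```

Raising an explicit error instead of indexing findall(...)[0] gives a clearer message when the page layout changes.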
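3. Additional notes
# No Content-Type specified: data= is sent as form data,
# so the default Content-Type is application/x-www-form-urlencoded
requests.post(url='xxxxxxxx',
              data={'xxx': 'yyy'})

# If we set Content-Type to application/json ourselves but still pass the values via data=,
# the body is still form-encoded, so a server expecting JSON cannot read the values
requests.post(url='',
              data={'': 1},
              headers={
                  'content-type': 'application/json'
              })

# json= serialises the dict to JSON and sets Content-Type: application/json
requests.post(url='',
              json={'': 1},
              )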
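Why a mismatched Content-Type breaks things can be shown with a toy server-side parser. This is only a sketch under simplifying assumptions: server_parse is a hypothetical stand-in for a web framework's body parsing, and real frameworks behave in more varied ways.

```python
import json
from urllib.parse import parse_qs, urlencode

payload = {"key": "value", "n": 1}

form_body = urlencode(payload)   # body in the style of data=payload
json_body = json.dumps(payload)  # body in the style of json=payload


def server_parse(body, content_type):
    """Toy parser: decode the body according to the declared Content-Type."""
    if content_type == "application/json":
        return json.loads(body)
    if content_type == "application/x-www-form-urlencoded":
        return {k: v[0] for k, v in parse_qs(body).items()}
    raise ValueError(f"unsupported content type: {content_type}")


print(server_parse(form_body, "application/x-www-form-urlencoded"))
print(server_parse(json_body, "application/json"))

# Form-encoded body labelled as JSON: the parser cannot decode it
try:
    server_parse(form_body, "application/json")
except json.JSONDecodeError as exc:
    print("server could not read the body:", exc)
```

Note that form encoding also loses type information: the integer 1 comes back as the string '1'.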
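IV. The Response object
1. Response attributes
import requests

response = requests.get('http://www.jianshu.com')

# Response attributes
print(response.text)                # body decoded to str
print(response.content)             # body as raw bytes
print(response.status_code)         # HTTP status code
print(response.headers)             # response headers
print(response.cookies)             # cookies as a cookie jar
print(response.cookies.get_dict())  # cookies as a dict
print(response.cookies.items())     # cookies as (name, value) pairs
print(response.url)                 # final URL
print(response.history)             # responses from any redirects that were followed
print(response.encoding)            # encoding used to decode response.text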
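2. Encoding issues
# Encoding issues
import requests

response = requests.get('http://www.autohome.com/news')
# response.encoding = 'gbk'
# The autohome site returns pages encoded as gb2312. When a text response declares no charset in its headers, requests falls back to ISO-8859-1, so unless the encoding is set to gbk the Chinese text comes out garbled.
print(response.text)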
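The garbling can be reproduced without any network access. A minimal sketch, assuming a short sample string in place of the real page: the same bytes are decoded once with the wrong codec and once with the right one.

```python
original = "汽车之家"

# Bytes as a GBK-encoded server would send them
raw = original.encode("gbk")

# Decoding with the wrong codec never fails, it just yields mojibake
garbled = raw.decode("iso-8859-1")
# Decoding with the right codec restores the text
correct = raw.decode("gbk")

print(garbled == original)  # False
print(correct == original)  # True

# The damage is reversible as long as the raw bytes can be recovered
print(garbled.encode("iso-8859-1").decode("gbk") == original)  # True
```

Setting response.encoding before reading response.text plays the role of choosing the right codec here.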
3. Fetching binary data
import requests

response = requests.get('https://timgsa.baidu.com/timg?image&quality=80&size=b9999_10000&sec=1509868306530&di=712e4ef3ab258b36e9f4b48e85a81c9d&imgtype=0&src=http%3A%2F%2Fc.hiphotos.baidu.com%2Fimage%2Fpic%2Fitem%2F11385343fbf2b211e1fb58a1c08065380dd78e0c.jpg')

# response.content holds the raw bytes, so open the file in binary mode
with open('a.jpg', 'wb') as f:
    f.write(response.content)
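# The stream parameter: fetch the body piece by piece. When downloading a video of, say, 100G, reading response.content and writing it to a file in one go is not reasonable, because the whole body would have to fit in memory.
import requests

response = requests.get('https://gss3.baidu.com/6LZ0ej3k1Qd3ote6lo7D0j9wehsv/tieba-smallvideo-transcode/1767502_56ec685f9c7ec542eeaf6eac93a65dc7_6fe25cd1347c_3.mp4',
                        stream=True)

with open('b.mp4', 'wb') as f:
    # iter_content() defaults to 1 byte per chunk, so pass a larger chunk_size
    for chunk in response.iter_content(chunk_size=8192):
        f.write(chunk)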
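The idea behind streaming is simply to copy a bounded amount of data at a time. The sketch below shows that pattern with in-memory file objects from the standard library; copy_in_chunks is a hypothetical helper and the 8192-byte chunk size is an arbitrary choice, not a value prescribed by requests.

```python
import io


def copy_in_chunks(src, dst, chunk_size=8192):
    """Copy src to dst holding at most chunk_size bytes in memory at once."""
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # empty bytes means the source is exhausted
            break
        dst.write(chunk)
        total += len(chunk)
    return total


source = io.BytesIO(b"x" * 20000)  # stands in for a large download
target = io.BytesIO()              # stands in for the output file
print(copy_in_chunks(source, target))  # 20000
```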
4. Parsing JSON
# Parsing JSON
import json
import requests

response = requests.get('http://httpbin.org/get')

res1 = json.loads(response.text)  # the long way
res2 = response.json()            # get the parsed JSON directly

print(res1 == res2)  # True
5、Redirection and History
By default Requests will perform location redirection for all verbs except HEAD.

We can use the history property of the Response object to track redirection.

The Response.history list contains the Response objects that were created in order to complete the request. The list is sorted from the oldest to the most recent response.

For example, GitHub redirects all HTTP requests to HTTPS:

>>> r = requests.get('http://github.com')
>>> r.url
'https://github.com/'
>>> r.status_code
200
>>> r.history
[<Response [301]>]

If you're using GET, OPTIONS, POST, PUT, PATCH or DELETE, you can disable redirection handling with the allow_redirects parameter:

>>> r = requests.get('http://github.com', allow_redirects=False)
>>> r.status_code
301
>>> r.history
[]

If you're using HEAD, you can enable redirection as well:

>>> r = requests.head('http://github.com', allow_redirects=True)
>>> r.url
'https://github.com/'
>>> r.history
[<Response [301]>]
import requests
import re

# First request: fetch the login page
r1 = requests.get('https://github.com/login')
r1_cookie = r1.cookies.get_dict()  # initial cookie (not yet authorised)
# Extract the CSRF token from the page
authenticity_token = re.findall(r'name="authenticity_token".*?value="(.*?)"', r1.text, re.S)[0]

# Second request: POST to the login endpoint with the initial cookie, the token, and the credentials
data = {
    'commit': 'Sign in',
    'utf8': '✓',
    'authenticity_token': authenticity_token,
    'login': '317828332@qq.com',
    'password': 'alex3714'
}

# Test 1: allow_redirects=False is not set, so a Location header in the response is followed and r2 is the response of the new page
r2 = requests.post('https://github.com/session',
                   data=data,
                   cookies=r1_cookie
                   )

print(r2.status_code)      # 200
print(r2.url)              # the URL after the redirect
print(r2.history)          # the responses received before the redirect
print(r2.history[0].text)  # the body of the response received before the redirect

# Test 2: allow_redirects=False is set, so a Location header is not followed and r2 is still the response of the original page
r2 = requests.post('https://github.com/session',
                   data=data,
                   cookies=r1_cookie,
                   allow_redirects=False
                   )

print(r2.status_code)  # 302
print(r2.url)          # the URL before the redirect: https://github.com/session
print(r2.history)      # []
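The relationship between the final response, its URL, and history can be modelled with a toy redirect follower. This is only a sketch of the idea, not how requests is implemented: ROUTES, fetch, and the example.test URLs are all hypothetical, and the limit of 30 hops is an assumption made for the example.

```python
# Fake routing table: URL -> (status code, Location target or None)
ROUTES = {
    "http://example.test/": (301, "https://example.test/"),
    "https://example.test/": (200, None),
}


def fetch(url, allow_redirects=True, max_redirects=30):
    """Return (status, final_url, history), with history ordered oldest first."""
    history = []
    status, location = ROUTES[url]
    while allow_redirects and location is not None:
        if len(history) >= max_redirects:
            raise RuntimeError("too many redirects")
        history.append((status, url))  # remember the intermediate response
        url = location
        status, location = ROUTES[url]
    return status, url, history


print(fetch("http://example.test/"))
# (200, 'https://example.test/', [(301, 'http://example.test/')])
print(fetch("http://example.test/", allow_redirects=False))
# (301, 'http://example.test/', [])
```

With redirects disabled the caller sees the 3xx response itself and an empty history, which mirrors the two tests above.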