
Python crawler practice: scrape Douban's 250 most popular movies and save them to Excel

禁忌石 > "python"

2022.11.06, Zhejiang


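Introduction

Goal: use BeautifulSoup + Requests to scrape the 250 highest-rated movies on Douban and save them to an Excel spreadsheet.

For the requests library, see the earlier article "Python crawlers: the requests library".

For the BeautifulSoup library, see the earlier article "Python crawlers: the Beautiful Soup library".

1. Create the xls workbook

This step needs the xlwt library; install it if it is not already installed.

pip install xlwt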

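Create an empty workbook and write the header row:

import xlwt

# Create an Excel workbook with UTF-8 encoding (the default is ASCII)
excl = xlwt.Workbook(encoding='utf-8')
movie = excl.add_sheet('movie top 250')
movie.write(0, 0, 'Rank')
movie.write(0, 1, 'Title')
movie.write(0, 2, 'Director and cast')
movie.write(0, 3, 'Rating')
movie.write(0, 4, 'Link')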

2. Create the request function

Here we create a function named douban_re that takes the URL, attaches the headers, and requests the page.

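import requests

def douban_re(url):
    # Send a browser-like User-Agent along with the request
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'}
    # Named resp rather than re to avoid shadowing the standard re module
    resp = requests.get(url=url, headers=headers)
    return resp.text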
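To check which headers a request will carry before contacting the site, one option is to build the request object without sending it. The sketch below uses the standard library's urllib.request instead of requests so that it runs without network access; build_request and USER_AGENT are hypothetical names introduced only for this illustration.

```python
from urllib.request import Request

# Same User-Agent string as in douban_re, split for readability
USER_AGENT = (
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) '
    'AppleWebKit/537.36 (KHTML, like Gecko) '
    'Chrome/65.0.3325.162 Safari/537.36'
)

def build_request(url):
    # Hypothetical helper: creates the request object but does not send it
    return Request(url, headers={'User-Agent': USER_AGENT})

req = build_request('https://movie.douban.com/top250')
print(req.get_method())              # GET
# urllib stores header names capitalized, hence 'User-agent'
print(req.get_header('User-agent'))
```

Nothing is transmitted here; the object only shows what would be sent.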

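3. Extract the data and write it to the Excel sheet

Create another function that parses the data with BeautifulSoup and then appends the content to the sheet row by row in a loop.

The page structure needs to be analyzed first.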

Fetch the page content; for details, see the earlier article "Python crawlers: the Beautiful Soup library".

def write_excel(soup):
    # n is the module-level row counter (set to 1 in the complete code)
    global n
    # Each <li> under the grid_view element is one movie entry
    items = soup.find(class_='grid_view').find_all('li')
    for item in items:
        item_num = item.find('em').string
        item_name = item.find(class_='title').string
        item_act = item.find('p').text.replace(' ', '')
        item_sc = item.find(class_='rating_num').string
        item_link = item.find('a')['href']
        # print('Rank: ' + item_num, '\nTitle: ' + item_name, item_act, item_sc, item_link)
        # Append the extracted fields to the sheet, one row per movie
        movie.write(n, 0, item_num)
        movie.write(n, 1, item_name)
        movie.write(n, 2, item_act)
        movie.write(n, 3, item_sc)
        movie.write(n, 4, item_link)
        n = n + 1
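The row counter can also be passed in and returned instead of being kept in a global variable. The following is a minimal sketch of that alternative, not the article's method: write_rows and FakeSheet are hypothetical names, and FakeSheet merely stands in for an xlwt sheet by exposing the same write(row, col, value) call so the example runs without xlwt.

```python
class FakeSheet:
    """Stand-in for an xlwt worksheet; records cells in a dict."""
    def __init__(self):
        self.cells = {}

    def write(self, row, col, value):
        self.cells[(row, col)] = value


def write_rows(sheet, rows, start_row):
    """Write each row at start_row, start_row + 1, ... and return the next free row."""
    for offset, row in enumerate(rows):
        for col, value in enumerate(row):
            sheet.write(start_row + offset, col, value)
    return start_row + len(rows)


sheet = FakeSheet()
# Placeholder values, not real scraped data
page_one = [('1', 'Movie A', '9.0'), ('2', 'Movie B', '8.9')]
page_two = [('3', 'Movie C', '8.8')]

next_row = write_rows(sheet, page_one, start_row=1)   # row 0 holds the header
next_row = write_rows(sheet, page_two, start_row=next_row)
print(next_row)              # 4
print(sheet.cells[(3, 1)])   # Movie C
```

Returning the next free row lets each page continue where the previous one stopped, which is the job the global n does in write_excel.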

4. Loop over multiple pages

Analyze the URLs:

# First page
https://movie.douban.com/top250?start=0&filter=
# Second page
https://movie.douban.com/top250?start=25&filter=
# Third page
https://movie.douban.com/top250?start=50&filter=

Only the number after start= differs, increasing by 25 with each page, so the pages can be visited one after another in a loop.
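The pattern can be checked by generating all ten URLs up front. This is a small sketch using only the standard library; page_urls and BASE_URL are hypothetical names, and the page size of 25 is taken from the URLs listed above.

```python
from urllib.parse import urlencode

BASE_URL = 'https://movie.douban.com/top250'

def page_urls(pages=10, page_size=25):
    """Return the URL of every results page, in order."""
    urls = []
    for page in range(pages):
        # start is the zero-based index of the first movie on the page
        query = urlencode({'start': page * page_size, 'filter': ''})
        urls.append(BASE_URL + '?' + query)
    return urls

urls = page_urls()
print(urls[0])    # https://movie.douban.com/top250?start=0&filter=
print(urls[-1])   # https://movie.douban.com/top250?start=225&filter=
```

Ten pages of 25 entries cover all 250 movies, with the last page starting at index 225.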

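Create a page-access function named main, then call it in a loop over the page numbers.

from bs4 import BeautifulSoup

def main(page):
    # page is zero-based; each page holds 25 movies
    url = 'https://movie.douban.com/top250?start=' + str(page * 25) + '&filter='
    # url = 'https://movie.douban.com/top250'
    html = douban_re(url)
    # The 'lxml' parser requires the lxml package to be installed
    soup = BeautifulSoup(html, 'lxml')
    write_excel(soup)

if __name__ == '__main__':
    for i in range(0, 10):
        main(i)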

5. Complete code

import requests
from bs4 import BeautifulSoup
import xlwt

# Create the workbook and write the header row
excl = xlwt.Workbook(encoding='utf-8')
movie = excl.add_sheet('movie top 250')
movie.write(0, 0, 'Rank')
movie.write(0, 1, 'Title')
movie.write(0, 2, 'Director and cast')
movie.write(0, 3, 'Rating')
movie.write(0, 4, 'Link')

# Next row to write; row 0 holds the header
n = 1

def douban_re(url):
    headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36'}
    resp = requests.get(url=url, headers=headers)
    return resp.text

def write_excel(soup):
    global n
    items = soup.find(class_='grid_view').find_all('li')
    for item in items:
        item_num = item.find('em').string
        item_name = item.find(class_='title').string
        item_act = item.find('p').text.replace(' ', '')
        item_sc = item.find(class_='rating_num').string
        item_link = item.find('a')['href']
        # print('Rank: ' + item_num, '\nTitle: ' + item_name, item_act, item_sc, item_link)
        movie.write(n, 0, item_num)
        movie.write(n, 1, item_name)
        movie.write(n, 2, item_act)
        movie.write(n, 3, item_sc)
        movie.write(n, 4, item_link)
        n = n + 1

def main(page):
    url = 'https://movie.douban.com/top250?start=' + str(page * 25) + '&filter='
    # url = 'https://movie.douban.com/top250'
    html = douban_re(url)
    soup = BeautifulSoup(html, 'lxml')
    write_excel(soup)

if __name__ == '__main__':
    for i in range(0, 10):
        main(i)
    # Save once, after all ten pages have been written
    excl.save('movie_top_250.xls')

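Finally, excl.save saves the workbook under the given file name. Note: xlwt can only create and save files in the xls format; it cannot save xlsx files.

Running the script produces a spreadsheet named movie_top_250.xls, which can then be opened to view the results.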
