How to get the href value of every <a> tag on a page with Python

Views: 43   Date: 2022-06-20 14:31:38

Let's look at the code.

# -*- coding: utf-8 -*-
# Python 3 (urllib.request does not exist in Python 2.7)
# http://tieba.baidu.com/p/2460150866
# Tag operations

from bs4 import BeautifulSoup
import urllib.request

# For a live URL, the page can be read like this:
# html_doc = 'http://tieba.baidu.com/p/2460150866'
# req = urllib.request.Request(html_doc)
# webpage = urllib.request.urlopen(req)
# html = webpage.read()

html = """<html><head><title>The Dormouse's story</title></head><body>
The Dormouse's story
Once upon a time there were three little sisters; and their names were
<a rel="external nofollow" id="xiaodeng"></a>,
<a rel="external nofollow" id="link2">Lacie</a> and
<a rel="external nofollow" id="link3">Tillie</a>;
<a rel="external nofollow" id="xiaodeng">Lacie</a>
and they lived at the bottom of a well....
"""
soup = BeautifulSoup(html, 'html.parser')  # document object

# soup.a returns only the first <a> tag
# print(soup.a)
# <a rel="external nofollow" id="xiaodeng"></a>

for k in soup.find_all('a'):
    print(k)
    # k['class'] and k['href'] raise KeyError when the attribute is absent,
    # as it is in this sample, so get() is used for those two lookups.
    print(k.get('class'))  # class attribute of the <a> tag
    print(k['id'])         # id of the <a> tag
    print(k.get('href'))   # href of the <a> tag
    print(k.string)        # string of the <a> tag
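The sample HTML shown above has no href or class attributes on its anchors, so a subscript lookup such as k['href'] raises KeyError. One way around this, sketched here with hypothetical example.com links that are not from the original page, is to pass href=True to find_all so that only anchors that actually have an href come back.

```python
from bs4 import BeautifulSoup

# Hypothetical sample: the example.com URLs are placeholders,
# not taken from the original page.
html = """
<p>
  <a href="http://example.com/elsie" class="sister" id="link1">Elsie</a>
  <a id="link2">Lacie</a>
  <a href="/tillie" id="link3">Tillie</a>
</p>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only <a> tags that have an href attribute,
# so the subscript lookup below cannot raise KeyError.
hrefs = [a["href"] for a in soup.find_all("a", href=True)]
print(hrefs)

# class is a multi-valued attribute, so BeautifulSoup returns it as a list.
first = soup.find("a")
print(first["class"])
```

The anchor with id link2 is skipped because it has no href at all.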

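If an <a> tag contains other tags (for example ..), k.string returns None as soon as the tag has more than one child. To extract the text inside <a> in that case, use k.get_text().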
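The difference can be seen in a small sketch. The markup below is hypothetical: the <b> and <i> children simply stand in for "other tags" nested inside <a>.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: <b> and <i> stand in for nested tags inside <a>.
html = (
    '<a id="link1">Hello <b>big</b> <i>world</i></a>'
    '<a id="link2"><b>Only</b></a>'
)
soup = BeautifulSoup(html, "html.parser")

mixed = soup.find("a", id="link1")
single = soup.find("a", id="link2")

# More than one child: .string cannot pick one, so it is None.
print(mixed.string)
# get_text() joins every piece of text found under the tag.
print(mixed.get_text())
# Exactly one child tag holding one string: .string still works.
print(single.string)
```

get_text() is the safer default when the structure of the anchor is not known in advance.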

The pattern below usually works as well. It uses get(), which returns None when the attribute is missing instead of raising an error.

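from urllib.request import urlopen
from bs4 import BeautifulSoup

url = 'http://tieba.baidu.com/p/2460150866'  # the page from the first example
html = urlopen(url)
soup = BeautifulSoup(html, 'html.parser')
t1 = soup.find_all('a')
print(t1)
href_list = []
for t2 in t1:
    t3 = t2.get('href')  # None when the tag has no href
    href_list.append(t3)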
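A list built this way typically mixes None entries, relative paths, in-page anchors and javascript: links. The following is a minimal sketch of one way to tidy it with the standard library's urljoin. The helper name clean_hrefs, the sample list and the example.com base URL are all assumptions for illustration.

```python
from urllib.parse import urljoin


def clean_hrefs(base_url, hrefs):
    """Return unique absolute http(s) URLs, in their original order."""
    result = []
    seen = set()
    for href in hrefs:
        if not href or href.startswith("#"):
            continue  # missing attribute, empty value, or in-page anchor
        absolute = urljoin(base_url, href)
        if not absolute.startswith(("http://", "https://")):
            continue  # e.g. javascript: or mailto: links
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical input shaped like href_list above.
sample = [None, "/p/1", "#top", "javascript:void(0)",
          "http://example.com/a", "/p/1"]
cleaned = clean_hrefs("http://example.com/index.html", sample)
print(cleaned)
```

Passing the URL that was fetched as base_url resolves relative links against the page they came from.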

Addendum: getting the tags and attributes of any page with a Python crawler (including the href attribute of <a> tags)

Let's look at the code.

# coding=utf-8
from bs4 import BeautifulSoup
import requests


# Print the attr attribute of every label tag on the page at url
def getHtml(url, label, attr):
    response = requests.get(url)
    response.encoding = 'utf-8'
    html = response.text
    soup = BeautifulSoup(html, 'html.parser')
    for target in soup.find_all(label):
        try:
            value = target.get(attr)
        except Exception:
            value = ''
        if value:
            print(value)


url = 'https://baidu.com/'
label = 'a'
attr = 'href'
getHtml(url, label, attr)
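The function above needs network access and two third-party packages, and it prints rather than returns its results. Where that is inconvenient, for instance when testing offline, the same idea can be sketched with the standard library's html.parser instead of BeautifulSoup. This is a different technique from the one in the article, and the names AttrCollector and get_attrs are hypothetical.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Collect the values of one attribute from every occurrence of one tag."""

    def __init__(self, label, attr):
        super().__init__()
        self.label = label
        self.attr = attr
        self.values = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the parser passes
        # tag names and attribute names in lower case.
        if tag != self.label:
            return
        for name, value in attrs:
            if name == self.attr and value:
                self.values.append(value)


def get_attrs(html, label, attr):
    """Return the non-empty attr values of every label tag in html."""
    parser = AttrCollector(label, attr)
    parser.feed(html)
    parser.close()
    return parser.values


# Hypothetical HTML used only to exercise the helper.
sample = '<a href="/one">1</a><a>2</a><img src="x.png"><A HREF="/two">3</A>'
print(get_attrs(sample, "a", "href"))
print(get_attrs(sample, "img", "src"))
```

Because the parser lower-cases tag and attribute names, label and attr should be passed in lower case.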

The above is based on my own experience. I hope it serves as a useful reference, and that you will keep supporting 好吧啦網. If anything is wrong or incomplete, corrections are welcome.
