
Python Captcha Recognition: Approach and Solution

Views: 39 Date: 2022-07-11 13:58:14

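1. Introduction

Web crawlers frequently run into captchas. Today's captchas mostly fall into four types: arithmetic captchas, slider captchas, image-recognition captchas, and audio captchas. This article deals with image-recognition captchas, and only simple ones. Achieving a higher and more accurate recognition rate takes considerable effort spent training your own font library.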

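Captcha recognition usually involves these steps:

(1) Grayscale conversion

(2) Binarization

(3) Border removal (if there is one)

(4) Denoising

(5) Character splitting or skew correction

(6) Training a font library

(7) Recognition

Of these seven steps, the first three are basic; step 4 or 5 can be included or skipped depending on the actual situation.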
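The first three steps can be sketched in plain Python as follows. This is a minimal illustration, not the article's implementation: the image is assumed to be a list of rows of (R, G, B) tuples, the function names are hypothetical, the grayscale weights are the common ITU-R 601 luma weights, and the threshold of 135 is borrowed from the complete code later in the article.

```python
def to_grayscale(rgb_rows):
    """Convert rows of (R, G, B) tuples to rows of 0-255 luminance values."""
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
        for row in rgb_rows
    ]


def binarize(gray_rows, threshold=135):
    """Map each pixel to 255 (background) if above the threshold, else 0 (ink)."""
    return [[255 if p > threshold else 0 for p in row] for row in gray_rows]


def strip_border(rows, width=1):
    """Drop `width` pixels from every edge of the image."""
    return [row[width:-width] for row in rows[width:-width]]


# A 4x4 image with a black one-pixel border around four inner pixels.
image = [
    [(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)],
    [(0, 0, 0), (250, 250, 250), (30, 40, 50), (0, 0, 0)],
    [(0, 0, 0), (200, 180, 190), (90, 90, 90), (0, 0, 0)],
    [(0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 0, 0)],
]
cleaned = strip_border(binarize(to_grayscale(image)))
print(cleaned)  # [[255, 0], [255, 0]]
```

In practice the same steps are usually done with PIL or OpenCV; the pure-Python version only makes the arithmetic visible.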

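Commonly used libraries include pytesseract (an OCR library), OpenCV (an advanced image-processing library), imagehash (an image-hashing library), numpy (an open-source, high-performance numerical computing library for Python), and PIL's Image, ImageDraw, and ImageFile modules.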

2. Example

Take the login captcha of a certain website as an example. The actual procedure differs slightly from the steps listed above.

[Image: the example captcha, 2837]

First, analyze the captcha: it consists of 4 digits, each drawn from the 10 digits 0 through 9. Each of these 10 digits can appear in the first, second, third, or fourth position, which gives 40 digit-position combinations in total, as follows:

[Image: the 40 reference digit images]

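The next step is to denoise and split captcha images to obtain the digit images above. These 40 images form the reference set.

To recognize a new captcha, denoise and split it to get four digit images like those above, then compare them against the reference set to determine what the captcha says.

Take the captcha 2837 above as an example: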

1. Image denoising

[Image: the captcha after denoising]
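The complete code later in the article denoises only by thresholding. When stray dots survive thresholding, a common extra pass is to erase ink pixels that have too few ink neighbours. The sketch below shows that idea under stated assumptions: the image is a list of rows of 0 (ink) and 255 (background) values, and `remove_isolated_pixels` and `min_neighbours` are hypothetical names, not part of the article's code.

```python
def remove_isolated_pixels(rows, min_neighbours=2):
    """Return a copy of a 0/255 image with lonely ink pixels turned white.

    An ink pixel (0) is kept only if at least `min_neighbours` of its
    eight surrounding pixels are also ink.
    """
    height, width = len(rows), len(rows[0])
    result = [row[:] for row in rows]
    for y in range(height):
        for x in range(width):
            if rows[y][x] != 0:
                continue  # background pixels are left untouched
            neighbours = 0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if dy == 0 and dx == 0:
                        continue
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and rows[ny][nx] == 0:
                        neighbours += 1
            if neighbours < min_neighbours:
                result[y][x] = 255
    return result


noisy = [
    [0, 255, 255, 255, 255],
    [255, 255, 255, 255, 255],
    [255, 255, 0, 0, 255],
    [255, 255, 0, 0, 255],
]
for row in remove_isolated_pixels(noisy):
    print(row)
```

The single dot in the top-left corner is removed, while the 2x2 block of ink survives because each of its pixels has three ink neighbours.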

2. Image splitting

[Image: the four digit images after splitting]
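Splitting amounts to computing one crop box per character. The sketch below assumes equally wide characters and an image stored as a list of pixel rows; `split_boxes` and `crop` are hypothetical helpers. The article's own `splitimage` method additionally widens the first columns by a few pixels to fit its particular captcha.

```python
def split_boxes(width, height, colnum=4):
    """Return (left, upper, right, lower) crop boxes for equal-width columns."""
    colwidth = width // colnum
    return [(c * colwidth, 0, (c + 1) * colwidth, height) for c in range(colnum)]


def crop(rows, box):
    """Cut a box out of an image stored as a list of pixel rows."""
    left, upper, right, lower = box
    return [row[left:right] for row in rows[upper:lower]]


# A 52x20 captcha splits into four 13-pixel-wide slices.
print(split_boxes(52, 20))

tiny = [[1, 2, 3, 4], [5, 6, 7, 8]]
print([crop(tiny, box) for box in split_boxes(4, 2, colnum=2)])
```

The box layout matches the (left, upper, right, lower) convention used by PIL's `Image.crop`, so the same boxes can be passed to it directly.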

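3. Image comparison

Compare the hash of each of the four digit images obtained by denoising and splitting the captcha against the hashes of the 40 reference images. A tolerance, max_dif, sets the maximum allowed hash difference: the smaller it is, the stricter the match, and the minimum is 0.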
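To make the comparison concrete, here is a simplified pure-Python sketch of an average hash and of the difference between two hashes. It is an illustration only: the imagehash library first shrinks the image to a small fixed grid before hashing, which is omitted here, and the function names below are hypothetical. The difference is a Hamming distance, so it is never negative.

```python
def average_hash(gray_rows):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    pixels = [p for row in gray_rows for p in row]
    mean = sum(pixels) / len(pixels)
    return tuple(1 if p > mean else 0 for p in pixels)


def hash_difference(hash1, hash2):
    """Hamming distance: the number of positions where two hashes differ."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")
    return sum(b1 != b2 for b1, b2 in zip(hash1, hash2))


def is_match(hash1, hash2, max_dif=0):
    """True if the two hashes differ in at most `max_dif` positions."""
    return hash_difference(hash1, hash2) <= max_dif


template = average_hash([[0, 255], [255, 0]])
candidate = average_hash([[0, 255], [255, 255]])
print(hash_difference(template, candidate))      # 1
print(is_match(template, candidate))             # False
print(is_match(template, candidate, max_dif=1))  # True
```

With max_dif=0 only identical hashes match, which works when the captcha font never varies; a small positive tolerance trades some precision for robustness to leftover noise.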


Comparing the four digit images this way yields the digit each one corresponds to; concatenating those digits gives the captcha.
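The lookup-and-concatenate step might look like the sketch below. It is a variant for illustration, not the article's exact logic: hashes are assumed to be plain integers, `recognise` is a hypothetical helper, it picks the closest reference hash instead of the first acceptable one, and it returns None when any slice has no reference within max_dif instead of silently producing a shorter string.

```python
def recognise(slice_hashes, templates, max_dif=0):
    """Return the recognised string, or None if any slice has no match.

    slice_hashes: one integer hash per character slice, left to right.
    templates: dict mapping a reference image's integer hash to its digit.
    """
    digits = []
    for slice_hash in slice_hashes:
        best_digit, best_dif = None, None
        for template_hash, digit in templates.items():
            # XOR leaves a 1 bit wherever the two hashes disagree.
            dif = bin(slice_hash ^ template_hash).count("1")
            if best_dif is None or dif < best_dif:
                best_digit, best_dif = digit, dif
        if best_dif is None or best_dif > max_dif:
            return None
        digits.append(best_digit)
    return "".join(digits)


# Made-up 8-bit hashes standing in for real reference images.
templates = {
    0b11110000: "2",
    0b00001111: "8",
    0b10101010: "3",
    0b01010101: "7",
}
print(recognise([0b11110000, 0b00001111, 0b10101010, 0b01010101], templates))  # 2837
print(recognise([0b11110001], templates))             # None
print(recognise([0b11110001], templates, max_dif=1))  # 2
```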

The complete code is as follows:

    #coding=utf-8
    # Note: this code uses the Selenium 3 API; Selenium 4 replaces
    # find_element_by_id(...) with find_element(By.ID, ...).
    import os
    import re
    from selenium import webdriver
    from selenium.webdriver.common.keys import Keys
    import time
    from selenium.webdriver.common.action_chains import ActionChains
    import collections
    import mongoDbBase
    import numpy
    import imagehash
    from PIL import Image,ImageFile
    import datetime

    class finalNews_IE:
        def __init__(self,strdate,logonUrl,firstUrl,keyword_list,exportPath,codepath,codedir):
            self.iniDriver()
            self.db = mongoDbBase.mongoDbBase()
            self.date = strdate
            self.firstUrl = firstUrl
            self.logonUrl = logonUrl
            self.keyword_list = keyword_list
            self.exportPath = exportPath
            self.codedir = codedir
            # Map the hash of each reference image to the digit it shows
            self.hash_code_dict = {}
            for f in range(0,10):
                for l in range(1,5):
                    # The path separator was lost in the source ('codeLibrarycode');
                    # codeLibrary/code<digit>_<position>.png is assumed here.
                    file = os.path.join(codedir, 'codeLibrary', 'code' + str(f) + '_' + str(l) + '.png')
                    # print(file)
                    hash = self.get_ImageHash(file)
                    self.hash_code_dict[hash] = str(f)

        def iniDriver(self):
            # Path to IEDriverServer.exe (could be read from a config file)
            IEDriverServer = r'C:\Program Files\Internet Explorer\IEDriverServer.exe'
            os.environ['webdriver.ie.driver'] = IEDriverServer
            self.driver = webdriver.Ie(IEDriverServer)

        def WriteData(self, message, fileName):
            fileName = os.path.join(os.getcwd(), self.exportPath + '/' + fileName)
            with open(fileName, 'a') as f:
                f.write(message)

        # Get the hash of an image file (None if the file does not exist)
        def get_ImageHash(self,imagefile):
            hash = None
            if os.path.exists(imagefile):
                with open(imagefile, 'rb') as fp:
                    hash = imagehash.average_hash(Image.open(fp))
            return hash

        # Point denoising: grayscale, then threshold at 135
        def clearNoise(self, imageFile, x=0, y=0):
            if os.path.exists(imageFile):
                image = Image.open(imageFile)
                image = image.convert('L')
                image = numpy.asarray(image)
                # Cast to uint8 so Image.fromarray accepts the array
                image = ((image > 135) * 255).astype(numpy.uint8)
                image = Image.fromarray(image).convert('RGB')
                # save_name = 'D:workpython36_crawlVeriycodemode_5590.png'
                # image.save(save_name)
                image.save(imageFile)
                return image

        # Split the captcha
        # rownum: number of rows; colnum: number of columns;
        # imagePath: output directory; imageFile: image file to split
        def splitimage(self, imagePath,imageFile,rownum=1, colnum=4):
            img = Image.open(imageFile)
            w, h = img.size
            if rownum <= h and colnum <= w:
                print('Original image info: %sx%s, %s, %s' % (w, h, img.format, img.mode))
                print('Splitting the image, please wait...')
                s = os.path.split(imageFile)
                if imagePath == '':
                    # Default to the directory of the source image
                    imagePath = s[0]
                fn = s[1].split('.')
                basename = fn[0]
                ext = fn[-1]
                num = 1
                rowheight = h // rownum
                colwidth = w // colnum
                file_list = []
                for r in range(rownum):
                    index = 0
                    for c in range(colnum):
                        # (left, upper, right, lower)
                        # box = (c * colwidth, r * rowheight, (c + 1) * colwidth, (r + 1) * rowheight)
                        # Column widths are hand-tuned for this particular captcha
                        if index < 1:
                            colwid = colwidth + 6
                        elif index < 2:
                            colwid = colwidth + 1
                        elif index < 3:
                            colwid = colwidth
                        box = (c * colwid, r * rowheight, (c + 1) * colwid, (r + 1) * rowheight)
                        newfile = os.path.join(imagePath, basename + '_' + str(num) + '.' + ext)
                        file_list.append(newfile)
                        img.crop(box).save(newfile, ext)
                        num = num + 1
                        index += 1
                return file_list

        def compare_image_with_hash(self, image_hash1,image_hash2, max_dif=0):
            '''
            max_dif: maximum allowed hash difference; the smaller the stricter,
            minimum 0 (recommended)
            '''
            dif = image_hash1 - image_hash2
            # print(dif)
            if dif < 0:
                dif = -dif
            if dif <= max_dif:
                return True
            else:
                return False

        # Screenshot the captcha, recognize it, and return the recognized string
        def savePicture(self):
            self.driver.get(self.logonUrl)
            self.driver.maximize_window()
            time.sleep(1)
            self.driver.save_screenshot(self.codedir + 'Temp.png')
            checkcode = self.driver.find_element_by_id('checkcode')
            location = checkcode.location  # x, y coordinates of the captcha
            size = checkcode.size  # width and height of the captcha
            # Region of the screenshot to crop
            rangle = (int(location['x']), int(location['y']),
                      int(location['x'] + size['width']),
                      int(location['y'] + size['height']))
            i = Image.open(self.codedir + 'Temp.png')  # open the screenshot
            result = i.crop(rangle)  # crop the captcha region out of the screenshot
            filename = self.codedir + 'Temp_code.png'
            result.save(filename)
            self.clearNoise(filename)
            file_list = self.splitimage(self.codedir,filename)
            verycode = ''
            for f in file_list:
                imageHash = self.get_ImageHash(f)
                for h,code in self.hash_code_dict.items():
                    flag = self.compare_image_with_hash(imageHash,h,0)
                    if flag:
                        # print(code)
                        verycode += code
                        break
            print(verycode)
            # The driver stays open because longon() still needs the page
            return verycode

        def longon(self):
            self.driver.get(self.logonUrl)
            self.driver.maximize_window()
            time.sleep(1)
            code = self.savePicture()
            accname = self.driver.find_element_by_id('username')
            # accname = self.driver.find_element_by_id("//input[@id='username']")
            accname.send_keys('ctrchina')
            accpwd = self.driver.find_element_by_id('password')
            # accpwd.send_keys('123456')
            checkcode = self.driver.find_element_by_name('checkcode')
            checkcode.send_keys(code)
            submit = self.driver.find_element_by_name('button')
            submit.click()

Supplementary example:

    # -*- coding: utf-8 -*-
    import re
    import requests
    import io
    import os
    import json
    from PIL import Image
    from PIL import ImageEnhance
    from bs4 import BeautifulSoup
    import mdata  # local module: mdata.data maps each character to a 260-element 0/1 template

    class Student:
        def __init__(self, user,password):
            self.user = str(user)
            self.password = str(password)
            self.s = requests.Session()

        def login(self):
            url = 'http://202.118.31.197/ACTIONLOGON.APPPROCESS?mode=4'
            res = self.s.get(url).text
            imageUrl = 'http://202.118.31.197/' + re.findall('<img src="(.+?)" width="55"',res)[0]
            im = Image.open(io.BytesIO(self.s.get(imageUrl).content))
            # Boost contrast, then turn every non-black pixel white
            enhancer = ImageEnhance.Contrast(im)
            im = enhancer.enhance(7)
            x,y = im.size
            for i in range(y):
                for j in range(x):
                    if (im.getpixel((j,i)) != (0,0,0)):
                        im.putpixel((j,i),(255,255,255))
            num = [6,19,32,45]  # left edge of each 13-pixel-wide character
            verifyCode = ''
            for n in range(4):
                a = im.crop((num[n],0,num[n]+13,20))
                # Flatten the 13x20 slice into 260 bits: 1 for black, 0 otherwise
                l = []
                x,y = a.size
                for i in range(y):
                    for j in range(x):
                        if (a.getpixel((j,i)) == (0,0,0)):
                            l.append(1)
                        else:
                            l.append(0)
                # Pick the template that agrees with the slice in the most positions
                his = 0
                chrr = ''
                for key in mdata.data:
                    r = 0
                    for j in range(260):
                        if (l[j] == mdata.data[key][j]):
                            r += 1
                    if (r > his):
                        his = r
                        chrr = key
                verifyCode += chrr
            # print('Captcha filled in automatically:', verifyCode)
            data = {
                'WebUserNO':str(self.user),
                'Password':str(self.password),
                'Agnomen':verifyCode,
            }
            url = 'http://202.118.31.197/ACTIONLOGON.APPPROCESS?mode=4'
            t = self.s.post(url,data=data).text
            if re.findall('images/Logout2',t) == []:
                l = '[0,"' + re.findall('alert((.+?));',t)[1][1][2:-2] + '"]' + ' ' + self.user + ' ' + self.password + '\n'
                # print(l)
                # return '[0,"' + re.findall('alert((.+?));',t)[1][1][2:-2] + '"]'
                return [False,l]
            else:
                # '登錄成功' means 'login succeeded'
                l = '登錄成功 ' + re.findall('!&nbsp;(.+?)&nbsp;',t)[0] + ' ' + self.user + ' ' + self.password + '\n'
                # print(l)
                return [True,l]

        def getInfo(self):
            imageUrl = 'http://202.118.31.197/ACTIONDSPUSERPHOTO.APPPROCESS'
            data = self.s.get('http://202.118.31.197/ACTIONQUERYBASESTUDENTINFO.APPPROCESS?mode=3').text  # student record
            data = BeautifulSoup(data,'lxml')
            q = data.find_all('table',attrs={'align':'left'})
            a = []
            for i in q[0]:
                if type(i) == type(q[0]):
                    for j in i:
                        if type(j) == type(i):
                            a.append(j.text)
            for i in q[1]:
                if type(i) == type(q[1]):
                    for j in i:
                        if type(j) == type(i):
                            a.append(j.text)
            # Cells alternate between field name and field value
            data = {}
            for i in range(1,len(a),2):
                data[a[i-1]] = a[i]
            # data['照片'] = io.BytesIO(self.s.get(imageUrl).content)  # '照片' means 'photo'
            return json.dumps(data)

        def getPic(self):
            imageUrl = 'http://202.118.31.197/ACTIONDSPUSERPHOTO.APPPROCESS'
            pic = Image.open(io.BytesIO(self.s.get(imageUrl).content))
            return pic

        def getScore(self):
            score = self.s.get('http://202.118.31.197/ACTIONQUERYSTUDENTSCORE.APPPROCESS').text  # transcript
            score = BeautifulSoup(score, 'lxml')
            q = score.find_all(attrs={'height':'36'})[0]
            point = q.text
            # '平均學分績點' is the page's label for the grade point average
            print(point[point.find('平均學分績點'):])
            table = score.html.body.table
            people = table.find_all(attrs={'height' : '36'})[0].string
            r = table.find_all('table',attrs={'align' : 'left'})[0].find_all('tr')
            subject = []
            lesson = []
            for i in r[0]:
                if type(r[0]) == type(i):
                    subject.append(i.string)
            for i in r:
                k = 0
                temp = {}
                for j in i:
                    if type(r[0]) == type(j):
                        temp[subject[k]] = j.string
                        k += 1
                lesson.append(temp)
            lesson.pop()
            lesson.pop(0)
            return json.dumps(lesson)

        def logoff(self):
            return self.s.get('http://202.118.31.197/ACTIONLOGOUT.APPPROCESS').text

    if __name__ == '__main__':
        a = Student(20150000,20150000)
        r = a.login()
        print(r[1])
        if r[0]:
            r = json.loads(a.getScore())
            for i in r:
                for j in i:
                    print(i[j], end=' ')
                print()
            q = json.loads(a.getInfo())
            for i in q:
                print(i,q[i])
            a.getPic().show()
            a.logoff()
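The matching rule inside `login` above, which counts the positions where a character slice agrees with each stored template and keeps the best, can be isolated as the small sketch below. The function name and the six-bit templates are made up for illustration; the real templates come from the author's `mdata` module, whose contents are not shown in the article.

```python
def best_template(bits, templates):
    """Return the key of the template that agrees with `bits` in most positions."""
    best_key, best_score = "", 0
    for key, template in templates.items():
        score = sum(1 for a, b in zip(bits, template) if a == b)
        if score > best_score:
            best_key, best_score = key, score
    return best_key


# Toy six-bit templates; real ones would have 260 bits (13 x 20 pixels).
templates = {
    "0": [1, 1, 1, 1, 0, 0],
    "1": [0, 0, 1, 1, 0, 0],
}
print(best_template([0, 0, 1, 1, 0, 1], templates))  # 1
```

Unlike the hash comparison with max_dif=0, this rule always returns its best guess, so a badly segmented slice still yields some character rather than a failure.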
