python3.x - How to use maketrans with UTF-8 files in Python

Views: 180  Date: 2022-07-05 10:59:36

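Question

I wrote a script that processes text by replacing every punctuation character with a space, using Python's maketrans and translate. It works on ASCII-encoded files, but on a UTF-8 file it raises an error saying the maketrans arguments are not the same length, even though they look the same length to me: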

  File "/Users/lgq/Desktop/p3.py", line 10, in text_to_words
    "abcdefghijklmnopqrstuvwxyz ")
ValueError: the first two maketrans arguments must have equal length

I looked this up and read that maketrans cannot be used with UTF-8. How should I replace characters in UTF-8 text, then? Any pointers would be appreciated.

def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """

    my_substitutions = the_text.maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these
        "abcdefghijklmnopqrstuvwxyz ")

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

def get_words_in_book(filename):
    """ Read a book from filename, and return a list of its words. """
    f = open(filename, "r", encoding="utf-8")
    content = f.read()
    f.close()
    wds = text_to_words(content)
    return wds

book_words = get_words_in_book("alice.txt")
print("There are {0} words in the book, the first 100 are\n{1}".format(
    len(book_words), book_words[:100]))

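Answers

Answer 1:

First, the two strings are not the same length. \" is a single character, and \\ is also a single character; you can check with len(). Also, when asking about string handling, it is best to state your Python version.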
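The len() check suggested above can be sketched as follows. This is a minimal Python 3 illustration, assuming the two literals are exactly as they appear in the question; the names find_chars, replace_chars and error_message are made up for this example.

```python
# An escape sequence takes two characters in source code
# but produces only one character in the resulting string.
assert len("\"") == 1
assert len("\\") == 1

# The two maketrans arguments as they appear in the question (assumed verbatim).
find_chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\"
replace_chars = "abcdefghijklmnopqrstuvwxyz "

print(len(find_chars), len(replace_chars))  # 68 27

# With unequal lengths, maketrans refuses to build the table.
error_message = None
try:
    str.maketrans(find_chars, replace_chars)
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The check does not depend on the encoding of the file being read: the table is built from the two literals alone, so the ValueError is raised before any text is translated.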

The maketrans arguments are not the same length:

my_substitutions = the_text.maketrans(
    # If you find any of these
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
    # Replace them by these
    "abcdefghijklmnopqrstuvwxyz ")

Test code:

from string import translate, maketrans

def text_to_words(the_text):
    """ Return a list of words with all punctuation removed,
        and all in lowercase.
    """
    my_substitutions = maketrans(
        # If you find any of these
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&()*+,-./:;<=>?@[]^_`{|}~'\\",
        # Replace them by these (26 letters, then one space per remaining character)
        "abcdefghijklmnopqrstuvwxyz" + " " * 42)

    # Translate the text now.
    cleaned_text = the_text.translate(my_substitutions)
    wds = cleaned_text.split()
    return wds

text_to_words('ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!"#$%&()*+,-./:;<=>?@[]^_`{|}~\'测试')

output

['abcdefghijklmnopqrstuvwxyz', '\xe6\xb5\x8b\xe8\xaf\x95']

This is the result of running it under Python 2.
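For Python 3, where str is Unicode and str.maketrans works on text decoded from UTF-8, one possible sketch is to derive the replacement string from the characters being replaced, so the two lengths cannot drift apart. This is an assumption about what the asker wants, not the code from the question; it uses string.digits and string.punctuation from the standard library in place of the hand-typed literal, and the sample sentence is made up.

```python
import string

def text_to_words(the_text):
    """Return a list of lowercase words with digits and ASCII punctuation removed."""
    # Characters that should become spaces.
    to_blank = string.digits + string.punctuation
    # Build the second argument from the first so the lengths always match.
    table = str.maketrans(
        string.ascii_uppercase + to_blank,
        string.ascii_lowercase + " " * len(to_blank),
    )
    return the_text.translate(table).split()

words = text_to_words("Alice's Adventures, 测试!")
print(words)  # ['alice', 's', 'adventures', '测试']
```

Non-ASCII characters that are not in the table, such as the Chinese text here, pass through unchanged. Text read with open(filename, encoding="utf-8") is already a str, so it can be passed to this function directly.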
