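On a whim I wanted to turn a PDF into an Anki deck. I assumed it would be simple, but I ended up stuck on reading Word files, so I'm jotting it down in my little notebook. Next time I need to read a Word file, no problem!
I'm using the docx library again (it is installed from the python-docx package). I covered installation in an earlier post, so let's skip straight to usage.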
Reading the document object
```python
import docx

# Use a raw string: in "C:\test.docx" the "\t" would be read as a tab character
path = r"C:\test.docx"
file = docx.Document(path)
```
Reading text
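```python
# file.paragraphs is a list of Paragraph objects, not a single paragraph
content = file.paragraphs
print(content[0].text)  # text of the first paragraph

# Print the text of every paragraph
for f in file.paragraphs:
    print(f.text)
```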
Getting attributes
Using text color as an example:
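```python
for p in file.paragraphs:
    for n in p.runs:  # a run is a stretch of text with uniform formatting
        # rgb is an RGBColor (str() gives hex such as "FF0000"),
        # or None when the run has no explicit color, so str() gives "None"
        color = str(n.font.color.rgb)
        print(n.text, color)
```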
Reference
Reading Word documents in Python: identifying text color and parsing fields (Python读取word文档识别字段颜色,解析字段)