[Python Knowledge Base] Python Study Notes 20: Reading and Writing Word and PDF Files with Python


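Writing Word Documents

Writing Word documents from Python requires the third-party python-docx library (installed with pip install python-docx and imported as docx). The following example writes a simple Word document: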

from docx import Document
from docx.shared import Cm, Pt
from docx.document import Document as Doc

# Create a Word document object
document = Document()  # type: Doc

# Optional: change the default font size of the 'Normal' style
# font = document.styles['Normal'].font
# font.size = Pt(22)
# Add the document title (heading level 0)
document.add_heading('快快乐乐学Python', 0)
# Add a paragraph
p = document.add_paragraph('Python是一门非常流行的编程语言，它')
run = p.add_run('简单')
run.bold = True
# Set the font size
run.font.size = Pt(18)
run = p.add_run('而且')
# Set the font name
run.font.name = 'HYj1gf'
p.add_run('优雅。').italic = True
# Add a level-1 heading
document.add_heading('Heading, level 1', level=1)
document.add_paragraph('Intense quote', style='Intense Quote')
# Bulleted (unordered) list item
document.add_paragraph(
    'first item in unordered list', style='List Bullet'
)
# Numbered (ordered) list item
document.add_paragraph(
    'first item in ordered list', style='List Number'
)
# Add a picture, 3.2 cm wide
document.add_picture('resources/beauty.png', width=Cm(3.2))
# Add a section break
document.add_section()

records = (
    ('小龙', '男', '1999-02-15'),
    ('小英', '女', '2000-10-20'),
    ('小白', '女', '1998-07-18')
)

# Create a table with one header row and three columns
table = document.add_table(rows=1, cols=3)
# Apply a built-in table style
table.style = 'Colorful List Accent 1'
# Header row: name, sex, birthday
hdr_cells = table.rows[0].cells
hdr_cells[0].text = '姓名'
hdr_cells[1].text = '性别'
hdr_cells[2].text = '生日'
# Append one row per record
for name, sex, birthday in records:
    row_cells = table.add_row().cells
    row_cells[0].text = name
    row_cells[1].text = sex
    row_cells[2].text = birthday
# Add a page break
document.add_page_break()

document.save('resources/demo.docx')
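
To check what was actually saved without opening Word, it helps to know that a .docx file is a ZIP archive of XML parts. The following is a minimal sketch using only the standard library; docx_paragraph_texts is a hypothetical helper name, and it assumes the main part is stored at word/document.xml, which is the conventional location. It is not a substitute for reading the file with python-docx.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraph_texts(docx_file):
    """Return the text of each non-empty paragraph in a .docx file.

    `docx_file` may be a path or a binary file-like object.
    Assumption: the main part is stored at word/document.xml.
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    texts = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is spread over its <w:t> elements.
        text = "".join(t.text or "" for t in paragraph.iter(f"{{{W_NS}}}t"))
        if text:
            texts.append(text)
    return texts


# Usage with the file saved above:
# for line in docx_paragraph_texts('resources/demo.docx'):
#     print(line)
```

Paragraphs inside table cells are also w:p elements, so the table contents show up in the result along with the headings and body text.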

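In everyday work you may need to produce documents in bulk, and Python can take over the repetitive part. For example, to issue a batch of resignation certificates (离职证明), read a certificate template, substitute each person's details into it, and save one certificate per person, as the following code shows: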

from docx import Document
from docx.document import Document as Doc

employees = [
    {
        'name': '小龙',
        'id': '100200198011280001',
        'sdate': '2008年3月1日',
        'edate': '2012年2月29日',
        'department': '产品研发',
        'position': '架构师'
    },
    {
        'name': '小青',
        'id': '510210199012125566',
        'sdate': '2019年1月1日',
        'edate': '2021年4月30日',
        'department': '产品研发',
        'position': 'Python开发工程师'
    }
]


for emp_dict in employees:
    doc = Document('resources/离职证明模板.docx')  # type: Doc
    for p in doc.paragraphs:
        if '{' not in p.text:
            continue
        for run in p.runs:
            if '{' not in run.text:
                continue
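            # Replace the {key} placeholder in this run with the employee's value.
            # Only the first placeholder in a run is handled, and a placeholder
            # that Word has split across several runs is not matched.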
            start, end = run.text.find('{'), run.text.find('}')
            key, place_holder = run.text[start + 1:end], run.text[start:end + 1]
            run.text = run.text.replace(place_holder, emp_dict[key])
    doc.save(f'resources/{emp_dict["name"]}离职证明.docx')

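In the template, each spot to be filled in is marked with a placeholder wrapped in braces, such as {name}, and the placeholder name must match a key in the employee dictionary.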
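
Word sometimes stores one placeholder in several runs (for example "{na" and "me}"), in which case a run-by-run search does not find it. The following is a minimal sketch of one way around this: join the run texts, substitute every placeholder, and write the result back into the first run. fill_text and fill_runs are hypothetical helper names, and the SimpleNamespace objects are stand-ins for python-docx runs so that the sketch runs without a template file. The trade-off is that a paragraph whose placeholders were filled takes on the formatting of its first run.

```python
import re
from types import SimpleNamespace

# Matches {key} where key is made of word characters.
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def fill_text(text, values):
    """Replace every {key} in `text`; unknown keys are left untouched."""
    return PLACEHOLDER.sub(
        lambda m: str(values.get(m.group(1), m.group(0))), text
    )


def fill_runs(runs, values):
    """Fill placeholders in a paragraph's runs, even if one spans several runs.

    `runs` is any sequence of objects with a writable `.text` attribute,
    such as the `paragraph.runs` list from python-docx.
    """
    if not runs:
        return
    joined = "".join(run.text for run in runs)
    filled = fill_text(joined, values)
    if filled == joined:
        return  # nothing to replace; leave the runs and their formatting alone
    # Put the whole result into the first run and empty the others.
    runs[0].text = filled
    for run in runs[1:]:
        run.text = ""


# Stand-in runs (hypothetical) with a placeholder split across two runs.
runs = [SimpleNamespace(text="Name: {na"), SimpleNamespace(text="me}, ID: {id}")]
fill_runs(runs, {"name": "小龙", "id": "100200198011280001"})
print(runs[0].text)  # Name: 小龙, ID: 100200198011280001
```

In the loop above, this would mean calling fill_runs(p.runs, emp_dict) once per paragraph in place of the inner loop over runs.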

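Working with PDF Files

Reading a PDF and Extracting Text

In Python, the third-party PyPDF2 library can read PDF files. The example below prints the text of each page, then writes a copy in which every page is rotated 90 degrees clockwise and followed by a blank page.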

from PyPDF2 import PdfReader, PdfWriter

# PyPDF2 3.x names are used here. PyPDF2 1.x spelled them PdfFileReader,
# PdfFileWriter, numPages, getPage, extractText, rotateClockwise, addPage
# and addBlankPage; those names were removed in PyPDF2 3.0.
reader = PdfReader('resources/XGBoost.pdf')
writer = PdfWriter()
for current_page in reader.pages:
    print(current_page.extract_text())
    current_page.rotate(90)   # rotate 90 degrees clockwise
    writer.add_page(current_page)
    writer.add_blank_page()   # add a blank page after it
with open('resources/XGBoost-modified.pdf', 'wb') as file:
    writer.write(file)
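
The rotation call in the example above does not redraw the page. A PDF page carries a /Rotate entry, a clockwise angle that must be a multiple of 90 degrees, and rotating a page adds to that value. The following is a minimal sketch of the arithmetic; rotated_angle is a hypothetical helper for illustration, not a PyPDF2 function, and normalizing the result into the range 0 to 270 is a choice made here.

```python
def rotated_angle(current, delta):
    """Return a page's /Rotate value after turning it `delta` degrees clockwise.

    `current` is the page's existing rotation; a negative `delta` turns the
    page counter-clockwise.
    """
    if delta % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    # Normalize so that four quarter turns bring the page back to 0.
    return (current + delta) % 360


print(rotated_angle(0, 90))     # 90
print(rotated_angle(270, 90))   # 0
print(rotated_angle(0, -90))    # 270
```

One consequence is that running the script again on its own output turns each page a further 90 degrees, since the new angle is added to the one already stored.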

Adding a Password to a PDF File

The writer's encrypt method encrypts the PDF, so that anyone who wants to open the file must first enter the correct password.

from PyPDF2 import PdfReader, PdfWriter

# PyPDF2 3.x names (1.x: PdfFileReader, PdfFileWriter, numPages, getPage, addPage)
reader = PdfReader('resources/XGBoost.pdf')
writer = PdfWriter()
for page in reader.pages:
    writer.add_page(page)
# Encrypt the PDF with a password
writer.encrypt('123456')
with open('resources/XGBoost-encrypted.pdf', 'wb') as file:
    writer.write(file)
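
The password '123456' in the example is easy to guess. The following is a minimal sketch of generating a random password with the standard library's secrets module; generate_password is a hypothetical helper name, and the default length and the letters-plus-digits alphabet are assumptions made here, not PyPDF2 requirements.

```python
import secrets
import string


def generate_password(length=16):
    """Return a random password made of ASCII letters and digits."""
    if length < 8:
        raise ValueError("use at least 8 characters")
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically strong random source.
    return "".join(secrets.choice(alphabet) for _ in range(length))


password = generate_password()
print(password)  # different on every run
```

The result would be passed to writer.encrypt in place of the fixed string. It needs to be recorded somewhere safe, because the encrypted file cannot be opened without it.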

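Adding a Watermark to a PDF File

The idea is to merge the watermark page onto every page of the PDF that needs the watermark. The page method merge_page (mergePage in PyPDF2 1.x) overlays the content of one page on another.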

from PyPDF2 import PdfReader, PdfWriter

# PyPDF2 3.x names (1.x: PdfFileReader, PdfFileWriter, getPage, mergePage, addPage)
reader1 = PdfReader('resources/XGBoost.pdf')
reader2 = PdfReader('resources/watermark.pdf')
writer = PdfWriter()

# The first page of watermark.pdf serves as the watermark
watermark_page = reader2.pages[0]
for current_page in reader1.pages:
    # Overlay the watermark on the current page
    current_page.merge_page(watermark_page)
    writer.add_page(current_page)

with open('resources/XGBoost-watermarked.pdf', 'wb') as file:
    writer.write(file)


Added: 2021-08-15 15:29:40  Updated: 2021-08-15 15:30:41
