PDF Page Operations
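This note mainly uses PyPDF4. PyPDF2 is more popular, but at the time of writing it was no longer being maintained, and PyPDF4 was the most recent fork. Hopefully its author keeps maintaining it.
Install: pip install PyPDF4
GitHub: https://github.com/claird/PyPDF4
PyPI: https://www.cnpython.com/pypi/pypdf4
The current version is 1.27.0, and its API is essentially the same as PyPDF2's.
PyPDF2 documentation: https://pythonhosted.org/PyPDF2/
PyPDF4 has two main classes, PdfFileReader and PdfFileWriter. As the names suggest, the former reads PDFs and the latter writes them.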
PdfFileReader
Reading a PDF
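import os
from PyPDF4 import PdfFileReader
# os.path is a module, not a function; build the path with os.path.join
pdf_path = os.path.join("F:\\", "test.pdf")
# Pass a path (or a file object opened with 'rb'); the second positional
# argument of PdfFileReader is `strict`, not a file mode
pdf = PdfFileReader(pdf_path)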
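Some methods:
pdf.getDocumentInfo()   # document metadata (title, author, producer, ...)
pdf.getIsEncrypted()    # True if the file is encrypted
pdf.getNumPages()       # number of pages
pdf.getPage(index)      # page object at a 0-based index
pdf.getOutlines()       # outline (bookmark) tree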
PdfFileWriter
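Writing a PDF
from PyPDF4 import PdfFileWriter
output = PdfFileWriter()
# ... add pages to output here ...
# write() needs a file opened for binary writing ('wb'), not reading ('rb')
with open(r'F:\output.pdf', 'wb') as f:
    output.write(f)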
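Some methods:
output.addPage(page)                     # append a page object, e.g. from PdfFileReader.getPage
output.addBlankPage()                    # append a blank page; width/height default to the last page's size
output.addBookmark(title, pagenum)       # add an outline entry pointing at a 0-based page
output.cloneDocumentFromReader(reader)   # copy a whole document from a PdfFileReader
output.insertBlankPage(index=pos)        # insert a blank page at position pos
output.insertPage(page, pos)             # insert a page object at position pos
output.getNumPages()                     # number of pages added so far
output.getPage(index)                    # page at a 0-based index
output.getOutlines()
output.encrypt(user_pwd)                 # protect the output with a user password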
PdfFileMerger
A class for merging several PDF files; its main methods are merge and append. I haven't fully worked out how to use it yet.
Examples
Delete a specific page
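import os
from PyPDF4 import PdfFileWriter, PdfFileReader
# "F:" alone would make os.path.join produce a drive-relative path like "F:test.pdf"
path = "F:\\"
index = 1   # 0-based: 1 removes the second page
infile = PdfFileReader(os.path.join(path, 'test.pdf'))
output = PdfFileWriter()
# Copy every page except the one at `index`
for i in range(infile.getNumPages()):
    if i != index:
        p = infile.getPage(i)
        output.addPage(p)
with open(os.path.join(path, 'new_test.pdf'), 'wb') as f:
    output.write(f)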
Merge multiple PDFs
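import os
from PyPDF4 import PdfFileWriter, PdfFileReader
path = "F:\\"
out_name = 'new.pdf'
# Take only PDF files, in a predictable (sorted) order, and skip the output file itself
pdf_list = sorted(name for name in os.listdir(path)
                  if name.lower().endswith('.pdf') and name != out_name)
output = PdfFileWriter()
for pdf in pdf_list:
    infile = PdfFileReader(os.path.join(path, pdf))
    for i in range(infile.getNumPages()):
        p = infile.getPage(i)
        output.addPage(p)
with open(os.path.join(path, out_name), 'wb') as f:
    output.write(f)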