[Python Knowledge Base] Learning Python 3, Part 20: Regular Expressions

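This series of posts is based on the Python tutorial on Liao Xuefeng's official website. I read through the tutorial while at university and found it very good. Link: Liao Xuefeng's official website. If you want to learn Python systematically, please study on his site. My environment is Anaconda + PyCharm, and the Python version is Python 3.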

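1. Introduction

# Regular expression: a powerful tool for matching strings.
# Design idea: use a descriptive language to define a rule for strings. Any string that conforms to the rule is considered a match; otherwise the string is considered invalid.

# Example: how to check whether a string is a valid Email address:
# 1. Create a regular expression that matches Email addresses;
# 2. Match the user's input against that regular expression to decide whether it is valid.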
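The two steps above can be sketched as follows. This is a minimal illustration, not a complete Email validator: the pattern `EMAIL_PATTERN` and the helper `is_valid_email` are hypothetical names introduced here, and the pattern deliberately accepts only a simplified address form (letters, digits, `_` and `.` before the `@`, then a domain with at least one dot).

```python
import re

# Step 1: create a (deliberately simplified, hypothetical) Email pattern:
# allowed characters, an "@", a domain label, and one or more ".label" parts.
EMAIL_PATTERN = re.compile(r"^[0-9a-zA-Z_.]+@[0-9a-zA-Z\-]+(\.[0-9a-zA-Z\-]+)+$")


def is_valid_email(text):
    """Step 2: return True if text matches the simplified pattern."""
    return EMAIL_PATTERN.match(text) is not None


print(is_valid_email("someone@example.com"))  # True
print(is_valid_email("someone@example"))      # False (no ".label" part)
print(is_valid_email("not an email"))         # False (spaces are not allowed)
```

Real-world addresses allow many more forms than this pattern covers, so treat it only as a demonstration of the "define a rule, then match input against it" idea.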

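# For example, \d matches one digit, and \w matches one letter, digit, or underscore:
# a. "00\d" matches "008" but does not match "00A";
# b. "\d\d\d" matches "009";
# c. "\w\w\d" matches "py3";

# For example, . matches any single character (except a newline):
# a. "py." matches "pyc", "pyt", and so on.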
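The claims above can be checked directly. The sketch below uses `re.fullmatch`, which succeeds only when the entire string matches the pattern; the helper name `full_match` is an assumption made for this example.

```python
import re


def full_match(pattern, text):
    """Return True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


print(full_match(r"00\d", "008"))    # True
print(full_match(r"00\d", "00A"))    # False: "A" is not a digit
print(full_match(r"\d\d\d", "009"))  # True
print(full_match(r"\w\w\d", "py3"))  # True
print(full_match(r"py.", "pyc"))     # True
print(full_match(r"py.", "py"))      # False: "." needs one more character
```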

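# Matching a variable number of characters (each quantifier applies to the element just before it):
# a. * means any number of repetitions (including 0);
# b. + means at least one repetition;
# c. ? means 0 or 1 repetition;
# d. {n} means exactly n repetitions;
# e. {n,m} means from n to m repetitions.

# Example: \d{2}\s+\d{3,6}
# a. \d{2} matches exactly 2 digits, e.g. "52";
# b. \s matches one whitespace character (space, tab, etc.), so \s+ matches at least one whitespace character, e.g. " ";
# c. \d{3,6} matches 3 to 6 digits, e.g. "584520".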
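As a quick check of the example pattern, the sketch below tests a few strings against `\d{2}\s+\d{3,6}`. The sample strings and the name `results` are chosen here for illustration only.

```python
import re

pattern = r"\d{2}\s+\d{3,6}"

# Map each sample string to whether the whole string matches the pattern.
samples = ["52 584520", "52    584", "52584520", "5 584520", "52 58"]
results = {text: re.fullmatch(pattern, text) is not None for text in samples}

for text, ok in results.items():
    print(repr(text), ok)
# '52 584520' True   (2 digits, 1 space, 6 digits)
# '52    584' True   (several spaces are fine because of \s+)
# '52584520' False   (no whitespace between the two digit runs)
# '5 584520' False   (only 1 leading digit)
# '52 58' False      (only 2 trailing digits, at least 3 required)
```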

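# For more precise matching, use [] to specify a range of characters:
# a. [0-9a-zA-Z_] matches one digit, letter, or underscore;
# b. [0-9a-zA-Z_]+ matches a string made of at least one digit, letter, or underscore, e.g. "Py20";
# c. [a-zA-Z_][0-9a-zA-Z_]* matches a string that starts with a letter or underscore, followed by any number of digits, letters, or underscores;
# d. [a-zA-Z_][0-9a-zA-Z_]{0,19} limits the length to 1-20 characters (1 leading character plus up to 19 more);
# e. A|B matches A or B, e.g. (W|w)illard matches "Willard" or "willard";
# f. ^ marks the start of a line; ^\d means the string must start with a digit;
# g. $ marks the end of a line; \d$ means the string must end with a digit.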
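The rules above can be combined into a small checker. In this sketch the names `IDENTIFIER` and `is_identifier` are hypothetical, and the pattern is the length-limited one from item d, anchored with `^` and `$` so the whole string has to match.

```python
import re

# A letter or underscore, then up to 19 more letters, digits, or underscores.
IDENTIFIER = re.compile(r"^[a-zA-Z_][0-9a-zA-Z_]{0,19}$")


def is_identifier(name):
    """Return True if name is 1-20 characters and matches IDENTIFIER."""
    return IDENTIFIER.match(name) is not None


print(is_identifier("Py20"))     # True
print(is_identifier("_tmp"))     # True
print(is_identifier("2py"))      # False: starts with a digit
print(is_identifier("a" * 21))   # False: longer than 20 characters

# Alternation and anchors on their own:
print(re.match(r"^(W|w)illard$", "willard") is not None)  # True
print(re.match(r"^\d", "3rd") is not None)                # True: starts with a digit
print(re.search(r"\d$", "py3") is not None)               # True: ends with a digit
```

`re.search` is used for the last check because `re.match` only tries to match at the beginning of the string.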

# The re module:
import re

print("Match succeeded, returns a Match object:")
print(re.match(r"^\d{3}\-\d{3,8}$", "020-6722053"))
print("----------------------------------------------------")

print("Match failed, returns None:")
print(re.match(r"^\d{3}\-\d{3,8}$", "020 6722053"))
print("----------------------------------------------------")

user_input = input("Please enter a test string: ")

# The input must start with "W" or "w", followed by 1 to 10 word characters.
# A range quantifier is written {1,10}; {1-10} would be matched as literal text.
if re.match(r"^(W|w)\w{1,10}", user_input):
    print("It's OK.")
else:
    print("Failed.")

# Output:
Match succeeded, returns a Match object:
<re.Match object; span=(0, 11), match='020-6722053'>
----------------------------------------------------
Match failed, returns None:
None
----------------------------------------------------
Please enter a test string: Willard584520
It's OK.

2. Splitting Strings

import re

str_input = input("Please input test string: ")

# Split the string on runs of whitespace
print(re.split(r"\s+", str_input))

# Output:
# Please input test string: Hello Python.
# ['Hello', 'Python.']

import re

str_input = input("Please input test string: ")

# Split on runs of whitespace and commas
print(re.split(r"[\s\,]+", str_input))

# Output:
# Please input test string: Hello Willard,welcome to FUXI Technology.
# ['Hello', 'Willard', 'welcome', 'to', 'FUXI', 'Technology.']

import re

str_input = input("Please input test string: ")

# Split on runs of whitespace, commas, periods, and semicolons
print(re.split(r"[\s\,\.\;]+", str_input))

# Output:
# Please input test string: Hello;I am Willard.Welcome to FUXI Technology.
# ['Hello', 'I', 'am', 'Willard', 'Welcome', 'to', 'FUXI', 'Technology', '']
# The trailing '' appears because the input ends with a separator (the final ".").
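The last result ends with an empty string because the input ends with a separator. One common way to handle this, shown here as a sketch with a hypothetical helper named `split_words`, is to filter out empty pieces after splitting.

```python
import re


def split_words(text):
    """Split on runs of whitespace, commas, periods, or semicolons,
    dropping any empty pieces produced by leading or trailing separators."""
    return [piece for piece in re.split(r"[\s,.;]+", text) if piece]


print(split_words("Hello;I am Willard.Welcome to FUXI Technology."))
# ['Hello', 'I', 'am', 'Willard', 'Welcome', 'to', 'FUXI', 'Technology']
```

Inside a character class, `,`, `.`, and `;` have no special meaning, so they do not need to be escaped.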

3. Groups

# () marks a group to extract (Group)
# ^(\d{3})-(\d{3,8})$ defines two groups
import re

match_test = re.match(r"^(\d{3})-(\d{3,8})$", "020-6722053")
print("match_test:", match_test)
# group(0) is always the whole match; group(1), group(2), ... are the sub-groups
print("match_group(0):", match_test.group(0))
print("match_group(1):", match_test.group(1))
print("match_group(2):", match_test.group(2))
print("---------------------------------------------------------")

# The dots are escaped as \. so that they match a literal "." rather than any character
website_match_test = re.match(r"(\w{3})\.(\w{5})\.(\w{3})", "www.baidu.com")

print("website_match_test:", website_match_test)
print("website_match_test_group(0):", website_match_test.group(0))
print("website_match_test_group(1):", website_match_test.group(1))
print("website_match_test_group(2):", website_match_test.group(2))
print("website_match_test_group(3):", website_match_test.group(3))

# Output:
match_test: <re.Match object; span=(0, 11), match='020-6722053'>
match_group(0): 020-6722053
match_group(1): 020
match_group(2): 6722053
---------------------------------------------------------
website_match_test: <re.Match object; span=(0, 13), match='www.baidu.com'>
website_match_test_group(0): www.baidu.com
website_match_test_group(1): www
website_match_test_group(2): baidu
website_match_test_group(3): com
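When a pattern is meant to match a dotted host name, it matters whether the `.` is escaped, because an unescaped `.` matches any character. The sketch below compares the two forms; the names `loose` and `strict` are introduced here for illustration.

```python
import re

loose = re.compile(r"(\w{3}).(\w{5}).(\w{3})")      # "." matches any character
strict = re.compile(r"(\w{3})\.(\w{5})\.(\w{3})")   # "\." matches only a literal dot

# Both patterns extract the same groups from a dotted name.
print(loose.match("www.baidu.com").groups())   # ('www', 'baidu', 'com')
print(strict.match("www.baidu.com").groups())  # ('www', 'baidu', 'com')

# Only the loose pattern accepts other separators.
print(loose.match("www-baidu-com") is not None)  # True
print(strict.match("www-baidu-com"))             # None
```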

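4. Greedy Matching

# Greedy matching: match as many characters as possible. Quantifiers are greedy by default.
import re

string_input = input("Please input string: ")
print("Greedy matching:")
# \d+ is greedy and consumes the trailing zeros too, so 0* can only match the empty string
print(re.match(r"^(\d+)(0*)$", string_input).groups())
print("---------------------")

print("Non-greedy matching:")
# Adding ? makes \d+ non-greedy (match as few as possible), leaving the trailing zeros for 0*
print(re.match(r"^(\d+?)(0*)$", string_input).groups())

# Output:
Please input string: 1008600
Greedy matching:
('1008600', '')
---------------------
Non-greedy matching:
('10086', '00')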
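The same comparison can be wrapped in a function so that it runs without user input and does not fail on strings that contain no digits. This is only a sketch; `split_trailing_zeros` is a hypothetical helper name.

```python
import re


def split_trailing_zeros(digits, greedy):
    """Split a digit string into (number, trailing_zeros).

    With greedy=True the first group takes every digit; with greedy=False
    it takes as few digits as possible, leaving trailing zeros to the second group.
    Returns None when the string is not made up entirely of digits.
    """
    pattern = r"^(\d+)(0*)$" if greedy else r"^(\d+?)(0*)$"
    match = re.match(pattern, digits)
    return match.groups() if match else None


print(split_trailing_zeros("1008600", greedy=True))   # ('1008600', '')
print(split_trailing_zeros("1008600", greedy=False))  # ('10086', '00')
print(split_trailing_zeros("abc", greedy=True))       # None
```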

5. Compiling

# When a regular expression is used, the re module internally:
# a. compiles the regular expression, raising an error if the pattern string itself is invalid;
# b. uses the compiled regular expression to match the string.
# c. If a regular expression will be reused thousands of times, then for efficiency
# it can be precompiled; later uses skip the compile step and match directly.
import re

# Compile
re_telephone = re.compile(r"^(\d{3})-(\d{3,8})$")

# Use
telephone_input1 = input("Willard, please input your telephone number: ")
telephone_input2 = input("Chen, please input your telephone number: ")

print("match: 020-6722053,", re_telephone.match(telephone_input1).groups())
print("match: 020-6722066,", re_telephone.match(telephone_input2).groups())

# Output:
Willard, please input your telephone number: 020-6722053
Chen, please input your telephone number: 020-6722066
match: 020-6722053, ('020', '6722053')
match: 020-6722066, ('020', '6722066')
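A compiled pattern is most useful when it is applied to many inputs. The sketch below reuses one compiled pattern over a list and checks for `None` before calling `groups()`, since `match` returns `None` when the input does not fit. The sample numbers and the names `RE_TELEPHONE` and `parsed` are assumptions made for this example.

```python
import re

# Compile once, reuse for every input.
RE_TELEPHONE = re.compile(r"^(\d{3})-(\d{3,8})$")

numbers = ["020-6722053", "020-6722066", "020 6722053"]

parsed = {}
for number in numbers:
    match = RE_TELEPHONE.match(number)
    # Guard against None so an invalid number does not raise AttributeError.
    parsed[number] = match.groups() if match else None

for number, parts in parsed.items():
    print(number, "->", parts)
# 020-6722053 -> ('020', '6722053')
# 020-6722066 -> ('020', '6722066')
# 020 6722053 -> None
```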

Added: 2022-03-16 22:18:55 Updated: 2022-03-16 22:20:34
