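Study notes
Preface
These notes implement a pattern-matching chatbot. A pattern can match a single word, or several words at once.
I. What is a placeholder?
1. Definition
To define and test templates we need a special kind of symbol, called a "variable", which acts as a placeholder. For example, the target "I want X" can be written as "I want ?X", meaning that ?X is a symbol that holds a place. If the input is "I want holiday", then 'holiday' is the value of '?X'.
2. Detecting a placeholder: True or False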
def is_variable(pat):
    # A placeholder is '?' followed by letters only, e.g. '?X'
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

def pat_match(pattern, saying):
    # Guard against running off the end of either token list
    if not pattern or not saying: return False
    if is_variable(pattern[0]): return True
    else:
        if pattern[0] != saying[0]: return False
        else:
            return pat_match(pattern[1:], saying[1:])

print(pat_match('I want ?X'.split(), "I want holiday".split()))
print(pat_match('I have dreamed a ?X'.split(), "I dreamed about dog".split()))
True
False
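Note that pat_match above returns True as soon as it reaches a placeholder, so any tokens after the placeholder are never compared. A minimal sketch of a stricter check follows, assuming every placeholder stands for exactly one token. The name `pat_match_full` is hypothetical and not part of the original notes.

```python
def is_variable(pat):
    # Same test as in the notes: '?' followed by letters only
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

def pat_match_full(pattern, saying):
    """Return True only if every token of `saying` is accounted for by `pattern`."""
    # With one-token placeholders, both sides must have the same length
    if len(pattern) != len(saying):
        return False
    # Each pair must be either a placeholder or an identical literal
    return all(is_variable(p) or p == s for p, s in zip(pattern, saying))

print(pat_match_full('I want ?X'.split(), 'I want holiday'.split()))            # True
print(pat_match_full('I want ?X now'.split(), 'I want holiday later'.split()))  # False
```

The second call fails because the literal after the placeholder differs, which the early-return version would not notice.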
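3. Returning the content a placeholder stands for: getting the matched variable
This assumes the two strings have exactly the same structure, e.g. I want ?X ==> I want holiday
def pat_match_content(pattern, saying):
    # No placeholder was reached before one side ran out
    if not pattern or not saying: return False
    if is_variable(pattern[0]):
        return pattern[0], saying[0]
    else:
        if pattern[0] != saying[0]: return False
        else:
            return pat_match_content(pattern[1:], saying[1:])

pattern = 'I want ?X'.split()
saying = "I want holiday".split()
print(pat_match_content(pattern, saying))
('?X', 'holiday')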
4. When the sentence contains two placeholders
def pat_match_multivarible(pattern, saying):
    if not pattern or not saying: return []
    if is_variable(pattern[0]):
        # Record the binding and keep matching the remaining tokens
        return [(pattern[0], saying[0])] + pat_match_multivarible(pattern[1:], saying[1:])
    else:
        if pattern[0] != saying[0]: return []
        else:
            return pat_match_multivarible(pattern[1:], saying[1:])

print(pat_match_multivarible("?X greater than ?Y".split(), "3 greater than 2".split()))
[('?X', '3'), ('?Y', '2')]
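One thing to watch with pat_match_multivarible: it returns [] on a mismatch, and a mismatch that occurs after the first placeholder still leaves the earlier bindings in the result (matching against "3 less than 2" yields [('?X', '3')]). The sketch below shows one possible way to separate "failed" from "matched with no bindings" by returning None on failure. `match_strict` is a hypothetical name introduced only for illustration.

```python
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

def match_strict(pattern, saying):
    """Return a list of (placeholder, token) pairs, or None if the match fails."""
    if not pattern and not saying:
        return []            # both exhausted: success, nothing more to bind
    if not pattern or not saying:
        return None          # one side is longer than the other
    head_p, head_s = pattern[0], saying[0]
    if not is_variable(head_p) and head_p != head_s:
        return None          # literal mismatch
    rest = match_strict(pattern[1:], saying[1:])
    if rest is None:
        return None          # propagate failure instead of keeping partial bindings
    return ([(head_p, head_s)] if is_variable(head_p) else []) + rest

print(match_strict('?X greater than ?Y'.split(), '3 greater than 2'.split()))
# [('?X', '3'), ('?Y', '2')]
print(match_strict('?X greater than ?Y'.split(), '3 less than 2'.split()))
# None
```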
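II. Substituting placeholders
Since we already know what each placeholder stands for, we only need to replace the placeholders in a new sentence to obtain a sentence that contains the matched content.
1. Build a dictionary {'placeholder': 'content matched by the placeholder'}
def pat_to_dict(patterns):
    '''
    patterns is a list of (placeholder, content) pairs
    '''
    return {k: v for k, v in patterns}

print(pat_to_dict([('?X', 'iPhone')]))
{'?X': 'iPhone'}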
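2. Substituting into a sentence: a single placeholder
def subsitite(rule, parsed_rules):
    # Replace each token that has a binding; keep every other token as it is
    if not rule: return []
    else:
        return [parsed_rules.get(rule[0], rule[0])] + subsitite(rule[1:], parsed_rules)

# got_patterns is not defined in the original notes; the binding from the previous example is assumed
got_patterns = [('?X', 'iPhone')]
s_list = subsitite("What if you mean if you got a ?X".split(), pat_to_dict(got_patterns))
print(s_list)
['What', 'if', 'you', 'mean', 'if', 'you', 'got', 'a', 'iPhone']
' '.join(s_list)
'What if you mean if you got a iPhone'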
3. Substituting into a sentence: two placeholders
john_pat = pat_match_multivarible('?P needs ?X'.split(), "John needs vacation".split())
subsitite("Why does ?P need ?X ?".split(), pat_to_dict(john_pat))
' '.join(subsitite("Why does ?P need ?X ?".split(), pat_to_dict(john_pat)))
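The two steps above (bind the placeholders, then substitute them into a reply template) can also be written without recursion. The following is only a sketch under a simplifying assumption: pattern and input have the same number of tokens, and literals are not checked. The helper names `bind` and `fill` are hypothetical.

```python
def is_variable(tok):
    return tok.startswith('?') and all(c.isalpha() for c in tok[1:])

def bind(pattern, saying):
    # Pair each placeholder with the token at the same position.
    # Assumption: both strings have the same token count; literals are not verified.
    return {p: s for p, s in zip(pattern.split(), saying.split()) if is_variable(p)}

def fill(template, bindings):
    # Replace bound placeholders; leave every other token untouched
    return ' '.join(bindings.get(tok, tok) for tok in template.split())

bindings = bind('?P needs ?X', 'John needs vacation')
print(bindings)                                  # {'?P': 'John', '?X': 'vacation'}
print(fill('Why does ?P need ?X ?', bindings))   # Why does John need vacation ?
```

The lone '?' at the end of the template has no binding, so the dictionary lookup falls back to the token itself.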
4. Matching a placeholder against a segment of the sentence
Returning the content matched by each placeholder:
def is_variable(pat):
    return pat.startswith('?') and all(s.isalpha() for s in pat[1:])

def is_pattern_segment(pattern):
    # A segment placeholder such as '?*X' may stand for several tokens
    return pattern.startswith('?*') and all(a.isalpha() for a in pattern[2:])

def seqment_match_long(pattern, saying):
    saq_pat, rest = pattern[0], pattern[1:]
    saq_pat = saq_pat.replace("?*", "?")
    # A trailing segment placeholder takes everything that is left
    if not rest: return (saq_pat, saying), len(saying)
    for i, token in enumerate(saying):
        # End the segment where the next literal starts and the following literals agree
        if token == rest[0] and is_match(rest[1:], saying[(i + 1):]):
            return (saq_pat, saying[:i]), i
    return (saq_pat, saying), len(saying)

def is_match(rest, saying):
    # Pattern exhausted: it is a match only if the input is exhausted too
    if not rest:
        return not saying
    # Stop the look-ahead at the next placeholder (any token with a non-letter character)
    if not all(a.isalpha() for a in rest[0]):
        return True
    if not saying or rest[0] != saying[0]:
        return False
    return is_match(rest[1:], saying[1:])

def match_var_pattern(pattern, saying):
    if not pattern or not saying: return []
    pat = pattern[0]
    if is_variable(pat):
        match, index = (pat, saying[0]), 1
    elif is_pattern_segment(pat):
        match, index = seqment_match_long(pattern, saying)
    elif pat == saying[0]:
        return match_var_pattern(pattern[1:], saying[1:])
    else:
        return False
    rest = match_var_pattern(pattern[1:], saying[index:])
    # Propagate a failed match instead of concatenating a list with False
    return False if rest is False else [match] + rest

match_var_pattern('?*P is very good and ?*X'.split(), "My dog is very good and my cat is very cute".split())
[('?P', ['My', 'dog']), ('?X', ['my', 'cat', 'is', 'very', 'cute'])]
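seqment_match_long fixes the end of a segment at the first position whose look-ahead succeeds and never revisits that choice. As an alternative, here is a sketch of a backtracking matcher that tries every possible segment length and returns None when nothing fits. This is not the method used in the notes; `match_segments` and its return shape (a dict of token lists) are assumptions made for illustration.

```python
def is_segment(tok):
    return tok.startswith('?*') and tok[2:].isalpha()

def is_variable(tok):
    return tok.startswith('?') and tok[1:].isalpha()

def match_segments(pattern, saying, bindings=None):
    """Return {placeholder: [tokens]} for a full match, or None."""
    bindings = {} if bindings is None else bindings
    if not pattern:
        # Success only if the input has been consumed completely
        return bindings if not saying else None
    head, rest = pattern[0], pattern[1:]
    if is_segment(head):
        name = '?' + head[2:]
        # Try the shortest segment first, then grow it one token at a time
        for i in range(len(saying) + 1):
            result = match_segments(rest, saying[i:], {**bindings, name: saying[:i]})
            if result is not None:
                return result
        return None
    if not saying:
        return None
    if is_variable(head):
        return match_segments(rest, saying[1:], {**bindings, head: [saying[0]]})
    if head == saying[0]:
        return match_segments(rest, saying[1:], bindings)
    return None

print(match_segments('?*P is very good and ?*X'.split(),
                     'My dog is very good and my cat is very cute'.split()))
# {'?P': ['My', 'dog'], '?X': ['my', 'cat', 'is', 'very', 'cute']}
print(match_segments('?*X a ?Y b'.split(), 'p a q r a s b'.split()))
# {'?X': ['p', 'a', 'q', 'r'], '?Y': ['s']}
print(match_segments('I need ?*X'.split(), 'I like tea'.split()))
# None
```

The second call needs backtracking: the first 'a' in the input is a dead end, so the segment has to grow until the second 'a'.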
III. Pattern-based dialogue
1. Dialogue with one or two placeholders
Steps: find the content matched by each placeholder ==> substitute it into a reply template ==> answer with the new sentence (response)
defined_patterns = {
    "I need ?X": ["Imagine you will get ?X soon", "Why do you need ?X ?"],
    "My ?X told me something": ["Talk about more about your ?X", "How do you think about your ?X ?"],
}

from random import choice

def get_response(saying, rules):
    """Return a reply built from the first rule that matches `saying`, for example:
    get_response('I need iPhone', defined_patterns)
        -> 'Imagine you will get iPhone soon'
    get_response("My mother told me something", defined_patterns)
        -> 'Talk about more about your mother'
    """
    for rule in rules:
        john_pat = pat_match_multivarible(rule.split(), saying.split())
        if john_pat:
            return ' '.join(subsitite(choice(rules[rule]).split(), pat_to_dict(john_pat)))

# The reply template is picked at random, so the output below is one possible run
print(get_response('I need iPhone', defined_patterns))
print(get_response("My mother told me something", defined_patterns))
Imagine you will get iPhone soon
How do you think about your mother ?
2. Dialogue where a placeholder matches a segment
def pat_to_dict(patterns):
    # Segment placeholders bind to a list of tokens, so join them back into a string
    return {k: ' '.join(v) if isinstance(v, list) else v for k, v in patterns}

response_pair = {
    'I need ?*X': [
        "Why do you need ?X"
    ],
    "I dont like my ?*X": ["What bad things did ?X do for you?"]
}

def sequment_match(saying, response_pairs):
    # Iterate over the argument, not the global response_pair
    for question in response_pairs:
        pat = match_var_pattern(question.split(), saying.split())
        if pat:
            return ' '.join(subsitite(choice(response_pairs[question]).split(), pat_to_dict(pat)))
    return 'The rule set is too small: no matching pattern'

print(sequment_match("I need an iPhone", response_pair))
print(sequment_match("I dont like my study hobbies", response_pair))
print(sequment_match("I like an apple and a banana", response_pair))
Why do you need an iPhone
What bad things did study hobbies do for you?
The rule set is too small: no matching pattern
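3. Advanced version: making sure saying and rule have exactly the same form
That is, apart from the placeholders and the content they stand for, everything else should be identical. This avoids false matches when a placeholder appears at the start of the pattern: a leading segment placeholder could otherwise end up standing for every token of saying.
rules = {
    "?*X hello ?*Y": ["Hi, how do you do?"],
    "I was ?*X": ["Were you really ?X ?", "I already knew you were ?X ."]
}

def get_response(saying, rules):
    for question in rules:
        # Pre-check: every literal word of the rule must occur in the input string
        flag = True
        for i in question.split():
            if i.startswith("?*"):
                continue
            if i not in saying:
                flag = False
        if flag:
            pat = match_var_pattern(question.split(), saying.split())
            if pat:
                return ' '.join(subsitite(choice(rules[question]).split(), pat_to_dict(pat)))
    return 'The rule set is too small: no matching pattern'

# The reply template is picked at random, so the third line of output is one possible run
print(get_response("I am Mike, hello ", rules))
print(get_response("I am Chen, hello nice to meet you!", rules))
print(get_response("I was happy", rules))
print(get_response("I am mike, hi", rules))
Hi, how do you do?
Hi, how do you do?
Were you really happy ?
The rule set is too small: no matching pattern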
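The pre-check in get_response tests each literal word with `in` against the raw input string. That is a substring test, and it ignores word order. A possible refinement, sketched here under the assumption that both sides are split on whitespace, compares whole tokens and requires the literals to appear in the same order as in the rule. `literals_in_order` is a hypothetical helper name.

```python
def literals_in_order(pattern_tokens, saying_tokens):
    """True if every non-placeholder token of the pattern occurs in the saying, in order."""
    remaining = iter(saying_tokens)
    # `lit in remaining` consumes the iterator up to the first hit, which enforces the order
    return all(lit in remaining for lit in pattern_tokens if not lit.startswith('?'))

print(literals_in_order('?*X hello ?*Y'.split(), 'I am Mike, hello'.split()))  # True
print(literals_in_order('I was ?*X'.split(), 'was I happy'.split()))            # False
print('was' in 'I wasted time')                                                 # True (substring hit)
print(literals_in_order('I was ?*X'.split(), 'I wasted time'.split()))          # False
```

The last two lines show the difference: the substring test accepts 'was' inside 'wasted', while the token-level test does not.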
4. Matching placeholders against Chinese and simulating a dialogue
1. Segment the text with jieba
2. Match the placeholders against the Chinese tokens
3. Replace the placeholders in the reply with the content they stand for, and answer
The code is as follows (example):
import jieba
import re

def get_response_chinese(saying, response_rules):
    # Runs of Chinese characters are segmented by jieba; placeholders are left alone
    p = re.compile('[\u4e00-\u9fa5]+')
    def cut(text):
        return p.sub(lambda x: ' ' + ' '.join(jieba.lcut(x.group())) + ' ', text)
    # Segment the input once, outside the loop, so it is not re-segmented for every rule
    saying_cut = ' '.join(jieba.lcut(saying))
    for rule in response_rules:
        rule_cut = cut(rule)
        flag = True
        for i in rule_cut.split():
            if i.startswith('?*'):
                continue
            if i not in saying_cut:
                flag = False
        if flag:
            john_pat = match_var_pattern(rule_cut.split(), saying_cut.split())
            if john_pat:
                response_cut = cut(choice(response_rules[rule]))
                return ''.join(subsitite(response_cut.split(), pat_to_dict(john_pat))).replace(' ', '')

chinese_rules = {
    '?*x你好?*y': ['你好呀', '请告诉我你的问题'],
    '?*x我想?*y': ['你觉得?y有什么意义呢?', '为什么你想?y', '你可以想想你很快就可以?y了'],
    '?*x喜欢?*y': ['喜欢?y的哪里?', '?y有什么好的呢?', '你想要?y吗?'],
}

# The reply template is picked at random, so the output below is one possible run
print(get_response_chinese("老师你好有问题请教", chinese_rules))
print(get_response_chinese("未来我想成为一名算法工程师", chinese_rules))
print(get_response_chinese("夏天我喜欢游泳", chinese_rules))
请告诉我你的问题
为什么你想成为一名算法工程师
游泳有什么好的呢?
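The function above depends on jieba producing the same tokens for the rule and for the input. As a point of comparison only, here is a sketch that swaps the token-based matcher for a regular expression built from the rule, so no word segmentation is needed. It uses only the standard `re` module; the helper names `rule_to_regex` and `fill_template` are hypothetical, and the sketch assumes placeholder names consist of ASCII letters and are not directly followed by other ASCII letters in a reply template.

```python
import re

def rule_to_regex(rule):
    """Turn '?*x你好?*y' into a compiled regex with one named group per placeholder."""
    parts = re.split(r'(\?\*[A-Za-z]+)', rule)   # the capture group keeps the placeholders
    pieces = []
    for part in parts:
        if part.startswith('?*'):
            pieces.append('(?P<%s>.*?)' % part[2:])   # non-greedy segment
        else:
            pieces.append(re.escape(part))            # literal text, matched verbatim
    return re.compile('^' + ''.join(pieces) + '$')

def fill_template(template, groups):
    # Replace each '?name' in the reply with the text captured for that name
    return re.sub(r'\?([A-Za-z]+)', lambda m: groups.get(m.group(1), m.group(0)), template)

m = rule_to_regex('?*x我想?*y').match('未来我想成为一名算法工程师')
print(m.groupdict())                                   # {'x': '未来', 'y': '成为一名算法工程师'}
print(fill_template('为什么你想?y', m.groupdict()))    # 为什么你想成为一名算法工程师
```

Because the literal part of the rule is matched character by character, this variant sidesteps segmentation differences, at the cost of no longer working on word tokens.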
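Summary
1. In pattern matching, everything except the placeholders (that is, the literal words) must agree between the pattern and the input; otherwise matching a placeholder against a segment of the sentence easily raises an error.
2. Regular expressions remain a difficult point.
3. Matching Chinese text is also difficult.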