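Construct a lexical analyzer for a high-level language that simulates the lexical analysis process. The program must perform lexical analysis on an input character stream. Through the experiment, learn to apply the two methods of token recognition, NFA (nondeterministic finite automaton) and DFA (deterministic finite automaton), and deepen your understanding of how lexical analysis works.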
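To make the NFA/DFA part of the exercise concrete, here is a minimal sketch of the subset construction (NFA to DFA) applied to the identifier definition id = letter(letter|digit)*. The NFA is a hand-built, hypothetical one (state numbers 0 to 4 are arbitrary), and input characters are abstracted to the two classes `letter` and `digit`. It is an illustration under those assumptions, not the method used by the script in this post.

```python
from collections import deque

EPS = ""  # label used for epsilon transitions

# Hypothetical NFA for id = letter (letter | digit)*, input abstracted to
# the character classes "letter" and "digit". State 4 is the only accept state.
NFA = {
    0: {"letter": {1}},
    1: {EPS: {2, 4}},
    2: {"letter": {3}, "digit": {3}},
    3: {EPS: {2, 4}},
    4: {},
}
NFA_START, NFA_ACCEPT = 0, {4}
SYMBOLS = ["letter", "digit"]


def eps_closure(states):
    """All NFA states reachable from `states` using only epsilon moves."""
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in NFA[s].get(EPS, set()):
            if t not in closure:
                closure.add(t)
                stack.append(t)
    return frozenset(closure)


def move(states, sym):
    """NFA states reachable from `states` by one transition on `sym`."""
    out = set()
    for s in states:
        out |= NFA[s].get(sym, set())
    return out


def subset_construction():
    """Each DFA state is a set of NFA states; missing edges mean a dead state."""
    start = eps_closure({NFA_START})
    dfa, queue = {}, deque([start])
    while queue:
        d = queue.popleft()
        if d in dfa:
            continue  # already expanded (it may have been queued twice)
        dfa[d] = {}
        for sym in SYMBOLS:
            target = move(d, sym)
            if not target:
                continue
            nxt = eps_closure(target)
            dfa[d][sym] = nxt
            if nxt not in dfa:
                queue.append(nxt)
    accepting = {d for d in dfa if d & NFA_ACCEPT}
    return start, dfa, accepting


def char_class(ch):
    """Map a character to the abstract input symbol (ASCII letters/digits only)."""
    if ch.isascii() and ch.isalpha():
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    return None


def dfa_accepts(word):
    start, dfa, accepting = subset_construction()
    state = start
    for ch in word:
        sym = char_class(ch)
        if sym is None or sym not in dfa[state]:
            return False
        state = dfa[state][sym]
    return state in accepting


print(len(subset_construction()[1]))             # 3
print(dfa_accepts("abc3"), dfa_accepts("3abc"))  # True False
```

The resulting DFA has three states: the start state {0}, and two accepting states {1, 2, 4} and {2, 3, 4} that both loop on letters and digits, so a minimizer would merge the last two.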
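- The lexical analyzer recognizes keywords, operators, and delimiters.
- The remaining tokens are identifiers (id) and integer constants (num), defined by the following regular definitions: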
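  id = letter(letter|digit)*, num = digit digit*, where letter = a|…|z|A|…|Z and digit = 0|…|9. Lowercase and uppercase letters are distinct.
- Functions of the lexical analyzer:
  (1) The input is a source-program string written in the given grammar.
  (2) The output is the token sequence, with each token written as a two-tuple (token, category).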
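The two regular definitions translate directly into one DFA with a shared start state: a letter leads to an accepting id state that loops on letters and digits, and a digit leads to an accepting num state that loops on digits. A minimal sketch, assuming ASCII-only letter and digit classes; the names `TRANSITIONS`, `char_class`, and `longest_match` are hypothetical. It scans with the longest-match rule, so `12ab` yields the number `12` and leaves `ab` for the next token.

```python
# Hand-coded DFA for the two regular definitions:
#   id  = letter (letter | digit)*
#   num = digit digit*
# letter is a-z / A-Z and digit is 0-9 (ASCII only); case is preserved.
TRANSITIONS = {
    ("start", "letter"): "id",
    ("start", "digit"): "num",
    ("id", "letter"): "id",
    ("id", "digit"): "id",
    ("num", "digit"): "num",
}
ACCEPTING = {"id": "id", "num": "num"}  # accepting state -> token kind


def char_class(ch):
    if "a" <= ch <= "z" or "A" <= ch <= "Z":
        return "letter"
    if "0" <= ch <= "9":
        return "digit"
    return "other"


def longest_match(text, pos):
    """Run the DFA from text[pos]; return (lexeme, kind) for the longest
    accepted prefix, or None if no non-empty prefix is accepted."""
    state, last, i = "start", None, pos
    while i < len(text):
        state = TRANSITIONS.get((state, char_class(text[i])))
        if state is None:  # no transition: the DFA is stuck
            break
        i += 1
        if state in ACCEPTING:
            last = (text[pos:i], ACCEPTING[state])
    return last


print(longest_match("abc12+x", 0))  # ('abc12', 'id')
print(longest_match("12ab", 0))     # ('12', 'num')
print(longest_match("+x", 0))       # None
```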
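Example. Test data: the input file a.txt contains

int main() {int abc=3; }

The program should output:

| Token | Category |
|---|---|
| int | keyword |
| main | keyword |
| ( | delimiter |
| ) | delimiter |
| { | delimiter |
| int | keyword |
| abc | identifier |
| = | operator |
| 3 | integer |
| ; | delimiter |
| } | delimiter |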
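The expected output lists tokens in source order, one (token, category) pair per line. A minimal sketch of a scanner that produces that form follows; it is not the author's script. The keyword set is an assumed small subset, the operator and delimiter sets are copied from the script in this post, identifiers follow the regular definition above (no underscore), longer symbols are tried first so that `<=` stays one token, and any unmatched character is reported as `error`.

```python
import re

# Assumed subset of the keyword list; 'main' counts as a keyword, as in the sample.
KEYWORDS = {"int", "main", "return", "if", "else", "while", "for"}
# Operator and delimiter sets copied from the script in this post.
OPERATORS = {"+", "-", "*", "/", ":", ":=", "<", "<>", "<=", ">", ">=", "=", "%"}
DELIMITERS = {";", "(", ")", "#", "==", "{", "}", ",", "&", "[", "]", "'", "."}

# Longer symbols first so "<=" is not split into "<" and "=" (maximal munch).
_symbols = sorted(OPERATORS | DELIMITERS, key=len, reverse=True)
TOKEN_RE = re.compile(
    r"(?P<ws>\s+)"
    r"|(?P<word>[A-Za-z][A-Za-z0-9]*)"
    r"|(?P<num>[0-9]+)"
    r"|(?P<sym>" + "|".join(map(re.escape, _symbols)) + ")"
)


def tokenize(src):
    """Return (lexeme, category) pairs in source order."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:  # no rule matches: report one character and move on
            tokens.append((src[pos], "error"))
            pos += 1
            continue
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "ws":
            continue
        if kind == "word":
            tokens.append((lexeme, "keyword" if lexeme in KEYWORDS else "identifier"))
        elif kind == "num":
            tokens.append((lexeme, "integer"))
        else:
            tokens.append((lexeme, "operator" if lexeme in OPERATORS else "delimiter"))
    return tokens


for lexeme, category in tokenize("int main() {int abc=3; }"):
    print(lexeme, category)
```

Collecting lexemes with one `findall` per category, as the script below does, loses their relative order; a single left-to-right pass keeps it, which is what the two-tuple output format needs.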
import re
#Keywords: the 63 C++ keywords copied from Baidu Baike, plus 'main'
key_word = ['asm','do','if','return','typedef','auto','double','inline','short','typeid','bool',
    'dynamic_cast','int','signed','typename','break','else','long','sizeof','union','case',
    'enum','mutable','static','unsigned','catch','explicit','namespace','static_cast',
    'using','char','export','new','struct','virtual','class','extern','operator','switch',
    'void','const','false','private','template','volatile','const_cast','float','protected',
    'this','wchar_t','continue','for','public','throw','while','default','friend','register',
    'true','delete','goto','reinterpret_cast','try','main']
#Common library functions, listed so they are not reported as identifiers (16 so far)
function_word = ['cin','cout','scanf','printf','abs','sqrt','isalpha','isdigit','tolower','toupper',
    'strcpy','strlen','time','rand','srand','exit']
#Operators
operator = ['+','-','*','/',':',':=','<','<>','<=','>','>=','=','%']
#Delimiters
delimiters = [';','(',')','#','==','{','}',',','&','[',']',"'","."]
with open('cpp.txt', 'r') as file:
    #Preprocessing: strip comments, then strip string literals (a string is never an identifier)
    txt = ' '.join(file.readlines())
    deal_txt = re.sub(r'/\*(.|[\r\n])*?\*/|//.*', ' ', txt)
    #Apply the string substitution to deal_txt, not txt, or the comment removal is lost
    deal_txt = re.sub(r'\"(.|[\r\n])*?\"', ' ', deal_txt)
    deal_txt = deal_txt.strip()
    deal_txt = deal_txt.replace('\t', ' ').replace('\r', ' ').replace('\n', ' ')
#Lexical analysis; the identifier rule is extended to also accept '_'
keyword = []
funword = []
opeword = []
idword = []
numword = []
deword=[]
errword = []
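#Split the text into three kinds of lexemes
#Identifier-like words
pha = re.findall(r'[a-zA-Z_][a-zA-Z0-9_]*', deal_txt)
#Integer constants (\b stops digits inside identifiers such as a1 from being counted)
num = re.findall(r'\b\d+\b', deal_txt)
#Symbols: try longer symbols such as := and <= before single characters
symbol_pattern = '|'.join(re.escape(s) for s in sorted(operator + delimiters, key=len, reverse=True))
symbols = re.findall(symbol_pattern + r'|[^\w\s]', deal_txt)
#Separate keywords, library functions and user-defined identifiers
for p in pha:
    if p in key_word:
        keyword.append(p)
    elif p in function_word:
        funword.append(p)
    else:
        idword.append(p)
#Collect the integer constants
for n in num:
    numword.append(n)
#Separate operators and delimiters; any other symbol is reported as an error
for s in symbols:
    if s in operator:
        opeword.append(s)
    elif s in delimiters:
        deword.append(s)
    else:
        errword.append({s: 'ERROR'})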
print("Keywords:\n", keyword)
print("Functions:\n", funword)
print("Identifiers:\n", idword)
print("Integers:\n", numword)
print("Operators:\n", opeword)
print("Delimiters:\n", deword)
if errword:
    print("Others:\n", errword)
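The preprocessing step in the script removes comments first and string literals second, so a literal such as "http://x" is cut at its `//` before the string pattern ever sees it. One way around that is to match all three constructs in a single regex: `re.sub` scans left to right and takes the leftmost match, so whichever construct starts first wins. A minimal sketch, assuming only `//` comments, `/* */` comments, and double-quoted strings with backslash escapes need removing (character literals are not handled); the name `preprocess` is hypothetical.

```python
import re

# One pass: the leftmost block comment, line comment, or double-quoted string
# literal (with backslash escapes) is replaced first.
# Character literals such as '"' are NOT handled in this sketch.
NOISE_RE = re.compile(r'/\*.*?\*/|//[^\n]*|"(?:\\.|[^"\\\n])*"', re.S)


def preprocess(src):
    """Replace comments and string literals with spaces, then collapse whitespace."""
    return " ".join(NOISE_RE.sub(" ", src).split())


print(preprocess('s = "http://x"; t = 1; // note'))  # s = ; t = 1;
```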
Run results of the script: