A simple web page (saved as index.html):
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>我要自学网</title>
</head>
<body>
<h1>This is a level-1 heading</h1>
<h2>This is a level-2 heading</h2>
<p>This is a paragraph of text</p>
<img src="突破.jpg">
<div>
<ul>hello world</ul>
<ul>hello world</ul>
<ul>hello world</ul>
</div>
<h3>This is a level-3 heading</h3>
<div id="list">
<p>This is a paragraph of text</p>
<p>java</p>
<p class="hadoop">hadoop</p>
</div>
</body>
</html>
Scraping the page with lxml:
Install the package: pip3 install lxml
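pip3 can install into a different interpreter than the one PyCharm runs (for example, the project's venv), and the scraper then fails to import lxml. A minimal, standard-library-only check of whether lxml is importable from the interpreter actually running the script; the helper name `module_available` is hypothetical.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if `name` can be imported by the interpreter running this code."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The interpreter path shows which environment pip must install into.
    print("Interpreter:", sys.executable)
    print("lxml importable:", module_available("lxml"))
```

If it prints False, running `python -m pip install lxml` with that same interpreter path installs the package where the script can see it.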
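from lxml.html import fromstring

# Read the saved page (adjust the path to wherever your index.html lives)
with open('C:/Users/Administrator/PycharmProjects/pythonProject4/index.html', 'r', encoding='utf-8') as f:
    data = f.read()

selector = fromstring(data)  # parse the HTML string into an element tree

h1 = selector.xpath('//h1/text()')[0]          # text of the first <h1>
p = selector.xpath('//body/p/text()')[0]       # first <p> that is a direct child of <body>
div_ul = selector.xpath('//div/ul/text()')[0]  # text of the first <ul> inside a <div>
# Three ways to reach a <p> inside <div id="list">:
# div_p = selector.xpath('//div[@id="list"]/p/text()')[1]          # by Python list index: second <p>, "java"
# div_p = selector.xpath('//div[@id="list"]/p[last()]/text()')[0]  # last <p>: "hadoop"
div_p = selector.xpath('//div[@id="list"]/p[@class="hadoop"]/text()')[0]  # by class attribute: "hadoop"
print(h1, p, div_ul, div_p)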
# Project virtual environment: C:\Users\Administrator\PycharmProjects\pythonProject4\venv
# Page file: C:/Users/Administrator/PycharmProjects/pythonProject4/index.html
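A self-contained sketch of the same XPath queries as the scraping script, assuming lxml is installed. It parses an inlined, trimmed copy of the page instead of reading index.html from disk (the placeholder texts, `PAGE`, and the helper `extract` are hypothetical), and adds the XPath position form `p[2]` to show that XPath counts elements from 1 while a Python list index counts from 0.

```python
from lxml.html import fromstring

# A trimmed copy of the sample page, inlined so the sketch needs no file on disk.
PAGE = """
<html><body>
<h1>heading</h1>
<p>first paragraph</p>
<div>
<ul>hello world</ul>
</div>
<div id="list">
<p>first in list</p>
<p>java</p>
<p class="hadoop">hadoop</p>
</div>
</body></html>
"""


def extract(html: str) -> dict:
    """Run XPath queries like those in the scraping script and collect the results."""
    root = fromstring(html)
    return {
        "h1": root.xpath('//h1/text()')[0],
        "p": root.xpath('//body/p/text()')[0],
        "div_ul": root.xpath('//div/ul/text()')[0],
        # Python list index is 0-based: [1] picks the second <p>.
        "by_list_index": root.xpath('//div[@id="list"]/p/text()')[1],
        # XPath positions are 1-based: p[2] is also the second <p>.
        "by_xpath_position": root.xpath('//div[@id="list"]/p[2]/text()')[0],
        "by_last": root.xpath('//div[@id="list"]/p[last()]/text()')[0],
        "by_class": root.xpath('//div[@id="list"]/p[@class="hadoop"]/text()')[0],
    }


if __name__ == "__main__":
    for key, value in extract(PAGE).items():
        print(key, "->", value)
```

Note that `[0]` raises IndexError when a query matches nothing, so a typo in a tag name (or a malformed tag in the page) shows up as an exception rather than an empty value.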
If an error is reported, do the following: