1. ASCII (one byte per character) is a subset of UTF-8 (a variable-width encoding that uses 1-4 bytes per character).
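A quick way to see the difference in byte widths is a small Python check (just an illustration, not part of the assignment):

# ASCII characters still occupy a single byte when encoded as UTF-8
print(len('A'.encode('utf-8')))    # 1
# non-ASCII characters take more bytes (a common Chinese character takes 3)
print(len('中'.encode('utf-8')))   # 3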
2. After requesting a URL with urllib, the response object can be read just like an ordinary file.
*** Remember to decode() the bytes you read (see the short sketch below).
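A minimal sketch of that pattern, using the course's sample romeo.txt URL (swap in any URL you like):

import urllib.request

# urlopen() returns a file-like object; iterating over it yields one line at a time
fhand = urllib.request.urlopen('http://data.pr4e.org/romeo.txt')
for line in fhand:
    # each line arrives as bytes, so decode() it into a str before working with it
    print(line.decode().strip())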
3. Homework assignment 3: PY4E - Python for Everybody
Problem statement: In this assignment you will write a Python program that expands on http://www.py4e.com/code3/urllinks.py. The program will use urllib to read the HTML from the data files below, extract the href= values from the anchor tags, scan for a tag that is in a particular position relative to the first name in the list, follow that link and repeat the process a number of times, and report the last name you find.
Trying to extract the URL from the result with re made the request time out..
Calling soup('a') collects all the anchor tags into a list, so you can use [num] directly to grab the num-th item! (A small sketch follows.)
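Before the full assignment code, here is a small self-contained sketch of that indexing idea (the HTML snippet is made up for illustration):

from bs4 import BeautifulSoup
import re

html = '<a href="http://example.com/known_by_A.html">A</a><a href="http://example.com/known_by_B.html">B</a>'
soup = BeautifulSoup(html, 'html.parser')

tags = soup('a')                  # all anchor tags, indexable like a list
print(tags[1].get('href', None))  # the second tag's href, already a plain str

# by contrast, re.findall() always returns a list, so passing its result
# straight to the next urlopen() call would not work without taking [0] first
print(re.findall('h.*html', tags[1].get('href', None)))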
# To run this, download the BeautifulSoup zip file
# http://www.py4e.com/code3/bs4.zip
# and unzip it in the same directory as this file
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import ssl
import re
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
count = 7
url = 'http://py4e-data.dr-chuck.net/known_by_Kayam.html'
for i in range(count):
    # fetch the current page and parse it
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    # all anchor tags on the page, indexable like a list
    tags = soup('a')
    #url = re.findall('h.*html',tags[2].get('href', None))
    # the 18th link (index 17) becomes the next URL to follow
    url = tags[17].get('href', None)
    print(url)
Output:
* Monty was my correct answer (it is different for everyone).
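The answer is just the name embedded in the last URL that gets printed. One way to pull it out, assuming the final printed URL ends in known_by_Monty.html (that exact value is my own result, not a fixed part of the assignment):

import re

last_url = 'http://py4e-data.dr-chuck.net/known_by_Monty.html'
# grab whatever sits between 'known_by_' and '.html'
name = re.findall(r'known_by_(.*)\.html', last_url)[0]
print(name)   # Monty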