[python每日一练]--0009:找出html里的链接

发表于 10-20-2017 | 分类于编程， python ， python每日一练 |

159 字 | 1 分钟

| 本文被看了次

题目链接：https://github.com/Show-Me-the-Code/show-me-the-code
我的github链接：https://github.com/wjsaya/python_spider_learn/tree/master/python_daily
个人博客地址：https://wjsaya.github.io
第 0009 题：一个HTML文件，找出里面的链接。

思路：

打开html文件；
逐行读取文件;
通过正则表达式匹配http://之类的开头的链接即可。

代码：

#!/usr/bin/env python3
#coding: utf-8
#Auther: wjsaya
#第009题，一个HTML文件，找出里面的链接。
import re
import os
def analyze(file_name):
    #print (os.listdir())
    print (os.getcwd())
    line = open(file_name,'r',encoding='utf-8').read()
    R = (r'([hftps]+://[^\s]*)"')
    for i in  (re.findall(R, line)):
        print (i)
if __name__ == "__main__": 
    html = "./test.html"
    analyze(html)

效果图：

wjsaya

人生如逆旅，我亦是行人。

Github CSDN Mail

1. 思路：
2. 代码：
3. 效果图：

0%