example_link = r'https://www.football-data.co.uk/data.php'

example_html = get_html(example_link, bs=False)

example_html[:10], example_html[-10:]

('<HTML>\n\n<H', '\n</HTML>\n\n')

For testing purposes, we should keep real HTTP traffic to a minimum, so we cache pages that are used only for testing the library.

cache_html = cache(example_link, 'cache_example')

assert (CACHE_DIR/'cache_example').is_file()
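The caching idea used above can be illustrated with a minimal sketch. This is not the library's `cache` implementation: `cache_sketch`, its `fetch` and `cache_dir` parameters, and `fake_fetch` are hypothetical names introduced only for this example, and the download is replaced by a stand-in so the sketch runs offline.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def cache_sketch(url: str, name: str, fetch: Callable[[str], str], cache_dir: Path) -> str:
    """Return the HTML for `url`, reading it from `cache_dir / name` when that file exists."""
    path = cache_dir / name
    if path.is_file():
        # Cache hit: the stored copy is returned and no request is made.
        return path.read_text(encoding="utf-8")
    # Cache miss: fetch once, then store the result for later runs.
    html = fetch(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html


# Hypothetical stand-in for a real download; it records every call it receives.
calls: List[str] = []


def fake_fetch(url: str) -> str:
    calls.append(url)
    return "<HTML>\n<A href='https://example.com/'>x</A>\n</HTML>\n"


demo_dir = Path(tempfile.mkdtemp())
first = cache_sketch("https://example.com/", "cache_example", fake_fetch, demo_dir)
second = cache_sketch("https://example.com/", "cache_example", fake_fetch, demo_dir)
```

In this sketch the second call is served from disk, so `fake_fetch` runs only once. Passing the fetch function as a parameter is a design choice made here to keep the example testable without network access.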

example_bs = BeautifulSoup(example_html)

all_links = find_links_by_func(example_bs)
print(f'All links: {len(all_links)}')

All links: 209

absolute_in_site_links = find_links_by_func(example_bs, lambda link: 'www.football-data.co.uk' in link)
print(f'Absolute links in site: {len(absolute_in_site_links)}')

Absolute links in site: 102

print(f'return_href=True: {find_links_by_func(example_bs)[0]}')
print(f'return_href=False: {find_links_by_func(example_bs, return_href=False)[0]}')

return_href=True: https://www.football-data.co.uk/
return_href=False: <a href="https://www.football-data.co.uk/"><img alt="Football Betting - Football Results - Free Bets" border="0" src="https://www.football-data.co.uk/logo2.jpg"/></a>

number_links = find_links_by_pattern(example_bs, r'.+\d+.+')
print(f'Links containing at least one number: {len(number_links)}')

Links containing at least one number: 28
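For readers who want to see the filtering idea in isolation, here is a rough sketch of predicate-based and pattern-based link selection. It uses the standard library's `html.parser` instead of BeautifulSoup so that it is self-contained. The names `links_by_func_sketch` and `links_by_pattern_sketch` are hypothetical, the sample HTML is made up, and the use of `re.search` is an assumption. The library's own functions may behave differently.

```python
import re
from html.parser import HTMLParser
from typing import Callable, List


class HrefCollector(HTMLParser):
    """Collect the href value of every <a> tag, in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value is not None:
                    self.hrefs.append(value)


def links_by_func_sketch(html: str, func: Callable[[str], bool] = lambda link: True) -> List[str]:
    """Return all hrefs for which `func(href)` is true (every href by default)."""
    collector = HrefCollector()
    collector.feed(html)
    return [href for href in collector.hrefs if func(href)]


def links_by_pattern_sketch(html: str, pattern: str) -> List[str]:
    """Return all hrefs in which the regular expression `pattern` is found."""
    regex = re.compile(pattern)
    return links_by_func_sketch(html, lambda link: regex.search(link) is not None)


# Made-up sample page, used only to exercise the sketch.
sample = """
<a href="https://example.com/">home</a>
<a href="page.php">relative link</a>
<a href="https://example.com/data/2021/results.csv">results</a>
<a href="https://example.org/about">other site</a>
"""

all_sample_links = links_by_func_sketch(sample)
in_site_links = links_by_func_sketch(sample, lambda link: "example.com" in link)
numbered_links = links_by_pattern_sketch(sample, r".+\d+.+")
```

Building the pattern variant on top of the predicate variant keeps the parsing logic in one place. Note that the pattern `.+\d+.+` requires at least one character on each side of the digits.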

Scraping

`get_html`[source]

`cache`[source]

`find_links_by_func`[source]

`find_links_by_pattern`[source]
