Many useful datasets are not available in a ready-to-download format. Instead, they have to be collected from a variety of sub-pages. The following functions are utilities for dealing with these kinds of situations.
example_link = r'https://www.football-data.co.uk/data.php'

get_html

get_html(url, encoding='utf-8', bs=True)

Get the HTML code for a given URL. If bs=True (the default), return the parsed BeautifulSoup object instead of the raw string.

example_html = get_html(example_link, bs=False)

example_html[:10], example_html[-10:]
('<HTML>\n\n<H', '\n</HTML>\n\n')
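To make the two return modes concrete, here is a minimal sketch of how a function with this signature could be written. It is not the library's implementation: the name `get_html_sketch`, the use of `urllib`, and the choice of the `html.parser` backend are all assumptions made for illustration.

```python
from urllib.request import urlopen


def get_html_sketch(url, encoding='utf-8', bs=True):
    """Hypothetical stand-in for get_html: fetch a URL and decode the body."""
    with urlopen(url) as response:
        text = response.read().decode(encoding)
    if not bs:
        return text
    # Imported lazily so the raw-text path works without BeautifulSoup installed.
    from bs4 import BeautifulSoup
    # Assumption: the built-in 'html.parser' backend; the library may use another.
    return BeautifulSoup(text, 'html.parser')


# A data: URL keeps the demonstration offline; urllib handles this scheme itself.
demo_html = get_html_sketch('data:text/html,%3CHTML%3Ehello%3C%2FHTML%3E', bs=False)
print(demo_html)
```

With bs=False the caller gets a plain string, which is what makes slicing such as `example_html[:10]` possible in the example above.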

cache

cache(url, cache_name)

For testing purposes, we should reduce the amount of actual HTTP traffic, so we cache pages that are only used for testing the library.

cache_html = cache(example_link, 'cache_example')

assert (CACHE_DIR/'cache_example').is_file()
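The caching idea can be sketched as a read-through file cache: look for a file named after `cache_name`, and only download when it is missing. The sketch below is an assumption about the general technique, not the library's code. `SKETCH_CACHE_DIR`, `cache_sketch`, and the injected `fetch` argument are hypothetical names; the real `cache` takes only `url` and `cache_name`.

```python
import tempfile
from pathlib import Path

# Hypothetical cache location; the library's real CACHE_DIR is defined elsewhere.
SKETCH_CACHE_DIR = Path(tempfile.mkdtemp())


def cache_sketch(url, cache_name, fetch):
    """Return the cached text for cache_name, calling fetch(url) only on a miss."""
    cache_file = SKETCH_CACHE_DIR / cache_name
    if cache_file.is_file():
        return cache_file.read_text(encoding='utf-8')
    text = fetch(url)
    cache_file.write_text(text, encoding='utf-8')
    return text


calls = []


def fake_fetch(url):
    # Stands in for a real download so the example runs offline.
    calls.append(url)
    return '<HTML>cached page</HTML>'


first = cache_sketch('https://example.com/', 'demo_page', fake_fetch)
second = cache_sketch('https://example.com/', 'demo_page', fake_fetch)
print(first == second, len(calls))
```

The second call is served from disk, so the fetch function runs only once. Passing the fetcher as an argument is a choice made here so the sketch can be exercised without network access.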

find_links_by_func(html, func=None, return_href=True)

Iterate over all links of the given BeautifulSoup HTML object. Return a list of all links for which func returns True. If no func is given, return all links. If return_href=False, return a list of BeautifulSoup link objects instead of href strings.

example_bs = BeautifulSoup(example_html)

all_links = find_links_by_func(example_bs)
print(f'All links: {len(all_links)}')
All links: 209
absolute_in_site_links = find_links_by_func(example_bs, lambda link: 'www.football-data.co.uk' in link)
print(f'Absolute links in site: {len(absolute_in_site_links)}')
Absolute links in site: 102
print(f'return_href=True: {find_links_by_func(example_bs)[0]}')
print(f'return_href=False: {find_links_by_func(example_bs, return_href=False)[0]}')
return_href=True: https://www.football-data.co.uk/
return_href=False: <a href="https://www.football-data.co.uk/"><img alt="Football Betting - Football Results - Free Bets" border="0" src="https://www.football-data.co.uk/logo2.jpg"/></a>
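The filtering logic itself is small. As a dependency-free illustration, the sketch below collects hrefs with the standard library's `html.parser` instead of BeautifulSoup and then applies an optional predicate. `HrefCollector`, `filter_hrefs`, and the sample markup are made up for this example, and the sketch only covers the return_href=True behaviour.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href of every <a> tag, in document order."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href is not None:
                self.hrefs.append(href)


def filter_hrefs(html_text, func=None):
    """Return all hrefs in html_text, or only those for which func(href) is true."""
    collector = HrefCollector()
    collector.feed(html_text)
    if func is None:
        return collector.hrefs
    return [href for href in collector.hrefs if func(href)]


# Made-up sample markup: one absolute link, one relative link, one anchor without href.
sample = (
    '<a href="https://www.football-data.co.uk/">home</a>'
    '<a href="results.php">results</a>'
    '<a name="top">anchor without href</a>'
)
print(filter_hrefs(sample))
print(filter_hrefs(sample, lambda href: 'www.football-data.co.uk' in href))
```

As in the example above, the predicate receives the href string, so simple substring checks are enough to separate absolute in-site links from relative ones.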

find_links_by_pattern(html, pattern, return_href=True)

Iterate over all links of the given BeautifulSoup HTML object. Return a list of all links that match the given regex pattern. Patterns passed as strings are compiled to regex objects.

number_links = find_links_by_pattern(example_bs, r'.+\d+.+')
print(f'Links containing at least one number: {len(number_links)}')
Links containing at least one number: 28
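How a string pattern turns into a filter can be sketched on a plain list of hrefs. The helper name `filter_by_pattern` and the sample hrefs are hypothetical, and the use of `re.search` is an assumption: the library may use `match` or `fullmatch` instead.

```python
import re


def filter_by_pattern(hrefs, pattern):
    """Keep the hrefs that match pattern; plain strings are compiled first."""
    if isinstance(pattern, str):
        pattern = re.compile(pattern)
    # Assumption: re.search semantics. The library may use match or fullmatch instead.
    return [href for href in hrefs if pattern.search(href)]


# Made-up sample hrefs for illustration.
hrefs = ['data.php', 'season2021/results.csv', 'notes2.txt', '7']
numbered = filter_by_pattern(hrefs, r'.+\d+.+')
print(numbered)
```

Note that the pattern `.+\d+.+` requires at least one character on each side of the digits, so a link consisting only of a number, such as '7' here, is not matched.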