Many useful datasets are not available in a ready-to-download format; instead, they have to be collected across a variety of sub-pages. The following methods are utilities for dealing with these kinds of situations.
example_link = r'https://www.football-data.co.uk/data.php'
example_html = get_html(example_link, bs=False)
example_html[:10], example_html[-10:]
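To give an idea of how such a fetch helper could be put together, here is a minimal sketch assuming requests and BeautifulSoup; the name get_html_sketch and its bs flag only mirror the call above and are not the library's actual implementation.

import requests
from bs4 import BeautifulSoup

def get_html_sketch(url, bs=True):
    "Fetch `url` and return a parsed BeautifulSoup document, or the raw HTML text when `bs=False`."
    response = requests.get(url)
    response.raise_for_status()  # fail loudly on HTTP errors
    return BeautifulSoup(response.text, 'html.parser') if bs else response.text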
For testing purposes we want to reduce the amount of actual HTTP traffic, so we cache the sites that are only used for testing the library.
cached_html = cache(example_link, 'cache_example')
assert (CACHE_DIR/'cache_example').is_file()
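One possible shape for such a cache helper, sketched under the assumption that CACHE_DIR is a pathlib.Path pointing at a writable folder; the real `cache` may differ, but the idea is to hit the network only when the cached file is missing.

import requests
from pathlib import Path

CACHE_DIR = Path('cache')  # assumption: the library defines its own CACHE_DIR

def cache_sketch(url, name):
    "Return the HTML for `url`, downloading it only if `CACHE_DIR/name` does not exist yet."
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    target = CACHE_DIR / name
    if not target.is_file():
        target.write_text(requests.get(url).text)
    return target.read_text()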
example_bs = BeautifulSoup(example_html, 'html.parser')
all_links = find_links_by_func(example_bs)
print(f'All links: {len(all_links)}')
absolute_in_site_links = find_links_by_func(example_bs, lambda link: 'www.football-data.co.uk' in link)
print(f'Absolute links in site: {len(absolute_in_site_links)}')
print(f'return_href=True: {find_links_by_func(example_bs)[0]}')
print(f'return_href=False: {find_links_by_func(example_bs, return_href=False)[0]}')
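A predicate-based link finder like this could be written roughly as follows; this is a sketch under the assumption that it scans `<a>` tags and applies the filter to their href attribute, not the library's actual code.

def find_links_by_func_sketch(soup, func=lambda href: True, return_href=True):
    "Collect `<a>` tags whose href passes `func`; return href strings or the tags themselves."
    links = [a for a in soup.find_all('a', href=True) if func(a['href'])]
    return [a['href'] for a in links] if return_href else links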
number_links = find_links_by_pattern(example_bs, r'.+\d+.+')
print(f'Links containing at least one number: {len(number_links)}')
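The pattern-based variant can be sketched as a thin regex filter over the same idea; again this is an assumed implementation for illustration only.

import re

def find_links_by_pattern_sketch(soup, pattern, return_href=True):
    "Keep only `<a>` tags whose href matches the regular expression `pattern`."
    regex = re.compile(pattern)
    links = [a for a in soup.find_all('a', href=True) if regex.search(a['href'])]
    return [a['href'] for a in links] if return_href else links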