All things feature engineering
%load_ext autoreload
%autoreload 2
from pathlib import Path

import numpy as np
import pandas as pd
import torch
df = read_football_csv(Path('../data/football_data_uk/raw/germany/D1_1415.csv'))
df.head()
teams = extract_teams(df)
teams
Every team plays exactly once per gameday, so there should be n_teams/2 games per gameday.
df = add_gamedays(df)
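As a rough illustration of the n_teams/2 rule, here is a minimal sketch of how gamedays could be assigned. It is not the implementation of `add_gamedays`; the function name `add_gamedays_sketch`, the column names `HomeTeam`/`AwayTeam`, and the output column `gameday` are assumptions, and it presumes the rows are already in chronological order with no postponed matches.

```python
import pandas as pd

def add_gamedays_sketch(df, home_col="HomeTeam", away_col="AwayTeam"):
    """Hypothetical stand-in: number consecutive blocks of n_teams/2 games."""
    n_teams = len(set(df[home_col]) | set(df[away_col]))
    games_per_day = n_teams // 2
    out = df.reset_index(drop=True).copy()
    # Rows 0..games_per_day-1 form gameday 1, the next block gameday 2, and so on.
    out["gameday"] = out.index // games_per_day + 1
    return out

# Toy fixture list with 4 teams, i.e. 2 games per gameday.
toy = pd.DataFrame({
    "HomeTeam": ["A", "C", "B", "D"],
    "AwayTeam": ["B", "D", "C", "A"],
})
toy = add_gamedays_sketch(toy)

# Sanity check: every gameday holds n_teams/2 games.
games_per_gameday = toy.groupby("gameday").size()
```

A check like `games_per_gameday` is a cheap way to spot rescheduled matches, which would break the fixed-block assumption.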
- 3 points for the winning team
- 0 points for the losing team
- 1 point for each team when they draw
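The scoring rule above can be written as a small vectorized helper. This is only a sketch; the name `points_from_goals` is hypothetical and not part of the notebook's library.

```python
import numpy as np

def points_from_goals(home_goals, away_goals):
    """Return (home_points, away_points) arrays: 3 for a win, 1 for a draw, 0 for a loss."""
    home_goals = np.asarray(home_goals)
    away_goals = np.asarray(away_goals)
    draw = home_goals == away_goals
    home_pts = np.select([home_goals > away_goals, draw], [3, 1], default=0)
    away_pts = np.select([away_goals > home_goals, draw], [3, 1], default=0)
    return home_pts, away_pts

# Four toy results: away win, home win, and two draws.
home_pts, away_pts = points_from_goals([1, 2, 3, 0], [2, 1, 3, 0])
```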
- Extract all playing teams (sanity check)
- current_points: start with 0 points for each team
- home/away_points: empty list
- Go through the matches in order of date (= time of kickoff)
  - note current_points for both teams (their points before the match)
  - update current_points according to the match outcome
- add columns for home/away points
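The steps listed above could look roughly like the following. This is a sketch under assumptions, not the code behind `add_points`: the function name `add_points_sketch`, the input column names (`Date`, `HomeTeam`, `AwayTeam`, `FTHG`, `FTAG`), and the output column names `home_points`/`away_points` are all chosen for illustration.

```python
import pandas as pd

def add_points_sketch(df, home="HomeTeam", away="AwayTeam",
                      hg="FTHG", ag="FTAG", date="Date"):
    """Hypothetical stand-in: attach each team's points total before kickoff."""
    # Extract all playing teams and start everyone at 0 points.
    teams = sorted(set(df[home]) | set(df[away]))
    current_points = {team: 0 for team in teams}
    home_points, away_points = [], []

    # Walk through the matches in chronological order.
    ordered = df.sort_values(date, kind="stable")
    for h, a, g_h, g_a in zip(ordered[home], ordered[away], ordered[hg], ordered[ag]):
        # Note the points both teams hold before the match.
        home_points.append(current_points[h])
        away_points.append(current_points[a])
        # Update the running totals according to the outcome.
        if g_h > g_a:
            current_points[h] += 3
        elif g_h < g_a:
            current_points[a] += 3
        else:
            current_points[h] += 1
            current_points[a] += 1

    out = ordered.copy()
    out["home_points"] = home_points
    out["away_points"] = away_points
    return out

toy = pd.DataFrame({
    "Date": pd.to_datetime(["2014-08-22", "2014-08-23", "2014-08-30", "2014-08-31"]),
    "HomeTeam": ["A", "C", "B", "D"],
    "AwayTeam": ["B", "D", "C", "A"],
    "FTHG": [2, 0, 1, 2],
    "FTAG": [1, 0, 3, 2],
})
toy_points = add_points_sketch(toy)
```

Recording the totals before updating them keeps the feature free of leakage: a match never sees its own result.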
df = add_points(df)
df.tail()
df = add_positions(df)
df.head()
df = add_simple_diffs(df)
df.tail()
df.head()
result_df = pd.DataFrame(columns=['home', 'away'])
result_df['home'] = [1,2,3,0]
result_df['away'] = [2,1,3,0]
result_df['result'] = results_from_goals(result_df, 'home', 'away')
result_df.head()
assert (result_df.result.values == [1,-1,0,0]).all()
odds_df = result_df.copy()
odds_df[['odds_home', 'odds_draw', 'odds_away']] = df.loc[:3, ['B365H', 'B365D', 'B365A']]
odds_df.head()
create_profit_df(odds_df, 'odds_home', 'odds_draw', 'odds_away')
add_profit_cols(odds_df, 'odds_home', 'odds_draw', 'odds_away')
df = pd.DataFrame([[1,2,3], [4,5,6]], columns=['a', 'b', 'c'])
df
normalize_by_args(df.a, 5, 2.)
normalize_col(df.a)
ColumnNormalizer(df.loc[:,['a', 'b']]), ColumnNormalizer.from_df(df, ['a','b'])
norm = ColumnNormalizer.from_df(df, ['a','b'])
df.loc[:, norm.names] = norm(df.loc[:, norm.names])
df
# Compare floats with a tolerance instead of exact equality.
assert np.isclose(df.loc[:, ['a', 'b']].values.mean(), 0.), 'ColumnNormalizer should produce 0 mean.'
assert np.isclose(df.loc[:, ['a', 'b']].values.std(), 1.), 'ColumnNormalizer should produce 1. std.'
Computes the profit of a betting allocation. The loss is the negated profit, so minimizing the loss maximizes the profit.
odds_df = pd.DataFrame([[3.2, 3.25, 2.29], [1.91, 3.3, 3.93]], columns=['home', 'draw', 'away'])
odds_df.head()
allocations = torch.tensor([[.5, .2, .3], [.8, .05, .15]])
odds_loss(allocations, odds_df.values), odds_profit(allocations, odds_df.values)
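To make the relationship between profit and loss concrete, here is a minimal NumPy sketch. It is an assumed variant, not the definition of `odds_profit` or `odds_loss`: unlike the call above, it takes an extra one-hot `outcomes` argument, and the names `profit_sketch` and `loss_sketch` as well as the match results used below are hypothetical.

```python
import numpy as np

def profit_sketch(allocations, odds, outcomes):
    """Per-match profit: payout of the winning bet minus the total stake."""
    payout = (allocations * odds * outcomes).sum(axis=1)
    stake = allocations.sum(axis=1)
    return payout - stake

def loss_sketch(allocations, odds, outcomes):
    """Negated mean profit, so that minimizing the loss maximizes profit."""
    return -profit_sketch(allocations, odds, outcomes).mean()

allocations = np.array([[.5, .2, .3], [.8, .05, .15]])
odds = np.array([[3.2, 3.25, 2.29], [1.91, 3.3, 3.93]])
# Hypothetical results for illustration: home win, then away win.
outcomes = np.array([[1., 0., 0.], [0., 0., 1.]])

profit = profit_sketch(allocations, odds, outcomes)
loss = loss_sketch(allocations, odds, outcomes)
```

The same arithmetic carries over to `torch` tensors (with `dim=1` in place of `axis=1`), which is what makes the negated profit usable as a training loss.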