flexible_linear module¶
Regularized linear regression with custom training and regularization costs.
FlexibleLinearRegression is a scikit-learn-compatible linear regression estimator that allows specification of arbitrary training and regularization cost functions.
For a linear model
\[y \approx X W \,,\]
this model attempts to find \(W\) by minimizing
\[\textrm{cost}(X W - y) + C \cdot \textrm{reg\_cost}(W)\]
for given training data \(X, y\). Here \(C\) is the regularization strength and \(\textrm{cost}\) and \(\textrm{reg\_cost}\) are customizable cost functions (e.g., the squared \(\ell^2\) norm or the \(\ell^1\) norm).
Note: In practice, an intercept (bias coefficient) is fit as well; think of \(X\) in the above as having an extra column of 1’s.
Ideally, the cost functions should be convex and continuously differentiable.
We provide some cost functions: see l1_cost_func(), l2_cost_func(), and japanese_cost_func() (or the cost_func_dict dictionary). If you want to use a custom cost function, it should be of the form:
def custom_cost_func(z, **opts):
    # <code to compute cost and gradient>
    return cost, gradient
where cost is a float, gradient is an array of the same dimensions as z, and you may specify any number of keyword arguments.
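For example, a custom cost following this interface might be a scaled squared-error cost (a sketch; the name and the `weight` keyword are illustrative, not part of the module):

```python
import numpy as np

def scaled_l2_cost_func(z, weight=1.0):
    # Hypothetical custom cost: cost(z) = (weight / n) * sum(z_i^2),
    # with gradient (2 * weight / n) * z. `weight` is an illustrative
    # keyword option of the kind **opts is meant to carry.
    z = np.asarray(z, dtype=float)
    n = z.size
    cost = weight * np.sum(z ** 2) / n
    gradient = 2.0 * weight * z / n
    return cost, gradient
```
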
exception flexible_linear.FitError(message, res)[source]¶
Bases: Exception
Exception raised when fitting fails.
message¶
str – Error message.
res¶
scipy.optimize.OptimizeResult – Results returned by scipy.optimize.minimize. See the SciPy documentation on OptimizeResult for details.
class flexible_linear.FlexibleLinearRegression(C=1.0, cost_func='l2', cost_opts=None, reg_cost_func='l2', reg_cost_opts=None)[source]¶
Bases: sklearn.base.BaseEstimator
Regularized linear regression with custom training/regularization costs.
Parameters: - C (Optional[float]) – Nonnegative regularization coefficient. (Zero means no regularization.)
- cost_func (Optional[callable or str]) – Training cost function. If not callable, should be one of ‘l1’, ‘l2’, or ‘japanese’.
- cost_opts (Optional[dict]) – Parameters to pass to cost_func.
- reg_cost_func (Optional[callable or str]) – Regularization cost function. If not callable, should be one of ‘l1’, ‘l2’, or ‘japanese’.
- reg_cost_opts (Optional[dict]) – Parameters to pass to reg_cost_func.
coef_¶
ndarray – Weight vector of shape (n_features+1,). (coef_[0] is the intercept coefficient.)
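To make the fit problem concrete, here is a self-contained sketch of the objective under the default squared \(\ell^2\) costs, solved directly with scipy.optimize.minimize. This is not the module's implementation; in particular, whether the intercept coefficient is regularized is an assumption here (it does not matter for C=0):

```python
import numpy as np
from scipy.optimize import minimize

def l2_cost(z):
    # Squared l2 cost normalized by length, with its gradient.
    z = np.asarray(z, dtype=float)
    return np.sum(z ** 2) / z.size, 2.0 * z / z.size

def fit_objective(W, X1, y, C):
    # X1 is X with a prepended column of ones, so W[0] is the intercept.
    cost, grad = l2_cost(X1 @ W - y)
    reg, reg_grad = l2_cost(W)  # assumption: intercept regularized too
    return cost + C * reg, X1.T @ grad + C * reg_grad

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0]) + 0.5          # noiseless linear data
X1 = np.hstack([np.ones((len(X), 1)), X])    # intercept column
res = minimize(fit_objective, np.zeros(3), args=(X1, y, 0.0), jac=True)
# With C=0 this reduces to ordinary least squares on noiseless data,
# so res.x recovers approximately [0.5, 1.0, -2.0].
```
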
flexible_linear.cost_func_dict = {'l2': <function l2_cost_func>, 'japanese': <function japanese_cost_func>, 'l1': <function l1_cost_func>}¶
Dictionary of implemented cost functions.
flexible_linear.japanese_cost_func(z, eta=0.1)[source]¶
‘Japanese bracket’ cost and gradient
Computes cost and gradient for the cost function:
\[\mathrm{cost}(z) = \frac{\eta^2}{n} \sum_{i=1}^n \left( \sqrt{ 1 + \left( \frac{z_i}{\eta} \right)^2 } - 1 \right) \,.\]
This cost function interpolates componentwise between the squared \(\ell^2\) norm (for \(|z_i| \ll \eta\)) and the \(\ell^1\) norm (for \(|z_i| \gg \eta\)) and is thus useful for reducing the impact of outliers (or when dealing with heavy-tailed rather than Gaussian noise). Unlike the \(\ell^1\) norm, this cost function is smooth.
The key to understanding this is that the Japanese bracket
\[\langle z \rangle := \sqrt{ 1 + |z|^2 }\]satisfies these asymptotics:
\[\begin{split}\sqrt{ 1 + |z|^2 } - 1 \approx \begin{cases} \frac12 |z|^2 & \text{for $|z| \ll 1$} \\ |z| & \text{for $|z| \gg 1$} \end{cases} \,.\end{split}\]
Parameters: - z (ndarray) – Input vector.
- eta (Optional[float]) – Positive scale parameter.
Returns: Tuple[float, ndarray] – The cost and gradient (same shape as z).
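Following the formula above, this cost and its gradient can be sketched directly in NumPy (an illustrative reimplementation, not the module's source):

```python
import numpy as np

def japanese_cost(z, eta=0.1):
    # cost(z) = (eta^2 / n) * sum(sqrt(1 + (z_i/eta)^2) - 1);
    # differentiating each term gives gradient_i = z_i / (n * sqrt(1 + (z_i/eta)^2)).
    z = np.asarray(z, dtype=float)
    n = z.size
    root = np.sqrt(1.0 + (z / eta) ** 2)
    cost = eta ** 2 * np.sum(root - 1.0) / n
    gradient = z / (n * root)
    return cost, gradient
```

Note that the gradient is bounded by \(1/n\) in each component, which is what damps the influence of large outlier residuals.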
flexible_linear.l1_cost_func(z)[source]¶
Normalized \(\ell^1\) cost and gradient
\[\mathrm{cost}(z) = \frac{1}{n} ||z||_{\ell^1} = \frac{1}{n} \sum_{i=1}^n |z_i| \,.\]
Note: This cost is not differentiable. For a smooth alternative, see japanese_cost_func().
Parameters: z (ndarray) – Input vector.
Returns: Tuple[float, ndarray] – The cost and gradient (same shape as z).
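A sketch of this cost per the formula, using np.sign(z)/n as a subgradient (zero at the kink); the module's actual handling of the nondifferentiable point may differ:

```python
import numpy as np

def l1_cost(z):
    # Normalized l1 cost: cost(z) = sum(|z_i|) / n.
    # np.sign supplies a subgradient, since |z_i| is not
    # differentiable at z_i = 0.
    z = np.asarray(z, dtype=float)
    n = z.size
    return np.sum(np.abs(z)) / n, np.sign(z) / n
```
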