
Ridge and Lasso Regression

import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
from matplotlib.pylab import rcParams
from sklearn.linear_model import LinearRegression
rcParams['figure.figsize'] = 12, 10

# sample points between 60 and 300 degrees, converted to radians
x = np.array([i*np.pi/180 for i in range(60, 300, 4)])
np.random.seed(0)

# sine curve plus Gaussian noise as the target
y = np.sin(x)+np.random.normal(0,0.15, len(x))

data = pd.DataFrame(np.column_stack([x,y]), columns=['x', 'y'])
#plt.plot(data['x'], data['y'], '.')

# add polynomial features x^2 through x^15 as extra predictors
for i in range(2,16):
    colname='x_%d'%i
    data[colname]=data['x']**i

def linear_regression(data, power, models_to_plot):
    # fit a polynomial model of degree `power` and return [rss, intercept, coefficients]
    predictors=['x']
    
    if power >= 2:
        predictors.extend(['x_%d' % i for i in range(2, power+1)])
    
    # note: `normalize` was removed in scikit-learn 1.2; drop it (or scale the
    # features beforehand) when running this on a newer version
    linreg = LinearRegression(normalize=True)
    linreg.fit(data[predictors], data['y'])
    y_pred = linreg.predict(data[predictors])
    
    if power in models_to_plot:
        plt.subplot(models_to_plot[power])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for power: %d'%power)
        
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([linreg.intercept_])
    ret.extend(linreg.coef_)
    return ret
    
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['model_pow_%d'%i for i in range(1,16)]
coef_matrix_simple = pd.DataFrame(index=ind, columns=col)
models_to_plot = {1:231,3:232,6:233,9:234,12:235,15:236}

# fit models of increasing degree; six of them are also plotted
for i in range(1, 16):
    coef_matrix_simple.iloc[i-1, 0:i+2] = linear_regression(data, power=i, models_to_plot=models_to_plot)

pd.options.display.float_format='{:,.2g}'.format

print(coef_matrix_simple)
           
[Figure: simple linear regression fits for polynomial degrees 1, 3, 6, 9, 12 and 15]

It is clearly evident that the size of the coefficients increases exponentially with increasing model complexity. I hope this gives some intuition into why putting a constraint on the magnitude of the coefficients can be a good idea for reducing model complexity.

Let's try to understand this even better.

What does a large coefficient signify? It means that we are putting a lot of emphasis on that feature, i.e. the particular feature is a good predictor for the outcome. When it becomes too large, the algorithm starts modelling intricate relations to estimate the output and ends up overfitting the particular training data.
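This is exactly what ridge and lasso do: they keep the least-squares loss but add a penalty on coefficient size, and only the form of the penalty differs. A minimal sketch of the two objectives (the function names here are illustrative, not part of the code above):

import numpy as np

def ridge_penalized_rss(y_true, y_pred, coefs, alpha):
    # least-squares loss plus an L2 penalty on coefficient magnitudes
    return np.sum((y_true - y_pred)**2) + alpha * np.sum(coefs**2)

def lasso_penalized_rss(y_true, y_pred, coefs, alpha):
    # least-squares loss plus an L1 penalty on coefficient magnitudes
    return np.sum((y_true - y_pred)**2) + alpha * np.sum(np.abs(coefs))

scikit-learn scales these terms slightly differently (Lasso divides the squared-error term by 2*n_samples), but the idea is the same: the larger alpha is, the more heavily large coefficients are punished.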

Ridge Regression:

Key parameter: alpha (the regularization strength).

I hope this gives some sense of how alpha impacts the magnitude of the coefficients. One thing is for sure: any non-zero value of alpha gives coefficient magnitudes smaller than those of simple linear regression.

Keep in mind that normalizing the inputs is generally a good idea in every type of regression, and it should be used in the case of ridge regression as well.
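The code below uses the old normalize=True argument, which was removed in scikit-learn 1.2. On newer versions the recommended approach is to scale the features explicitly, for example in a Pipeline; a minimal sketch (note that StandardScaler standardizes by the standard deviation, so the exact coefficient values will differ from the old normalize behaviour):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import Ridge

# standardize the polynomial features, then fit ridge on the scaled data
model = make_pipeline(StandardScaler(), Ridge(alpha=0.01))
# model.fit(data[predictors], data['y'])
# coefficients (for the scaled features) live on the final step: model[-1].coef_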

The ridge experiment below reuses the data and polynomial features built above; the only additional import needed is Ridge:

from sklearn.linear_model import Ridge

def ridge_regression(data, predictors, alpha, models_to_plot={}):
    # fit a ridge model for the given alpha and return [rss, intercept, coefficients]
    # note: `normalize` was removed in scikit-learn 1.2; with newer versions,
    # scale the features with a StandardScaler in a Pipeline instead
    ridgereg = Ridge(alpha=alpha, normalize=True)
    ridgereg.fit(data[predictors], data['y'])
    y_pred = ridgereg.predict(data[predictors])
    
    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'], y_pred)
        plt.plot(data['x'], data['y'], '.')
        plt.title('Plot for alpha: %.3g'%alpha)
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([ridgereg.intercept_])
    ret.extend(ridgereg.coef_)
    return ret
    

predictors=['x']
predictors.extend(['x_%d'%i for i in range(2,16)])
alpha_ridge = [1e-15, 1e-10, 1e-8, 1e-4, 1e-3,1e-2, 1, 5, 10, 20]
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['alpha_%.2g'%alpha_ridge[i] for i in range(0,10)]
coef_matrix_ridge = pd.DataFrame(index=ind, columns=col)
models_to_plot = {1e-15:231, 1e-10:232, 1e-4:233, 1e-3:234, 1e-2:235, 5:236}

# fit a ridge model for each alpha; six of them are also plotted
for i in range(10):
    coef_matrix_ridge.iloc[i, :] = ridge_regression(data, predictors, alpha_ridge[i], models_to_plot)
           
[Figure: ridge regression fits for six of the ten alpha values]

Here we can clearly observe that as the value of alpha increases, the model complexity reduces.

Let's have a look at the values of the coefficients in the above models:
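The table below can be reproduced by printing the coefficient matrix with the same compact float formatting used earlier:

pd.options.display.float_format = '{:,.2g}'.format
print(coef_matrix_ridge)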

[Table: RSS, intercept and coefficients for each ridge alpha]

This straight away gives us the following inferences:

1. The RSS increases with increasing alpha, as model complexity reduces.

2. An alpha as small as 1e-15 gives a significant reduction in the magnitude of the coefficients. How? Compare the coefficients in the first row of this table with the last row of the simple linear regression table.

3. High alpha values can lead to significant underfitting. Note the rapid increase in RSS for values of alpha greater than 1 (a way to choose alpha is sketched after this list).

4. Though the coefficients are very small, they are NOT zero.
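Since a tiny alpha barely regularizes and a large one underfits, in practice alpha is usually chosen by cross-validation rather than read off a table. A minimal sketch with scikit-learn's RidgeCV (the alpha grid is illustrative, and ideally the features would be scaled first):

from sklearn.linear_model import RidgeCV

# try a grid of penalties and keep the one with the best cross-validated score
ridge_cv = RidgeCV(alphas=[1e-4, 1e-3, 1e-2, 1e-1, 1, 5, 10, 20], cv=5)
ridge_cv.fit(data[predictors], data['y'])
print(ridge_cv.alpha_)   # selected regularization strength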

Lasso:

from sklearn.linear_model import Lasso

def lasso_regression(data, predictors, alpha, models_to_plot={}):
    # fit a lasso model for the given alpha and return [rss, intercept, coefficients]
    # note: `normalize` was removed in scikit-learn 1.2; max_iter must be an integer
    lassoreg = Lasso(alpha=alpha, normalize=True, max_iter=100000)
    lassoreg.fit(data[predictors],data['y'])
    y_pred = lassoreg.predict(data[predictors])

    if alpha in models_to_plot:
        plt.subplot(models_to_plot[alpha])
        plt.tight_layout()
        plt.plot(data['x'],y_pred)
        plt.plot(data['x'],data['y'],'.')
        plt.title('Plot for alpha: %.3g'%alpha)
    rss = sum((y_pred-data['y'])**2)
    ret = [rss]
    ret.extend([lassoreg.intercept_])
    ret.extend(lassoreg.coef_)
    return ret
           

Notice the additional parameter defined in the lasso function: 'max_iter'. This is the maximum number of iterations the solver will run for if it doesn't converge earlier. The same parameter exists for Ridge as well, but here it had to be set higher than the default.
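The plots and table that follow come from driving this helper over a range of alpha values, just like the ridge loop above; a sketch of that loop (the alpha values and subplot mapping are illustrative choices, not taken from this post):

alpha_lasso = [1e-15, 1e-10, 1e-8, 1e-5, 1e-4, 1e-3, 1e-2, 1, 5, 10]
col = ['rss','intercept'] + ['coef_x_%d'%i for i in range(1,16)]
ind = ['alpha_%.2g'%alpha_lasso[i] for i in range(0,10)]
coef_matrix_lasso = pd.DataFrame(index=ind, columns=col)

# plot six of the ten fits, including alpha=1 (the flat line discussed below)
models_to_plot = {1e-10:231, 1e-5:232, 1e-4:233, 1e-3:234, 1e-2:235, 1:236}

for i in range(10):
    coef_matrix_lasso.iloc[i, :] = lasso_regression(data, predictors, alpha_lasso[i], models_to_plot)

print(coef_matrix_lasso)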

[Figure: lasso regression fits for increasing values of alpha]

This again tells us that model complexity decreases as alpha increases. But notice the straight line at alpha = 1.

[Table: RSS, intercept and coefficients for each lasso alpha]

Apart from the expected inference of higher RSS for higher values of alpha, we can see the following:

1. For the same values of alpha, the coefficients of lasso regression are much smaller than those of ridge regression (compare row 1 of the two tables).

2. For the same alpha, lasso has a higher RSS (a poorer fit) than ridge regression.

3. Many of the coefficients are exactly zero even for very small values of alpha (a quick check is sketched below).
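To make the sparsity explicit, count the exactly-zero coefficients per model. This assumes the coef_matrix_lasso layout sketched above, with rss and intercept in the first two columns and the 15 coefficients after them:

# number of zero coefficients in each of the ten lasso models
print(coef_matrix_lasso.apply(lambda row: (row.iloc[2:] == 0).sum(), axis=1))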
