5. Regularized Linear Regression in Theano

The aim of this IPython notebook is to show some features of the Python Theano library in the field of machine learning. Theano has been developed by the LISA group at the University of Montreal (see: http://deeplearning.net/software/theano/). The notebook also relies on other standard Python libraries such as numpy, pandas and matplotlib.

To exemplify the use of Theano, this notebook solves the assignments of the Machine Learning MOOC provided by Coursera (see: https://www.coursera.org/learn/machine-learning) and taught at Stanford University by Andrew Ng (see: http://www.andrewng.org/).

The original MOOC assignments are meant to be programmed in the Octave language (see: https://www.gnu.org/software/octave/). The idea behind this notebook is to provide Python developers with interesting examples programmed using Theano.

This notebook has been developed using the Anaconda Python 3.4 distribution provided by Continuum Analytics (see: https://www.continuum.io/). It requires the Jupyter Notebook (see: http://jupyter.org/).

5.1 Regularized Linear Regression

5.1.1 Visualizing the dataset

In [1]:
import theano
import numpy as np
import theano.tensor as T
import scipy.io as sio

# Load the dataset used in the Coursera exercise (training, validation and test sets)
data = sio.loadmat('data/ex5data1.mat')
X, Y = data['X'], data['y'].T[0]
Xval, Yval = data['Xval'], data['yval'].T[0]
Xtest, Ytest = data['Xtest'], data['ytest'].T[0]

# Mean and standard deviation of the training set (reused later to normalize plot inputs)
mu = X.mean()
sigma = X.std()
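A quick check of what has been loaded never hurts; a minimal sketch (the training set of this exercise contains 12 examples, the exact sizes of the validation and test sets depend on the ex5data1.mat file):

# Print the shape of each array loaded from ex5data1.mat
for name, arr in [('X', X), ('Y', Y), ('Xval', Xval),
                  ('Yval', Yval), ('Xtest', Xtest), ('Ytest', Ytest)]:
    print(name, arr.shape)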
In [2]:
import matplotlib.pyplot as plt
%matplotlib inline
plt.figure(figsize=(8,6))
plt.scatter(X, Y, s=40, marker='x', color='r')
plt.xlabel('Change in water level (x)', fontsize=9)
plt.ylabel('Water flowing out of the dam (y)', fontsize=9)
plt.axis((-50, 40, 0, 40));

5.1.2 Regularized linear regression cost function and gradient

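For reference, the cost implemented in the next cell is the regularized linear regression cost used throughout the MOOC,

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(\theta^{T}x^{(i)} - y^{(i)}\right)^{2} + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_{j}^{2}.$$

The bias term $\theta_0$ is left out of the regularization sum, which is why the code only regularizes theta[1:]. The gradient does not need to be derived by hand: T.grad(cost, theta) lets Theano compute it symbolically.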
In [3]:
m = X.shape[0]
X1 = np.c_[np.ones(m), X]
X1 = X1.T

# Choose some alpha and lambda values
alpha = 0.1
lambda_ = 1.0

# Init Theta and Run Gradient Descent 
t = np.ones(2)
theta = theano.shared(t,name='theta')

x = T.matrix('x')
y = T.vector('y')

prediction = T.dot(theta,x)
reg = lambda_ * T.sum(theta[1:]**2) /(2*m)
cost = T.sum(T.pow(prediction-y,2))/(2*m) + reg
grad = T.grad(cost,theta)

# Calling train() also performs one gradient-descent step; the returned cost and
# gradient are evaluated at the current theta, before the update is applied.
train = theano.function([x, y], [cost, grad], updates=[(theta, theta - alpha*grad)])

costM, gradM = train(X1, Y)
print('Cost at theta = [1 ; 1]: %f\n' % costM)
print('Gradient at theta = [1 ; 1]:\n')
print(gradM)
Cost at theta = [1 ; 1]: 303.993192

Gradient at theta = [1 ; 1]:

[ -15.30301567  598.25074417]

5.1.3 Fitting linear regression

In [4]:
def linear_fit(X, Y, alpha, lambda_, iterations):
    m = Y.shape[0]
    
    # normalize data
    mu = X.mean()
    sigma = X.std()
    if sigma == 0:
        sigma = 1
    
    Xnorm = np.c_[np.ones(m), (X - mu)/sigma] # Add intercept term to X
    Xnorm = Xnorm.T

    # Init Theta and Run Gradient Descent 
    t = np.ones(Xnorm.shape[0])
    theta = theano.shared(t,name='theta')

    x = T.matrix('x')
    y = T.vector('y')

    prediction = T.dot(theta,x)
    reg = lambda_ * T.sum(theta[1:]**2) /(2*m)
    cost = T.sum(T.pow(prediction-y,2))/(2*m) + reg
    grad = T.grad(cost, theta)

    train = theano.function([x, y], cost, updates = [(theta, theta-alpha*grad)])
    test = theano.function([x], prediction)

    for i in range(iterations):
        costM = train(Xnorm, Y)
        if i == 0:
            costM0 = costM  # cost at the initial theta (outputs are computed before the update)

    # initial cost, final cost, fitted parameters and the compiled prediction function
    return costM0, costM, theta.get_value(), test
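As a sanity check for the unregularized case, the result of linear_fit can be compared with an ordinary least-squares fit on the same normalized design matrix; this is only a sketch, and the two solutions should agree closely (though not exactly, since gradient descent is stopped after a finite number of iterations):

# Least-squares solution on the normalized design matrix (lambda = 0)
Xnorm_check = np.c_[np.ones(X.shape[0]), (X - mu)/sigma]
theta_ls = np.linalg.lstsq(Xnorm_check, Y)[0]
print(theta_ls)  # expected to be close to the theta returned by linear_fit below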
In [5]:
lambda_ = 0.0
costM0, costM, theta, test = linear_fit(X, Y, 0.1, lambda_, 5000)
print("Cost(%i):" % 0)
print(costM0)
print("\nCost(%i):" % 5000)
print(costM)
print(theta)
Cost(0):
120.18541225445381

Cost(5000):
22.373906495108912
[ 11.21758933  10.5511193 ]
In [6]:
# Plot the linear fit
a = np.linspace(-48,38,2)
b = test(np.matrix([np.ones(a.shape), (a - mu)/sigma]))
plt.figure(figsize=(8,6))
plt.plot(a,b)
plt.scatter(X, Y, s=40, marker='x', color='r')
plt.xlabel('Change in water level (x)', fontsize=9)
plt.ylabel('Water flowing out of the dam (y)', fontsize=9)
plt.axis((-50, 40, -5, 40));

5.2 Bias-variance

5.2.1 Learning curves

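The learning curve is built by fitting the model on the first $i$ training examples only and then evaluating two errors (here with $\lambda = 0$): the training error on those $i$ examples and the cross-validation error on the full validation set,

$$J_{train}(\theta) = \frac{1}{2i}\sum_{k=1}^{i}\left(\theta^{T}x^{(k)} - y^{(k)}\right)^{2}, \qquad J_{cv}(\theta) = \frac{1}{2m_{val}}\sum_{k=1}^{m_{val}}\left(\theta^{T}x_{val}^{(k)} - y_{val}^{(k)}\right)^{2}.$$

This is what the loop below computes with linear_fit and the normalized validation set.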
In [7]:
m_val = Xval.shape[0]

error_train = np.zeros(m)
error_val = np.zeros(m)

# normalize val data
mu_val = Xval.mean()
sigma_val = Xval.std()

Xnorm_val = np.c_[np.ones(m_val), (Xval - mu_val)/sigma_val] # Add intercept term to X
Xnorm_val = Xnorm_val.T

print('# Training Examples\tTrain Error\tCross Validation Error')
for j in range(0, m):
    costM0, costM, theta, test = linear_fit(X[:j+1], Y[:j+1], 0.1, 0.0, 5000)

    error_train[j] = costM
    error_val[j] = np.sum((test(Xnorm_val)-Yval)**2)/(2*m_val)
    
    print('\t%i\t\t%f\t%f' % (j+1, costM, error_val[j]))
# Training Examples	Train Error	Cross Validation Error
	1		0.000000	129.097923
	2		0.000000	139.329200
	3		3.286595	40.663675
	4		2.842678	66.320945
	5		13.154049	46.151639
	6		19.443963	37.629229
	7		20.098522	32.889251
	8		18.172859	31.982048
	9		22.609405	31.102812
	10		23.261462	31.846804
	11		24.317250	32.067095
	12		22.373906	29.778369
In [8]:
# Plot the learning curve
fig = plt.figure(figsize=(8,6))
plt.plot(range(1,m+1),error_train, label='Train')
plt.plot(range(1,m+1),error_val, label='Cross Validation')
fig.suptitle('Learning curve for linear regression')
plt.xlabel('Number of training examples', fontsize=9)
plt.ylabel('Error', fontsize=9)
plt.legend(loc=1, fontsize=9)
plt.yticks(range(0,200,50))
plt.axis((0, 13, 0, 150));

5.3 Polynomial regression

In [9]:
# map X to X, X^2, X^3... X^p
def poly_features(X, p):
    poly = np.concatenate([np.power(X, i) for i in range(1, p+1)])
    return poly.reshape(p, X.shape[0]).T
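To illustrate what poly_features produces, here is a small sketch on a hypothetical 1-D input (each row holds the successive powers of one sample):

# Powers 1 to 3 of the samples 1, 2 and 3
print(poly_features(np.array([1., 2., 3.]), 3))
# [[  1.   1.   1.]
#  [  2.   4.   8.]
#  [  3.   9.  27.]]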

5.3.1 Learning Polynomial Regression

In [10]:
def poly_fit(X, Y, power, alpha, lambda_, iterations):
    m = Y.shape[0]
    
    # normalize data
    mu = X.mean()
    sigma = X.std()
    if sigma == 0:
        sigma = 1
    
    # applying polynomial features
    Xpoly = poly_features((X - mu)/sigma, power)

    Xnorm = np.c_[np.ones(m), Xpoly] # Add intercept term to X
    Xnorm = Xnorm.T

    # Init Theta and Run Gradient Descent 
    t = np.ones(Xnorm.shape[0])
    theta = theano.shared(t,name='theta')

    x = T.matrix('x')
    y = T.vector('y')

    prediction = T.dot(theta,x)
    reg = lambda_ * T.sum(theta[1:]**2) /(2*m)
    cost = T.sum(T.pow(prediction-y,2))/(2*m) + reg
    grad = T.grad(cost, theta)

    train = theano.function([x, y], cost, updates = [(theta, theta-alpha*grad)])
    test = theano.function([x], prediction)

    for i in range(iterations):
        costM = train(Xnorm, Y)
        if i == 0:
            costM0 = costM  # cost at the initial theta (outputs are computed before the update)

    # initial cost, final cost, fitted parameters and the compiled prediction function
    return costM0, costM, theta.get_value(), test
In [11]:
lambda_ = 0.0
costM0, costM, theta, test = poly_fit(X, Y, 8, 0.01, lambda_, 5000)
print("Cost(%i):" % 0)
print(costM0)
print("\nCost(%i):" % 5000)
print(costM)
print(theta)
Cost(0):
93.06341791682976

Cost(5000):
0.3691919119459248
[ 3.24244639  6.8669033   6.78717721  3.78369813  3.2658854   0.63889527
  0.07932768 -0.68892754 -0.66419387]
In [12]:
# Plot the polynomial fit
a = np.linspace(-60,60,30)
a_poly = poly_features((a - mu)/sigma, 8)
b = test(np.c_[np.ones(a_poly.shape[0]), a_poly].T)
fig = plt.figure(figsize=(8,6))
plt.plot(a,b, ls='--')
plt.scatter(X, Y, s=40, marker='x', color='r')
fig.suptitle('Polynomial Regression Fit (lambda = %f)' % lambda_)
plt.xlabel('Change in water level (x)', fontsize=9)
plt.ylabel('Water flowing out of the dam (y)', fontsize=9)
plt.axis((-80, 80, -60, 40));

5.3.2 Learning curve

In [13]:
def learning_curve(X, Y, Xval, Yval, power, alpha, lambda_, iterations):
    m = Y.shape[0]
    m_val = Yval.shape[0]

    error_train = np.zeros(m)
    error_val = np.zeros(m)

    # normalize val data
    mu_val = Xval.mean()
    sigma_val = Xval.std()

    # applying polynomial features
    Xpoly_val = poly_features((Xval - mu_val)/sigma_val, power)

    Xnorm_val = np.c_[np.ones(m_val), Xpoly_val] # Add intercept term to X
    Xnorm_val = Xnorm_val.T

    print('# Training Examples\tTrain Error\tCross Validation Error')
    for j in range(0, m):

        costM0, costM, theta, test = poly_fit(X[:j+1], Y[:j+1], power, alpha, lambda_, iterations)

        error_train[j] = costM
        error_val[j] = np.sum((test(Xnorm_val)-Yval)**2)/(2*m_val)
        
        print('\t%i\t\t%f\t%f' % (j+1, costM, error_val[j]))

    return error_train, error_val
In [14]:
lambda_ = 0.0
error_train, error_val = learning_curve(X, Y, Xval, Yval, 8, 0.001, lambda_, 5000)
# Training Examples	Train Error	Cross Validation Error
	1		0.000029	383.258247
	2		0.000000	66.507463
	3		0.029774	116.626693
	4		0.654080	8664.305130
	5		0.526251	1935.376911
	6		0.414557	287.712752
	7		2.426150	107.659949
	8		1.542915	5.534276
	9		1.566069	7.910292
	10		1.030112	16.106324
	11		1.008916	13.908059
	12		1.015601	37.320681
In [15]:
# Plot the learning curve
fig = plt.figure(figsize=(8,6))
plt.plot(range(1,m+1),error_train, label='Train')
plt.plot(range(1,m+1),error_val, label='Cross Validation')
fig.suptitle('Polynomial Regression Learning Curve (lambda = %f)' % lambda_)
plt.xlabel('Number of training examples', fontsize=9)
plt.ylabel('Error', fontsize=9)
plt.legend(loc=1, fontsize=9)
plt.yticks(range(0,200,50))
plt.axis((0, 13, 0, 150));

5.3.3 Adjusting the regularization parameter

$\lambda = 5.0$

In [16]:
lambda_ = 5
costM0, costM, theta, test = poly_fit(X, Y, 8, 0.01, lambda_, 5000)
print("Cost(%i):" % 0)
print(costM0)
print("\nCost(%i):" % 5000)
print(costM)
print(theta)
Cost(0):
94.73008458349643

Cost(5000):
9.024390160364831
[ 6.0580415   2.84653916  1.37886676  1.91278328  1.18814773  1.08509871
  0.77978527 -0.06975348 -0.1270549 ]
In [17]:
# Plot the polynomial fit
a = np.linspace(-80,60,40)
a_poly = poly_features((a - mu)/sigma, 8)
b = test(np.c_[np.ones(a_poly.shape[0]), a_poly].T)
fig = plt.figure(figsize=(8,6))
plt.plot(a,b, ls='--')
plt.scatter(X, Y, s=40, marker='x', color='r')
fig.suptitle('Polynomial Regression Fit (lambda = %f)' % lambda_)
plt.xlabel('Change in water level (x)', fontsize=9)
plt.ylabel('Water flowing out of the dam (y)', fontsize=9)
plt.axis((-80, 80, 0, 160));
In [18]:
error_train, error_val = learning_curve(X, Y, Xval, Yval, 8, 0.001, lambda_, 5000)
# Training Examples	Train Error	Cross Validation Error
	1		0.000029	138.928932
	2		0.052556	121.175907
	3		2.320395	223.317566
	4		43.093353	2344.659492
	5		12.621665	2121.040131
	6		3.571854	533.258009
	7		9.584542	461.437453
	8		6.809788	68.652716
	9		5.941391	33.108837
	10		4.838822	19.812748
	11		4.642234	18.235402
	12		9.078702	6.612376
In [19]:
# Plot the learning curve
fig = plt.figure(figsize=(8,6))
plt.plot(range(1,m+1),error_train, label='Train')
plt.plot(range(1,m+1),error_val, label='Cross Validation')
fig.suptitle('Polynomial Regression Learning Curve (lambda = %f)' % lambda_)
plt.xlabel('Number of training examples', fontsize=9)
plt.ylabel('Error', fontsize=9)
plt.legend(loc=1, fontsize=9)
plt.yticks(range(0,200,50))
plt.axis((0, 13, 0, 150));

$\lambda = 100.0$

In [20]:
lambda_ = 100
costM0, costM, theta, test = poly_fit(X, Y, 8, 0.001, lambda_, 5000)
print("Cost(%i):" % 0)
print(costM0)
print("\nCost(%i):" % 5000)
print(costM)
print(theta)
Cost(0):
126.3967512501631

Cost(5000):
19.122878705382597
[ 6.92875227  0.33874363  0.12510936  0.34657593  0.1597584   0.45847537
  0.23870829  0.73582137  0.40970502]
In [21]:
# Plot the polynomial fit
a = np.linspace(-80,60,40)
a_poly = poly_features((a - mu)/sigma, 8)
b = test(np.c_[np.ones(a_poly.shape[0]), a_poly].T)
fig = plt.figure(figsize=(8,6))
plt.plot(a,b, ls='--')
plt.scatter(X, Y, s=40, marker='x', color='r')
fig.suptitle('Polynomial Regression Fit (lambda = %f)' % lambda_)
plt.xlabel('Change in water level (x)', fontsize=9)
plt.ylabel('Water flowing out of the dam (y)', fontsize=9)
plt.axis((-80, 80, -10, 40));

5.3.4 Selecting $\lambda$ using a cross validation set

In [22]:
lambda_set = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
m = len(lambda_set)

error_train = np.zeros(m)
error_val = np.zeros(m)

# normalize val data
mu_val = Xval.mean()
sigma_val = Xval.std()

# applying polynomial features
Xpoly_val = poly_features((Xval - mu_val)/sigma_val, 8)

Xnorm_val = np.c_[np.ones(m_val), Xpoly_val] # Add intercept term to X
Xnorm_val = Xnorm_val.T

print('lambda\t\tTrain Error\tValidation Error')

for j, lambda_ in enumerate(lambda_set):
    costM0, costM, theta, test = poly_fit(X, Y, 8, 0.001, lambda_, 5000)
    
    error_train[j] = costM
    error_val[j] = np.sum((test(Xnorm_val)-Yval)**2)/(2*m_val)
        
    print('%f\t%f\t%f' % (lambda_, costM, error_val[j]))
lambda		Train Error	Validation Error
0.000000	1.015601	37.320681
0.001000	1.019188	37.294470
0.003000	1.026356	37.242103
0.010000	1.051396	37.059382
0.030000	1.122501	36.542117
0.100000	1.366370	34.786548
0.300000	2.022605	30.214017
1.000000	3.923640	18.478996
3.000000	7.256134	6.217888
10.000000	11.405604	16.272374
In [23]:
# Plot the learning curve
fig = plt.figure(figsize=(8,6))
plt.plot(lambda_set,error_train, label='Train')
plt.plot(lambda_set,error_val, label='Cross Validation')
plt.xlabel('lambda', fontsize=9)
plt.ylabel('Error', fontsize=9)
plt.legend(loc=1, fontsize=9)
plt.xticks(range(0,11,1))
plt.yticks(range(0,51,5))
plt.axis((0, 10, 0, 50));
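As the table shows, the cross-validation error is lowest around $\lambda = 3$; the same choice can be read off programmatically (assuming error_val from the previous cells is still in scope):

# lambda value with the smallest cross-validation error
best_lambda = lambda_set[int(np.argmin(error_val))]
print(best_lambda)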

5.3.5 Computing test set error

In [24]:
lambda_set = [0, 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0]
m = len(lambda_set)

m_test = Ytest.shape[0]

error_train = np.zeros(m)
error_test = np.zeros(m)

# normalize test data
mu_test = Xtest.mean()
sigma_test = Xtest.std()

# applying polynomial features
Xpoly_test = poly_features((Xtest - mu_test)/sigma_test, 8)

Xnorm_test = np.c_[np.ones(m_test), Xpoly_test] # Add intercept term to X
Xnorm_test = Xnorm_test.T

print('lambda\t\tTrain Error\tTest Error')

for j, lambda_ in enumerate(lambda_set):
    costM0, costM, theta, test = poly_fit(X, Y, 8, 0.001, lambda_, 5000)
    
    error_train[j] = costM
    error_test[j] = np.sum((test(Xnorm_test)-Ytest)**2)/(2*m_test)
        
    print('%f\t%f\t%f' % (lambda_, costM, error_test[j]))
lambda		Train Error	Test Error
0.000000	1.015601	23.018853
0.001000	1.019188	23.008392
0.003000	1.026356	22.987488
0.010000	1.051396	22.914531
0.030000	1.122501	22.707833
0.100000	1.366370	22.004412
0.300000	2.022605	20.156724
1.000000	3.923640	15.259358
3.000000	7.256134	9.241195
10.000000	11.405604	10.043021