LARS Library: Least Angle Regression laSso Library

**Frank Vanden Berghen**

Let's assume that we want to perform a simple identification task: you have a set of measures $(x^{(i)}, y^{(i)}),\; i = 1, \dots, n$. We want to find a function (or a *model*) $f$ such that $f(x^{(i)}) \approx y^{(i)}$. The vocabulary is the following:

- $x^{(i)}$ is the *regressor vector*, the *input*, the vector of independent variables.
- $y^{(i)}$ is the *target*, the *output* (the dependent variable).
- $(x^{(i)}, y^{(i)})$ is an *input/output pair*, a *sample*.

$$f(x) = x^T \beta = \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_m x_m \qquad (1)$$

We want to compute the $\beta_j$'s such that $f(x^{(i)}) \approx y^{(i)}$. Using matrix notation, $\beta$ is the solution of:

$$X \beta \approx y \qquad (2)$$

where $X$ is a matrix containing on each line a different regressor vector $x^{(i)T}$, and $y$ is the vector of all the targets.

If we define the total prediction error:

$$\text{TPError}(\beta) = \sum_{i=1}^{n} \left( y^{(i)} - x^{(i)T}\beta \right)^2 = (y - X\beta)^T (y - X\beta) \qquad (3)$$

then the best $\beta$ is the one that makes TPError minimum. Expanding equation 3, we get $\text{TPError}(\beta) = y^T y - y^T X \beta - \beta^T X^T y + \beta^T X^T X \beta$. Since $y^T X \beta$ is a scalar, we can take its transpose without changing its value: $y^T X \beta = (y^T X \beta)^T = \beta^T X^T y$. We obtain:

$$\text{TPError}(\beta) = y^T y - 2\, \beta^T X^T y + \beta^T X^T X \beta$$

TPError is minimum when its gradient is null: $\nabla_\beta \text{TPError} = -2 X^T y + 2 X^T X \beta = 0$. This gives:

$$X^T X \, \beta = X^T y \qquad (4)$$

Equation 4 is the well-known system of the *normal equations*.

We previously made the assumption that the constant term of the model is zero. This assumption holds when $\bar{y} = 0$ and $\bar{X}_j = 0$ for all $j$, where $\bar{\cdot}$ is the mean operator and $X_j$ is the $j^{\text{th}}$ column of $X$. Thus, in the general case, when the means are not null, the procedure to build a model is:

- Compute $\bar{y}$, $\bar{X}_j$, $\sigma(y)$ and $\sigma(X_j)$, where $\sigma$ is the ``standard deviation'' operator.
- Normalize the target: $y' = (y - \bar{y}) / \sigma(y)$.
- Normalize the columns of $X$: $X'_j = (X_j - \bar{X}_j) / \sigma(X_j)$.
- Since the columns of $X'$ have zero mean, and since $y'$ has zero mean, the constant term is null. We can thus use the *normal equations* $X'^T X' \beta' = X'^T y'$ to find $\beta'$. All the columns inside $X'$ have a standard deviation of one. This increases numerical stability.
- Currently, $\beta'$ has been computed on the ``normalized columns'' $X'$ and $y'$. Let's convert back the model so that it can be applied on the un-normalized $X$ and $y$ (see the conversion formulas just below).
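The back-conversion of the last step can be written explicitly; the following formulas are a reconstruction from the normalization formulas above, not a quote from the library documentation:

$$\beta_j = \beta'_j \, \frac{\sigma(y)}{\sigma(X_j)}, \qquad \beta_0 = \bar{y} - \sum_{j=1}^{m} \beta_j \, \bar{X}_j$$

where $\beta_0$ is the constant term (offset) of the final model $f(x) = \beta_0 + \sum_j \beta_j x_j$.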

One simple way to solve the normal equations (equation 4) is the following:

- Compute a Cholesky factorization of $X^T X$, i.e. compute $L$ such that $L$ is a lower triangular matrix and such that $L L^T = X^T X$.
- Let's define $z = L^T \beta$. Then we have $L z = L L^T \beta = X^T X \beta = X^T y$ (the last equality comes from equation 4). This last equation ($L z = X^T y$) can be solved easily because $L$ is triangular. Once we have found $z$, we solve $L^T \beta = z$ to find $\beta$ (this is also easy because $L^T$ is triangular). A small sketch of these two triangular solves is given below.
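As an illustration, here is a minimal sketch of the two triangular solves (standalone helper code written for this documentation, not part of the LARS library; the factor `L` is assumed to be already computed):

```cpp
#include <vector>

// Solve (X^T X) beta = b, given the Cholesky factor L (lower triangular)
// with L L^T = X^T X, and b = X^T y.
std::vector<double> solveNormalEquations(const std::vector<std::vector<double> > &L,
                                         const std::vector<double> &b)
{
    int m = (int)b.size();
    std::vector<double> z(m), beta(m);
    for (int i = 0; i < m; i++) {            // forward substitution: L z = b
        double s = b[i];
        for (int j = 0; j < i; j++) s -= L[i][j] * z[j];
        z[i] = s / L[i][i];
    }
    for (int i = m - 1; i >= 0; i--) {       // back substitution: L^T beta = z
        double s = z[i];
        for (int j = i + 1; j < m; j++) s -= L[j][i] * beta[j];
        beta[i] = s / L[i][i];
    }
    return beta;
}
```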

The Cholesky factorization (and thus the resolution of the normal equations) fails when the following equivalent conditions occur:

- A column of $X$ is a linear combination of other columns.
- The rank of $X$ is lower than $m$, the number of columns.
- The matrix $X^T X$ is not of full rank $m$.

The blue points inside figures 1 and 2 are the samples. The green points are the lines of $X$ (the regressors). When two columns are strongly correlated, we have two choices:

- Inside figure 1, the 'information' that is contained in one variable is duplicated in the other. We can simply drop one of the two variables: we obtain a model that uses only the first variable or only the second one. The main difficulty here is to select carefully the columns that must be dropped. The solution is the ``LARS/Lasso Algorithm'' described later in this paper.
- The other choice is to keep all the variables inside the model. However, in this case, we must be sure that the corresponding $\beta_j$'s remain small values (I remind you that, when two columns are strongly correlated, their $\beta_j$'s can become arbitrarily large).

To keep all the $\beta_j$'s small, we can compute the $\beta$ that is the solution of:

$$\left( X^T X + \lambda I \right) \beta = X^T y \qquad (5)$$

where $\lambda \ge 0$ is a given parameter and $I$ is the identity matrix.

This technique is named ``Ridge regression''. When $\lambda$ increases, all the $\beta_j$'s are ``equilibrated'' and ``pushed'' to zero, as illustrated in figure 3.

**Figure 3:** The $\beta_j$'s sorted from greatest to smallest for different values of $\lambda$.
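Equation 5 is the stationarity condition of a penalized least-squares problem; writing this out makes the ``pushing to zero'' effect of $\lambda$ visible:

$$\min_\beta \; \|y - X\beta\|^2 + \lambda \|\beta\|^2 \quad \Longrightarrow \quad \nabla_\beta = -2 X^T y + 2 \left( X^T X + \lambda I \right) \beta = 0$$

The larger $\lambda$ is, the more any large $\beta_j$ is penalized, whatever small gain in prediction error it brings.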

The ``Lasso regression'' technique is in fact searching for the $\beta$ that minimizes the Squared Error under the constraint that $\sum_j |\beta_j| \le t$. This definition is illustrated on the left of figure 4. We seek the solution of the minimization problem:

$$\min_\beta \; \|y - X\beta\|^2 \quad \text{subject to} \quad \sum_j |\beta_j| \le t$$

(with $t \ge 0$ a given constant). To solve this minimization problem, we first have to rewrite the constraint: $\sum_j |\beta_j| - t \le 0$. Now, we introduce a Lagrange multiplier $\lambda \ge 0$ and the Lagrange function:

$$L(\beta, \lambda) = \|y - X\beta\|^2 + \lambda \left( \sum_j |\beta_j| - t \right) \qquad (6)$$

For a given $t$, the Lasso solution is thus the $\beta$ that solves:

$$\min_\beta \; \|y - X\beta\|^2 \quad \text{subject to} \quad \sum_j |\beta_j| \le t \qquad (7)$$

As you can see, computing different $\beta$'s (with a different number of null $\beta_j$'s) involves solving equation 7 several times for different values of $t$. Equation 7 is a QP (Quadratic Programming) problem and is not trivial to solve at all. The Lasso algorithm, as it is described here, is thus very demanding on computer resources and is of nearly no practical interest. Fortunately, there exists another, more indirect way to compute, at very little cost, all the solutions of the Lasso algorithm for all the values of $t$. A modification of the LARS algorithm computes all the Lasso solutions in approximately the time needed to compute and solve the simple ``normal equations'' (equation 4).

The LARS algorithm is a refinement of the FOS (Fast Orthogonal Search) algorithm. We will start by describing the FOS algorithm. FOS (like LARS) is a forward stepwise algorithm, i.e. an iterative algorithm that, at each iteration, includes a new variable inside its model. The set of variables inside the model at iteration $k$ is the ``active set'' $\mathcal{A}$. The set of variables outside the model is the ``inactive set'' $\mathcal{I}$. $X_{\mathcal{A}}$ is the subset of the columns of $X$ containing only the active variables at iteration $k$. $X_{\mathcal{I}}$ is the subset of the columns of $X$ containing only the inactive variables at iteration $k$.

Let's first illustrate equation 2: $X \beta \approx y$. Equation 2 can be rewritten:

$$y \approx \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_m X_m \qquad (8)$$

i.e. the target $y$ is approximated by a linear combination of the columns $X_j$ of $X$.

The FOS algorithm is the following:

- Let's define the residual prediction error at step $k$ as $E_k$. We have $E_0 = y$. Normalize the variables and the target to have $\bar{X}_j = 0$, $\sigma(X_j) = 1$ and $\bar{y} = 0$, $\sigma(y) = 1$. Set $k = 0$, the iteration counter.

**Figure 7:** Illustration of the FOS algorithm

- This step is illustrated in figure 7, Part 1. We will add inside $\mathcal{A}$ the variable $X_j$ that is ``the most in the direction of $E_k$'', i.e. the ``angle'' $\theta_j$ between $X_j$ and $E_k$ is smaller than all the other angles $\theta_i$ between the $X_i$'s and $E_k$. We use the well-known relation:

  $$\cos(\theta_j) = \frac{\langle X_j, E_k \rangle}{\|X_j\| \, \|E_k\|} \qquad (9) \qquad \text{where} \qquad \langle u, v \rangle = u^T v = \sum_i u_i v_i \qquad (10)$$

  Thus the algorithm is: compute $c_j = \langle X_j, E_k \rangle$ for all inactive columns, and find $j^*$ such that

  $$j^* = \arg\max_{j \in \mathcal{I}} |c_j| \qquad (11)$$

- This step is illustrated in figure 7, Part 2. We now compute the length of the ``step'' in the direction $X_{j^*}$ (inside the example, *j=1*). Let's project (orthogonally) $E_k$ inside the space that is ``spanned'' by the active variables. The projection of $E_k$ is named $P_k$. Since $X_{j^*}$ is normalized, the length of the step in its direction is directly given by the correlation $c_{j^*}$. If the projection operator fails ($X_{j^*}$ is inside the space spanned by the other variables of the active set), remove $X_{j^*}$ from the active set and go to step 2.
- Deflation: update of the Residual Error: $E_{k+1} = E_k - P_k$.
- If $\mathcal{I}$ is empty then stop, otherwise increment $k$ and go to step 2. (A compact sketch of this loop is given below.)
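To make the loop concrete, here is an illustrative sketch in C++ (helper code written for this documentation, not the LARS library API). The Gram-Schmidt orthogonalization below plays the role of the projection operator of step 3; columns are assumed already normalized:

```cpp
#include <vector>
#include <cmath>
#include <list>

typedef std::vector<double> Vec;

static double dot(const Vec &u, const Vec &v)
{
    double s = 0.0;
    for (size_t i = 0; i < u.size(); i++) s += u[i] * v[i];
    return s;
}

// X: the normalized columns X_j; E: the residual error (initially y).
// Returns the indexes of the active set, in order of inclusion.
std::vector<int> fos(const std::vector<Vec> &X, Vec E)
{
    std::list<int> inactive;
    for (int j = 0; j < (int)X.size(); j++) inactive.push_back(j);
    std::vector<int> active;
    std::vector<Vec> Q;                        // orthonormalized active directions
    while (!inactive.empty()) {
        // Step 2: find the inactive column "most in the direction of" E (eq. 11).
        std::list<int>::iterator best = inactive.begin();
        for (std::list<int>::iterator it = inactive.begin(); it != inactive.end(); ++it)
            if (std::fabs(dot(X[*it], E)) > std::fabs(dot(X[*best], E))) best = it;
        // Step 3: orthogonalize against the active set (the projection operator).
        Vec q = X[*best];
        for (size_t a = 0; a < Q.size(); a++) {
            double g = dot(Q[a], q);
            for (size_t i = 0; i < q.size(); i++) q[i] -= g * Q[a][i];
        }
        double nrm = std::sqrt(dot(q, q));
        if (nrm < 1e-10) { inactive.erase(best); continue; }  // projection fails: drop the column
        for (size_t i = 0; i < q.size(); i++) q[i] /= nrm;
        // Step 4 (deflation): remove from E the part explained by the new direction.
        double step = dot(q, E);
        for (size_t i = 0; i < q.size(); i++) E[i] -= step * q[i];
        active.push_back(*best);
        Q.push_back(q);
        inactive.erase(best);
    }
    return active;
}
```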

The advantages of the FOS algorithm are the following:

- It's ultra fast! Furthermore, the computation time is mainly proportional to the number of variables included inside the model. If only a few variables among many are selected, you will go a lot faster than when using the standard normal equations.

- The algorithm naturally ``avoids'' correlated columns. I remind you that a model that is using two strongly correlated columns is likely to be unstable. If columns $X_i$ and $X_j$ are strongly correlated, they are both ``pointing'' in the same direction (the angle between them is small: see equation 9 and figure 6). If you include the column $X_i$ inside the active set, you will obtain, after the deflation, a new error that will prevent you from choosing $X_j$ at a later iteration. One interpretation is the following: ``All the information contained inside the direction $X_i$ has been used. We are now searching for other directions to reach $y$.'' It is possible that, at the very end of the procedure, the algorithm still tries to use column $X_j$. In this case, the projection operator used at step 3 will fail and the variable will be definitively dropped. Models produced with the FOS algorithm will thus usually be very stable.

- The selection of the column that enters the active set is based on the ``residual error'' $E_k$ at step $k$ (with $E_0 = y$). This selection is thus exploiting the information contained inside the target $y$. The ``target information'' is not used by other algorithms that are based on a SVD of $X$ or a QR factorization of $X$. The FOS algorithm will thus perform a better variable selection than SVD- or QR-based algorithms. Many other algorithms do not use any deflation. Deflation is important because it allows us to search for variables that explain the part of the target that is still un-explained (the residual error $E_k$). A good variable selection is useful when the number of columns of $X$ is large.

- The memory consumption is very small. To be able to use the FOS algorithm, you only need to be able to store in memory the target $y$, one line of $X$, and a dense triangular matrix whose dimension is the number of active variables. Other algorithms based on a QR factorization of $X$ require the full matrix $X$ inside memory, because the memory space used to store $X$ is reused during the QR factorization to store temporary, intermediate results needed for the factorization. Algorithms based on a SVD of $X$ are even more memory hungry. This is a major drawback for most advanced applications, especially in Econometrics, where the matrix $X$ often does not fit into memory. There exist some ``out of core'' QR and SVD factorizations that are able to work even when $X$ does not fit into memory. However, ``out of core'' algorithms are currently extremely slow and unstable.

There is however one major drawback to the FOS algorithm. It is illustrated in figure 8, where the target is built from two variables:

$$y = X_1 + X_2 \qquad (12)$$

and where a third variable $X_3$ points almost exactly in the direction of $y$.

- The FOS algorithm is a forward stepwise algorithm and, as almost all forward stepwise algorithms, it is not able to detect multivariate concepts. In the example illustrated in figure 8, the target is defined precisely to be $y = X_1 + X_2$. You are able to predict the target accurately if your model is $y = \beta_1 X_1 + \beta_2 X_2$. The FOS algorithm gives the wrong model: it contains only the single variable $X_3$. In this example, the target is a multivariate concept hidden inside two variables: $X_1$ and $X_2$. A forward stepwise algorithm that is able to detect a multivariate concept is very difficult to develop. Inside FOS, the selection of the variable that enters the active set is only based on the univariate relation of each $X_j$ with the target $y$. On the contrary, backward stepwise algorithms have no problem detecting multi-variable concepts and should thus be preferred to a simple, classical forward stepwise algorithm. However, backward stepwise algorithms are usually very time consuming because they need first to compute a full model and thereafter to select carefully which variables they will drop.

The LARS algorithm with Lasso modification is a forward stepwise algorithm that produces all the solutions of the Lasso algorithm in a computing time proportional to the number of active variables at the end of the algorithm. The Lasso algorithm is part of the family of backward stepwise algorithms (it starts with a full model and removes some variables when $t$ decreases). Thus, the Lasso algorithm is able to discover multi-variable concepts. And thus, the LARS algorithm with Lasso modification is also able to discover multi-variable concepts (since LARS with the Lasso modification gives the same models as the Lasso algorithm). To my knowledge, the LARS algorithm with Lasso modification is the only forward stepwise algorithm able to discover multi-variable concepts. Since it's a forward stepwise algorithm, it's also very fast.

On the other side, conjunctive concepts are easy to detect. Let's assume that the target is the conjunction of two binary variables: $y = X_1 \text{ AND } X_2$. If you create a new column $X_3 = X_1 X_2$ (the element-wise product of $X_1$ and $X_2$), you obtain $y = X_3$, i.e. you can represent conjunctive concepts with products of variables. You will thus obtain second order models. Computing second order models usually involves extending the original matrix $X$ with columns that are the products of some of the columns of the original $X$. This is usually a very memory-expensive operation: if $m$ is the number of columns of $X$, then the extended matrix has of the order of $m^2/2$ columns. The LARS library has been built to allow you to easily create ``on the fly'' a *regressor* (= one line of $X$), i.e. you only need to be able to store in memory one line of $X$, not the whole matrix $X$. This allows you to compute second, third, fourth,... order models at a low memory consumption.
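As an illustration of this ``on the fly'' mechanism, here is a hedged sketch of a child of the `LARS` class (the C++ interface is documented further below) that serves second-order regressors. The members `rawData`, `rawY` and `nColRaw` are assumptions made for the example, and it is assumed that `nRow`, `nCol` and `orientation` can be set from the child class:

```cpp
#include "lars.h"
#include <cstddef>

// Sketch: generate second-order regressors inside getLineX, so that the
// extended matrix (original columns + all pairwise products) is never stored.
class SecondOrderLARS : public LARS {
    larsFLOAT *rawData;   // original data, row-major (illustrative member)
    larsFLOAT *rawY;      // original targets (illustrative member)
    int nColRaw;          // number of original columns
public:
    SecondOrderLARS(larsFLOAT *x, larsFLOAT *y, int nRows, int nCols)
        : rawData(x), rawY(y), nColRaw(nCols)
    {
        nRow = nRows;                             // documented LARS parameters
        nCol = nCols + nCols * (nCols + 1) / 2;   // first-order + product terms
        orientation = 'L';                        // access X line by line
    }
    larsFLOAT *getLineX(int i, larsFLOAT *b)
    {
        const larsFLOAT *x = rawData + (size_t)i * nColRaw;
        int k = 0;
        for (int j = 0; j < nColRaw; j++) b[k++] = x[j];            // X_j
        for (int j = 0; j < nColRaw; j++)                           // X_j * X_l
            for (int l = j; l < nColRaw; l++) b[k++] = x[j] * x[l]; // (includes squares)
        return b;
    }
    larsFLOAT getLineY(int i) { return rawY[i]; }
};
```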

- The FOS algorithm is ``too greedy'': when the FOS algorithm finds a good direction, it is exploiting this direction ``to the maximum'' (the length of the steps is the full projection length). Thereafter there is no ``room left'' for further improvements by adding other variables. FOS has been ``trapped'' in a so-called ``local optimum''. The LARS algorithm is not so greedy: its step lengths are shorter (see step 3 of the FOS algorithm about ``steps''). This means that the different models that are computed at each step of the LARS algorithm are not 100% efficient: with the same set of active variables, one can compute a better model. The LARS library downloadable on this page allows you to obtain both models:
  - the most efficient one.
  - the one used inside the LARS algorithm to search for the next variables to add inside the active set.

To summarize, the advantages of the LARS algorithm are:

- it's ultra fast.
- correlated columns are avoided, to obtain stable models.
- the selection of the active variables is based on the target (deflation).
- a small memory consumption.

There remains one last question to answer when using the LARS algorithm: when to stop? How to choose the number of active variables? The question is not simple. Several strategies are available inside the LARS library downloadable on this page:

- Let's define $\hat{c}_k = \max_{j \in \mathcal{I}} |c_j|$, the maximum correlation of the inactive columns with $E_k$, the residual error at iteration $k$ (see the definition of $c_j$ in equation 11).

  $$\text{if } \hat{c}_k < \epsilon \, \hat{c}_0 \text{, then stop} \qquad (13)$$

- Stop based on the norm of the residual error:

  $$\text{if } \|E_k\|^2 < \epsilon \, \|E_0\|^2 \text{, then stop} \qquad (14)$$

- Let's define $C_p(k) = \|E_k\|^2 - m + 2\, n_A(k)$, where $n_A(k)$ is the number of active variables at iteration $k$.

  $$\text{if } C_p(k) > \max\big( C_p(k-p), \dots, C_p(k-1) \big) \text{ for a small window size } p \text{, then stop} \qquad (15)$$

- Let's define $CVErr(k)$ as the n-Fold-Cross-Validation-Error of a model $\beta_k$ that has been regularized using an optimized ridge parameter $\lambda$.

  $$\text{if } CVErr(k) > CVErr(k-1) \text{, then stop} \qquad (16)$$
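As an illustration, the first three tests could be coded like this (standalone helper code, assuming the caller tracks $\hat{c}_k$, $\|E_k\|^2$ and $C_p(k)$ at each iteration; this is not the library API):

```cpp
#include <vector>
#include <algorithm>

// Criterion (13): stop when the maximum correlation has dropped below
// a fraction eps of its initial value.
bool stopOnCorrelation(double cHatK, double cHat0, double eps)
{ return cHatK < eps * cHat0; }

// Criterion (14): stop when less than a fraction eps of the variance
// of y remains unexplained.
bool stopOnResidual(double sqErK, double sqEr0, double eps)
{ return sqErK < eps * sqEr0; }

// Criterion (15): stop when Cp(k) exceeds all the values inside a
// trailing window of size p.
bool stopOnCp(const std::vector<double> &Cp, int p)
{
    int k = (int)Cp.size() - 1;
    if (k < p) return false;   // not enough history yet
    double worst = *std::max_element(Cp.begin() + (k - p), Cp.begin() + k);
    return Cp[k] > worst;
}
```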

When choosing among these strategies, my advice is:

- to choose a very stringent termination test (like the first ones, with a very small $\epsilon$) to be sure to have all the informative variables. $\epsilon$ has a nice and easy interpretation: it's the percentage of the variance inside $y$ that will not be explained by the model.
- to perform a backward stepwise, using as initial model the model given by the LARS algorithm. The LARS library has been designed so that such a task is very easy to code.

My experience is that stopping criteria based on MDL (minimum description length), the Akaike information criterion or similar statistics are not reliable.

To remind you, the Ridge Regression is based on equation 5: $\beta$ is the solution of

$$\left( X^T X + \lambda I \right) \beta = X^T y \qquad (17)$$

To stabilize your model, you can also completely remove some variables using a ``backward stepwise'' algorithm. Here is a simple ``backward stepwise'' algorithm (a compact sketch is given after the list):

- Let's define $X_{(-i)}$, the creation set $X$ without the column $i$. Let's define $P$, the performance on the validation set of the model $\beta$ using all the columns of the creation set, where $\beta$ is the solution to the standard normal equations $X^T X \beta = X^T y$. Set $i = 1$.
- We will try to remove the variable $i$. Compute $\beta_{(-i)}$, the solution to $X_{(-i)}^T X_{(-i)} \beta_{(-i)} = X_{(-i)}^T y$. Compute $P_{(-i)}$: the performance of the model on the validation set when the model is built without taking into account the column $i$ of the creation set. If $P_{(-i)} \ge P$, then the column $i$ did not contain any information and can be dropped: set $X = X_{(-i)}$ and $P = P_{(-i)}$.
- Choose another $i$ and go back to step 2.
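For illustration, here is a compact sketch of this loop. The function `P` is a placeholder supplied by the caller (it builds the model on the given set of columns, e.g. with equation 4 or 17, and returns its performance on the validation set); it is not a library call:

```cpp
#include <vector>
#include <functional>

// Backward stepwise sketch: greedily drop every column whose removal does
// not degrade the performance P on the validation set (higher P = better).
void backwardStepwise(std::vector<int> &cols,
                      const std::function<double(const std::vector<int>&)> &P)
{
    double best = P(cols);
    for (size_t i = 0; i < cols.size(); ) {
        std::vector<int> reduced(cols);
        reduced.erase(reduced.begin() + i);           // candidate set without column cols[i]
        double p = P(reduced);
        if (p >= best) { best = p; cols = reduced; }  // no information lost: drop the column
        else ++i;                                     // keep the column, try the next one
    }
}
```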

Once we have found an optimal ridge parameter $\lambda$ and/or the optimal set of informative columns (backward stepwise), we can compute the final model using:

$$\left( X^T X + \lambda I \right) \beta = X^T y \qquad (18)$$

where $X$ now contains only the informative columns.

WARNING: If you try to use both regularization techniques at the same time, they will ``enter in competition''. You should then be very careful.

To get an idea of the quality of your final model $\beta$, you can apply it on the test set:

$$\text{Squared Error on test set} = \sum_{i \in \text{test set}} \left( y^{(i)} - x^{(i)T} \beta \right)^2$$

It's crucial to never use the data in the Test Set when building the model, because otherwise the results will always be too optimistic. Indeed, if we have used a part of the test set to build the model, we are ``cheating'': we are not truly estimating the ``generalization ability'' of our model. The methodology just described is a correct approach to a modeling problem. However, this approach does not use a very large validation and test set. Since the validation set is small, the parameter $\lambda$ and the set of active columns will be very approximate. Since the test set is small, the estimation of the quality of the model is also very approximate. To overcome these difficulties, we can use a n-fold-cross-validation technique. If we want a more accurate estimation of the quality of the model, we can build 5 test sets instead of one: see figure 10. This is named a 5-fold-cross-validation. It also means that we have 5 training sets and thus 5 different models. The main point here is that none of these 5 models has seen the data that are inside its corresponding test set.

Each model is evaluated on its own test block $i$, where $\beta^{(i)}$ is the solution to

$$X^{(i)T} X^{(i)} \, \beta^{(i)} = X^{(i)T} y^{(i)} \qquad (19)$$

and $X^{(i)}$, $y^{(i)}$ contain the data of all the blocks except block $i$. When we perform a n-fold-cross-validation, we must compute $n$ different models using equation 19. The only time-consuming part inside equation 19 is the computation of $X^{(i)T} X^{(i)}$. Here is a small ``trick'' to be able to compute $X^{(i)T} X^{(i)}$ in a very efficient way. If we define the blocks $B_i$ as in figure 10, we can compute $C_i = B_i^T B_i$. Then, we have:

$$X^{(i)T} X^{(i)} = \sum_{j \neq i} C_j \qquad (20)$$
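As a small illustration of equation 20 (helper code written for this documentation, not the library API), one can precompute one matrix $C_i$ per block and obtain each leave-one-block-out matrix with a few additions:

```cpp
#include <vector>

typedef std::vector<std::vector<double> > Matrix;

// C[b] holds the per-block Gram matrix C_b = B_b^T B_b.
// Returns X^(skip)T X^(skip) = sum of all C_b with b != skip.
Matrix leaveOneBlockOut(const std::vector<Matrix> &C, int skip)
{
    int m = (int)C[0].size();
    Matrix G(m, std::vector<double>(m, 0.0));
    for (int b = 0; b < (int)C.size(); b++) {
        if (b == skip) continue;            // ignore the block used for testing
        for (int i = 0; i < m; i++)
            for (int j = 0; j < m; j++)
                G[i][j] += C[b][i][j];
    }
    return G;
}
```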

The same technique can be applied to increase the size of the validation set: see figure 11.

When building a model, the first step is the normalization of all the variables (the columns $X_j$). The normalization of a variable consists of two steps:

- We subtract from the column its mean: $X_j \leftarrow X_j - \bar{X}_j$.
- We divide the column by its standard deviation: $X_j \leftarrow X_j / \sigma(X_j)$.

$$\text{``never use any information from the test set to build a model''} \qquad (21)$$

The columns of the creation set are normalized using information (means and standard deviations) that also comes from the test set. This small ``infraction'' of rule 21 is usually negligible and allows us to reduce considerably the computation time. However, for some rare variables, the rule must be respected strictly. These variables are named ``dynamic variables''. The LARS library makes a strong distinction between ``dynamic variables'' and ``static variables''. The LARS library handles both types of variables; however, only the ``static variables'' are reviewed inside this documentation.

The concept of ``dynamic variables'' is linked with the concept of ``variable recoding''. The normalization of the variables is a (very basic) recoding. If there is no recoding, there are no ``dynamic variables''. The ``normalization of a variable'' is a recoding that almost never produces any ``dynamic variables'' (especially if all the lines of $X$ have been correctly scrambled). ``Dynamic variables'' appear mostly when using advanced recodings such as, for example, in the paper ``1-Dimensional Splines as Building Blocks for Improving Accuracy of Risk Outcomes Models'' by David S. Vogel and Morgan C. Wang. The LARS library handles advanced recodings through the use of ``dynamic variables''.

Inside the LARS library, the parameter $\lambda$ of the Ridge Regression is computed using a n-fold-cross-validation on the validation sets. Equation 20 is used to achieve high efficiency. The matrices $C_i$ are built incrementally each time a new variable enters the active set $\mathcal{A}$. The equations 19 are solved using an advanced Cholesky factorization that uses an advanced pivoting strategy and an advanced perturbation strategy to obtain a very high accuracy model, even when the matrices are close to singularity or badly conditioned.

The LARS library has been written to use as little memory as possible. For example, all the matrices $C_i$ are symmetric, thus we only need to store half of each matrix in memory. ``In place'' factorizations of matrices have been used when possible.

It can happen inside the LARS forward stepwise algorithm that several variables should enter the active set at the same iteration. The Matlab interface of the library is the following:

`[betaBest,betas,errors,cHats]=matlabLARS(X,y,stoppingCriteriaType,stoppingCriteriaValue,lassoModif,verbosity);`

At the first iteration, only one $\beta_j$ is non-null. At each iteration $k$, we will:

- set one more $\beta_j$ to a non-null value.
- compute `Error(k)`, the squared error of the model at iteration $k$.
- compute `cHat(k)`, the maximum correlation of the inactive columns with the residual error (see equation 13 about $\hat{c}$).

If the lasso modification is used (set `lassoModif=1`), some $\beta_j$'s can also return to zero during the iterations.

Let's define `sqEr(k)`: the squared error of the model at iteration $k$.

Let's define `cHat(k)`: the maximum correlation of the inactive columns with the residual error at iteration $k$ (see equation 13).

Let's define `n(k)`: the number of non-null $\beta_j$'s at iteration $k$.

The variable '`stoppingCriteriaType`' can take the following values:

- 'R': if `((cHat(k-10)-cHat(k)) < stoppingCriteriaValue * cHat(0))`, then stop.
- 'S': if `((sqEr(k-10)-sqEr(k)) < stoppingCriteriaValue * sqEr(0))`, then stop.
- 'C': Cp test: let's define `Cp(k) = sqEr(k)-nCol+2*n(k)` and `p = stoppingCriteriaValue`. If `(Cp(k) > max(Cp(k-p), Cp(k-p+1),..., Cp(k-1)))`, then stop.
- 'M': let's define `minCVError(k)` as the 6-Fold-Cross-Validation-Error of a model `beta(k)` that has been regularized using an optimized ridge parameter $\lambda$. If `(minCVError(k)>minCVError(k-1))`, then stop.
- 'E': if (`stoppingCriteriaValue`=-1), include all the columns; otherwise: if (number of non-null $\beta_j$'s = `stoppingCriteriaValue`), then stop.

The variable '`verbosity`' is optional. It's a number between 0 and 2. (2=maximum verbosity).

As explained previously (see the remark about ``greedy'' algorithms), two models are available at each iteration:

- the most efficient one (stored inside `betaBest`).
- the one used inside the LARS algorithm to search for the next variables to add inside the active set (stored inside `betas`).

The Matlab interface to the LARS library is very limited compared to the C++ interface. The source code of the Matlab interface (in C++) is inside the zip-file downloadable at the top of this webpage. If you want to customize the Matlab version of the LARS library, it's very easy to edit/modify this small C++ code. The C++ interface is documented below.

To use the LARS library inside your C++ project under Visual Studio:

- In the ``debug'' version of your program, add ``larsDLLDebug.lib'' inside the ``Linker/Input/Additional Dependencies'' tab.
- Inside the directory containing the executable of the ``debug'' version of your program, add the file ``larsDLLDebug.dll''.
- In the ``release'' version of your program, add ``larsDLL.lib'' inside the ``Linker/Input/Additional Dependencies'' tab.
- Inside the directory containing the executable of the ``release'' version of your program, add the file ``larsDLL.dll''.
- Add inside your project the files ``lars.h'' and ``miniSSEL1BLAS.h''.

*Line by Line:* You must create a child of the `LARS` class. Inside the child class, you must re-define some of the following functions:

`larsFLOAT *getLineX(int i, larsFLOAT *b);`

This function returns the line $i$ of $X$: $x^{(i)}$. The required line can be stored inside the `b` array or inside another array. A pointer to the array containing the line must be returned.

`larsFLOAT getLineY(int i);`

This function returns the $i^{\text{th}}$ component of $y$: $y^{(i)}$. Alternatively, you can use the third parameter of the `getModel` function to specify a target.

`void setSelectedColumns(int n, int *s);`

The re-definition of this function is optional. The default implementation is:

`void LARS::setSelectedColumns(int n,int *s) { selectedC=s; nSelected=n; }`

The `setSelectedColumns` function announces that subsequent calls to the function `getLineXRequired` will return only a part of the columns of $X$. The indexes of the columns that should be returned are given inside the array `s` of length `n`.

`larsFLOAT *getLineXRequired(int i, larsFLOAT *b);`

The re-definition of this function is optional. However, it's strongly recommended that you re-define this function for performance reasons. The default implementation is:

`larsFLOAT *LARS::getLineXRequired(int i, larsFLOAT *b) { int j,ns=nSelected, *s=selectedC; larsFLOAT *f=getLineX(i,b); for (j=0; j<ns; j++) b[j]=f[s[j]]; return b; }`

This function returns only the columns of line `i` that have been selected using the function `setSelectedColumns`.
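To make this concrete, here is a minimal hedged sketch of a ``line by line'' child class serving a dense in-memory matrix. The members `dataX` and `dataY` are illustrative, and it is assumed that `nRow`, `nCol` and `orientation` (described below) can be set from the child class:

```cpp
#include "lars.h"
#include <cstddef>

// Minimal "line by line" child class: the data already sit in memory.
class DenseLARS : public LARS {
    larsFLOAT *dataX;  // nRow x nCol matrix, row-major (illustrative member)
    larsFLOAT *dataY;  // nRow targets (illustrative member)
public:
    DenseLARS(larsFLOAT *x, larsFLOAT *y, int nRows, int nCols)
        : dataX(x), dataY(y)
    {
        nRow = nRows;        // documented LARS parameters (see the list below)
        nCol = nCols;
        orientation = 'L';   // access the matrix line by line
    }
    larsFLOAT *getLineX(int i, larsFLOAT *b)
    {
        (void)b;  // line i is already contiguous in memory: no copy is needed,
                  // we can return a pointer directly into the data
        return dataX + (size_t)i * nCol;
    }
    larsFLOAT getLineY(int i) { return dataY[i]; }
};
```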

*Column by Column:* You must create a child of the `LARS` class. Inside the child class, you must re-define the following functions:

`larsFLOAT *getColX(int i, larsFLOAT *b);`

This function returns the column $i$ of $X$: $X_i$. The required column can be stored inside the `b` array or inside another array. A pointer to the array containing the column must be returned.

`larsFLOAT *getColY(larsFLOAT *b);`

This function returns the target $y$. The target can be stored inside the `b` array or inside another array. A pointer to the array containing $y$ must be returned. Alternatively, you can use the third parameter of the `getModel` function to specify a target.

The main parameters of the `LARS` class are:

- `nCol:` the number of columns of $X$.
- `nRow:` the number of rows of $X$.
- `orientation:` if `orientation='L'`, then we access the matrix $X$ line by line. Otherwise, we access the matrix column by column.
- `lasso:` if `lasso=1`, then the lasso modification of the LARS algorithm is used.
- `fullModelInTwoSteps:` if `fullModelInTwoSteps=1`, then no forward stepwise is performed: we are directly building a model using all the columns of $X$. When the number of columns `nCol` is small, this is faster than the full LARS algorithm.
- `ridge:` if `ridge=1`, then an optimization of the parameter $\lambda$ of the ridge regression is performed at the end of the build of the model (this requires a n-fold-cross-validation, see `nFold` below).
- `memoryAligned:` all the linear algebra routines are SSE2 optimized. This means that the functions `getCol*` and `getLine*` should return a pointer to an array that is 16-byte-memory-aligned. Such an array can be obtained using the `mallocAligned` function of the *miniSSEL1BLAS* library. If you are not able to provide aligned pointers, you can set `memoryAligned=0`, but the computing time will be more or less doubled.
- `nMaxVarInModel:` maximum number of variables inside the model when performing a forward stepwise (this is a stopping criterion).
- `nFold:` the number of blocks $B_i$ (as defined in figure 10) for the n-fold-cross-validation.
- `folderBounds:` an array of `nFold`+1 integers that contains the line-index of the first line of each block $B_i$ (as defined in figure 10). We should also have `folderBounds[nFold]=nRow`. If you don't initialize this parameter, then `folderBounds` will be initialized for you automatically so that all the blocks have the same size.
- `stoppingCriteria:` defines which kind of stopping criterion will be used inside the forward stepwise:
  - `stoppingCriteria='R':` stop based on the maximum correlation $\hat{c}$ of the inactive columns with the residual error: see equation 13 with $\epsilon = $ `stopSEP`.
  - `stoppingCriteria='S':` stop based on the L2-norm of the residual error: see equation 14 with $\epsilon = $ `stopSEP`.
  - `stoppingCriteria='C':` stop based on the $C_p$ criterion: see equation 15 with `stopSEP`.
  - `stoppingCriteria='M':` stop based on the n-Fold-cross-validation-Error of the model: see equation 16.

`void initStoppingCriteria(larsFLOAT squaredError, larsFLOAT cHat);`

This function is used to initialize internal variables for the stopping criteria.

`char isStoppingCriteriaTrue(larsFLOAT squaredError, larsFLOAT cHat, larsFLOAT *vBetaC, LARSSetSimple *sBest);`

If this function returns '1', the LARS forward stepwise algorithm stops. If this function returns '0', the LARS algorithm continues. This function updates the object `sBest`. Typically, when a good selection of variables is found, we do: `sBest->copyFrom(sA);`. The final model contains only the variables inside `sBest`.

You usually call the `getModel` function this way:

`larsFLOAT *model=NULL; LARSSetSimple sForce(1); sForce.add(3); getModel(&model,&sForce);`

This means that the LARS library will construct a model that will be returned inside the array pointed to by `model`, and that the column 3 of $X$ will be added inside the model.

Here is a detailed description of the parameters of `getModel`:

- **First parameter [output]:** This parameter describes an array that will be used to store the model. In the example above, `model` was set to NULL, so the `getModel` function allocates itself the memory needed. You can also allocate yourself the memory used to store the model, for example:

  `larsFLOAT *model=(larsFLOAT*)mallocAligned((nCol+1)*sizeof(larsFLOAT));`
  `getModel(&model);`

- **Second parameter [input]:** Describes the set of variables that will be added inside the model. If you use the forward stepwise algorithm and if you stop prematurely, this set of variables will be added to the active variables. Of course, adding a variable that is already active has no effect.

- **Third parameter [input]:** Describes the target. This should be an array allocated with:

  `larsFLOAT *target=(larsFLOAT*)mallocAligned(nRow*sizeof(larsFLOAT)+32);`

  The LARS library will free this array for you at the right time. If you use this parameter and if (`stoppingCriteria!='M'`), then the LARS library will not use the `getColY` or the `getLineY` functions (you don't have to define them).

- **Fourth parameter [output]:** an error code that can be interpreted using the `getErrorMessage` function.
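Putting the pieces together, a hedged end-to-end usage could look like the sketch below. It re-uses the `DenseLARS` child class sketched earlier; `getModel` and `apply` are the documented library calls (see the description of `apply` further below):

```cpp
#include "lars.h"

// Build a model and apply it to one line of X.
void example(larsFLOAT *Xdata, larsFLOAT *ydata, int nRows, int nCols)
{
    DenseLARS lars(Xdata, ydata, nRows, nCols);  // child class sketched earlier
    larsFLOAT *model = NULL;
    lars.getModel(&model);      // model=NULL: the library allocates the model itself
    // Prediction for the first line of X ("long form" model: nCol weights + offset).
    larsFLOAT yHat = LARS::apply(nCols, Xdata, model);
    (void)yHat;
}
```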

After a call to `getModel`, the following members of the `LARS` class are available:

- `sA:` you don't have direct access to the content of `sA`. Usually, you do the following:

  `getModel();`
  `LARSSetSimple sInModel(sA);`

  The object `sInModel` contains the set of indexes of all the columns that are used inside the final model.
- `invStdDev:` an array that contains the inverse of the standard deviation of all the columns of $X$. If you have normalized yourself the columns of $X$ (so that $\sigma(X_j) = 1$), this array is `NULL`.
- `invStdDevY:` the inverse of the standard deviation of $y$.
- `mean:` an array that contains the mean of all the columns of $X$. If you have normalized yourself the columns of $X$ (so that $\bar{X}_j = 0$), this array is `NULL`.
- `meanY:` the mean of $y$.
- `lambda:` the optimal ridge parameter $\lambda$.
- `target:` the residual error at the end of the LARS algorithm (NOT normalized).

Here is how to perform a backward stepwise using the library:

- Build a model using the `getModel` function. Stop prematurely. The indexes of the active variables are stored inside `sA`.
- Try to remove variable $i$:
  - Get a copy `sCopy` of all the variables inside the model: `sCopy.copyFrom(sA);`
  - Remove variable $i$ from `sCopy`: `sCopy.remove(i);`. Let's define `na`, the number of variables inside `sCopy`: `int na=sCopy.n;`.
  - Compute a model using only the columns inside `sCopy`:

    `larsFLOAT *vBeta=(larsFLOAT*)malloc((nCol+1)*sizeof(larsFLOAT));`
    `buildModel(vBeta, lambda, -1, &sCopy);`

    A model in ``compressed/short form'' is stored inside the array `vBeta`. `vBeta` is currently an array of size `na`: `vBeta[j]` is the weight of the model given to the column of index `sCopy.s[j]`. The short form is a very efficient way of storing the model when combined with functions such as `getLineXRequired`. To convert the model to the ``normal/long form'', use `shortToLongModel(vBeta);`. Note that the size of the ``long form'' of the model is `nCol+1`, because we also need to store the offset (constant term) of the model. Inside this example, the model is built using all the blocks $B_i$ (see figure 10 for a definition of the $B_i$'s). If we want to build a model using all the blocks except the block $B_4$, we will do `buildModel(vBeta, lambda, 4, &sCopy);`. This allows us to perform easily a n-fold-cross-validation-error test.
  - Test the model to see if we can remove column $i$ without a loss of efficiency. This test may involve a n-fold-cross-validation technique. If we can, we drop definitively the column $i$: we do `keepVars(sCopy);` (`sA` will be updated).
  - Choose another column $i$ inside `sA` and go back to the first point of step 2.

To summarize, there are at least four different models that you can obtain from the LARS library:

- The model that is produced by the forward stepwise LARS algorithm when searching for informative variables inside the stepwise iterations. This model is suboptimal (see the remark about ``greedy'' algorithms to know why). You can access this model through the `isStoppingCriteriaTrue` function.
- The model given by the function `buildModel(vBeta, lambda, -1, &sCopy)`: this model is using all the blocks $B_i$ inside the creation set.
- The models given by the function `buildModel(vBeta, lambda, i, &sCopy)`: these models are using all the creation set minus the block $B_i$. There are `nFold` models of this type ($i = 0, \dots,$ `nFold`$-1$).
- The model given by the function `buildAverageModel` that is an average of the `nFold` models computed using a n-fold-cross-validation technique.

Here are some more useful functions:

`larsFLOAT normalizedLinearCorrelation(int i, int j, int cv=-1);`

This will return $c_{ij}$: the linear correlation between variables $i$ and $j$. The result is normalized so that $-1 \le c_{ij} \le 1$. If `cv=4`, then the block $B_4$ is ignored while computing $c_{ij}$.

`larsFLOAT linearCorrelationWithTarget(int i, int cv=-1);`

This will return $c_i$: the linear correlation between variable $i$ and the target $y$. The result is normalized so that $-1 \le c_i \le 1$. If `cv=4`, then the block $B_4$ is ignored while computing $c_i$.

`static larsFLOAT apply(int n, larsFLOAT *line, larsFLOAT *model);`

This computes the prediction of the (long form) model for one line of $X$ (usually `n=nCol`).

The zip-file downloadable at the top of this webpage contains:

- The source code of a Matlab mex-file function based on the LARS library. This allows you to use the LARS Library under Matlab. Due to limitations of Matlab, we have to set the options `orientation='C'` and `memoryAligned=0`. Thus, the library will be rather slow compared to a full C++ implementation. The classical `diabetes' example is given (the same one as inside the main paper on LARS). An evolution of the example illustrated in figure 8 is also given. This example is interesting because the FOS algorithm fails on it: the final model of the FOS algorithm contains only one variable ($X_3$). On the contrary, the LARS algorithm is able to find the right set of active variables: $X_1$ and $X_2$.
- The source code of a small C++ executable that solves the `diabetes' example. The `diabetes' data are stored inside an external *.csv* file. The last column of the *.csv* file is the target. A very fast *.csv*-file-reader is provided.

The documentation does not cover the ``dynamic variables'' (see equation 21 about ``dynamic variables'').

The library allows you to construct second order, third order,... models very easily.


Inside the LARS algorithm, the Cholesky factorizations are performed incrementally as new variables enter the model. The Cholesky factorizations are based on the paper ``A revised Modified Cholesky Factorization Algorithm'' by Robert B. Schnabel and Elisabeth Eskow. This advanced Cholesky Factorization allows us to obtain high accuracy models even when close to rank deficiency.

The optimization of the ridge parameter is performed using a modified Brent algorithm to ensure global convergence. The optimization algorithm has been extracted from an optimization library named CONDOR.

Some references about the FOS algorithm:

- One of the first papers published on FOS by the inventor of FOS: ``A robust orthogonal algorithm for System Identification and Time-Series Analysis'', Michael J. Korenberg.
- An introductory paper: ``Application of Fast Orthogonal Search for the Design of RBFNN'', W. Ahmed, D.M. Hummels and M.T. Musavi.
- A complete thesis: ``Fast Orthogonal Search For Training of Radial Basis Function Neural Networks'', Wahid Ahmed.
- An example of why LARS is better than FOS for ARMA models: ``A new Algorithm for Linear and non-linear ARMA Model Parameter Estimation Using Affine Geometry'', Sheng Lu, Ki Hwan Ju and Ki H. Chon.
- Some applications of FOS:
  - ``Orthogonal Approaches to Time-series Analysis and System Identification'', Michael J. Korenberg.
  - ``Spectral Analysis of Heart Valve Sound for detection of Prosthetic Heart Valve Diseases'', Sang Huyn Kim, Hee Jong Lee, Jae Man Huh and Byung Chul Chang.

A paper about Lasso: ``Regression shrinkage and selection via the lasso'', Robert Tibshirani

A paper about LARS: ``Least Angle Regression'', Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani

Cholesky factorization is based on the paper ``A revised Modified Cholesky Factorization Algorithm'' by Robert B. Schnabel and Elisabeth Eskow.

A very good advanced recoding: ``1-Dimensional Splines as Building Blocks for Improving Accuracy of Risk Outcomes Models'' by David S. Vogel and Morgan C. Wang.