I posted on ridge regression and the LASSO (Least Absolute Shrinkage and Selection Operator) some weeks back. Here I want to compare them in connection with variable selection where there are more predictors than observations (“many predictors”). 1. Ridge regression does not really select variables in the many-predictors situation. Rather, ridge regression “shrinks” all … Continue reading Estimation and Variable Selection with Ridge Regression and the LASSO→
The LASSO (Least Absolute Shrinkage and Selection Operator) is a method of automatic variable selection which can be used to select predictors X* of a target variable Y from a larger set of potential or candidate predictors X. Developed in 1996 by Tibshirani, the LASSO formulates curve fitting as a quadratic programming problem, where the … Continue reading Variable Selection Procedures – The LASSO→
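The difference between shrinking and selecting is easiest to see in a special case. The sketch below assumes an orthonormal design (X'X = I). That is not the many-predictors setting, since it requires at least as many observations as predictors; it is used here only because both estimators then have closed forms. Under that assumption, with objectives ½‖y − Xβ‖² + (λ/2)‖β‖² for ridge and ½‖y − Xβ‖² + λ‖β‖₁ for the LASSO, ridge divides each least-squares coefficient by 1 + λ, while the LASSO soft-thresholds it at λ. The coefficient values and function names are made up for illustration.

```python
def ridge_orthonormal(ols_coefs, lam):
    """Ridge under the orthonormal-design assumption: scale every coefficient by 1/(1+lam)."""
    return [b / (1.0 + lam) for b in ols_coefs]


def lasso_orthonormal(ols_coefs, lam):
    """LASSO under the orthonormal-design assumption: soft-threshold each coefficient at lam."""
    shrunk = []
    for b in ols_coefs:
        if abs(b) <= lam:
            shrunk.append(0.0)      # small coefficients are set exactly to zero
        elif b > 0:
            shrunk.append(b - lam)  # large positive coefficients move toward zero by lam
        else:
            shrunk.append(b + lam)  # large negative coefficients move toward zero by lam
    return shrunk


# Hypothetical least-squares coefficients, chosen only to illustrate the contrast.
ols = [3.0, -0.5, 0.2, 1.5]
lam = 1.0

ridge = ridge_orthonormal(ols, lam)
lasso = lasso_orthonormal(ols, lam)

print("ridge:", ridge)  # [1.5, -0.25, 0.1, 0.75]  (all shrunk, none dropped)
print("lasso:", lasso)  # [2.0, 0.0, 0.0, 0.5]     (two predictors dropped)
```

Ridge keeps every predictor with a smaller coefficient, whereas the LASSO removes the two predictors whose least-squares coefficients fall below the penalty in absolute value.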
As Hal Varian writes in his popular Big Data: New Tricks for Econometrics, the wealth of data now available to researchers demands new techniques of analysis. In particular, there is often the problem of “many predictors.” In classic regression, the number of observations is assumed to exceed the number of explanatory variables. This obviously is … Continue reading Primer on Method – Some Perspectives from 2014→
I got a chance to work on the problem of forecasting during a business downturn at Microsoft from 2007 to 2010. Usually, a recession is not good for a forecasting team. There is a tendency to shoot the messenger bearing the bad news. Cost cutting often falls on marketing first, which often is where forecasting is housed. But … Continue reading Forecasting the Downswing in Markets→
In many business applications, forecasting is not a hugely complex business. For a sales forecast, the main challenge can be obtaining the data, which may require sifting through databases compiled before and after mergers or other reorganizations. Often, available historical data goes back only three or four years, before which time product cycles make comparisons … Continue reading Business Forecasting – Some Thoughts About Scope→
I’ve been doing a deep dive into Bayesian materials over the past few days. I’ve tried this before, but I seem to be making more headway this time. One question is whether Bayesian methods and statistics informed by the more familiar frequency interpretation of probability can give different answers. I found this question on CrossValidated, too … Continue reading Some Ways in Which Bayesian Methods Differ From the “Frequentist” Approach→
As regular readers of this blog know, I’ve migrated to a weekly (or potentially longer) topic focus, and this week’s topic is variable selection. And the next planned post in the series will compare and contrast ridge regression and the LASSO (least absolute shrinkage and selection operator). There also are some new results for the … Continue reading The Tibshirani’s – Statistics and Machine Learning Superstars→
If you can, form the regression Y = β0 + β1X1 + β2X2 + … + βNXN, where Y is the target variable and the N variables Xi are the predictors that have the highest correlations with the target variable, based on some cutoff value of the correlation, say ±0.3. Of course, if the number of observations you have in … Continue reading First Cut Modeling – All Possible Regressions→
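The screening step in this excerpt can be sketched in a few lines. This is only a minimal illustration of picking predictors by their correlation with the target, using the 0.3 cutoff mentioned above; the toy data, the variable names (x1, x2, x3), and the helper functions are hypothetical, and fitting the regression on the surviving predictors is left out.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def screen_predictors(candidates, target, cutoff=0.3):
    """Return the names of candidates whose |correlation| with the target meets the cutoff."""
    return [name for name, values in candidates.items()
            if abs(pearson(values, target)) >= cutoff]


# Toy data, made up for illustration only.
y = [1, 2, 3, 4, 5]
candidates = {
    "x1": [2, 4, 6, 8, 10],   # perfectly correlated with y
    "x2": [5, 3, 4, 1, 2],    # strongly negatively correlated with y
    "x3": [1, -1, 0, -1, 1],  # uncorrelated with y
}

selected = screen_predictors(candidates, y, cutoff=0.3)
print(selected)  # ['x1', 'x2']
```

Using the absolute value of the correlation keeps strongly negative predictors such as x2, which matches the “±” in the cutoff.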
In a recent post on logistic regression, I mentioned research which developed diagnostic tools for breast cancer based on true Big Data parameters – notably 62,219 consecutive mammography records from 48,744 studies in 18,270 patients reported using the Breast Imaging Reporting and Data System (BI-RADS) lexicon and the National Mammography Database format between April 5, … Continue reading Selecting Predictors→
A couple of years or so ago, I analyzed a software customer satisfaction survey, focusing on larger corporate users. I had firmographics – specifying customer features (size, market segment) – and customer evaluation of product features and support, as well as technical training. Altogether, there were 200 questions that translated into metrics or variables, along … Continue reading Complete Subset Regressions→
Sales and new product forecasting in data-limited (real world) contexts