Predicting the Singularity, the Advent of Superintelligence

From thinking about robotics, automation, and artificial intelligence (AI) this week, I’m evolving a picture of the future – the next few years. I think you have to define a super-technological core, so to speak, and understand how the systems of production, communication, and control mesh and interpenetrate across the globe. And how this sets in motion multiple dynamics.

But then there is the “singularity” –  whose main publicizer is Ray Kurzweil, current Director of Engineering at Google. Here’s a particularly clear exposition of his view.

There’s a sort of rebuttal by Paul Root Wolpe.

Part of the controversy, as in many arguments, is a problem of definition. Kurzweil emphasizes a “singularity” of machine superintelligence. For him, the singularity is, first of all, the point at which the processes of the human brain will be well understood and thinking machines will be available that surpass human capabilities in every respect. Wolpe, on the other hand, emphasizes the “event horizon” connotation of the singularity – that point beyond which our technological powers will have become so immense that it is impossible to see beyond.

And Wolpe’s point about the human brain is probably well-taken. Think, for instance, of how decoding the human genome was supposed to unlock the secrets of genetic engineering, only to find that there were even more complex systems of proteins and so forth.

And the brain may be much more complicated than the current mechanical models suggest – a view espoused by Roger Penrose, the English mathematical genius. Penrose advocates a quantum theory of consciousness. His point, first made in his book The Emperor’s New Mind, is that machines will never overtake human consciousness, because human consciousness is, at the limit, nonalgorithmic. Basically, Penrose has been working on the idea that the brain is, in some respect, a quantum computer.

I think there is no question, however, that superintelligence in the sense of fast computation, fast assimilation of vast amounts of data, as well as implementation of structures resembling emotion and judgment – all these, combined with the already highly developed physical capabilities of machines, mean that we are going to meet some mojo smart machines in the next ten to twenty years, tops.

The dystopian consequences are enormous. Bill Joy, co-founder of Sun Microsystems, famously wrote about why the future does not need us. I think Joy’s singularity is a sort of devilish mirror image of Kurzweil’s – for Joy the singularity could be a time when nanotechnology, biotechnology, and robotics link together to make human life more or less impossible, or significantly at risk.

There is much more to say and think on this topic, to which I hope to return from time to time.

Meanwhile, I am reminded of Voltaire’s Candide who, at the end of pursuing the theories of Dr. Pangloss, concludes “we must cultivate our garden.”

Robotics – the Present, the Future

A picture is worth one thousand words. Here are several videos, mostly from YouTube, discussing robotics and artificial intelligence (AI) and showing present and future capabilities. The videos fall into four areas: concepts, with Andrew Ng; industrial robots and their primary uses; military robotics, including a presentation on Predator drones; and some state-of-the-art innovations in robotics which mimic the human approach to a degree.

Andrew Ng  – The Future of Robotics and Artificial Intelligence

Car Factory – Kia Sportage factory production line

ABB Robotics – 10 most popular applications for robots


Predator Drones


Innovators: The Future of Robotic Warfare


Bionic kangaroo


The Duel: Timo Boll vs. KUKA Robot


The “Hollowing Out” of Middle Class America

Two charts in a 2013 American Economic Review (AER) article put numbers to the “hollowing out” of middle class America – a topic celebrated with profuse anecdotes in the media.

[Figure: changes in employment (top panel) and wages (bottom panel) by occupational skill level, 1980-2005, from Autor and Dorn]

The top figure shows the change in employment 1980-2005 by skill level, based on Census IPUMS and American Community Survey (ACS) data. Occupations are ranked by skill level, approximated by wages in each occupation in 1980.

The lower figure documents the changes in wages of these skill levels 1980-2005.

These charts are from David Autor and David Dorn – The Growth of Low-Skill Service Jobs and the Polarization of the US Labor Market – who write that,

Consistent with the conventional view of skill-biased technological change, employment growth is differentially rapid in occupations in the upper two skill quartiles. More surprising in light of the canonical model are the employment shifts seen below the median skill level. While occupations in the second skill quartile fell as a share of employment, those in the lowest skill quartile expanded sharply. In net, employment changes in the United States during this period were strongly U-shaped in skill level, with relative employment declines in the middle of the distribution and relative gains at the tails. Notably, this pattern of employment polarization is not unique to the United States. Although not recognized until recently, a similar “polarization” of employment by skill level has been underway in numerous industrialized economies in the last 20 to 30 years.

So, over the past three or so decades (extrapolating to the present), employment and wage growth have been fastest in low skill and high skill occupations.

Among lower skill occupations, such as food service workers, security guards, janitors and gardeners, cleaners, home health aides, child care workers, hairdressers and beauticians, and recreational workers, employment grew 30 percent 1980-2005.

Among the highest paid occupations – classified as managers, professionals, technicians, and workers in finance, and public safety – the share of employment also grew by about 30 percent, but so did wages – which increased at about double the pace of the lower skill occupations over this period.

Professor Autor is in the MIT economics department, and seems to be the nexus of a lot of interesting research casting light on changes in US labor markets.

[Photo: David Autor]

In addition to “doing Big Data” as the above charts suggest, David Autor is closely associated with a new, common sense model of production activities, based on tasks and skills.

This model of the production process enables Autor and his co-researchers to conclude that,

…recent technological developments have enabled information and communication technologies to either directly perform or permit the offshoring of a subset of the core job tasks previously performed by middle skill workers, thus causing a substantial change in the returns to certain types of skills and a measurable shift in the assignment of skills to tasks.

So it’s either a computer (robot) or a worker in China who gets the middle-class bloke’s job these days.

And to drive that point home (and, please, I consider the achievements of the PRC in lifting hundreds of millions out of extreme poverty to be of truly historic dimension), Autor, with David Dorn and Gordon Hanson, published another 2013 article in the AER titled The China Syndrome: Local Labor Market Effects of Import Competition in the United States.

This study analyzes local labor markets and trade shocks to these markets, according to initial patterns of industry specialization.

The findings are truly staggering – or at least have been obscured and obfuscated for years by special pleaders and lobbyists.

Autor, Dorn, and Hanson write,

The value of annual US goods imports from China increased by a staggering 1,156 percent from 1991 to 2007, whereas US exports to China grew by much less…. 

Our analysis finds that exposure to Chinese import competition affects local labor markets not just through manufacturing employment, which unsurprisingly is adversely affected, but also along numerous other margins. Import shocks trigger a decline in wages that is primarily observed outside of the manufacturing sector. Reductions in both employment and wage levels lead to a steep drop in the average earnings of households. These changes contribute to rising transfer payments through multiple federal and state programs, revealing an important margin of adjustment to trade that the literature has largely overlooked,

This research – conducted with ordinary least squares (OLS) and two-stage least squares (2SLS) instrumental-variables regressions – is definitely not something a former trade unionist is going to ponder in the easy chair after work at the convenience store. So it’s kind of safe in terms of arousing the ire of the masses.

But I digress.

For my purposes here, Autor and his co-researchers put pieces of the puzzle in place so we can see the picture.

The US occupational environment has changed profoundly since the 1980s. Middle class jobs have simply vanished over large parts of the landscape. More specifically, good-paying production jobs, along with a lot of other more highly paid but routinized work, have been the target of outsourcing – often, it can be demonstrated, to China. Higher paid work by professionals in business and finance benefits from complementarities with advances in data processing and information technology (IT) generally. In addition, a small number of highly paid production workers, whose job skills have been updated to run more automated assembly operations, seem to be the chief beneficiaries of new investment in US production these days.

There you have it.

Market away, and include these facts in any forecasts you develop for the US market.

Of course, there are issues of dynamics.

Automation, Robotics – Trends and Impacts

Trying to figure out the employment impacts of automation, computerization, and robotics is challenging, to say the least.

There are clear facts, such as the apparent permanent loss of jobs in US manufacturing since the early 1990’s.

[Chart: US manufacturing employment (FRED series MANEMP)]

But it would be short-sighted to conclude these jobs have been lost to increased use of computers and robots in production.

That’s because, for one thing, you might compare a chart like the above with statistics on Chinese manufacturing.

[Chart: Chinese manufacturing employment]

Now you can make a case – if you focus on urban Chinese manufacturing employment – that these two charts are more or less mirror images of one another in recent years. That is, urban manufacturing employment in China, according to the US BLS, increased by about 4 million from 2002 to 2009, while US manufacturing employment dropped by about that amount over the same period.

Of course, there are other off-shore manufacturing sites of importance, such as the maquiladoras along the US border with Mexico.

But what brings robotics into focus for me is that significant automation and robotics are being installed in factories in China.

Terry Gou, head of Foxconn – the huge Taiwan-based contract manufacturer making the iPhone and many other leading electronics products in mainland China – has called for installation of a million industrial robots in Foxconn factories over the next few years.

In fact, Foxconn apparently is quietly partnering with Google to help bring its vision of robotics to life.

Decoupling of Productivity and Employment?

Erik Brynjolfsson at MIT is an expert on the productivity implications of information technology (IT).

About a year ago, the MIT Technology Review ran an article How Technology Is Destroying Jobs featuring the perspective developed recently by Brynjolfsson that there is increasingly a disconnect between productivity growth and jobs in the US.

The article featured two infographics – one of which I reproduce here.

[Infographic from the MIT Technology Review article on productivity and employment trends]

There have been highly focused studies of the effects of computerization on specific industries.

Research published just before the recent economic crisis did an in-depth study of automation and computerization in the valve industry, arriving at three focused findings.

First, plants that adopt new IT-enhanced equipment also shift their business strategies by producing more customized valve products. Second, new IT investments improve the efficiency of all stages of the production process by reducing setup times, run times, and inspection times. The reductions in setup times are theoretically important because they make it less costly to switch production from one product to another and support the change in business strategy to more customized production. Third, adoption of new IT-enhanced capital equipment coincides with increases in the skill requirements of machine operators, notably technical and problem-solving skills, and with the adoption of new human resource practices to support these skills

This is the positive side of the picture.

No more drudgery on assembly lines with highly repetitive tasks. Factory workers are being upgraded to computer operatives.

More to follow.

Jobs and the Next Wave of Computerization

A duo of researchers from Oxford University (Frey and Osborne) made a splash with their analysis of employment and computerisation (English spelling) in the US. Their research, released in September of last year, projects that –

47 percent of total US employment is in the high risk category, meaning that associated occupations are potentially automatable over some unspecified number of years, perhaps a decade or two..

Based on US Bureau of Labor Statistics (BLS) classifications from O*NET Online, their model predicts that most workers in transportation and logistics occupations, together with the bulk of office and administrative support workers, and labour in production occupations, are at risk.

This research deserves attention, if for no other reason than its masterful discussion of the impact of technology on employment and its many specific examples of new areas for computerization and automation.

For example, I did not know,

Oncologists at Memorial Sloan-Kettering Cancer Center are, for example, using IBM’s Watson computer to provide chronic care and cancer treatment diagnostics. Knowledge from 600,000 medical evidence reports, 1.5 million patient records and clinical trials, and two million pages of text from medical journals, are used for benchmarking and pattern recognition purposes. This allows the computer to compare each patient’s individual symptoms, genetics, family and medication history, etc., to diagnose and develop a treatment plan with the highest probability of success..

There are also specifics of computerized condition monitoring and novelty detection – substituting for closed-circuit TV operators, workers examining equipment defects, and clinical staff in intensive care units.

A followup Atlantic Monthly article – What Jobs Will the Robots Take? – writes,

We might be on the edge of a breakthrough moment in robotics and artificial intelligence. Although the past 30 years have hollowed out the middle, high- and low-skill jobs have actually increased, as if protected from the invading armies of robots by their own moats. Higher-skill workers have been protected by a kind of social-intelligence moat. Computers are historically good at executing routines, but they’re bad at finding patterns, communicating with people, and making decisions, which is what managers are paid to do. This is why some people think managers are, for the moment, one of the largest categories immune to the rushing wave of AI.

Meanwhile, lower-skill workers have been protected by the Moravec moat. Hans Moravec was a futurist who pointed out that machine technology mimicked a savant infant: Machines could do long math equations instantly and beat anybody in chess, but they can’t answer a simple question or walk up a flight of stairs. As a result, menial work done by people without much education (like home health care workers, or fast-food attendants) have been spared, too.

What Frey and Osborne at Oxford suggest is an inflection point, where machine learning (ML) and what they call mobile robotics (MR) have advanced to the point where new areas for applications will open up – including a lot of menial, service tasks that were not sufficiently routinized for the first wave.

In addition, artificial intelligence (AI) and Big Data algorithms are prying open areas formerly dominated by intellectual workers.

The Atlantic Monthly article cited above has an interesting graphic –

[Chart: occupations ranked by estimated probability of automation]

So at the top of this chart are the jobs which are at 100 percent risk of being automated, while at the bottom are jobs which probably will never be automated (although I do think counseling can be done to a certain degree by AI applications).

The Final Frontier

This blog focuses on many of the relevant techniques in machine learning – basically unsupervised learning of patterns – which in the future will change everything.

Driverless cars are the wow example, of course.

Bottlenecks to moving further up the curve of computerization are highlighted in the following table from the Oxford U report.

[Table: O*NET variables identified as bottlenecks to computerization, from the Oxford report]

As far as dexterity and flexibility go, Baxter shows great promise, as the following YouTube video from its innovators illustrates.

There also are some wonderful examples of apparent creativity by computers or automatic systems, which I plan to detail in a future post.

Frey and Osborne, reflecting on their research in a 2014 discussion, conclude

So, if a computer can drive better than you, respond to requests as well as you and track down information better than you, what tasks will be left for labour? Our research suggests that human social intelligence and creativity are the domains where labour will still have a comparative advantage. Not least, because these are domains where computers complement our abilities rather than substitute for them. This is because creativity and social intelligence is embedded in human values, meaning that computers would not only have to become better, but also increasingly human, to substitute for labour performing such work.

Our findings thus imply that as technology races ahead, low-skill workers will need to reallocate to tasks that are non-susceptible to computerisation – i.e., tasks requiring creative and social intelligence. For workers to win the race, however, they will have to acquire creative and social skills. Development strategies thus ought to leverage the complementarity between computer capital and creativity by helping workers transition into new work, involving working with computers and creative and social ways.

Specifically, we recommend investing in transferable computer-related skills that are not particular to specific businesses or industries. Examples of such skills are computer programming and statistical modeling. These skills are used in a wide range of industries and occupations, spanning from the financial sector, to business services and ICT.

Implications For Business Forecasting

People specializing in forecasting for enterprise level business have some responsibility to “get ahead of the curve” – conceptually, at least.

Not everybody feels comfortable doing this, I realize.

However, I’m coming to the realization that these discussions of how many jobs are susceptible to “automation” or whatever you want to call it (not to mention jobs at risk for “offshoring”) – these discussions are really kind of the canary in the coal mine.

Something is definitely going on here.

But what are the metrics? Can you backdate the analysis Frey and Osborne offer, for example, to account for the decoupling of productivity growth from the slower employment gains since the last recession?

Getting a handle on this dynamic in the US, Europe, and even China has huge implications for marketing, and, indeed, social control.

Machine Learning and Next Week

Here is a nice list of machine learning algorithms. Remember, too, that they come in several flavors – supervised, unsupervised, semi-supervised, and reinforcement learning.

[Chart: list of machine learning algorithms, from Machine Learning Mastery]

An objective of mine is to cover each of these techniques with an example or two, with special reference to their relevance to forecasting.

I got this list, incidentally, from an interesting Australian blog Machine Learning Mastery.

The Coming Week

Aligned with this marvelous list, I’ve decided to focus on robotics for a few blog posts coming up.

This is definitely exploratory, but recently I heard a presentation by an economist from the National Association of Manufacturers (NAM) on manufacturing productivity, among other topics. Apparently, robotics is definitely happening on the shop floor – especially in the automobile industry, but also in semiconductors and electronics assembly.

And, as mankind pushes the envelope, drilling for oil in deeper and deeper areas offshore and handling more and more radioactive and toxic material, the need for significant robotic assistance is definitely growing.

I’m looking for indices and how to construct them – how to gauge the line between merely automatic and what we might more properly call robotic.

Links – late May

US and Global Economic Prospects

Goldman’s Hatzius: Rationale for Economic Acceleration Is Intact

We currently estimate that real GDP fell -0.7% (annualized) in the first quarter, versus a December consensus estimate of +2½%. On the face of it, this is a large disappointment. It raises the question whether 2014 will be yet another year when initially high hopes for growth are ultimately dashed.

 Today we therefore ask whether our forecast that 2014-2015 will show a meaningful pickup in growth relative to the first four years of the recovery is still on track. Our answer, broadly, is yes. Although the weak first quarter is likely to hold down real GDP for 2014 as a whole, the underlying trends in economic activity are still pointing to significant improvement….

 The basic rationale for our acceleration forecast of late 2013 was twofold—(1) an end to the fiscal drag that had weighed on growth so heavily in 2013 and (2) a positive impulse from the private sector following the completion of the balance sheet adjustments specifically among US households. Both of these points remain intact.

Economy and Housing Market Projected to Grow in 2015

Despite many beginning-of-the-year predictions about spring growth in the housing market falling flat, and despite a still chugging economy that changes its mind quarter-to-quarter, economists at the National Association of Realtors and other industry groups expect an uptick in the economy and housing market through next year.

The key to the NAR’s optimism, as expressed by the organization’s chief economist, Lawrence Yun, earlier this week, is a hefty pent-up demand for houses coupled with expectations of job growth—which itself has been more feeble than anticipated. “When you look at the jobs-to-population ratio, the current period is weaker than it was from the late 1990s through 2007,” Yun said. “This explains why Main Street America does not fully feel the recovery.”

Yun’s comments echo those in a report released Thursday by Fitch Ratings and Oxford Analytica that looks at the unusual pattern of recovery the U.S. is facing in the wake of its latest major recession. However, although the U.S. GDP and overall economy have occasionally fluctuated quarter-to-quarter these past few years, Yun said that there are no fresh signs of recession for Q2, which could grow about 3 percent.

Report: San Francisco has worse income inequality than Rwanda

If San Francisco was a country, it would rank as the 20th most unequal nation on Earth, according to the World Bank’s measurements.

[Photo: Google bus]

Climate Change

When Will Coastal Property Values Crash And Will Climate Science Deniers Be The Only Buyers?


How Much Will It Cost to Solve Climate Change?

Switching from fossil fuels to low-carbon sources of energy will cost $44 trillion between now and 2050, according to a report released this week by the International Energy Agency.

Natural Gas and Fracking

How The Russia-China Gas Deal Hurts U.S. Liquid Natural Gas Industry

This could dampen the demand – and ultimately the price for – LNG from the United States. East Asia represents the most prized market for producers of LNG. That’s because it is home to the top three importers of LNG in the world: Japan, South Korea and China. Together, the three countries account for more than half of LNG demand worldwide. As a result, prices for LNG are as much as four to five times higher in Asia compared to what natural gas is sold for in the United States.

The Russia-China deal may change that.

If LNG prices in Asia come down from their recent highs, the most expensive LNG projects may no longer be profitable. That could force out several of the U.S. LNG projects waiting for U.S. Department of Energy approval. As of April, DOE had approved seven LNG terminals, but many more are waiting for permits.

LNG terminals in the United States will also not be the least expensive producers. The construction of several liquefaction facilities in Australia is way ahead of competitors in the U.S., and the country plans on nearly quadrupling its LNG capacity by 2017. More supplies and lower-than-expected demand from China could bring down prices over the next several years.

Write-down of two-thirds of US shale oil explodes fracking myth

This is big!

Next month, the US Energy Information Administration (EIA) will publish a new estimate of US shale deposits set to deal a death-blow to industry hype about a new golden era of US energy independence by fracking unconventional oil and gas.

EIA officials told the Los Angeles Times that previous estimates of recoverable oil in the Monterey shale reserves in California of about 15.4 billion barrels were vastly overstated. The revised estimate, they said, will slash this amount by 96% to a puny 600 million barrels of oil.

The Monterey formation, previously believed to contain more than double the amount of oil estimated at the Bakken shale in North Dakota, and five times larger than the Eagle Ford shale in South Texas, was slated to add up to 2.8 million jobs by 2020 and boost government tax revenues by $24.6 billion a year.

China

The Annotated History Of The World’s Next Reserve Currency

[Infographic: annotated history of the yuan]

Goldman: Prepare for Chinese property bust

…With demand poised to slow given a tepid economic backdrop, weaker household affordability, rising mortgage rates and developer cash flow weakness, we believe current construction capacity of the domestic property industry may be excessive. We estimate an inventory adjustment cycle of two years for developers, driving 10%-15% price cuts in most cities with 15% volume contraction from 2013 levels in 2014E-15E. We also expect M&A activities to take place actively, favoring developers with strong balance sheet and cash flow discipline.

China’s Shadow Banking Sector Valued At 80% of GDP

The China Banking Regulatory Commission has shed light on the country’s opaque shadow banking sector. It was as large as 33 trillion yuan ($5.29 trillion) in mid-2013 and equivalent to 80% of last year’s GDP, according to Yan Qingmin, a vice chairman of the commission.

In a Tuesday WeChat blog sent by the Chong Yang Institute for Financial Studies, Renmin University, Yan wrote that his calculation is based on shadow lending activities from asset management businesses to trust companies, a definition he said was very broad.  Yan said the rapid expansion of the sector, which was equivalent to 53% of GDP in 2012, entailed risks of some parts of the shadow banking business, but not necessarily the Chinese economy.

Yan’s estimation is notably higher than that of the Chinese Academy of Social Sciences. The government think tank said on May 9 that the sector has reached 27 trillion yuan ($4.4 trillion in 2013) and is equivalent to nearly one fifth of the domestic banking sector’s total assets.

Massive, Curvaceous Buildings Designed to Imitate a Mountain Forest

[Images: massive, curvaceous buildings designed to imitate a mountain forest]

Information Technology (IT)

I am an IT generalist. Am I doomed to low pay forever? Interesting comments and suggestions to this question on a Forum maintained by The Register.

I’m an IT generalist. I know a bit of everything – I can behave appropriately up to Cxx level both internally and with clients, and I’m happy to crawl under a desk to plug in network cables. I know a little bit about how nearly everything works – enough to fill in the gaps quickly: I didn’t know any C# a year ago, but 2 days into a project using it I could see the offshore guys were writing absolute rubbish. I can talk to DB folks about their DBs; network guys about their switches and wireless networks; programmers about their code and architects about their designs. Don’t get me wrong, I can do as well as talk, programming, design, architecture – but I would never claim to be the equal of a specialist (although some of the work I have seen from the soi-disant specialists makes me wonder whether I’m missing a trick).

My principle skill, if there is one – is problem resolution, from nitty gritty tech details (performance and functionality) to handling tricky internal politics to detoxify projects and get them moving again.

How on earth do I sell this to an employer as a full-timer or contractor? Am I doomed to a low income role whilst the specialists command the big day rates? Or should I give up on IT altogether

Crowdfunding is brutal… even when it works

China bans Windows 8

China has banned government use of Windows 8, Microsoft Corp’s latest operating system, a blow to a US technology company that has long struggled with sales in the country.

The Central Government Procurement Center issued the ban on installing Windows 8 on Chinese government computers as part of a notice on the use of energy-saving products, posted on its website last week.

Data Analytics

Statistics of election irregularities – good forensic data analytics.

Dimension Reduction With Principal Components

The method of principal components regression has achieved new prominence in machine learning, data reduction, and forecasting over the last decade.

It’s highly relevant in the era of Big Data, because it facilitates analyzing “fat” or wide databases. Fat databases have more predictors than observations. So you might have ten years of monthly data on sales, but 1000 potential predictors, meaning your database would be 120 by 1001 – obeying here the convention of stating row depth first and the number of columns second.

After a brief discussion of these Big Data applications and some elements of principal components, I illustrate dimension reduction with a violent crime database from the UC Irvine Machine Learning Repository.

Dynamic Factor Models

In terms of forecasting, a lot of research over the past decade has focused on “many predictors” and reducing the dimensionality of “fat” databases. Key names are James Stock and Mark Watson (see also) and Bai.

Stock and Watson have a white paper that has been updated several times, which can be found in PDF format at this link

stock watson generalized shrinkage June _2012.pdf

They write in the June 2012 update,

We find that, for most macroeconomic time series, among linear estimators the DFM forecasts make efficient use of the information in the many predictors by using only a small number of estimated factors. These series include measures of real economic activity and some other central macroeconomic series, including some interest rates and monetary variables. For these series, the shrinkage methods with estimated parameters fail to provide mean squared error improvements over the DFM. For a small number of series, the shrinkage forecasts improve upon DFM forecasts, at least at some horizons and by some measures, and for these few series, the DFM might not be an adequate approximation. Finally, none of the methods considered here help much for series that are notoriously difficult to forecast, such as exchange rates, stock prices, or price inflation.

Here DFM refers to dynamic factor models – essentially principal components models which utilize principal components of the lagged predictor data.

Note also that this type of autoregressive or classical time series approach does not work well, in Stock and Watson’s judgment, for “series that are notoriously difficult to forecast, such as exchange rates, stock prices, or price inflation.”

Presumably, these series are closer to being random walks in some configuration.
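To make the DFM idea concrete, here is a stylized diffusion-index forecast sketch in R. This is not Stock and Watson's actual code; the data are made up, standing in for a "many predictors" panel. The sketch extracts a few principal-component factors from the panel and regresses the future value of the target on the current factors plus a lag of the target.

# Stylized diffusion-index (dynamic factor model) forecast sketch -- made-up data
set.seed(42)
nT <- 200; N <- 50
X <- matrix(rnorm(nT * N), nT, N)    # stand-in for a large panel of standardized predictors
y <- rnorm(nT)                       # stand-in for the series to be forecast

# 1. Estimate factors as the first few principal components of the panel
factors <- prcomp(X, center = TRUE, scale. = TRUE)$x[, 1:3]

# 2. Forecasting regression: y one step ahead on current factors and lagged y
h    <- 1
yfut <- y[(1 + h):nT]
fcur <- factors[1:(nT - h), ]
ylag <- y[1:(nT - h)]
fit  <- lm(yfut ~ fcur + ylag)

# 3. One-step-ahead forecast from the most recent observation
b <- coef(fit)
forecast <- b[1] + sum(b[2:4] * factors[nT, ]) + b[5] * y[nT]
forecast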

Intermediate Level Concepts

Essentially, you can take any bundle of data and compute the principal components. If you mean-center and (in most cases) standardize the data, the principal components divide up the variance of this data, based on the size of their associated eigenvalues. The associated eigenvectors can be used to transform the data into an equivalent and same size set of orthogonal vectors. Really, the principal components operate to change the basis of the data, transforming it into an equivalent representation, but one in which all the variables have zero correlation with each other.

The Wikipedia article on principal components is useful, but there is no getting around the fact that principal components can only really be understood with matrix algebra.
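Still, a small numerical example helps. The following minimal R sketch, with simulated data, checks the two facts stated above: the component variances equal the eigenvalues of the correlation matrix of the standardized data, and the component scores are mutually uncorrelated.

# Minimal principal components illustration with simulated data
set.seed(123)
X <- matrix(rnorm(200 * 5), 200, 5)
X[, 2] <- X[, 1] + 0.3 * rnorm(200)           # build in some correlation

pca <- prcomp(X, center = TRUE, scale. = TRUE)

eigen(cor(X))$values                          # eigenvalues of the correlation matrix...
pca$sdev^2                                    # ...equal the variances of the principal components

round(cor(pca$x), 10)                         # scores in the new basis are uncorrelated

summary(pca)$importance["Proportion of Variance", ]   # share of total variance per component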

Often you see a diagram, such as the one below, showing a cloud of points distributed around a line passing through the origin of a coordinate system, but at an acute angle to those coordinates.

[Figure: data cloud with rotated principal component axes]

This illustrates dimensionality reduction with principal components. If we express all these points in terms of this rotated set of coordinates, one of these coordinates – the signal – captures most of the variation in the data. Projections of the datapoints onto the second principal component, therefore, account for much less variance.

Principal component regression characteristically specifies only the first few principal components in the regression equation, knowing that, typically, these explain the largest portion of the variance in the data.

An Application to Crime Data

Looking for some non-macroeconomic data to illustrate principal components (PC) regression, I found the Communities and Crime Data Set in the University of California at Irvine Machine Learning Repository.

The data do not illustrate “many predictors” in the sense of more predictors than observations.

Here, the crime and other data comprise 128 variables, including a violent crime variable, which are collated for 1,994 cities. That is, there are more observations than predictors.

The variables included in the dataset describe the community, such as the percent of the population considered urban and the median family income, as well as law enforcement, such as the per capita number of police officers and the percent of officers assigned to drug units. The per capita violent crimes variable was calculated using population and the sum of crime variables considered violent crimes in the United States: murder, rape, robbery, and assault.

I standardize the data, dropping variables with a lot of missing values. That leaves me with 100 variables, including the violent crime metric.

This table gives you a flavor of the variables included – you have to interpret the abbreviations.

[Table: sample variables from the Communities and Crime dataset]

I developed a comparison of OLS regression with principal components regression, finding that principal component regression can outperform OLS in out-of-sample predictions of violent crimes per capita.

The Matlab program to carry out this analysis is as follows:

[Screenshot: Matlab program]

So I used a training set of 1800 cities, and developed OLS and PC regressions to predict violent crime per capita in the remaining 194 cities. I calculate the principal components (coeff) from a training set (xtrain) made up of the first 1800 cities. Then, I select the first twenty PCs and translate them back to weightings on all 99 variables for application to the test set (xtest). I also calculate OLS regression coefficients on xtrain.
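The Matlab listing appears only as a screenshot above, so here is a minimal R sketch of the same kind of train/test comparison. It reuses the xtrain/xtest naming from the description; the crime object, a 1994 x 100 standardized matrix whose last column is violent crimes per capita, is a hypothetical stand-in for the prepared dataset.

# Hedged R sketch of the OLS vs. principal components regression comparison.
# 'crime' is assumed to be a 1994 x 100 standardized matrix: 99 predictors
# plus a final column of violent crimes per capita (hypothetical object).

xtrain <- crime[1:1800, 1:99];    ytrain <- crime[1:1800, 100]
xtest  <- crime[1801:1994, 1:99]; ytest  <- crime[1801:1994, 100]

# OLS benchmark using all 99 predictors
ols  <- lm(ytrain ~ ., data = as.data.frame(xtrain))
mse1 <- mean((predict(ols, as.data.frame(xtest)) - ytest)^2)

# PC regression: estimate the rotation on the training set only,
# keep the first 20 components, and project the test set on the same loadings
rot    <- prcomp(xtrain, center = FALSE, scale. = FALSE)$rotation  # data already standardized
k      <- 20
strain <- xtrain %*% rot[, 1:k]
stest  <- xtest  %*% rot[, 1:k]

pcr  <- lm(ytrain ~ strain)
b    <- coef(pcr)
mse2 <- mean((b[1] + stest %*% b[-1] - ytest)^2)

c(OLS = mse1, PCR = mse2)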

The mean square prediction error (mse1) of the OLS regression was 0.35 and the mean square prediction error (mse2) of the PC regression was 0.34 – really a marginal difference but large enough to make the point.

What’s really interesting is that I had to use the first twenty (20) principal components to achieve this improvement. Thus, this violent crime database is quite diverse, compared with many socioeconomic datasets I have seen where, as noted above, the first few principal components explain most of the variation in the data.

This method – PC regression – is especially good when there are predictors which are closely correlated (“multicollinearity”) as often is the case with market research surveys of consumer attitudes and income and wealth variables.

The bottom line here is that principal components can facilitate data reduction or regression regularization. Quite often, this can improve the prediction capabilities of a regression, when compared with an OLS regression using all the variables. The PC regression assigns higher weights to the most important predictors, in effect performing a kind of variable selection – although the coefficients or PCs may not zero out variables per se.

I am continuing to work on this data with an eye to implementing k-fold cross-validation as a way of estimating the optimal number of principal components which should be used in the PC regressions.
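One way to do that, sketched below rather than finished, is a hand-rolled k-fold loop over candidate numbers of components, using the same hypothetical xtrain and ytrain objects as above:

# Hand-rolled 10-fold cross-validation over the number of principal components
cv_pcr <- function(x, y, max_k = 40, folds = 10) {
  fold <- sample(rep(1:folds, length.out = nrow(x)))     # random fold assignment
  err  <- matrix(NA, folds, max_k)
  for (f in 1:folds) {
    xtr <- x[fold != f, ]; ytr <- y[fold != f]
    xva <- x[fold == f, ]; yva <- y[fold == f]
    rot <- prcomp(xtr, center = FALSE, scale. = FALSE)$rotation
    for (k in 1:max_k) {
      str  <- xtr %*% rot[, 1:k, drop = FALSE]            # training-fold scores
      sva  <- xva %*% rot[, 1:k, drop = FALSE]            # validation-fold scores
      b    <- coef(lm(ytr ~ str))
      pred <- b[1] + sva %*% b[-1]
      err[f, k] <- mean((pred - yva)^2)                   # validation MSE for this fold and k
    }
  }
  colMeans(err)                                           # average MSE across folds for each k
}

set.seed(1)
cv_mse <- cv_pcr(as.matrix(xtrain), ytrain)
which.min(cv_mse)                                         # suggested number of components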

Estimation and Variable Selection with Ridge Regression and the LASSO

I posted on ridge regression and the LASSO (Least Absolute Shrinkage and Selection Operator) some weeks back.

Here I want to compare them in connection with variable selection  where there are more predictors than observations (“many predictors”).

1. Ridge regression does not really select variables in the many predictors situation. Rather, ridge regression “shrinks” all predictor coefficient estimates toward zero, based on the size of the tuning parameter λ. When ordinary least squares (OLS) estimates have high variability, ridge regression estimates of the betas may, in fact, produce lower mean square error (MSE) in prediction.

2. The LASSO, on the other hand, handles estimation in the many predictors framework and performs variable selection. Thus, the LASSO can produce sparse, simpler, more interpretable models than ridge regression, although neither dominates in terms of predictive performance. Both ridge regression and the LASSO can outperform OLS regression in some predictive situations – exploiting the tradeoff between variance and bias in the mean square error.

3. Ridge regression and the LASSO both involve penalizing OLS estimates of the betas. How they impose these penalties explains why the LASSO can “zero” out coefficient estimates, while ridge regression just keeps making them smaller. From An Introduction to Statistical Learning, ridge regression chooses the betas to minimize

\[ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 \;=\; \mathrm{RSS} + \lambda\sum_{j=1}^{p}\beta_j^2 \]

Similarly, the objective function for the LASSO procedure is outlined by An Introduction to Statistical Learning as

\[ \sum_{i=1}^{n}\Big(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\Big)^2 + \lambda\sum_{j=1}^{p}|\beta_j| \;=\; \mathrm{RSS} + \lambda\sum_{j=1}^{p}|\beta_j| \]

4. Both ridge regression and the LASSO, by imposing a penalty on the residual sum of squares (RSS), shrink the size of the estimated betas. The LASSO, however, can zero out some betas, since it tends to shrink the betas by fixed amounts as λ increases (down to the zero lower bound). Ridge regression, on the other hand, tends to shrink everything proportionally.

5. The tuning parameter λ in ridge regression and the LASSO usually is determined by cross-validation. Here are a couple of useful slides from Ryan Tibshirani’s Spring 2013 Data Mining course at Carnegie Mellon.

[Slides: cross-validation for choosing λ, from Ryan Tibshirani’s Data Mining course]

6. There are R programs which estimate ridge regression and LASSO models and perform cross-validation, recommended by these statisticians from Stanford and Carnegie Mellon. In particular, see glmnet at CRAN. MathWorks MATLAB also has routines to do ridge regression and estimate elastic net models.

Here, for example, is R code to estimate the LASSO.

library(glmnet)   # x, y, train, test, y.test and the lambda grid are assumed to be defined already
lasso.mod = glmnet(x[train,], y[train], alpha = 1, lambda = grid)   # alpha = 1 gives the LASSO penalty
plot(lasso.mod)                                      # coefficient paths as lambda varies
set.seed(1)
cv.out = cv.glmnet(x[train,], y[train], alpha = 1)   # cross-validation to choose lambda
plot(cv.out)
bestlam = cv.out$lambda.min                          # lambda with the lowest cross-validation error
lasso.pred = predict(lasso.mod, s = bestlam, newx = x[test,])
mean((lasso.pred - y.test)^2)                        # test-set mean squared error
out = glmnet(x, y, alpha = 1, lambda = grid)         # refit on the full dataset
lasso.coef = predict(out, type = "coefficients", s = bestlam)[1:20,]
lasso.coef
lasso.coef[lasso.coef != 0]                          # the LASSO zeroes out some coefficients entirely
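For comparison, the ridge counterpart is nearly identical: in glmnet, setting alpha = 0 switches the penalty from the LASSO to ridge regression (same assumed x, y, train, test, y.test, and grid objects).

ridge.mod  = glmnet(x[train,], y[train], alpha = 0, lambda = grid)   # alpha = 0 gives ridge regression
cv.ridge   = cv.glmnet(x[train,], y[train], alpha = 0)               # cross-validate lambda
ridge.pred = predict(ridge.mod, s = cv.ridge$lambda.min, newx = x[test,])
mean((ridge.pred - y.test)^2)                                        # test-set mean squared error
predict(ridge.mod, type = "coefficients", s = cv.ridge$lambda.min)   # coefficients shrunk, but not exactly zero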

 What You Get

I’ve estimated quite a number of ridge regression and LASSO models, some with simulated data where you know the answers (see the earlier posts cited initially here) and other models with real data, especially medical or health data.

As a general rule of thumb, An Introduction to Statistical Learning notes,

 ..one might expect the lasso to perform better in a setting where a relatively small number of predictors have substantial coefficients, and the remaining predictors have coefficients that are very small or that equal zero. Ridge regression will perform better when the response is a function of many predictors, all with coefficients of roughly equal size.

The R package glmnet linked above is very flexible, and can accommodate logistic regression, as well as regression with continuous, real-valued dependent variables ranging from negative to positive infinity.
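For instance, a binary outcome only requires the family argument. Here is a sketch with the x predictor matrix as before and a hypothetical 0/1 response ybin:

logit.mod = glmnet(x, ybin, family = "binomial", alpha = 1)            # LASSO-penalized logistic regression
cv.logit  = cv.glmnet(x, ybin, family = "binomial", type.measure = "class")
predict(logit.mod, newx = x, s = cv.logit$lambda.min, type = "class")  # predicted 0/1 class labels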

 

The Tibshiranis – Statistics and Machine Learning Superstars

As regular readers of this blog know, I’ve migrated to a weekly (or potentially longer) topic focus, and this week’s topic is variable selection.

And the next planned post in the series will compare and contrast ridge regression and the LASSO (least absolute shrinkage and selection operator). There also are some new results for the LASSO. But all this takes time and is always better when actual computations can be accomplished to demonstrate points.

But in researching this, I’ve come to a deeper appreciation of the Tibshiranis.

Robert Tibshirani introduced the LASSO and has probably, as much as anyone, helped integrate it into standard statistical procedures.

Here’s his picture from Wikipedia.

[Photo: Robert Tibshirani]

You might ask why put his picture up, and my answer is that Professor Robert Tibshirani (Stanford) has a son, Ryan Tibshirani, whose picture is just below.

Ryan Tibshirani has a great Data Mining course online from Carnegie Mellon, where he is an Assistant Professor.

[Photo: Ryan Tibshirani]

Professor Ryan Tibshirani’s Spring 2013 Data Mining course can be found at http://www.stat.cmu.edu/~ryantibs/datamining/

Reviewing Ryan Tibshirani’s slides is very helpful in getting insight into topics like cross-validation, ridge regression, and the LASSO.

And let us not forget Professor Ryan Tibshirani is author of essential reading about how to pick your target in darts, based on your skill level (hint – don’t go for the triple-20 unless you are good).

Free Books on Machine Learning and Statistics

Robert Tibshirani and co-authors’ text – The Elements of Statistical Learning – is now in its 10th printing and is available online free here.

But the simpler An Introduction to Statistical Learning is also available for download as a PDF file here. This is the corrected 4th printing. The book, which I have been reading today, is really dynamite – an outstanding example of scientific exposition and explanation.

These guys and their collaborators are truly gifted teachers. They create windows into new mathematical and statistical worlds, as it were.