Thursday, December 4, 2008

An attempt to understand LOESS

I first looked up some English papers and other literature on LOESS, which confused me quite a lot. Then I found an article written by Xie Yihui in Chinese (http://cos.name/2008/11/lowess-to-explore-bivariate-correlation-by-yihui/) that introduces the main ideas of LOESS concisely and clearly. It helped me understand the details of the English introductions to LOESS.

Locally weighted polynomial regression (LOESS) is a statistical method that fits a weighted regression line to each small, localized subset of the data and then joins these local fits into a single curve. The curve shows the overall trend of the data without losing the important details that a small portion of the data may reflect. The method, I believe, is similar to the idea of linear approximation in calculus, which constructs a linear equation in the vicinity of a given point.
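To make the idea of a local fit concrete, here is a minimal sketch in Python with NumPy. The function names, the tricube weight function, the choice of k nearest neighbours and the sample data are all my own assumptions for illustration. They are not taken from the articles mentioned above.

```python
import numpy as np

def tricube(u):
    """Tricube weight: near 1 when u is near 0, exactly 0 once |u| >= 1."""
    u = np.clip(np.abs(u), 0.0, 1.0)
    return (1.0 - u ** 3) ** 3

def local_line(x, y, x0, k):
    """Fit a weighted straight line to the k points nearest to x0.

    Returns (value, slope): the fitted value and the slope of the local line at x0.
    Assumes 4 <= k <= len(x) and distinct x values.
    """
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]            # the k nearest neighbours of x0
    h = dist[idx].max()                   # distance to the farthest of them
    w = tricube(dist[idx] / h)            # nearer points get larger weights
    # Weighted least squares for y ~ a0 + a1 * (x - x0).
    X = np.column_stack([np.ones(k), x[idx] - x0])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y[idx] * sw, rcond=None)
    return coef[0], coef[1]

# Noisy samples of a sine curve (made-up data for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 60)
y = np.sin(x) + rng.normal(scale=0.2, size=x.size)

# One local line per point; the fitted values together form the smooth curve.
curve = np.array([local_line(x, y, x0, k=15)[0] for x0 in x])
```

Each call works much like a tangent line approximation: the line is only trusted near x0, and the curve is built by moving x0 along the data.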

LOESS adapts classical statistical methods (e.g. linear least squares) to the flexibility of modern procedures. The algorithm is governed by two key parameters: the bandwidth, which controls how smooth the curve is, and the degree of the local polynomials, which determines how accurate and how complex the fit is (e.g. the first degree is linear, the second degree is quadratic).
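The role of the two parameters can be sketched as follows. This is only an illustration under my own assumptions: the function name loess, the parameter names frac and degree, the tricube weights and the sample data are hypothetical, and the bandwidth is expressed as the fraction of the data used in each local fit.

```python
import numpy as np

def loess(x, y, frac=0.5, degree=1):
    """Hypothetical helper: smooth y over x with local polynomial fits.

    frac   -- bandwidth, as the fraction of the data used in each local fit
    degree -- degree of the local polynomial (1 = linear, 2 = quadratic)
    Assumes distinct x values and len(x) >= degree + 3.
    """
    n = len(x)
    # Keep enough neighbours that the polynomial is still determined
    # after the farthest points receive zero weight.
    k = min(n, max(degree + 3, int(np.ceil(frac * n))))
    fitted = np.empty(n)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        idx = np.argsort(dist)[:k]
        h = dist[idx].max()
        w = (1.0 - (dist[idx] / h) ** 3) ** 3      # tricube weights
        # np.polyfit applies w to the unsquared residuals, hence the square root.
        coef = np.polyfit(x[idx] - x0, y[idx], deg=degree, w=np.sqrt(w))
        fitted[i] = coef[-1]                       # constant term = value at x0
    return fitted

# Made-up noisy data for illustration.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 80)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

wide_linear = loess(x, y, frac=0.6, degree=1)       # large bandwidth
narrow_quadratic = loess(x, y, frac=0.1, degree=2)  # small bandwidth, higher degree
```

A larger frac averages over more points in each local fit, and a smaller frac or a higher degree lets the curve follow the data more closely.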

Although a higher degree of local polynomial can fit the empirical data better, it also makes the calculation much larger and goes against the spirit of LOESS. Therefore, selecting a suitable degree is very important for an effective LOESS. A general polynomial of degree p is calculated as the first picture below indicates. The second picture shows the formula for calculating the coefficients a(i), which are used to interpolate the local regression value at x. In the second formula, X is the design matrix, W is a diagonal matrix of weights, and Y is simply the vector of y values of the data.



(quoted from http://voteforamerica.net/Docs/Local%20Regression.pdf)
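In case the pictures do not display, the two formulas can be written in standard notation roughly as follows. This is my own reconstruction from the description above, and it may differ in detail from the quoted document. In particular, centering the polynomial at the target point x is an assumption.

```latex
% Local polynomial of degree p around the target point x
\hat{y}(x_i) = a_0 + a_1 (x_i - x) + a_2 (x_i - x)^2 + \cdots + a_p (x_i - x)^p

% Weighted least squares estimate of the coefficients
\hat{a} = (X^{T} W X)^{-1} X^{T} W Y
```

Here row i of X is (1, x_i - x, ..., (x_i - x)^p), W is the diagonal matrix of the weights given to each data point, and Y is the vector of the y values. With this centering, the local regression value at x is simply the estimate of a_0.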

The bandwidth is selected by minimizing the mean integrated squared error (MISE), which is considered a reliable criterion for choosing the optimal bandwidth.
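The MISE depends on the true curve, which is unknown in practice, so it has to be estimated from the data. A common stand-in is leave-one-out cross-validation, and the sketch below uses it in place of the MISE itself. This substitution is my own assumption, as are the function names, the tricube weights and the candidate bandwidths.

```python
import numpy as np

def local_linear_predict(x, y, x0, frac):
    """Predict the value at x0 from a weighted line through its nearest points."""
    n = len(x)
    k = min(n, max(4, int(np.ceil(frac * n))))   # at least 4 neighbours
    dist = np.abs(x - x0)
    idx = np.argsort(dist)[:k]
    h = dist[idx].max()
    w = (1.0 - (dist[idx] / h) ** 3) ** 3        # tricube weights
    # np.polyfit applies w to the unsquared residuals, hence the square root.
    coef = np.polyfit(x[idx] - x0, y[idx], deg=1, w=np.sqrt(w))
    return coef[-1]

def loo_cv_score(x, y, frac):
    """Mean squared leave-one-out prediction error for one bandwidth."""
    errors = []
    for i in range(len(x)):
        keep = np.arange(len(x)) != i            # drop point i, then predict it
        pred = local_linear_predict(x[keep], y[keep], x[i], frac)
        errors.append((y[i] - pred) ** 2)
    return float(np.mean(errors))

def choose_bandwidth(x, y, candidates):
    """Return the candidate with the smallest score, plus all scores."""
    scores = {frac: loo_cv_score(x, y, frac) for frac in candidates}
    best = min(scores, key=scores.get)
    return best, scores

# Made-up noisy data for illustration.
rng = np.random.default_rng(2)
x = np.linspace(0.0, 10.0, 60)
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
best_frac, all_scores = choose_bandwidth(x, y, [0.1, 0.2, 0.4, 0.8])
```

The bandwidth with the smallest score is the one that predicts the left-out points best, which is only an estimate of the bandwidth that the MISE would choose.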

As the Wikipedia article (http://en.wikipedia.org/wiki/Local_regression) says, LOESS has several advantages. First, it is unnecessary to specify a single function to fit a model to all the data. Second, it is flexible enough to model data that follow no particular mathematical model. However, some disadvantages still exist. An effective LOESS needs a large sample of data. Besides, it is difficult to extract a concrete mathematical formula from LOESS, so the possibility of extending its results to other cases is obviously limited.

The LOESS technique can be very useful in various fields. On the website below, LOESS is applied to show the trend of American election polls and to attempt to predict the final outcome of the election.

(http://voteforamerica.net/editorials/Comments.aspx?ArticleId=28&ArticleName=Electoral+Projections+Using+LOESS)

In addition, LOESS can be used in neurocognitive science as a useful technique for smoothing data.

References:


Xie Yihui: Using locally weighted scatterplot smoothing (LOWESS) to examine the relationship between two variables

http://cos.name/2008/11/lowess-to-explore-bivariate-correlation-by-yihui/


NIST Engineering Statistics Handbook Section on LOESS

http://www.itl.nist.gov/div898/handbook/pmd/section1/pmd144.htm

Applying LOESS to describe the trend of the American election

http://voteforamerica.net/polls.aspx

Local regression

http://voteforamerica.net/Docs/Local%20Regression.pdf

The Wikipedia article on local regression

http://en.wikipedia.org/wiki/Local_regression
