Tuesday 28 February 2017

Correlation and trend when an outlier is added

This is the fourth blog post in a series of six that deals with mathematics for calculation of correlation and trend in data with outliers. The posts are numbered 1 to 6. They should be read consecutively.

Post 1  Introduction to Statistical analysis of data with outliers
Post 2  Correlation when outliers in the data.
Post 3  Trend when outliers in the data.
Post 4  Correlation and trend when an outlier is added. Example.
Post 5  Compare Kendall-Theil and OLS trends. Simulations.
Post 6  Detect serial correlation when outliers. Simulations.

The posts are gathered in this pdf document.

Start of post 4:
Correlation and trend when an outlier is added.


This blog post contains an example that demonstrates the shortcomings of the most commonly used methods for calculating trends and correlations when an outlier is added to the data. It demonstrates that alternative methods based on medians and ranks are more robust against outliers.

Saturday 25 February 2017

Trend when outliers in the data

This is the third post in a series of six that deals with mathematics for calculation of correlation and trend in data with outliers. The posts are numbered 1 to 6. They should be read consecutively.

Post 1  Introduction to Statistical analysis of data with outliers
Post 2  Correlation when outliers in the data.
Post 3  Trend when outliers in the data.
Post 4  Correlation and trend when an outlier is added. Example.
Post 5  Compare Kendall-Theil and OLS trends. Simulations.
Post 6  Detect serial correlation when outliers. Simulations.

The posts are gathered in this pdf document.

Start of post 3: Trend when outliers in the data


The method most commonly used in linear regression analysis is to calculate the trend based on the data values. But it is more robust against outliers to calculate it based on the ranks of the data. This blog post discusses the mathematics behind both methods.
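The difference between the two approaches can be sketched in a few lines of Python. This is only an illustration, not the code used in the post: it assumes that the rank-based trend is the Kendall-Theil line, with the slope taken as the median of all pairwise slopes, and the function names and the data are made up for the example.

```python
from itertools import combinations
from statistics import mean, median


def ols_slope(x, y):
    """Least-squares slope: uses the data values directly."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def kendall_theil_slope(x, y):
    """Median of the slopes between all pairs of points with different x."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes)


# Made-up data: an exact line y = 2x + 1, then the last value replaced by an outlier.
x = list(range(10))
y_clean = [2.0 * xi + 1.0 for xi in x]
y_outlier = y_clean[:-1] + [100.0]

print("OLS slope, clean data:        ", ols_slope(x, y_clean))
print("OLS slope, with outlier:      ", ols_slope(x, y_outlier))
print("Kendall-Theil slope, clean:   ", kendall_theil_slope(x, y_clean))
print("Kendall-Theil slope, outlier: ", kendall_theil_slope(x, y_outlier))
```

In this small data set only 9 of the 45 pairwise slopes involve the outlier, so the median slope stays at 2, while the OLS slope is pulled well away from 2.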

Wednesday 22 February 2017

Correlation when outliers in the data

This is the second post in a series of six that deals with mathematics for calculation of correlation and trend in data with outliers. The posts are numbered 1 to 6. They should be read consecutively.

Post 1  Introduction to Statistical analysis of data with outliers
Post 2  Correlation when outliers in the data.
Post 3  Trend when outliers in the data.
Post 4  Correlation and trend when an outlier is added. Example.
Post 5  Compare Kendall-Theil and OLS trends. Simulations.
Post 6  Detect serial correlation when outliers. Simulations.

The posts are gathered in this pdf document.

Start of post 2: Correlation when outliers in the data


The method most commonly used to estimate the correlation between two data sets is to calculate the correlation coefficient based on the values in the two data sets. But it is more robust against outliers to calculate it based on the ranks of the data. This blog post discusses the mathematics behind both methods.
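As a rough illustration of the two approaches, the sketch below computes a value-based coefficient (Pearson) and a rank-based one (Kendall tau-b) for the same data. It is a minimal sketch under stated assumptions: the function names and the data are invented for the example, and tau-b is computed by counting concordant and discordant pairs, with ties handled in the denominator.

```python
from itertools import combinations
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Correlation based on the data values."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def kendall_tau_b(x, y):
    """Correlation based only on the ordering of each pair of points."""
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            continue            # tied in both variables: not counted
        elif dx == 0:
            tied_x += 1         # tied in x only
        elif dy == 0:
            tied_y += 1         # tied in y only
        elif dx * dy > 0:
            concordant += 1     # both variables move in the same direction
        else:
            discordant += 1     # the variables move in opposite directions
    untied = concordant + discordant
    return (concordant - discordant) / sqrt((untied + tied_x) * (untied + tied_y))


# Made-up data: the same increasing series, once with a very large last value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [1, 2, 3, 4, 5, 6, 7, 8]
y_outlier = [1, 2, 3, 4, 5, 6, 7, 800]

print("Pearson, clean / outlier:", pearson_r(x, y_clean), pearson_r(x, y_outlier))
print("tau-b,   clean / outlier:", kendall_tau_b(x, y_clean), kendall_tau_b(x, y_outlier))
```

The outlier does not change the order of the values, so tau-b is the same for both series, while the Pearson coefficient drops noticeably.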

Friday 17 February 2017

Introduction to Statistical analysis of data with outliers

This is the first blog post in a series of six that deals with mathematics for calculation of correlation and trend in data with outliers. The posts are numbered 1 to 6. They should be read consecutively. This first post is just an introduction.

Post 1  Introduction to Statistical analysis of data with outliers
Post 2  Calculate correlation when outliers in the data.
Post 3  Calculate trend when outliers in the data.
Post 4  Correlation and trend when an outlier is added. Example.
Post 5  Compare Kendall-Theil and OLS trends. Simulations.
Post 6  Detect serial correlation when outliers. Simulations.

The posts are gathered in this pdf document.

Start of post 1 Introduction to Statistical analysis of data with outliers


Five blog posts in June 2014 deal with the mathematics that is most commonly used when analysing global temperature series. That mathematics is not well suited when there are large outliers in the data. The first blog post in that series gives an overview of those five posts.

Ordinary least squares (OLS) regression is the most commonly used method to calculate trends. It is based on data values, and it therefore performs poorly when there are large outliers in the data. Global temperatures do not have large outliers, due both to the inertia in the global climate system and to the thorough processing applied before the temperature data is released. Other climate data, such as precipitation, snow depth and skiing conditions at specific locations, do have large outliers, and the OLS mathematics is not suitable for those data.

The calculation of the Pearson correlation coefficient is also based on data values. This is the most commonly used method to calculate correlation between variables. It too performs poorly when there are large outliers in the data.

Mathematics based on data ranks performs better than mathematics based on data values when analysing data with large outliers. In this series of blog posts I will describe the rank mathematics which I use to calculate the Kendall tau-b correlation coefficient and the Kendall-Theil robust trend line. For comparison I also briefly describe the Pearson and the OLS mathematics.
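Why ranks limit the influence of an outlier can be seen from a small sketch. The helper `average_ranks` and the snow depth numbers are invented for this illustration. Tied values are given the mean of the ranks they share, which is one common convention.

```python
def average_ranks(values):
    """Rank the values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        # Find the end of the run of values equal to the one at position start.
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


# Made-up snow depths; the last value is a large outlier in the first list only.
with_outlier = [12.0, 15.0, 9.0, 14.0, 250.0]
without_outlier = [12.0, 15.0, 9.0, 14.0, 16.0]

print(average_ranks(with_outlier))     # [2.0, 4.0, 1.0, 3.0, 5.0]
print(average_ranks(without_outlier))  # [2.0, 4.0, 1.0, 3.0, 5.0]
```

The value 250.0 gets rank 5, exactly as 16.0 would, so a calculation that only uses the ranks cannot be dominated by the size of the outlier.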

As will be seen, the mathematics that is used to calculate the Kendall tau-b correlation coefficient and the Kendall-Theil robust trend line is rather simple and easy to explain. But the mathematics that is used to quantify their uncertainties, expressed as p-values and confidence intervals, is more complicated.

Next post in the series