This is a file from the Wikimedia Commons

File:Datest Outlier.JPG

From Wikibooks, open books for an open world

Original file (1,028 × 746 pixels, file size: 103 KB, MIME type: image/jpeg)


Summary

Description
English: This picture has been created with MD*Tech XploRe.


Barchart

This chart groups the countries by the number of dimensions in which they are "boxplot outliers" and shows how many countries fall into each class.
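The classes behind the barchart are essentially a frequency table of the per-country outlier counts, similar to what the program's `discrete(evf)` step produces. Below is a minimal sketch in Python rather than XploRe. The function name `outlier_count_frequencies` and the example counts are hypothetical.

```python
from collections import Counter


def outlier_count_frequencies(counts):
    """Return sorted (number_of_dimensions, number_of_observations) pairs.

    counts[k] is the number of dimensions in which observation k is a
    boxplot outlier; the pairs correspond to bar positions and bar heights.
    """
    return sorted(Counter(counts).items())


# Hypothetical outlier counts for six countries
example_counts = [0, 0, 1, 3, 0, 5]
for dims, n in outlier_count_frequencies(example_counts):
    print(f"outlier in {dims} dimension(s): {n} observation(s)")
```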

Starplot

This plot shows the characteristics of each observation as a star, where each axis of the star represents one variable. For simplicity, the variables are standardized to a 0-1 scale by default.
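This page does not show how XploRe's `grstar` standardizes internally. If we assume a per-variable min-max rescaling, each variable would be mapped to [0, 1] as in the hypothetical Python helper below. The handling of constant columns is also an assumption.

```python
def minmax_scale(column):
    """Rescale values linearly to [0, 1] via (x - min) / (max - min).

    Assumption: a constant column is mapped to all zeros to avoid
    dividing by zero; how XploRe handles this case is not stated here.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0] * len(column)
    return [(x - lo) / (hi - lo) for x in column]


print(minmax_scale([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```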

Usage of the Program

Thumbnail caption: Observations are colored blue if they are outliers in more than 3 dimensions.

With the help of this program, we tried to overcome the limitations of univariate outlier treatment as it is regularly done with boxplots. To this end, we computed a 163 × 14 matrix of logical values (0 or 1) for each observation in all relevant dimensions (country and continent codes excluded), where 1 means that the observation is an outlier in that dimension.
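The 0/1 matrix applies the usual boxplot rule to each column separately: a value is flagged if it lies more than 1.5 interquartile ranges beyond the quartiles. This is the rule used in the program's `extremvalues` procedure. Below is a minimal Python sketch of that rule; the name `outlier_matrix` and the sample data are hypothetical. Quartiles here use `statistics.quantiles` with linear interpolation, which may differ slightly from XploRe's `quantile` definition.

```python
import statistics


def outlier_matrix(rows):
    """Return a 0/1 matrix with one row per observation, one column per variable.

    Entry [k][j] is 1 when rows[k][j] lies strictly outside the boxplot
    fences Q1 - 1.5*IQR and Q3 + 1.5*IQR of column j, else 0.
    Identifier columns (e.g. country and continent codes) are assumed to
    be removed already. Quartiles use linear interpolation
    (method="inclusive"); XploRe's quantile() may define them differently.
    """
    n_cols = len(rows[0])
    flags = [[0] * n_cols for _ in rows]
    for j in range(n_cols):
        column = [row[j] for row in rows]
        q1, _, q3 = statistics.quantiles(column, n=4, method="inclusive")
        iqr = q3 - q1
        lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
        for k, value in enumerate(column):
            flags[k][j] = int(value < lower or value > upper)
    return flags


# Hypothetical data: five observations, two variables
data = [[1, 10], [2, 10], [3, 10], [4, 10], [100, 10]]
for row in outlier_matrix(data):
    print(row)
```

In the first column, Q1 = 2 and Q3 = 4, so the upper fence is 7 and only the value 100 is flagged. The constant second column has an IQR of 0, and no value lies strictly outside its fences.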

This can be used to decide more appropriately which observations should be excluded from the main analysis. One decides according to the number of dimensions in which a country is a "boxplot outlier", as shown in the barchart, and then confirms or rejects that decision with the help of the star diagram.
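The decision step amounts to summing each observation's 0/1 row and comparing the sum with a limit. Observations flagged in more than the limit are set aside, as in the program's `evf > v2` selection. Below is a minimal Python sketch with hypothetical names and data. Note that this sketch counts every column it is given, whereas the XploRe program sums only columns 2 onward of its outlier matrix.

```python
def split_by_outlier_count(names, flags, limit=3):
    """Split observations into (kept, outliers) by their number of outlier dimensions.

    flags[k] is the 0/1 outlier row of observation k. An observation counts
    as an outlier if it is flagged in more than `limit` dimensions; the
    default of 3 mirrors the program's default limit.
    """
    kept, outliers = [], []
    for name, row in zip(names, flags):
        if sum(row) > limit:
            outliers.append(name)
        else:
            kept.append(name)
    return kept, outliers


# Hypothetical observations A, B, C flagged in 4, 0 and 3 dimensions
kept, outliers = split_by_outlier_count(
    ["A", "B", "C"],
    [[1, 1, 1, 1, 0], [0, 0, 0, 0, 0], [1, 1, 1, 0, 0]],
)
print("kept:", kept, "outliers:", outliers)
```

With the default limit of 3, C (flagged in exactly 3 dimensions) is kept, because the rule is "more than", not "at least".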

The dataset without those outliers can optionally be saved by the program for further use.

Program Code

Attention! To repeat the computation, a transformed dataset is needed. If you have not yet computed and saved the transformation, first run the transformation program on the wikipage en:Analysis of Tuberculosis.


library("stats")

; ----- Reading Data ---------------------------------------------------------------------------

choose = "Read data from:"

defaults = "C:\Dokumente und Einstellungen\All Users\Desktop\UN_data_ordered.csv"

v = readvalue(choose, defaults)

x = readcsvm(v)

data = x.double
country = x.text

; ----- Get Country Names if Data Read Does Not Include It -------------------------------------

proc()=nondefault()

v = getglobal("v")
defaults = getglobal("defaults")

if(v<>defaults)

country = readm("C:\Dokumente und Einstellungen\All Users\Desktop\country.csv")

country = country.text

putglobal("country")

endif

endp

nondefault()

; ----- Compute Outlier-Yes-No-Matrix----------------------------------------------------------

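proc(y)=extremvalues(i)

  data = getglobal("data")

  ; one 0/1 column per analysed variable (columns 3 to cols(data) of data)
  y = matrix(rows(data),cols(data)-2)

  do

    ; boxplot rule: flag values more than 1.5 interquartile ranges beyond the quartiles
    uq = quantile(data[,i+2], 0.75)
    lq = quantile(data[,i+2], 0.25)
    iqd = uq - lq

    y[,i] = data[,i+2] > uq + 1.5*iqd || data[,i+2] < lq - 1.5*iqd

    i = i + 1

  until(i>cols(data)-2)

  putglobal("y")

endp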

extremvalues(1)

; ----- Compute Sum of Extreme Values per Observation and Barchart -----------------------------

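; number of outliers per dimension
sum(y)

; number of dimensions in which each observation is an outlier
; (summed over columns 2 to cols(y) of y)
evf = sum(y[,2:cols(y)]')'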

{cat, freq} = discrete(evf)

evF = cat~freq

evF

gr1 = grbar(evf)

setsize(800, 600)
d = createdisplay(2, 1)

show(d, 1, 1, gr1)

title1 = "Extreme Values per Observation"
xlabel1 = "number of dimensions, where observation is an outlier (boxplot definition)"
ylabel1 = "number of observations"
setgopt(d, 1, 1, "yoffset", 15|15, "xoffset", 15|5, "title", title1, "xlabel", xlabel1,"ylabel", ylabel1)

; ----- Option to Choose "Outlier Limit" -------------------------------------------------------

choose2 = "Observation considered an outlier if it has extreme values in more than ... dimensions."

defaults2 = 3

v2 = readvalue(choose2, defaults2)

outlier = paf((1:rows(data))~y, evf>v2)

; ----- Outlier Characteristics ----------------------------------------------------------------

outlier

country[outlier[,1]]

dataoutex = paf(data, evf<=v2)

dim(dataoutex)

namesdatarest = country[dataoutex[,1]]
namesdatarest

; ----- Stardiagram ----------------------------------------------------------------------------

; outliers (evf > v2) are drawn blue, all other observations green
col  = grc.col.green-grc.col.blue
col  = grc.col.blue+col*(evf<=v2)

gr2 = grstar(data[,4:cols(data)], col)
show(d, 2, 1, gr2)

title2 = "Starplot (Outlier in more than ... dimensions: blue)"
setgopt(d, 2, 1, "yoffset", 15|15, "xoffset", 15|5, "title", title2)

; ----- Saving Data Option ---------------------------------------------------------------------

proc()=save(dataoutex)

head2 = "Save dataset without outliers"

item2 = "Ok, save as ..." | "Cancel"

sel2 = selectitem(head2, item2)

switch

case(sel2[1]==1 && sel2[2]==0)

folder = "Save to:"

default3 = "C:\Dokumente und Einstellungen\All Users\Desktop\UN_data_outl_excl.csv"

v3=readvalue(folder, default3)

writecsv(dataoutex, v3)

endsw

endp

save(dataoutex)

Date 30 March 2007 (original upload date)
Source Transferred from en.wikibooks to Commons.
Author Schtiwi at English Wikibooks

Licensing

Schtiwi at the English Wikipedia, the copyright holder of this work, hereby publishes it under the following license:
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled GNU Free Documentation License.
This file is licensed under the Creative Commons Attribution-Share Alike 3.0 Unported license.
Attribution: Schtiwi at the English Wikipedia
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.
This licensing tag was added to this file as part of the GFDL licensing update.

Original upload log

The original description page was here. All following user names refer to en.wikibooks.
Date/Time Dimensions User Comment
2007-03-30 11:40 1028×746 (105367 bytes) Schtiwi This picture has been created with MD*Tech XploRe.


File history


Date/Time: 14:19, 19 August 2017 (current)
Dimensions: 1,028 × 746 (103 KB)
User: JackPotte
Comment: {{BotMoveToCommons|en.wikibooks|year={{subst:CURRENTYEAR}}|month={{subst:CURRENTMONTHNAME}}|day={{subst:CURRENTDAY}}}} == {{int:filedesc}} == {{Information |Description={{en|This picture has been created with MD*Tech XploRe. == Barchart == This char...

The following page uses this file: