User:Athampan/Lesson1

Introduction to Data Analysis

The intended audience of this book is students who wish to learn the basics of data analysis. The main focus will be on astronomical applications, because of the expertise of the authors, but the majority of the techniques will be applicable to all fields of science and to a broader data mining context.

The book is organized into chapters, with each chapter containing links to examples of the techniques in IDL or Python. The student is encouraged to use and modify the programs to understand the concepts.

Software Packages

IDL

Installation

IDL (Interactive Data Language) is a high-level programming language first designed for use in astronomy. The commercial version is available from Harris Geospatial. There are two free analogues:

Both languages are compatible with IDL 8.0, and most tutorials and programs for IDL may be used verbatim. We will use IDL as a generic term for any of the three languages.

Tutorials
Collections
  • IDLASTRO: The most comprehensive collection of IDL astronomy programs available.

Python

Arun - this section is for you.


Basic Concepts


The goal of any science is to develop a model that allows one to predict the behaviour of a given system. A simple example is a falling object: the acceleration is known, so the final velocity is a function only of the time. Other systems are more complex and may depend on many parameters, not all of which are known. The job is further complicated because the data we collect may be subject to experimental errors, both random and systematic.
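As a minimal sketch of the falling-object example, the Python lines below assume constant gravitational acceleration (g = 9.81 m/s^2), no air resistance, and an object released from rest. The function names `velocity` and `measure_velocity` and the noise level `sigma` are illustrative assumptions, not part of the text. The first function is the model prediction; the second mimics a measurement by adding a random error to it.

```python
import random

G = 9.81  # assumed gravitational acceleration in m/s^2


def velocity(t, g=G):
    """Model prediction: speed of an object dropped from rest after t seconds."""
    return g * t


def measure_velocity(t, sigma=0.2, rng=random):
    """Simulated measurement: model value plus a Gaussian random error (sigma in m/s)."""
    return velocity(t) + rng.gauss(0.0, sigma)


if __name__ == "__main__":
    # Compare the prediction with one simulated measurement at a few times.
    for t in (0.5, 1.0, 2.0):
        print(t, velocity(t), measure_velocity(t))
```

The simulated measurement differs from the prediction on every run, which is the situation the rest of this lesson deals with.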

Programming


Any field of science now requires a familiarity with computers and the ability to use at least one programming language. The programs in this module require the installation of GDL, a free data processing language that is available for a number of platforms from the main GDL site. A brief guide to its installation is here. GDL is an interpreted language that is well suited to data analysis and plotting. Because GDL is largely compatible with IDL, IDL tutorials are usually applicable to GDL and are available here, here and here. A library of astronomical routines is at Wayne Landsman’s site.

Types of Errors

Data points are always subject to errors, and the modeling and interpretation of astronomical data depend heavily on a proper statistical treatment of the data and the errors.

Illegitimate errors are errors that are due to some mistake with the equipment, recording or computation. They will not be repeatable over the long run, although they may be instrument dependent.

Systematic errors are errors that are inherent in the experimental setup but which may be corrected for. If the experiment is repeated in exactly the same manner, the systematic errors will be consistent. An example is a balance where the weight of the balance pan is not subtracted from the total.

Random errors are different each time the measurement is done, due either to instrumental uncertainty or to statistical uncertainty.
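The difference between systematic and random errors can be illustrated with a small simulation of the balance example. This is a sketch in Python under stated assumptions: the true mass, the pan mass and the scatter of a single reading (`TRUE_MASS`, `PAN_MASS`, `SIGMA`) are hypothetical values chosen for illustration, and the random error is assumed to be Gaussian.

```python
import random
import statistics

TRUE_MASS = 50.0  # hypothetical true mass of the sample, in grams
PAN_MASS = 2.5    # hypothetical pan mass that is not subtracted (systematic error)
SIGMA = 0.3       # hypothetical random scatter of a single reading, in grams


def weigh(n, rng):
    """Return n simulated readings: true mass + constant offset + random noise."""
    return [TRUE_MASS + PAN_MASS + rng.gauss(0.0, SIGMA) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
readings = weigh(1000, rng)

mean_reading = statistics.mean(readings)
scatter = statistics.stdev(readings)

# Averaging many readings reduces the random error,
# but the systematic offset stays in the mean.
print("mean reading:  ", mean_reading)
print("scatter:       ", scatter)

# Subtracting the known pan mass corrects the systematic error.
print("corrected mean:", mean_reading - PAN_MASS)
```

Repeating the measurement many times pins down the mean ever more precisely, yet the mean stays offset by the pan mass until that offset is identified and removed.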

Random Numbers

Simulations are an integral part of modern astronomy but, in order to make them more realistic, we have to add experimental noise to them. In this section, we will discuss random numbers and will illustrate them with GDL commands. The commands will be in bold. Advanced comments will be in square brackets [...].

Uniform Distribution

In a uniform distribution, the probability of finding a value is the same everywhere within the range.


u=randomu(seed,10000)

This creates a uniform random array with 10,000 elements, each between 0 and 1. seed should not be defined beforehand. [If seed is defined, the sequence of numbers will always be the same. This is useful for testing.]
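For readers following along in Python, the effect of a fixed seed can be sketched with the standard `random` module. This is an analogue, not a translation of randomu: the helper name `uniform_sample` is an illustrative assumption, and Python's seeding behaviour is not claimed to match GDL's in detail.

```python
import random


def uniform_sample(n, seed=None):
    """Return n uniform random numbers in [0, 1); a fixed seed repeats the sequence."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


a = uniform_sample(10000, seed=1)
b = uniform_sample(10000, seed=1)  # same seed: identical sequence
c = uniform_sample(10000)          # no seed: a new sequence on each run

print("same seed gives same sequence:", a == b)
print("range of values:", min(a), max(a))
```

Fixing the seed is convenient when testing a program, because any change in the output must then come from the code rather than from the random numbers.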

plot,u,psym=3

Plot the data with symbol type dot (.). The x axis is the index; the y axis is the value of u[i].
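A quick check that the values really are spread evenly is to count how many fall into equal-width bins. The Python sketch below does this with the standard library only; the helper name `bin_counts`, the choice of 10 bins and the fixed seed are illustrative assumptions. With 10,000 values, each bin should hold close to 1,000, with bin-to-bin fluctuations of a few tens.

```python
import random


def bin_counts(values, nbins=10):
    """Count how many values in [0, 1) fall into each of nbins equal-width bins."""
    counts = [0] * nbins
    for v in values:
        # min() guards against a value landing exactly on the upper edge.
        counts[min(int(v * nbins), nbins - 1)] += 1
    return counts


rng = random.Random(0)  # fixed seed so the sketch is repeatable
u = [rng.random() for _ in range(10000)]
counts = bin_counts(u)

# Print a simple text histogram: bin range and number of values in it.
for i, c in enumerate(counts):
    print(f"{i / 10:.1f}-{(i + 1) / 10:.1f}: {c}")
```

The counts are not exactly equal; the residual differences are themselves random errors of the statistical kind described earlier.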

Probability and Statistics:

  • What is probability
  • Calculus of probability
  • Moments of the distribution
  • Discrete probability distributions: binomial distribution, Poisson distribution, moments of the Poisson distribution, sqrt(n)
  • Applications: detection of a source, isotropy of the universe, identifying sources
  • Continuous probability distributions: Gaussian distribution, random variables
  • Applications
  • Addition of random variables: multivariate distribution, probability distribution of summed random variables, error propagation, characteristic functions
  • Central limit theorem: derivation, measurement theory, Gaussian equal to a series of convolutions with a rectangular function
  • Sampling distributions: sample variance, application