Saturday, March 2, 2013

Introduction to R

R is an open-source programming language used for data analysis across a wide range of disciplines, ecology among them. Ecological data can be analyzed in R, and R can also be used to make predictions under given scenarios. R comes with a set of basic packages, often loosely called libraries. Users can add more packages as required. R itself and additional packages can be downloaded from CRAN.
This blog is intended for beginners in R. Readers are welcome. We start with the following steps.
1. Install base R. To download the latest version of R, go to the R Project website, follow the Download R link, and choose a CRAN mirror near your location. Then run the downloaded .exe installer.
2. On Windows, you will find the installed R software under Start menu > All Programs > R. Click it to run the application.

Why use R?
An experienced user will probably give you these five arguments:
  • R is a modern programming language with high flexibility for writing functions or coding statistical models that fit different designs,
  • Commands written in a language are easier to share with others (by e-mail, for example) than "point-and-click" steps,
  • R is continuously being developed by professional statisticians and programmers from reputable scientific groups,
  • R is available for most operating systems, and
  • R is open source software!
How does R work?
We give R a command whenever we want it to perform a task. The output of a command is shown in the same window, or in a separate window for graphical commands.
When giving a command, we normally call one or more built-in (system-supplied) functions.
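As a small illustration of calling built-in functions, here is a minimal sketch. The numbers are arbitrary and chosen only for the example.

```r
# Call one built-in function with one argument
sqrt(16)                          # square root of 16

# Functions can be nested: the inner call is evaluated first
round(sqrt(10), 2)                # square root of 10, rounded to 2 decimals

# Arguments can also be given by name
round(x = 3.14159, digits = 3)
```

Each line is one command. R evaluates it and prints the result in the same window.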

Data types in R
Class: Character, Numeric, Integer, Logical
Objects: Vectors, Matrices, Data frames, Lists, Factors, Missing values
   A Vector: A set of values with the same class
   A list: A vector of values of possibly different classes
   Matrices: Vectors arranged in two dimensions (rows and columns)
   Data frame: Multiple vectors of possibly different classes, of the same length
   Factor: Qualitative variables that can be included in models
   Missing value: In R they are usually coded NA
Operations: Subsetting, Logical subsetting
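The object types listed above can be sketched in a few lines of R. All names and values here (heights, species, trees.df and so on) are made up for illustration. The arrow `<-` assigns a value to a name.

```r
# A vector: values that all share one class
heights <- c(1.2, 3.4, 2.8)            # numeric
species <- c("oak", "pine", "oak")     # character
class(heights)
class(species)

# A matrix: a vector arranged in rows and columns
m <- matrix(1:6, nrow = 2)
dim(m)

# A list: elements may have different classes
site <- list(name = "plot A", trees = 3L, surveyed = TRUE)

# A data frame: equal-length vectors, possibly of different classes
trees.df <- data.frame(species = species, height = heights)

# A factor: a qualitative (categorical) variable
species.f <- factor(species)
levels(species.f)

# A missing value is coded NA
heights.na <- c(heights, NA)
is.na(heights.na)

# Subsetting by position, and logical subsetting by condition
heights[2]
heights[heights > 2]
```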

Getting started with R

Here we start working with R, first by using it as a simple calculator. This will make R code and scripts easier to understand.
From here on, readers will see two different fonts in this blog. The regular font is used to explain methods and give details. The other font, which starts below, is used for R syntax or commands; only text in that font should be copied into R.
Users type the syntax next to the prompt symbol '>'.


This is where we write our R commands.
A command may be written on one line, but if we press Enter before it is complete, R waits for us to enter the missing part on a new line.
We see this when the normal prompt (>) changes to a plus sign (+).

Try these
2 + 2                                      # simple sum
2 * 3                                        # simple multiplication
2 ^ 5                                       # simple power function
10/2                                          # simple division
8^(1/3)                                    # cube root
round(10.6666264, 3)                       # round to 3 decimal places
abs(16 - 19)                               # absolute value

Test yourself: compare 8^(1/3) and 8^1/3. What is the difference between them, and which form should you use for a cube root?
Try the two equations on the right. How would you express them in R, and what are the results?
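A hint on how R reads such expressions, using 27 instead of 8 so that the exercise stays open: the power operator `^` is applied before division unless parentheses say otherwise.

```r
27^(1/3)    # the exponent is 1/3, so this is the cube root of 27
27^1/3      # evaluated as (27^1)/3
(27^1)/3    # the same as the line above, with the grouping written out
```

When in doubt about the order of operations, add parentheses.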

Creating objects is a central part of R programming. Objects may be variables, tables, character strings, functions, or more general structures built from different components. To assign something to an object, use the arrow sign "<-", typed as a '<' followed by a '-'. The arrow can also point the other way ("->"), with the value on the left and the object name on the right. For example, to assign 33 to an object called 'x':
x <- 33      # or 33 -> x

If you want an overview of the objects you have created during an R session, you can use either of the following commands:
objects()    # or ls()

Let's create an object and compute some basic statistics. Remember, the text after "#" is not part of the command. It is a note that explains the command.
x <- c(45,43,46,48,51,46,50,47,46,45)                  # Creating an object
Syntax                                         Task to find:
mean(x)                                 # the mean
median(x)                             # the median
max(x)                                  # the maximum or largest value
min(x)                                  # the minimum value
sd(x)                                    # standard deviation
var(x)                                   # variance
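To see how var() and sd() relate to each other, the sample variance can be computed by hand and compared with the built-in result. This is only a sketch: the names n, manual.var and manual.sd are chosen for illustration.

```r
x <- c(45, 43, 46, 48, 51, 46, 50, 47, 46, 45)

n <- length(x)                                # number of values
manual.var <- sum((x - mean(x))^2) / (n - 1)  # sample variance by hand
manual.sd  <- sqrt(manual.var)                # standard deviation is its square root

all.equal(manual.var, var(x))                 # compare with the built-in var()
all.equal(manual.sd, sd(x))                   # compare with the built-in sd()
summary(x)                                    # minimum, quartiles, median, mean, maximum
```

Note that var() and sd() divide by n - 1, not n, because they estimate the variance from a sample.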

More practice with objects and operations on them is given below.
a = 5
b = 10
c = a*b             # the result is the same as 5*10
Check the result by entering
c
x <- c(45,43,46,48,51,46,50,47,46,45) # creating vector 'x'; it replaces the previous 'x' vector
y <- 5.3            # creating object 'y'
z = x*y
Check the result by entering
z
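What happens in x*y can be sketched as follows: R multiplies every element of x by y, so z has the same length as x. The last lines show the same idea with a shorter vector, which R reuses (recycles) along the longer one. The object w is hypothetical and added only for the example.

```r
x <- c(45, 43, 46, 48, 51, 46, 50, 47, 46, 45)
y <- 5.3

z <- x * y          # each element of x is multiplied by y
length(z)           # same length as x
z[1]                # first element: 45 * 5.3

w <- x + c(1, 2)    # the shorter vector is recycled: +1, +2, +1, +2, ...
w
```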

To work with complex numbers, supply an explicit complex part. Thus
sqrt(-17)
Or
(-17) ^(1/2)
will give NaN (and, for sqrt(), a warning), but

sqrt(-17+0i)
Or
(-17+0i)^(1/2)
The syntax above will do the computations with complex numbers.
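A short sketch of what the complex result looks like and how to take it apart. The object name z.c is chosen only for this example.

```r
z.c <- sqrt(-17 + 0i)   # complex square root of -17
z.c
is.complex(z.c)         # TRUE: the result is a complex number
Re(z.c)                 # real part
Im(z.c)                 # imaginary part, equal to sqrt(17)
z.c^2                   # squaring recovers -17 (up to rounding)
```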

Just as users can create objects in R, they can also delete them. The following command deletes a named object:
rm(name-of-object)
For example, to remove only the object "x":
rm(x)
To remove all the objects you have created at once, use the syntax below.

rm(list=ls(all=TRUE))
It is a good idea to keep R free of unnecessary objects.
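A minimal sketch of creating, listing and removing objects. The names tmp.a and tmp.b are hypothetical. exists() is used here only to check whether an object is still there.

```r
tmp.a <- 1
tmp.b <- 2
ls()                    # lists the objects in the workspace, including tmp.a and tmp.b

rm(tmp.a)               # remove one object
a.left <- exists("tmp.a")   # FALSE: tmp.a is gone
b.left <- exists("tmp.b")   # TRUE: tmp.b is still there
a.left
b.left

rm(tmp.b)               # remove the second object as well
```

Removing objects one by one is safer than rm(list=ls(all=TRUE)), which clears everything without asking.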

Some norms and rules in R
There are some general rules for naming objects, vectors, data frames, etc.:
   Do not use spaces in variable names.
     Example: swimming.speed, not swimming speed
   Use a point (.) as the decimal symbol (Norwegian style uses ',' instead of '.').
   Missing values should be coded as NA.
   Avoid underscores ( _ ) in variable names.
     Example: names like swimming_speed should be avoided
   Do not start a variable name with a number.
     Example: names like 07swimming should be avoided.
   Avoid giving an object the same name as a variable within a data frame, matrix, etc.
     Example: do not call a data frame time when the data frame itself contains a variable named time
   Avoid variable names that are already used for functions.
   Finally, be aware that R is case sensitive.
     Thus if a variable is named Time, R will not understand if you refer to it as time
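Two of these rules can be tried out directly. The sketch below uses made-up values. make.names() is a base R function that turns arbitrary text into syntactically valid names, which shows what R does with spaces and leading digits.

```r
swimming.speed <- 1.8
Swimming.Speed <- 2.4     # a different object: R is case sensitive
swimming.speed == Swimming.Speed

# Spaces and leading digits are not allowed in names; make.names() repairs them
fixed <- make.names(c("swimming speed", "07swimming"))
fixed
```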

Importing Data into R
  • The safest way to get large amounts of data into R is to save them in a tab- or comma-delimited text file (*.txt) before importing them.
  • Such files are easy to create from a spreadsheet (e.g. Excel) or other database programs.
  • It is recommended to give each variable a name in the text file before it is imported.
  • This is done by letting the first row represent the names of the variables.
  • Such a tab- or comma-delimited text file may look like this when viewed in a text editor.
Let's assume that a text file is stored on an MS Windows computer in the folder C:\Data\ with the file name example.txt. To import these data into R, write the following syntax (note the forward slashes in the path):
data.df <- read.table('C:/Data/example.txt', header=T, ...)
Or, to read data that have been copied to the clipboard:
data.df <- read.table('clipboard', header=T, ...)
data.df is the name you give to the object that will contain the dataset (you may of course call it something else). The arrow (<-) is the assignment sign explained above.
The right side of the assignment tells R to read the file in table format and create a data frame from it (this is what the function read.table does).
When the text file to be imported includes the names of the variables in the first row (recommended), you must specify this with the argument header=T, where T means TRUE. If the text file contains no variable names, omit header=T or write header=F. R will then assign its own name to each column.
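The import step can be tried without an existing file. The sketch below first writes a small tab-delimited file to a temporary location and then reads it back. The data values are made up for illustration, and the object names example.file and raw.df are hypothetical.

```r
# Write a small tab-delimited file to a temporary location (values are made up)
example.file <- tempfile(fileext = ".txt")
writeLines(c("year\ttemp\tpcpn",
             "2010\t14.2\t1250",
             "2011\t13.8\t1310",
             "2012\tNA\t1190"),
           example.file)

# Read it back; header = TRUE takes the column names from the first row
data.df <- read.table(example.file, header = TRUE, sep = "\t")
names(data.df)
str(data.df)

# If the file has no header row (skipped here), R supplies the names V1, V2, V3
raw.df <- read.table(example.file, header = FALSE, sep = "\t", skip = 1)
names(raw.df)
```

The missing temperature in the last row is read as NA, which is how R codes missing values.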

Import data from demo.xls, sheet 2, and assign dataframe name test.df
To view the data frame just created during import, type the name of the object (data frame) and press Enter, e.g.
test.df
To see the names of the variables (the vectors in the data frame), use the names command as below
names(test.df)
The output from the above syntax is as follows:
"year" "temp" "pcpn"


R is a language, the more you speak, the more you will learn.
-MJ Corley

More under construction......
thanks for your visit