R getting easy: Central tendency and more

Making vectors and combining them

We can make vectors with commands

a <- seq(1, 10) # make a sequence of numbers from 1 to 10 and store it in 'a'

b <- seq (10,1)

c <- cbind(a, b) # make a matrix by combining the two vectors as columns

c # check what is in 'c'

Now find the mean, standard deviation and variance of the matrix c above

mean(c) # mean of all values in 'c'

sd(c) # standard deviation of all values in 'c'

var(c) # for a matrix, var() gives the variance-covariance matrix of the columns
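
It may help to see what these functions return for a matrix as a whole and for each column separately. The sketch below rebuilds the same matrix under the illustrative name m (not used in the text above), so that the name of the built-in c() function is left alone.

```r
# Rebuild the matrix from above; 'm' is an illustrative name, not from the text
a <- seq(1, 10)
b <- seq(10, 1)
m <- cbind(a, b)

mean(m)           # mean of all 20 values taken together
sd(m)             # standard deviation of all 20 values taken together
var(m)            # 2 x 2 variance-covariance matrix of columns a and b

colMeans(m)       # mean of each column separately
apply(m, 2, sd)   # standard deviation of each column separately
apply(m, 2, var)  # variance of each column separately
```

If one number per column is wanted, the column-wise forms in the last three lines are usually the ones to use.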

Import data to begin (open the demo data Excel file and copy the data from sheet 1)

read.table('clipboard', header=TRUE) -> a.df # a command can be written this way as well, with the arrow pointing right

attach(a.df) # make the columns of a.df available by name

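colMeans(a.df) # the mean of each column (mean(a.df) returns NA for a data frame in current R)

max(a.df) # the maximum or largest value in the whole data frame

min(a.df) # the minimum value in the whole data frame

sapply(a.df, sd) # standard deviation of each column (sd(a.df) gives an error in current R)

var(a.df) # variance-covariance matrix of the columns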

summary(a.df) # to see summary of all variables at once
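
The commands above depend on data copied to the clipboard. As a self-contained sketch, the same statistics can be tried on a small made-up data frame. The name demo.df and all values below are invented for illustration and are not the demo data.

```r
# Hypothetical stand-in for the demo data; the values are invented for illustration
demo.df <- read.table(text = "
age weight height
21  55     160
25  62     165
30  70     172
35  68     168
40  75     175
", header = TRUE)

sapply(demo.df, mean)    # mean of each column
sapply(demo.df, sd)      # standard deviation of each column
sapply(demo.df, var)     # variance of each column
sapply(demo.df, range)   # minimum and maximum of each column (2-row matrix)
summary(demo.df)         # summary of all variables at once
```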

Checking Dataframe properties

dim(a.df) # dimensions of a matrix or data frame

ncol(a.df) # number of columns

nrow(a.df) # number of rows

colnames(a.df) # gives the column headings

rownames(a.df) # gives the row headings

Adding a column called avg to a.df, which holds the average of the columns age and weight. The dollar sign ($) refers to a column of a data frame

a.df$avg <- (a.df$age + a.df$weight)/2

rowSums(a.df)

a.df$sum <- rowSums(a.df) # add a column holding the row sums

colSums(a.df)

rowSums(a.df)

rowMeans(a.df)

colMeans(a.df)

Try adding a column yourself, with the values

multiple = age * weight (the product of the columns age and weight)
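
One possible way to add such columns is sketched below on a small made-up data frame (demo.df and its values are hypothetical). Note that rowSums on the whole data frame also adds up any columns created earlier, so the sketch sums only the original columns.

```r
# Hypothetical data; values invented for illustration
demo.df <- data.frame(age = c(21, 25, 30), weight = c(55, 62, 70))

demo.df$avg      <- (demo.df$age + demo.df$weight) / 2   # row-wise average
demo.df$multiple <- demo.df$age * demo.df$weight         # row-wise product

# Sum only the original columns; rowSums(demo.df) would also add avg and multiple
demo.df$sum <- rowSums(demo.df[, c("age", "weight")])

demo.df
```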

Some simple plots

We are working with the same data as before; if needed, import it again (copy the data from sheet 1 of the demo data Excel file). Here we practice some basic plots

bio.df<- read.table ('clipboard', header=T)

attach (bio.df)

plot (age,weight) # plot (predictor, response) i.e. x, y

plot (age,height)

plot (height,weight)

hist (age) # plot histogram

hist (weight)

hist (height)

hist (age, nclass=6)

boxplot(weight,height) # plot boxplot
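
The plots above can also be drawn without attach, and with labels. The following is only a sketch on made-up data (demo.df and its values are hypothetical); xlab, ylab, main, breaks and names are standard arguments of the base plotting functions.

```r
# Hypothetical data; values invented for illustration
demo.df <- data.frame(
  age    = c(21, 25, 30, 35, 40, 45),
  weight = c(55, 62, 70, 68, 75, 80),
  height = c(160, 165, 172, 168, 175, 178)
)

# with() evaluates the plotting call inside the data frame, so attach() is not needed
with(demo.df, plot(age, weight,
                   xlab = "Age", ylab = "Weight",
                   main = "Weight against age"))

# 'breaks' suggests a number of bins; R may adjust it to get round bin limits
with(demo.df, hist(height, breaks = 4, main = "Histogram of height"))

# Label the boxes explicitly so they are not just numbered 1 and 2
boxplot(demo.df$weight, demo.df$height, names = c("weight", "height"))
```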

To make work easier, we can first attach the data frame (this tells R to look for variables in the attached object)

attach(a.df)

This shortens the commands when we work with individual variables of an object. If 'attach' is forgotten, the following commands will result in an error.

mean (age)

mean (weight)

mean (height)

median (age)

median (weight)

median (height)

If we do not use 'attach', we need to specify both the data frame and the variable name, e.g.

mean(a.df$age)

sd(a.df$weight)
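
A third option, sketched here on made-up data (demo.df and its values are hypothetical), is with(), which evaluates one expression inside a data frame without attaching it. If attach is used, detach removes the data frame from the search path again.

```r
# Hypothetical data; values invented for illustration
demo.df <- data.frame(age = c(21, 25, 30, 35), weight = c(55, 62, 70, 68))

mean(demo.df$age)                                  # explicit data frame and column
with(demo.df, median(weight))                      # same idea, without repeating demo.df
with(demo.df, c(mean = mean(age), sd = sd(age)))   # several statistics at once

# If a data frame was attached, detach it when done to avoid name clashes
attach(demo.df)
mean(age)
detach(demo.df)
```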

Installing Package in R

In the R main menu 'Packages', go to 'Install package(s)'. Then choose a CRAN mirror (any will do, but the nearest location is better). Then find the required package name in the list, select it and click 'OK'. Here, we install the package prettyR.

Or, if you would like to install offline, first download the zip file of the R package from the CRAN page and install it later from the main menu 'Packages', then 'Install package(s) from local zip files...'

Loading the package to make its functions available in R

To calculate the mode, we will load the library "prettyR", as this function is not directly available in the default libraries.

This library does not come with the installation of R, so we installed the package first (above) and now we load it

library(prettyR)

Mode(x) # mode calculation (x is any variable)

Mode (age)

Mode (weight)

Mode (height)
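
For comparison, the mode can also be worked out in base R. The helper mode_of below is hypothetical, written only for this sketch, and is not how prettyR computes its result. Note that base R's own mode() (lowercase m) reports the storage type of an object, not the statistical mode.

```r
# Command-line alternative to the menu, if an internet connection is available:
# install.packages("prettyR")

# Hypothetical helper (not part of prettyR): most frequent value(s) of a vector
mode_of <- function(v) {
  counts <- table(v)                       # frequency of each distinct value
  names(counts)[counts == max(counts)]     # value(s) with the highest frequency
}

age <- c(21, 25, 25, 30, 30, 30, 35)       # invented values
mode_of(age)   # returned as character, because table() stores the values as names
mode(age)      # base R's mode(): the storage type, here "numeric"
```

When several values tie for the highest frequency, this helper returns all of them.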

You may want to practice more installation of packages.

Using 'apply' function

The apply function is used for applying functions to rows or columns of matrices or dataframes

x<-matrix(1:24,nrow=4)

apply(x,1,sum) # 1 = rows: sum of each row

apply(x,2,sum) # 2 = columns: sum of each column

In this case, the above commands are equivalent to

rowSums(x)

colSums(x)
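
The correspondence between apply and the dedicated row and column sum functions can be checked directly. A minimal sketch, where row_totals and col_totals are illustrative names:

```r
x <- matrix(1:24, nrow = 4)        # 4 rows, 6 columns, filled column by column

row_totals <- apply(x, 1, sum)     # one sum per row
col_totals <- apply(x, 2, sum)     # one sum per column

row_totals
col_totals

# Compare with the dedicated functions
all.equal(row_totals, rowSums(x))
all.equal(col_totals, colSums(x))
```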

Try a few more

apply(x,1,sqrt)

apply(x,2,sqrt)
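
When the applied function returns a vector rather than a single number, the shape of the result depends on the margin. The sketch below (by_row and by_col are illustrative names) compares the two calls with sqrt applied to the whole matrix.

```r
x <- matrix(1:24, nrow = 4)

by_row <- apply(x, 1, sqrt)
by_col <- apply(x, 2, sqrt)

dim(by_row)    # 6 4: each row's result is stored as a column, so the matrix is transposed
dim(by_col)    # 4 6: same shape as x

all.equal(by_row, t(sqrt(x)))
all.equal(by_col, sqrt(x))
```

Since sqrt already works element by element, sqrt(x) alone gives the same values; apply is mainly useful for functions that summarise a whole row or column.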

To apply a function to each variable, i.e. each column of a data frame, use 'sapply' (rather than 'apply')

read.table('clipboard',header=T)->bio.df # import again (from demo.xls, sheet 1) in case the previous one was replaced or cleared

sapply(bio.df,mean) # find the mean of each variable

When all columns are numeric, the 'sapply' call above is equivalent to

apply(bio.df,2,mean) # find the mean of each variable

Try these as well

sapply(bio.df,sum) # gives the column sums

sapply(bio.df,sqrt) # square root of all values

sapply(bio.df,sample) # random reordering of the values in each column

sapply(bio.df,levels) # lists the levels of categorical variables (factors)
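
Functions such as mean only make sense for numeric columns, and levels only for factors. One way to handle a data frame with both kinds of column is sketched below; demo.df, its values and the name is_num are made up for illustration.

```r
# Hypothetical data with one categorical column; values invented for illustration
demo.df <- data.frame(
  age    = c(21, 25, 30, 35),
  weight = c(55, 62, 70, 68),
  sex    = factor(c("f", "m", "f", "m"))
)

is_num <- sapply(demo.df, is.numeric)    # TRUE for numeric columns only

sapply(demo.df[is_num], mean)            # means of the numeric columns
sapply(demo.df[!is_num], levels)         # levels of the factor columns
sapply(demo.df, class)                   # type of every column
```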

More under construction .....