These are notes I collected from various sources, including the Coursera Data Analysis course and R's online help (opened with help.start).

# Basics

R objects have attributes, which can be inspected with the attributes() function.

<- is the assignment operator.

: is used to create integer sequences: 1:4 gives 1 2 3 4.

The c (concatenate) function creates vectors from individual objects. c(TRUE, FALSE) creates a logical vector. c(1+3i, 4+8i, 3-5i) creates a complex vector. If objects of different types are mixed, they are coerced to a common type.

The as.* functions (as.numeric, as.logical, as.character, ...) explicitly coerce data from one type to another. as.numeric(TRUE) returns 1.
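A minimal sketch of implicit and explicit coercion as described above. The variable names are only illustrative.

```r
# Mixing types in c() coerces everything to the most general type.
mixed_num <- c(1.5, TRUE)      # the logical becomes numeric
mixed_chr <- c(1.5, "a")       # the number becomes character
class(mixed_num)               # "numeric"
class(mixed_chr)               # "character"

# Explicit coercion with the as.* family.
as.numeric(TRUE)               # 1
as.logical(c(0, 1, 2))         # FALSE TRUE TRUE
as.integer("7")                # 7
```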

matrix(ncol = 3, nrow = 4) creates a matrix with 4 rows and 3 columns, filled with NA.

cbind and rbind are other ways to create a matrix: they bind vectors together as columns or as rows.
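The three ways of building a matrix mentioned above can be sketched as follows. The vectors x and y are made-up sample data.

```r
# An empty 4 x 3 matrix is filled with NA.
m <- matrix(ncol = 3, nrow = 4)
dim(m)                         # 4 3

# Supplying data fills the matrix column by column.
m2 <- matrix(1:6, nrow = 2, ncol = 3)

# cbind treats each vector as a column, rbind treats each as a row.
x <- 1:3
y <- 10:12
by_col <- cbind(x, y)          # 3 rows, 2 columns
by_row <- rbind(x, y)          # 2 rows, 3 columns
```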

Factors represent categorical data, like male/female. They are created with the factor function. table counts the observations in each level, and unclass strips the factor class to reveal the underlying integer codes.

The levels parameter of the factor function determines the order of the levels, and therefore which integer code corresponds to each level.
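A short sketch of factor creation with an explicit level order. The sample values are invented for illustration.

```r
sex <- factor(c("male", "female", "female", "male"),
              levels = c("male", "female"))
levels(sex)        # "male" "female"
table(sex)         # counts per level: male 2, female 2
unclass(sex)       # integer codes 1 2 2 1, plus a "levels" attribute
```

Without the levels argument, the levels would be sorted alphabetically, so "female" would get code 1.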

is.nan and is.na functions are used to check whether vector values are NaN or NA.
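The difference between the two checks can be seen in a small example (the vector v is made up for illustration):

```r
v <- c(1, NA, 0 / 0, 4)   # 0 / 0 evaluates to NaN
is.na(v)                  # FALSE TRUE TRUE FALSE  (NaN also counts as missing)
is.nan(v)                 # FALSE FALSE TRUE FALSE (NA is not NaN)
```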

Data frames store tabular data, like matrices. Unlike matrices, they can hold a different type of data in each column: the first column can be numeric, the second a factor, the third logical, and so on.

Data frames are usually created by the read.table and read.csv functions. Each row has a name, which can be accessed with row.names. A data frame can be converted to a matrix with data.matrix.

nrow and ncol return the number of rows and columns of a data.frame.

str and summary give a concise overview of an object's structure and contents.
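A minimal sketch of the data frame functions mentioned above, using a small hand-built data frame (the column names and values are invented) instead of a file:

```r
df <- data.frame(id = 1:3,
                 group = factor(c("a", "b", "a")),
                 passed = c(TRUE, FALSE, TRUE))
nrow(df)                # 3
ncol(df)                # 3
row.names(df)           # "1" "2" "3"
str(df)                 # compact description of each column
summary(df)             # per-column summary statistics
dm <- data.matrix(df)   # numeric matrix; factors and logicals become numbers
```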

Use getwd to report the working directory, and setwd to change it.

ls function displays the names of objects in your workspace:

> x <- 10
> y <- 50
> z <- c("three", "blind", "mice")
> f <- function(n,p) sqrt(p*(1-p)/n)
> ls()
[1] "f" "x" "y" "z"


The rm function permanently removes one or more objects from the workspace.

# Scripts

R will execute the .Rprofile script when it starts. The place of .Rprofile depends upon your platform: In Linux / Unix save the file in your home directory ~/.Rprofile

The source function instructs R to read the text file and execute its contents:

source("myScript.R")


On the command line, this can be run as

$ R CMD BATCH /home/jim/psych/adoldrug/partyuse1.R

Managing various objects used in R can be challenging. Using the directory structure to sort these objects into sensible categories can be a big help. Instead of starting the R session in a particular directory, you may wish to keep a directory of R scripts and allow these to change the working directory to suit whatever task they perform.

par(ask=TRUE) requires you to hit <Enter> before each plot is displayed.

readline("Press <Enter> to continue") presents a prompt.

# Vectors

Vectors are created like v <- c(1.1, 2.2, 3.3).

Vectors can be used in arithmetic expressions.

x <- v + 2 * w

In arithmetic expressions, a shorter vector is recycled until it reaches the length of the longer vector.

range returns the minimum and maximum elements of a vector.

sort sorts a vector in increasing order.

sqrt(-17) returns NaN (with a warning), but sqrt(-17+0i) returns a complex result.

Regular sequences are generated by the : operator. 4:10 returns 4 5 6 7 8 9 10. This is a shorthand for the seq function, which also accepts step size (by) and length (length.out) parameters.

The rep function repeats the supplied elements to create a vector.

# Arrays

If z is a vector with 1500 elements (e.g. z <- 1:1500), then dim(z) <- c(3, 5, 100) makes it a 3D array with the respective dimensions.

Another way to create an array is x <- array(1:20, dim=c(4,5))

# Matrices

Two matrices A and B can be multiplied like A %*% B.

A linear equation of the form b <- A %*% x can be solved for x with solve(A, b).

# Lists

A list can be created with the list function. List elements don't have to be of the same type. They can be anything from characters to vectors.

> mylist <- list(name="Fred", no.children=3, child.ages=c(4, 7, 9))

Components can be reached by index like mylist[[1]] or by component name like mylist$no.children. mylist[["no.children"]] can also be used.

Lists are a lot like structs.

c function can be used to concatenate lists.
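The list operations in this section can be sketched as follows, reusing the mylist example. The extra list is a hypothetical addition for illustration.

```r
mylist <- list(name = "Fred", no.children = 3, child.ages = c(4, 7, 9))

# Three equivalent styles of component access.
mylist[[1]]                  # "Fred"
mylist$no.children           # 3
mylist[["child.ages"]]       # 4 7 9

# c() appends the components of one list to another.
extra <- list(spouse = "Mary")   # hypothetical extra component
combined <- c(mylist, extra)
names(combined)              # "name" "no.children" "child.ages" "spouse"
```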

# Arbitrary Functions

An arbitrary function (similar to lambda) can be created like f <- function(x, y) x + y
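As a small illustration, a function can be bound to a name or passed inline without a name. The sapply call is just one common place where an unnamed function is used.

```r
# A named function.
f <- function(x, y) x + y
f(2, 3)                                    # 5

# An anonymous function passed directly to sapply.
squares <- sapply(1:4, function(n) n^2)    # 1 4 9 16
```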

# Statistical Functions for Discrete Distributions

The distrEx package (loaded with library(distrEx)) has functions E, var and sd that calculate the mean (expectation), variance and standard deviation of a distribution.

Uniform random events can be simulated with the sample function. It has three main parameters: the first gives the values to select from, the second (size) gives the number of draws, and the third (replace) sets whether each drawn item is put back before the next draw.

1000 dice: sample(6, size=1000, replace=TRUE)

50 random numbers from 1000 to 2000: sample(1000:2000, size=50, replace=TRUE)

Flip a fair coin 100 times: sample(c("H", "T"), size=100, replace=TRUE)
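One way to check that such a simulation behaves as expected is to tabulate the outcomes. This is a sketch; the seed value is arbitrary and only makes the run reproducible.

```r
set.seed(42)                                     # arbitrary seed
rolls <- sample(6, size = 1000, replace = TRUE)  # 1000 dice
table(rolls) / length(rolls)                     # each face should be near 1/6

# Without replacement, sample(6) is a random permutation of 1:6.
perm <- sample(6)
```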

# Reading and Writing Data

read.table and read.csv read tabular data from text files.

readLines reads lines of text.

source and dget read R code files.

load and unserialize are used to read binary objects.

dump and dput are the inverses of source and dget. They include the object's metadata in the output.
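A minimal round trip with dput and dget, assuming a temporary file is an acceptable place to write (the object and path names are illustrative):

```r
obj <- data.frame(a = 1:3, b = c("x", "y", "z"))
path <- tempfile(fileext = ".R")   # temporary file, name chosen by R

dput(obj, file = path)             # write an R-code representation
restored <- dget(path)             # parse it back into an object
all.equal(obj, restored)           # compares values and attributes
unlink(path)                       # clean up the temporary file
```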

file opens a connection to a file. gzfile opens gzip-compressed files, and bzfile opens bzip2-compressed files. url opens a connection to a web page.
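Connections can be sketched with a gzip-compressed temporary file. The file contents here are made up, and the same open/read/close pattern applies to file and bzfile.

```r
path <- tempfile(fileext = ".gz")

con <- gzfile(path, "w")            # open a gzip-compressed file for writing
writeLines(c("first line", "second line"), con)
close(con)

con <- gzfile(path, "r")            # reopen it for reading
txt <- readLines(con)
close(con)
unlink(path)                        # clean up the temporary file
```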

## read.table

read.table is the most important function for inputting data.

file is the name of the file or a connection.

header indicates whether the input has a header line.

sep indicates the separator. (comma, tab, etc.)

colClasses is a vector of column data types. Specifying it spares R from guessing the types and can make reading roughly twice as fast.

nrows is the maximum number of rows to read.

comment.char indicates the comment character.

skip is the number of lines to skip from the beginning of the file.

read.csv is identical to read.table, except that the default separator is a comma and header defaults to TRUE.

initial <- read.table("datatable.txt", nrows = 100)
classes <- sapply(initial, class)
tabAll <-  read.table("datatable.txt", colClasses = classes)
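The same two-pass idea can be tried end to end with a self-contained sketch. The sample file and its column names are invented for illustration, and tab_all is just a local name.

```r
# Write a small sample file so the example is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("# generated sample data",
             "id,score,label",
             "1,2.5,a",
             "2,3.5,b",
             "3,4.5,c"), path)

# First pass: read a few rows to discover the column classes.
initial <- read.table(path, header = TRUE, sep = ",", nrows = 2)
classes <- sapply(initial, class)

# Second pass: reuse the classes for the full read.
tab_all <- read.table(path, header = TRUE, sep = ",",
                      comment.char = "#", colClasses = classes)
unlink(path)   # clean up the temporary file
```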


# Plotting

plot(x, y) plots the values in x with the corresponding values in y.

Additional parameters to the plot can be used to configure the visual settings.

Use the density function to approximate the sample density. Use lines to draw the approximation:

hist(x, prob=TRUE)
lines(density(x))
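A runnable sketch of the plotting calls above, using simulated data since the notes do not define x and y. The seed, titles and colours are arbitrary choices.

```r
set.seed(1)                    # arbitrary seed for reproducibility
x <- rnorm(200)                # simulated sample data
y <- 2 * x + rnorm(200)

# Scatter plot with a few visual settings.
plot(x, y, main = "Scatter plot", xlab = "x", ylab = "y",
     pch = 19, col = "blue")

# Histogram on the density scale, with the estimated density on top.
hist(x, freq = FALSE)          # freq = FALSE is the explicit form of prob = TRUE
d <- density(x)
lines(d)
```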


# Installing R packages

## Method 1: Install from source

Download the add-on R package, say mypkg, and type the following command in Unix console to install it to /my/own/R-packages/:

## Statistical inference is based on the assumption that

The data must be independent and identically distributed, that is, multinomial with some specified probability distribution. If these assumptions are satisfied, then the χ2 statistic is approximately χ2 distributed with n − 1 degrees of freedom, where n is the number of categories.
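As a sketch of this goodness-of-fit setting, the built-in chisq.test can be applied to simulated dice rolls. The data and the seed are made up for illustration.

```r
set.seed(7)                                      # arbitrary seed
rolls <- sample(6, size = 600, replace = TRUE)   # simulated fair die
observed <- table(factor(rolls, levels = 1:6))   # counts for each face

# Null hypothesis: all six faces have probability 1/6.
result <- chisq.test(observed, p = rep(1/6, 6))
result$statistic    # the chi-squared statistic
result$parameter    # degrees of freedom: 6 - 1 = 5
result$p.value
```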

## using null hypothesis for independence testing

The same statistic can also be used to test whether two rows in a contingency table are "independent". That is, the null hypothesis is that the rows are independent, and the alternative hypothesis is that they are not independent.
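A minimal sketch of an independence test on a contingency table, again with chisq.test. The counts and the dimension names are hypothetical.

```r
# Hypothetical 2 x 3 contingency table of counts.
counts <- matrix(c(20, 30, 50,
                   25, 25, 50),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(group = c("A", "B"),
                                 outcome = c("low", "mid", "high")))

result <- chisq.test(counts)
result$parameter   # degrees of freedom: (2 - 1) * (3 - 1) = 2
result$p.value     # a small value is evidence against independence
```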