Introduction to R Statistical Package

Summary: R is a widely used statistical software package that allows users to perform statistical analyses, generate graphics, and write custom functions. The document reviews R's main features and capabilities for basic statistics, graphics, and creating custom functions in five sections. It also gives instructions on downloading, installing, and using R, with examples of basic calculations, data manipulation, and common statistical analyses in R such as t-tests and linear regression.

Uploaded by

Joel Tan Yi Jie

A Quick Introduction to R

This document very briefly reviews the main features of the R statistical package. For full
details, please refer to the various manuals and help files that come packaged with R, or the
extra material referred to at the end of this document.

This document is divided into five sections:

1. Introduction: How to download and install, overview of capabilities.

2. R Basics: Covers using R as a calculator, entering data, types of variables, random number generation, and getting help.

3. Basic Statistics: Covers t-tests with confidence intervals for means, chi-square tests with confidence intervals for proportions, non-parametrics, simple regression, generalized linear models including logistic regression.

4. Graphics: Scatter plots, histograms, 3-D plots, boxplots.

5. Creating your own functions: We will see here how to create our own functions, to extend the capabilities of R.

1 Introduction

1.1 What is R?

Quoted directly from the R web page ([Link]):

“R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R.

“R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.

“One of R’s strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control.

“R is available as Free Software under the terms of the Free Software Foundation’s GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.”

In short, R is essentially a free S-PLUS: a very sophisticated statistics and graphics package, used by most statisticians these days for modern analysis and statistical research.

2 Installation

Installation follows a few trivial steps:

1. Go to the R web page, at [Link]

2. On the left-hand side menu, click on “CRAN”, which is under the “Download” item.
3. Pick a country site (mirror) from which to download (for example, University of Toronto under Canada; you can pick any, since this only affects download speed).
4. Near the top of the page (e.g., [Link]), inside the box labelled “Precompiled Binary Distributions”, pick the right file to download for your operating system (e.g., click on “Windows (95 and later)” if you have any Windows system).
5. This brings you to a page where you select the part of R you need. While you may later want to download the set of user-contributed functions, for now just click on “base”, which gets you the basic R program.
6. At the next screen, click on “[Link]” (or similar). At this point you should be asked (via a prompt box) where you want to save the file. Pick a place on your computer to save it (e.g., c:\temp).
7. After the file has downloaded, find where you saved it in your file explorer (in Windows; use the equivalent if you have another operating system), and click on “[Link]” (or similar).
8. Follow the directions on screen until the program is installed. In general, you can leave all options unchanged.

9. After installation is completed, click on the “R” icon on your desktop to begin the
program. At this point, your screen should have an R prompt on it, and you are ready
to begin using the program.

3 Adding Packages

You can easily extend R’s basic capabilities by adding in functions that others have created and made freely available through the web. Rather than surfing the web trying to find them, you can add these capabilities very easily using the “Packages” menu items.

For example, to add a new package not previously downloaded onto your computer, click on “Packages → Install Package(s)”, pick a site from the list (usually a Canadian site) and choose the package you want to install from the long list (about 900 packages were available at the time of writing). To activate this package, you then need to click on it in the “Packages → Load Package” menu item. You are then ready to use it, and the help files for the package will also be automatically added to your installation. See, for example, the “Help → HTML help” menu item, and click on “Packages” in the browser window that opens.
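You can also carry out the same two steps at the R prompt instead of using the menus. The sketch below uses the base functions `install.packages()`, `requireNamespace()` and `library()`. The package name `splines` is only a placeholder: it ships with every R installation, so the install step is skipped here. Substitute the name of the CRAN package you actually want.

```r
# Name of the package to use; "splines" is a placeholder that ships with R
pkg <- "splines"

# Download and install from CRAN only if the package is not already installed
# (R may ask you to pick a CRAN mirror the first time)
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Attach the package for this session (the "Load Package" step)
library(pkg, character.only = TRUE)

# Browse the package's help index, as with the HTML help menu
# help(package = "splines")
```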

4 R Basics

R can be used in many ways:

Use R as a Calculator:

> 2+3
[1] 5
> pi
[1] 3.141593
> 2+3*pi
[1] 11.42478
> log(2+3*pi)
[1] 2.435785
> exp(2.435785)
[1] 11.42478
> 6/7
[1] 0.8571429
> cos(pi)
[1] -1

Entering and Manipulating Data in R:

Scalar variables:

> a <- 3
> a
[1] 3
> b <- 5
> b
[1] 5
> b-a
[1] 2
> b/a
[1] 1.666667

Vector variables:

> x<-c(2,3,1,5,4,6,5,7,6,8)
> x
[1] 2 3 1 5 4 6 5 7 6 8
> y <- c(10, 12, 14, 13, 34, 23, 12, 34, 25, 43)
> y
[1] 10 12 14 13 34 23 12 34 25 43
> x[4]
[1] 5
> y[3]
[1] 14
> x[1:10]
[1] 2 3 1 5 4 6 5 7 6 8
> x[6:8]
[1] 6 5 7
> x[x > 5]
[1] 6 7 6 8

Functions on vectors:

> length(x)
[1] 10
> length(y)
[1] 10
> sum(x)
[1] 47
> sum(y)
[1] 220
> sum(x^2)
[1] 265
> mean(x)
[1] 4.7
> mean(y)
[1] 22
> var(x)
[1] 4.9
> var(y)
[1] 136.4444
> sqrt(var(x))
[1] 2.213594
> sqrt(var(y))
[1] 11.68094
> sum((x-mean(x))^2)
[1] 44.1
> summary(x)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 3.25 5.00 4.70 6.00 8.00
> summary(y)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.00 12.25 18.50 22.00 31.75 43.00
> summary(x^2)
Min. 1st Qu. Median Mean 3rd Qu. Max.
1.00 10.75 25.00 26.50 36.00 64.00
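The sample variance reported by `var()` divides the sum of squared deviations by n - 1. You can therefore check it by hand from the quantities above: 44.1/9 = 4.9. The sketch below performs that check and also shows `sd()`, a shortcut for `sqrt(var(x))`:

```r
x <- c(2, 3, 1, 5, 4, 6, 5, 7, 6, 8)

n <- length(x)                 # 10 observations
ss <- sum((x - mean(x))^2)     # sum of squared deviations: 44.1
var_by_hand <- ss / (n - 1)    # sample variance uses the n - 1 divisor

var_by_hand                    # 4.9, same as var(x)
sd(x)                          # 2.213594, same as sqrt(var(x))
```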

Matrices:

> z<- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, byrow=T)
> z
[,1] [,2] [,3]
[1,] 1 2 3
[2,] 4 5 6
[3,] 7 8 9
> [Link]<- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, byrow=F)
> [Link]
[,1] [,2] [,3]
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
> z[2,3]
[1] 6
> z[,1]
[1] 1 4 7
> z[1,]
[1] 1 2 3
> z[3,]
[1] 7 8 9
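Two other common matrix operations are the transpose, `t()`, and matrix multiplication, `%*%`. Plain `*` multiplies element by element instead. Here is a brief sketch using the same matrix `z`; note that `t(z)` reproduces the column-filled matrix shown above:

```r
# Same 3 x 3 matrix as above, filled row by row
z <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, byrow = TRUE)

dim(z)          # 3 3: number of rows and columns
t(z)            # transpose: equals the matrix filled column by column
z * z           # element-wise product (squares each entry)
z %*% t(z)      # true matrix product, a 3 x 3 result
diag(z)         # main diagonal: 1 5 9
```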

You can also have character variables:

> sex <- c("male", "female", "female", "male", "male", "female", "female")
> sex
[1] "male" "female" "female" "male" "male" "female" "female"
> smoking <- c("yes", "no", "no", "yes", "no", "yes", "no", "no", "yes", "no")
> smoking
[1] "yes" "no" "no" "yes" "no" "yes" "no" "no" "yes" "no"

You can declare these variables to be factor variables for regression analyses:

> [Link]<- [Link](sex)
> [Link]
[1] male female female male male female female
Levels: female male
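Once a variable is a factor, you can inspect and count its categories (levels). By default the levels are sorted alphabetically, so `female` becomes the reference category in a regression; `relevel()` changes that. A short sketch follows; the variable names `sex_factor` and `sex_male_ref` are hypothetical:

```r
sex <- c("male", "female", "female", "male", "male", "female", "female")
sex_factor <- factor(sex)

levels(sex_factor)   # "female" "male": alphabetical order
table(sex_factor)    # counts per level: 4 female, 3 male

# Make "male" the reference (first) level, e.g. for regression contrasts
sex_male_ref <- relevel(sex_factor, ref = "male")
levels(sex_male_ref) # "male" "female"
```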

It is trivially easy to generate random numbers in R, which is useful for all kinds of statistical analyses and simulations:

> x<-rnorm(10, mean=3, sd=2)
> x
[1] 2.135011 7.573118 3.840814 5.544555 3.393277 2.731562
[7] 1.780588 4.576343 3.250077 3.908082
> y<-rbinom(10, size=1, prob=0.4)
> y
[1] 0 1 1 1 0 0 1 0 1 0
> z<- 5 + 2*x + 3*y +rnorm(10, mean=0, sd=0.5)
> z
[1] 9.44732 23.30763 15.34688 19.62479 11.30383 10.89258
[7] 11.22488 13.33018 15.11767 12.81894
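Because these numbers are random, running the commands again gives values different from those printed above. If you need exactly reproducible results (for example, in a report or an assignment), first fix the seed of the random number generator with `set.seed()`. The seed value 2024 below is arbitrary:

```r
set.seed(2024)                     # any integer works as a seed
a <- rnorm(5, mean = 3, sd = 2)

set.seed(2024)                     # resetting the same seed...
b <- rnorm(5, mean = 3, sd = 2)

identical(a, b)                    # ...reproduces the same draws: TRUE

# The same idea applies to rbinom(), runif(), sample(), etc.
set.seed(2024)
y <- rbinom(10, size = 1, prob = 0.4)
```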

Data Frames:

Most data sets are entered into R as data frames. To create a data frame, you
can enter:

> [Link]<-[Link](x,y,z)
> [Link]
x y z
1 2.135011 0 9.44732
2 7.573118 1 23.30763
3 3.840814 1 15.34688
4 5.544555 1 19.62479
5 3.393277 0 11.30383
6 2.731562 0 10.89258
7 1.780588 1 11.22488
8 4.576343 0 13.33018
9 3.250077 1 15.11767
10 3.908082 0 12.81894

To get help on any R function, simply type

> help(sum)

and a help window will pop up, in this case for the function “sum”. R also comes with extensive manuals, HTML help files, a good introductory guide, etc. Extremely useful suggestion for this course: look at the help files for [Link] to see how to enter data from a text file on your computer.
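As a sketch of what those help files describe, the base function `read.table()` (and its comma-separated variant `read.csv()`) reads a plain-text data file into a data frame. The example writes its file to a temporary location purely so that it is self-contained; the file contents and column names are made up. With your own data, replace `path` with the file's location, e.g. "c:/temp/mydata.txt" (a hypothetical path; forward slashes also work on Windows).

```r
# Write a small whitespace-separated file with a header row (illustrative data)
path <- tempfile(fileext = ".txt")
writeLines(c("age outcome group",
             "34  12.1    1",
             "51  15.3    0",
             "47  14.8    1"), path)

# header = TRUE takes variable names from the first line
my.data <- read.table(path, header = TRUE)

my.data            # a data frame with 3 rows and 3 columns
str(my.data)       # shows each column's type
mean(my.data$age)  # columns are accessed with $
```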

5 Basic Statistical Routines

You can use R to carry out any of the standard statistical routines you will typically need. R includes everything you will learn in 607/621, and many other routines, with more added regularly. R has many more modern (frequentist) routines available than other packages such as SAS or Stata, and whatever is not in the base package is likely to be in the contributed section. Anything not available from either of these sources can easily be added on your own, as we will soon see.

T-Tests

> group1 <- x[y==1]
> group0 <- x[y==0]
> group1
[1] 7.573118 3.840814 5.544555 1.780588 3.250077
> group0
[1] 2.135011 3.393277 2.731562 4.576343 3.908082
> [Link](group1, group0)

Welch Two Sample t-test

data: group1 and group0
t = 0.9667, df = 5.431, p-value = 0.3748
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -1.675244  3.773195
sample estimates:
mean of x mean of y
 4.397831  3.348855
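By default `t.test()` performs the Welch test, which does not assume equal variances in the two groups. Setting `var.equal = TRUE` gives the classical pooled-variance test. When the data sit in one data frame, a formula interface is also available. The sketch below re-enters the two groups from the rounded values printed above, so its Welch result should agree with the output shown to about three decimal places:

```r
group1 <- c(7.573118, 3.840814, 5.544555, 1.780588, 3.250077)
group0 <- c(2.135011, 3.393277, 2.731562, 4.576343, 3.908082)

# Welch test (default): t is about 0.967 on about 5.43 df, as above
welch <- t.test(group1, group0)

# Classical two-sample t-test assuming equal variances (df = 5 + 5 - 2 = 8)
pooled <- t.test(group1, group0, var.equal = TRUE)

# Formula interface: outcome ~ grouping variable, data in one data frame.
# Groups are ordered by level (0 then 1), so the sign of t is reversed here.
dat <- data.frame(x = c(group1, group0), y = rep(c(1, 0), each = 5))
by_formula <- t.test(x ~ y, data = dat)

welch$statistic; welch$parameter; welch$conf.int
pooled$parameter
```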

Chi-square tests

Suppose you have a two-by-two table of data, such as:

                 Patient Independent   Patient Dependent
Stroke Unit              67                   34
Medical Unit             46                   45

You can simply enter:

> [Link](c(67,46), c(67+34, 46+45) )

2-sample test for equality of proportions with continuity correction

data: c(67, 46) out of c(67 + 34, 46 + 45)
X-squared = 4.2965, df = 1, p-value = 0.03819
alternative hypothesis: [Link]
95 percent confidence interval:
 0.009420794 0.306322868
sample estimates:
   prop 1    prop 2
0.6633663 0.5054945
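You can also analyse the same two-by-two table with `chisq.test()`, which takes the counts as a matrix. For a two-by-two table with the default continuity correction, its Pearson chi-square statistic equals the X-squared printed by the test of proportions above (4.2965). Only the presentation differs: `chisq.test()` does not report the difference in proportions or its confidence interval. A minimal sketch:

```r
# Rows: units; columns: patient status (counts from the table above)
stroke <- matrix(c(67, 46, 34, 45), nrow = 2,
                 dimnames = list(Unit = c("Stroke Unit", "Medical Unit"),
                                 Status = c("Independent", "Dependent")))

chisq_res <- chisq.test(stroke)           # Yates continuity correction by default
prop_res  <- prop.test(c(67, 46), c(67 + 34, 46 + 45))

chisq_res$statistic                       # X-squared, about 4.2965
prop_res$statistic                        # identical statistic
chisq_res$expected                        # expected counts under independence
```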

More generally:

> smokers <- rbinom(100, 1, 0.2)
> [Link]<-rbinom(200, 1, 0.12)
> [Link](c(sum(smokers), sum([Link])), c(100,200), correct=F)

2-sample test for equality of proportions without continuity correction

data: c(sum(smokers), sum([Link])) out of c(100, 200)
X-squared = 2.1319, df = 1, p-value = 0.1443
alternative hypothesis: [Link]
95 percent confidence interval:
 -0.02659294  0.15659294
sample estimates:
prop 1 prop 2
 0.200  0.135

Linear Models

Recall our data frame:

> [Link]
x y z
1 2.135011 0 9.44732
2 7.573118 1 23.30763
3 3.840814 1 15.34688
4 5.544555 1 19.62479
5 3.393277 0 11.30383
6 2.731562 0 10.89258
7 1.780588 1 11.22488
8 4.576343 0 13.33018
9 3.250077 1 15.11767
10 3.908082 0 12.81894

Suppose we want to fit a linear regression model of z on x and y.

> [Link]<-lm(z ~ x + y)
> summary([Link])

Call:
lm(formula = z ~ x + y)

Residuals:
Min 1Q Median 3Q Max
-0.6821 -0.4339 0.0893 0.3849 0.5679

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.8644 0.4245 11.460 8.66e-06 ***
x 1.9989 0.1064 18.792 3.00e-07 ***
y 3.2690 0.3450 9.475 3.05e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.5162 on 7 degrees of freedom

Multiple R-Squared: 0.9889, Adjusted R-squared: 0.9857
F-statistic: 311.6 on 2 and 7 DF, p-value: 1.444e-07

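Recall that z was generated by

> z<- 5 + 2*x + 3*y +rnorm(10, mean=0, sd=0.5)

so our estimates are close to the true values (intercept 5, slopes 2 and 3), and the residual standard error (0.5162) is close to the true standard deviation of 0.5.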
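Beyond `summary()`, several extractor functions pull out individual pieces of a fitted model. The sketch below re-enters the data frame printed above, with values rounded as shown, so the results agree with the summary output to about three decimals. It then uses `coef()`, `confint()`, `sigma()` and `predict()`; `sigma()` requires R 3.3.0 or later. The new x and y values in the prediction step are arbitrary.

```r
dat <- data.frame(
  x = c(2.135011, 7.573118, 3.840814, 5.544555, 3.393277,
        2.731562, 1.780588, 4.576343, 3.250077, 3.908082),
  y = c(0, 1, 1, 1, 0, 0, 1, 0, 1, 0),
  z = c(9.44732, 23.30763, 15.34688, 19.62479, 11.30383,
        10.89258, 11.22488, 13.33018, 15.11767, 12.81894)
)

fit <- lm(z ~ x + y, data = dat)

coef(fit)                  # about 4.864, 1.999 and 3.269, as in the summary
confint(fit, level = 0.95) # 95% confidence intervals for the coefficients
sigma(fit)                 # residual standard error, about 0.516

# Predicted z for new values of x and y (illustrative values)
predict(fit, newdata = data.frame(x = c(3, 5), y = c(0, 1)))
```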

Other types of regression equations are available through generalized linear models:

For logistic regression, type (assuming z is dichotomous)

> glm(z ~ x + y, family=binomial(link="logit"))

For Poisson regression, type (assuming z is a count variable)

> glm(z ~ x + y, family=poisson(link = "log"))

and so on.
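To see the logistic call above in action, the sketch below simulates a binary outcome from a known logistic model and fits it with `glm()`. The sample size, seed and true coefficients (-1, 0.8, 0.5) are arbitrary choices for illustration. With 500 observations the estimates should land near these values, and exponentiating the coefficients gives odds ratios:

```r
set.seed(1)
n <- 500
x <- rnorm(n)                          # continuous predictor
y <- rbinom(n, size = 1, prob = 0.5)   # binary predictor

# True model on the log-odds scale: -1 + 0.8 x + 0.5 y
p <- plogis(-1 + 0.8 * x + 0.5 * y)    # inverse logit gives probabilities
z <- rbinom(n, size = 1, prob = p)     # dichotomous outcome

logit.fit <- glm(z ~ x + y, family = binomial(link = "logit"))

summary(logit.fit)                     # estimates, standard errors, z tests
exp(coef(logit.fit))                   # odds ratios
exp(confint.default(logit.fit))        # Wald 95% intervals for the odds ratios
```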

6 Graphics

R is capable of producing very high-quality graphics in a variety of formats (wmf, ps, jpg, pdf, png, bmp).
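To save a plot in one of these formats rather than only viewing it on screen, open a file graphics device, draw the plot, and close the device with `dev.off()`. The sketch uses `pdf()`; `png()`, `jpeg()`, `bmp()` and `postscript()` follow the same pattern, as does `win.metafile()` on Windows. The file name is a temporary one so that the example runs anywhere; replace it with a path of your choice.

```r
out_file <- tempfile(fileext = ".pdf")   # e.g. "scatter.pdf" in your own work

x <- rnorm(50)                           # illustrative data
z <- 5 + 2 * x + rnorm(50, sd = 0.5)

pdf(out_file, width = 6, height = 6)     # open a PDF device (size in inches)
plot(x, z, main = "Scatter plot", xlab = "x", ylab = "z")
dev.off()                                # close the device and write the file

file.exists(out_file)                    # TRUE once the file has been written
```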

For a scatter plot, simply type:

> plot(x,z)

Figure 1: A simple scatter plot

Adding titles and labels for each axis is easy:

> plot(x,z, main="Scatter plot of age versus outcome", xlab="age", ylab="outcome")

Figure 2: A simple scatter plot with labels and title

Adding the best-fitting regression line to the plot as well:

> plot(x,z, main="Scatter plot of age versus outcome", xlab="age", ylab="outcome")
> lsfit(x,z)$coef
Intercept         X
 5.236471  2.324865
> [Link] <-seq(2,7,by=0.01)
> [Link] <- 5.236471 + 2.324865*[Link]
> points([Link], [Link], type="l")

Figure 3: Scatter plot with best-fitting line added

You can plot more than one curve on a single plot, and label them via a legend:

> range <- seq(-10, 10, by = 0.001)
> norm1 <- dnorm(range, mean=0, sd=1)
> norm2 <- dnorm(range, mean=1, sd=2)
> plot(range, norm1, type="l", lty=1, main="Two Normal Distributions",
xlab="Range", ylab="Probability Density")
> points(range, norm2, type="l", lty=2)
> legend(x=-10, y=0.4, legend= c("N(0,1)", "N(1,2)"), lty=c(1,2))

Figure 4: Distributions with legend
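The regression line in Figure 3 was drawn by computing the intercept and slope with `lsfit()` and then plotting the line with `points()`. That works, but base R offers a shorter route: `abline()` accepts a fitted `lm()` object directly and draws the line across the whole plotting region. The sketch below uses simulated data (assumed for illustration) and writes to a temporary PDF so that it runs without a screen. It also confirms that `lm()` and `lsfit()` give the same coefficients.

```r
set.seed(42)
x <- runif(30, 2, 7)                        # illustrative predictor
z <- 5 + 2 * x + rnorm(30, sd = 0.5)       # illustrative outcome

fit <- lm(z ~ x)
same_coefs <- all.equal(unname(coef(fit)), unname(lsfit(x, z)$coef))
same_coefs                                  # TRUE: both are least squares

pdf(tempfile(fileext = ".pdf"))             # on screen, skip pdf()/dev.off()
plot(x, z, main = "Scatter plot with fitted line", xlab = "x", ylab = "z")
abline(fit, lty = 2)                        # add the fitted regression line
legend("topleft", legend = "Least-squares fit", lty = 2)
dev.off()
```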

You can also do histograms . . .

> hist(rnorm(1000,mean=0, sd=1), main = "Sample From N(0,1) Distribution", xlab = "Range", ylab="Frequency")

Figure 5: Histogram

. . . and boxplots.

> boxplot(rnorm(1000, mean=0, sd=1), rnorm(1000, mean=1, sd=2),
names=c("N(0,1)", "N(1,2)"))

Figure 6: Boxplot

7 Writing your Own Functions

It is very easy to write your own functions in R. Once written, they are stored in your workspace and can be used just like any other function; if you save the workspace when you quit, they remain available in later sessions. Here I present two examples, one easy, one (slightly) more complex.

First the easy example, creating a function to add two numbers. Of course, this is a silly
example, since R already does this easily, but its simplicity will allow us to see how functions
are constructed in general.
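A minimal sketch of such a function follows; the name `add.two` and its argument names `a` and `b` are our own hypothetical choices. The keyword `function` lists the arguments, the body goes between braces, and the function returns the value of the last expression it evaluates:

```r
# Add two numbers; the last evaluated expression is the return value
add.two <- function(a, b) {
  a + b
}

add.two(2, 3)        # 5
add.two(1:3, 10)     # also works element-wise on vectors: 11 12 13
```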

That is all one needs to do to make a function!

Now for something more complex, we will create a function to carry out Bayesian inferences
for a single normal mean, when the variance is assumed known.

The theory is the following (to remind you of what we covered last class):

Suppose that your data set, $X = (x_1, x_2, \ldots, x_n)$, with sample size $n$ is known to arise from a Normal distribution, $x_i \sim N(\mu, \sigma^2)$, $i = 1, 2, \ldots, n$, i.e., the likelihood function for the data is normal. Suppose also that the variance of the data, $\sigma^2$, is a known constant, and so the only unknown parameter to be estimated is the mean. Suppose that the prior distribution for the unknown mean $\mu$ is $N(\theta, \tau^2)$.

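If we combine the information in the prior distribution with the information in the data set via Bayes’ Theorem, we derive the posterior distribution, which summarizes current information about the mean $\mu$. It happens that the posterior is also normal, and the exact formula is:

$$
\mu_{\text{posterior}} \sim N\left( \frac{\dfrac{\theta}{\tau^2} + \dfrac{n\bar{x}}{\sigma^2}}{\dfrac{1}{\tau^2} + \dfrac{n}{\sigma^2}},\ \left[ \frac{1}{\tau^2} + \frac{n}{\sigma^2} \right]^{-1} \right),
$$

where $\bar{x}$ is the mean of the observed data, and $[\cdot]^{-1}$ indicates inversion (here simply the reciprocal, since the bracketed quantity is a scalar).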

A function that implements this formula in R is:

> [Link] <- function(x, [Link], [Link], [Link])

{
####################################################################
# R function for Bayesian analysis of normal mean, variance known #
# Parameters included are: #
# #
# Inputs: #
# #
# x = vector of data #
# [Link] = prior mean #
# [Link] = prior variance #
# [Link] = assumed known variance of data #
# #
# Outputs: #
# #
# [Link] = posterior mean #
# [Link] = posterior variance #
# #
####################################################################

n<- length(x)
[Link] <- mean(x)
[Link] <- [Link]/[Link] + n*[Link]/[Link]
[Link] <- 1/[Link] + n/[Link]
[Link] <- [Link]/[Link]
[Link] <- (1/(1/[Link] + n/[Link]))

a <- "Post mean = "

b <- "Post Var = "

cat(a, [Link], ",", b, [Link], "\n")

}
> [Link](1:10, 0, 1000, 1)
Post mean = 5.49945 , Post Var = 0.09999
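As a check on this output, substitute the call's inputs into the posterior formula. The data 1, 2, ..., 10 give $n = 10$ and $\bar{x} = 5.5$; the prior mean is $\theta = 0$, the prior variance $\tau^2 = 1000$, and the known variance $\sigma^2 = 1$ (reading the arguments in the order listed in the function's comment block):

```latex
\begin{aligned}
\frac{1}{\tau^2} + \frac{n}{\sigma^2} &= \frac{1}{1000} + \frac{10}{1} = 10.001,\\
\text{posterior mean} &= \frac{0/1000 + 10 \times 5.5/1}{10.001} = \frac{55}{10.001} \approx 5.49945,\\
\text{posterior variance} &= \frac{1}{10.001} \approx 0.09999.
\end{aligned}
```

With such a vague prior ($\tau^2 = 1000$), the posterior mean is essentially the sample mean, and the posterior variance is essentially $\sigma^2/n = 0.1$.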

Another useful way to create output in functions is illustrated below. This format is especially
useful if you want to use the program’s results later on in your R session.

> [Link] <- function(x, [Link], [Link], [Link])
{
####################################################################
# R function for Bayesian analysis of normal mean, variance known #
# Parameters included are: #
# #
# Inputs: #
# #
# x = vector of data #
# [Link] = prior mean #
# [Link] = prior variance #
# [Link] = assumed known variance of data #
# #
# Outputs: #
# #
# [Link] = posterior mean #
# [Link] = posterior variance #
# #
####################################################################

n<- length(x)
[Link] <- mean(x)
[Link] <- [Link]/[Link] + n*[Link]/[Link]
[Link] <- 1/[Link] + n/[Link]
[Link] <- [Link]/[Link]
[Link] <- (1/(1/[Link] + n/[Link]))

[Link] <- list([Link]= [Link], [Link] = [Link])

return([Link])
}
> [Link](1:10, 0, 1000, 1)
$[Link]
[1] 5.49945

$[Link]
[1] 0.09999
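Because the second version returns a list, you can store its components and reuse them with `$`. The self-contained sketch below re-creates the same calculation under the hypothetical names `normal.mean.post`, `post.mean` and `post.var`. It then uses the result to form a 95% posterior credible interval with `qnorm()`:

```r
# Posterior for a normal mean with known data variance (conjugate normal prior)
normal.mean.post <- function(x, prior.mean, prior.var, data.var) {
  n <- length(x)
  precision <- 1 / prior.var + n / data.var          # posterior precision
  post.mean <- (prior.mean / prior.var + n * mean(x) / data.var) / precision
  post.var  <- 1 / precision
  list(post.mean = post.mean, post.var = post.var)   # returned as a list
}

res <- normal.mean.post(1:10, prior.mean = 0, prior.var = 1000, data.var = 1)

res$post.mean    # about 5.49945
res$post.var     # about 0.09999

# 95% credible interval from the normal posterior
ci <- res$post.mean + qnorm(c(0.025, 0.975)) * sqrt(res$post.var)
ci
```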

A useful function relating to functions is args, which reminds you of the required arguments
of the function. For example:

> args([Link])
function (x, [Link], [Link], [Link])

8 Conclusion

R is a very powerful package, and we have only scratched the surface of what is available. Its interactive nature makes it more pleasant to use than “batch” programs like SAS, and its development environment makes it the first choice of many, if not the majority, of statisticians today.

Some other resources include:

• A useful book titled “Analysis of Epidemiological Data Using R and Epicalc,” available at [Link]

• A series of useful tips is available at [Link]

I would be happy to answer any questions you have about R; just email me.

Common questions

R offers a comprehensive suite of statistical tests, such as t-tests, chi-square tests, and regression models (including linear, logistic, and Poisson regression), which are integral to data analysis and hypothesis testing in research. Compared to other tools like SAS and Stata, R presents several advantages. It is known for being at the forefront of methodological development, offering more modern frequentist statistical techniques. Its community support and open-source model allow rapid incorporation of the latest statistical methods, often surpassing proprietary software in variety and innovation. Additionally, users can create custom functions to implement new methods not yet available in R or other software, which enhances its adaptability and its appeal to statisticians who prioritize customization and cutting-edge techniques.

R ensures high-quality graphics by providing carefully chosen defaults for minor design choices, while also giving users full control over the graphical parameters. This lets users produce well-designed, publication-quality plots, which can include mathematical symbols and formulae where needed. R supports formats such as wmf, ps, jpg, pdf, png, and bmp, making it versatile across publication standards and media. The combination of thoughtful defaults and flexible user control helps maintain both consistency and high quality in visual output.

Performing a Bayesian analysis of a normal mean in R, assuming the variance is known, involves specifying a prior distribution for the mean and the assumed data variance. The process uses Bayes' Theorem to combine prior information with observed data. The R function for this analysis calculates the posterior mean and variance from four inputs: the observed data vector, the prior mean, the prior variance, and the known data variance. The posterior distribution reflects updated beliefs about the mean, combining prior knowledge with the new data.

To install R on a Windows system, follow these steps:
1) Visit the R web page at http://www.r-project.org/.
2) Click on “CRAN” under the “Download” section.
3) Choose a mirror from which to download; mirrors closer to you, such as those in your own country, are often faster.
4) Select 'Windows' under 'Precompiled Binary Distributions'.
5) Click on the appropriate R installer file (e.g., 'R-2.3.1-win32.exe').
6) Save the file to a specific location on the computer.
7) Run the downloaded executable file and follow the on-screen instructions, generally leaving the options unchanged for a typical installation.
Once R is installed, start it by clicking the icon on the desktop.

Writing custom functions in R offers particular advantages for advanced users who need specialized calculations or repeated processes not covered by existing packages. Custom functions let researchers wrap complex workflows in reusable code, ensuring consistency and efficiency across similar tasks. Compared with typing individual commands by hand, this saves time, reduces the potential for error, and improves reproducibility and transparency. For example, functions can be tailored to implement new statistical methods or custom algorithms, extending R beyond its out-of-the-box functionality.

Confidence intervals in R give a range of values constructed to contain the population parameter with a stated level of confidence, usually 95%. They improve the interpretation of statistical test results by offering more information than a p-value alone. A p-value indicates whether there is evidence against a null hypothesis, whereas a confidence interval quantifies the precision of the estimate. Together they put the significance of a test in context and allow researchers to make informed inferences about the parameter in question.

R allows users to extend its basic capabilities by installing additional packages, which are collections of functions and datasets that add functionality not included in the base R distribution. Users can install them from the Comprehensive R Archive Network (CRAN) via the 'Packages' menu in R. This extensibility has several advantages:
- Users can tailor R to their specific needs by adding niche statistical techniques and tools.
- It fosters community-driven development, so users benefit from the contributions of statisticians and programmers worldwide.
- It brings regular updates and new methods, keeping R a dynamic, current tool for statistical analysis.

R's graphics capabilities let users produce high-quality visual representations of data, which are essential for understanding complex datasets and communicating findings effectively. R supports a wide range of graphical formats and plot types, such as scatter plots, histograms, and boxplots. Plots can be customized with labels, titles, and legends to suit specific audiences and improve clarity. R's graphics also support exploratory data analysis by enabling quick examination of data distributions and relationships, which guides subsequent statistical modeling.

T-tests in R are used to determine whether there is a statistically significant difference between the means of two groups. By default R uses the Welch two-sample t-test, which accommodates groups with unequal variances. Linear regression models, on the other hand, evaluate the relationship between a dependent variable and one or more independent variables, allowing predictions and assessments of each variable's influence. T-tests address differences in means, while linear regression describes how variables are related, including the effect size and direction of association. Both methods provide confidence intervals (directly in the t-test output, and via confint() for regression coefficients), which help in interpreting the precision of the estimates.

R’s help files and online documentation provide comprehensive guidance on the functionality of commands and packages. For new users, they are an essential resource for learning how to perform specific tasks, troubleshoot problems, understand function arguments, and study worked examples of command usage. This support shortens the learning curve for R's extensive capabilities and encourages independent problem-solving. The documentation also helps users keep pace with R's evolving features.

Introduction to R Software Basics
No ratings yet
Introduction to R Software Basics
11 pages
R Using R Statistics Stowell2014
No ratings yet
R Using R Statistics Stowell2014
232 pages
Introduction to R for Biostatistics
No ratings yet
Introduction to R for Biostatistics
20 pages
Owen TheRGuide
No ratings yet
Owen TheRGuide
61 pages
R Programming for Climate Research
No ratings yet
R Programming for Climate Research
31 pages
Introduction to R Software Basics
No ratings yet
Introduction to R Software Basics
51 pages
R Course: Statistical Graphics Basics
No ratings yet
R Course: Statistical Graphics Basics
169 pages
R Data Handling and Analysis Guide
No ratings yet
R Data Handling and Analysis Guide
104 pages
EDA Week1
No ratings yet
EDA Week1
24 pages
Introduction to R Programming Basics
No ratings yet
Introduction to R Programming Basics
233 pages
R Statistics: Installation and Basics
100% (1)
R Statistics: Installation and Basics
15 pages
R Programming Basics and Installation Guide
No ratings yet
R Programming Basics and Installation Guide
111 pages
R Data Analysis Guide for Beginners
No ratings yet
R Data Analysis Guide for Beginners
91 pages
R Guide
No ratings yet
R Guide
152 pages
Introduction to R for Statistics
No ratings yet
Introduction to R for Statistics
47 pages
Quick Start to Statistical Analysis in R
100% (1)
Quick Start to Statistical Analysis in R
47 pages
Simulating Binomial Distribution in R
No ratings yet
Simulating Binomial Distribution in R
30 pages
R Regression Modeling Overview
No ratings yet
R Regression Modeling Overview
162 pages
Intro To R Lesson 1
No ratings yet
Intro To R Lesson 1
9 pages
R With RStudio For Introductory Statistics
100% (1)
R With RStudio For Introductory Statistics
163 pages
R Programming A Step-by-Step Guide For Absolute Beginners by Daniel Bell
100% (2)
R Programming A Step-by-Step Guide For Absolute Beginners by Daniel Bell
145 pages
Introduction to R Programming Basics
No ratings yet
Introduction to R Programming Basics
35 pages
Introduction to R Programming Guide
No ratings yet
Introduction to R Programming Guide
35 pages
Introduction to R Programming
100% (1)
Introduction to R Programming
189 pages
Introduction to R Software Basics
No ratings yet
Introduction to R Software Basics
22 pages
Topic1B-Introduction To Computers
No ratings yet
Topic1B-Introduction To Computers
18 pages
R Language Notes
No ratings yet
R Language Notes
68 pages
Undergrad Guide Tor
No ratings yet
Undergrad Guide Tor
68 pages
Learn R Programming Basics PDF
No ratings yet
Learn R Programming Basics PDF
53 pages
Data Analysis Using R
100% (1)
Data Analysis Using R
78 pages
Call Type Analysis for Sales Reps
No ratings yet
Call Type Analysis for Sales Reps
25 pages
What Is Statistical Programming?: Computations Which Aid in Statistical Analysis To
No ratings yet
What Is Statistical Programming?: Computations Which Aid in Statistical Analysis To
47 pages
R Programming
100% (4)
R Programming
163 pages
R Programming PDF
No ratings yet
R Programming PDF
163 pages
R Programming FULL
No ratings yet
R Programming FULL
140 pages
R Programming
No ratings yet
R Programming
10 pages
R Programming: Installation & Basics
No ratings yet
R Programming: Installation & Basics
58 pages
Beginner's Guide to R Programming
No ratings yet
Beginner's Guide to R Programming
30 pages
Basic R Programming Syntax Guide
No ratings yet
Basic R Programming Syntax Guide
22 pages
R Tutorial Session 1-2
100% (1)
R Tutorial Session 1-2
8 pages
R Programming Basics for Data Management
No ratings yet
R Programming Basics for Data Management
25 pages
STAT319 Lab Manual Overview
No ratings yet
STAT319 Lab Manual Overview
127 pages
Statistical Data Analysis - R Tutorial - DR A. J. Bevan
No ratings yet
Statistical Data Analysis - R Tutorial - DR A. J. Bevan
6 pages
R Programming for Data Analytics Basics
100% (2)
R Programming for Data Analytics Basics
156 pages
Introduction to R for Data Analysis
No ratings yet
Introduction to R for Data Analysis
22 pages
RStudio Pane Overview
No ratings yet
RStudio Pane Overview
31 pages
An Introduction To R
No ratings yet
An Introduction To R
141 pages
Introduction to Regression with R
No ratings yet
Introduction to Regression with R
36 pages
R for Machine Learning Guide
No ratings yet
R for Machine Learning Guide
9 pages
R Manual PDF
No ratings yet
R Manual PDF
78 pages
R Programming Basics for Beginners
No ratings yet
R Programming Basics for Beginners
431 pages
Mastering R for Health Research Analysis
No ratings yet
Mastering R for Health Research Analysis
2 pages
R Basics - What Is R
No ratings yet
R Basics - What Is R
4 pages

Quoted directly from the R WWW page ([Link]

“R is a language and environment for statistical computing and graphics. It is

Installation follows a few trivial steps:

1. Go to the R web page at [Link]

R can be used in many ways:

Entering and Manipulating Data in R:

> z <- matrix(c(1,2,3,4,5,6,7,8,9), nrow=3, byrow=TRUE)
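To see what `byrow` changes, here is a small sketch that fills the same nine values row by row and then column by column (R's default), and picks out single elements, rows and columns with `[row, column]` indexing. The names `z.byrow` and `z.bycol` are just illustrative.

```r
# Fill the 3 x 3 matrix row by row, then column by column (the default)
z.byrow <- matrix(1:9, nrow = 3, byrow = TRUE)
z.bycol <- matrix(1:9, nrow = 3)

print(z.byrow)

# Indexing is [row, column]; leaving one index blank selects a whole row or column
z.byrow[2, 3]   # 6
z.byrow[1, ]    # 1 2 3
z.bycol[1, ]    # 1 4 7
dim(z.byrow)    # 3 3

# Transposing the column-filled matrix gives the row-filled one
identical(t(z.bycol), z.byrow)   # TRUE
```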

You can also have character variables:

> [Link]<- [Link](sex)
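A minimal sketch of what a factor looks like, assuming a short made-up character vector `sex` (the values and the name `sex.f` are hypothetical). A factor stores the distinct values as levels, which is what tables and model formulas work with.

```r
# Hypothetical character variable
sex <- c("M", "F", "F", "M", "F")

# Convert to a factor: the distinct values become the levels
sex.f <- factor(sex)
levels(sex.f)    # "F" "M" (sorted alphabetically by default)
nlevels(sex.f)   # 2

# Frequency table of the levels
table(sex.f)

# Change the reference level, e.g. for regression contrasts
sex.f2 <- relevel(sex.f, ref = "M")
levels(sex.f2)   # "M" "F"
```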

It is trivially easy to generate random numbers in R, useful for all kinds of

> x<-rnorm(10, mean=3, sd=2)
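Because the draws change on every run, it is often useful to fix the seed first. The sketch below (the seed value is arbitrary) shows that the same seed reproduces the same `rnorm()` draws, and lists a few of the other `r`-prefixed generators.

```r
# The same seed gives the same "random" numbers
set.seed(123)
x1 <- rnorm(10, mean = 3, sd = 2)
set.seed(123)
x2 <- rnorm(10, mean = 3, sd = 2)
identical(x1, x2)   # TRUE

# Other generators follow the same r<distribution> naming pattern
u <- runif(5, min = 0, max = 1)        # uniform
b <- rbinom(5, size = 1, prob = 0.2)   # Bernoulli (0/1)
p <- rpois(5, lambda = 4)              # Poisson counts
s <- sample(1:10, 3)                   # 3 values drawn without replacement

# Sample summaries
mean(x1)
sd(x1)
summary(x1)
```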

To get help on any R function, simply type

> group1 <- x[y==1]

Welch Two Sample t-test

data: group1 and group0
t = 0.9667, df = 5.431, p-value = 0.3748
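The `group1`/`group0` split and the Welch test can be reproduced on made-up data with the sketch below; the simulated `x` and `y` are assumptions for illustration, so the numbers will not match the output above. It also shows the pooled-variance version (`var.equal = TRUE`), the formula interface, and how to extract the statistic, p-value and confidence interval from the result.

```r
# Hypothetical data: a measurement x and a 0/1 group indicator y
set.seed(1)
x <- c(rnorm(6, mean = 10, sd = 2), rnorm(6, mean = 11, sd = 4))
y <- rep(c(1, 0), each = 6)

group1 <- x[y == 1]
group0 <- x[y == 0]

# Welch test (the default, unequal variances)
welch <- t.test(group1, group0)
# Classical pooled-variance test, df = 6 + 6 - 2 = 10
pooled <- t.test(group1, group0, var.equal = TRUE)

# Formula interface: groups are taken in level order (0 before 1),
# so this tests group0 - group1 and the sign of t is flipped
welch.f <- t.test(x ~ y)

# Components of the result can be extracted by name
welch$statistic
welch$p.value
welch$conf.int   # 95% CI for the difference in means
```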

Suppose you have a two-by-two table of data, such as:

Patient Independent Patient Dependent

You can simply enter:

> [Link](c(67,46), c(67+34, 46+45) )

2-sample test for equality of proportions with continuity correction
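The same counts can also be entered as a 2 x 2 matrix. This sketch assumes the 67 and 46 passed to `prop.test()` as successes form the first column and the remaining 34 and 45 the second. Given a two-column matrix, `prop.test()` reads the columns as successes and failures, and `chisq.test()` on the same table reports the same X-squared statistic.

```r
# Rows are the two groups; columns are assumed to be "successes" and "failures"
tab <- matrix(c(67, 34,
                46, 45), nrow = 2, byrow = TRUE)

# Same test as prop.test(c(67, 46), c(67+34, 46+45))
pt <- prop.test(tab)

# Pearson chi-square test on the same table (Yates' correction by default)
ct <- chisq.test(tab)

pt$estimate   # sample proportions 67/101 and 46/91
pt$conf.int   # CI for the difference in proportions
all.equal(unname(pt$statistic), unname(ct$statistic))   # TRUE
```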

> smokers <- rbinom(100, 1, 0.2)

2-sample test for equality of proportions without continuity correction

data: c(sum(smokers), sum([Link])) out of c(100, 200)
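A self-contained version of this comparison, assuming a second group called `nonsmokers` of size 200 with an illustrative rate of 0.3 (the name, the rate and the seed are all hypothetical). With `correct = FALSE` the reported X-squared equals the square of the pooled two-proportion z statistic, which the last lines check.

```r
# Hypothetical 0/1 outcomes in two groups of sizes 100 and 200
set.seed(42)
smokers    <- rbinom(100, 1, 0.2)
nonsmokers <- rbinom(200, 1, 0.3)   # illustrative name and rate

counts <- c(sum(smokers), sum(nonsmokers))
sizes  <- c(length(smokers), length(nonsmokers))

# correct = FALSE drops Yates' continuity correction
res <- prop.test(counts, sizes, correct = FALSE)
res$estimate
res$conf.int

# Pooled two-proportion z statistic; its square equals X-squared
p.pool <- sum(counts) / sum(sizes)
p.hat  <- counts / sizes
zstat  <- (p.hat[1] - p.hat[2]) / sqrt(p.pool * (1 - p.pool) * sum(1 / sizes))
all.equal(unname(res$statistic), zstat^2)   # TRUE
```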

Recall our data frame:

Residual standard error: 0.5162 on 7 degrees of freedom

Recall that z was generated by

> z<- 5 + 2*x + 3*y +rnorm(10, mean=0, sd=0.5)

so our estimates are close to the true values used to generate the data (including the residual standard error, 0.5162, versus the true sd of 0.5).
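The fit can be reproduced with the sketch below. It assumes `x` and `y` are standard normal draws (the text does not say how they were generated) and uses an arbitrary seed, so the numbers will differ from the output above, but the structure is the same: three coefficients, 7 residual degrees of freedom, and a residual standard error that estimates the true sd of 0.5.

```r
# Simulate data as in the text; the distribution of x and y is an assumption
set.seed(2024)
x <- rnorm(10)
y <- rnorm(10)
z <- 5 + 2*x + 3*y + rnorm(10, mean = 0, sd = 0.5)

fit <- lm(z ~ x + y)
coef(fit)            # estimates of 5, 2 and 3
confint(fit)         # 95% confidence intervals for each coefficient
summary(fit)$sigma   # residual standard error, an estimate of 0.5
df.residual(fit)     # 10 observations - 3 coefficients = 7
```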

For logistic regression, type (assuming z is dichotomous)

> glm(z ~ x + y, family=binomial(link="logit"))
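A sketch on simulated data, assuming a 0/1 outcome generated from a logistic model with made-up coefficients (-0.5, 1 and 0.5) and 200 observations. Exponentiating the coefficients turns log-odds into odds ratios, and the fitted values are probabilities.

```r
# Hypothetical dichotomous outcome from a logistic model
set.seed(7)
n <- 200
x <- rnorm(n)
y <- rnorm(n)
z <- rbinom(n, 1, plogis(-0.5 + 1 * x + 0.5 * y))

fit <- glm(z ~ x + y, family = binomial(link = "logit"))
summary(fit)$coefficients   # estimates on the log-odds scale
exp(coef(fit))              # odds ratios
head(fitted(fit))           # fitted probabilities
```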

For Poisson regression, type (assuming z is a count variable)

> glm(z ~ x + y, family=poisson(link = "log"))
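Similarly for counts, assuming a made-up log-linear mean `exp(0.5 + 0.3*x - 0.2*y)`. Exponentiated coefficients are rate ratios, and a residual deviance much larger than its degrees of freedom is a rough sign of overdispersion.

```r
# Hypothetical count outcome with a log-linear mean
set.seed(11)
n <- 200
x <- rnorm(n)
y <- rnorm(n)
z <- rpois(n, lambda = exp(0.5 + 0.3 * x - 0.2 * y))

fit <- glm(z ~ x + y, family = poisson(link = "log"))
exp(coef(fit))                    # rate ratios per one-unit increase
deviance(fit) / df.residual(fit)  # rough overdispersion check (near 1 if none)
```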

For a scatter plot, simply type:

Figure 1: A simple scatter plot

Adding titles and labels for each axis is easy:

> plot(x,z, main="Scatter plot of age versus outcome", xlab="age", ylab="outcome")

Adding the best fitting regression line to the plot as well:

> plot(x,z, main="Scatter plot of age versus outcome", xlab="age", ylab="outcome")
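One way to add the least-squares line is to pass an `lm()` fit to `abline()`, which draws a straight line from the fit's intercept and slope. The data below are made up, and `pdf(NULL)` (a device that produces no output) is only there so the sketch runs without a screen; leave it out when plotting interactively.

```r
# Hypothetical age and outcome data
set.seed(3)
x <- runif(30, 20, 70)
z <- 10 + 0.5 * x + rnorm(30, sd = 3)

pdf(NULL)   # null device; omit to draw on screen
plot(x, z, main = "Scatter plot of age versus outcome", xlab = "age", ylab = "outcome")
fit <- lm(z ~ x)
abline(fit, col = "red", lwd = 2)   # line from the fitted intercept and slope
invisible(dev.off())
```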

> range <- seq(-10, 10, by = 0.001)

Figure 4: Distributions with legend
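A sketch of how a figure like this can be drawn with `dnorm()`, `lines()` and `legend()`. The means and standard deviations are illustrative choices, not necessarily those used in the figure, and the grid is called `grid` rather than `range` to avoid confusion with the base function `range()`. As before, `pdf(NULL)` only lets the sketch run without a screen.

```r
# Points at which to evaluate the densities
grid <- seq(-10, 10, by = 0.01)

pdf(NULL)   # null device; omit to draw on screen
plot(grid, dnorm(grid, mean = 0, sd = 1), type = "l", lty = 1,
     xlab = "x", ylab = "density", main = "Normal densities")
lines(grid, dnorm(grid, mean = 0, sd = 2), lty = 2)
lines(grid, dnorm(grid, mean = 2, sd = 3), lty = 3)
# Labels give mean and variance, e.g. sd = 2 means variance 4
legend("topright", legend = c("N(0,1)", "N(0,4)", "N(2,9)"), lty = 1:3)
invisible(dev.off())
```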

> hist(rnorm(1000,mean=0, sd=1), main = "Sample From N(0,1)

> boxplot(rnorm(1000, mean=0, sd=1), rnorm(1000, mean=1, sd=2),

> [Link] <- function(a,b)

That is all one needs to do to make a function!
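Two small hypothetical functions illustrating the pattern: the value of the last expression in the body is returned, arguments can have default values, and arguments can be matched by name in any order.

```r
# Returns a named vector; the last evaluated expression is the return value
add.and.multiply <- function(a, b) {
  c(sum = a + b, product = a * b)
}

add.and.multiply(2, 3)   # returns c(sum = 5, product = 6)

# An argument with a default value
scale.by <- function(x, k = 2) x * k
scale.by(5)          # 10, uses the default k = 2
scale.by(k = 3, 5)   # 15, k matched by name, 5 goes to x
```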

A function that implements this formula in R is:

> [Link] <- function(x, [Link], [Link], [Link])

a <- "Post mean = "

cat(a, [Link], ",", b, [Link], "\n")

[Link] <- list([Link]= [Link], [Link] = [Link])
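Assuming the formula in question is the standard conjugate update for a normal mean with known variance (prior mean μ0, prior variance τ0², known data variance σ², and n observations with sample mean x̄), the posterior is normal with variance 1/(1/τ0² + n/σ²) and mean equal to that variance times (μ0/τ0² + n x̄/σ²). The sketch below implements this with hypothetical names (`normal.posterior`, `prior.mean`, `prior.var`, `data.var`); it is not necessarily the author's function.

```r
# Hypothetical function: posterior for a normal mean when the data variance is known
normal.posterior <- function(x, prior.mean, prior.var, data.var) {
  n <- length(x)
  # Posterior precision = prior precision + data precision
  post.var  <- 1 / (1 / prior.var + n / data.var)
  # Posterior mean = precision-weighted combination of prior mean and sample mean
  post.mean <- post.var * (prior.mean / prior.var + n * mean(x) / data.var)
  cat("Post mean =", post.mean, ", Post var =", post.var, "\n")
  invisible(list(post.mean = post.mean, post.var = post.var))
}

post <- normal.posterior(c(1, 2, 3), prior.mean = 0, prior.var = 1, data.var = 1)
# Here n = 3, so post.var = 1/(1 + 3) = 0.25 and post.mean = 0.25 * (3 * 2) = 1.5

# 95% posterior credible interval for the mean
post$post.mean + c(-1, 1) * qnorm(0.975) * sqrt(post$post.var)
```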

Some other resources include:

• A series of useful tips are available at
