ggplot2: Elegant Graphics for Data Analysis
The important parts of the book are:
- grammar of ggplot
- qplot for easy plotting
- linear models in plots
=qplot()= is designed after base R's =plot()= function, so it should feel familiar to =plot()= users.
The three most important parameters to =qplot()= are =x=, =y=, and =data=. When
=data= is specified, it's used as a namespace for variables:
=qplot(carat, price, data = diamonds)=
=qplot(carat, x * y * z, data = diamonds)=
=color= is another argument; mapping it to a variable differentiates the points by that variable.
It's possible to set these arguments directly, rather than mapping them to a variable, by wrapping the value in =I()=, like =colour = I("red")=.
There are various =geom= parameters. ="point"= is the standard
scatterplot; ="smooth"= fits a smoother to the data and shows its
standard error as well; ="boxplot"= produces a box-and-whisker plot;
="path"= and ="line"= draw lines between data points.
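The geoms listed above can also be written with the full =ggplot()= syntax. The following is a minimal sketch, assuming ggplot2 is installed and using its bundled =diamonds= and =economics= datasets; the variable names (=dsmall=, =p_point=, and so on) are illustrative only.

```r
library(ggplot2)

# Work on a small random sample so the scatterplots stay legible.
set.seed(1)
dsmall <- diamonds[sample(nrow(diamonds), 500), ]

# "point": standard scatterplot.
p_point <- ggplot(dsmall, aes(carat, price)) + geom_point()

# "smooth": points plus a fitted smoother with its standard-error band.
p_smooth <- ggplot(dsmall, aes(carat, price)) + geom_point() + geom_smooth()

# "boxplot": box-and-whisker plot of a continuous variable per class.
p_box <- ggplot(dsmall, aes(color, price)) + geom_boxplot()

# "line" joins points ordered by x; "path" joins them in data order.
p_line <- ggplot(economics, aes(date, unemploy)) + geom_line()
p_path <- ggplot(economics, aes(unemploy / pop, uempmed)) + geom_path()
```

Printing any of these objects (for example =print(p_smooth)=) draws the plot.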
A smoother is added via the =geom = c("point", "smooth")= parameter of
=qplot=. A particular smoother can be specified using the =method= parameter.
=qplot(carat, data = diamonds, geom = "histogram")= and
=qplot(carat, data = diamonds, geom = "density")=
show the distribution of a single variable.
The =binwidth= parameter can be specified for histograms.
It's possible to specify subgroup aesthetics using
Bar charts count the number of instances per class. There is no need to specify bin sizes in bar charts because the bins are the classes.
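The single-variable plots above might look like this in =ggplot()= form. This is a sketch under the assumption that ggplot2 and its =diamonds= data are available; the bin width of 0.1 and the use of =fill= to distinguish subgroups are illustrative choices, not taken from the notes.

```r
library(ggplot2)

# Histogram of a continuous variable with an explicit bin width.
p_hist <- ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.1)

# Density estimate of the same variable.
p_dens <- ggplot(diamonds, aes(carat)) + geom_density()

# Bar chart of a discrete variable: one bar per class, no bin width needed.
p_bar <- ggplot(diamonds, aes(color)) + geom_bar()

# Subgroups can be distinguished by mapping an aesthetic such as fill.
p_hist_fill <- ggplot(diamonds, aes(carat, fill = color)) +
  geom_histogram(binwidth = 0.1)
```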
Faceting divides the data into subsets and displays the same graph for
each subset. To facet, use the =facets= parameter like
=facets = color ~ size= or, when a single variable is needed,
=facets = color ~ .=
A: How can I add legends?
ggplot2 legends are produced automatically by the geoms and scales.
There is little you can do to directly control the legends.
A: How can I use faceting in ggplot?
The following example is for =qplot=:
qplot(cty, hwy, data=mpg2) + facet_grid(. ~ .)
The same function, =facet_grid=, can be used in =ggplot= as well.
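A minimal sketch of faceting in =ggplot()= form, assuming ggplot2 and its bundled =mpg= data. The choice of =drv=, =cyl=, and =class= as faceting variables is only for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(cty, hwy)) + geom_point()

# Rows split by drive train, columns split by number of cylinders.
p_grid <- p + facet_grid(drv ~ cyl)

# A single faceting variable: the dot means "no split" on that dimension.
p_cols <- p + facet_grid(. ~ cyl)

# facet_wrap lays out one variable's panels in a wrapped ribbon.
p_wrap <- p + facet_wrap(~ class)
```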
A: How can I use linear regression?
qplot(carat, price, data = diamonds, log = "xy") + geom_smooth(method = "lm")
A: How can I use multiple sources of data frames?
The data should be a data.frame.
ggplot2 cannot read from multiple
sources. However, it's possible to use multiple layers with different
datasets by specifying different data for each layer.
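As a sketch of per-layer data, the example below overlays class means on the raw observations. It assumes ggplot2 and its =mpg= data; =class_means= is a hypothetical helper data frame built here with base R's =aggregate=.

```r
library(ggplot2)

# Second data frame: mean highway mileage per vehicle class.
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)

p <- ggplot(mpg, aes(class, hwy)) +
  # This layer uses the default data given to ggplot().
  geom_point(colour = "grey50") +
  # This layer overrides the data but inherits the x/y mapping.
  geom_point(data = class_means, colour = "red", size = 3)
```

Both data frames need the columns named in the mapping (=class= and =hwy= here), since the second layer inherits it.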
A: How generic is ggplot?
The grammar is generic enough, but the data.frame restriction on ggplot is important.
A: How are mappings between data frames and plot values determined?
The =aes= function is used to map data frame columns to the aesthetics of the layers in the plot.
Different =geom_= functions accept different =aes= parameters.
A: How is it different to set aes in geom_ and ggplot functions?
The difference lies in setting the defaults. If the mappings do not
change from layer to layer, setting them in the
=ggplot= function saves a few keystrokes.
Otherwise, each layer can have its own mappings in its =geom_= function.
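The difference can be sketched as follows, assuming ggplot2 and its =mpg= data; the plots and variable names are illustrative.

```r
library(ggplot2)

# Mapping set once in ggplot(): inherited by every layer.
p_default <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm")

# Extra mapping set in one layer only: the points are coloured by class,
# while the smoother is still fitted to all the data as a single group.
p_layer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(method = "lm")
```

Had =colour = class= been placed in =ggplot()= instead, the smoother would inherit it and fit one line per class.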
A: How are different coordinate systems used?
Use =coord_= functions like
=coord_cartesian= to specify different coordinate systems.
be the most important of these functions.
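A brief sketch of a few =coord_= functions, assuming ggplot2 and its =mpg= data. Which coordinate systems to show is a choice made here for illustration, not taken from the notes.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Zoom in on part of the x axis without dropping any data.
p_zoom <- p + coord_cartesian(xlim = c(2, 4))

# Swap the x and y axes.
p_flip <- p + coord_flip()

# Polar coordinates turn a single stacked bar into a pie chart.
p_pie <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```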
A: What can I learn further from the book?
Layering strategy, plot annotations, themes, reducing duplicate code, qplot and ggplot conversions, aesthetic specs, grid
A: What is grouping?
When $x$ and $y$ are both continuous in
=aes=, we need a grouping
parameter to tell ggplot which observations belong together. It should be a discrete variable that
identifies the different classes of data, for example the particular boy each height measurement belongs to.
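To illustrate grouping, the sketch below uses a small made-up data frame of boys' heights (the numbers are hypothetical, not from the book) and assumes ggplot2 is installed.

```r
library(ggplot2)

# Hypothetical longitudinal data: heights of three boys at four ages.
heights <- data.frame(
  boy    = rep(c("a", "b", "c"), each = 4),
  age    = rep(c(10, 11, 12, 13), times = 3),
  height = c(140, 145, 151, 158,
             135, 139, 144, 150,
             142, 148, 155, 163)
)

# Without a group, all twelve observations are joined into a single line.
p_single <- ggplot(heights, aes(age, height)) + geom_line()

# With group = boy, each boy gets his own line.
p_grouped <- ggplot(heights, aes(age, height, group = boy)) + geom_line()
```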
A: What is Stat?
Stats are calculations on a dataset which produce plottable datasets.
The variables they generate can also be referred to in the
=aes= function. Their names should be
surrounded with dots, like
=..density..=, so as not to
mistake them for variables in the original dataset.
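A minimal sketch of referring to a stat-generated variable, assuming a ggplot2 version of 3.3.0 or later. It uses =after_stat(density)=, which newer releases provide for the same purpose as the =..density..= notation described above; the bin width is an arbitrary choice.

```r
library(ggplot2)

# stat_bin (used by geom_histogram) computes count, density, etc. per bin.
# Mapping y to the computed density rescales the histogram to unit area.
p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1)

# The computed variables can be inspected without drawing the plot.
computed <- layer_data(p)
```

The =computed= data frame contains the =count= and =density= columns that the stat produced, alongside the bin positions.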
A: How to draw maps?
There is a
=maps= package that provides the maps; it's then possible to
convert these maps to data frames and draw them with =geom_polygon= etc.
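A sketch of drawing a map, assuming both ggplot2 and the =maps= package are installed. =map_data()= is ggplot2's helper for turning a =maps= database into a data frame; the ="state"= database and the styling are illustrative choices.

```r
library(ggplot2)

# Convert the US state outlines from the maps package into a data frame
# with long, lat, group, order, region and subregion columns.
states <- map_data("state")

# Each polygon is identified by the group column.
p <- ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "grey40") +
  coord_quickmap()
```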
A: How to summarize data?
Use the =stat_summary= function to summarize the data. It has some
predefined alternatives and is able to receive simple aggregate
functions (like =mean=) also. It can also receive more complex functions.
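A sketch of both styles of summary, assuming ggplot2 version 3.3.0 or later (where the argument is named =fun=) and the bundled =mpg= data. =mean_se= is one of ggplot2's predefined summary functions.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) +
  geom_point(colour = "grey60") +
  # Simple aggregate: a function that reduces each group's y values to one number.
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  # More complex summary: a function returning y, ymin and ymax per group.
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3)
```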
A: How to set a theme?
It's possible to set a theme globally with
=theme_set= and locally by
adding a theme function (like
=theme_bw()=) to the plot.
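A short sketch of both approaches, assuming ggplot2 and its =mpg= data. The choice of =theme_minimal()= as the global theme is illustrative.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Local: the theme applies to this plot only.
p_bw <- p + theme_bw()

# Global: theme_set() changes the default for all later plots and
# returns the previous theme, which makes it easy to restore.
old_theme <- theme_set(theme_minimal())
theme_set(old_theme)
```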
A: How to generate subplots?
Subplots are small plots inside large plots. They can be used to
summarize the data in another way. It is possible to set up viewports using the
=viewport= function from the =grid= package and draw another plot into them.
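A minimal sketch of an inset subplot, assuming ggplot2 and the =grid= package that ships with R. The viewport position and size, and the output file in the temporary directory, are arbitrary choices for illustration.

```r
library(ggplot2)
library(grid)

main  <- ggplot(mpg, aes(displ, hwy)) + geom_point()
inset <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 2)

# A viewport covering the top-right 40% of the page.
vp <- viewport(x = 0.98, y = 0.98, width = 0.4, height = 0.4,
               just = c("right", "top"))

# Draw to a PDF so the example also works without an interactive device.
out <- file.path(tempdir(), "subplot.pdf")
pdf(out, width = 6, height = 4)
print(main)             # full-page plot
print(inset, vp = vp)   # second plot drawn inside the viewport
dev.off()
```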
A: What does plyr do? Why is it used here?
The =plyr= package is used to group and manipulate the data frame. It can be
used to apply any transformation to subsets of the data.
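As a sketch of this split-apply-combine step, the example below summarizes =diamonds= by =cut= with =ddply= and plots the result. It assumes both plyr and ggplot2 are installed; the summary columns are illustrative.

```r
library(plyr)
library(ggplot2)

# Split diamonds by cut, summarise each piece, and combine into one data frame.
cut_summary <- ddply(diamonds, "cut", summarise,
                     mean_price = mean(price),
                     n          = length(price))

# The summarized data frame can be plotted like any other.
p <- ggplot(cut_summary, aes(cut, mean_price)) + geom_col()
```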
A: How to reduce duplicate code?
The =last_plot= function returns the last plot. You can define groups of
plot elements to reuse. qplot also uses these tricks to generate
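One way to reuse plot elements is to store them in a list and add the list to several plots. The sketch below assumes ggplot2 and its =mpg= data; the name =common= and the elements in it are hypothetical.

```r
library(ggplot2)

# A reusable group of plot elements stored as a plain list.
common <- list(
  geom_point(alpha = 0.3),
  geom_smooth(method = "lm"),
  theme_bw()
)

# The same elements applied to two different mappings.
p1 <- ggplot(mpg, aes(displ, hwy)) + common
p2 <- ggplot(mpg, aes(cty, hwy)) + common

# last_plot() retrieves the most recent plot so it can be extended.
p3 <- last_plot() + coord_cartesian(ylim = c(10, 45))
```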