ggplot2 Elegant Graphics for Data Analysis
The important parts of the book are
- grammar of ggplot
- qplot for easy plotting
- geoms
- linear models in plots
qplot
=qplot()= is modeled on base R's =plot()=.
The three most important parameters to =qplot= are =x=, =y= and =data=. If =data= is specified, it is used as the namespace in which the variables are looked up.
=qplot(carat, price, data = diamonds)=
=qplot(carat, x * y * z, data = diamonds)=
=color= is another argument that can be used to differentiate groups of observations, as are =shape= and =size=. To set one of these to a fixed value instead of mapping it to a variable, wrap the value in =I()=, like =I('red')=.
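The difference between mapping an aesthetic to a variable and setting it to a fixed value can be sketched with the =ggplot()= interface. This is only an illustration: the sample size, the seed and the object names are assumptions, and setting =colour= outside =aes()= is used here as the rough counterpart of =I('red')= in =qplot=.

```r
library(ggplot2)

# Small random sample so the plots draw quickly; the seed is arbitrary.
set.seed(1)
dsmall <- diamonds[sample(nrow(diamonds), 200), ]

# Mapping: colour varies with the `color` column and a legend is produced.
p_mapped <- ggplot(dsmall, aes(carat, price, colour = color)) +
  geom_point()

# Setting: every point is drawn in red and no legend is needed.
p_fixed <- ggplot(dsmall, aes(carat, price)) +
  geom_point(colour = "red")
```

Printing =p_mapped= and =p_fixed= side by side shows the two behaviours.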
The =geom= parameter accepts various values. ="point"= is the standard scatterplot, ="smooth"= fits a smoother to the data and shows its standard error as well, ="boxplot"= produces a box-and-whisker plot, and ="path"= and ="line"= draw lines between data points.
Smoothing
A smoother is added via the =geom = c("point", "smooth")= parameter given to =qplot=. A particular smoother can be chosen using the =method= parameter.
=qplot(carat, data = diamonds, geom = "histogram")=
=qplot(carat, data = diamonds, geom = "density")=
Both show the distribution of a single variable.
The =binwidth= parameter can be specified for the histogram.
It's possible to distinguish subgroups using the =color= or =fill= parameters.
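A minimal sketch of the same ideas with =ggplot()=, assuming a bin width of 0.1 carat (an arbitrary choice for illustration) and the =color= column of =diamonds= as the subgroup variable:

```r
library(ggplot2)

# Histogram of carat with an explicit bin width, split by diamond colour.
p_hist <- ggplot(diamonds, aes(carat, fill = color)) +
  geom_histogram(binwidth = 0.1)

# Density estimate of the same variable, one curve per diamond colour.
p_dens <- ggplot(diamonds, aes(carat, colour = color)) +
  geom_density()
```

Trying a few values of =binwidth= is usually worthwhile, since features of the distribution can appear or vanish with the bin size.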
Bar Charts
Bar charts count the number of instances per class. There is no need to specify bin sizes, because each class is its own bin.
Faceting
Faceting divides the data into subsets and displays the same graph for each subset. To facet, use the =facets= parameter, like =facets = color ~ size=, or, when only a single variable is needed, =facets = color ~ .=
A: How can I add legends?
In =ggplot2=, legends are produced automatically by the geoms and scales. There is little you can do to directly control the legends.
A: How can I use faceting in ggplot?
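The following example is for =qplot=:
qplot(cty, hwy, data = mpg2) + facet_grid(. ~ .)
Here =. ~ .= facets on neither rows nor columns; replace a dot with a variable name to facet on that variable. The same function, =facet_grid=, can be used with =ggplot= as well.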
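As a hedged illustration with =ggplot()=, the sketch below facets by columns on the number of cylinders. It uses the =mpg= data set bundled with ggplot2 rather than =mpg2=, which is not defined in these notes.

```r
library(ggplot2)

# One panel per value of `cyl`, laid out as columns.
p_facet <- ggplot(mpg, aes(cty, hwy)) +
  geom_point() +
  facet_grid(. ~ cyl)
```

Writing =cyl ~ .= instead would lay the panels out as rows.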
A: How can I use linear regression?
qplot(carat, price, data = diamonds, log = "xy") +
geom_smooth(method = "lm")
A: How can I use multiple data frames as sources?
The data should be a data.frame. A single layer in =ggplot2= cannot read from multiple sources. However, it's possible to use multiple layers with different datasets, by specifying different data for each =geom_= function.
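A minimal sketch of layers that draw from different data frames. The summary data frame =class_means= is a hypothetical helper built here with base R's =aggregate=.

```r
library(ggplot2)

# Second data source: mean displacement and highway mileage per vehicle class.
class_means <- aggregate(cbind(displ, hwy) ~ class, data = mpg, FUN = mean)

p_layers <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(colour = "grey60") +                            # uses the default data (mpg)
  geom_point(data = class_means, colour = "red", size = 3)   # uses its own data
```

The second layer inherits the =x= and =y= mappings from =ggplot()=, so the second data frame needs columns with the same names.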
A: How generic is ggplot?
The grammar is generic enough, but the requirement that the data be a data.frame is an important restriction.
A: How are mappings between data frames and plot values determined?
The =aes= function maps data frame columns to the aesthetics of the layers in the plot. Different =geom_= functions accept different =aes= parameters.
A: What is the difference between setting aes in the geom_ functions and in the ggplot function?
The difference lies in setting the defaults. If the mappings do not change from layer to layer, setting them in the =ggplot= function saves a few keystrokes. Otherwise, each layer can have its own mappings in its =geom_= function.
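The sketch below contrasts the two placements; the choice of the =mpg= data set and of =class= as the colour variable are assumptions made for illustration.

```r
library(ggplot2)

# Defaults set once in ggplot(): both layers share x and y.
p_default <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# A layer-specific mapping: only the points are coloured by class,
# so the smoother still fits a single line through all of the data.
p_layer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(method = "lm", formula = y ~ x)
```

Moving =colour = class= into the =ggplot()= call instead would give one fitted line per class.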
A: How are different coordinate systems used?
There are =coord_= functions, like =coord_cartesian=, to specify different coordinate systems. =coord_trans=, =coord_map= and =coord_polar= might be the most important of these functions.
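A small sketch of swapping the coordinate system on an existing plot; the axis limits are arbitrary values chosen for illustration.

```r
library(ggplot2)

p_base <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Zoom in on part of the x axis without dropping any data.
p_zoom <- p_base + coord_cartesian(xlim = c(2, 4))

# Draw the same data in polar coordinates.
p_polar <- p_base + coord_polar()
```

Only the coordinate system changes; the data, mappings and geoms stay the same.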
A: What can I learn further from the book?
Layering strategy, plot annotations, themes, reducing duplicate code, qplot and ggplot conversions, aesthetic specs, grid
A: What is grouping?
When $x$ and $y$ are both continuous in =aes=, we need a grouping parameter to tell which observations belong together. It should be a discrete variable that identifies the different classes of data, such as the particular boy in a data set of boys' heights.
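A minimal sketch of grouping, using a small made-up data frame of heights (the values and the column names =boy=, =age= and =height= are hypothetical):

```r
library(ggplot2)

# Hypothetical data: three boys, each measured at four ages.
heights <- data.frame(
  boy    = rep(c("a", "b", "c"), each = 4),
  age    = rep(1:4, times = 3),
  height = c(80, 88, 95, 101, 78, 85, 93, 100, 82, 90, 97, 104)
)

# Without `group`, all twelve points are joined into one zig-zag line.
p_ungrouped <- ggplot(heights, aes(age, height)) + geom_line()

# With `group = boy`, each boy gets his own line.
p_grouped <- ggplot(heights, aes(age, height, group = boy)) + geom_line()
```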
A: What is a stat?
Stats are calculations on a dataset that produce plottable datasets. The variables they compute can also be referred to in the =aes= function. Their names should be surrounded with dots, like =..density..=, so that they are not mistaken for variables in the original dataset.
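As a hedged sketch, the histogram below plots the computed =density= variable instead of the raw count. It uses =after_stat(density)=, which newer ggplot2 releases (3.3.0 and later) provide as the replacement for the =..density..= spelling; the bin width is an arbitrary choice.

```r
library(ggplot2)

# `density` does not exist in `diamonds`; it is computed by the binning stat.
p_density <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1)
```

With this mapping the bar areas sum to one, which makes histograms of groups with different sizes comparable.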
A: How to draw maps?
The =maps= package provides the map data. It's possible to convert these maps into data frames and draw them with =geom_polygon= etc.
A: How to summarize data?
The =stat_summary= function summarizes the data. It has some predefined summaries, accepts simple aggregate functions (like =max= or =mean=), and can also take more complex functions.
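A minimal sketch of =stat_summary= with a simple aggregate function. It assumes ggplot2 3.3.0 or later, where the argument is named =fun=; the data set and styling are illustrative choices.

```r
library(ggplot2)

# Raw observations in grey, with the per-class mean overlaid in red.
p_summary <- ggplot(mpg, aes(class, hwy)) +
  geom_point(colour = "grey60") +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)
```

Swapping =mean= for =max= or =median= changes the summary without touching the rest of the plot.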
A: How to set a theme?
It's possible to set a theme globally with =theme_set= and locally by adding a theme function (=theme_grey()= or =theme_bw()=) to a single plot.
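Both approaches can be sketched as follows; the object names are hypothetical, and the global theme is restored at the end so the example leaves no lasting side effect.

```r
library(ggplot2)

p_plain <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Local: only this plot uses the black-and-white theme.
p_bw <- p_plain + theme_bw()

# Global: every plot drawn from now on uses theme_bw().
# theme_set() returns the previous theme, which is kept for restoring.
previous_theme <- theme_set(theme_bw())

# Restore the earlier global theme.
theme_set(previous_theme)
```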
A: How to generate subplots?
Subplots are small plots inside large plots. They can be used to summarize the data in another way. It is possible to define a viewport with the =viewport= function from the =grid= package and draw another plot into it.
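A rough sketch of an inset plot drawn into a viewport. The viewport position and size, the choice of plots, and the temporary PDF output file are all assumptions made for illustration.

```r
library(ggplot2)
library(grid)

main_plot  <- ggplot(mpg, aes(displ, hwy)) + geom_point()
inset_plot <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 2)

# Viewport in the top-right corner; coordinates are fractions of the page.
corner <- viewport(x = 0.78, y = 0.78, width = 0.4, height = 0.4)

# Draw to a temporary PDF so the example also runs without a screen device.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
print(main_plot)                  # the large plot fills the page
print(inset_plot, vp = corner)    # the subplot is drawn inside the viewport
invisible(dev.off())
```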
A: What does plyr do? Why is it used here?
The =plyr= package is used to group and manipulate the data frame. It can apply any transformation to subsets of the data.
A: How to reduce duplicate code?
The =last_plot= function returns the last plot, so it can be modified without retyping it. You can also define groups of plot elements to reuse. =qplot= itself uses these tricks to generate ggplots.
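Both ideas can be sketched as below. Storing plot elements in a plain list and adding it with =+= is one way to define a reusable group; the name =lm_style= and the title text are hypothetical.

```r
library(ggplot2)

# A reusable group of plot elements, stored as a plain list.
lm_style <- list(
  geom_smooth(method = "lm", formula = y ~ x),
  theme_bw()
)

p_displ <- ggplot(mpg, aes(displ, hwy)) + geom_point() + lm_style
p_cty   <- ggplot(mpg, aes(cty, hwy)) + geom_point() + lm_style

# last_plot() returns the plot most recently created or modified (p_cty here),
# so it can be extended without repeating its definition.
p_titled <- last_plot() + labs(title = "City vs highway mileage")
```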