Table of Contents
- The important parts of the book are:
- Q: How can I add legends?
- Q: How can I use faceting in ggplot?
- Q: How can I use linear regression?
- Q: How can I use multiple sources of data frames?
- Q: How generic is ggplot?
- Q: How are mappings between data frames and plot values determined?
- Q: What is the difference between setting aes in geom_ and ggplot functions?
- Q: How are different coordinate systems used?
- Q: What else can I learn from the book?
- Q: What is grouping?
- Q: What is a Stat?
- Q: How do I draw maps?
- Q: How do I summarize data?
- Q: How do I set a theme?
- Q: How do I generate subplots?
- Q: What does plyr do? Why is it used here?
- Q: How do I reduce duplicate code?
The important parts of the book are:
- The grammar of graphics in ggplot2
- qplot for easy plotting
- Geoms (geometric objects)
- Linear models in plots
qplot() is modeled after the base plot() function.
The three most important parameters to qplot are x, y, and data. If data is specified, variable names in the call are looked up in that data frame, so it acts as a namespace for them.
qplot(carat, price, data = diamonds)
qplot(carat, x * y * z, data = diamonds)
color is another argument that can be mapped to a variable to tell groups of points apart, as are shape and size.
To set one of these aesthetics to a fixed value instead of mapping it to a variable, wrap the value in I(), like color = I('red').
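The difference between mapping and setting can be sketched as follows. This is only an illustration: it assumes ggplot2 is installed, and the sample size and seed are arbitrary choices made here to keep the plot light. Recent ggplot2 releases mark qplot() as deprecated, so a warning may appear.

```r
library(ggplot2)

set.seed(1410)  # arbitrary seed, only to make the sample reproducible
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# color is mapped to a variable: one color per level, and a legend is drawn
p_mapped <- qplot(carat, price, data = dsmall, color = color)

# color is set to a constant with I(): every point is red, no legend
p_fixed <- qplot(carat, price, data = dsmall, color = I("red"))
```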
The geom parameter accepts various values:
- "point" is the standard scatterplot.
- "smooth" fits a smoother to the data and shows the standard error as well.
- "boxplot" produces a box-and-whisker plot.
- "path" and "line" are used to draw lines between data points.
Smoothing
A smoother is added by passing geom = c("point", "smooth") to qplot. A particular smoothing method can be chosen with the method parameter.
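A minimal sketch of both cases, assuming ggplot2 is installed. The sample (dsmall) is an assumption made here for speed. For the second plot the method is given to geom_smooth as a separate layer, which is one way to keep the parameter scoped to the smoother only.

```r
library(ggplot2)

set.seed(1410)  # arbitrary seed for a reproducible sample
dsmall <- diamonds[sample(nrow(diamonds), 100), ]

# Points plus the default smoother with its standard-error band
p_default <- qplot(carat, price, data = dsmall, geom = c("point", "smooth"))

# Points plus a straight-line fit selected through `method`
p_lm <- qplot(carat, price, data = dsmall) + geom_smooth(method = "lm")
```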
Histograms and Density Plots
qplot(carat, data = diamonds, geom = "histogram")
qplot(carat, data = diamonds, geom = "density")
Both show the distribution of a single variable.
The binwidth parameter can be specified for histograms.
To compare the distributions of subgroups, map a categorical variable to the color or fill parameter.
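As a rough illustration, assuming ggplot2 is installed (the bin width of 0.1 carat is an arbitrary choice made for this sketch):

```r
library(ggplot2)

# Histogram with an explicit bin width
p_hist <- qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.1)

# One density curve per level of the `color` column
p_dens <- qplot(carat, data = diamonds, geom = "density", color = color)

# Stacked histogram, filled by the `color` column
p_fill <- qplot(carat, data = diamonds, geom = "histogram",
                fill = color, binwidth = 0.1)
```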
Bar Charts
Bar charts count the number of instances per class. There is no need to specify bin sizes in bar charts because bins equal classes.
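A short sketch, assuming ggplot2 is installed. The bar heights should match a plain table() of the same column.

```r
library(ggplot2)

# One bar per diamond color; the height is the number of rows in that class
p_bar <- qplot(color, data = diamonds, geom = "bar")

# The same counts computed directly, for comparison
counts <- table(diamonds$color)
```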
Faceting
Faceting divides the data into subsets and displays the same graph for each subset. To facet, use the facets parameter with a formula of the form rows ~ columns, like facets = color ~ size. When only one variable is needed, put a dot on the other side, as in facets = color ~ .
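The formula can be sketched like this, assuming ggplot2 is installed. The choice of variables and bin width is only for illustration.

```r
library(ggplot2)

# One row of panels per diamond color; the dot means "no column variable"
p_rows <- qplot(carat, data = diamonds, facets = color ~ .,
                geom = "histogram", binwidth = 0.1)

# Rows by cut, columns by color
p_grid <- qplot(carat, data = diamonds, facets = cut ~ color,
                geom = "histogram", binwidth = 0.1)
```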
Q: How can I add legends?
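In ggplot2, legends are produced automatically by the geoms and scales. You do not draw a legend yourself; there is little direct control, and changes are made indirectly through the scale (its name, breaks, and labels) and the theme (for example, the legend position).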
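A sketch of adjusting a legend through its scale and the theme, assuming ggplot2 is installed. The title and label text are made up for this example.

```r
library(ggplot2)

p_legend <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  # The legend title and key labels come from the colour scale
  # (drv levels sort as "4", "f", "r")
  scale_colour_discrete(name = "Drive train",
                        labels = c("4wd", "front", "rear")) +
  # The legend position is a theme setting
  theme(legend.position = "bottom")
```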
Q: How can I use faceting in ggplot?
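The following example is for qplot (mpg2 is not part of ggplot2; it is presumably a subset of the mpg dataset prepared earlier in the book):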
qplot(cty, hwy, data = mpg2) + facet_grid(. ~ .)
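The same facet_grid function works with ggplot as well. The formula . ~ . facets on neither rows nor columns and so draws a single panel; replace a dot with a variable name to split the plot by that variable.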
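For comparison, a sketch with ggplot() and the built-in mpg dataset, assuming ggplot2 is installed. The faceting variables are chosen here only for illustration.

```r
library(ggplot2)

base <- ggplot(mpg, aes(cty, hwy)) + geom_point()

# One column of panels per number of cylinders
p_cols <- base + facet_grid(. ~ cyl)

# Rows by drive train, columns by number of cylinders
p_both <- base + facet_grid(drv ~ cyl)
```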
Q: How can I use linear regression?
qplot(carat, price, data = diamonds, log = "xy") +
geom_smooth(method = "lm")
Q: How can I use multiple sources of data frames?
The data must be a data.frame. A single layer cannot read from several sources, but a plot can combine layers that each use their own dataset: pass a different data argument to each geom_ function.
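A minimal sketch of two layers drawing from two data frames, assuming ggplot2 is installed. The second data frame (class_means) is computed here purely for illustration.

```r
library(ggplot2)

# Second data frame: mean highway mileage per vehicle class
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)

p_layers <- ggplot(mpg, aes(class, hwy)) +
  geom_point(alpha = 0.4) +                                  # uses mpg
  geom_point(data = class_means, colour = "red", size = 4)   # uses class_means
```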
Q: How generic is ggplot?
The grammar itself is generic, but ggplot requires its input to be a data.frame, so any other data structure has to be converted first.
Q: How are mappings between data frames and plot values determined?
The aes function maps data frame columns to aesthetic properties of a layer, such as x, y, color, or size. Different geom_ functions understand different aes parameters.
Q: What is the difference between setting aes in geom_ and ggplot functions?
The difference lies in setting the defaults. If mappings do not change for each layer, setting them in the ggplot function saves a few keystrokes. Otherwise, each layer can have its own particular mappings in geom_ functions.
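The effect of where a mapping is placed can be sketched as below, assuming ggplot2 is installed. The variables are chosen only for illustration.

```r
library(ggplot2)

# colour mapped in ggplot(): every layer inherits it,
# so the smoother fits one line per class
p_global <- ggplot(mpg, aes(displ, hwy, colour = class)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# colour mapped only in geom_point(): the smoother sees just x and y,
# so it fits a single line through all points
p_local <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = class)) +
  geom_smooth(method = "lm", se = FALSE)
```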
Q: How are different coordinate systems used?
There are coord_ functions like coord_cartesian to specify different coordinate systems. coord_trans, coord_map, and coord_polar are some of the most important of these functions.
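Two of these functions in a short sketch, assuming ggplot2 is installed. The zoom limits are arbitrary values picked for this example.

```r
library(ggplot2)

# A single stacked bar becomes a pie chart in polar coordinates
p_pie <- ggplot(mpg, aes(x = factor(1), fill = drv)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")

# coord_cartesian zooms the view without dropping any data
p_zoom <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  coord_cartesian(xlim = c(2, 4))
```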
Q: What else can I learn from the book?
The book also covers layering strategy, plot annotations, themes, reducing duplicate code, qplot and ggplot conversions, aesthetic specifications, and the grid system.
Q: What is grouping?
When x and y are both continuous, ggplot cannot tell which observations belong together, so a group parameter may be needed in aes. It should be a discrete variable that identifies each series in the data, such as the individual whose height is measured over time.
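A small sketch of the effect, assuming ggplot2 is installed. The heights data frame is hypothetical and invented for this example.

```r
library(ggplot2)

# Hypothetical data: three children measured at four ages
heights <- data.frame(
  child  = rep(c("a", "b", "c"), each = 4),
  age    = rep(c(8, 9, 10, 11), times = 3),
  height = c(125, 130, 136, 141,
             120, 126, 131, 137,
             130, 134, 139, 145)
)

# Without group: one line that zigzags through all children
p_single <- ggplot(heights, aes(age, height)) + geom_line()

# With group: one line per child
p_grouped <- ggplot(heights, aes(age, height, group = child)) + geom_line()
```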
Q: What is a Stat?
Stats are calculations on a dataset that produce plottable datasets. The variables they generate can also be referenced in the aes function. When doing so, their names should be surrounded by dots, like ..density.., so that they are not mistaken for variables in the original dataset.
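A sketch of referring to a computed variable, assuming ggplot2 3.3.0 or later, where after_stat(density) is the newer spelling of ..density.. (the dotted form may trigger a deprecation warning in recent releases). The bin width is arbitrary.

```r
library(ggplot2)

# stat_bin computes `count` and `density` for each bin;
# here the y axis shows the computed density instead of the default count
p_density <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1)
```

With density on the y axis, the bar areas should sum to one.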
Q: How do I draw maps?
The maps package provides map data. This data can be converted to data frames and drawn with geom_polygon and similar geoms.
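A sketch under the assumption that both ggplot2 and the maps package are installed. The code is guarded so that it does nothing when maps is missing. The colors are arbitrary.

```r
library(ggplot2)

if (requireNamespace("maps", quietly = TRUE)) {
  # map_data() converts a map from the maps package into a data frame
  states <- map_data("state")

  # Each polygon is identified by `group`; coord_quickmap keeps a
  # reasonable aspect ratio for small regions
  p_map <- ggplot(states, aes(long, lat, group = group)) +
    geom_polygon(fill = "white", colour = "grey40") +
    coord_quickmap()
}
```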
Q: How do I summarize data?
There is a stat_summary function to summarize data. It has some predefined alternatives and is able to receive simple aggregate functions (like max or mean). It can also receive more complex functions.
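Both styles can be sketched as follows, assuming a ggplot2 version where stat_summary accepts the fun argument (older releases called it fun.y). The helper range_summary is a hypothetical function written for this example.

```r
library(ggplot2)

# Simple aggregate: one point per class at the mean of hwy
p_mean <- ggplot(mpg, aes(class, hwy)) +
  stat_summary(fun = mean, geom = "point")

# More complex summary: a function returning y, ymin, and ymax
range_summary <- function(v) {
  data.frame(y = median(v), ymin = min(v), ymax = max(v))
}

p_range <- ggplot(mpg, aes(class, hwy)) +
  stat_summary(fun.data = range_summary, geom = "pointrange")
```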
Q: How do I set a theme?
It’s possible to set a theme globally using theme_set and locally by adding a theme function (like theme_grey or theme_bw) to a single plot with +.
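A short sketch of both approaches, assuming ggplot2 is installed:

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Local: only this plot uses the black-and-white theme
p_bw <- p + theme_bw()

# Global: every later plot uses theme_bw();
# theme_set() returns the previous theme so it can be restored
old_theme <- theme_set(theme_bw())
theme_set(old_theme)
```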
Q: How do I generate subplots?
Subplots are small plots inside larger plots. They can be used to summarize data in another way. A viewport is created with the viewport function from the grid package, and another plot can then be drawn within it.
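A sketch of an inset plot, assuming ggplot2 and grid are available. The viewport position and size are arbitrary, and the output goes to a temporary PDF only to keep the example self-contained; in an interactive session the pdf() and dev.off() calls can be dropped.

```r
library(ggplot2)
library(grid)

main  <- ggplot(mpg, aes(displ, hwy)) + geom_point()
inset <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 2)

# A viewport covering the top-right corner of the page
vp <- viewport(x = 0.75, y = 0.75, width = 0.4, height = 0.4)

out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)
print(main)             # draw the main plot on a new page
print(inset, vp = vp)   # draw the subplot inside the viewport
dev.off()
```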
Q: What does plyr do? Why is it used here?
The plyr package is used to group and manipulate data frames. It splits the data into subsets, applies a transformation to each subset, and combines the results.
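A sketch of a per-group summary with ddply, assuming the plyr package is installed. The code is guarded so that it does nothing when plyr is missing, and the summary columns are chosen only for illustration.

```r
library(ggplot2)

if (requireNamespace("plyr", quietly = TRUE)) {
  # Split mpg by class, summarise each piece, and combine the results
  by_class <- plyr::ddply(
    as.data.frame(mpg), "class", plyr::summarise,
    mean_hwy = mean(hwy),
    n        = length(hwy)
  )
}
```

The resulting data frame can then be handed to a layer through its data argument.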
Q: How do I reduce duplicate code?
The last_plot function returns the last plot created. You can also define groups of plot elements to reuse. qplot also uses these tricks to generate ggplot objects.
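Both ideas in a small sketch, assuming ggplot2 is installed. The list name (common) and the plot title are made up for this example.

```r
library(ggplot2)

# A reusable group of plot elements stored in a list
common <- list(
  geom_point(alpha = 0.5),
  geom_smooth(method = "lm", se = FALSE),
  theme_bw()
)

# Adding the list adds every element in it
p_hwy <- ggplot(mpg, aes(displ, hwy)) + common
p_cty <- ggplot(mpg, aes(displ, cty)) + common

# last_plot() returns the most recently created or modified plot
p_titled <- last_plot() + labs(title = "City mileage")
```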