Table of Contents
- The important parts of the book are
- A: How can I add legends?
- A: How can I use faceting in ggplot?
- A: How can I use linear regression?
- A: How can I use multiple sources of data frames?
- A: How generic ggplot is?
- A: How mappings between data frames and plot values are determined?
- A: How is it different to set aes in geom_ and ggplot functions?
- A: How different coordinate systems are used?
- A: What can I learn further from the book?
- A: What is grouping?
- A: What is Stat?
- A: How to draw maps?
- A: How to summarize data?
- A: How to set a theme?
- A: How to generate subplots?
- A: What does plyr do? Why is it used here?
- A: How to reduce duplicate code?
The important parts of the book are
- grammar of ggplot
- qplot for easy plotting
- geoms
- linear models in plots
qplot
qplot() is modeled on base R's plot().
The three most important parameters to qplot are x, y and data. If data is specified, it is used as the namespace in which the variables are looked up.
qplot(carat, price, data = diamonds)
qplot(carat, x * y * z, data = diamonds)
color is another argument that can be used to differentiate groups of points, as are shape and size.
To set one of these to a fixed value instead of mapping it to a variable, wrap the value in I(), like I('red').
There are various values for the geom parameter. "point" is the standard scatterplot, "smooth" fits a smoother to the data and also shows its standard error, "boxplot" produces a box-and-whisker plot, and "path" and "line" draw lines between data points.
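The difference between mapping and setting an aesthetic, and the effect of the geom parameter, can be sketched as follows. The sample size and the object names (dsmall, p_mapped, p_fixed, p_box) are illustrative assumptions, and recent ggplot2 releases mark qplot() as deprecated, so a warning may be shown.

```r
library(ggplot2)

# Small random sample so the plots build quickly (500 is an arbitrary choice).
set.seed(1)
dsmall <- diamonds[sample(nrow(diamonds), 500), ]

# Mapping: point colour varies with the `color` column and a legend is drawn.
p_mapped <- qplot(carat, price, data = dsmall, colour = color)

# Setting: I() marks "red" as a literal value, so every point is red.
p_fixed <- qplot(carat, price, data = dsmall, colour = I("red"))

# Choosing a geom: box-and-whisker plots of price for each colour grade.
p_box <- qplot(color, price, data = dsmall, geom = "boxplot")
```

Printing any of these objects (for example, print(p_box)) draws the plot.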
Smoothing
A smoother is added by passing geom = c("point", "smooth") to qplot. A particular smoother can be chosen with the method parameter.
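A minimal sketch of both cases, assuming a 500-row sample of diamonds (the sample and the object names are illustrative). The second plot adds the smoother as a separate layer, in the same style as the linear regression answer later in these notes.

```r
library(ggplot2)

set.seed(1)
dsmall <- diamonds[sample(nrow(diamonds), 500), ]

# Points plus the default smoother, which ggplot2 picks from the data size.
p_default <- qplot(carat, price, data = dsmall, geom = c("point", "smooth"))

# Points plus a straight-line fit, selected with the method parameter.
p_lm <- qplot(carat, price, data = dsmall) +
  geom_smooth(method = "lm", formula = y ~ x)
```

The grey band drawn around each fitted line is the standard error mentioned above.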
qplot(carat, data=diamonds, geom= "histogram")
qplot(carat, data=diamonds, geom= "density")
Histogram and density plots show the distribution of a single variable.
The binwidth parameter can be specified for a histogram.
It’s possible to specify subgroup aesthetics using color or fill parameters.
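The binwidth and subgroup options can be combined as in the sketch below. The bin width of 0.1 carat is an arbitrary choice and the object names are illustrative.

```r
library(ggplot2)

# Histogram with an explicit bin width of 0.1 carat.
p_hist <- qplot(carat, data = diamonds, geom = "histogram", binwidth = 0.1)

# Same histogram, with each bar split into subgroups by the fill aesthetic.
p_fill <- qplot(carat, data = diamonds, geom = "histogram",
                binwidth = 0.1, fill = color)

# One density curve per subgroup, distinguished by line colour.
p_dens <- qplot(carat, data = diamonds, geom = "density", colour = color)
```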
Bar Charts
Bar charts count the number of instances per class. There is no need to specify bin sizes in bar charts because the bins are the classes.
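As a short illustration, the first plot below counts diamonds per colour grade. The second uses the weight aesthetic so that each bar shows total carat instead of a row count; this variant is an addition to the notes above, and the object names are illustrative.

```r
library(ggplot2)

# One bar per class of `color`; bar height is the number of rows in the class.
p_bar <- qplot(color, data = diamonds, geom = "bar")

# Weighted bars: bar height is the sum of carat within each class.
p_weighted <- qplot(color, data = diamonds, geom = "bar", weight = carat)
```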
Faceting
Faceting divides the data into subsets and displays the same graph for each subset. To facet, use the facets parameter, like facets = color ~ size, or, when only a single variable is needed, facets = color ~ .
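A sketch of both forms on the diamonds data. The column variable cut is chosen here for illustration because it exists in diamonds; the object names are also illustrative.

```r
library(ggplot2)

# One row of panels per colour grade; "." means no column variable.
p_rows <- qplot(carat, data = diamonds, geom = "histogram",
                binwidth = 0.1, facets = color ~ .)

# Rows by colour grade, columns by cut quality.
p_grid <- qplot(carat, price, data = diamonds, facets = color ~ cut)
```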
A: How can I add legends?
In ggplot2, legends are produced automatically by the geoms and scales. There is little you can do to control them directly; their content is adjusted through the scale (name, breaks, labels) and their position through the theme.
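The indirect control over legends can be sketched as follows, using the mpg data that ships with ggplot2. The legend title and labels are made up for the example.

```r
library(ggplot2)

# Mapping colour to `drv` creates the legend automatically.
p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point()

# Legend title and labels come from the scale, legend placement from the theme.
# The labels follow the sorted values of drv: "4", "f", "r".
p_legend <- p +
  scale_colour_discrete(name = "Drive train",
                        labels = c("4-wheel", "front", "rear")) +
  theme(legend.position = "bottom")

# The legend can also be suppressed entirely.
p_none <- p + theme(legend.position = "none")
```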
A: How can I use faceting in ggplot?
The following example is for qplot
qplot(cty, hwy, data=mpg2) + facet_grid(. ~ .)
The same function, facet_grid, can be used with ggplot as well.
A: How can I use linear regression?
qplot(carat, price, data = diamonds, log = "xy") +
  geom_smooth(method = "lm")
A: How can I use multiple sources of data frames?
The data must be a data.frame; ggplot2 cannot read from other kinds of sources. However, it is possible to use multiple layers with different datasets by specifying different data for each geom_ function.
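A minimal sketch of two layers drawing from two data frames. The summary table class_means is built here with base R's aggregate() purely for illustration.

```r
library(ggplot2)

# Second data frame: mean highway mileage per vehicle class.
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)

p <- ggplot(mpg, aes(class, hwy)) +
  # This layer uses the plot's default data (mpg).
  geom_point(colour = "grey50") +
  # This layer overrides the data with its own data frame.
  geom_point(data = class_means, colour = "red", size = 3)
```

Both data frames need the columns named in the mapping (class and hwy here), since the second layer inherits the plot's aes.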
A: How generic ggplot is?
The grammar itself is generic, but the requirement that all data be a data.frame is an important restriction.
A: How mappings between data frames and plot values are determined?
The aes function maps data frame columns to the aesthetic properties of a layer in the plot. Different geom_ functions accept different aes parameters.
A: How is it different to set aes in geom_ and ggplot functions?
The difference lies in setting the defaults. If the mappings do not change from layer to layer, setting them in the ggplot function saves a few keystrokes. Otherwise, each layer can have its own mappings in its geom_ function.
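The effect of where a mapping is placed can be seen in a sketch like this one, where colour is mapped either for the whole plot or for the point layer only. The object names are illustrative.

```r
library(ggplot2)

# colour mapped in ggplot(): every layer inherits it,
# so the smoother fits one line per drive train.
p_default <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# colour mapped only in geom_point(): the smoother does not see it
# and fits a single line through all points.
p_layer <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```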
A: How different coordinate systems are used?
There are coord_ functions, such as coord_cartesian, for specifying different coordinate systems. coord_trans, coord_map and coord_polar might be the most important of these.
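A few of these functions in use, as a sketch on the mpg data. The limits and transformations are arbitrary choices, and coord_map is left out because it needs an extra package.

```r
library(ggplot2)

base <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# coord_cartesian zooms into a region without discarding any data.
p_zoom <- base + coord_cartesian(xlim = c(2, 4))

# coord_trans draws both axes on a log10-transformed coordinate system.
p_log <- base + coord_trans(x = "log10", y = "log10")

# coord_polar bends a single stacked bar into a pie chart.
p_polar <- ggplot(mpg, aes(x = factor(1), fill = drv)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```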
A: What can I learn further from the book?
Layering strategy, plot annotations, themes, reducing duplicate code, qplot and ggplot conversions, aesthetic specs, grid
A: What is grouping?
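When a plot contains several observations per unit (for example, one line per subject), aes needs a group parameter to tell ggplot which observations belong together. It should be a discrete variable that identifies the different classes of data, like the particular boy whose height was measured.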
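The following sketch uses a small made-up data frame (three boys, four measurements each; all values are invented for illustration) to show what the group parameter changes.

```r
library(ggplot2)

# Hypothetical growth data: three boys measured at four ages.
growth <- data.frame(
  boy    = rep(c("a", "b", "c"), each = 4),
  age    = rep(1:4, times = 3),
  height = c(140, 143, 147, 150,
             135, 139, 142, 146,
             145, 149, 152, 156)
)

# No grouping: all twelve points are joined into one zigzag line.
p_one <- ggplot(growth, aes(age, height)) +
  geom_line()

# Grouped by boy: one line per boy.
p_grouped <- ggplot(growth, aes(age, height, group = boy)) +
  geom_line()
```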
A: What is Stat?
Stats are calculations on a dataset that produce new, plottable datasets. The variables they generate can also be referred to in the aes function. When referring to them, surround their names with dots, like ..density.., so that they are not mistaken for variables in the original dataset.
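A sketch of a histogram whose bar heights come from a variable computed by the stat. It uses after_stat(), which newer ggplot2 releases (3.3.0 and later) provide as the replacement for the dotted form; the dotted form from the notes is shown in a comment. The bin width is an arbitrary choice.

```r
library(ggplot2)

# stat_bin computes `count` and `density` for each bin.
# Mapping y to the computed density rescales the bars to unit total area.
# In the older notation this mapping is written aes(y = ..density..).
p_density <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1)

# The computed variables can be inspected in the built layer data.
built <- layer_data(p_density)
head(built[, c("x", "count", "density")])
```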
A: How to draw maps?
The maps package provides the map outlines; these can be converted into data frames and drawn with geom_polygon etc.
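A minimal sketch, assuming the maps package is installed. map_data() is the ggplot2 helper that turns a map from that package into a data frame; the choice of the "state" map and of coord_quickmap() is illustrative.

```r
library(ggplot2)

# Convert the US state outlines from the maps package into a data frame
# with long, lat, group, order and region columns.
states <- map_data("state")

# Each polygon is identified by `group`, so it is passed as the group aesthetic.
p_map <- ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(fill = "white", colour = "grey50") +
  coord_quickmap()
```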
A: How to summarize data?
The stat_summary function summarizes the data. It has some predefined summaries, accepts simple aggregate functions (like max or mean), and can also take more complex functions that return several values.
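Both kinds of summary function can be sketched as follows. The helper median_range is a hypothetical function written for this example, and the fun argument assumes ggplot2 3.3.0 or later (older versions called it fun.y).

```r
library(ggplot2)

# Simple aggregate: mean highway mileage per class, drawn over the raw points.
p_mean <- ggplot(mpg, aes(class, hwy)) +
  geom_point(colour = "grey60") +
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3)

# More complex summary (hypothetical helper): returns the median together
# with the minimum and maximum, in the y/ymin/ymax form a pointrange needs.
median_range <- function(v) {
  data.frame(y = median(v), ymin = min(v), ymax = max(v))
}

p_range <- ggplot(mpg, aes(class, hwy)) +
  stat_summary(fun.data = median_range, geom = "pointrange")
```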
A: How to set a theme?
It's possible to set a theme globally with theme_set and locally by adding a theme function (theme_grey or theme_bw) to a single plot.
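A short sketch of the local and global variants. Restoring the previous theme at the end is a habit added here, not something the notes require.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point()

# Local: only this plot uses the black-and-white theme.
p_bw <- p + theme_bw()

# Global: theme_set() changes the default and returns the previous theme.
old_theme <- theme_set(theme_bw())

# ... plots printed here would use theme_bw() by default ...

# Restore the previous default theme.
theme_set(old_theme)
```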
A: How to generate subplots?
Subplots are small plots inside larger plots. They can be used to summarize the data in another way. A subplot is made by creating a viewport with the viewport function from the grid package and drawing another plot into it.
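A sketch of an inset histogram drawn in the top-right corner of a scatterplot. The viewport position and size, the output file, and the object names are all illustrative choices.

```r
library(ggplot2)
library(grid)

main  <- ggplot(mpg, aes(displ, hwy)) + geom_point()
inset <- ggplot(mpg, aes(hwy)) + geom_histogram(binwidth = 2)

# A viewport covering 40% of the page, anchored near the top-right corner.
inset_vp <- viewport(x = 0.98, y = 0.98, width = 0.4, height = 0.4,
                     just = c("right", "top"))

# Draw the main plot first, then the inset into the viewport on the same page.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 6, height = 4)
print(main)
print(inset, vp = inset_vp)
dev.off()
```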
A: What does plyr do? Why is it used here?
The plyr package is used to group and manipulate the data frame. It can apply any transformation to subsets of the data and combine the results.
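A minimal sketch of this split-apply-combine pattern with ddply, assuming the plyr package is installed. The summary columns mean_price and n are names chosen for the example.

```r
library(plyr)
library(ggplot2)  # only needed here for the diamonds dataset

# Split diamonds by colour grade, summarise each piece, and combine the results
# into one data frame with one row per grade.
price_by_color <- ddply(
  as.data.frame(diamonds), "color", summarise,
  mean_price = mean(price),
  n = length(price)
)
```

The result is an ordinary data frame, so it can be passed to a geom_ function as the data for an extra layer.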
A: How to reduce duplicate code?
The last_plot function returns the last plot. You can also define groups of plot elements and reuse them across plots. qplot uses the same tricks to generate ggplots.
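Both ideas can be sketched as follows. The list name common and the plot title are illustrative; a list of components can be added to a plot with + just like a single component.

```r
library(ggplot2)

# A reusable group of plot elements, stored as a list.
common <- list(
  geom_point(alpha = 0.5),
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE),
  theme_bw()
)

# The same elements applied to two different mappings.
p_hwy <- ggplot(mpg, aes(displ, hwy)) + common
p_cty <- ggplot(mpg, aes(displ, cty)) + common

# last_plot() retrieves the most recently created or modified plot,
# which can then be extended without retyping it.
p_titled <- last_plot() + ggtitle("Mileage against displacement")
```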