ggplot2: Elegant Graphics for Data Analysis

The notes below summarize the important parts of the book.

=qplot()= is modeled on base R's =plot()= function.

The three most important arguments to =qplot()= are =x=, =y= and =data=. If =data= is given, variable names in the other arguments are looked up in that data frame first, so it acts as a namespace for them:

=qplot(carat, price, data = diamonds)=

=qplot(carat, x * y * z, data = diamonds)=
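A sketch of the same two scatterplots written with =ggplot()= instead of qplot, assuming a recent ggplot2 (where =qplot()= has been deprecated since version 3.4.0). =data= plays the same namespace role here: =aes()= looks up =carat=, =price=, =x=, =y= and =z= in =diamonds=.

```r
library(ggplot2)  # the diamonds data ships with ggplot2

# Same as qplot(carat, price, data = diamonds)
p1 <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point()

# Expressions of columns are allowed inside aes(), as in qplot(carat, x * y * z, ...)
p2 <- ggplot(diamonds, aes(x = carat, y = x * y * z)) +
  geom_point()

print(p1)
print(p2)
```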

=colour= (=color= also works), =shape= and =size= map further variables to aesthetics, which differentiates groups of points.

To set an aesthetic to a fixed value instead of mapping it to a variable, wrap the value in =I()=, e.g. =colour = I("red")=.
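With =ggplot()= there is no need for =I()=: a value inside =aes()= is mapped (and gets a legend), while a value passed directly to the geom is set. A minimal sketch contrasting the two on the =diamonds= data:

```r
library(ggplot2)

# Mapping: point colour varies with the diamond colour grade and a legend is drawn
mapped <- ggplot(diamonds, aes(carat, price, colour = color)) +
  geom_point()

# Setting: every point is red and no legend is drawn,
# the ggplot() counterpart of qplot(..., colour = I("red"))
fixed <- ggplot(diamonds, aes(carat, price)) +
  geom_point(colour = "red")
```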

The =geom= argument chooses the kind of plot. ="point"= is the default scatterplot, ="smooth"= fits a smoother to the data and shows its standard error, ="boxplot"= produces a box-and-whisker plot, and ="path"= and ="line"= draw lines between data points (="line"= joins them in order of x, ="path"= in the order they appear in the data).
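A sketch of these geoms as =ggplot()= layers, using the =mpg= and =economics= data sets that ship with ggplot2 (the choice of variables is only for illustration):

```r
library(ggplot2)  # mpg and economics ship with ggplot2

# "point": scatterplot of highway mileage against engine displacement
p_point <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# "boxplot": one box-and-whisker summary per car class
p_box <- ggplot(mpg, aes(class, hwy)) + geom_boxplot()

# "line": observations joined in order of x (here, time)
p_line <- ggplot(economics, aes(date, unemploy)) + geom_line()

# "path": observations joined in the order they appear in the data
p_path <- ggplot(economics, aes(unemploy / pop, uempmed)) + geom_path()
```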

Smoothing

A smoother is added by passing =geom = c("point", "smooth")= to qplot. A particular smoother can be chosen with the =method= argument, e.g. ="loess"=, ="lm"= or ="gam"=.
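In =ggplot()= form the smoother is its own layer, =geom_smooth()=, and =method= is passed to it. A sketch with two smoothers on the =mpg= data (passing =formula = y ~ x= explicitly only silences the message ggplot2 prints about the default formula):

```r
library(ggplot2)

# Points plus a loess smoother; the grey band is the standard error
p_loess <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "loess", formula = y ~ x)

# A straight-line (linear model) fit without the standard-error band
p_lm <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```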

qplot(carat, data = diamonds, geom = "histogram")
qplot(carat, data = diamonds, geom = "density")

Each of these shows the distribution of a single variable.

For histograms, the =binwidth= argument sets the width of the bins.
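A sketch of the two one-variable plots with =ggplot()=; the bin width of 0.1 carat is an arbitrary choice for illustration:

```r
library(ggplot2)

# Histogram of carat with bins 0.1 carat wide
h <- ggplot(diamonds, aes(carat)) + geom_histogram(binwidth = 0.1)

# Kernel density estimate of the same variable
d <- ggplot(diamonds, aes(carat)) + geom_density()
```

Every diamond falls into exactly one bin, so the bin counts add up to the number of rows.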

To compare subgroups, map a discrete variable to the =colour= or =fill= aesthetic.
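A sketch of both options, splitting the =diamonds= distributions by colour grade: =fill= suits filled geoms such as histogram bars, =colour= suits lines such as density curves.

```r
library(ggplot2)

# fill: stacked histogram, bars split by diamond colour grade
h_fill <- ggplot(diamonds, aes(carat, fill = color)) +
  geom_histogram(binwidth = 0.1)

# colour: one density curve per colour grade
d_col <- ggplot(diamonds, aes(carat, colour = color)) +
  geom_density()
```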

Bar Charts

Bar charts count the number of observations in each class. There is no need to specify a bin width, because each class is its own bin.
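A sketch of a bar chart of a discrete variable, plus a weighted variant in which each bar sums a variable (here carat) instead of counting rows:

```r
library(ggplot2)

# One bar per colour grade; the height is the number of diamonds in that class
counts <- ggplot(diamonds, aes(color)) + geom_bar()

# With a weight, each bar sums the weights instead: total carats per grade
carats <- ggplot(diamonds, aes(color)) + geom_bar(aes(weight = carat))
```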

Faceting

Faceting divides the data into subsets and displays the same graph for each subset. To facet with qplot, use the =facets= argument with a =rows ~ columns= formula, like =facets = color ~ size=, or =facets = color ~ .= when only one faceting variable is needed.
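The =ggplot()= counterpart of the =facets= argument is =facet_grid()=, which takes the same =rows ~ columns= formula. A sketch faceting a histogram by diamond colour grade, one row per grade:

```r
library(ggplot2)

# One histogram panel per colour grade, stacked in rows (color ~ .)
p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.1) +
  facet_grid(color ~ .)
```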

A: How can I add legends?

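In ggplot2, legends are produced automatically from the scales used by the geoms, so you do not draw them directly. To change a legend, you modify the corresponding scale (its name, breaks or labels) or the theme (e.g. the legend position).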
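A sketch of this indirect control: the legend title and key labels come from the colour scale, the legend position from the theme. The labels assume that the levels of =drv= in =mpg= sort as "4", "f", "r".

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  # legend title and key labels are properties of the colour scale
  scale_colour_discrete(name = "Drive train", labels = c("4wd", "front", "rear")) +
  # legend placement is a property of the theme
  theme(legend.position = "bottom")
```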

A: How can I use faceting in ggplot?

The following example uses qplot, with one row of panels per drive train and one column per number of cylinders:

qplot(cty, hwy, data = mpg) + facet_grid(drv ~ cyl)

=facet_grid()= works the same way with =ggplot()=.

A: How can I add a linear regression line?

qplot(carat, price, data = diamonds, log = "xy") +
geom_smooth(method = "lm")
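The same plot in =ggplot()= form, as a sketch: =scale_x_log10()= and =scale_y_log10()= replace =log = "xy"=. Because stats are computed after scale transformations, the =lm= fit is made on the logged values and so appears as a straight line.

```r
library(ggplot2)

p <- ggplot(diamonds, aes(carat, price)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +  # linear fit on the log10 values
  scale_x_log10() +
  scale_y_log10()
```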

A: How can I use data from multiple data frames?

All data must be in data frames, and a plot has a single default data frame. However, each layer can use a different dataset: pass a =data= argument to the individual =geom_= function.
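A sketch with two data frames: the raw =mpg= rows in one layer and a hypothetical summary table, =class_means= (built here with base R's =aggregate()=), in another. The second layer inherits the =x= and =y= mappings, so its data frame must contain columns with the same names.

```r
library(ggplot2)

# Hypothetical second data frame: mean highway mileage per car class
class_means <- aggregate(hwy ~ class, data = mpg, FUN = mean)

p <- ggplot(mpg, aes(class, hwy)) +
  geom_point(colour = "grey50") +                           # uses the default data (mpg)
  geom_point(data = class_means, colour = "red", size = 4)  # uses its own data frame
```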

A: How generic is ggplot?

The grammar is general, but the requirement that all data come as data frames is an important restriction.

A: How are mappings between data frame columns and plot aesthetics determined?

The =aes()= function maps data frame columns to the aesthetics of the plot's layers. Different =geom_= functions understand different sets of aesthetics.

A: How does setting =aes()= in a =geom_= function differ from setting it in =ggplot()=?

Mappings given in =ggplot()= are defaults that every layer inherits. If the mappings are the same for every layer, setting them once in =ggplot()= saves typing; otherwise each layer can add or override mappings in its own =geom_= function.
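A sketch of the difference on the =mpg= data. When =colour = drv= is a default in =ggplot()=, the smoother inherits it and fits one line per drive train; when it is given only in =geom_point()=, the smoother fits a single line to all points.

```r
library(ggplot2)

# colour mapped in ggplot(): inherited by both layers, so one fitted line per drive train
p_global <- ggplot(mpg, aes(displ, hwy, colour = drv)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# colour mapped only in geom_point(): the smoother ignores drv and fits a single line
p_local <- ggplot(mpg, aes(displ, hwy)) +
  geom_point(aes(colour = drv)) +
  geom_smooth(method = "lm", formula = y ~ x)
```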

A: How are different coordinate systems used?

The =coord_= functions, such as =coord_cartesian()=, specify the coordinate system. =coord_trans()=, =coord_map()= and =coord_polar()= are probably the most important of these.
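A sketch of two coordinate systems: =coord_cartesian()= with limits zooms in without discarding data (unlike setting scale limits, which drops out-of-range points), and =coord_polar()= turns a single stacked bar into a pie chart.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Zoom to displacements between 2 and 4 litres; all rows are kept
zoomed <- p + coord_cartesian(xlim = c(2, 4))

# Polar coordinates: the stacked bar's heights become pie-slice angles
pie <- ggplot(mpg, aes(x = factor(1), fill = class)) +
  geom_bar(width = 1) +
  coord_polar(theta = "y")
```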

A: What can I learn further from the book?

Layering strategies, plot annotations, themes, reducing duplicate code, converting between qplot and ggplot, aesthetic specifications, and the grid package.

A: What is grouping?

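ggplot2 splits observations into groups using the discrete variables in the plot. When all the variables in =aes()=, such as $x$ and $y$, are continuous, set the =group= aesthetic to a discrete variable that identifies the different classes of data, for example the identifier of each boy when plotting his height over time.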
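A sketch of the boys example, assuming the =Oxboys= data from the nlme package is available (nlme is a recommended package that ships with most R installations). =group = Subject= draws one line per boy; without it, =geom_line()= would join all the points into a single line.

```r
library(ggplot2)

# Heights of boys measured on several occasions; Subject identifies each boy
oxboys <- as.data.frame(nlme::Oxboys)

# group = Subject: one growth curve per boy
p <- ggplot(oxboys, aes(age, height, group = Subject)) +
  geom_line()
```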

A: What is a stat?

Stats are calculations on the dataset that produce new, plottable variables. These computed variables can be referred to in =aes()=; surround their names with dots, like =..density..=, so they are not mistaken for variables in the original dataset.
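A sketch of a histogram drawn on the density scale instead of the count scale. It uses =after_stat(density)=, which is how ggplot2 3.3.0 and later spell =..density..= (the dot-dot form still works but is deprecated from 3.4.0).

```r
library(ggplot2)

# y is the stat's computed variable 'density', not a column of diamonds
p <- ggplot(diamonds, aes(carat)) +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1)
```

On the density scale the bar areas add up to 1.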

A: How to draw maps?

The maps package provides map outlines. Once they are converted to data frames (e.g. with ggplot2's =map_data()=), they can be drawn with =geom_polygon()= and similar geoms.
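A sketch drawing the US states, assuming the maps package is installed (=map_data()= needs it). Mapping =group = group= keeps each polygon separate, and =coord_quickmap()= gives an approximately correct aspect ratio.

```r
library(ggplot2)  # map_data() also requires the maps package to be installed

# State outlines as a data frame with long, lat, group and region columns
states <- map_data("state")

p <- ggplot(states, aes(long, lat, group = group)) +
  geom_polygon(fill = "grey80", colour = "white") +
  coord_quickmap()
```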

A: How to summarize data?

The =stat_summary()= function summarizes the y values at each x. It offers some predefined summary functions and also accepts simple aggregate functions (like =max= or =mean=) as well as more complex ones.
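A sketch of both styles on the =mpg= data, assuming ggplot2 3.3.0 or later (older versions used =fun.y= instead of =fun=). =mean= is a simple aggregate, and =mean_se()= is a predefined summary that returns the mean plus or minus one standard error.

```r
library(ggplot2)

p <- ggplot(mpg, aes(class, hwy)) +
  geom_point(colour = "grey60") +
  # simple aggregate: the mean of hwy within each class, drawn as a red point
  stat_summary(fun = mean, geom = "point", colour = "red", size = 3) +
  # predefined summary: mean +/- one standard error, drawn as error bars
  stat_summary(fun.data = mean_se, geom = "errorbar", width = 0.3)
```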

A: How to set a theme?

A theme can be set globally with =theme_set()=, or for a single plot by adding a theme function such as =theme_grey()= or =theme_bw()= to it with =+=.
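A sketch of both approaches. =theme_set()= returns the previous theme, which makes it easy to restore the old default afterwards.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) + geom_point()

# Local: only this plot uses the black-and-white theme
p_bw <- p + theme_bw()

# Global: theme_set() changes the default for all later plots and returns the old theme
old <- theme_set(theme_bw())
print(p)        # drawn with theme_bw()
theme_set(old)  # restore the previous default
```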

A: How to generate subplots?

Subplots are small plots drawn inside a larger plot, and can summarize the data in another way. Create a region with the =viewport()= function from the grid package and print the second plot into it.
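A sketch of an inset histogram in the lower-right corner of a scatterplot. The viewport's size and position are arbitrary choices. Printing a ggplot with a =vp= argument draws it into that viewport on the current page instead of starting a new one.

```r
library(ggplot2)
library(grid)  # provides viewport()

main <- ggplot(diamonds, aes(carat, price)) + geom_point()
inset <- ggplot(diamonds, aes(carat)) +
  geom_histogram(binwidth = 0.1) +
  theme_bw(base_size = 8)

# A region 40% of the page wide and high, anchored at its bottom-right corner
sub_vp <- viewport(width = 0.4, height = 0.4, x = 0.95, y = 0.1,
                   just = c("right", "bottom"))

print(main)                # draw the large plot on a new page
print(inset, vp = sub_vp)  # draw the subplot into the viewport on the same page
```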

A: What does plyr do? Why is it used here?

The plyr package splits a data frame into groups, applies a function to each group, and combines the results, so any transformation can be applied within subsets of the data.
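A sketch of a per-group transformation with =ddply()=, assuming plyr is installed (it is retired, and dplyr is its successor). Each car class's =hwy= values are rescaled to the range 0 to 1 separately, and the result is an ordinary data frame that can be passed straight to =ggplot()=.

```r
library(plyr)  # assumed to be installed
library(ggplot2)

# Rescale hwy to 0-1 within each car class
scaled <- ddply(as.data.frame(mpg), .(class), transform,
                hwy_scaled = (hwy - min(hwy)) / (max(hwy) - min(hwy)))

p <- ggplot(scaled, aes(displ, hwy_scaled, colour = class)) + geom_point()
```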

A: How to reduce duplicate code?

The =last_plot()= function returns the last plot created or modified. You can also store groups of plot components in a list and add them to several plots. qplot itself uses these tricks to build ggplot objects.
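A sketch of both tricks. =common= is a hypothetical name for a reusable list of components; adding a list to a plot adds each element in turn.

```r
library(ggplot2)

# A reusable group of plot components
common <- list(
  geom_smooth(method = "lm", formula = y ~ x),
  theme_bw()
)

p1 <- ggplot(mpg, aes(displ, hwy)) + geom_point() + common
p2 <- ggplot(mpg, aes(cty, hwy)) + geom_point() + common

# last_plot() returns the most recently created or modified plot (here p2)
titled <- last_plot() + ggtitle("Highway vs city mileage")
```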