Session 8
2023-10-18
Consider:
This next section is straight from R for Data Science (2e) - 11 Exploratory data analysis.
“More than anything, EDA is a state of mind.”
When you ask a question…
When you start with a dataset, you might begin by looking at a general summary, using functions such as:
“These work really well when you’ve got a small amount of data, but when you have more data, you are generally limited by how much you can read.”
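A minimal sketch of that summary-first look, using only base R. The choice of functions (`dim()`, `str()`, `summary()`, `head()`) and the built-in `mtcars` data frame are assumptions for illustration; the source does not name them.

```r
# Assumption: the built-in mtcars data frame stands in for "a dataset".
df <- mtcars

dims <- dim(df)             # number of rows and columns
str(df)                     # column types plus the first few values of each
num_summary <- summary(df)  # min, quartiles, mean, and max for each column
print(num_summary)
head(df, n = 5)             # first five rows
```

With only 11 columns this output is easy to read in full; on a wide or long dataset the same calls quickly produce more text than anyone can scan, which is the limitation the quote describes.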
To understand the subgroups, ask:
What makes a value unusual?
Handling unusual values can include:
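Two common options, both discussed in R for Data Science, are dropping the affected rows or replacing the unusual values with `NA`. The sketch below shows each in base R. The data frame `measurements`, its values, and the plausibility range are all hypothetical and chosen only for illustration.

```r
# Hypothetical data: widths in mm, where 0 and 58.9 are implausible entries.
measurements <- data.frame(
  id    = 1:6,
  width = c(3.9, 4.1, 0, 4.4, 58.9, 4.0)
)

# Assumed plausibility range, for this illustration only.
lower <- 3
upper <- 20

# Option 1: drop the rows that contain unusual values.
in_range <- measurements$width >= lower & measurements$width <= upper
dropped <- measurements[in_range, ]

# Option 2: keep every row but replace the unusual values with NA.
cleaned <- measurements
cleaned$width[!in_range] <- NA

nrow(dropped)              # rows left after dropping
sum(is.na(cleaned$width))  # values replaced with NA
```

Replacing with `NA` keeps the other columns of those rows available for analysis, whereas dropping rows discards them entirely.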
If a systematic relationship exists between two variables it will appear as a pattern in the data. If you spot a pattern, ask yourself:
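As a rough sketch of putting numbers on a pattern, the base R code below measures the strength and direction of a relationship between two variables. The built-in `faithful` dataset and the use of a Pearson correlation plus a straight-line fit are assumptions for illustration, not the approach prescribed by the source.

```r
# Assumption: the built-in faithful dataset (Old Faithful eruptions) is the example.
# eruptions = eruption length in minutes, waiting = minutes until the next eruption.

# How strong is the relationship? Pearson correlation between the two variables.
r <- cor(faithful$eruptions, faithful$waiting)

# What is its direction? Slope of a simple linear fit.
fit <- lm(waiting ~ eruptions, data = faithful)
slope <- unname(coef(fit)["eruptions"])

r
slope
```

A strong correlation only describes the pattern; whether it reflects coincidence, a third variable, or a real mechanism still has to be reasoned about separately.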
naniar provides principled, tidy ways to summarise, visualise, and manipulate missing data with minimal deviations from the workflows in ggplot2 and tidy data.
Visualizations of Distributions and Uncertainty • ggdist
ggdist is an R package that provides a flexible set of ggplot2 geoms and stats designed especially for visualizing distributions and uncertainty.
Create Codebooks from Data Frames • codebookr
The codebookr package is intended to make it easy for users to create codebooks (also called data dictionaries) directly from an R data frame.
Option to put interactive elements in an HTML table — opt_interactive • gt
inspectdf is a collection of utilities for columnwise summary, comparison and visualisation of data frames. Functions are provided to summarise missingness, categorical levels, numeric distribution, correlation, column types and memory usage.
Datasette is a tool for exploring and publishing data. It helps people take data of any shape, analyze and explore it, and publish it as an interactive website and accompanying API.