- Histograms
- Shows the distribution of a numeric variable (its spread, skew, and any clusters) in a way that the mean or median alone cannot
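This is a minimal sketch in Python using only the standard library. The two made-up datasets below have the same mean and median, but their histograms look completely different. The `histogram` helper and the bin width of 10 are illustrative choices, not a standard API. In practice, a plotting library would draw the bars.

```python
from statistics import mean, median

BIN_WIDTH = 10  # illustrative choice; different widths can change the picture

def histogram(values, bin_width):
    """Count how many values fall into each bin [start, start + bin_width)."""
    counts = {}
    for v in values:
        start = (v // bin_width) * bin_width  # left edge of the bin holding v
        counts[start] = counts.get(start, 0) + 1
    return dict(sorted(counts.items()))

# Hypothetical data: one tight cluster near 50 vs. three separate clusters.
clustered = [48, 49, 50, 50, 51, 52]
spread_out = [10, 10, 50, 50, 90, 90]

for name, data in [("clustered", clustered), ("spread_out", spread_out)]:
    print(name, "mean:", mean(data), "median:", median(data))
    for start, count in histogram(data, BIN_WIDTH).items():
        # Draw each bin as a row of '#' characters, one per value.
        print(f"  {start:>2}-{start + BIN_WIDTH - 1:<2} {'#' * count}")
```

Both lists have a mean and median of 50. However, the first histogram places every value in the 40-49 and 50-59 bins, while the second shows three separate groups at 10, 50, and 90.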
- Time Series
- Any chart that shows a trend over time
- Scatter Plots
- A way to visualize how two numeric variables are related in your data
- Helps you find outliers
- Bar Graphs
- A convenient way to compare numeric values of several groups
- Grouped bars also work well for comparing several metrics side by side within each group (example: revenue and cost)
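This is a dependency-free sketch of the revenue-and-cost comparison. It prints grouped text bars for each region so the two metrics sit side by side. The regions and figures are hypothetical. A real chart would normally come from a plotting library.

```python
# Hypothetical revenue and cost per region (in thousands of dollars).
data = {
    "North": {"revenue": 120, "cost": 80},
    "South": {"revenue": 90, "cost": 100},
    "West": {"revenue": 150, "cost": 100},
}

def text_bar_chart(data, scale=10):
    """Return one text bar per metric per group; each '#' represents `scale` units."""
    lines = []
    for group, metrics in data.items():
        for metric, value in metrics.items():
            bar = "#" * round(value / scale)
            lines.append(f"{group:<6}{metric:<8}{bar} {value}")
    return lines

for line in text_bar_chart(data):
    print(line)

# Comparing the bars within each group makes it easy to spot where cost exceeds revenue.
losses = [group for group, m in data.items() if m["cost"] > m["revenue"]]
print("cost exceeds revenue in:", losses)
```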
- What to do when your data is too big
- 1. Data Aggregation
- Summarize the data (for example, totals, counts, or averages per group or per time period) so that the important patterns are easy to see
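This is a minimal sketch of aggregation using only the Python standard library. Individual purchase events (hypothetical data) are collapsed into one summary row per day. That kind of summary can then be plotted as a time series or bar chart, which is more practical than plotting every raw event.

```python
from collections import defaultdict

# Hypothetical raw purchase events: (date, user_id, amount).
events = [
    ("2024-01-01", "u1", 20.0),
    ("2024-01-01", "u2", 5.0),
    ("2024-01-02", "u1", 12.5),
    ("2024-01-02", "u3", 7.5),
    ("2024-01-02", "u2", 10.0),
    ("2024-01-03", "u3", 3.0),
]

def aggregate_by_day(events):
    """Collapse individual events into one {count, total} summary per day."""
    summary = defaultdict(lambda: {"count": 0, "total": 0.0})
    for day, _user, amount in events:
        summary[day]["count"] += 1
        summary[day]["total"] += amount
    return dict(sorted(summary.items()))

for day, stats in aggregate_by_day(events).items():
    print(day, stats["count"], stats["total"])
```

Six events become three daily rows here. With millions of events, the same idea reduces the data to a few hundred points that are easy to chart.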
- 2. Sampling
- Select a random subset of the data and use it to estimate the properties of the full dataset
- Example: Polling firms use the same technique when they survey opinions about political candidates, because it is impossible to survey everyone in a country
- Notes:
- * Try sampling a few times to make sure the result you're seeing doesn't depend on the particular sample you selected.
- * The fact that you can sample a subset of your data and still get meaningful results suggests that perhaps you don't need to collect all the data in the first place. For example, if your website has 100 million users, it might not be necessary to track all of their behavior on the website. Sampling only 10 percent of the users might be sufficient for data analysis purposes.
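This is a minimal sketch of the sampling notes above. It uses a simulated population of 100,000 hypothetical users, where each value records whether the user clicked something. The sketch draws a 10 percent simple random sample (without replacement) several times with different seeds. It then compares each estimate to the rate computed from the full data. The population size, click rate, and `sample_rate` helper are all illustrative assumptions.

```python
import random

# Hypothetical population: 100,000 users, about 30% of whom clicked a button.
population_rng = random.Random(0)
population = [population_rng.random() < 0.30 for _ in range(100_000)]
true_rate = sum(population) / len(population)  # True counts as 1 when summed

def sample_rate(data, fraction, seed):
    """Estimate the click rate from a simple random sample without replacement."""
    rng = random.Random(seed)
    sample = rng.sample(data, k=int(len(data) * fraction))
    return sum(sample) / len(sample)

# Repeat the sampling with several seeds to check that the estimate is stable.
estimates = [sample_rate(population, fraction=0.10, seed=s) for s in range(5)]

print(f"full data: {true_rate:.4f}")
for s, est in enumerate(estimates):
    print(f"sample {s}: {est:.4f}")
```

Each estimate should land close to the full-data rate, and the five estimates should agree with each other. Large swings between seeds would suggest the sample is too small for the question being asked.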