- **2e.** Check that in both `sprint.m.df` and `sprint.w.df`, each value in the
  `City.Date` column appears only once (i.e., there are no duplicated values).
  Do this in a way that you find suitable, but when you Knit this Lab,
  the results that demonstrate this claim should be visible in the HTML file.

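```{r}
# duplicated() marks repeats, so any() is TRUE only if a City.Date value recurs
duplicate.m.bool = any(duplicated(sprint.m.df$City.Date))
duplicate.w.bool = any(duplicated(sprint.w.df$City.Date))
# Both entries are FALSE when every City.Date is unique
c(men = duplicate.m.bool, women = duplicate.w.bool)
```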
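
To see how such a check behaves, here is a minimal sketch on a made-up vector; `toy.dates` is hypothetical and not part of the lab data. `duplicated()` flags the second and later occurrences of a value, while `anyDuplicated()` returns the index of the first repeat, or 0 if there is none.

```{r}
# Hypothetical values standing in for a City.Date column
toy.dates = c("Rome.2000", "Oslo.2001", "Rome.2000", "Paris.2003")
# One logical per element: TRUE where the value was already seen earlier
dup.flags = duplicated(toy.dates)
# Index of the first repeated element, or 0 if all values are unique
first.dup = anyDuplicated(toy.dates)
# After dropping repeats, the check reports no duplicates
no.dups = anyDuplicated(unique(toy.dates)) == 0
dup.flags
first.dup
no.dups
```
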

Merging data
===

- **3a.** In preparation for merging `sprint.m.df` and `sprint.w.df`, we first
  want to find all the sprints that occur in the same race in both data frames.
  Specifically, remove all the rows in `sprint.m.df` that have a `City.Date`
  that does not occur in `sprint.w.df`. Likewise, remove all the rows in
  `sprint.w.df` that have a `City.Date` that does not occur in `sprint.m.df`.
  Then, remove the `City` and `Date` columns in both data frames.
  (Hint: You might be interested in the `%in%` function in R. Try looking this up
  to see what it does.)
  In the end, both `sprint.m.df` and `sprint.w.df` should have 385 rows and 7 columns.
  Print out the first 3 lines of `sprint.m.df` and `sprint.w.df` afterwards.

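```{r}
# Compute both masks before subsetting, so each uses the unfiltered other frame
keep.m = sprint.m.df$City.Date %in% sprint.w.df$City.Date
keep.w = sprint.w.df$City.Date %in% sprint.m.df$City.Date
sprint.m.df = sprint.m.df[keep.m, ]
sprint.w.df = sprint.w.df[keep.w, ]
# Drop the City and Date columns from both data frames
sprint.m.df = sprint.m.df[, !(names(sprint.m.df) %in% c("City", "Date"))]
sprint.w.df = sprint.w.df[, !(names(sprint.w.df) %in% c("City", "Date"))]
# Each should report 385 rows and 7 columns
dim(sprint.m.df)
dim(sprint.w.df)
head(sprint.m.df, n = 3)
head(sprint.w.df, n = 3)
```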
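
As a small illustration of the `%in%` hint, the sketch below filters two made-up data frames down to their shared keys. The names `toy.m`, `toy.w`, `toy.keep.m`, and `toy.keep.w` and all values are hypothetical; the real sprint data has more columns.

```{r}
# Hypothetical data frames, not the lab's sprint data
toy.m = data.frame(City.Date = c("Rome.2000", "Oslo.2001", "Paris.2003"),
                   Time = c(9.9, 10.0, 10.1))
toy.w = data.frame(City.Date = c("Oslo.2001", "Paris.2003", "Berlin.2004"),
                   Time = c(10.9, 11.0, 11.1))
# x %in% y returns one logical per element of x: is it found anywhere in y?
toy.keep.m = toy.m$City.Date %in% toy.w$City.Date
toy.keep.w = toy.w$City.Date %in% toy.m$City.Date
# Logical row indexing keeps only the rows marked TRUE
toy.m = toy.m[toy.keep.m, ]
toy.w = toy.w[toy.keep.w, ]
toy.m
toy.w
```

Note that `%in%` is not symmetric, which is why each data frame needs its own mask.
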
- **3b.** We will now complete the manual merge of `sprint.m.df` and `sprint.w.df`.
  Here is the sequence of steps: First, check that the order of values in `City.Date` in
  `sprint.m.df` matches exactly the order in `sprint.w.df`. Second, use the `cbind()`
  function appropriately to create a new data frame `sprint.df` that has 13 columns.
  The first column should be `City.Date`, the next 6 columns should contain all the
  remaining columns from `sprint.m.df`, and the last 6 columns should contain all the
  remaining columns from `sprint.w.df`. Of course, each row should correspond to
  sprints from the same `City.Date`. Print out the first 3 lines of `sprint.df`
  afterwards.

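```{r}
# Confirm that the City.Date values line up row by row before binding
identical(as.character(sprint.m.df$City.Date), as.character(sprint.w.df$City.Date))
# City.Date first, then the 6 other men's columns, then the 6 other women's columns
m.rest = sprint.m.df[, names(sprint.m.df) != "City.Date"]
w.rest = sprint.w.df[, names(sprint.w.df) != "City.Date"]
sprint.df = cbind(City.Date = sprint.m.df$City.Date, m.rest, w.rest)
head(sprint.df, n = 3)
```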
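
The two steps above can be sketched on made-up data as follows. The data frames `toy.m2` and `toy.w2` and their values are hypothetical. `cbind()` matches rows purely by position and has no `by` argument, so the alignment check has to come first.

```{r}
# Hypothetical, already-filtered data frames with rows in the same order
toy.m2 = data.frame(City.Date = c("Oslo.2001", "Paris.2003"), Time = c(10.0, 10.1))
toy.w2 = data.frame(City.Date = c("Oslo.2001", "Paris.2003"), Time = c(10.9, 11.0))
# Stop early if the keys do not line up row by row
stopifnot(identical(toy.m2$City.Date, toy.w2$City.Date))
# drop = FALSE keeps a single selected column as a data frame
toy.df = cbind(City.Date = toy.m2$City.Date,
               toy.m2[, "Time", drop = FALSE],
               toy.w2[, "Time", drop = FALSE])
dim(toy.df)
names(toy.df)
```

Because both inputs share the column name `Time`, the result carries that name twice; renaming the columns before binding is one way to tell them apart.
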