Recursively download data - Tidyverse way

I'm thinking of trying my hand at this week's #TidyTuesday exercise.

I'll take this in small steps and hopefully get to the end.

Here is an approach to get the data into RStudio!

setwd("~/Dropbox/pandora/My-Projects/repos/diary/writing/")

# credit: https://github.com/WSJ/measles-data
# [Tom Mock on Twitter: "The @R4DScommunity welcomes you to week 9 of #TidyTuesday! We're exploring Measles vaccination!! 📁 https://t.co/sElb4fcv3u 📰 https://t.co/69BCOyRJm2 #r4ds #tidyverse #rstats #dataviz https://t.co/Klj5yeUm26" / Twitter](https://twitter.com/thomas_mock/status/1232034689281601536)
library(data.table)

# Data for each individual school
url_all <- "https://github.com/WSJ/measles-data/blob/master/all-measles-rates.csv?raw=true"
dt_all <- fread(url_all)

# More generalized data by state counties or state school districts
url_state_overview <- "https://github.com/WSJ/measles-data/raw/master/state-overviews.csv"
dt_state_overview <- fread(url_state_overview)


# Same data as all-measles-rates, but separated by state
# Example: Arizona
url_state_arizona <- "https://github.com/WSJ/measles-data/raw/master/individual-states/arizona.csv"
dt_state_arizona <- fread(url_state_arizona)

# need to learn to use tidyverse
library(tidyverse)

df_all <- as_tibble(dt_all)

# get state data as filters on df_all (creates one df_<state> object per state)
df_all %>%
  distinct(state) %>%
  pull(state) %>%
  walk(~ assign(paste0("df_", tolower(.x)), filter(df_all, state == .x), envir = .GlobalEnv))


# Let's actually download the state data, straight from GitHub.
# WSJ provides it in a folder organized by state name.
library(glue)  # glue() is used inside the function below

get_state_measles_data <- function(state) {
  tryCatch({
    print(glue("downloading data for ... {state}"))

    # col_types = cols() suppresses the column-spec message: https://github.com/tidyverse/readr/issues/530
    d <- read_csv(glue("https://github.com/WSJ/measles-data/raw/master/individual-states/{state}.csv"), col_types = cols())
    assign(glue("df_{state}"), d, envir = .GlobalEnv)
  },
  error = function(e) {
    # report the failure and carry on with the next state
    print(glue("error while downloading data for ... {state}"))
  })
}

# get the states
states <- df_all %>%
  distinct(state) %>%
  mutate(state = tolower(state)) %>%
  pull(state)


library(glue)
glue("# of states: {NROW(states)}")

walk(states, get_state_measles_data)

# working with lists in tidy: [jennybc/repurrrsive: Recursive lists to use in teaching and examples, because there is no iris data for lists.](https://github.com/jennybc/repurrrsive)
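
Since the script creates one `df_<state>` object per state with `assign()`, a possible alternative is to keep the per-state data frames in a single named list and iterate over that with purrr. The snippet below is only a minimal sketch of that idea: `df_toy` and its rows are made up for illustration (they are not the WSJ data), and `by_state` and `rows_per_state` are hypothetical names.

```r
library(dplyr)
library(purrr)

# Toy stand-in for df_all (made-up rows, NOT the WSJ measles data)
df_toy <- tibble(
  state = c("Arizona", "Arizona", "Ohio", "New York"),
  name  = c("School A", "School B", "School C", "School D")
)

# One data frame per state, collected in a named list
# instead of separate df_<state> objects in the global environment
by_state <- split(df_toy, tolower(df_toy$state))

# List names are the lower-cased state names; spaces are fine here
names(by_state)
by_state[["new york"]]

# Iterate over the list with purrr, e.g. count rows per state
rows_per_state <- map_int(by_state, nrow)
rows_per_state
```

A list keeps the global environment uncluttered, and state names that contain spaces do not turn into awkward variable names.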