Second Data Lab
R Version

*(1) - (3) : Load the March2010.csv file into R

*(4) : Get a feel for the data.

    Helpful functions:
      - summary   # summarize a variable or whole data set
      - class     # ask R what kind of variable it considers something to be
      - table     # for discrete variables, get a table of the values it takes
      - head/tail # look at the first/last several entries of a variable or whole data set
      - names     # get the names of all of the variables of a data set

    Basic data.table syntax:

      DT[i, j, by]

      i : which rows do we want?
      j : what do we want to do with those rows?
      by: how would we like to group the j operation?

    So if we wanted to compute the average height of men by age, we might do:

      DT[gender == "M", mean(height), by = age]

*(5) : Read the following
      - http://cps.ipums.org/cps-action/variables/WTSUPP
      - ?weighted.mean

*(6) : Answer the following with the data
      - What percent of survey respondents are married?
      - What percent of the _population_ is married? (need sample weights)

*(7) : Repeat (6) for the sample restricted to household heads.

*(8) : Create a logical variable that is TRUE if an individual lives in California, FALSE otherwise

    Helpful syntax:

    Continuing the average male height by age example above, we could have used := to add the result to our table as follows:

      DT[gender == "M", avg_height := mean(height), by = age]

    After running this, there will be a new column, called avg_height, which will be equal, for every male of age A, to the average male height at that age. The variable will be MISSING (NA) for all females (anyone with gender != "M").

    Suppose we wanted to add the average height by age AND gender. We could have done:

      DT[ , avg_height := mean(height), by = .(age, gender)]

*(9) : Estimate the percent of individuals who live in California

*(10) : Create an individual income variable named "iincome" equal to household income divided by the square root of the number of individuals in the household.
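
As a rough illustration of the data.table syntax and sample weights described in steps (4) through (8), here is a minimal sketch on a made-up table. The table DT, its columns (gender, age, height, wt), and the is_tall variable are all invented for this example and are not part of the CPS extract; it assumes the data.table package is installed.

```r
library(data.table)

# Toy data, made up for illustration (NOT the March2010.csv extract)
DT <- data.table(
  gender = c("M", "M", "F", "F", "M"),
  age    = c(30, 30, 30, 40, 40),
  height = c(70, 72, 64, 66, 68),
  wt     = c(1, 1, 3, 3, 2)   # hypothetical sample weights
)

# i, j, by: average height of men, grouped by age
# (an unnamed j result is returned in a column called V1)
men_by_age <- DT[gender == "M", mean(height), by = age]

# Share of the sample vs. weighted share of the population:
# the mean of a logical vector is the fraction of TRUE values
share_sample <- DT[, mean(gender == "M")]
share_pop    <- DT[, weighted.mean(gender == "M", wt)]

# := adds a column by reference; rows not selected by i are left NA
DT[gender == "M", avg_height := mean(height), by = age]

# A logical column built from a comparison
DT[, is_tall := height > 67]

print(men_by_age)
print(c(sample = share_sample, population = share_pop))
print(DT)
```

The two shares differ here only because the invented weights are unequal; with real survey data, which one is appropriate depends on whether the question is about respondents or about the population.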
*(11) : Draw histograms and kernel density plots of iincome for the whole sample

    Helpful functions:
      - hist
      - density

*(12) : Plot and save a histogram of the income distribution for each state.

    Helpful syntax:

    R saves plots "to external devices" through calls to the device name. There's a device for creating .png files, .jpeg files, .bmp files, .pdf files, etc. I recommend using either .pdf files (vector graphics) or .png files. To use them, we wrap the plot call in png and dev.off() like so:

      png("path/to/output_image.png")
      plot(x, y)
      # other plotting commands
      # ...
      dev.off()

    The png call tells R to start saving plots to the file named output_image.png in the directory path/to. The dev.off() command tells R to stop adding things to that file and "save it/close it."

    I'll also show you some tricks to do this very concisely once you play around trying to do it a bit.
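
The device pattern above can be sketched end to end as follows. This is only an illustration under stated assumptions: the incomes are simulated, the group labels "A" and "B" stand in for states, and the output directory is a temporary folder chosen for the example. It uses pdf() rather than png(); the open, plot, dev.off() sequence is the same for either device.

```r
# Simulated incomes for two made-up groups (NOT the lab data)
set.seed(1)
income <- c(rlnorm(200, meanlog = 10), rlnorm(200, meanlog = 10.5))
group  <- rep(c("A", "B"), each = 200)

out_dir <- tempdir()   # hypothetical output location for this sketch

# One histogram file per group: open device, plot, close device
for (g in unique(group)) {
  out_file <- file.path(out_dir, paste0("income_hist_", g, ".pdf"))
  pdf(out_file)
  hist(income[group == g],
       main = paste("Income, group", g),
       xlab = "income")
  dev.off()   # closes the file so it is written to disk
}

# Kernel density plot of the whole simulated sample
density_file <- file.path(out_dir, "income_density.pdf")
pdf(density_file)
plot(density(income), main = "Kernel density of income")
dev.off()

print(list.files(out_dir, pattern = "^income_"))
```

Forgetting dev.off() is the usual reason an output file looks empty or cannot be opened, since the device is still holding it open.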