// analysis using stata workshop, econ311
// kbott | instructional technology services | reed college
// analysis questions? feedback? kbott@reed.edu
// general data questions? contact the data@reed team at data@reed.edu

//////////////////// INTRO AND BASICS

** tell Stata not to pause at the end of every results screen
. set more off

// preloaded datasets, useful for training/learning
. sysuse dir

** select the auto dataset (1978 automobile data)
. sysuse auto

** summarize full dataset or one variable (# obs, mean, std. dev., min, max)
. summarize
** ...one variable
. sum rep78

** describe full dataset
. describe
** ...or one variable (variable name, storage type, display format, value label, variable label)
. d rep78

** step one is always LOOK AT YOUR DATA!
** open data browser
. browse
* note: can also just type " br "
// . br

** codebook, full dataset or one variable (type, range, units, unique, missing, mean, std. dev., percentiles)
. codebook
. codebook rep78

** clear out data in memory (good practice; not required for system data)
. clear
. sysuse auto

** tabulate = variable, frequency, percent, cumulative %
** by variable
. tabulate foreign
** in pairs of variables
. tabulate foreign rep78
** hint for homework3
. tabulate make rep78

// KBOTT'S FIRST-GLANCE TOOLBOX
// .sum
// .codebook
// .d
// .tab
// .inspect
// .browse
// .list (in, if)

** browse = view data in data browser
. browse if foreign == 1
. browse if foreign != 1
** another way of stating this
// browse if foreign ~= 1
** & joins multiple conditions
. browse if mpg > 5 & mpg < 20
** view subsets by range of observations (row numbers) using in
. browse make mpg in 1/10

// ALTERING DATA -- these do exactly what they sound like. check help doc for syntax
//. sort
//. drop
//. keep
//. replace
//. gen
//. egen
** gen and egen have some of the same functionality
** egen = extended generate

/////////////////////////////////// DATA ANALYSIS

/////////// VISUALIZE CAR REPAIR DATA (stock dataset) -- HISTOGRAMS
. sysuse auto
. hist price, freq
. hist price, freq bin(5)
. hist price, freq bin(15)
. hist price if foreign==1, freq bin(15)

/////////// VISUALIZE CAR REPAIR DATA (stock dataset) -- SCATTERPLOTS
. twoway scatter mpg weight
. twoway scatter mpg weight || lfit mpg weight
// ah, but you can be lazier in those commands...
. scatter mpg weight
. scatter mpg weight || lfit mpg weight

////////// ANALYZE CAR REPAIR DATA (stock dataset)
** correlation
. correlate mpg weight
** regression
. regress mpg weight
** regression -- can also do by specific groups OR for subsets
** note: by needs the data sorted by the group variable (auto already is); bysort foreign: sorts first
. by foreign: regress mpg weight
. regress mpg weight if foreign==1

//////// HOMEWORK #3
// measures of central tendency + dispersion
//// summarize
//// tabulate
/// ... other commands?
// visualization
//// scatter
// analysis
//// regress
//// predict
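
//////// SKETCH -- PUTTING THE TOOLBOX TOGETHER (one possible approach, not the required homework answer)
// a minimal sketch of how the commands listed above might fit together on the auto data.
// assumptions: the new variable names (price_k, mean_mpg, mpg_hat, mpg_resid) are made up
// for illustration; pick your own variables and statistics for the actual homework.
. sysuse auto, clear

** gen vs. egen: gen evaluates an expression for each observation; egen adds functions such as group means
. gen price_k = price / 1000
. egen mean_mpg = mean(mpg), by(foreign)

** central tendency + dispersion: detail adds percentiles (including the median), variance, skewness
. summarize mpg, detail
. tabstat mpg weight, statistics(mean median sd iqr) by(foreign)

** predict runs after regress: fitted values (xb) and residuals
. regress mpg weight
. predict mpg_hat, xb
. predict mpg_resid, residuals

** eyeball the fit: residuals vs. weight with a reference line at zero
. scatter mpg_resid weight, yline(0)
. list make mpg mpg_hat mpg_resid in 1/5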