


Generating new variables

The egen command is the extended generate: unlike generate, it requires a function to be specified to create a new variable. The general syntax is

egen newvar = function(arguments)

Commonly used functions include, but are not limited to, mean(), sd(), min(), max(), rowmean(), diff(), total(), std() and group(). Type help egen to view a complete list and descriptions of the functions that go with egen.

total()

egen total_weight = total(weight), by(foreign)

creates, for every car, the total weight of all cars of its type (domestic or foreign). Note that total() treats missing values as 0. Therefore, if we want to include only the nonmissing cases, we need to add an if qualifier instead:

egen total_weight = total(weight) if !missing(weight), by(foreign)

rowmean()

egen car_space = rowmean(headroom length)

creates an arbitrary measure of car space using the mean of headroom and car length. If, in some observations, one of the two variables headroom and length is missing, rowmean() ignores the missing value and uses the nonmissing one for the calculation; only if both are missing does it return a missing value. Compare this with the generate method:

gen car_space2 = (headroom+length)/2

Here, if any of the variables has a missing value, generate returns a missing value for that whole observation. In this example neither variable contains missing values, so this difference does not arise.

group()

egen group_id = group(old_group_var)

creates a new group id with numeric values for the categorical variable old_group_var. For example, applied to the categorical variable region, group() generates a new group id with values from 1 to 4, and the id variable is then converted to a string.

cut()

egen newvar = cut(var), at(#,#,…,#)

provides one more method of recoding numeric variables into categorical ones. The #s specify the cut-offs, and each interval includes its left end point but not its right one. For example,

egen price4 = cut(price), at(3291,5000,15906)

recodes price into price4 with two intervals, [3291,5000) and [5000,15906). Because 15906 itself lies outside both intervals, a car priced at exactly 15906 gets a missing value for price4.

egen newvar = cut(var), group(#)

alternatively divides var into # groups of approximately equal frequency. For example,

egen price5 = cut(price), group(5)

generates price5 with 5 groups of roughly the same size.

The four methods of transforming numeric to categorical variables that we have come across so far:

ends()

egen newvar = ends(strvar)

takes out whatever precedes the first space in the string, or the entire string if the string variable does not contain a space. For example,

egen make3 = ends(make)

takes out the car make from the combination of make and model, splitting at the space between the two. The punct() option changes the character on which the string is parsed; the default is to parse on the space. For example,

egen make4 = ends(make), punct(.)

takes out either the portion that precedes the "." or, if make contains no ".", the entire string. The trim and head|last|tail options further allow one to choose the portion of the string to take out: head, the first substring; last, the last substring; or tail, the remaining substring following the first parsing character. For example,

egen make5 = ends(make), trim last

parses out the last portion of make.

See also the by prefix with min(), max(), sum(), mean(), etc.

Roy Mill, "Step #4: Thank God for the egen Command"

Creating Indicator Variables (Dummy Variables)

An indicator variable denotes whether something is true, coded 1, or false, coded 0.
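A minimal Stata sketch of creating indicator variables follows. It assumes Stata's bundled auto dataset (loaded with sysuse auto), and the variable names high_rep, mean_weight and heavy are hypothetical, chosen only for illustration. The first dummy shows why an if qualifier matters. Stata treats a missing value as larger than any number, so a comparison like rep78 >= 4 evaluates to 1 when rep78 is missing. The second dummy combines a group statistic from egen with a logical expression.

```stata
* Sketch: indicator (dummy) variables on Stata's bundled auto data.
* high_rep, mean_weight and heavy are hypothetical names for illustration.
sysuse auto, clear

* 1 if the repair record is 4 or 5, 0 if it is 1 to 3.
* Without the if qualifier, cars with missing rep78 would be coded 1,
* because Stata treats missing values as larger than any number.
generate byte high_rep = (rep78 >= 4) if !missing(rep78)
tabulate high_rep, missing

* 1 if a car is heavier than the mean weight of its type (domestic/foreign).
bysort foreign: egen mean_weight = mean(weight)
generate byte heavy = (weight > mean_weight) if !missing(weight)
tabulate foreign heavy
```

The comparison in parentheses already evaluates to 1 or 0, so no separate replace step is needed. The if qualifier is what keeps observations with unknown values missing rather than silently coding them as 1.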
