spundir - Stata ado-file: create undirected dyad spatial effect variable

Eric Neumayer, LSE, Department of Geography and Environment

Thomas Plümper, Vienna University of Economics, Department of Socioeconomics

Syntax:

 spundir lagvar [if] [in], weightvar(varname) i(varname) j(varname)
                 link(options) [options]

options description:

time(varname)     contains the numeric time variable
exclusive         exclusive undirected dyad contagion
norowst           spatial effect variable not row-standardized
nomerge           no automatic merge of spatial effect variable into original dataset
labelname(name)   name of label given to spatial effect variable
sename(name)      name to be given to created spatial effect variable
filename(name)    name of file to which spatial effect variable saved

Description:

spundir generates an undirected dyad contagion spatial effect variable for analysis of spatial dependence in undirected dyad data. It can create spatial effect variables for spatial lag, spatial-x and spatial error models. See Neumayer and Plümper (2010) for a discussion of the difference between monadic and dyadic data. See Plümper and Neumayer (2010, 2016) for a discussion of model specification in the analysis of spatial dependence.

Background information:

Dyadic data consists of observations in which two units form a pair (the dyad). In directed dyadic data, the interaction between two dyad members ij initiates with i and is directed toward j. In the directed dyad ij, unit i is called the source, while unit j is called the target of the interaction. It differs from the directed dyad ji, in which unit j is the source and unit i is the target. In undirected dyadic data, by contrast, one can still distinguish unit i from unit j, but it is either not possible to distinguish between the dyad ij and the dyad ji or one does not want to make such a distinction.

Normally, to generate spatial effect variables for dyadic data, one would need a so-called 4-adic dataset, which connects dyads with dyads. In many applications, such a dataset would be far too large to be handled by a standard PC. Fortunately, this ado-file can be used without such a dataset as it parses through a virtual 4-adic dataset generated from a standard dyadic dataset. Users should be warned, however, that it can take from several seconds to several minutes, hours or days to generate the spatial effect variable, depending on the size of the dyadic dataset.

To generate a spatial effect variable for undirected dyadic data, one thus merely needs an undirected dyadic dataset that contains at least four variables. The first variable identifies unit i and the second identifies unit j. The third is the variable to be spatially lagged (e.g., the undirected dyadic dependent variable in spatial lag models). The fourth is a weighting or connectivity variable that links unit i with unit j. This weighting variable may or may not be directed. However, if the weighting variable is directed, then one needs a fully symmetric dyadic dataset (see below). Users need not worry about creating weights linking unit i with other units k or linking unit j with other units m. The ado-file virtually transforms the connectivity variable linking unit i with unit j so that it instead links unit i with other units k, unit j with other units m, or a simple combination of the two (sum or product), depending on the choice made in link(options). If the spatial effect variable is to be time-variant, then one additionally needs a fifth variable that identifies time.

Often, undirected dyadic datasets are organised such that if dyad ij is contained in the dataset, then dyad ji is excluded, and vice versa. The reason is that one of the dyads contains redundant information given that the value of the variable to be spatially lagged for ij equals that of ji. If the dataset is in this non-symmetric format, then it must be the case that the dataset contains only those dyads for which i is smaller than or equal to j and excludes all dyads for which i is larger than j, which follows common practice. Thus, for example, if i and j both run from 1 to 4, then the dataset would contain the dyads 1-2, 1-3, 1-4, 2-3, 2-4 and 3-4, but would exclude dyads 2-1, 3-1, 3-2, 4-1, 4-2 and 4-3. (Dyads 1-1, 2-2, 3-3 and 4-4 may also be included if a dyadic relationship of a unit with itself is logically possible, which depends on the type of relationship studied.)
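As a small illustration of the non-symmetric format, the following Stata sketch builds the four-unit example above and keeps only the dyads for which i is smaller than j. The variable names i and j are placeholders, and self-dyads are left out here on the assumption that a unit's relationship with itself is not meaningful in the application at hand.

```stata
* Toy illustration only: i and j are placeholder identifier names.
clear
set obs 4
generate i = _n              // units 1 to 4
expand 4                     // four copies of each unit i
bysort i: generate j = _n    // pair each i with j = 1, ..., 4
keep if i < j                // drop mirror-image dyads and self-dyads
sort i j
list i j, noobs
assert _N == 6               // 1-2, 1-3, 1-4, 2-3, 2-4, 3-4
```

With numeric identifiers, the same keep if condition applied to the actual identifier variables should reduce an existing symmetric dataset to this format.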

It is, however, possible and sometimes convenient for users to organise an undirected dyadic dataset such that it contains both dyad ij and dyad ji, even though the value of the dependent variable for these two dyads must be the same. It does not matter whether the dataset is kept in the non-symmetric or symmetric format. Users must, however, organise their data in symmetric format if the weighting variable is directed, because a directed dyadic weighting variable requires a fully symmetric dyadic dataset.
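One possible way to move from the non-symmetric to the symmetric format is to append a mirrored copy of the data, sketched below for use in a do-file. This is a sketch under assumptions: country_i, country_j and year are the variable names used in the Examples section, tmp_id is a hypothetical scratch name, and every other variable is taken to be undirected, so its value for ji equals its value for ij. A directed weighting variable cannot be produced this way, since the mirrored rows would need their own ji values from another source.

```stata
* Sketch only: assumes the data in memory hold dyads with country_i <= country_j
* and that all variables other than the identifiers are undirected.
preserve
    drop if country_i == country_j      // self-dyads need no mirror image
    rename country_i tmp_id             // swap the two identifiers
    rename country_j country_i
    rename tmp_id country_j
    tempfile mirrored
    save `mirrored'
restore
append using `mirrored'
isid country_i country_j year           // stops with an error if a dyad-year is duplicated
```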

Some users will prefer to work with two separate datasets: one used for the creation of the spatial effect variable, another one that is the actual estimation dataset, into which the spatial effect variable created from the other dataset needs to be merged by hand. In the case of two separate datasets, use the nomerge option. Some other users will prefer to work with one dataset only that contains all the variables needed for the actual estimations as well as all the variables needed for the creation of the spatial effect variable. In this case, use the default option, which merges the created spatial effect variable automatically into the original dataset, which is also the estimation dataset.
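For the two-dataset workflow, the merge by hand might look roughly like the sketch below. It rests on assumptions that this page does not confirm: that the file saved by spundir is a Stata dataset with one row per dyad-year, and that it carries the same identifier variables as the estimation dataset. The name estimation_data is hypothetical; se_undirdyad_file is the file name used in the Examples section.

```stata
* Sketch only. Assumptions (not confirmed by this page): the saved file is a
* Stata dataset keyed on the same identifier variables as the estimation data.
use estimation_data, clear          // hypothetical estimation dataset
merge 1:1 country_i country_j year using se_undirdyad_file
tabulate _merge                     // inspect unmatched dyad-years
drop _merge
```

Inspecting _merge before dropping it helps to catch dyad-years that are present in only one of the two datasets.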

Arguments:

lagvar is the variable to be spatially lagged. It is the undirected dyadic dependent variable in spatial lag models, a selected independent variable in spatial-x models and a saved regression residual in spatial error models.
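For the spatial error case, the residual has to exist as a variable before spundir is called. A minimal sketch, assuming an ordinary least squares first stage with hypothetical regressors x1 and x2 and a hypothetical residual name resid (the remaining names follow the Examples section):

```stata
* Sketch only: x1, x2 and resid are hypothetical names.
regress y x1 x2
predict double resid, residuals     // save the regression residual as a variable
spundir resid, weightvar(exports) i(country_i) j(country_j) time(year) link(ik)
```

Whether a different first-stage estimator is more appropriate depends on the application; the sketch only shows where the saved residual enters the command.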

weightvar(varname) is the weighting or connectivity variable linking source unit i with target unit j. It may or may not be directed.

i(varname) is the identifying variable of unit i. It can be a numeric or string variable.

j(varname) is the identifying variable of unit j. It can be a numeric or string variable.

Options:

link(options) is required. The following options are allowed: ik, ki, jm, mj, ik+jm, ki+mj, ik*jm, and ki*mj. Option ik requests that the virtually transformed W is to represent connectivity from unit i to other units k. Option ki requests connectivity from other units k to unit i. Option jm requests connectivity from unit j to other units m. Option mj requests connectivity from other units m to unit j. Option ik+jm requests that W represents the sum of connectivities invoked by ik and jm. Option ki+mj does the same, but for the sum of connectivities invoked by ki and mj. Option ik*jm requests that W represents the product of connectivities invoked by ik and jm. Option ki*mj does the same, but for the product of connectivities invoked by ki and mj.

time(varname) is optional. If users wish to generate a time-varying spatial effect variable, then the numeric time variable must be stated here.

exclusive specifies that all dyads containing either i or j as either source or target are excluded from having a spatial effect on dyad ij.

norowst requests that the generated spatial effect variable is not row-standardized. See Plümper and Neumayer (2010, 2016) for an explanation and discussion of row-standardization. Row-standardization is the default option.
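As a generic illustration of what row-standardization means, the toy sketch below divides each weight by the sum of all weights in its row, so that the weights in every row add up to one. It does not reproduce spundir's internal computation, which this page does not describe; the variable names unit, other and w are placeholders.

```stata
* Toy illustration of row-standardization; not spundir's internal code.
clear
input unit other w
1 2 2
1 3 6
2 1 1
2 3 3
end
bysort unit: egen double rowsum = total(w)    // sum of weights in each row
generate double w_rowst = w / rowsum          // standardized weight
bysort unit: egen double check = total(w_rowst)
assert abs(check - 1) < 1e-12                 // each row now sums to one
list unit other w w_rowst, noobs
```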

nomerge requests that the generated spatial effect variable is not automatically merged into the data set.

sename(name) names the generated spatial effect variable. By default, if the weighting matrix is row-standardized, then this variable is called SE_var_undirdyad_rowst. If the weighting matrix is not row-standardized, then this variable is called SE_var_undirdyad_norowst. Any previously existing variable with the same name will be replaced.

labelname(name) names the label of the generated spatial effect variable. The default label given is "Undirected dyad contagion spatial effect variable".

filename(name) requests that a dataset containing the generated spatial effect variable is saved in the current working directory under the defined name. In the default option, if the weighting matrix is row-standardized, then a file is saved in the current working directory called SE_file_undirdyad_rowst. If the weighting matrix is not row-standardized, then the saved file is called SE_file_undirdyad_norowst. Any previously existing file with the same name will be replaced.

Examples:

 spundir y, w(exports) i(country_i) j(country_j) time(year) link(ik) ///
        sename(se_undirdyad) filename(se_undirdyad_file)

 spundir y, w(exports) i(country_i) j(country_j) time(year) link(mj) ///
        symmdyads norowst

 spundir y, w(exports) i(country_i) j(country_j) time(year) link(ik*jm) ///
        norowst nomerge

Example data-, do- and log-files:

(Data) (Do-file) (Log-file)

Installation:

Type "ssc install spundir" in Stata and follow instructions or download the ado- and help-file below into the relevant folder:

spundir.ado  

spundir.hlp

Questions and Errors:

Send any questions and report any errors to e.neumayer@lse.ac.uk.

References:

Neumayer, Eric and Thomas Plümper. 2010. Spatial Effects in Dyadic Data. International Organization 64 (1), pp. 145-165.

Plümper, Thomas and Eric Neumayer. 2010. Model Specification in the Analysis of Spatial Dependence. European Journal of Political Research 49 (3), pp. 418-442.

Neumayer, Eric and Thomas Plümper. 2016. W. Political Science Research and Methods 4 (1), pp. 175-193.
