Abstract:

Generalized estimating equations (GEEs) are commonly used to fit regression models to clustered data, including logistic models for clustered binary data. This approach yields consistent estimates of the mean model parameters and variance estimates that are robust to misspecification of the working correlation structure. Although the effect of using different forms of the working correlation structure has been well studied, the effect of assuming that the correlation is the same across clusters has not. The standard approach assumes that all clusters share the same intracluster correlation coefficient, but the correlation may in fact depend on the values of both cluster-level and individual-level covariates. We first identify variance formulae when the true correlation depends on binary covariates, both when the working correlation structure is correctly specified and under various misspecifications. We then describe the impact of misspecification on the asymptotic and finite-sample efficiency of the resulting estimators. Implications for the design and analysis of studies with clustered binary data, including cluster-randomized trials, are discussed.
