In the October 2006 paper "Researcher Incentives and Empirical Methods" (NBER Technical Working Paper 329), Edward L. Glaeser provides ten recommendations for empirical researchers.
Here are the ten recommendations, although I suggest visiting the Economist's View link or going straight to the paper for further detail and motivation.
In this essay, I make ten points about researcher incentives and statistical work. The first and central point is that we should accept researcher initiative as being the norm, and not the exception. It is wildly unrealistic to treat activity like data mining as being rare malfeasance; it is much more reasonable to assume that researchers will optimize and try to find high correlations. This requires not just a blanket downward adjustment of statistical significance estimates but more targeted statistical techniques that appropriately adjust across data sets and methodologies for the ability of researchers to impact results. Point estimates as well as t-statistics need to be appropriately corrected.
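To make the first point concrete, here is a minimal simulation sketch (my illustration, not the paper's; all names and parameter choices are made up) of why a blanket 1.96 critical value fails once a researcher can search over candidate regressors: on pure-noise data, reporting only the best of twenty t-statistics produces "significance" most of the time, and the critical value that actually holds 5% size is much higher.

```python
import numpy as np

# Hypothetical researcher: regress y on each of K pure-noise candidate
# regressors and report only the largest |t|. All parameters are illustrative.
rng = np.random.default_rng(0)
n, K, sims = 100, 20, 2000

best_t = np.empty(sims)
for s in range(sims):
    y = rng.standard_normal(n)
    X = rng.standard_normal((n, K))
    t_stats = []
    for k in range(K):
        x = X[:, k] - X[:, k].mean()
        b = (x @ y) / (x @ x)                      # univariate OLS slope
        resid = y - y.mean() - b * x
        se = np.sqrt(resid @ resid / (n - 2) / (x @ x))
        t_stats.append(abs(b / se))
    best_t[s] = max(t_stats)

print("Share of 'best' t-stats above 1.96:", (best_t > 1.96).mean())
print("Critical value that actually holds 5% size:", np.quantile(best_t, 0.95))
```

The corrected cutoff depends on how many specifications were searched, which is why the paper calls for adjustments that vary across data sets and methodologies rather than a single blanket correction.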
The second point is that the optimal amount of data mining is not zero, and that even if we could produce classical statisticians, we probably would not want to. Just as the incentives facing businessmen produce social value added, the data mining of researchers produces knowledge. The key is to adjust our statistical techniques to realistically react to researcher initiative, not to try to ban this initiative altogether.
The third point is that research occurs in a market where competition and replication matter greatly. Replication can significantly reduce some of the more extreme forms of researcher initiative (e.g. misrepresenting coefficients in tables), but it has much less ability to correct for other activity, like data mining. Moreover, the ability of competition and replication to correct for researcher initiative differs from setting to setting. For example, data mining on a particular micro data set will be checked by researchers reproducing regressions on independent micro data sets. There is much less ability for replication to correct data mining in macro data sets, especially those that already include all of the available data points.
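A hypothetical sketch of the replication check described above (again my illustration, with made-up parameters): a correlation "mined" from one micro sample is re-estimated on an independent sample drawn from the same null process, and it survives only at the nominal 5% rate.

```python
import numpy as np

# Mine one sample for the best-correlated candidate, then re-test that same
# candidate on fresh, independent data. Everything here is pure noise.
rng = np.random.default_rng(1)
n, K, sims = 100, 20, 2000

replicated = 0
for s in range(sims):
    y1, X1 = rng.standard_normal(n), rng.standard_normal((n, K))
    corrs = np.array([np.corrcoef(X1[:, k], y1)[0, 1] for k in range(K)])
    best = np.argmax(np.abs(corrs))            # the "mined" finding
    y2, X2 = rng.standard_normal(n), rng.standard_normal((n, K))
    r2 = np.corrcoef(X2[:, best], y2)[0, 1]    # same variable, fresh sample
    t2 = r2 * np.sqrt((n - 2) / (1 - r2**2))
    replicated += abs(t2) > 1.96

print("Replication rate of mined 'findings':", replicated / sims)
```

With a single macro time series there is no independent second sample on which to run this check, which is exactly the asymmetry the point emphasizes.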
Fourth, changes in technology generally decrease the costs of running tests and increase the availability of potential explanatory variables. As a result, the ability of researchers to influence results must be increasing over time, and economists should respond with regular increases in skepticism. At the same time, however, improvements in technology also reduce the cost of competitors checking findings, so the impact of technology on overall bias is unclear.
Fifth, increasing methodological complexity will generally give the researcher more degrees of freedom and therefore increase the scope for researcher activity. Methodological complexity also increases the costs to competitors who would like to reproduce results. This suggests that the skepticism that is often applied to new, more complex techniques may be appropriate.
My sixth point is that data collection and cleaning offer particularly easy opportunities for improving statistical significance. One approach to this problem is to separate the tasks of data collection and analysis more completely. However, this has the detrimental effect of reducing the incentives for data collection, which may outweigh the benefits of specialization. At the least, we should be more skeptical of results produced by analysts who have created and cleaned their own data.
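As an illustration of how much room cleaning choices alone provide (my sketch, with invented trimming rules, not anything from the paper): trying a handful of defensible outlier-trimming rules on pure-noise data and keeping the most significant result already pushes the false-positive rate well above the nominal 5%.

```python
import numpy as np

# A few plausible-looking "cleaning" rules: trim 0%, 1%, 2.5%, or 5% of
# observations from each tail of x, then keep the best t-statistic.
rng = np.random.default_rng(2)
n, sims = 200, 2000
trim_rules = [0.0, 0.01, 0.025, 0.05]

def slope_t(x, y):
    xc = x - x.mean()
    b = (xc @ y) / (xc @ xc)
    resid = y - y.mean() - b * xc
    se = np.sqrt(resid @ resid / (len(y) - 2) / (xc @ xc))
    return abs(b / se)

hits = 0
for s in range(sims):
    x, y = rng.standard_normal(n), rng.standard_normal(n)
    best = 0.0
    for q in trim_rules:
        lo, hi = np.quantile(x, [q, 1 - q])
        keep = (x >= lo) & (x <= hi)
        best = max(best, slope_t(x[keep], y[keep]))
    hits += best > 1.96

print("False-positive rate with cleaning flexibility:", hits / sims)
```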
A seventh point is that experimental methods both restrict and enlarge the opportunities for researcher action and consequent researcher initiative bias. Experiments have the great virtue of forcing experimenters to specify hypotheses before running tests. However, they also give researchers tremendous influence over experimental design, and this influence increases the ability of researchers to impact results.
An eighth point is that the recent emphasis on causal inference seems to have led to the adoption of instrumental variables estimators, which can particularly augment researcher flexibility and increase researcher initiative bias. Since the universe of potential instruments is enormous, the opportunity to select instruments creates great possibilities for data mining. This problem is compounded when there are weak instruments, since the distribution of weak instrument t-statistics can have very fat tails. The ability to influence significance by choosing the estimator with the best fit increases as the weight in the extremes of the distribution of estimators increases.
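The fat-tail claim is easy to see in a simulation (my sketch; the data-generating process and the instrument-strength values are assumptions, not the paper's): under a true null with an endogenous regressor, just-identified 2SLS with a strong instrument rejects at roughly the nominal rate, while a weak instrument over-rejects badly, so shopping across many weak instruments for the best fit is rewarded.

```python
import numpy as np

# Just-identified 2SLS under a true null (beta = 0) with an endogenous x.
rng = np.random.default_rng(3)
n, sims = 200, 2000

def iv_t(strength):
    z = rng.standard_normal(n)
    u = rng.standard_normal(n)                      # structural error
    x = strength * z + u + rng.standard_normal(n)   # endogenous regressor
    y = 0.0 * x + u                                 # true effect is zero
    b = (z @ y) / (z @ x)                           # 2SLS estimate
    e = y - b * x
    se = np.sqrt((e @ e) / (n - 2) * (z @ z)) / abs(z @ x)
    return b / se

for strength, label in [(1.0, "strong"), (0.05, "weak")]:
    t = np.array([iv_t(strength) for _ in range(sims)])
    print(label, "instrument: share |t| > 1.96 =", (np.abs(t) > 1.96).mean())
```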
A ninth point is that researcher initiative complements other statistical errors in creating significance. This means that spurious significance rises spectacularly when even modest overestimates of statistical significance are combined with researcher initiative. This complementarity also creates particularly strong incentives not to use more stringent statistical techniques.
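A quick numerical sketch of that complementarity (illustrative parameters of my choosing): suppose standard errors are understated by a modest 15%. Alone, that roughly doubles the false-positive rate; combined with best-of-ten specification search, spurious significance jumps far beyond what either distortion produces on its own.

```python
import numpy as np

# K independent null t-statistics per "study"; SEs understated to 85% of truth.
rng = np.random.default_rng(4)
sims, K, understate = 20000, 10, 0.85

z = rng.standard_normal((sims, K))
honest_one   = (np.abs(z[:, 0]) > 1.96).mean()               # one clean test
inflated_one = (np.abs(z[:, 0] / understate) > 1.96).mean()  # SE error only
mined        = (np.abs(z).max(axis=1) > 1.96).mean()         # search only
both         = (np.abs(z / understate).max(axis=1) > 1.96).mean()

print(f"one honest test: {honest_one:.3f}, SE understated: {inflated_one:.3f}")
print(f"best of {K}: {mined:.3f}, both combined: {both:.3f}")
```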
My tenth and final point is that model-driven empirical work has an ambiguous impact on researcher initiative bias. One of the greatest values of specialization in theory and empirics is that empiricists end up being constrained to test theories proposed by others. This is obviously most valuable when theorists produce sharp predictions about empirical relationships. On the other hand, if empirical researchers become wedded to a particular theory, they will have an incentive to push their results to support that theory.