This study contributes to the development of "standards" for the application of matching algorithms in empirical evaluation studies. Various distance measures and matching procedures discussed in the current literature are compared with one another in a simulation study. Going beyond earlier studies, the simulation setup is closely oriented toward real evaluation situations. This orientation requires a focus on small samples and the explicit consideration of differently scaled variables in the matching process. To approximate realistic distributions, the random variables in the simulation are generated on the model of the German Microcensus. The simulation considers the Mahalanobis distance and two Balancing Scores because their use is recommended in the evaluation literature. In addition, aggregated statistical distance measures not yet used in empirical evaluation are included. The choice of matching algorithms is guided by the results of earlier studies: Replacement Matching, Random Matching, Optimal Nearest Neighbor Matching, Ridge Matching and Optimal Full Matching are analyzed. The matching outcomes of the distance measures are compared using nonparametric, scale-specific tests for identical distributions of the characteristics in the participant group and the control group. In small samples, aggregated distance measures are a better choice than the commonly used measures for summarizing similarities in differently scaled variables. With regard to the Mean Squared Error and its components, bias and variance, Optimal Full Matching yields favourable matching outcomes. In terms of the sum of squared distances, used as an approximation for the similarity of the variables’ distributions, Replacement Matching identifies the best control groups. The expected superiority of Optimal Nearest Neighbor Matching is not confirmed by the simulation results.