Assortative mating

Definition

Assortative mating is the tendency of people to form couples with others who are similar to themselves. For example, educational assortative mating is the tendency of people to marry and/or cohabit with partners from their own educational group. Moreover, people do not merely happen to match with their like; they sort into couples assortatively. In other words, people tend to prefer educationally homogamous relationships (where the spouses have the same level of educational attainment) over pairing up with others who are less educated than themselves. An empirically equivalent formulation of this aggregate feature of preferences is that marital social norms lead more people to form educationally homogamous couples than we would see under random matching.

Educational assortative mating and inequality

Using data on marriages and cohabitations to measure changes in inequality has several advantages. Most importantly, data on the education level of couples are far more comparable, both across countries and over time, than many alternative data on inequality. For instance, data on couples are subject to measurement problems only to a limited extent, which cannot be said of data on individuals’ income. Moreover, census data on who is married to whom (officially or unofficially) are available for many more countries than micro-level data on income (e.g. tax declarations).

Three closely related quantities can change from one generation to the next: the degree of sorting along education, the strength of marital preferences for educationally homogamous mating at the aggregate level, and the marital social norms favoring educationally homogamous mating. The extent of this change is informative about the change in inequality: if inequality between different education groups increases from one generation to the next, then, all else being equal, assortative mating will be more pronounced in the later generation. Therefore, the trend in overall inequality can be quantified by measuring changes in assortative mating between subsequent generations. Although inequality is multidimensional, the marriage market aggregates it along all dimensions relevant to couple formation (e.g. health, wealth, income).

Assortative mating along any other trait (e.g. race, religion, wealth), inequality and inclusion

The extent to which the degree of sorting along a certain trait (e.g. race, religion, wealth) changes from one generation to the next is informative about the change in inequality and inclusion: if inequality between the groups defined by a given trait increases, or the social inclusion of a group declines, from one generation to the next, then, all else being equal, the phenomenon under study (i.e., assortative mating, marital homophily, or marital preferences at the aggregate level) will be more pronounced in the later generation. Therefore, the trend in overall inequality and inclusion can be quantified by measuring changes in assortative mating between subsequent generations.

Challenge in measurement

The degree of sorting, aggregate marital preferences and marital social norms (the non-structural determinants of the prevalence of homogamy) are not directly observable. However, we can quantify them through their effects on the proportion of educationally homogamous couples after controlling for some other determinants. The other determinants are the structural availability (i.e., the distributions of marriageable men and women along the educational trait) and the interaction between the structural availability and the non-structural factors (i.e., the extent to which preferences/norms/sorting tend to adjust to availability). The challenge in measurement stems from the fact that it is far from trivial to control for the structural availability and the interaction effect.
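Why structural availability has to be controlled for can be illustrated with a minimal Python sketch. The function name and the education shares below are illustrative assumptions, not figures from any data set. The sketch computes the share of homogamous couples that random matching alone would produce for two hypothetical generations with two education levels.

```python
def random_matching_homogamy_share(men_h_share, women_h_share):
    """Share of same-education couples if partners were paired purely at random.

    men_h_share / women_h_share: share of highly educated (H) people among
    marriageable men / women; the remaining people are low educated (L).
    """
    both_high = men_h_share * women_h_share
    both_low = (1 - men_h_share) * (1 - women_h_share)
    return both_high + both_low


# Hypothetical generations: only the educational distributions differ,
# preferences and norms play no role under random matching.
early = random_matching_homogamy_share(0.10, 0.10)
later = random_matching_homogamy_share(0.50, 0.50)
print(early, later)
```

In this hypothetical case the share of homogamous couples falls from about 0.82 to 0.50 although nothing but availability has changed, which is why the raw share of homogamous couples cannot be read as a measure of sorting.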

Evolution of measuring assortative mating

Measuring assortative mating directly with statistical indicators

Initially, statistical indicators were developed and used in the assortative mating literature to characterize the non-structural factors. The statistical indicators were computed from the joint discrete educational distribution of couples for each generation separately (see Naszodi 2023). Different statistical indicators control for the structural factors differently.

In the following, we present 10 statistical indicators. The indicators are computed from the following joint discrete educational distribution of couples and single individuals in a given generation (with two educational levels distinguished):

\begin{darray}{ll c c c c} \textcolor{white}{Q} & & & \textcolor{pink}{Women} \\ & & & \textcolor{pink}{\text{in couple}} & \\ & & \textcolor{pink}{L} & \textcolor{pink}{H} & \textcolor{pink}{sum} & \textcolor{#97c4e2}{single} \\ &\textcolor{#97c4e2}{L} & \textcolor{#aaaaaa}{a} & \textcolor{#aaaaaa}{b} & \textcolor{#aaaaaa}{a+b} & \textcolor{#97c4e2}{e} \\ \textcolor{#97c4e2}{\text{Men in c.}} & \textcolor{#97c4e2}{H} & \textcolor{#aaaaaa}{c} & \textcolor{#aaaaaa}{d} & \textcolor{#aaaaaa}{c+d} & \textcolor{#97c4e2}{f} \\ & \textcolor{#97c4e2}{sum} & \textcolor{#aaaaaa}{a+c} & \textcolor{#aaaaaa}{b+d} & \textcolor{#aaaaaa}{a+b+c+d} & \textcolor{#97c4e2}{e+f} & \\ & \textcolor{pink}{single} & \textcolor{pink}{g} & \textcolor{pink}{h} & \textcolor{pink}{g+h} & \\ \end{darray}

(I1) Odds-ratio (it is the most widely used indicator according to Chiappori, Costa Dias, and Meghir 2021):

\begin{equation} \text{OR}(Q)= ad/(bc) \;. \end{equation}

(I2) Matrix determinant (suggested by Permanyer, Esteve, and Garcia 2013 and applied by Permanyer, Esteve, and Garcia 2019):

\begin{equation} \text{det}(Q)= ad-bc \;. \end{equation}

(I3) Covariance coefficient (applied by Class, Dingemanse, Araya-Ajoy, and Brommer 2017):

\begin{equation} \text{cov}(Q)= \frac{\text{det}(Q)}{(a+b+c+d)^2} \;. \end{equation}

(I4) Correlation coefficient (applied by Kremer 1997 and Fernandez, Guner, and Knowles 2005):

\begin{equation} \rho(Q)=\frac{\text{det}(Q)}{\sqrt{(a+b)(c+d)(a+c)(b+d)}}\;. \end{equation}

(I5) Regression coefficient (obtained by regressing either the male partners’ education on the female partners’ education, or the other way around. The latter indicator was applied by Greenwood, Guner, Kocharkov, and Santos 2014):

\begin{equation} \beta_{wm}(Q)=\frac{\text{cov}(Q)}{\frac{b+d}{a+b+c+d}-\left(\frac{b+d}{a+b+c+d}\right)^2} \; \text{or} \end{equation}

\begin{equation*} \beta_{mw}(Q)=\frac{\text{cov}(Q)}{\frac{c+d}{a+b+c+d}-\left(\frac{c+d}{a+b+c+d}\right)^2} \;, \end{equation*}

depending on whether wives’ dichotomous education variable (taking the value of 0 or 1) is explained by husbands’ dichotomous education variable (taking the value of 0 or 1), or the other way around.
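The scalar indicators (I1) to (I5) can be computed directly from the four cell counts. The following is a minimal Python sketch; the function name, the dictionary keys and the example counts are illustrative assumptions. For the regression coefficients it uses the standard definition of a least-squares slope, i.e. the covariance divided by the variance of the explanatory variable, where the variance of a 0/1 variable with mean p is p − p².

```python
from math import sqrt


def scalar_indicators(a, b, c, d):
    """Indicators (I1)-(I5) for a 2x2 table of couples.

    a: L man with L woman, b: L man with H woman,
    c: H man with L woman, d: H man with H woman.
    """
    n = a + b + c + d
    det = a * d - b * c
    cov = det / n**2
    p_women = (b + d) / n  # share of H-type women among couples
    p_men = (c + d) / n    # share of H-type men among couples
    return {
        "odds_ratio": (a * d) / (b * c),  # undefined if b or c is zero
        "det": det,
        "cov": cov,
        "corr": det / sqrt((a + b) * (c + d) * (a + c) * (b + d)),
        # slope when men's education is regressed on women's education
        "slope_men_on_women": cov / (p_women - p_women**2),
        # slope when women's education is regressed on men's education
        "slope_women_on_men": cov / (p_men - p_men**2),
    }


result = scalar_indicators(40, 10, 20, 30)  # hypothetical counts
print(result)
```

A quick consistency check on any table is that the product of the two slopes equals the squared correlation coefficient.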

(I6) Aggregate marital sorting parameter (proposed and applied by Eika, Mogstad, and Zafar 2019):

\begin{equation*} {MSP}^{{agg}}(Q)= \frac{{MSP}_L(Q) a +{MSP}_H(Q) d }{a+d} \end{equation*}

is the weighted average of the marital sorting parameters – i.e., MSP_L(Q) and MSP_H (Q) – along the diagonal of the contingency table. Unlike MSP^agg(Q), the marital sorting parameters are local measures of sorting:

for L,L couples it is

\begin{equation*}{MSP}_L(Q)= \frac{{a}/(a+b+c+d)}{a^{\text{counterf.}}/(a^{\text{counterf.}}+b^{\text{counterf.}}+c^{\text{counterf.}}+d^{\text{counterf.}})} \;, \end{equation*}

while for H,H couples, it is

\begin{equation*} {MSP}_H(Q)= \frac{{d}/(a+b+c+d)}{d^{\text{counterf.}}/(a^{\text{counterf.}}+b^{\text{counterf.}}+c^{\text{counterf.}}+d^{\text{counterf.}})} \;, \end{equation*}

where MSP_L(Q) captures the probability that an L-type man marries an L-type woman, relative to the probability under a counterfactual, whereas MSP_H(Q) captures the same likelihood ratio for the H,H-type couples.

If the counterfactual is random matching, as in the paper by Eika et al. (2019), then the denominator of MSP_L(Q) is (a+b)(a+c)/(a+b+c+d)² and the denominator of MSP_H(Q) is (c+d)(b+d)/(a+b+c+d)², so that MSP_L(Q) = a(a+b+c+d)/[(a+b)(a+c)] and MSP_H(Q) = d(a+b+c+d)/[(c+d)(b+d)]. Finally, under the random counterfactual,

\begin{equation}{MSP}^{{agg}}(Q)= \frac{a+b+c+d}{a+d} \left(\frac{a^2 } {(a+b)(a+c)} + \frac{d^2 } {(c+d)(b+d)} \right) \;. \end{equation}
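As a check on the algebra, the marital sorting parameters under the random-matching counterfactual can be sketched in a few lines of Python. The function name and the example counts are illustrative assumptions. The aggregate value is computed as the weighted average of the two local parameters and can be compared with the closed-form expression above.

```python
def msp_random_counterfactual(a, b, c, d):
    """Local and aggregate marital sorting parameters for a 2x2 table,
    with random matching as the counterfactual."""
    n = a + b + c + d
    msp_l = a * n / ((a + b) * (a + c))  # observed vs. random share of L,L couples
    msp_h = d * n / ((c + d) * (b + d))  # observed vs. random share of H,H couples
    msp_agg = (msp_l * a + msp_h * d) / (a + d)  # weights: diagonal cell counts
    return msp_l, msp_h, msp_agg


a, b, c, d = 40, 10, 20, 30  # hypothetical counts
msp_l, msp_h, msp_agg = msp_random_counterfactual(a, b, c, d)

# Closed form of the aggregate parameter under the random counterfactual
n = a + b + c + d
closed_form = n / (a + d) * (a**2 / ((a + b) * (a + c)) + d**2 / ((c + d) * (b + d)))
print(msp_l, msp_h, msp_agg, closed_form)
```

Values above 1 indicate that the homogamous cell is more frequent than it would be under random matching.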

(I7) V-value (proposed by Fernandez and Rogerson 2001 and applied also by Abbott, Gallipoli, Meghir, and Violante 2019):

\begin{equation} \text{V}(Q)=\frac{\text{det}(Q)}{A}\;, \end{equation}

where A = (c + d)(a + c) if b ≥ c and A = (b + d)(a + b) if c > b.
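The case distinction in A can be sketched as follows in Python; the function name and the example tables are illustrative assumptions. The three hypothetical tables share the same marginal distributions and represent an intermediate case, perfectly assortative matching and random matching.

```python
def v_value(a, b, c, d):
    """V-value of a 2x2 table of couples (undefined if the scaling term is zero)."""
    det = a * d - b * c
    if b >= c:
        scale = (c + d) * (a + c)
    else:  # c > b
        scale = (b + d) * (a + b)
    return det / scale


# Hypothetical tables with 50 H-type men and 40 H-type women out of 100 couples
print(v_value(40, 10, 20, 30))  # intermediate degree of sorting
print(v_value(50, 0, 10, 40))   # every H-type woman is matched with an H-type man
print(v_value(30, 20, 30, 20))  # cell counts equal their random-matching values
```

For these tables the V-value is 0.5, 1 and 0 respectively, in line with its reading as a weight between random and perfectly assortative matching.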

(I8) Marital surplus matrix (proposed by Choo and Siow 2006):

\begin{equation} \text{MSM}(Q)= \begin{bmatrix} a/\sqrt{eg} & b/\sqrt{eh} \\ c/\sqrt{fg} & d/\sqrt{fh} \end{bmatrix} \;. \end{equation}

(I9) LL-indicator (proposed by Liu and Lu 2006). The formula of the simplified LL-indicator is

\begin{equation} \text{LL}^s(Q)=\frac{d - \text{int}(R) }{\text{min}(b+d, c+d )-\text{int}(R) }\;, \end{equation}

where R = (c + d)(b + d)/(a + b + c + d). The original LL-indicator is identical to its simplified version in the case of a positively assorted trait such as education.
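A minimal Python sketch of the simplified LL-indicator follows. The function name and the example counts are illustrative assumptions, and int(R) is read here as the integer part of R, which for non-negative integer counts can be obtained by floor division.

```python
def ll_simplified(a, b, c, d):
    """Simplified LL-indicator of a 2x2 table with integer cell counts.

    Undefined (division by zero) if min(b + d, c + d) equals int(R).
    """
    n = a + b + c + d
    r_int = ((c + d) * (b + d)) // n  # integer part of R, the H,H count under random matching
    return (d - r_int) / (min(b + d, c + d) - r_int)


print(ll_simplified(40, 10, 20, 30))  # intermediate degree of sorting
print(ll_simplified(50, 0, 10, 40))   # maximum number of H,H couples given the marginals
```

The indicator rescales the number of H,H couples between its (rounded) value under random matching and its maximum given the marginal distributions.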

(I10) Matrix-valued generalized LL-indicator (proposed by Naszodi and Mendonca 2021):

\begin{equation} \text{LL}^{\text{gen}}_{j,k} (Q)= \text{LL}( V_j Q W^T_k ) \;, \end{equation}

where LL^gen_j,k (Q) is the (j, k)-th element of the LL^gen matrix when Q is an n × m matrix with n ≥ 2, or m ≥ 2, or both. Further, V_j is the 2 × n matrix

\begin{bmatrix} \overbrace{1 \cdots 1}^{\text{j}} \; \overbrace{0 \cdots 0}^{\text{n-j}}\\ {0 \cdots 0}\; {1 \cdots 1} \end{bmatrix}

and W^T_k is the m × 2 matrix given by the transpose of

\begin{bmatrix} \overbrace{1 \cdots 1}^{\text{k}} \; \overbrace{0 \cdots 0}^{\text{m-k}}\\ {0 \cdots 0}\; {1 \cdots 1} \end{bmatrix}

with j ∈ {1, . . . , n − 1}, and k ∈ {1, . . . , m − 1}.
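Multiplying Q by V_j from the left and by W^T_k from the right merges the first j rows and the first k columns into a "low" category and the remaining rows and columns into a "high" category. The Python sketch below does this merging with plain sums instead of matrix products. The function names and the 3 × 3 example table are illustrative assumptions, and the simplified LL-indicator is used in place of the original one, which the text above describes as identical for positively assorted traits.

```python
def ll_simplified(a, b, c, d):
    """Simplified LL-indicator of a 2x2 table with integer cell counts."""
    n = a + b + c + d
    r_int = ((c + d) * (b + d)) // n  # integer part of R
    return (d - r_int) / (min(b + d, c + d) - r_int)


def collapse(q, j, k):
    """Cells (a, b, c, d) of the 2x2 table V_j Q W_k^T."""
    n_rows, n_cols = len(q), len(q[0])
    a = sum(q[r][s] for r in range(j) for s in range(k))
    b = sum(q[r][s] for r in range(j) for s in range(k, n_cols))
    c = sum(q[r][s] for r in range(j, n_rows) for s in range(k))
    d = sum(q[r][s] for r in range(j, n_rows) for s in range(k, n_cols))
    return a, b, c, d


def ll_generalized(q):
    """(n-1) x (m-1) matrix of LL-indicators, one per pair of cut points (j, k)."""
    n_rows, n_cols = len(q), len(q[0])
    return [[ll_simplified(*collapse(q, j, k)) for k in range(1, n_cols)]
            for j in range(1, n_rows)]


# Hypothetical table: rows are men's, columns are women's education (low, medium, high)
q = [[30, 10, 5],
     [10, 20, 10],
     [5, 10, 25]]
print(ll_generalized(q))
```

For a 2 × 2 table there is a single pair of cut points, so the result reduces to a 1 × 1 matrix holding the scalar indicator.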

Measuring assortative mating indirectly with counterfactual decompositions

The next generation of measures of the non-structural factors represents ceteris paribus effects calculated with counterfactual decompositions. In particular, these measures capture the ceteris paribus effect of the non-structural factors changing from one generation to the next, while the structural factor is kept fixed across the generations.

Unlike the statistical indicators, the decomposition-based indicators are designed to control for the interaction effect. They are also scalar-valued cardinal measures: they are obtained by projecting a statistical indicator (which in some cases characterizes the non-structural factors only on an ordinal scale, and in other cases is matrix-valued) onto a scalar-valued cardinal variable, such as the share of homogamous couples. The decomposition-based indicators thereby allow us to assess whether inequality/inclusion has changed substantially or only slightly from one generation to the next, and whether inequality/inclusion measured in the same generation differs slightly or substantially between two countries under comparison. (Neither the ordinal statistical indicators nor the matrix-valued statistical indicators allow us to perform such inter-temporal and cross-country comparisons.)

The competing decomposition-based indicators control for the structural factors, and thereby also for the interaction effects, differently. Next, we introduce 7 methods for constructing the counterfactuals, each of which defines a particular indicator.

(M1) Iterative Proportional Fitting (henceforth IPF) algorithm, where the non-structural factor (typically referred to as the degree of marital sorting) is kept fixed with the unchanged odds-ratios. The paper by Breen and Salazar (2009) is an early example of applying the IPF to the analysis of assortative mating with counterfactual decomposition.

(M2) Matrix Determinant-based Approach (henceforth MDbA), where the non-structural factor is controlled for with the matrix determinant. It was suggested by Permanyer et al. (2013) and applied by Permanyer et al. (2019).

(M3) Minimum Euclidean Distance Approach (henceforth MEDA), where the non-structural factor is kept fixed with the unchanged scalar-valued V-indicator. The V-indicator can be interpreted as the weight minimizing the Euclidean distance between the matrix to be transformed and the matrix obtained as the convex combination of the two extreme cases of random and perfectly assortative matching (where one can marry out of his or her group only if nobody from the opposite sex in his or her group is available). MEDA was applied by Fernandez and Rogerson (2001) and Abbott et al. (2019).

(M4) Choo and Siow (2006) model-based approach (henceforth CSA), where the non-structural factor is kept fixed with the unchanged marital surplus matrix.

(M5) NM-method, where the non-structural factor (referred to as the aggregate marital preferences over a single-dimensional spousal trait) is controlled for by the unchanged (generalized) LL-indicator. It was proposed by Naszodi and Mendonca (2021) and first applied by Naszodi and Mendonca (2019).

(M6) GNM-approach, where the aggregate marital preferences over multiple traits (e.g., spousal education level and race) are kept fixed with the unchanged (generalized) trait-specific LL-indicators. It was conceptualized by Naszodi (2021a), and implemented and applied by Naszodi and Mendonca (2023).

(M7) GS-approach, where matching is made by the Gale-Shapley matching algorithm, while the non-structural factor, i.e., the aggregate marital preferences over a single-dimensional trait, is kept fixed with the unchanged gender-specific and education level-specific distributions of the reservation points. It was proposed and applied by Naszodi and Mendonca (2022), who used the search criteria on a dating site as a proxy for the reservation points.
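To make the idea of a counterfactual table concrete, the first method, IPF, can be sketched in Python as below. This is a generic textbook version of the algorithm rather than the implementation of any of the cited papers; the function name, tolerance, iteration cap and example numbers are illustrative assumptions. Starting from a seed table, rows and columns are rescaled in turn until the table matches the target marginal distributions; because only whole rows and columns are rescaled, the odds-ratio of the seed table is preserved.

```python
def ipf(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Rescale a positive seed table to the target row and column sums.

    Assumes sum(row_targets) == sum(col_targets) and strictly positive cells.
    """
    table = [[float(x) for x in row] for row in seed]
    for _ in range(max_iter):
        # Step 1: scale each row to its target sum
        for i, target in enumerate(row_targets):
            row_sum = sum(table[i])
            table[i] = [x * target / row_sum for x in table[i]]
        # Step 2: scale each column to its target sum
        for j, target in enumerate(col_targets):
            col_sum = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / col_sum
        # Columns now match; stop once the rows match as well
        if all(abs(sum(table[i]) - t) < tol for i, t in enumerate(row_targets)):
            break
    return table


# Hypothetical example: sorting (odds-ratio) of one generation,
# educational distributions of men (rows) and women (columns) of another
seed = [[40, 10], [20, 30]]
counterfactual = ipf(seed, row_targets=[30, 70], col_targets=[25, 75])
odds_ratio = (counterfactual[0][0] * counterfactual[1][1]) / (
    counterfactual[0][1] * counterfactual[1][0])
print(counterfactual, odds_ratio)  # odds-ratio stays at about 6, as in the seed
```

The other methods differ in which indicator they hold fixed while the marginal distributions are replaced.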

Selecting the suitable indicator and the method for constructing counterfactual

Naszodi (2023) presents (i) an analytical approach to selecting indicators, (ii) a literature review and (iii) a brief introduction to the historical evolution of the American social security system. Based on her analysis, the Liu-Lu indicator is best suited for characterizing assortative mating with an ordinal indicator, and the (generalized Liu-Lu indicator-based) NM-method is best suited for constructing counterfactuals.

She finds the best cardinal indicator for sorting to be the marital educational inequality indicator (henceforth, MEI-indicator). It captures inter-generational changes in the degree of sorting by quantifying the ceteris paribus effect of the changing non-structural determinants on the prevalence of homogamy. The corresponding ceteris paribus effect is computed with the NM-method-based counterfactual decomposition. A positive (/negative) value of this indicator signals that the overall inequality (covering all dimensions relevant in the marriage market, including income, wealth, health, etc.) is growing (/diminishing) between the groups studied.
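The exact decomposition behind the MEI-indicator is not spelled out here, so the following Python code is only a simplified sketch for two education levels under stated assumptions. It assumes that the counterfactual table combines the marginal distributions of the earlier generation with the simplified LL-indicator of the later generation, and that the effect is the counterfactual share of homogamous couples minus the observed share in the earlier generation. The function names and the two tables are hypothetical.

```python
def ll_simplified(a, b, c, d):
    """Simplified LL-indicator of a 2x2 table with integer cell counts."""
    n = a + b + c + d
    r_int = ((c + d) * (b + d)) // n  # integer part of R
    return (d - r_int) / (min(b + d, c + d) - r_int)


def counterfactual_table(ll, men_h, women_h, n):
    """2x2 table with given marginals whose simplified LL-indicator equals ll.

    Assumption: the LL formula is inverted for d; cells may be non-integer.
    """
    r_int = (men_h * women_h) // n
    d = ll * (min(men_h, women_h) - r_int) + r_int
    return n - men_h - women_h + d, women_h - d, men_h - d, d


def homogamy_share(a, b, c, d):
    return (a + d) / (a + b + c + d)


early = (40, 10, 20, 30)  # hypothetical earlier generation
later = (30, 10, 15, 45)  # hypothetical later generation

a, b, c, d = early
cf = counterfactual_table(ll_simplified(*later), men_h=c + d, women_h=b + d,
                          n=a + b + c + d)
# Effect of the changed non-structural factors, availability held at its early level
effect = homogamy_share(*cf) - homogamy_share(*early)
print(cf, effect)
```

In this hypothetical example the effect is positive, which under the reading above would indicate stronger sorting in the later generation.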

Time series of the inequality indicators

Figure panels (time series of the MEI-indicator): 80 countries; 2 Western EU countries and the USA (U-shaped); 7 countries in the Western world (U-shaped); 2 Asian countries (U-shaped); 4 upper middle-income countries in Latin America (U-shaped); countries in Latin America (non-U-shaped); African countries (decreasing).