Casual observation suggests that in most U.S. urban labor markets, immigrants have more immigrant coworkers than native-born workers do. While seeming obvious, this excess tendency to work together has not been precisely measured, nor have its sources been quantified. Using matched employer–employee data from the U.S. Census Bureau Longitudinal Employer-Household Dynamics (LEHD) database on a set of metropolitan statistical areas (MSAs) with substantial immigrant populations, we find that, on average, 37 % of an immigrant’s coworkers are themselves immigrants; in contrast, only 14 % of a native-born worker’s coworkers are immigrants. We decompose this difference into the probability of working with compatriots versus with immigrants from other source countries. Using human capital, employer, and location characteristics, we narrow the mechanisms that might explain immigrant concentration. We find that industry, language, and residential segregation collectively explain almost all the excess tendency to work with immigrants from other source countries, but they have limited power to explain work with compatriots. This large unexplained compatriot component suggests an important role for unmeasured country-specific factors, such as social networks.
Over the last several decades, labor markets in many U.S. cities have absorbed large inflows of new immigrants. These new workers appear to be rapidly absorbed into local labor markets, as evidenced by unemployment rates very similar to those of natives (Chiswick et al. 1997). How does this happen? One hypothesis is that social networks act as a conduit between immigrant populations and jobs. Waldinger and Lichter (2003) argued that after initial immigrants establish a “beachhead” in specific occupations or geographic areas, social networks draw new immigrants to those same occupations or areas. Consistent with this, Patel and Vella (2013) found strong evidence that new immigrants to an area take up the occupations of earlier arrivals from their home country and that this pattern helps raise new immigrant earnings. Similarly, Model (1993) found that immigrants who share employment opportunities through their networks obtain employment in higher-paid occupations.
Although occupational sorting may be an important outcome of social networks, ultimately employment requires a connection to a specific firm. Thus, we expect networks to operate partly by helping new immigrants to gain employment with firms that employ others in their social network: likely other immigrants and, in particular, compatriots. Quantifying and understanding how immigrants sort into workplaces is important because it is increasingly evident that the identity of an individual’s employer has an important role in determining their economic outcomes. Estimates of the determinants of an individual’s earnings have typically found that only about 20 % of earnings variation is accounted for by observable worker characteristics, such as education and work experience. Analysis of matched employer–employee data shows that roughly one-half of variation in earnings is accounted for by differences in mean pay across firms, even after controlling for unobserved worker effects. Evidence also indicates that widening differences in pay across firms account for much of the recent rise in earnings inequality (see, e.g., Barth et al. 2011; Davis and Haltiwanger 1991). These findings imply that sorting across employers is critically important in determining earnings. Thus, differences in how immigrants and natives sort across firms are likely to be important for understanding why economic outcomes for immigrants differ from those of natives.
We are not the first to address this question. Hellerstein and Neumark (2008) found substantial sorting across workplaces by race and ethnicity, and Hellerstein et al. (2011) found evidence that those living in the same neighborhoods are particularly likely to work together, although neither study looked specifically at immigrants. Portes and Wilson (1980) found that not only do Cuban immigrants in Miami work together, but many work for firms owned by Cubans; and García-Pérez (2009) found that immigrant-owned small firms are particularly likely to hire immigrants.
Our work here adds to this literature in three ways. First, we systematically quantify the relative contributions of worker, employer, and locational characteristics in explaining the extent to which immigrants work with different employers than natives do. Second, we examine how concentration varies across 18 source countries. Third, we analyze how the likelihood of working with compatriots differs from the likelihood of working with immigrants from other countries.
We find that immigrants are much more likely to have immigrant coworkers than are natives, but at the same time, only a small share work in immigrant-only workplaces. Using a multivariate framework, we find that observable worker, employer, and locational characteristics together account for at least one-half of observed concentration. We find that immigrants are particularly likely to work with compatriots, but they are also somewhat more likely to work with immigrants from other countries than are natives. Our rich set of worker, firm, and locational characteristics accounts for virtually all the excess probability that immigrants work with immigrants from other countries, but leaves most compatriot concentration unexplained. These findings suggest that although immigrants work together partly because they often have similar skill levels and work in similar jobs, unmeasured country-specific factors also play an important role. A natural interpretation of these unmeasured factors is that country-specific social networks are at work.
Our work draws primarily on the literature explaining sorting of workers into firms. This literature has identified four types of sorting that may contribute to segregated workplaces: (1) based on productive characteristics of workers, (2) based on the information available to workers and employers, (3) resulting from the residential location of workers relative to business locations, and (4) resulting from preferences of workers and employers. Because we have no direct measures of tastes, our empirical analysis focuses on factors (1)–(3).
There is substantial evidence of segregation by skill. For example, Kremer and Maskin (1996) found a high and rising correlation between coworker skill levels in firms during the 1970s and 1980s in the United States, Britain, and France. A positive correlation in skills may occur either because a firm demands workers of a particular skill level or because coordination within a firm requires that workers share a common skill, such as speaking a particular language. Skill-based sorting could lead to workplace segregation of immigrants from natives because immigrants are more likely than natives to have an eighth-grade education or less but are also more likely to have an advanced degree. Therefore, employers that hire exclusively low-skilled or exclusively high-skilled workers will tend to have above-average immigrant employment shares.
If a shared language increases worker productivity, employers may choose workforces in which everyone speaks the same language. If so, immigrants from non-English-speaking countries will be particularly likely to be segregated and may also be particularly likely to work with immigrants who speak their language. Lang (1986) developed a formal model of wage differences that arise because employers must pay a premium for bilingual workers who can bridge the language barrier. His model implied that complete segregation would occur if sufficient capital were owned by each language group. Several authors have found evidence consistent with such segregation by language. Hellerstein and Neumark (2008) (hereafter, HN) found evidence that Hispanics with poor English-language skills are particularly likely to work with other Hispanics. Portes and Wilson (1980) found that Cuban immigrants in Miami work together, and many work in firms owned by other Cubans. García-Pérez (2009) also found supporting evidence that immigrant-owned small firms (mostly Hispanic- or Asian-owned) are more likely to hire immigrants than are native-owned small firms.
Information-based theories focus on mechanisms that match workers to jobs. For example, if people interact outside of work mostly with others who have similar characteristics, employer use of employee referrals and/or employee use of personal contacts to find jobs will increase workplace segregation. Holzer (1987, 1988) and Montgomery (1991) found evidence that use of referrals and personal contacts may lower the costs of finding good matches. Elliot (2001) found that recent Latino immigrants are more likely than blacks or Latino natives to use personal contacts to find jobs. Weak English skills explain much of this difference. A greater reliance on referrals in small workplaces, combined with a concentration of recent immigrants in small firms, also contributes to the difference.
Information flows may combine with residential segregation to generate workplace segregation. Immigrants’ places of residence are spatially concentrated (see, e.g., Iceland 2009), and neighbors may provide important job contacts and references. Several studies have found that those working in the same place are disproportionately from the same neighborhoods. For example, Ellis et al. (2007) and Wright et al. (2010) found strong links between the residential concentration of immigrant groups in Los Angeles and their concentration by workplace tract and industry. Using data from Boston, Bayer et al. (2008) found that a worker is about one-third more likely to work with other residents of their census block than to work with residents of other blocks in their block group.
Hellerstein et al. (2008) (hereafter, HNM) also presented evidence of the importance of neighborhood network effects. Using a matched employer–employee data set that they developed, they found that for whites, another worker living in the same census tract is twice as likely to work in the same establishment as one would expect under random assignment. They found particularly large effects for Hispanics with poor English language skills and Hispanics who are immigrants. We draw on their work for ways to capture the importance of network effects in determining the distribution of workers. Because our aim is to identify the importance of these effects in accounting for immigrant concentration, whereas HNM’s goal was to establish the importance of networks for labor markets more generally, our results are not directly comparable with theirs. However, given their extensive work in this area, it is worth briefly clarifying how our analysis differs from their work.
The core differences stem from our more complete data on the immigrant status of coworkers. Using the 1-in-6 decennial long-form sample, HN matched workers’ write-in reports on place of work to employer addresses on the U.S. Census Business Register (a list of all employer establishments). They matched 29 % of long-form workers to their work location, giving them a sample of roughly 1-in-20 workers in the United States. In order to calculate the fraction of coworkers who are immigrants, immigrant status must be observed for at least two workers at an establishment. In HNM’s sample, requiring at least one coworker in the long-form data reduces a worker’s probability of inclusion from 1-in-20 to about 1-in-205 for workers at three-employee establishments, while having almost no effect on the probability of inclusion for workers at establishments with 80 or more employees.
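The inclusion probabilities quoted above can be reproduced with a short calculation. The sketch below assumes, as a simplification not stated in the source, that each worker is sampled independently at the same 1-in-20 rate; the function name and default are illustrative.

```python
def inclusion_probability(establishment_size, sampling_rate=1 / 20):
    """Probability that a worker is sampled AND at least one coworker is sampled.

    Assumes independent sampling of every employee at the same rate
    (an illustrative simplification).
    """
    coworkers = establishment_size - 1
    p_some_coworker_sampled = 1 - (1 - sampling_rate) ** coworkers
    return sampling_rate * p_some_coworker_sampled


for size in (3, 80):
    p = inclusion_probability(size)
    print(f"{size} employees: about 1 in {1 / p:.0f}")
```

Under this assumption, a three-employee establishment gives about 1 in 205, while an 80-employee establishment stays close to 1 in 20, in line with the figures in the text.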
HNM recognized these issues. To account for them, they used an elegant simulation approach that compares observed segregation with what one would expect to observe in their sample if employers hired randomly, drawing on statistical methods developed in Carrington and Troske (1997). If observed concentration is significantly greater than expected, this is taken as evidence of nonrandom hiring. HNM also carried out these simulations, allowing hiring to be random within a limited number of strata. If within strata the observed and expected concentration are the same, HNM took this as evidence that these strata explain the unconditional level of worker concentration. This method works well as long as the number of stratification variables is small.
Because our data (described in more detail in the next section) contain immigrant status for all coworkers of each worker in our sample, we do not need to take within-establishment sampling variation into account in our analysis. This allows us a more flexible approach than that used in HN, which in turn makes it possible for us to examine a wider set of characteristics. Our results will show that controlling for many characteristics simultaneously matters in this context. For example, we find that adding other controls reduces by about one-third the share of concentration that we would attribute to language proficiency differences, although this factor remains important. Our dense sample also readily permits analysis of concentration by country of origin. The latter is a distinctive feature of our analysis relative to this recent literature and yields some of our most interesting and novel results.
Methodology and Data
We construct a cross-sectional sample of workers in selected MSAs by combining data from the Longitudinal Employer-Household Dynamics (LEHD) database and the 2000 Decennial Census 1-in-6 long form. Because April 1 is the reference date for the census, we use information from jobs held in the second quarter of 2000. The LEHD database draws much of its data from complete sets of unemployment insurance (UI) earnings records for a subset of U.S. states. Workers’ earnings records have been matched to characteristics of their employers drawn from quarterly administrative UI reports and from U.S. Census Bureau business censuses and surveys. Basic demographic data, including country of birth, are available for all workers. Geocoding of addresses for both employers and places of residence allows us to examine characteristics of both locations. The LEHD data have the important advantage of allowing us to measure country of origin for all coworkers of the individuals in our matched sample. Their main disadvantage for studying immigration is that they include only on-the-books employees, leaving out the self-employed and those working in the informal sector. Thus, they likely have poor coverage of undocumented immigrants. Coverage of employment in agriculture is incomplete, so we exclude that sector.
Each quarterly wage record includes a UI account number that identifies the employee’s firm of employment within a state in a specific quarter. Where firms have more than one location within a state, the LEHD data identify each separate location (establishment). Workers employed by a multi-establishment firm are assigned to specific establishments within a state through multiple imputations based on a rich set of information, including the location of the firm’s establishments in that state, the worker’s place of residence, and the employment histories of both worker and establishment.
We match to the 2000 long-form sample to obtain two additional variables that are likely to be important in this context: education and English proficiency. Of all UI–covered workers in our sample of MSAs, we match approximately 1 in 10 foreign-born workers and 1 in 9 native workers. Matched workers have a slightly lower immigrant coworker share than does the complete set of UI workers in our sample of MSAs, and there seems to be a tendency for older, longer-tenure workers at large establishments and in older, multi-unit firms to be overrepresented in the matched sample. Generally, these differences are small, however. To adjust for differences in match rates associated with observable characteristics, we create weights for the matched sample based on a regression model of the propensity for UI workers to match to the long-form data. Using these weights, regression results that exclude education and language controls are very similar whether we base them on the matched sample or the complete UI earnings sample.
We base our analysis on the matched sample but compute our dependent variable (coworker share) and several geographic controls using all applicable workers in the LEHD database. We limit our sample to workers employed in 31 selected MSAs in 11 states (California, Colorado, Florida, Illinois, Maryland, Minnesota, New Jersey, North Carolina, Oregon, Pennsylvania, and Texas), with our choice of areas based on the presence of substantial immigrant populations and the availability of data for a state. Although we use a small number of states, they include five of the six states in which the 2000 foreign-born population exceeded 1 million. In addition to cities with large immigrant populations, we also include several MSAs with smaller immigrant populations but with very rapid growth in foreign-born residents between 1990 and 2000. We include all matched employees of nonagricultural businesses located in a sample MSA, regardless of whether they live in the MSA. This gives us a sample of 3.5 million workers, with more than 3,000 immigrant workers in our sample for each of our MSAs.
The average immigrant workforce share across our 31 MSAs is 18.7 %, but immigrants account for less than 11 % of the workforce in eight MSAs, and they account for more than 35 % of the workforce in three MSAs. Even with random assignment to jobs within a local labor market, these substantial differences across areas would make immigrants more likely to work together than to work with natives, simply because immigrants are disproportionately in the MSAs with high immigrant shares. Because our interest is in how workers are matched with employers within a local labor market, we include MSA dummy variables in all our specifications so that estimates are based on within-MSA variation.
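A small numerical illustration may help show why cross-MSA differences alone generate apparent concentration. The sketch below uses two hypothetical, equally sized MSAs (the sizes and shares are invented for illustration) and assumes workplaces are large enough that a worker's expected coworker share under random assignment equals the immigrant share of their MSA.

```python
def expected_coworker_shares(msas):
    """msas: list of (workforce_size, immigrant_share) pairs, one per MSA.

    Returns the expected immigrant coworker share for the average immigrant
    and the average native when jobs are assigned at random within each MSA.
    Large-workplace approximation: a worker's expected coworker share equals
    the immigrant share of the MSA in which they work.
    """
    immigrants = sum(size * share for size, share in msas)
    natives = sum(size * (1 - share) for size, share in msas)
    imm_mean = sum(size * share * share for size, share in msas) / immigrants
    nat_mean = sum(size * (1 - share) * share for size, share in msas) / natives
    return imm_mean, nat_mean


# Two hypothetical MSAs of equal size with immigrant shares of 35 % and 10 %.
imm_mean, nat_mean = expected_coworker_shares([(1000, 0.35), (1000, 0.10)])
print(f"immigrants: {imm_mean:.3f}, natives: {nat_mean:.3f}")
```

Even with purely random hiring inside each hypothetical MSA, the average immigrant has a higher immigrant coworker share than the average native, which is the pattern that MSA dummy variables are meant to absorb.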
Figure 1 plots the cumulative distribution of immigrant coworker shares for natives and for immigrants as of the second quarter of 2000. In our sample of immigrant-rich MSAs, 10 % of natives work in native-only workplaces, but the share of immigrants working for immigrant-only businesses is considerably smaller (2.8 %). About 10 % of the median native’s coworkers are immigrants, but for the median immigrant, the share is about 32 %. For reference purposes, we include a third line giving the cumulative distribution that would apply if immigrants and natives were randomly assigned to employers in a manner that preserves the size distribution of employment. This simulated distribution depends only on the overall immigrant share and the size distribution of employment. By assumption, the random assignment distribution is identical for immigrants and natives.
Clearly, the observed distributions are inconsistent with random assignment. Because the likelihood of extreme values occurring randomly is quite low in large samples, and because large employers account for a substantial share of employment, about 60 % of workers would have between 17 % and 20 % immigrant coworkers if workers were grouped randomly. The share with only native coworkers would be well below the 10 % observed for natives (but only a bit above the 2.2 % observed for immigrants), and the share of employees working only with immigrants would be close to zero. Overall, it is apparent that native-born workers are far less likely to have immigrants as coworkers than are immigrants.
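The random-assignment benchmark can be sketched as a simple simulation. The version below is only an illustration of the idea, not the procedure used to draw Fig. 1: it shuffles immigrant status across a fixed list of establishment sizes, and the size list, seed, and function name are hypothetical.

```python
import random


def simulate_random_assignment(establishment_sizes, immigrant_share, seed=0):
    """Randomly assign immigrant status while preserving establishment sizes.

    Returns a list of (is_immigrant, immigrant_coworker_share) pairs, one per
    worker. Single-employee establishments are skipped because a worker there
    has no coworkers.
    """
    rng = random.Random(seed)
    n_workers = sum(establishment_sizes)
    n_immigrants = round(n_workers * immigrant_share)
    status = [1] * n_immigrants + [0] * (n_workers - n_immigrants)
    rng.shuffle(status)

    results = []
    start = 0
    for size in establishment_sizes:
        group = status[start:start + size]
        start += size
        if size < 2:
            continue
        immigrants_here = sum(group)
        for is_immigrant in group:
            # Exclude the worker themselves when counting coworkers.
            share = (immigrants_here - is_immigrant) / (size - 1)
            results.append((is_immigrant, share))
    return results


# Hypothetical size distribution: many small, some medium, a few large employers.
sizes = [5] * 2000 + [50] * 200 + [500] * 20
simulated = simulate_random_assignment(sizes, immigrant_share=0.187)
```

Because status is assigned without regard to employer, the simulated distribution of coworker shares is the same in expectation for immigrants and natives, and workers at large establishments cluster tightly around the overall immigrant share.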
Our analysis focuses on the mean difference in coworker shares between immigrants and natives, given in the first row of Table 1. For the average native in our set of MSAs, about 14 % of coworkers are immigrants, and 37 % of the coworkers of immigrants are immigrants. The immigrant-native difference in coworker means—our measure of concentration—is 22.9, indicating substantial concentration.
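As a rough sketch of how this concentration measure could be computed from worker-level records, the example below takes a list of (establishment, immigrant status) pairs. It assumes that a worker's coworker share excludes the worker themselves; the data and function name are hypothetical.

```python
from collections import defaultdict


def concentration(workers):
    """workers: list of (establishment_id, is_immigrant) pairs.

    Returns (mean coworker share for immigrants, mean for natives,
    difference in percentage points). Each worker's coworker share is the
    immigrant share among the OTHER employees of the same establishment.
    """
    totals = defaultdict(lambda: [0, 0])  # establishment -> [immigrants, employees]
    for est, is_immigrant in workers:
        totals[est][0] += int(is_immigrant)
        totals[est][1] += 1

    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for est, is_immigrant in workers:
        n_immigrants, n_employees = totals[est]
        if n_employees < 2:
            continue  # no coworkers, so the share is undefined
        share = (n_immigrants - int(is_immigrant)) / (n_employees - 1)
        sums[bool(is_immigrant)] += share
        counts[bool(is_immigrant)] += 1

    imm_mean = sums[True] / counts[True]
    nat_mean = sums[False] / counts[False]
    return imm_mean, nat_mean, 100 * (imm_mean - nat_mean)


# Hypothetical data: establishment A has 4 immigrants and 1 native,
# establishment B has 1 immigrant and 4 natives.
example = ([("A", True)] * 4 + [("A", False)]
           + [("B", True)] + [("B", False)] * 4)
imm_mean, nat_mean, gap = concentration(example)
```

In this toy example the immigrant mean is 0.6 and the native mean is 0.4, a concentration of 20 percentage points; the corresponding figure in Table 1 is 22.9.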
The following rows of Table 1 give demographic information that might help explain this concentration. Immigrants are relatively underrepresented among those younger than age 25, reflecting the fact that many arrive in the United States as young adults. Men substantially outnumber women among working immigrants; among working natives, men are more narrowly in the majority. Differences between immigrant and native women in rates of labor force participation likely contribute to these gaps. Immigrants are much more likely to not have completed high school than are natives, but immigrants are also overrepresented among those with advanced degrees.
The category “Speaks English very well” consists of those who report that they speak English “very well” along with those who speak only English at home. Unsurprisingly, immigrants are more likely than natives to fall into categories other than “very well,” but even the category “Not at all” includes some natives. Mean log earnings on the primary (highest earnings) job are very similar for immigrants and natives, and immigrants are more likely than natives to work for their 2000-Q2 employer in at least one of the surrounding quarters. Differences in job tenure likely contribute to the slightly higher earnings of immigrants because most transitory jobs will involve less than three full months of work and thus are likely to have particularly low quarterly earnings. These jobs may also be associated with relatively low wage rates and part-time work.
We find only minor differences between immigrants and natives in broadly defined employer characteristics. Immigrants are more likely to work in the smallest establishments and less likely to work in the largest, but overall, the differences by employer size are small, as are differences by establishment age. However, immigrants are less likely than natives to work for multi-unit firms. Immigrants are more concentrated in manufacturing than are natives, but the differences by broad sector are otherwise not particularly large.
The last three rows of Table 1 give means for three additional measures that we construct to explore the relationship between workplace concentration and neighborhood networks. Each of these is based on information on worker tract of employment and/or tract of residence. Because we have data only on those who work, we base these variables on workers residing in a particular tract rather than all residents of the tract.
The first measure is simply the share of immigrants in a worker’s tract of residence, which we use to control for residential segregation. Neighbors act as contacts and references for job opportunities, so concentration of immigrants in the neighborhood can contribute to immigrant concentration in the workplace. As can be seen in Table 1, immigrants in our sample of MSAs are substantially more likely to live in tracts with high immigrant shares than are natives, but even so, the majority of their neighbors are natives.
We construct a second variable for each worker by calculating the share of employees at other businesses located close to the worker’s employer who also live in the worker’s residential tract. The denominator is the number of employees working for other employers in a worker’s tract of employment. The numerator is the number among that group who live in the worker’s residential tract. Proximity or convenient transportation links may make residents of certain neighborhoods likely to work at a particular location, resulting in a relationship between workplace and residence. This measure of the general propensity for workplace and residence locations to be connected will control for commuting patterns that influence concentration. We refer to this as our shared commute index. For the average worker, there is not a strong association between the tract of the employer and particular tracts of residence: the mean for this variable is only 0.3 % for immigrants and 0.5 % for natives.
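The definition of the shared commute index translates fairly directly into code. The sketch below follows the numerator and denominator described above; the record layout (dictionaries with `employer`, `work_tract`, and `home_tract` keys) and the example data are hypothetical.

```python
def shared_commute_index(worker, all_workers):
    """Share of employees at OTHER employers in the worker's tract of
    employment who live in the worker's residential tract.

    Records are dicts with 'employer', 'work_tract', and 'home_tract' keys
    (an illustrative layout). Returns None when no other employer is
    located in the worker's tract of employment.
    """
    others = [w for w in all_workers
              if w["work_tract"] == worker["work_tract"]
              and w["employer"] != worker["employer"]]
    if not others:
        return None
    same_home = sum(1 for w in others if w["home_tract"] == worker["home_tract"])
    return same_home / len(others)


me = {"employer": "E1", "work_tract": "W1", "home_tract": "H1"}
records = [
    me,
    {"employer": "E1", "work_tract": "W1", "home_tract": "H1"},  # own coworker: ignored
    {"employer": "E2", "work_tract": "W1", "home_tract": "H1"},
    {"employer": "E2", "work_tract": "W1", "home_tract": "H2"},
    {"employer": "E3", "work_tract": "W1", "home_tract": "H2"},
    {"employer": "E3", "work_tract": "W1", "home_tract": "H3"},
    {"employer": "E4", "work_tract": "W2", "home_tract": "H1"},  # other work tract: ignored
]
index = shared_commute_index(me, records)
```

Here one of the four employees at other businesses in tract W1 lives in the worker's home tract, so the index is 0.25. Excluding the worker's own employer is what separates this commuting measure from the coworker-based network index.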
Our third measure is intended as a proxy for the presence of a specific type of neighborhood-based social network. Neighborhood contacts and references may make neighbors more likely to be coworkers. For each worker, we calculate the fraction of their coworkers who also reside in the worker’s tract of residence. So, for example, if a business hired three workers from each of four different residential tracts, each worker would have a neighborhood network index of 2/11, given that two of their 11 coworkers would be from their neighborhood. The mean of the network index is small: for both immigrants and natives, 1.9 % of coworkers live in the same tract. However, it is larger than the shared commute index, suggesting that residential location is more strongly connected to firm of employment than geographic area of employment.
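The worked example in this paragraph can be checked with a few lines of code. The sketch below computes the neighborhood network index for each employee of a single establishment; the tract labels and function name are hypothetical.

```python
from fractions import Fraction


def network_index(worker_position, home_tracts):
    """Fraction of a worker's coworkers who live in the worker's own tract.

    home_tracts lists the residential tract of every employee at one
    establishment; worker_position selects the worker of interest.
    Returns None for a worker with no coworkers.
    """
    own_tract = home_tracts[worker_position]
    coworkers = home_tracts[:worker_position] + home_tracts[worker_position + 1:]
    if not coworkers:
        return None
    neighbors = sum(1 for tract in coworkers if tract == own_tract)
    return Fraction(neighbors, len(coworkers))


# Three workers hired from each of four residential tracts, as in the text.
tracts = [t for t in ("T1", "T2", "T3", "T4") for _ in range(3)]
indices = [network_index(i, tracts) for i in range(len(tracts))]
```

Every worker in this establishment has 11 coworkers, two of whom share their tract, so each index equals 2/11.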
Our empirical approach is based on a series of regressions with the coworker share as the dependent variable and individual workers on their primary job as the unit of analysis. To ease computation with over 3 million workers, we use linear regression rather than adopting an approach that accounts for the limited range of the dependent variable. As Fig. 1 illustrates, most of the mass of the distribution is not at either 1 or 0, which mitigates some of the problems inherent in the linear model. There is a strong positive correlation in the coworker share among employees of the same business that generates a downward bias in conventionally estimated standard errors in all worker-level regressions. To avoid this, we use the Huber-White variance estimator, allowing for arbitrary correlation of errors among employees of the same establishment.
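To make the standard-error adjustment concrete, the following is a minimal sketch of a cluster-robust (sandwich) variance calculation for the simplest possible case: a linear regression of the coworker share on a constant and an immigrant indicator, with errors clustered by establishment. It is written in plain Python for a single regressor, applies no small-sample correction, and uses invented data; the specifications in the source include many more controls.

```python
def ols_cluster_robust(y, d, cluster):
    """OLS of y on a constant and a 0/1 regressor d.

    Returns (slope, cluster-robust standard error of the slope), using the
    sandwich formula (X'X)^-1 [sum_g X_g'u_g u_g'X_g] (X'X)^-1 without any
    small-sample correction.
    """
    n = len(y)
    sum_d = sum(d)
    sum_dd = sum(di * di for di in d)
    sum_y = sum(y)
    sum_dy = sum(di * yi for di, yi in zip(d, y))

    # Inverse of the 2x2 matrix X'X for X = [1, d].
    det = n * sum_dd - sum_d * sum_d
    inv = [[sum_dd / det, -sum_d / det], [-sum_d / det, n / det]]
    intercept = inv[0][0] * sum_y + inv[0][1] * sum_dy
    slope = inv[1][0] * sum_y + inv[1][1] * sum_dy

    # Sum the score vector X_i * u_i within each cluster.
    scores = {}
    for yi, di, g in zip(y, d, cluster):
        resid = yi - intercept - slope * di
        s = scores.setdefault(g, [0.0, 0.0])
        s[0] += resid
        s[1] += resid * di

    # Variance of the slope: sum over clusters of (second row of inv . score)^2.
    var_slope = sum((inv[1][0] * s[0] + inv[1][1] * s[1]) ** 2
                    for s in scores.values())
    return slope, max(var_slope, 0.0) ** 0.5


# Hypothetical coworker shares for eight workers in three establishments.
y = [0.8, 0.6, 0.5, 0.3, 0.1, 0.2, 0.5, 0.0]
d = [1, 1, 0, 1, 0, 0, 1, 0]
est = ["A", "A", "A", "B", "B", "B", "C", "C"]
slope, se = ols_cluster_robust(y, d, est)
```

With only an immigrant indicator on the right-hand side, the slope equals the immigrant-native difference in mean coworker shares (0.35 in this toy data). Summing scores within establishments before squaring is what lets errors be arbitrarily correlated among employees of the same establishment.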
Accounting for Immigrant Concentration
Table 2 presents estimates of $\gamma_I^{\text{base}}$ and $\gamma_I^{\text{main}}$ in rows 1 and 2. In the remaining rows, the contributions of sets of covariates are given as percentages of total within-MSA concentration (i.e., $\delta_k/\gamma_I^{\text{base}}$). In the first column, average within-MSA concentration ($\gamma_I^{\text{base}}$) is 17.1; that is, the average share of coworkers who are immigrants is 17.1 percentage points higher for immigrants than for natives working in the same MSA. Comparing that with the overall difference (22.9) reported in Table 1, MSA effects alone account for about one-quarter of the total concentration. Controlling for observable employee and employer characteristics reduces estimated concentration from 17.1 ($\gamma_I^{\text{base}}$) to 8.3 ($\gamma_I^{\text{main}}$), which is roughly a 50 % reduction. Three factors stand out as important in the decomposition: English language skills, industry of employment, and the share of a worker’s neighbors who are immigrants. Together, these account for 48 % of within-MSA concentration, with the next runners-up (education and the interaction of firm age with multi-unit status) contributing about 1 % each.
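The proportions quoted in this paragraph follow from simple arithmetic on the three reported estimates. The short calculation below restates it; the variable names are illustrative.

```python
# Estimates reported in the text, in percentage points.
overall_gap = 22.9  # immigrant-native difference in coworker shares (Table 1)
gamma_base = 17.1   # within-MSA concentration, MSA dummies only
gamma_main = 8.3    # concentration remaining after adding all controls

# Share of the overall gap absorbed by MSA effects alone.
msa_share = (overall_gap - gamma_base) / overall_gap

# Share of within-MSA concentration accounted for by the added controls.
explained_share = (gamma_base - gamma_main) / gamma_base

print(f"MSA effects: {msa_share:.1%}")
print(f"Explained within MSA: {explained_share:.1%}")
```

These come to roughly 25 % and 51 %, matching the "about one-quarter" and "roughly a 50 % reduction" described above.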
Language skills make a large contribution to explaining concentration both because most of those who do not speak English well are immigrants, and because of the substantial increase in coworker share associated with reduced English proficiency even when controlling for numerous other factors. Given the large share of U.S. immigrants of Hispanic origin, it is worth comparing our findings with HN’s findings on the importance of language for Hispanic/white concentration. Using the same language grouping (and controlling only for MSA), HN found that about one-third of all Hispanic/white within-MSA concentration is attributable to segregation by language. In our sample, if we include only language and MSA controls, language explains 28 % of overall immigrant concentration. Using the broader set of controls given in Table 2, we attribute about 18 % of overall concentration to language. In both cases, language is important, but looking at language separately from other factors produces results that overstate its importance. This highlights the value of the multivariate approach used here.
The substantial contribution of industry comes about because the distribution of employment across detailed industries is quite different for immigrants and natives. This seems somewhat surprising given that the distribution across sectors in Table 1 shows only modest differences. To explore this, we split the contribution into differences in immigrant employment by sector and then into the contributions of detailed industry within-sector. This split is somewhat sensitive to how the detail is specified, but using the modal three-digit industry within each sector as omitted categories (as we do here), differences across broad sectors (particularly the high share of immigrants in manufacturing) and differences across detailed industries within services both appear to be important.
The other striking result is the almost one-third contribution of residential segregation across census tracts within MSAs. As noted in previous literature, this points to a very strong relationship between living and working with immigrants. Note that neither of the other tract-level variables (the network index and the shared commute index) accounts for much of the concentration. The network variable has a positive and statistically significant effect (not reported here), which is consistent with the hypothesis that network effects increase the likelihood of working with immigrants. However, the network variable cannot account for much immigrant concentration because there is little difference between immigrants and natives in its mean value. This latter point is important for interpreting the results of Table 2. Other factors such as establishment size and firm age interacted with multi-unit status have statistically significant estimated effects (not reported here) but account for relatively small shares of immigrant concentration because mean values differ little between immigrants and natives.
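The point about mean differences can be illustrated with a stylized version of a single decomposition term. The sketch below assumes the standard omitted-variable logic behind a Gelbach-style decomposition, in which a covariate's contribution is its coefficient multiplied by the immigrant-native gap in that covariate. It ignores the MSA dummies that the source conditions on, and the coefficient values are hypothetical.

```python
def covariate_contribution(beta_k, mean_immigrants, mean_natives, gamma_base):
    """Stylized share of baseline concentration attributed to covariate k.

    delta_k = beta_k * (immigrant mean - native mean); the returned value is
    delta_k / gamma_base. Simplified: the gap here is a raw difference in
    means rather than one conditional on MSA.
    """
    delta_k = beta_k * (mean_immigrants - mean_natives)
    return delta_k / gamma_base


# Network index: the means reported in the text are 1.9 % for both groups,
# so the contribution is zero whatever the (hypothetical) coefficient.
network_share = covariate_contribution(
    beta_k=50.0, mean_immigrants=0.019, mean_natives=0.019, gamma_base=17.1)

# A hypothetical covariate with the same coefficient but a 10-point gap in means.
gap_share = covariate_contribution(
    beta_k=50.0, mean_immigrants=0.30, mean_natives=0.20, gamma_base=17.1)
```

A covariate can therefore have a large and statistically significant coefficient yet explain almost none of the concentration, which is the situation described for the network index.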
The second column of Table 2 performs the same exercise except that for immigrants, the dependent variable is the fraction of coworkers from non-U.S. countries other than their own. With this specification, $\gamma_I$ is positive if immigrants are more likely to have coworkers from other source countries than natives are to have immigrant coworkers. Row 1 shows that roughly one-quarter of workplace concentration stems from this cross-source-country concentration. Row 2 shows that controlling for worker, firm, and locational factors entirely explains this excess probability of working with noncompatriots. Looking at the decomposition, it is clear that firm and locational factors play a more important role than do worker characteristics in explaining why immigrants work with noncompatriots. This concentration largely reflects that some industries have large immigrant workforces (from various countries of origin) and that living in a tract with a high fraction of immigrants has a strong association with the chances of working with both immigrants from other countries and with compatriots. In the next section, we examine this novel finding about the difference in how much we can account for own country versus other country concentration for immigrants.
Our data permit further exploring patterns of concentration by examining how they vary by country of origin. That is, we can estimate how likely it is for an immigrant from Mexico (for example) to have coworkers who are from Mexico versus those who are from El Salvador or China. We then explore differences across countries of origin in the factors accounting for within country-of-origin concentration. To make this manageable, we rank countries of origin by their share of employment in our sample and then carry out some analyses separately for immigrants from the 18 largest source countries. Table 3 lists these countries and gives their sample shares in the first column. In the row labeled “Other,” we group immigrants from the many source countries with smaller shares than those on our list. The remaining columns of Table 3 present statistics on three factors that are potentially important for patterns of concentration: English language proficiency, the share of neighbors who are immigrants, and levels of education.
As we emphasized in our discussion of the Gelbach decomposition, for a characteristic to have a sizable effect on concentration, the difference between its mean for immigrants and for natives must be large. Unsurprisingly, immigrants from other English-speaking countries such as Great Britain, Canada, and Jamaica report English skills much like those of natives, but immigrants from Germany are also very unlikely to report difficulties with English. For these countries, language skills cannot be an important factor in accounting for concentration. For many of the other source countries, however, a large share of immigrants report limited English (the Dominican Republic and China are examples), and language skills could play a sizable role in explaining differences in concentration.
Immigrants from Canada, Germany, and Great Britain also have patterns of residential segregation that closely resemble those of natives, with natives accounting for over 80 % of neighbors. Immigrants from Guatemala and El Salvador stand out as being most likely to live with immigrants from other countries, with Mexico accounting for the majority of their nonnative neighbors. Haitian and Dominican immigrants stand out as having large shares of compatriots among their neighbors, particularly given their sample shares.
Finally, the last two columns of Table 3 show the education distribution by group. As discussed earlier, the immigrant population as a whole includes larger shares of both those who have not completed high school and college graduates than the native population. As Table 3 shows, China is the only single source country that shows this pattern. Immigrants from other countries of origin tend to be overrepresented in either the lower or upper tail of the education distribution relative to natives, but not in both tails.
Table 4 presents estimates of concentration by country of origin for our 18 source countries. Each estimate is from a separate regression. The specifications used are analogs to Eqs. (2) and (3) but with different dependent variables: the country-specific coworker share in the “Own country” columns (e.g., share of coworkers who are Mexican immigrants in row 1), and the immigrant coworker share excluding that country of origin for the “Other country” columns (e.g., share of non-Mexican immigrants among coworkers in row 1). Each estimate is the coefficient on an indicator variable for being an immigrant from that row’s country.18 The first and second columns include only country and MSA dummy variables as controls; in the third and fourth columns, we add the other sets of variables used in Table 2. However, we split the residential segregation measure used in Table 2 into 18 country-specific shares and the remainder, which is the share of neighbors who are immigrants from countries other than those listed.
The first entry indicates that for the average Mexican immigrant, the share of coworkers who are Mexican is 15.7 percentage points higher than the share for the average native within the same MSA. The entry in the second column shows that for Mexican immigrants, the share of coworkers who are immigrants from other countries is only 2.1 percentage points higher than the share of non-Mexican immigrant coworkers for natives.
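To make the two dependent variables concrete, the following minimal sketch computes, for each worker, the share of coworkers from a given country and the share of coworkers who are immigrants from any other country. The toy records, the function name `coworker_shares`, and the leave-one-out convention (a worker is not counted among his or her own coworkers) are illustrative assumptions, not the LEHD processing behind Table 4.

```python
from collections import Counter

# Hypothetical toy records: (worker_id, employer_id, country_of_birth).
workers = [
    ("w1", "A", "Mexico"),
    ("w2", "A", "Mexico"),
    ("w3", "A", "US"),
    ("w4", "A", "El Salvador"),
    ("w5", "B", "US"),
    ("w6", "B", "US"),
    ("w7", "B", "Mexico"),
]


def coworker_shares(records, country):
    """Return {worker_id: (own_share, other_share)} for one source country.

    own_share   = share of coworkers who are immigrants from `country`
    other_share = share of coworkers who are immigrants from elsewhere
    The worker is excluded from his or her own coworker counts.
    """
    size = Counter(emp for _, emp, _ in records)
    from_country = Counter(emp for _, emp, c in records if c == country)
    immigrants = Counter(emp for _, emp, c in records if c != "US")

    shares = {}
    for wid, emp, c in records:
        n_coworkers = size[emp] - 1
        if n_coworkers == 0:
            continue  # no coworkers, so the share is undefined
        own = from_country[emp] - (c == country)
        other = (immigrants[emp] - (c != "US")) - own
        shares[wid] = (own / n_coworkers, other / n_coworkers)
    return shares


shares = coworker_shares(workers, "Mexico")
for wid, (own, other) in sorted(shares.items()):
    print(f"{wid}: Mexican coworker share={own:.2f}, other-immigrant share={other:.2f}")
```

In a regression like those described for Table 4, each of these shares would serve as the dependent variable, with an indicator for being an immigrant from the row's country as the regressor of interest.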
For most countries of origin, immigrants are much more likely to work with their compatriots than with other immigrants. There are two types of exceptions. Immigrants from three countries (Germany, Great Britain, and Canada) are roughly as likely as natives are to work with compatriots or with other immigrants. The other exceptions are immigrants from El Salvador, Guatemala, Taiwan, Jamaica, and the Dominican Republic, countries with sizable own-country effects but even larger other-country effects. Based on results that we do not present here, for Salvadorans and Guatemalans, this largely reflects a propensity to work with immigrants from Mexico. Given such a propensity, the large other-immigrant effect likely reflects the fact that Mexican immigrants greatly outnumber Salvadoran and Guatemalan immigrants in our sample of MSAs. Immigrants from Taiwan are quite likely to work with immigrants from mainland China; Dominican immigrants are quite likely to work with Cubans; and Jamaicans, with Haitians. Although some of these cross-country patterns suggest the importance of a shared language, countries with a shared language may share other characteristics as well. Indeed, there is no such tendency for Cubans to work with Mexicans, Salvadorans, or Guatemalans, despite a shared language.
The third and fourth columns of Table 4 report estimates of the same coefficients when we include our full set of covariates. A comparison of the first and third columns shows how much the added controls contribute to accounting for concentration measures by country of origin. For Mexico, adding covariates reduces own-country concentration by close to one-half, from 15.7 to 8.7—roughly similar to the magnitudes we observed in Table 2 for all immigrants. There is a similarly large reduction in concentration for Cubans as well as reductions in the range of 20 % to 30 % for Salvadoran, Guatemalan, Haitian, Jamaican, and Dominican immigrants. However, among Asian immigrant groups—particularly Korean and Japanese immigrants—adding covariates only modestly reduces concentration.
Although observable factors only partially explain compatriot concentration, for most countries of origin, these factors fully explain the excess tendency to work with immigrants from other countries. With the full set of controls, only immigrants from El Salvador, Guatemala, China, and Taiwan appear substantially more likely than natives to work with immigrants from other countries; even for these countries, covariates explain more than two-thirds of the excess noncompatriot concentration for all but Taiwan. The final row in Table 4 shows that the average unexplained other-country concentration is 0, whereas about two-thirds of own-country concentration remains unexplained.19
Table 5 presents the Gelbach decomposition for own-country concentration, and Table 6 presents the other-country decomposition. We group variables as we did in Table 2 except that we split the residential segregation measure between compatriots and other immigrants. The three factors that account for most of overall concentration are also the primary factors when we look at concentration by country: residential segregation, English language skills, and industry of employment. However, the importance of these factors differs for own-country versus other-country concentration and varies substantially across country groups. Residential segregation accounts for virtually all (92 %) the explained variation in own-country concentration for Cubans as well as the majority of explained variation for all countries except for Mexico, India, and the Philippines. Residential segregation has substantial explanatory power for Cubans because 31 % of the neighbors of Cuban immigrants are Cuban, but less than 1 % of natives’ neighbors are Cuban. For both Cubans and natives, having a neighbor who is Cuban makes it more likely to have Cuban coworkers, but the large difference in the propensity for a neighbor to be Cuban dominates here. We stress the accounting nature of this exercise. Those who live with immigrants from a particular country are quite likely to work with immigrants from that country as well, suggesting that common factors underlie those patterns, but not that one causes the other. Looking back at Table 3, it is clear that immigrant neighborhoods often include immigrants from several countries of origin, rather than consisting of ethnic enclaves for a single country of origin.
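The accounting logic described above can be illustrated with a minimal sketch on simulated data. It relies on the omitted-variable-bias identity that underlies the Gelbach decomposition: the change in the group coefficient when covariates are added equals the sum, over covariates, of the group difference in the covariate's mean (from an auxiliary regression) times the covariate's coefficient in the full model. All variable names and parameter values below are hypothetical and are not estimates from Tables 5 or 6.

```python
import numpy as np


def ols(X, y):
    """Return OLS coefficients computed by least squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 5000

# Simulated group indicator (1 = immigrant from the country of interest).
immigrant = rng.integers(0, 2, size=n).astype(float)

# Hypothetical covariates whose means differ sharply between the groups.
neighbor_share = 0.05 + 0.25 * immigrant + rng.normal(0, 0.05, n)
poor_english = 0.02 + 0.30 * immigrant + rng.normal(0, 0.10, n)

# Hypothetical outcome: share of coworkers from the country of interest.
coworker_share = (0.10 + 0.05 * immigrant + 0.4 * neighbor_share
                  + 0.2 * poor_english + rng.normal(0, 0.05, n))

X1 = np.column_stack([np.ones(n), immigrant])        # base specification
X2 = np.column_stack([neighbor_share, poor_english])  # added covariates

b_base = ols(X1, coworker_share)[1]                   # raw concentration
full = ols(np.column_stack([X1, X2]), coworker_share)
b_full, beta2 = full[1], full[2:]                     # unexplained part, covariate effects

# Auxiliary regressions: group difference in each covariate's mean.
gamma = np.array([ols(X1, X2[:, k])[1] for k in range(X2.shape[1])])

# Contribution of each covariate to the explained concentration.
contrib = gamma * beta2

print("base:", b_base, "full:", b_full)
print("explained by neighbors, language:", contrib)
print("identity holds:", np.isclose(b_base - b_full, contrib.sum()))
```

The sketch makes the point in the text explicit: a covariate contributes a great deal only when both its coefficient and the immigrant-native gap in its mean are large, and the resulting shares describe associations rather than causal effects.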
The industry distribution of employment accounts for more than one-half of the explained concentration for immigrants from India and the Philippines, and residential segregation and the industry distribution of employment each account for about 40 % of explained variation for immigrants from Mexico. Although we found that English language skills were an important factor for overall concentration, these skills make relatively small contributions to explaining own-country concentration for all countries but Mexico.
In contrast, Table 6 shows that English proficiency is the most important factor in accounting for other-country concentration for 12 of the 18 countries. The countries in which language is unimportant are the six countries from which at least 95 % of immigrants speak English well or very well. It seems clear that part of other-country concentration reflects coworkers who speak languages that are shared by more than one country of origin: particularly, Spanish and Chinese. However, it is likely that low English language proficiency is correlated with low levels of other skills and that this pattern results partly from firms hiring low-skilled immigrants from several countries of origin. This seems likely to reflect concentration in jobs where verbal communication is not very important rather than jobs in which a shared non-English language facilitates communication.
Industry also plays a larger role in accounting for other-country concentration than own-country concentration, reflecting immigrant-intensive industries in which employers often hire from more than one country of origin. Finally, the other-country results also show evidence of a strong tie between living with other immigrants and working with them. With the exception of countries with little overall concentration, living with noncompatriot immigrants accounts for 12 % to 26 % of other-country concentration. Some of this may reflect how shared languages affect residential patterns, but many immigrants have coworkers from source countries that do not share their language. For example, immigrants from Taiwan and China are disproportionately likely to live with each other, but both groups are also more likely than natives to live with other Asian immigrants. Within MSAs, Chinese immigrants are 2.4 % more likely than natives to live with immigrants from Taiwan; but they are 2.9 % more likely than natives to live with immigrants from Vietnam and roughly 3 % more likely to work with immigrants from other Asian countries.
Before concluding, we briefly discuss richer specifications that permit a full set of interactions between our explanatory variables and dummy variables for own and other group. In unreported results, we find that such specifications do not change the overall messages of our decompositions, but the interactions yield some additional insights.20 In Table 7, we report only the coefficients on main and interaction terms involving our language and residential segregation measures, two variables that are of particular interest in these richer specifications. Each specification used here includes residential shares for our 18 largest countries (plus an “other” category), but in each row, we present only the coefficients on terms involving residential shares for the country used to define the dependent variable. So in the first row, we report the main/own/other coefficients on the share of neighbors who are from Mexico.
The “main” effects in the first column give the effect of speaking English poorly for natives. In the first row, for natives, speaking English poorly is associated with a 1.6 percentage point increase in the probability of working with immigrants from Mexico, relative to natives who speak English very well. The “own” effect gives the difference in the effect between workers from the designated country and natives, and the “other” column gives differences in the effect between immigrants from other countries and natives. For immigrants from Mexico, speaking English poorly has a slightly larger effect on the probability of working with immigrants from Mexico than it does for natives; for immigrants from other countries, the effect is smaller than for natives. Looking down the table, the effect of speaking English poorly on the probability that natives work with immigrants from any of our source countries is tiny for all countries except Mexico. This reflects that most natives with limited English proficiency speak Spanish and that Mexico is our largest source country. Combining main and interaction effects, the implied effect of not speaking English well for immigrants from Mexico (1.9 + 1.6 = 3.5) is within the range of implied effects for own group for other countries. Similarly, although the “other” interaction has a relatively large negative coefficient in the row for Mexico, it is offset by the main effect.
The own-country effects vary widely across countries, with the largest effects generally found among immigrants from Asia and Poland. One possible explanation is that the language effects are particularly strong for those whose first language is linguistically distant from English. The other-country effects are generally small, particularly compared with the own-country effects.
In contrast, all the main effects and most of the own-/other-country effects for residential segregation are positive; in all rows of Table 7, having neighbors from a particular country of origin is positively associated with having coworkers from that country for both natives and immigrants. There is, however, considerable variation across countries. Asian countries (e.g., China, Japan, Korea, Taiwan, and Vietnam) all have especially large own-country effects, implying that immigrants from those countries with many compatriots among their neighbors are much more likely to work with compatriots than those who live primarily with natives.
Using matched employer–employee data that comprehensively cover employment in our sample of MSAs, we find that immigrants are much more likely to work with one another—and hence are less likely to work with natives—than would be expected given random allocation of workers. This is driven partly by the distribution of immigrants across MSAs, but within MSAs, substantial concentration remains. Immigrants who work together are quite likely to be compatriots, particularly those who have poor English language skills. However, immigrants from different countries of origin are also more likely to work together than to work with natives.
We find substantial differences between immigrants and natives in three factors that have strong associations with concentration: industry of employment, the share of immigrants among neighbors, and (unsurprisingly) English language skills. As a result, these three factors (especially own-country residential segregation) are the most important in accounting for overall concentration. In contrast, although having coworkers who are neighbors has a substantial positive relationship with the share of immigrants among an immigrant’s coworkers, natives and immigrants differ little in the extent to which coworkers are neighbors. Thus, the share of coworkers who are neighbors accounts for very little concentration.
Sizable contributions from language and detailed industry in accounting for immigrant concentration are consistent with an important role for sorting on productive characteristics. Those speaking limited English are quite likely to work with compatriots, which is consistent with the need for a shared language to facilitate coordination within the workplace. Differences in the industries employing natives and immigrants likely reflect sorting on the kinds of skills that the two groups bring to the labor market. However, traditional measures of skill levels—education and earnings—do not account for much concentration.
Our results highlight the importance of taking several factors into account simultaneously. For example, other studies have found a role for language skills in explaining related outcomes. We find that language matters, but its contribution is diminished when we take into account own-country residential segregation and employer characteristics.
The characteristics we measure fully account for patterns of other-country concentration. In contrast, we can account for less than one-third of compatriot workplace concentration for 15 of the 18 source countries we consider—Mexico, Cuba, and the Dominican Republic are the exceptions. We interpret our success in explaining other-country concentration as suggesting that we have done reasonably well in identifying factors that lead nonnatives to end up grouped in workplaces. However, that success makes the large unexplained compatriot component of concentration a puzzle: workers’ excess tendency to work with their compatriots must be largely associated with factors we have not measured. Identifying such factors should be a high priority for future research. We view country-specific social networks centered on something other than U.S. tract of residence as one likely candidate.
We thank the NIH for financial support and participants at the SOLE and WEAI meetings and workshops at the University of Chicago, the University of Kentucky, and the Center for Economic Studies for helpful comments. Any opinions and conclusions expressed herein are those of the authors and do not necessarily represent the views of the U.S. Census Bureau or the Comptroller of the Currency. All results have been reviewed to ensure that no confidential information is disclosed.
Abowd et al. (1999) presented evidence on this for France. Abowd et al. (2005) summarized related and largely consistent evidence for the United States. Understanding why firms matter so much for key economic outcomes for workers remains an open and active research question.
We do not have space here to fully review the broad range of social science research that has considered the clustering of firms and workers across geographic locations. For example, in the geography literature, Allen Scott has written extensively in this area; see Scott (1988, 2006) and references therein. We also note the contributions of Massey (1984) and Nee et al. (1994).
Cabrales et al. (2008) emphasized a different skill-based sorting mechanism: if a worker’s utility depends on both absolute and relative wages and movement of workers is costless, complete segregation by skill is optimal.
Our neighborhood network index described later yields patterns similar to those found by Bayer et al. (2008). We find that the mean fraction of an employer’s workforce that lives in the same tract is about 1.9 % for both natives and immigrants. The roughly equivalent statistic from Bayer et al. (2008) is 0.94 %.
Even at establishments with at least two matched long-form workers, HNM have data on only a subset of workers, which leads to variation in estimates of worker characteristics. With a 1-in-20 sampling rate, only 43 % of 50-employee establishments would have five or more matches. To see the effects of this sampling variation, consider an establishment with 50 workers, 25 of them immigrants; if only four workers are matched, the probability of observing the actual immigrant share (=0.5) is only 37.5 %; with 12.5 % probability, the establishment would have a measured immigrant share of either 0 or 1.
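The probabilities in the example above can be reproduced with a short calculation. The sketch below assumes that each matched worker is an immigrant with probability 0.5, independently of the others (a binomial approximation); the function name `binom_pmf` is introduced here for illustration. For comparison, it also computes the same quantities when the four matched workers are drawn without replacement from the 50 workers, which is our own supplementary calculation.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_matched = 4        # matched workers observed at the establishment
p_immigrant = 0.5    # true immigrant share (25 of 50 workers)

# Measured share equals the true share only if exactly 2 of 4 are immigrants.
p_true_share = binom_pmf(2, n_matched, p_immigrant)

# Measured share is 0 or 1 if none or all of the 4 are immigrants.
p_extreme = (binom_pmf(0, n_matched, p_immigrant)
             + binom_pmf(4, n_matched, p_immigrant))

print(f"binomial: P(share = 0.5) = {p_true_share:.3f}, P(share in {{0, 1}}) = {p_extreme:.3f}")

# Supplementary check: sampling 4 of the 50 workers without replacement.
total = comb(50, 4)
p_true_share_exact = comb(25, 2) * comb(25, 2) / total
p_extreme_exact = 2 * comb(25, 4) / total

print(f"without replacement: P(share = 0.5) = {p_true_share_exact:.3f}, "
      f"P(share in {{0, 1}}) = {p_extreme_exact:.3f}")
```

Under the binomial approximation, the calculation returns the 37.5 % and 12.5 % figures; sampling without replacement gives similar values, so the conclusion about sampling variation does not hinge on the approximation.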
Lengermann et al. (2004) also took advantage of the LEHD database to explore variation in immigrant concentration across employers, but with a focus on explaining immigrant–native earnings differences.
See Abowd et al. (2006) for a full description of the database.
See Abowd et al. (2006) for details. We use all 10 implicates in our analysis for the establishment characteristics and assign each a 1/10 weight.
The model uses the following variables: worker age and sex; 11 country-of-origin groups; log earnings; whether the worker was employed for each of quarters 1, 2, and 3 of 2000; three-digit industry; MSA; working population density; establishment age and size; and the number of establishments owned by the firm. With this set of controls, immigrant status has a relatively modest negative association with the probability of matching. Our objective is for weighted estimates to match statistics that we compute from the full set of LEHD workers. Tables S1 and S2 in Online Resource 1 show that even on an unweighted basis, we do well. Comparing the first column of Tables S1 and S2 illustrates the close correspondence between means for the full sample and the weighted matched sample. We match about 70 % of long-form respondents who live in our sample of MSAs and report working for a nonagricultural private sector employer or a state or local government, but we also match many long-form respondents who do not report an in-scope job.
Recent work by Abraham et al. (2013) used a match of the CPS data with the LEHD data to address coverage issues. They found that substantial numbers of in-scope CPS workers are not present in the LEHD data and vice versa. CPS workers who do not show up in LEHD are disproportionately likely to have low earnings, to have short job durations, and to be elderly. Controlling for such factors, immigrant status has only a modest negative effect on matching.
More precisely, we started from the list of MSAs used in Singer (2004). We drop 14 of Singer’s 45 MSAs because we do not have the data we need for those areas.
In computing the coworker share, we equally weight all coworkers regardless of whether they hold other jobs. However, the set of observations used in our regressions includes only the job where an individual received their highest earnings in that quarter (primary job).
Among natives who speak English poorly or not at all, roughly 90 % speak Spanish at home. Natives include those born in Puerto Rico (but working in our set of MSAs). Natives from Puerto Rico account for roughly 70 % of natives who report not speaking English at all and 20 % of those speaking English poorly.
Census tracts are small geographic areas with a population between 1,500 and 8,000 individuals. They are designed to be relatively homogeneous with respect to socioeconomic characteristics. The limited distance between residents of a census tract—both in terms of geography and socioeconomic factors—suggests that the likelihood of interactions among residents of the same tract is high relative to the likelihood of interactions between residents of different tracts.
In our sample, there are on average 49 employers per tract (excluding tracts that are strictly residential). Seven percent of tracts with employment have only one employer, and for those tracts, the variable is 0. Only 9 % of workers in our sample work in single-employer tracts.
This decomposition nests the well-known Oaxaca-Blinder decomposition.
Online Resource 1 contains additional analysis of differences in concentration by firm size. We find greater concentration at small firms, but this does not account for much of the overall concentration because immigrants and natives do not differ greatly in terms of the propensity to work at small businesses. Online Resource 1 also includes an analysis regarding the statistical relationship between concentration and business size.
Our list differs from the top 18 based on overall U.S. employment in that it includes Taiwan but excludes Colombia. Our ordering of countries by share of employment also differs somewhat.
The final row of Table 4 gives the average across our 18 groups for each column, which is closely related to the top rows of Table 2. Here, the sum of own- and other-country concentration is 18.3 with only MSA and country-of-origin controls, and 8.8 with the full set of controls. Both figures are slightly larger than the corresponding estimates from Table 2 (17.1 and 3.3, respectively). One reason for these differences is that Table 2 estimates include immigrants from countries with smaller populations in the United States, which tend to have lower levels of concentration.
See Online Resource 1, section A.1, for the estimating equations.