Language is a universal human ability, acquired readily by young children, who otherwise struggle with many basics of survival. And yet, language ability is variable across individuals. Naturalistic and experimental observations suggest that children’s linguistic skills vary with factors like socioeconomic status and children’s gender. But which factors really influence children’s day-to-day language use? Here, we leverage speech technology in a big-data approach to report on a unique cross-cultural and diverse data set: >2,500 d-long, child-centered audio-recordings of 1,001 2- to 48-mo-olds from 12 countries spanning six continents across urban, farmer-forager, and subsistence-farming contexts. As expected, age and language-relevant clinical risks and diagnoses predicted how much speech (and speech-like vocalization) children produced. Critically, so too did adult talk in children’s environments: Children who heard more talk from adults produced more speech. In contrast to previous conclusions based on more limited sampling methods and a different set of language proxies, socioeconomic status (operationalized as maternal education) was not significantly associated with children’s productions over the first 4 y of life, and neither were gender or multilingualism. These findings from large-scale naturalistic data advance our understanding of which factors are robust predictors of variability in the speech behaviors of young learners in a wide range of everyday contexts.
Harnessing a global sample of >40,000 h of child-centered audio capturing young children’s home environment, we measured contributors to how much speech 0- to 4-y-olds naturally produce. Amount of adult talk, age, and normative development were the sole significant predictors; child gender, socioeconomic status, and multilingualism did not explain how often children vocalized or how much adult talk they heard. These findings (strengthened by our validation of existing automated speech algorithms) open up conversations about early language development with the broader public, including parents, clinicians, educators, and policymakers. The factors explaining variance also inform our understanding of humans’ unique capacity for learning and of potential large-scale applications of machine technology to everyday human behavior.