Dan Villarreal talk November 3 on auto-coding

Dr. Dan Villarreal (University of Pittsburgh) is visiting the Sociolinguistics Lab in early November. He’ll be giving a talk, open to the public, on Thursday November 3, 2022. Dan’s presentation is of special interest to us because it’s about automating analyses of large-scale datasets. As we build a corpus of Michigan speech in the MI Diaries project, we’ve been using automatic speech recognition (ASR) to speed up our transcription, and working with MSU’s Institute for Cyber-Enabled Research (ICER) to move some of our data processing to their supercomputer.

Dr. Villarreal is also giving a talk to the SoConDi group at University of Michigan on Nov 4th, 2022, 3-4pm. If you are interested in joining that talk, please contact Yongqing Ye (yeyongqi@msu.edu) or Suzanne Wagner (wagnersu@msu.edu) for the Zoom link.

Sociolinguistic auto-coding: Applications and pitfalls

Dan Villarreal, University of Pittsburgh

Time: Thursday, Nov 3, 4:30-6:15pm

Location: Wells Hall B342 and on Zoom

Zoom link: https://msu.zoom.us/j/98418360065 (Meeting ID: 984 1836 0065; passcode: sociolab)

Researchers in sociophonetics and variationist sociolinguistics have increasingly turned to computational methods to automate time-consuming research tasks such as data extraction (e.g., Fromont & Hay 2012), phonetic alignment (e.g., McAuliffe et al. 2017), and accurate vowel measurement (e.g., Barreda 2021). In this talk, I discuss the advantages and challenges of using sociolinguistic auto-coding (SLAC), a method in which machine learning classifiers assign variants to variable data (Kendall et al. 2021; McLarty, Jones & Hall 2019; Villarreal et al. 2020; Villarreal under review). 

Villarreal et al. (2020) trained random forest classifiers of two sociolinguistic variables of New Zealand English, non-prevocalic /r/ (varying between Present vs. Absent) and intervocalic medial /t/ (Voiced vs. Voiceless), using over 4,000 previously hand-coded tokens (per variable). Cross-validation revealed accuracy rates of 84.5% for /r/ and 91.8% for /t/. In addition to binary predictions, these auto-coders calculate classifier probabilities: the likelihood that a given /r/ token was Present, or a /t/ token was Voiced. In a listening experiment in which 11 phonetically trained listeners coded 60 /r/ tokens, we found a significant positive linear relationship between classifier probability and human judgments; this indicates that classifier probability successfully captures listeners’ perception of phonetically gradient rhoticity. Finally, auto-coders can report which features were most important in classification, helping to shed light on acoustically complex variables like /r/. In short, SLAC can be used for at least three specific functions: binary coding, gradient ‘coding’, and feature selection. 

As with other machine learning (ML) methods, however, there are inherent concerns about SLAC’s fairness: that is, whether it generates equally valid predictions for different speaker groups (e.g., Koenecke et al. 2020). First, given that there are multiple definitions of ML fairness that are mutually incompatible (Berk et al. 2018; Corbett-Davies et al. 2017; Kleinberg et al. 2017), fairness metrics must be decided upon within individual research domains; I argue for three fairness metrics relevant to the domain of sociolinguistic auto-coding. Second, I re-analyze Villarreal et al.’s (2020) /r/ auto-coder for fairness; I find poor performance on all three fairness metrics, with women’s tokens coded more accurately than men’s (88.8% vs. 81.4%). Third, to remedy these imbalances, I use the same data to test a variety of unfairness-mitigation strategies from the ML fairness literature; I find substantial improvement with respect to fairness, albeit at the expense of predictive performance.
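For readers new to this kind of check, here is a minimal sketch of what a group-wise accuracy comparison might look like. It is illustrative only: the abstract does not name the three fairness metrics, so per-group accuracy is an assumed stand-in, and the function name, group labels and toy data are hypothetical rather than taken from the analysis described in the talk.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Return {group: share of tokens whose auto-code matches the hand code}.

    `records` is an iterable of (group, hand_code, auto_code) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, hand_code, auto_code in records:
        total[group] += 1
        if hand_code == auto_code:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


# Toy data (hypothetical): hand-coded vs. auto-coded /r/ tokens.
tokens = [
    ("women", "Present", "Present"),
    ("women", "Absent", "Absent"),
    ("women", "Absent", "Absent"),
    ("women", "Present", "Absent"),
    ("men", "Present", "Absent"),
    ("men", "Absent", "Absent"),
    ("men", "Absent", "Present"),
    ("men", "Present", "Present"),
]

rates = accuracy_by_group(tokens)
# A large gap between groups is one possible signal of unequal performance.
gap = max(rates.values()) - min(rates.values())
print(rates)
print(f"accuracy gap: {gap:.2f}")
```

With real data, each tuple would pair a speaker group with the human coder's label and the auto-coder's prediction for one token. Other fairness metrics would need different per-group quantities.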

Given these fairness issues, I reconsider SLAC under Markl’s (2022) premise that some speech and language technologies are too inherently flawed to use. I argue that while SLAC does not fit into this category, its potential users and consumers deserve a “warts and all” awareness of its drawbacks. To that end, I close with concrete recommendations for using SLAC in large-scale research projects. 

References 

Barreda, Santiago. 2021. Fast Track: fast (nearly) automatic formant-tracking using Praat. Linguistics Vanguard 7(1). https://doi.org/10.1515/lingvan-2020-0051. 

Fromont, Robert & Jennifer Hay. 2012. LaBB-CAT: An annotation store. Proceedings of Australasian Language Technology Association Workshop 113–117. 

Kendall, Tyler, Charlotte Vaughn, Charlie Farrington, Kaylynn Gunter, Jaidan McLean, Chloe Tacata & Shelby Arnson. 2021. Considering performance in the automated and manual coding of sociolinguistic variables: Lessons from variable (ING). Frontiers in Artificial Intelligence 4(43). https://doi.org/10.3389/frai.2021.648543. 

Markl, Nina. 2022. Language variation and algorithmic bias: Understanding algorithmic bias in British English automatic speech recognition. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22), 521–534. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3531146.3533117. 

McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, Michael Wagner & Morgan Sonderegger. 2017. Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In. 

McLarty, Jason, Taylor Jones & Christopher Hall. 2019. Corpus-based sociophonetic approaches to postvocalic r-lessness in African American Language. American Speech 94. https://doi.org/10.1215/00031283-7362239. 

Villarreal, Dan. under review. Sociolinguistic auto-coding has fairness problems too: Measuring and mitigating bias. Linguistics Vanguard.

Villarreal, Dan, Lynn Clark, Jennifer Hay & Kevin Watson. 2020. From categories to gradience: Auto-coding sociophonetic variation with random forests. Laboratory Phonology 11(6). 1–31. https://doi.org/10.5334/labphon.216. 

MI Diaries app gets NEH grant to go open-source

We are delighted to announce that Dr. Betsy Sneller, Assistant Professor of Linguistics and co-Director of the Sociolinguistics Lab, was awarded a $99,908 grant from the National Endowment for the Humanities (NEH) Digital Humanities Advancement Grant (DHAG) program. The new project, “Building and Disseminating an App for Ethnographic Remote Audio Recording”, is an innovative extension of the MI Diaries project. The goal is to provide other researchers with a convenient and accessible method of collecting speech data. To do that, Dr. Sneller’s team will develop open-source code that anyone will be able to use to create a self-recording mobile app for their project.

The inspiration for the project came from the successful adaptation of the MI Diaries app for the study of Judaism through cultural arts led by Laura Yares, Assistant Professor of Religious Studies at MSU, who will serve on the advisory council for the DHAG project. Co-Director of the Sociolinguistics Lab, Dr. Suzanne Evans Wagner, is also a faculty advisor to the project.

The interdisciplinary water cooler

Flyer for Yares and Sneller 2021 University Interdisciplinary Colloquium talk

Sociolinguistics Lab co-director Dr. Betsy Sneller will give a high-profile, university-wide talk on November 5th that is open to the public. Her co-presenter, Dr. Laura Yares, met Dr. Sneller at an informal College of Arts and Letters workshop in October 2020 about pivoting research to remote methods in response to the Covid-19 pandemic. Dr. Yares and her collaborators were looking for a way to capture participants’ reactions to a popular Netflix show, Shtisel. Upon learning about the MI Diaries project’s mobile app for self-recorded audio entries, Dr. Yares met with Dr. Sneller and co-investigator Dr. Suzanne Wagner to talk about adapting it for her project. Come and hear about this serendipitous cross-disciplinary conversation, and its broader implications, courtesy of the MSU Center for Interdisciplinarity.

Abstract

Can common research technologies serve diverse disciplinary needs? Even disciplines that seem on the surface to have little in common can benefit from casual conversations about the challenges and methods that they might share. In this talk, we show how a simple smartphone app developed for a project analyzing language during the pandemic (MI Diaries) was successfully adapted for a Religious Studies project examining learning about Judaism through the cultural arts (Shtisel Diary). By reflecting on these two case studies, we highlight how the tools that we use to conduct research can be just as interdisciplinary as research projects themselves.

Details

Friday, November 5, 2021
12PM-1PM EDT via Zoom

Zoom link: https://msu.zoom.us/j/96411904159
Passcode: msuc4i

Taylor Swift’s use of tentative speech

Credit: Pinterest user costryme

Students in LIN 471 Sociolinguistics conduct original research projects on style-shifting by a public figure. Abby Jarosziewicz, an English major with a concentration in Pop Culture, submitted her project on Taylor Swift in Fall 2019, and continued it as an Honors Option in Spring 2020.

Abby examined Swift’s use of “tentative speech”, first labeled by Robin Lakoff (1975) in the seminal book Language and Woman’s Place. Lakoff identified numerous examples of hesitant or tentative speech, from which Abby chose two: hedges (e.g. “that was kind of rude”) and disclaimers (e.g. “I think that….”). The questions she asked were:

  • Does Taylor Swift’s overall use of tentative speech decrease over time as she grows in maturity, confidence and relevance?
  • Does Taylor Swift consistently use more tentative speech with male interviewers over time?

Abby found in her fall pilot project that Swift used more tentative speech with men at a single point in her career. She hypothesized that this would remain the same throughout her career, because Swift’s power relationship with men has largely not changed. Abby also hypothesized, however, that overall Swift would use less and less tentative speech over time.

To test her hypotheses, Abby selected 12 video interviews conducted for 6 album release press tours (Taylor Swift, Fearless, Speak Now, Red, 1989, Lover) from 2006 to 2019. For each album, one interview was conducted with a male interviewer and one with a female interviewer. 11 of 12 interviewers were white; interviewers were aged 30-65. Abby extracted from the videos every hedge and disclaimer, and calculated their frequency per minute of Swift’s total talk time.
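The per-minute rate described above is simple arithmetic, and a small sketch may make it concrete. The function name and the example counts below are hypothetical, chosen only for illustration; they are not Abby's data.

```python
def tokens_per_minute(token_count, talk_seconds):
    """Rate of tentative-speech tokens per minute of the speaker's talk time."""
    if talk_seconds <= 0:
        raise ValueError("talk_seconds must be positive")
    return token_count / (talk_seconds / 60)


# Hypothetical example: 9 hedges and 6 disclaimers in 10 minutes of talk.
hedges, disclaimers = 9, 6
rate = tokens_per_minute(hedges + disclaimers, talk_seconds=600)
print(f"{rate:.1f} tokens/minute")  # prints "1.5 tokens/minute"
```

Normalizing by talk time rather than interview length keeps interviews of different durations comparable.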

Abby’s hypotheses were upheld. Swift’s overall rate of tentative speech declined across the press tours, from 1.5 per minute during the Taylor Swift launch, to 0.9 during the Lover launch. And at every time point except one, Swift used more tentative language with the male interviewer than with the female interviewer. The exception was the press tour for Red, in which tentative speech peaked with both interviewer genders, exceeding even the rate for Taylor Swift, at 1.9 tokens/minute.

This study seems to support a narrative in the media about Taylor Swift’s growing comfort with public feminism, legal agency and political influence. Nonetheless, more controlled research is required for the findings to be confirmed. Abby points out that there are confounds in the data, such as inconsistency in the ages, ethnicity and familiarity of the interviewers; presence vs absence of a studio audience; and inconsistencies in the amount of talk time per interview and per time point.

Even so, this was a great example of a student taking a class project a step further and asking new questions. Thanks for allowing us to share your results, Abby!

MI Diaries project

Life in Michigan has changed very quickly for many people over the past few months. The MI Diaries project aims to document what life is like in Michigan during the Covid-19 pandemic and afterwards, as we move back into normal life.

We are interested in all aspects of how life is changing for Michiganders, from their daily routines to their language. We are looking for Michigan residents interested in submitting periodic oral history recordings during this time – if you are interested in learning more or in participating, please go to our project website: mi-diaries.org.

Diarists earn a $5 Amazon gift card if they record 15+ minutes of audio in a two-week period. Or, you can opt to pay it forward and donate your card to another participant.

You can also visit the project website for more information, or find us on Facebook, Twitter and Instagram. Please share this information widely!

The Principal Investigator for the MI Diaries project is Dr. Betsy Sneller, in collaboration with Dr. Suzanne Evans Wagner and students in the MSU Sociolinguistics Lab.

MI Diaries is sponsored by National Science Foundation grant BCS 21199975.

Welcome to Betsy Sneller!

The Linguistics program at Michigan State University has hired a new Assistant Professor of sociolinguistics, Dr. Betsy Sneller. Welcome, Betsy!

Betsy’s research seeks to understand the mechanisms of language variation and language change. She’s especially interested in children’s acquisition of phonological variation, including its sociolinguistic patterns, and more generally in how individuals mentally represent and reproduce phonological changes occurring in their speech communities. Her work has employed an unusually broad range of methods, from ethnography to experiments to computational modeling. She has published multiple times in Language Variation and Change, as well as in Language Dynamics and Change and Cognition.

Betsy received her PhD in Linguistics from the University of Pennsylvania in 2018. Her primary advisor was William Labov, and her committee members included Meredith Tamminga and Josef Fruehwald. During her time at Penn, Betsy also collaborated and co-published with Gareth Roberts and Charles Yang, among others. For the last two years, Betsy has been a post-doctoral scholar in Elissa Newport’s Learning and Development Lab at Georgetown University. She will join Michigan State University in August 2020.

A native of Holland, MI, Betsy is looking forward to collecting and analyzing speech data in her home state. Her MA thesis (2012, University of Essex) was titled “Aw man! The effect of hometown affiliation on NCS shifting in Holland, Michigan”. Betsy then carried out ethnographic, corpus and experimental research in Philadelphia. Some of the publications resulting from this effort include “Phonological rule spreading across hostile lines” (just published in Language Variation and Change) and “Competing systems in Philadelphia phonology” (also in LVC, with William Labov and other co-authors). With Gareth Roberts, Betsy has conducted artificial language learning experiments to test sociolinguistic predictions (“Why some behaviors spread while others don’t”), and she has continued to use this paradigm with children in her Georgetown-based research.

We look forward to welcoming Betsy to the Sociolinguistics Lab later this year!

Mohammed Ruthan defends dissertation on Saudi Arabic

Top left: Yen-Hwei Lin. Top right: Karthik Durvasula, Suzanne Wagner, Mohammed Ruthan, Modi Ruthan, Kaylin Smith. Bottom left: Brahim Chakrani. Bottom right: Yongqing Ye.

Mohammed Ruthan became the Linguistics program’s first PhD student to defend his doctoral dissertation in the new age of social distancing. His defense took place on Friday, March 13th, with just his wife, two friends and two committee members present in person, plus two committee members and various others via Zoom. It might not have been how Mohammed imagined his defense would be, but he handled it all (including various technical issues) with tremendous grace and patience. His dissertation, Aspects of Jazani Arabic, examines the phonology and phonetics of his own southwestern dialect of Saudi Arabic, as well as attitudes to the dialect. It was co-advised by Yen-Hwei Lin and Suzanne Evans Wagner, with much support from Karthik Durvasula and Brahim Chakrani. Once travel restrictions are lifted, Mohammed will return to Saudi Arabia to take up a university teaching position. Congratulations!

MSU Socio people at NWAV 48

Current and former Michigan State sociolinguists were recently at the NWAV 48 (New Ways of Analyzing Variation) conference, October 10-12. The Eugene, Oregon location meant that not everyone could make the long trip, but presenters included:

Former MSU Sociolinguistics students Monica Nesbitt (now a post-doc at Dartmouth College) and James Stanford were also there, along with former faculty Dennis Preston and Marisa Brook. We enjoyed a great MSU+affiliates dinner on the Friday night.

Thanks to the members of the lab who gave us valuable feedback on our practice presentations!

SLA meets LVC: Second language acquisition of sociolinguistic variation at SLRF conference

Irina Zaykovskaya (PhD 2019) and Suzanne Evans Wagner are co-convening a colloquium at this week’s Second Language Research Forum (SLRF) conference, hosted by Michigan State University’s Second Language Studies program. The colloquium, held on Friday, September 20th, is titled: Catching interlanguage in action: When SLA meets language variation and change. The goal is to bring together researchers who study second language acquisition of sociolinguistic variation, using quantitative (and often also qualitative) methods.

Irina’s PhD studies were in the Second Language Studies program, but she took a graduate course in sociolinguistics with Suzanne in 2014, and subsequently decided to take a variationist sociolinguistic approach to her work. Suzanne became her co-advisor, and Irina defended her dissertation (on L2 acquisition of US English vernacular “like”) in 2019. Researchers like Irina, who work at the interface of SLA and LVC, are still quite rare. SLRF seemed to be a good opportunity to inform other SLA scholars about the insights afforded by LVC approaches. To further support this initiative, Irina has created an online resource hub for people interested in SLA+LVC.

The other panelists include Xiaoshi Li (MSU), Kimberly Geeslin (Indiana University-Bloomington) and Matthew Kanwit (University of Pittsburgh).

Rural fieldwork on display at MSU undergraduate conference UURAF

On April 5th, undergraduate sociolinguists Jared Kaczor and Travis Coppernoll presented their poster Football, Church and Free Breakfast: Doing Sociolinguistic Research in Rural Communities Around Lansing at the 2019 Michigan State University Undergraduate Research and Arts Forum (UURAF). The project, which has been running since August, focuses on two small communities in a rural part of mid-Michigan. Jared and Travis have been developing an ethnography via trips to football games, church coffee mornings and local cafés. They have just begun to record sociolinguistic interviews with residents. The goal of the project is to compare rural speech with the Sociolinguistics Lab’s existing corpus of urban speech.
