Dan Villarreal talk November 3 on auto-coding

Dr. Dan Villarreal (University of Pittsburgh) is visiting the Sociolinguistics Lab in early November. He’ll be giving a talk, open to the public, on Thursday November 3, 2022. Dan’s presentation is of special interest to us because it’s about automating analyses of large-scale datasets. As we build a corpus of Michigan speech in the MI Diaries project, we’ve been using automatic speech recognition (ASR) to speed up our transcription time, and working with MSU’s Institute for Cyber-Enabled Research (ICER) to move some of our data processing to their supercomputer.

Dr. Villarreal is also giving a talk to the SoConDi group at University of Michigan on Nov 4th, 2022, 3-4pm. If you are interested in joining that talk, please contact Yongqing Ye (yeyongqi@msu.edu) or Suzanne Wagner (wagnersu@msu.edu) for the Zoom link.

Sociolinguistic auto-coding: Applications and pitfalls

Dan Villarreal, University of Pittsburgh

Time: Thursday, Nov 3, 4:30-6:15pm

Location: Wells Hall B342 and on Zoom

Zoom link: https://msu.zoom.us/j/98418360065 (Meeting ID: 984 1836 0065; passcode: sociolab)

Researchers in sociophonetics and variationist sociolinguistics have increasingly turned to computational methods to automate time-consuming research tasks such as data extraction (e.g., Fromont & Hay 2012), phonetic alignment (e.g., McAuliffe et al. 2017), and accurate vowel measurement (e.g., Barreda 2021). In this talk, I discuss the advantages and challenges of using sociolinguistic auto-coding (SLAC), a method in which machine learning classifiers assign variants to variable data (Kendall et al. 2021; McLarty, Jones & Hall 2019; Villarreal et al. 2020; Villarreal under review). 

Villarreal et al. (2020) trained random forest classifiers of two sociolinguistic variables of New Zealand English, non-prevocalic /r/ (varying between Present vs. Absent) and intervocalic medial /t/ (Voiced vs. Voiceless), using over 4,000 previously hand-coded tokens (per variable). Cross-validation revealed accuracy rates of 84.5% for /r/ and 91.8% for /t/. In addition to binary predictions, these auto-coders calculate classifier probabilities: the likelihood that a given /r/ token was Present, or a /t/ token was Voiced. In a listening experiment in which 11 phonetically trained listeners coded 60 /r/ tokens, we found a significant positive linear relationship between classifier probability and human judgments; this indicates that classifier probability successfully captures listeners’ perception of phonetically gradient rhoticity. Finally, auto-coders can report which features were most important in classification, helping to shed light on acoustically complex variables like /r/. In short, SLAC can be used for at least three specific functions: binary coding, gradient ‘coding’, and feature selection. 
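To make the link between classifier probability and binary coding concrete, here is a minimal sketch. It is not the authors' code: the probabilities, the hand codes, the 0.5 threshold, and the function names `binary_code` and `accuracy` are all assumptions made up for illustration. It only shows how a probability that an /r/ token is Present could be turned into a binary code and then compared against hand-coded labels.

```python
def binary_code(prob_present, threshold=0.5):
    """Turn a classifier probability into a binary variant label.

    The 0.5 threshold is an illustrative assumption, not a value from the talk.
    """
    return "Present" if prob_present >= threshold else "Absent"


def accuracy(predicted, hand_coded):
    """Share of tokens where the auto-code matches the hand code."""
    if len(predicted) != len(hand_coded):
        raise ValueError("predicted and hand_coded must have the same length")
    matches = sum(p == h for p, h in zip(predicted, hand_coded))
    return matches / len(hand_coded)


# Hypothetical /r/ tokens: classifier probability that each token is Present.
probs = [0.92, 0.15, 0.61, 0.48, 0.05]
# Hypothetical hand codes for the same five tokens.
hand = ["Present", "Absent", "Absent", "Absent", "Absent"]

auto = [binary_code(p) for p in probs]
print(auto)
print(accuracy(auto, hand))
```

Keeping the probability alongside the binary label is what would allow the same output to serve both binary coding and gradient 'coding'.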

As with other machine learning (ML) methods, however, there are inherent concerns about SLAC’s fairness, that is, whether it generates equally valid predictions for different speaker groups (e.g., Koenecke et al. 2020). First, given that there are multiple definitions of ML fairness that are mutually incompatible (Berk et al. 2018; Corbett-Davies et al. 2017; Kleinberg et al. 2017), fairness metrics must be decided upon within individual research domains; I argue for three fairness metrics relevant to the domain of sociolinguistic auto-coding. Second, I re-analyze Villarreal et al.’s (2020) /r/ auto-coder for fairness; I find poor performance on all three fairness metrics, with women’s tokens coded more accurately than men’s (88.8% vs. 81.4%). Third, to remedy these imbalances, I use the same data to test a variety of unfairness-mitigation strategies from the ML fairness literature; I find substantial improvement with respect to fairness, albeit at the expense of predictive performance. 
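The accuracy comparison between speaker groups can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not name its three fairness metrics, so the sketch uses a simple per-group accuracy gap as a stand-in, and the tokens, group labels, and function names `accuracy_by_group` and `accuracy_gap` are hypothetical.

```python
from collections import defaultdict


def accuracy_by_group(tokens):
    """Compute auto-coding accuracy separately for each speaker group.

    tokens: iterable of (group, predicted, hand_coded) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, hand_coded in tokens:
        total[group] += 1
        correct[group] += int(predicted == hand_coded)
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Difference between the best- and worst-served groups (0 means equal)."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical tokens, not data from the study.
tokens = [
    ("women", "Present", "Present"),
    ("women", "Absent", "Absent"),
    ("women", "Absent", "Absent"),
    ("women", "Present", "Absent"),
    ("men", "Present", "Absent"),
    ("men", "Absent", "Absent"),
    ("men", "Absent", "Present"),
    ("men", "Present", "Present"),
]

per_group = accuracy_by_group(tokens)
print(per_group)
print(accuracy_gap(per_group))
```

A gap near zero would suggest the auto-coder serves both groups about equally on this one measure; other fairness definitions can disagree with it, which is the incompatibility the abstract points to.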

Given these fairness issues, I reconsider SLAC under Markl’s (2022) premise that some speech and language technologies are too inherently flawed to use. I argue that while SLAC does not fit into this category, its potential users and consumers deserve a “warts and all” awareness of its drawbacks. To that end, I close with concrete recommendations for using SLAC in large-scale research projects. 

References 

Barreda, Santiago. 2021. Fast Track: fast (nearly) automatic formant-tracking using Praat. Linguistics Vanguard 7(1). https://doi.org/10.1515/lingvan-2020-0051. 

Fromont, Robert & Jennifer Hay. 2012. LaBB-CAT: An annotation store. Proceedings of Australasian Language Technology Association Workshop 113–117. 

Kendall, Tyler, Charlotte Vaughn, Charlie Farrington, Kaylynn Gunter, Jaidan McLean, Chloe Tacata & Shelby Arnson. 2021. Considering performance in the automated and manual coding of sociolinguistic variables: Lessons from variable (ING). Frontiers in Artificial Intelligence 4(43). https://doi.org/10.3389/frai.2021.648543. 

Markl, Nina. 2022. Language variation and algorithmic bias: Understanding algorithmic bias in British English automatic speech recognition. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’22), 521–534. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3531146.3533117. 

McAuliffe, Michael, Michaela Socolof, Sarah Mihuc, Michael Wagner & Morgan Sonderegger. 2017. Montreal Forced Aligner: Trainable text-speech alignment using Kaldi. In. 

McLarty, Jason, Taylor Jones & Christopher Hall. 2019. Corpus-based sociophonetic approaches to postvocalic r-lessness in African American Language. American Speech 94. https://doi.org/10.1215/00031283-7362239. 

Villarreal, Dan. under review. Sociolinguistic auto-coding has fairness problems too: Measuring and mitigating bias. Linguistics Vanguard.

Villarreal, Dan, Lynn Clark, Jennifer Hay & Kevin Watson. 2020. From categories to gradience: Auto-coding sociophonetic variation with random forests. Laboratory Phonology 11(6). 1–31. https://doi.org/10.5334/labphon.216. 


Colloquium talk: Dr. Annette D’Onofrio

Dr. Annette D’Onofrio is joining us to give a colloquium talk this fall! Please see details of the talk below.

Dr. Annette D’Onofrio is an Assistant Professor in the Linguistics Department at Northwestern University. She will present her work on the Chicagoland project, style, and personae.

Time: Thursday (09/15/2022) 4:30-6:15pm Eastern Time

Event: In-person and Zoom

Talk Abstract

Locating sound change reversal: Racialized and age-based patterns of the Northern Cities Shift in a Chicago community

While dialectological work once indicated that American English regional dialects were becoming increasingly disparate over time (e.g. Labov 2014), recent sociolinguistic studies are revealing the opposite trend in some regions, showing movement away from regionally distinctive language features (e.g. Prichard & Tamminga 2012, Dodsworth & Kohn 2012). Specifically, the Inland North region’s characteristic Northern Cities Vowel Shift (NCS), which had been advancing throughout the 20th century (Labov 2007), has begun to reverse its trajectory in some Inland North locales (Driscoll & Lape 2015; Wagner et al. 2016), including in Chicago (McCarthy 2011, Durian & Cameron 2019). In this talk, I explore the ways in which NCS reversal is socially conditioned in one Chicago neighborhood area. I demonstrate how both broader sociohistorical dynamics of migration and racialization, as well as highly localized oppositions and ideologies, inform patterns of vocalic change in this neighborhood.


Talk on language choice in Ukraine

The lab’s Visiting Research Scholar, Dr. Irina Zaykovskaya, gave a talk at MSU on April 18, 2022 titled When native language is a matter of choice: The linguistic situation in Ukraine before and during the War. Irina provided some background on multilingualism in Ukraine, historical and 21st century attitudes to the Ukrainian language, and closed by discussing the phenomenon of language rejection. Anecdotal evidence suggests that since Russia’s recent invasion of Ukraine, some Ukrainians have symbolically given up speaking Russian out of resistance or disgust. Irina compared this with German-speaking Holocaust refugees in the early 20th century who similarly gave up their language and in some cases lost it altogether. Irina touched on the ethics of gathering data from traumatized individuals, and cautioned that we cannot know the true linguistic situation in Ukraine at this time.

The talk was co-hosted by the MSU Sociolinguistics Lab and the MSU Language Policy and Practice Lab. It was delivered in a hybrid format. We were delighted that so many people could join via Zoom, in addition to the audience in Wells Hall. The talk abstract is below, and the slides can be found here.

Abstract

Ukraine is a large and multilingual country, with Ukrainian and Russian especially dominating its linguistic landscape for decades. However, not only are the statuses of these languages different (i.e., Ukrainian being the official state language and Russian currently not having any formal status), but the attitudes towards them among the Ukrainian people differ as well. Even before the Russian attack on Ukraine on February 24, 2022, Ukrainians, including those from the Eastern, historically considered Russian-speaking parts of the country, would demonstrate symbolic preference for Ukrainian over Russian: for example, in a 2020 poll, only 21.8% of Eastern Ukrainians admitted speaking Ukrainian at home but 44.3% of the same respondents named it as their native language, which implies the view of one’s native language as a matter of choice rather than a matter of chance. Now, Russian-speaking Twitter is getting flooded by tweets like “I want lightning to strike me so that I forget the Russian language”. This talk will present an overview of historical events and policies that led to the current linguistic situation in Ukraine as compared to a few other post-Soviet countries, such as Belarus and Latvia. It will also attempt to capture the ongoing shift in attitudes among Ukrainians, from recognizing Russian as the language the enemies speak to perceiving it as the essence of the enemy.


Socio Lab meetings in Spring 2022

Once again the lab is meeting on a reduced schedule, to accommodate all of the work members are doing on the MI Diaries project. But we still have some important sessions, so we invite everyone to join us! Meetings will be held virtually via Zoom unless otherwise advertised. Please contact Dr. Suzanne Wagner (wagnersu@msu.edu) if you would like to have the Zoom details, and/or join the lab’s e-mail list, sociolab@list.msu.edu.

Here’s our line-up so far. The meetings are 3:00-4:00pm, Eastern time.

Monday, February 14th, 2022

Yongqing Ye and Adam Barnhardt. Practice talk for Illinois Language & Linguistics Society.

Monday, February 28th, 2022

Suzanne Wagner. Practice talk for CLARe 5.

Monday, March 18th, 2022

Jack Rechsteiner. Practice talk for Penn Linguistics Colloquium.

Monday, March 28th, 2022

Arlo Kaczor. MA thesis project.


The interdisciplinary water cooler

Flyer for Yares and Sneller 2021 University Interdisciplinary Colloquium talk

Sociolinguistics Lab co-director Dr. Betsy Sneller will give a high-profile, university-wide talk on November 5th that is open to the public. Her co-presenter, Dr. Laura Yares, met Dr. Sneller at an informal College of Arts and Letters workshop in October 2020 about pivoting research to remote methods in response to the Covid-19 pandemic. Dr. Yares and her collaborators were looking for a way to capture participants’ reactions to a popular Netflix show, Shtisel. Upon learning about the MI Diaries project’s mobile app for self-recorded audio entries, Dr. Yares met with Dr. Sneller and co-investigator Dr. Suzanne Wagner to talk about adapting it for her project. Come and hear about this serendipitous cross-disciplinary conversation, and its broader implications, courtesy of the MSU Center for Interdisciplinarity.

Abstract

Can common research technologies serve diverse disciplinary needs? Even disciplines that seem on the surface to have little in common can benefit from casual conversations about the challenges and methods that they might share. In this talk, we show how a simple smartphone app developed for a project analyzing language during the pandemic (MI Diaries) was successfully adapted for a Religious Studies project examining learning about Judaism through the cultural arts (Shtisel Diary). By reflecting on these two case-studies we highlight how the tools that we use to conduct research can be just as interdisciplinary as research projects themselves. 

Details

Friday, November 5, 2021
12PM-1PM EDT via Zoom

Zoom Link: https://msu.zoom.us/j/96411904159
Passcode: msuc4i


Lab meetings in Spring 2021

Once again our lab meetings will be on Monday afternoons, at the later and longer time of 4:30-6:00pm. General lab meetings (for student presentations, idea workshopping, guest speakers, etc.) will alternate bi-weekly with a new reading group. The group’s topic will be language and age. We’ll read about the acquisition, calibration and incrementation of ongoing language changes from childhood to adolescence. We’ll also tackle post-adolescent lifespan change and age grading.

Meetings will be held on Zoom and/or Microsoft Teams. To hear further announcements, join the Socio Lab’s mailing list here. If for any reason you think you’re not getting messages, contact Dr. Suzanne Wagner, wagnersu@msu.edu.


Mohammed Ruthan defends dissertation on Saudi Arabic

Top left: Yen-Hwei Lin. Top right: Karthik Durvasula, Suzanne Wagner, Mohammed Ruthan, Modi Ruthan, Kaylin Smith. Bottom left: Brahim Chakrani. Bottom right: Yongqing Ye.

Mohammed Ruthan became the Linguistics program’s first PhD student to defend his doctoral dissertation in the new age of social distancing. His defense took place on Friday, March 13th, with just his wife, two friends and two committee members present in person, plus two committee members and various others via Zoom. It might not have been how Mohammed imagined his defense would be, but he handled it all (including various technical issues) with tremendous grace and patience. His dissertation, Aspects of Jazani Arabic, examines the phonology and phonetics of his own southwestern dialect of Saudi Arabic, as well as attitudes to the dialect. It was co-advised by Yen-Hwei Lin and Suzanne Evans Wagner, with much support from Karthik Durvasula and Brahim Chakrani. Once travel restrictions are lifted, Mohammed will return to Saudi Arabia to take up a university teaching position. Congratulations!


A lowkey presentation at American Dialect Society

A couple of summers ago, members of the Socio Lab got into a heated side-discussion about the pragmatics of adverbial lowkey, as in:

  1. I lowkey like pineapple on pizza.
  2. Lowkey I’m hoping the Cavs will lose.

There was debate about whether sentences like these were grammatical for each of us (they mostly weren’t for anyone over 30), and whether lowkey meant ‘secret’, ‘kinda’, or a whole bunch of other things (here the group split even more finely, undergrads vs grads). Danielle Brown, an undergraduate at the time, decided to investigate further for her senior thesis. She learned that there was no published research on adverbial lowkey, but that undergraduates at two other institutions had conducted some investigations of their own. By coincidence, they were the students of MSU PhD alumni Ai Taniguchi (Carleton University) and Greg Johnson (then at Louisiana State University). Danielle built on their work and fielded a judgment survey to friends and family in her social network. Respondents were presented with sentences like (1) and (2) above, and given a list of possible adverbial substitutions for lowkey such as honestly and discourse particles such as well. Danielle discovered that when lowkey is in sentence-initial position, as in (2) above, people often selected discourse particle substitutions. This aligned with an intuition expressed by some students in the lab that lowkey in sentence-initial position is already becoming semantically bleached, becoming similar to sentence-initial like, e.g. Like I’m hoping the Cavs will lose.

After her BA graduation, Danielle teamed up with MA Linguistics student Morgan Momberg to refine her survey and field it to a much larger number of respondents. This time they considered the effect of ‘popularity’ on the interpretation of lowkey. They presented their results in a talk titled Lowkey opinion or lowkey fact: Exploring the acceptability of sentence-initial lowkey at the annual meeting of the American Dialect Society in New Orleans in January 2020. As they report in their abstract,

The emerging adverbial use of lowkey has received little attention, especially in sentence-initial position. In a judgment survey (N=52), respondents rated the felicitousness of sentence-initial lowkey in fictional scenarios across three conditions we call ‘unpopular’, ‘popular’ and ‘factual’. As hypothesized, lowkey was most felicitous with unpopular opinions, e.g. Lowkey this lasagna tastes awful in a scenario where everyone eats lasagna, followed by popular opinions e.g. Lowkey this lasagna tastes amazing, and factual statements e.g. Lowkey everyone is eating lasagna. Our survey results suggest possible pragmatic variance in the use of sentence-initial lowkey.


Yongqing Ye wins CALMS 5-minute linguist competition

Ai Taniguchi presents the prize to Yongqing Ye

Socio Lab member Yongqing Ye was the winner of yesterday’s lightning talk competition at CALMS (Careers, Alumni & Linguistics at Michigan State). Competing against students and professors, Yongqing gave a talk, Pointing to the past in Mandarin Chinese, that was a funny and easy-to-follow explanation of deictic de. Giving a five-minute talk is hard enough, but giving a short talk on an abstract topic is even harder! Not only that, but Yongqing stepped in at the 11th hour when another student was unable to present as planned. The judge, Dr. Ai Taniguchi (PhD Linguistics 2017), praised Yongqing’s accessible approach. Ai herself won the 2019 Linguistic Society of America’s 5-Minute Linguist competition, and we were glad to have her expert eye on the proceedings.

Another Socio Lab member, Dr. Irina Zaykovskaya, explained How I learned to stop worrying and love the word like. Her talk got an honorable mention from Ai. Irina used an array of colorful images and lots of humor to show how people bring social judgments about e.g. “party girls” and “nerd girls” to their judgments of discourse particle like.

Irina Zaykovskaya explains in five minutes why “like” is, like…. cool!

MSU Socio people at NWAV 48

Current and former Michigan State sociolinguists were recently at the NWAV 48 (New Ways of Analyzing Variation) conference, October 10-12. The Eugene, Oregon location meant that not everyone could make the long trip, but presenters included:

Former MSU Sociolinguistics students Monica Nesbitt (now a post-doc at Dartmouth College) and James Stanford were also there, along with former faculty Dennis Preston and Marisa Brook. We enjoyed a great MSU+affiliates dinner on the Friday night.

Thanks to the members of the lab who gave us valuable feedback on our practice presentations!
