Computing surprisal measures based on probabilities of preceding items

mkachlicka

Dear CNSP Community!

I've challenged myself to look at representations of phonological and lexical surprisal in continuous speech and started searching the literature and code repos for useful resources that would help me with this task.

Surprisal is the negative log probability of a phoneme/word given the preceding phonemes or words, so once these probabilities are established, computing it should be a reasonably straightforward task (I think).
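
To make concrete what I mean, here is a minimal sketch, assuming a bigram model with add-one smoothing trained on a toy corpus. The function names (`train_bigram`, `surprisal`) and the `<s>` start symbol are placeholders I made up for illustration, not taken from any existing toolbox. The same code should work on phoneme sequences instead of words.

```python
import math
from collections import Counter

START = "<s>"  # hypothetical start-of-sequence symbol


def train_bigram(sequences):
    """Count contexts and bigrams over sequences padded with a start symbol."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for seq in sequences:
        padded = [START] + list(seq)
        vocab.update(seq)
        for prev, cur in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return context_counts, bigram_counts, vocab


def surprisal(seq, context_counts, bigram_counts, vocab):
    """Return surprisal in bits, -log2 P(item | previous item), for each item.

    Probabilities use add-one smoothing over the training vocabulary.
    Items never seen in training are not handled specially here, which is
    an assumption that would need revisiting for real data.
    """
    v = len(vocab)
    padded = [START] + list(seq)
    values = []
    for prev, cur in zip(padded, padded[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + v)
        values.append(-math.log2(p))
    return values


# Toy corpus, purely illustrative
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
model = train_bigram(corpus)
values = surprisal(["the", "cat", "sat"], *model)
print(values)
```

In a real pipeline the bigram counts would presumably be replaced by a longer-context n-gram model or a neural language model, but the surprisal step itself would stay the same: take the conditional probability of each item and compute its negative log.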

There is also the question of which model to use to estimate these probabilities.

Can you please recommend some resources, repos or papers discussing a similar approach (in addition to Brodbeck et al.'s 2022 eLife paper on local vs global representations) or any other materials that would help me set up the feature extraction pipeline?