Commit 07c8bf6 ("blog fixes"), 1 parent a063efb

4 files changed: +180 -51 lines

content/blog/language-inspired-approaches-phoneme-classification.md

Lines changed: 41 additions & 38 deletions
@@ -47,13 +47,27 @@ citations:
     year: 2025
     url: "https://arxiv.org/abs/2506.02098"
     bibtex: "@article{ozdogan2025libribrain,\n title={LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale},\n author={Özdogan, Miran and Landau, Gilad and Elvers, Gereon and Jayalath, Dulhan and Somaiya, Pranav and Mantegna, Francesco and Woolrich, Mark and Parker Jones, Oiwi},\n journal={arXiv preprint arXiv:2506.02098},\n year={2025}\n}"
+  - id: "mantegna2025braininsp"
+    title: "Brain-Inspired Approaches to Speech Detection"
+    authors: ["Francesco Mantegna", "Gereon Elvers", "Oiwi Parker Jones"]
+    journal: "PNPL Blog"
+    year: 2025
+    url: "https://neural-processing-lab.github.io/2025-libribrain-competition/blog/brain-inspired-approaches-speech-detection"
+    bibtex: "@misc{mantegna2025brainInspired,\n title={Brain-Inspired Approaches to Speech Detection},\n author={Mantegna, Francesco and Elvers, Gereon and Parker Jones, Oiwi},\n year={2025},\n url={https://neural-processing-lab.github.io/2025-libribrain-competition/blog/brain-inspired-approaches-speech-detection},\n note={Blog post}\n}"
+  - id: "landau2025speechref"
+    title: "The Speech Detection task and the reference model"
+    authors: ["Gilad Landau", "Gereon Elvers", "Miran Özdogan", "Oiwi Parker Jones"]
+    journal: "PNPL Blog"
+    year: 2025
+    url: "https://neural-processing-lab.github.io/2025-libribrain-competition/blog/speech-detection-reference-model"
+    bibtex: "@misc{landau2025speechref,\n title={The Speech Detection task and the reference model},\n author={Landau, Gilad and Elvers, Gereon and Özdogan, Miran and Parker Jones, Oiwi},\n year={2025},\n url={https://neural-processing-lab.github.io/2025-libribrain-competition/blog/speech-detection-reference-model},\n note={Blog post}\n}"
 ---
 
 ### **Introduction**
 
 In the 2025 PNPL Competition ([Landau et al. 2025](https://arxiv.org/abs/2506.10165)), phoneme classification is presented as a categorical problem—given neural signals, predict which of the 39 ARPABET phonemes was heard. In [a previous blog](https://neural-processing-lab.github.io/2025-libribrain-competition/blog/brain-inspired-approaches-speech-detection/), we suggested some neuroscience-inspired ideas for the speech detection task. Here, we suggest linguistics-inspired ideas for phoneme classification.
 
-##### **The ARPABET Phoneme Set**
+### **The ARPABET Phoneme Set**
 
 Before exploring the idea of classifying phonetic features, let's establish the complete ARPABET inventory we're working with:
@@ -82,11 +96,11 @@ Before exploring the idea of classifying phonetic features, let's establish the
 
 **Total: 39 phonemes** (10 vowels + 5 diphthongs + 24 consonants)
 
-##### **Why Phonetic Features Matter**
+### **Why Phonetic Features Matter**
 
 Phonetic features offer several compelling advantages over direct phoneme classification, particularly for MEG data where training examples may be limited:
 
-### **1. Data Efficiency Through Shared Structure**
+#### **1. Data Efficiency Through Shared Structure**
 
 Consider these phoneme pairs and their shared features:
 
@@ -96,7 +110,7 @@ Consider these phoneme pairs and their shared features:
 
 Instead of learning 39 independent phoneme categories, the model can learn combinations of ~30 features. This shared structure enables transfer learning—knowledge about [voicing] learned from /p/ vs /b/ pairs transfers to help distinguish /s/ vs /z/, /f/ vs /v/, and other voicing contrasts. In addition, the model is able to learn a more abstract representation of speech, structured around how the phonemes are articulated in the mouth, which could also benefit the classification of other phonemes, such as those outside English, without further training data.
 
-### **2. Handling Low-Frequency Phonemes**
+#### **2. Handling Low-Frequency Phonemes**
 
 Some phonemes occur infrequently in speech corpora. Looking at the actual distribution from the LibriBrain dataset ([Özdogan et al. 2025](https://arxiv.org/abs/2506.02098)), phoneme frequencies vary dramatically:
 
@@ -120,23 +134,23 @@ Direct classification might struggle with limited training data for rare phoneme
 
 The model can then recognise /ŋ/ as the intersection of [nasal] + [velar] + [voiced], even with limited direct /ŋ/ examples.
 
-### **3. Biological Plausibility**
+#### **3. Biological Plausibility**
 
 Neurophysiological evidence suggests the brain may encode speech in terms of articulatory features rather than whole phonemes (e.g. [Mesgarani et al. 2014](https://pmc.ncbi.nlm.nih.gov/articles/PMC4350233/)). MEG signals might naturally align with feature-based representations, potentially improving classification accuracy.
 
-### **4. Graceful Degradation**
+#### **4. Graceful Degradation**
 
 When predictions are uncertain, feature-based models provide partial information. Instead of a completely wrong phoneme prediction, you might get the correct manner of articulation ([fricative]) even if the place of articulation ([alveolar] vs [postalveolar]) is incorrect.
 
-##### **Universal Phonetic Features: The IPA Foundation**
+### **Universal Phonetic Features: The IPA Foundation**
 
 The International Phonetic Alphabet (IPA) provides a systematic framework for describing speech sounds based on how they are articulated.
 
 ![IPA consonant chart](/blog/blog4-language-inspired-approaches-phoneme-classification/blog4-picture1.png)
 
 *Figure: The IPA consonant chart shows all known and even possible consonants in human languages produced with air from the lungs (from [Wikipedia](https://en.wikipedia.org/wiki/International_Phonetic_Alphabet#/media/File:IPA_chart_2020.svg), accessed 21 September 2025)*
 
-### **Consonant Features**
+#### **Consonant Features**
 
 For the 24 consonants in English, we can define features based on three primary dimensions:
 
@@ -170,7 +184,7 @@ For the 24 consonants in English, we can define features based on three primary
 - **Voiced**: vocal cords vibrate during production
 - **Voiceless**: no vocal cord vibration
 
-### **Vowel Features**
+#### **Vowel Features**
 
 For the 10 vowels, the core IPA features include:
 
@@ -184,7 +198,7 @@ This finer-grained system that specifies multiple degrees of height, for example
 
 *Figure: Positions in the IPA vowel chart correspond to tongue position (from [blog](https://www.languagejones.com/blog-1/2016/12/24/why-the-international-phonetic-alphabet-ipa-is-the-best-thing-ever), accessed 21 September 2025)*
 
-### **Diphthong Features**
+#### **Diphthong Features**
 
 For diphthongs, ARPABET uses the following digraphs (AY, AW, EY, OY, OW). Diphthongs can be approximated by a sequence of two other IPA symbols for the English monophthong vowels, each with their own articulatory features:
 
@@ -198,11 +212,11 @@ These differ from English monophthongs like **IH** /ɪ/, which map to single IPA
 
 Incidentally, some analyses of English vowels prefer features like **tenseness** (tense vs lax) to distinguish vowels like /i/, /o/, /u/ from /ɪ/, /ɔ/, /ʊ/. This is not an IPA feature but rather represents a language-specific categorisation that correlates in the IPA with the more precise height and backness distinctions above.
 
-##### **Complete IPA-Based Feature Set for ARPABET**
+### **Complete IPA-Based Feature Set for ARPABET**
 
 Here's a template binary feature matrix for the ARPABET phonemes:
 
-### **Consonants**
+#### **Consonants**
 
 | Phoneme | Bilabial | Labiodental | Dental | Alveolar | Post-alveolar | Palatal | Labial-velar | Velar | Glottal | Plosive | Fricative | Affricate | Nasal | Liquid | Glide | Voiced |
 | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
@@ -231,7 +245,7 @@ Here's a template binary feature matrix for the ARPABET phonemes:
 | /w/ | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
 | /j/ | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 |
 
-### **Vowels (Monophthongs)**
+#### **Vowels (Monophthongs)**
 
 | Phoneme | Close | Near-close | Close-mid | Mid | Open-mid | Near-open | Open | Front | Near-front | Central | Near-back | Back | Rounded |
 | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- | ----- |
@@ -252,11 +266,11 @@ This means that there are no separate entries for diphthongs which solves the pr
 
 Note that affricates like /dʒ/ could also be separated into simpler phonemes (/d/ and /ʒ/), but the same difficulty with assigning features to them does not arise.
 
-##### **Alternative Feature Sets**
+### **Alternative Feature Sets**
 
 While IPA-based features provide a solid foundation grounded in articulation, they are not the only option. For example, sets that mix articulatory features (as in the IPA) with other kinds of features can be seen in prior studies of the brain.
 
-### **Mixed Feature Sets in Neuroscience**
+#### **Mixed Feature Sets in Neuroscience**
 
 The influential work by [Mesgarani et al. (2014)](https://pmc.ncbi.nlm.nih.gov/articles/PMC4350233/) used surgical cortical recordings from human superior temporal gyrus (STG) during natural speech listening to demonstrate that individual brain sites show selectivity to distinct phonetic features rather than whole phonemes. Their study used a mixed feature set combining the following:
 
@@ -272,7 +286,7 @@ With this combination of feature sets, they found that:
 
 One limitation of the Mesgarani study, however, was that it used an incomplete set of 14 phonetic features that does not distinguish all English phonemes. To be clear, 14 binary features could be enough to distinguish 39 phonemes, as we explain below. But the specific features chosen in the study do not end up separating all phonemes, so additional features would be needed to produce a complete classification system, though as we note near the conclusion, there are probably benefits even to the use of partial feature sets.
 
-##### **The Mathematics of Feature Space**
+### **The Mathematics of Feature Space**
 
 How many binary features do we need to uniquely represent n phonemes? The theoretical minimum follows from information theory:
 
@@ -282,7 +296,7 @@ For 39 ARPABET phonemes: ⌈log₂(39)⌉ = ⌈5.29⌉ = 6 binary features
 
 However, this assumes optimal encoding. Linguistically-motivated features typically require more dimensions for interpretability and biological plausibility.
 
-### **Why Binary Features?**
+#### **Why Binary Features?**
 
 By convention, linguists often encode phonetic properties as binary features (present=1, absent=0). Features like [±voiced] or [±nasal] naturally divide phonemes into two groups. From a machine learning perspective, binary features also allow us to perform binary classification, which has numerous benefits including simplifying the hypothesis space for the model, making training more efficient and inference more robust, and avoiding the need to impose arbitrary ordinal relationships between categories.
 
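The feature-count arithmetic above is easy to check in code. The following sketch is our own illustration (function names are arbitrary, not from the post): it computes the information-theoretic minimum and checks that a candidate feature matrix gives every phoneme a distinct vector, which is the constraint any usable assignment must satisfy.

```python
import math

def min_binary_features(n_phonemes: int) -> int:
    """Information-theoretic minimum number of binary features
    needed to give each of n_phonemes a distinct feature vector."""
    return math.ceil(math.log2(n_phonemes))

def is_valid_assignment(feature_matrix) -> bool:
    """A candidate assignment is usable only if no two phonemes
    share the same feature vector."""
    rows = [tuple(row) for row in feature_matrix]
    return len(set(rows)) == len(rows)

# 39 ARPABET phonemes need at least ceil(log2(39)) = 6 binary features.
print(min_binary_features(39))
```

The same uniqueness check applies to any of the feature matrices above, including partial ones restricted to consonants.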
@@ -303,7 +317,7 @@ With the phoneme **IH** /ɪ/, we can attribute the following features:
 
 The rest of the vowel phoneme features can be assigned in a similar way. The full feature attribution can be seen above in the full IPA-based binary feature matrix.
 
-### **Diphthong Challenge**
+#### **Diphthong Challenge**
 
 Diphthongs like /aɪ/ move between two vowel targets, making single feature assignment problematic. We present here several ways to model diphthongs with their pros and cons:
 
@@ -337,15 +351,15 @@ We can use higher-level (more abstract) phonological distinctions. For example,
 **Pros**: Captures diphthong-specific properties, linguistically motivated
 **Cons**: Requires domain knowledge, may miss low-level articulatory details
 
-##### **Brute Force Feature Discovery**
+### **Brute Force Feature Discovery**
 
 Since the "correct" feature set for neural representation is unknown, we can systematically search the feature space:
 
-### **Approach 1: Exhaustive Binary Search**
+#### **Approach 1: Exhaustive Binary Search**
 
 For k binary features representing n phonemes, there are 2^(n×k) possible feature assignments in total. With the constraint that each phoneme must have a unique feature vector (i.e. considering only assignments where the n assigned feature vectors are all distinct), we can train a classifier to map MEG data to the assigned binary features and measure how accurately it predicts them from the neural input. The assignment with the highest model accuracy would be the best set to use. Although this approach is tractable for small k, the search space grows exponentially, making it computationally infeasible for larger k.
 
-### **Approach 2: Evolutionary Search**
+#### **Approach 2: Evolutionary Search**
 
 A similar but less exhaustive approach involves modifying the best-performing feature sets and evaluating model performance on the mutated feature sets. The following steps can be taken for the evolutionary search approach:
 
@@ -355,7 +369,7 @@ A similar but less exhaustive approach involves modifying the best-performing fe
 4. Apply mutation/crossover to generate new candidates
 5. Repeat until convergence

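The evolutionary loop above (mutate the best candidates, repeat) can be sketched as a toy implementation. Everything here is an illustrative assumption rather than the post's method: the population size, the flip-one-bit mutation, and the `score` callback, which stands in for training a decoder on a candidate feature assignment and returning its accuracy.

```python
import random

def mutate(features):
    """Flip one random feature bit to produce a new candidate."""
    child = [row[:] for row in features]
    i = random.randrange(len(child))
    j = random.randrange(len(child[0]))
    child[i][j] ^= 1
    return child

def evolve(score, n_phonemes=39, n_feats=8, pop=20, generations=50, seed=0):
    """Toy evolutionary search: rank candidate feature assignments by
    score, keep the best half, refill the population with mutated
    survivors, and repeat for a fixed budget of generations."""
    random.seed(seed)
    population = [
        [[random.randint(0, 1) for _ in range(n_feats)] for _ in range(n_phonemes)]
        for _ in range(pop)
    ]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        survivors = population[: pop // 2]
        population = survivors + [mutate(random.choice(survivors))
                                  for _ in range(pop - len(survivors))]
    return max(population, key=score)
```

In practice `score` is the expensive part, so cheaper proxies (e.g. scoring on a small data subset) may be needed before full training runs.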
-##### **General Implementation Strategy**
+### **General Implementation Strategy**
 
 Here, we provide some general implementation tips which could help with the development of robust and accurate models for phoneme classification.
 
@@ -369,11 +383,11 @@ Here, we provide some general implementation tips which could help with the deve
 5. **Search Strategically**: Use evolutionary or gradient methods to discover novel feature combinations
 6. **Validate Interpretability**: Ensure discovered features have linguistic or neurobiological interpretation
 
-##### **Practical Implementation for the Competition**
+### **Practical Implementation for the Competition**
 
 The competition task requires mapping from brain data to probability distributions over all 39 ARPABET phonemes. However, you can implement feature-based classification internally while still meeting this requirement through conversion:
 
-### **Feature-to-Phoneme Conversion Pipeline**
+#### **Feature-to-Phoneme Conversion Pipeline**
 
 1. **Train feature classifiers**: Build separate binary classifiers for each phonetic feature (e.g. [voiced], [fricative], [front])
 2. **Predict feature probabilities**: For each input, obtain probability estimates for all features
@@ -382,7 +396,7 @@ The competition task requires mapping from brain data to probability distributio
 - **Learned mapping**: Train a secondary classifier to map from feature space to phoneme space
 - **Probabilistic matching**: Use Bayesian inference to compute P(phoneme|features)
 
-### **Example Conversion Methods**
+#### **Example Conversion Methods**
 
 **Distance-based approach:**
 
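The distance-based listing itself falls outside the hunks shown in this diff, so here is a hedged sketch of the idea with our own names and choices (Euclidean distance to each phoneme's canonical binary feature vector, then a softmax over negated distances), not the post's actual code:

```python
import numpy as np

def distance_based_mapping(feature_probs: np.ndarray,
                           feature_matrix: np.ndarray) -> np.ndarray:
    """Turn predicted feature probabilities (shape (n_features,)) into a
    distribution over phonemes, where each row of feature_matrix (shape
    (n_phonemes, n_features)) is a phoneme's canonical feature vector."""
    # Euclidean distance from the prediction to every phoneme's feature vector.
    dists = np.linalg.norm(feature_matrix - feature_probs, axis=1)
    # Softmax over negated distances: closer phonemes get higher probability.
    logits = -dists
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()
```

A temperature on the logits could sharpen or flatten the resulting distribution if needed.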
@@ -435,15 +449,15 @@ def bayesian_feature_mapping(feature_probs: np.ndarray,
     return phoneme_probs
 ```
 
-### **Incremental Strategy**
+#### **Incremental Strategy**
 
 Since conversion back to phonemes is always possible, you don't need a complete feature set:
 
 1. **Start with consonants only**: Implement features for the 24 consonants (manner, place, voicing), use direct classification for vowels/diphthongs
 2. **Add vowel subsets**: Gradually incorporate vowel features (height, backness, rounding) as you refine the approach
 3. **Handle diphthongs last**: These are the most complex; initially treat them as single units or use simplified approximations
 
-### **Potential Benefits of Partial Feature Implementation**
+#### **Potential Benefits of Partial Feature Implementation**
 
 - **Reduced complexity**: Focus on phoneme classes where features are most clear-cut (consonants)
 - **Faster iteration**: Test feature-based approaches without solving all edge cases upfront
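For the probabilistic-matching route, whose `bayesian_feature_mapping` listing is only partially visible in the hunk above, a minimal naive-Bayes-style sketch could look like the following. The function name, the conditional-independence assumption, and the default uniform prior are all our own illustrative choices, not the post's implementation:

```python
import numpy as np

def phoneme_posterior(feature_probs: np.ndarray,
                      feature_matrix: np.ndarray,
                      prior=None) -> np.ndarray:
    """P(phoneme | features) assuming features are conditionally
    independent given the phoneme: feature f contributes p_f when the
    phoneme carries the feature and (1 - p_f) when it does not,
    scaled by a phoneme prior."""
    n_phonemes = feature_matrix.shape[0]
    if prior is None:
        prior = np.full(n_phonemes, 1.0 / n_phonemes)  # uniform prior
    # Likelihood of the predicted feature probabilities under each phoneme.
    lik = np.prod(np.where(feature_matrix == 1, feature_probs, 1.0 - feature_probs),
                  axis=1)
    post = lik * prior
    return post / post.sum()
```

Replacing the uniform prior with the LibriBrain phoneme frequencies mentioned earlier would let rare phonemes be weighted appropriately.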
@@ -478,14 +492,3 @@ Landau, G., Özdogan, M., Elvers, G., Mantegna, F., Somaiya, P., Jayalath, D., K
 
 Özdogan, M., Landau, G., Elvers, G., Jayalath, D., Somaiya, P., Mantegna, F., Woolrich, M., & Parker Jones, O. (2025). LibriBrain: Over 50 Hours of Within-Subject MEG to Improve Speech Decoding Methods at Scale. NeurIPS, Datasets & Benchmarks Track. [https://arxiv.org/abs/2506.02098](https://arxiv.org/abs/2506.02098)
 
-### Citation
-
-```bibtex
-@misc{pnpl_blog2025phoneme_ideas,
-  title={Language-Inspired Approaches to Phoneme Classification},
-  author={Kwon, Teyun and Cho, SungJun and Elvers, Gereon and Mantegna, Francesco and Parker Jones, Oiwi},
-  year={2025},
-  url={https://neural-processing-lab.github.io/2025-libribrain-competition/blog/language-inspired-approaches-phoneme-classification},
-  note={Blog post}
-}
-```
