^{:kindly/hide-code true
  :clay {:title "DSP Study Group - Reading audio data from WAV-files"
         :quarto {:author [:daslu :onbreath]
                  :description "Exploring WAV-files for DSP in Clojure."
                  :category :clojure
                  :type :post
                  :date "2025-11-09"
                  :tags [:dsp :math :music]
                  :image "wav.png"
                  :draft true}}}
(ns dsp.wav-files
  (:require [scicloj.kindly.v4.kind :as kind]
            [clojure.java.io :as io]
            [tech.v3.datatype.functional :as dfn]
            [tablecloth.api :as tc]
            [scicloj.tableplot.v1.plotly :as plotly])
  (:import (javax.sound.sampled AudioFileFormat
                                AudioInputStream
                                AudioSystem)
           (java.io InputStream)
           (java.nio ByteBuffer
                     ByteOrder)))

;; **Exploration from the [Scicloj DSP Study Group](https://scicloj.github.io/docs/community/groups/dsp-study/)**
;; *Second meeting - Nov. 8, 2025, and some follow-up investigation*

;; Welcome! These are notes from our second study group session, where
;; we're learning digital signal processing together using
;; Clojure. We're following the excellent book
;; [**Think DSP** by Allen B. Downey](https://greenteapress.com/wp/think-dsp/) (available free online).
;;
;; **Huge thanks to Professor Downey** for writing such an accessible and free introduction to DSP, and for sharing with us the work-in-progress notebooks of [Think DSP 2](https://allendowney.github.io/ThinkDSP2/index.html).

;; Along with this study group came the idea of an online creative
;; coding festival around Clojure in the first months of 2026. In this
;; meeting we spent some time brainstorming what it might look like
;; and what its scope could be. We spent the rest of the session
;; looking into downloading and reading WAV files in Clojure.

;; ## Why WAV Files?
;;
;; The notebooks in Think DSP 2 work with WAV files loaded from GitHub
;; as a basis for further processing, so we need a way to load these
;; as well. After obtaining the file, we need to get at the audio data
;; it contains.

;; ## Simplified WAV Format

;; First, let's take a brief look at what data WAV files contain
;; before we dive into reading it. A simple WAV file consists of a
;; header followed by the raw audio data. The WAV format has gone
;; through several revisions of its specification, and it is quite
;; flexible: a file can place various kinds of metadata at different
;; positions and can use different encodings for the audio data.

^:kindly/hide-code
(kind/mermaid
 "---
config:
  theme: 'forest'
---

block
  columns 1
  block:wav
    columns 5
    block:HeaderId
      columns 1
      HeaderLabel[\"Header\"]
    end

    block:F1
      columns 1
      FrameLabel1[\"Frame\"]
    end

    block:F2
      columns 1
      FrameLabel2[\"Frame\"]
    end

    block:F3
      columns 1
      FrameLabel3[\"Frame\"]
    end

    block:FN
      columns 1
      FrameLabelN[\"...\"]
    end
  end")

;; WAV (Waveform Audio File Format) is a kind of RIFF (Resource
;; Interchange File Format) file, which stores data in **chunks**. Each
;; **chunk** consists of a four-character **tag**, the size of its
;; data in bytes, and the **data** itself. Let's consider a partial
;; example, which corresponds to the way the WAV file we want to read
;; is arranged:

^:kindly/hide-code
(kind/mermaid
 "---
config:
  theme: 'forest'
---

block
  columns 1
  block:wav
    columns 3
    block:HeaderId
      columns 1
      HeaderLine1[\"RIFF\"]
      HeaderLine2[\"WAVE\"]
    end

    block:HeaderId2
      columns 1
      HeaderLine3[\"fmt \"]
      HeaderLine4[\"1\"]
      HeaderLine5[\"44100\"]
      HeaderLine6[\"16\"]
    end

    block:data
      columns 1
      DataLabel[\"data\"]
      ChanF1[\"ch0\"]
      ChanF2[\"ch0\"]
      ChanF3[\"ch0\"]
      ChanFN[\"...\"]
    end
  end")

;; The header starts with the **tag** `RIFF`, whose **chunk** declares
;; the specific format `WAVE`. Inside it, the **subchunk** `fmt `
;; describes the contained audio data. The values shown are some of
;; the header information of a WAV file with a single 16-bit (mono)
;; sound channel and 44,100 samples per second.
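
;; To make the chunk layout concrete, here is a minimal sketch of a
;; chunk walker. It assumes the whole file is in a byte array and only
;; relies on the basic RIFF rules: after the 12-byte `RIFF`/`WAVE`
;; header, each subchunk starts with a four-character tag and a 4-byte
;; little-endian size, followed by that many bytes of data, padded to
;; an even length. The helper names `tag-at`, `le-u32` and
;; `riff-chunks` are our own, not part of any library.
(defn tag-at
  "Reads the four-character ASCII tag starting at `offset`."
  [^bytes bs offset]
  (String. bs (int offset) 4 "US-ASCII"))

(defn le-u32
  "Reads an unsigned 32-bit little-endian integer starting at `offset`."
  [^bytes bs offset]
  (let [u8 (fn [i] (bit-and (aget bs (int i)) 0xff))]
    (bit-or (u8 offset)
            (bit-shift-left (u8 (+ offset 1)) 8)
            (bit-shift-left (u8 (+ offset 2)) 16)
            (bit-shift-left (u8 (+ offset 3)) 24))))

(defn riff-chunks
  "Returns [tag size] pairs for the subchunks of a RIFF/WAVE byte array."
  [^bytes bs]
  (assert (= "RIFF" (tag-at bs 0)) "not a RIFF file")
  (assert (= "WAVE" (tag-at bs 8)) "not a WAVE file")
  (loop [offset 12
         chunks []]
    (if (> (+ offset 8) (alength bs))
      chunks
      (let [size (le-u32 bs (+ offset 4))]
        ;; skip the 8-byte subchunk header, its data and a pad byte if the size is odd
        (recur (+ offset 8 size (mod size 2))
               (conj chunks [(tag-at bs offset) size]))))))

;; Applied to our file, e.g. with
;; `(riff-chunks (java.nio.file.Files/readAllBytes (.toPath (io/file tuning-fork-path))))`,
;; this should list the `fmt ` and `data` subchunks, plus any metadata
;; chunks the file happens to contain.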

;; As we learned in the [first session](https://clojurecivitas.github.io/dsp/intro.html)
;; of the DSP study group:
;; > Sound waves are continuous vibrations in the air. To work with them on a computer,
;; > we need to **sample** them - take measurements at regular intervals. The **sample rate**
;; > tells us how many measurements per second. CD-quality audio uses 44,100 samples per second.

;; These **samples** are stored in the WAV file's `data`
;; **subchunk**. Since this is mono sound, each **frame** holds a
;; single **sample** for its one **channel**. With multiple
;; **channels**, each **frame** holds one **sample** per channel, one
;; after the other.
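
;; The frame layout can be captured in a small formula. For
;; interleaved PCM data with `channels` channels of `bytes-per-sample`
;; bytes each, a frame is `channels * bytes-per-sample` bytes long, and
;; the sample of channel `c` in frame `i` starts at byte
;; `i * frame-size + c * bytes-per-sample` of the audio data. The
;; helpers below are hypothetical names for illustration, assuming the
;; interleaved layout described above.
(defn frame-size
  "Number of bytes in one frame."
  [channels bytes-per-sample]
  (* channels bytes-per-sample))

(defn sample-offset
  "Byte offset of channel `c` in frame `i`, relative to the start of the audio data."
  [i c channels bytes-per-sample]
  (+ (* i (frame-size channels bytes-per-sample))
     (* c bytes-per-sample)))

(defn frames
  "Groups interleaved samples into one vector per frame; an incomplete
  trailing frame is dropped."
  [samples channels]
  (mapv vec (partition channels samples)))

;; For a mono, 16-bit file, `(sample-offset i 0 1 2)` reduces to
;; `(* 2 i)`, the indexing used when reading the audio data below.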

;; ## Libraries We're Using
;;
;; - **[Kindly](https://scicloj.github.io/kindly-noted/kindly)** - Visualization protocol that renders our data as interactive HTML elements (through Clay)
;; - **[dtype-next](https://github.com/cnuernber/dtype-next)** - Efficient numerical arrays and vectorized operations (like NumPy for Clojure)
;; - **[Tablecloth](https://scicloj.github.io/tablecloth/)** - DataFrame library for data manipulation and transformation
;; - **[Tableplot](https://scicloj.github.io/tableplot/)** - Declarative plotting library built on Plotly
;; - **[javax.sound.sampled](https://docs.oracle.com/en/java/javase/25/docs/api/java.desktop/javax/sound/sampled/package-summary.html)** - Classes from the Java standard library's sound package, which we use to read WAV files

(require '[scicloj.kindly.v4.kind :as kind]
         '[clojure.java.io :as io]
         '[tech.v3.datatype.functional :as dfn]
         '[tablecloth.api :as tc]
         '[scicloj.tableplot.v1.plotly :as plotly])
^:kindly/hide-code
(kind/code
 "(import '(javax.sound.sampled AudioFileFormat
                              AudioInputStream
                              AudioSystem)
        '(java.io InputStream)
        '(java.nio ByteBuffer
                   ByteOrder))")

;; ## Downloading a WAV File
(defn copy
  "Streams the resource at `uri` (a URL or a file path) into `file`."
  [uri file]
  (with-open [in (io/input-stream uri)
              out (io/output-stream file)]
    (io/copy in out)))

^:kindly/hide-code
(def tuning-fork-file
  "18871__zippi1__sound-bell-440hz.wav")

^:kindly/hide-code
(def tuning-fork-url
  (str "https://github.com/AllenDowney/ThinkDSP/raw/master/code/" tuning-fork-file))

^:kindly/hide-code
(def tuning-fork-file-compressed
  "18871__zippi1__sound-bell-440hz-compressed.wav")

^:kindly/hide-code
(def tuning-fork-path
  (str "src/dsp/" tuning-fork-file))

^:kindly/hide-code
(def tuning-fork-path-compressed
  (str "src/dsp/" tuning-fork-file-compressed))

(copy tuning-fork-url tuning-fork-path)

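;; Re-running the notebook downloads the file every time. Here is a
;; minimal sketch for skipping the download when the file is already
;; present; `copy-if-missing` is our own helper, and it only checks that
;; the file exists, not that its contents are complete or up to date.
(require '[clojure.java.io :as io])

(defn copy-if-missing
  "Copies `uri` to `file` unless `file` already exists. Returns the file."
  [uri file]
  (let [f (io/file file)]
    (when-not (.exists f)
      ;; create missing parent directories before writing
      (io/make-parents f)
      (with-open [in (io/input-stream uri)
                  out (io/output-stream f)]
        (io/copy in out)))
    f))

;; e.g. `(copy-if-missing tuning-fork-url tuning-fork-path)`
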
;; ## Playing a WAV File
;;
;; Kindly can embed an audio player for a URL, but this sample is
;; extremely loud (it is a tuning fork struck in front of a
;; microphone), so we only show the code and don't render the player.
^:kindly/hide-code
(kind/code "(kind/audio {:src tuning-fork-url})")

;; Here we use a compressed and loudness-normalized version of the
;; original file, so you can safely listen to it.
(kind/audio {:src tuning-fork-file-compressed})

;; ## Reading Metadata from the WAV File
;;
;; We define a function to collect some metadata from the file.
(defn audio-format [^InputStream is]
  (let [file-format (AudioSystem/getAudioFileFormat is)
        format (.getFormat file-format)]
    {:is-big-endian? (.isBigEndian format)
     :channels (.getChannels format)
     :sample-rate (.getSampleRate format)
     :sample-size-bits (.getSampleSizeInBits format)
     :frame-length (.getFrameLength file-format)
     :encoding (str (.getEncoding format))}))

(def wav-format
  (with-open [wav-stream (io/input-stream tuning-fork-path)]
    (audio-format wav-stream)))

wav-format

;; `:is-big-endian?` specifies the byte order of audio data whose
;; samples are larger than 8 bits. `:sample-size-bits` is the number
;; of bits per sample, and `:frame-length` is the total number of
;; frames in the audio data.
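
;; A few more quantities follow directly from this metadata. The
;; sketch below assumes integer PCM samples whose size is a whole
;; number of bytes; `derived-format` is a hypothetical helper, and the
;; example map in the comment is made up, not read from our file.
(defn derived-format
  "Adds the frame size and audio data size in bytes, and the duration
  in seconds, to a map as returned by `audio-format`."
  [{:keys [channels sample-size-bits frame-length sample-rate] :as fmt}]
  (let [frame-size (* channels (quot sample-size-bits 8))]
    (assoc fmt
           :frame-size-bytes frame-size
           :data-size-bytes (* frame-size frame-length)
           :duration-seconds (/ frame-length (double sample-rate)))))

;; For example, 88,200 frames of 16-bit mono audio at 44,100 Hz make
;; 176,400 bytes of audio data and 2 seconds of sound:
;; (derived-format {:channels 1 :sample-size-bits 16
;;                  :frame-length 88200 :sample-rate 44100.0})
;; Calling `(derived-format wav-format)` does the same for our file.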

;; We don't use much of that information yet, but it lets us check
;; what kind of WAV file we're working with, and we can use it later
;; to extend the function for extracting audio data, which we define
;; next.

;; ## Reading Audio Data from the WAV File
;;
;; The bulk of the work here is handled by the `AudioInputStream`,
;; but since it only reads bytes for us, we have to assemble them into
;; the correct data type for each frame manually. For now, we only
;; handle 16-bit mono WAV files, storing the data in a short array.
(defn audio-data [^InputStream is]
  ;; `AudioSystem/getAudioInputStream` parses the WAV header and leaves
  ;; the stream positioned at the first byte of the audio data, so the
  ;; header bytes are not misread as samples.
  (with-open [ais (AudioSystem/getAudioInputStream is)]
    (let [frame-length (.getFrameLength ais)
          ^bytes audio-bytes (.readAllBytes ais)
          audio-shorts (short-array frame-length)
          ;; scratch buffer for one 16-bit sample; WAV data is little-endian
          bb (ByteBuffer/allocate 2)]
      (.order bb ByteOrder/LITTLE_ENDIAN)
      (dotimes [i frame-length]
        (.clear bb)
        ;; low byte first, then high byte of sample i
        (.put bb (aget audio-bytes (* 2 i)))
        (.put bb (aget audio-bytes (inc (* 2 i))))
        (aset-short audio-shorts i (.getShort bb 0)))
      audio-shorts)))
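
;; To see what the `ByteBuffer` does for each sample, here is a sketch
;; that assembles a signed 16-bit little-endian sample by hand: the low
;; byte is masked to 0-255 (Java bytes are signed), the high byte is
;; shifted left by 8 bits, and `unchecked-short` reinterprets the
;; lowest 16 bits as a signed value. As a bulk alternative, a
;; little-endian `ByteBuffer` viewed through `asShortBuffer` copies all
;; samples in one call. Both helper names are our own, and both assume
;; 16-bit mono PCM bytes like those read in `audio-data`.
(defn le-short
  "Combines a low and a high byte into a signed 16-bit sample."
  [lo hi]
  (unchecked-short (bit-or (bit-and lo 0xff)
                           (bit-shift-left hi 8))))

(defn bytes->shorts
  "Converts little-endian 16-bit PCM bytes into a short array."
  [^bytes bs]
  (let [out (short-array (quot (alength bs) 2))]
    (-> (java.nio.ByteBuffer/wrap bs)
        (.order java.nio.ByteOrder/LITTLE_ENDIAN)
        (.asShortBuffer)
        (.get out))
    out))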

(def wav-shorts
  (with-open [wav-stream (io/input-stream tuning-fork-path)]
    (audio-data wav-stream)))

;; The difference between the size of the WAV file and the size of the
;; audio data we read is 44 bytes. That is the size of a canonical WAV
;; header: 12 bytes for the `RIFF` chunk header with its `WAVE` tag,
;; 24 bytes for the `fmt ` subchunk, and 8 bytes for the tag and size
;; of the `data` subchunk.
(with-open [wav-stream (io/input-stream tuning-fork-path)]
  (- (count (.readAllBytes wav-stream))
     (* 2 (count wav-shorts))))

;; ## Striking the Fork
;;
;; Now that we have read the data, we can reduce its amplitude so that
;; we can listen to it safely.
^kind/audio
{:samples (dfn// wav-shorts 4000000.0)
 :sample-rate (:sample-rate wav-format)}
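
;; Dividing by a hand-picked constant works for this recording. An
;; alternative, sketched here, is to scale relative to the loudest
;; sample, so that the result has a chosen peak amplitude whatever the
;; recording level was. `scale-to-peak` is our own helper, and the
;; target peak is an arbitrary choice. For reference, 16-bit samples
;; range from -32768 to 32767, so dividing by 32768.0 maps them into
;; the interval [-1, 1).
(defn scale-to-peak
  "Scales `samples` linearly so that the largest absolute value
  becomes `target-peak`. Silence (all zeros) is returned as zeros."
  [samples target-peak]
  (let [peak (reduce max 0.0 (map #(Math/abs (double %)) samples))]
    (if (zero? peak)
      (mapv double samples)
      (mapv #(* target-peak (/ (double %) peak)) samples))))

;; Usage sketch, assuming Clay's audio kind accepts a plain Clojure
;; vector as `:samples`:
;; ^kind/audio
;; {:samples (scale-to-peak wav-shorts 0.1)
;;  :sample-rate (:sample-rate wav-format)}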

;; In fact, the function `audio-data` above is quite similar to how [Clay](https://github.com/scicloj/clay/blob/main/src/scicloj/clay/v2/item.clj#L420) writes audio data to a file for us to listen to in the browser, just in the reverse direction.

;; ## Visualizing Waves
;;
;; Let's plot the waveform of the tuning fork recording over time.
(let [{:keys [frame-length sample-rate]} wav-format]
  (-> {:time (dfn// (range frame-length)
                    sample-rate)
       :value wav-shorts}
      tc/dataset
      (plotly/layer-line {:=x :time
                          :=y :value})))
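
;; The file name suggests a 440 Hz tone. Assuming a sample rate of
;; 44,100 Hz, one period of a 440 Hz wave spans 44100 / 440 ≈ 100.2
;; samples, so a plot of the whole recording packs many periods
;; together. Here is a sketch for zooming in on the first few periods;
;; `samples-per-period` and `first-periods` are hypothetical helpers,
;; and the 440 Hz is taken from the file name rather than measured.
(defn samples-per-period
  "Number of samples in one period of a tone of frequency `freq` (Hz)."
  [sample-rate freq]
  (/ (double sample-rate) freq))

(defn first-periods
  "Returns :time and :value columns covering the first `n` periods."
  [samples sample-rate freq n]
  (let [k (long (Math/ceil (* n (samples-per-period sample-rate freq))))
        values (vec (take k samples))]
    {:time (mapv #(/ (double %) sample-rate) (range (count values)))
     :value values}))

;; Usage sketch, mirroring the plot above:
;; (let [{:keys [sample-rate]} wav-format]
;;   (-> (first-periods wav-shorts sample-rate 440 5)
;;       tc/dataset
;;       (plotly/layer-line {:=x :time :=y :value})))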

;; ## What We Learned
;;
;; In the second session, and in some pairing afterwards, we prepared
;; for our upcoming sessions on Think DSP by learning about:

;; - **WAV file format** - The structure of simple WAV files
;; - **File download** - Downloading files with `clojure.java.io`
;; - **WAV file metadata** - Reading the metadata of a WAV file
;; - **WAV file audio data** - Reading the bytes of the audio data and converting them to an appropriate data type
;;
;; ## Next Steps
;;
;; In our next study group meetings, we'll work through the book step by step and learn more about sounds and signals,
;; harmonics and the Fourier transform, non-periodic signals and spectrograms, noise and filtering, and more.
;;
;; Join us at the [Scicloj DSP Study Group](https://scicloj.github.io/docs/community/groups/dsp-study/)!
;;
;; ---
;;
;; *Again, huge thanks to Allen B. Downey for Think DSP. If you find this resource valuable,
;; consider [supporting his work](https://greenteapress.com/wp/) or sharing it with others.*