Commit 6d5fcdc

Merge pull request #168 from ClojureCivitas/dsp-reading-wav-files
Working with WAV files for the DSP study group draft
2 parents 909be9f + 881a875 commit 6d5fcdc

3 files changed

+325
-0
lines changed

2.25 MB
Binary file not shown.

src/dsp/wav.png

15.7 KB

src/dsp/wav_files.clj

Lines changed: 325 additions & 0 deletions
@@ -0,0 +1,325 @@
^{:kindly/hide-code true
  :clay {:title "DSP Study Group - Reading audio data from WAV-files"
         :quarto {:author [:daslu :onbreath]
                  :description "Exploring WAV-files for DSP in Clojure."
                  :category :clojure
                  :type :post
                  :date "2025-11-09"
                  :tags [:dsp :math :music]
                  :image "wav.png"
                  :draft true}}}
(ns dsp.wav-files
  (:require [scicloj.kindly.v4.kind :as kind]
            [clojure.java.io :as io]
            [tech.v3.datatype.functional :as dfn]
            [tablecloth.api :as tc]
            [scicloj.tableplot.v1.plotly :as plotly])
  (:import (javax.sound.sampled AudioFileFormat
                                AudioInputStream
                                AudioSystem)
           (java.io InputStream)
           (java.nio ByteBuffer
                     ByteOrder)))

;; **Exploration from the [Scicloj DSP Study Group](https://scicloj.github.io/docs/community/groups/dsp-study/)**
;; *Second meeting - Nov. 8th, 2025, and some follow-up investigation*

;; Welcome! These are notes from our second study group session, where
;; we're learning digital signal processing together using
;; Clojure. We're following the excellent book
;; [**Think DSP** by Allen B. Downey](https://greenteapress.com/wp/think-dsp/) (available free online).
;;
;; **Huge thanks to Professor Downey** for writing such an accessible and free introduction to DSP, and for sharing with us the work-in-progress notebooks of [Think DSP 2](https://allendowney.github.io/ThinkDSP2/index.html).

;; Along with this study group came the idea of holding an online
;; creative coding festival around Clojure in the first months of 2026.
;; In this meeting we spent some time brainstorming what that might
;; look like and what its scope could be. We spent the rest of the
;; session on downloading and reading WAV files in Clojure.

;; ## Why WAV Files?
;;
;; The notebooks in Think DSP 2 work with WAV files loaded from GitHub
;; as a basis for further processing, so we need a way to load these
;; as well. After obtaining the file, we need to get at the audio data
;; it contains.

;; ## Simplified WAV Format

;; First, let's take a superficial look at what data WAV files
;; contain, before we dive into getting the data. A simple WAV file
;; consists of a header followed by the raw audio data. The WAV
;; specification has gone through several revisions, and the format
;; allows considerable flexibility in where metadata is placed in the
;; file and in how the audio is encoded.

^:kindly/hide-code
(kind/mermaid
 "---
config:
  theme: 'forest'
---

block
  columns 1
  block:wav
    columns 5
    block:HeaderId
      columns 1
      HeaderLabel[\"Header\"]
    end

    block:F1
      columns 1
      FrameLabel1[\"Frame\"]
    end

    block:F2
      columns 1
      FrameLabel2[\"Frame\"]
    end

    block:F3
      columns 1
      FrameLabel3[\"Frame\"]
    end

    block:FN
      columns 1
      FrameLabelN[\"...\"]
    end
  end")

;; WAV (Waveform Audio File Format) is a RIFF (Resource Interchange
;; File Format) format, which stores data in **chunks**. Each
;; **chunk** consists of a **tag**, a size, and **data**. Let's
;; consider a partial example that corresponds to the layout of the
;; WAV file we want to read:

^:kindly/hide-code
(kind/mermaid
 "---
config:
  theme: 'forest'
---

block
  columns 1
  block:wav
    columns 3
    block:HeaderId
      columns 1
      HeaderLine1[\"RIFF\"]
      HeaderLine2[\"WAVE\"]
    end

    block:HeaderId2
      columns 1
      HeaderLine3[\"fmt \"]
      HeaderLine4[\"1\"]
      HeaderLine5[\"44100\"]
      HeaderLine6[\"16\"]
    end

    block:data
      columns 1
      DataLabel[\"data\"]
      ChanF1[\"ch0\"]
      ChanF2[\"ch0\"]
      ChanF3[\"ch0\"]
      ChanF4[\"ch0\"]
      ChanFN[\"...\"]
    end
  end")

;; The header consists of the **tag** `RIFF`, whose **chunk** is
;; marked with the specific format `WAVE`, and a **subchunk** `fmt `,
;; which describes the contained audio data. The diagram shows some of
;; the header information of a WAV file with a single 16-bit mono
;; sound channel and 44,100 samples per second.
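;;
;; To make the chunk layout concrete, here is a minimal sketch that
;; parses the first 12 bytes of a RIFF file: the `RIFF` tag, the chunk
;; size, and the form type. The names `read-tag`, `riff-header` and
;; `example-header` are hypothetical, and the header bytes are built by
;; hand for illustration rather than read from the tuning fork file.

```clojure
(import '(java.nio ByteBuffer ByteOrder)
        '(java.nio.charset StandardCharsets))

(defn read-tag
  "Reads a four-character tag at the buffer's current position."
  [^ByteBuffer bb]
  (let [^bytes tag (byte-array 4)]
    (.get bb tag)
    (String. tag StandardCharsets/US_ASCII)))

(defn riff-header
  "Parses the 12-byte RIFF container header: tag, size, form type.
  RIFF stores multi-byte numbers in little-endian byte order."
  [^bytes header-bytes]
  (let [bb (-> (ByteBuffer/wrap header-bytes)
               (.order ByteOrder/LITTLE_ENDIAN))
        tag (read-tag bb)
        size (.getInt bb)
        form (read-tag bb)]
    {:tag tag :size size :form form}))

;; Hand-built example (an assumption, not a real file):
;; "RIFF", a little-endian chunk size of 36, then "WAVE".
(def example-header
  (byte-array (concat (.getBytes "RIFF" StandardCharsets/US_ASCII)
                      [36 0 0 0]
                      (.getBytes "WAVE" StandardCharsets/US_ASCII))))

(riff-header example-header)
;; => {:tag "RIFF", :size 36, :form "WAVE"}
```

;; For real files we let `javax.sound.sampled` do this parsing for us.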
;;
;; As we learned in the [first session](https://clojurecivitas.github.io/dsp/intro.html)
;; of the DSP study group:
;; > Sound waves are continuous vibrations in the air. To work with them on a computer,
;; > we need to **sample** them - take measurements at regular intervals. The **sample rate**
;; > tells us how many measurements per second. CD-quality audio uses 44,100 samples per second.
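;;
;; As a small illustration of sampling, the sketch below measures a
;; 440 Hz sine wave 44,100 times per second. `sample-sine` and `a440`
;; are hypothetical names, and an ideal sine only approximates the
;; sound of a real tuning fork.

```clojure
(defn sample-sine
  "Returns `n` samples of a sine wave with the given frequency in Hz,
  measured `sample-rate` times per second."
  [frequency sample-rate n]
  (mapv (fn [i]
          (let [t (/ i (double sample-rate))] ; time of sample i in seconds
            (Math/sin (* 2.0 Math/PI frequency t))))
        (range n)))

;; One second of a 440 Hz tone needs 44,100 samples at this rate.
(def a440 (sample-sine 440 44100 44100))

(count a440)
;; => 44100
```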
;;
;; These **samples** are stored in the WAV file's `data` **subchunk**.
;; Since this is mono sound, each **frame** holds a single **sample**
;; for the one **channel**. With multiple **channels**, each **frame**
;; holds one **sample** per channel.
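;;
;; The relationship between frames and channels can be sketched with
;; plain sequences. `split-channels` is a hypothetical helper, and the
;; stereo data is made up; our tuning fork file is mono, so it has only
;; one channel.

```clojure
(defn split-channels
  "Splits interleaved samples into one vector per channel.
  Assumes the number of samples is a multiple of `channels`."
  [channels samples]
  (let [frames (partition channels samples)] ; one frame = one sample per channel
    (mapv (fn [ch] (mapv #(nth % ch) frames))
          (range channels))))

;; Made-up stereo data: three frames, each holding [left right].
(split-channels 2 [10 -10 20 -20 30 -30])
;; => [[10 20 30] [-10 -20 -30]]
```

;; With `channels` set to 1, every frame is a single sample and the
;; result is one vector containing all samples.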

;; ## Libraries We're Using
;;
;; - **[Kindly](https://scicloj.github.io/kindly-noted/kindly)** - Visualization protocol that renders our data as interactive HTML elements (through Clay)
;; - **[dtype-next](https://github.com/cnuernber/dtype-next)** - Efficient numerical arrays and vectorized operations (like NumPy for Clojure)
;; - **[Tablecloth](https://scicloj.github.io/tablecloth/)** - DataFrame library for data manipulation and transformation
;; - **[Tableplot](https://scicloj.github.io/tableplot/)** - Declarative plotting library built on Plotly
;; - **[javax.sound.sampled](https://docs.oracle.com/en/java/javase/25/docs/api/java.desktop/javax/sound/sampled/package-summary.html)** - Classes from the Java standard library's sound package for reading WAV files

(require '[scicloj.kindly.v4.kind :as kind]
         '[clojure.java.io :as io]
         '[tech.v3.datatype.functional :as dfn]
         '[tablecloth.api :as tc]
         '[scicloj.tableplot.v1.plotly :as plotly])
^:kindly/hide-code
(kind/code
 "(import '(javax.sound.sampled AudioFileFormat
                              AudioInputStream
                              AudioSystem)
        '(java.io InputStream)
        '(java.nio ByteBuffer
                   ByteOrder))")

;; ## Downloading a WAV File
(defn copy
  "Copies everything readable from `uri` (for example a URL string)
  into `file`."
  [uri file]
  (with-open [in (io/input-stream uri)
              out (io/output-stream file)]
    (io/copy in out)))

^:kindly/hide-code
(def tuning-fork-file
  "18871__zippi1__sound-bell-440hz.wav")

^:kindly/hide-code
(def tuning-fork-url
  (str "https://github.com/AllenDowney/ThinkDSP/raw/master/code/" tuning-fork-file))

^:kindly/hide-code
(def tuning-fork-file-compressed
  "18871__zippi1__sound-bell-440hz-compressed.wav")

^:kindly/hide-code
(def tuning-fork-path
  (str "src/dsp/" tuning-fork-file))

^:kindly/hide-code
(def tuning-fork-path-compressed
  (str "src/dsp/" tuning-fork-file-compressed))

(copy tuning-fork-url tuning-fork-path)
210+
;; ## Playing a WAV File
211+
;;
212+
;; Kindly can embed a player with a URL, but the sample is extremely
213+
;; loud (it is a tuning fork struck in front of a microphone), so we
214+
;; don't embed this player.
215+
^:kindly/hide-code
216+
(kind/code "(kind/audio {:src tuning-fork-url})")
217+
218+
;; Here we use a compressed and loudness normalized version of the
219+
;; original file, so you can safely listen to it.
220+
(kind/audio {:src tuning-fork-file-compressed})

;; ## Reading Metadata from the WAV File
;;
;; We define a function to collect some metadata from the file.
(defn audio-format [^InputStream is]
  (let [file-format (AudioSystem/getAudioFileFormat is)
        format (.getFormat file-format)]
    {:is-big-endian? (.isBigEndian format)
     :channels (.getChannels format)
     :sample-rate (.getSampleRate format)
     :sample-size-bits (.getSampleSizeInBits format)
     :frame-length (.getFrameLength file-format)
     :encoding (str (.getEncoding format))}))

(with-open [wav-stream (io/input-stream tuning-fork-path)]
  (def wav-format
    (audio-format wav-stream)))

wav-format

;; `:is-big-endian?` specifies the byte order of audio data with more
;; than 8 `:sample-size-bits`. `:sample-size-bits` is the number of
;; bits per sample. `:frame-length` is the total number of frames
;; contained in the audio data.
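;;
;; These fields are enough to derive the duration and the size of the
;; audio data. The sketch below assumes uncompressed PCM audio;
;; `audio-summary` is a hypothetical helper, and the input map uses
;; illustrative values rather than the tuning fork file's actual
;; metadata.

```clojure
(defn audio-summary
  "Derives the duration and data size from format metadata.
  Assumes uncompressed PCM with a whole number of bytes per sample."
  [{:keys [channels sample-rate sample-size-bits frame-length]}]
  (let [bytes-per-frame (* channels (quot sample-size-bits 8))]
    {:duration-seconds (/ frame-length (double sample-rate))
     :bytes-per-frame  bytes-per-frame
     :data-bytes       (* frame-length bytes-per-frame)}))

;; Illustrative values: two seconds of 16-bit mono audio.
(audio-summary {:channels 1
                :sample-rate 44100.0
                :sample-size-bits 16
                :frame-length 88200})
;; => {:duration-seconds 2.0, :bytes-per-frame 2, :data-bytes 176400}
```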
;;
;; We don't use much of this information yet, but it lets us check
;; what kind of WAV file we're working with, and we can use it later
;; to extend the function for extracting audio data, which we define
;; next.

;; ## Reading Audio Data from the WAV File
;;
;; The bulk of the work here is handled by `AudioInputStream`, but
;; since it only hands us bytes, we have to assemble them into the
;; correct data type for each frame ourselves. For now we only handle
;; 16-bit mono WAV files and put their samples into a short-array.
(defn audio-data [^InputStream is]
  (let [{:keys [frame-length]} (audio-format is)
        ;; `AudioSystem/getAudioInputStream` parses the header and leaves
        ;; the stream at the first audio byte. Wrapping `is` directly in
        ;; an `AudioInputStream` would start at the beginning of the file
        ;; and treat the header bytes as samples.
        ^bytes audio-bytes (with-open [ais (AudioSystem/getAudioInputStream is)]
                             (AudioInputStream/.readAllBytes ais))
        audio-shorts (short-array frame-length)
        bb (ByteBuffer/allocate 2)]
    ;; Combine each pair of little-endian bytes into one 16-bit sample.
    (dotimes [i frame-length]
      (ByteBuffer/.clear bb)
      (.order bb ByteOrder/LITTLE_ENDIAN)
      (.put bb ^byte (aget audio-bytes (* 2 i)))
      (.put bb ^byte (aget audio-bytes (inc (* 2 i))))
      (aset-short audio-shorts i (.getShort bb 0)))
    audio-shorts))

(with-open [wav-stream (io/input-stream tuning-fork-path)]
  (def wav-shorts
    (audio-data wav-stream)))
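
;; The loop above copies two bytes at a time into a temporary buffer.
;; As an alternative sketch (not what this notebook uses), the whole
;; byte array can be viewed as a `ShortBuffer` and copied in one call.
;; `le-bytes->shorts` is a hypothetical name, and the sketch assumes
;; little-endian 16-bit samples.

```clojure
(import '(java.nio ByteBuffer ByteOrder))

(defn le-bytes->shorts
  "Converts little-endian 16-bit PCM bytes into a short-array.
  A trailing odd byte, if any, is ignored."
  [^bytes audio-bytes]
  (let [^shorts out (short-array (quot (alength audio-bytes) 2))]
    (-> (ByteBuffer/wrap audio-bytes)
        (.order ByteOrder/LITTLE_ENDIAN)
        (.asShortBuffer)
        (.get out)) ; bulk copy of all samples into `out`
    out))

;; Two hand-written samples: bytes 01 00 are 1, bytes FF FF are -1.
(vec (le-bytes->shorts (byte-array [1 0 -1 -1])))
;; => [1 -1]
```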

;; The WAV file is 44 bytes larger than the audio data we read. That
;; is the size of the standard header of a simple PCM WAV file: the
;; `RIFF` container header plus the `fmt ` and `data` subchunk
;; headers.
(with-open [wav-stream (io/input-stream tuning-fork-path)]
  (- (count (.readAllBytes wav-stream))
     (* 2 (count wav-shorts))))

;; ## Striking the Fork
;;
;; Now that we have read the data we can reduce its amplitude, so we
;; can listen to it safely.
^kind/audio
{:samples (dfn// wav-shorts 4000000.0)
 :sample-rate (:sample-rate wav-format)}
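
;; The single division above both converts the integer samples to
;; floating point and turns the volume down. A minimal sketch in plain
;; Clojure that separates the two steps follows. `normalize-16-bit` and
;; `apply-gain` are hypothetical helpers, and the target range of
;; [-1.0, 1.0) is an assumption about what an audio player expects.

```clojure
(defn normalize-16-bit
  "Maps 16-bit integer samples to doubles in [-1.0, 1.0)."
  [samples]
  (mapv #(/ % 32768.0) samples))

(defn apply-gain
  "Multiplies every sample by `gain`; a gain below 1.0 is quieter."
  [gain samples]
  (mapv #(* gain %) samples))

;; Made-up samples, normalized and then played at half amplitude.
(->> [0 16384 -32768]
     normalize-16-bit
     (apply-gain 0.5))
;; => [0.0 0.25 -0.5]
```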

;; In fact, the function `audio-data` above is quite similar to how [Clay](https://github.com/scicloj/clay/blob/main/src/scicloj/clay/v2/item.clj#L420) writes the audio data to a file for us to listen to in the browser; it is the same process in reverse.

;; ## Visualizing Waves
;;
;; Let's take a look at the sound of a tuning fork by plotting the
;; sample values over time in seconds.
(let [{:keys [frame-length sample-rate]} wav-format]
  (-> {:time (dfn// (range frame-length)
                    sample-rate)
       :value wav-shorts}
      tc/dataset
      (plotly/layer-line {:=x :time
                          :=y :value})))

;; ## What we learned
;;
;; In the second session, and in some pairing afterwards, we prepared
;; for our forthcoming sessions on Think DSP by covering:

;; - **WAV file format** - Learning about the structure of simple WAV files
;; - **File download** - Downloading files with `clojure.java.io`
;; - **WAV file metadata** - Reading metadata of a WAV file
;; - **WAV file audio data** - Reading the bytes in the audio data container and converting them to an appropriate data type
;;
;; ## Next Steps
;;
;; In our next study group meetings, we'll explore the book step by step, and learn more about sounds and signals,
;; harmonics and the Fourier transform, non-periodic signals and spectrograms, noise and filtering, and more.
;;
;; Join us at the [Scicloj DSP Study Group](https://scicloj.github.io/docs/community/groups/dsp-study/)!
;;
;; ---
;;
;; *Again, huge thanks to Allen B. Downey for Think DSP. If you find this resource valuable,
;; consider [supporting his work](https://greenteapress.com/wp/) or sharing it with others.*
