derefr 6 days ago

It's so strange (and frustrating) to me that "Bluetooth audio" means "you pass the Bluetooth hardware PCM samples, and it encodes them itself in hardware; or the Bluetooth driver decodes packets in hardware to PCM samples, and then passes them to userspace."

It reminds me of the telephone network, where even though the whole thing is just another packet-switched network these days, the abstraction exposed to the handset is an analogue baseband audio signal.

---

Why can't we get another type of "Bluetooth audio", that works like VoIP does between handsets and their PBXes — where the two devices will:

1. do a little handshake to negotiate a set of hardware-accelerated audio codecs the devices (not the Bluetooth transceivers!) both support, in descending order of quality, constrained by link throughput + noise; and then

2. open a (lossy, in-order) realtime "dumb pipe" data carrier channel, into which both sides shove frames pre-encoded by their separate audio codec chip?
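
To make step 1 concrete, here is a minimal sketch of that negotiation, assuming each device advertises a list of codecs already ordered by its own preference (best first) plus the bitrate each needs. The `Codec` type, the codec labels, the bitrates and `link_kbps` are all hypothetical illustrations, not part of any Bluetooth spec:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Codec:
    name: str          # hypothetical codec label
    bitrate_kbps: int  # bitrate this codec needs on the link

def negotiate(local: list[Codec], remote: list[Codec], link_kbps: float) -> list[Codec]:
    """Codecs both devices support that fit the measured link, best first.

    Assumes both lists are ordered by preference; the local ordering wins.
    """
    remote_names = {c.name for c in remote}
    return [c for c in local if c.name in remote_names and c.bitrate_kbps <= link_kbps]

phone = [Codec("lossless-x", 1400), Codec("opus", 256), Codec("aac", 256), Codec("sbc", 328)]
headphones = [Codec("aac", 256), Codec("opus", 256), Codec("sbc", 328)]
print([c.name for c in negotiate(phone, headphones, link_kbps=300)])  # ['opus', 'aac']
```

Re-running the same function when link throughput or noise changes would give the fallback order "in descending order of quality" for free.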

Is this just AVDTP? No — AVDTP does do a capabilities negotiation, sure, but it's a capabilities negotiation about the audio codecs the Bluetooth transceiver chip itself has been extended with support for — support where, as above, userland and even the OS kernel both just see a dumb PCM-sample pipe.

What I'm talking about here is taking audio-codec handling out of the Bluetooth transceiver's hands: instead, just tell the transceiver "we're doing lossy realtime data signalling now" and then spray whatever packets you (the device) want to spray, encoded through whatever audio-codec DSP you want to use. No need to run through a Bluetooth SIG standardization process for each new codec.
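
A rough sketch of what "just spraying packets" could look like, assuming the transceiver carries opaque bytes and the devices add their own small header. The header layout (codec ID, 16-bit sequence number, 32-bit timestamp in samples) is a hypothetical choice, loosely in the spirit of RTP (RFC 3550); the sequence number lets the receiver drop late frames on a lossy, in-order pipe:

```python
import struct

# Hypothetical header: 1-byte codec ID, 16-bit sequence number, 32-bit sample timestamp.
HEADER = struct.Struct("!BHI")

def pack_frame(codec_id: int, seq: int, timestamp: int, payload: bytes) -> bytes:
    """Prefix an already-encoded audio frame with the header; payload stays opaque."""
    return HEADER.pack(codec_id, seq & 0xFFFF, timestamp & 0xFFFFFFFF) + payload

def unpack_frame(frame: bytes) -> tuple[int, int, int, bytes]:
    codec_id, seq, timestamp = HEADER.unpack_from(frame)
    return codec_id, seq, timestamp, frame[HEADER.size:]

def is_newer(seq: int, last_seq: int) -> bool:
    """True if seq follows last_seq, allowing for 16-bit wrap-around."""
    return 0 < ((seq - last_seq) & 0xFFFF) < 0x8000

frame = pack_frame(3, 65535, 48000, b"\x01\x02")
print(unpack_frame(frame))   # (3, 65535, 48000, b'\x01\x02')
print(is_newer(0, 65535))    # True: the counter wrapped
```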

(Heck, presuming a PC/smartphone on the send side, and a sufficiently powerful smart speaker/TV/sound bar on the receive side, both sides could actually support new codecs the moment they're released, via software updates, with no hardware-acceleration required, doing the codec part entirely on CPU.)

---

Or, if we're talking pie-in-the-sky ideas, how about a completely different type of "Bluetooth audio", not for bidirectional audio streaming at all? One that works less like VoIP, and more like streaming VOD video (e.g. YouTube) does?

Imagine a protocol where the audio source says "hey, I have this 40MB audio file, it's natively in container format X and encoding Y, can you buffer and decode that yourself?" — and then, if the receiver says "yeah, sure", the source just blasts that audio file out over a reliable stream data carrier channel; the receiver buffers it; and then the receiver does an internal streaming decode from its own local buffer from that point forward — with no audio channel open, only a control channel.
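
A toy model of that offer/accept exchange, assuming a reliable, ordered bulk channel. `Offer`, `Receiver`, `send_file`, the container/encoding labels and the chunk size are all hypothetical names for illustration; the receiver accepts only if it can decode the format itself and has room to buffer the whole file:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Offer:
    size_bytes: int
    container: str  # hypothetical label, e.g. "m4a"
    encoding: str   # hypothetical label, e.g. "alac"

@dataclass
class Receiver:
    capacity: int                    # bytes of local buffer memory
    supported: set[tuple[str, str]]  # (container, encoding) pairs it can decode itself
    expected: int = 0
    buffer: bytearray = field(default_factory=bytearray)

    def accept(self, offer: Offer) -> bool:
        ok = ((offer.container, offer.encoding) in self.supported
              and offer.size_bytes <= self.capacity)
        if ok:
            self.expected = offer.size_bytes
            self.buffer.clear()
        return ok

    def receive(self, chunk: bytes) -> None:
        # The bulk channel is assumed reliable and ordered, so chunks are simply appended.
        if len(self.buffer) + len(chunk) > self.expected:
            raise ValueError("sender sent more than it offered")
        self.buffer.extend(chunk)

    @property
    def complete(self) -> bool:
        return len(self.buffer) == self.expected

def send_file(rx: Receiver, data: bytes, container: str, encoding: str,
              chunk_size: int = 4096) -> bool:
    """Offer the whole file; if accepted, burst it over the bulk channel."""
    if not rx.accept(Offer(len(data), container, encoding)):
        return False  # caller would fall back to ordinary realtime streaming
    for start in range(0, len(data), chunk_size):
        rx.receive(data[start:start + chunk_size])
    return rx.complete

rx = Receiver(capacity=40_000_000, supported={("m4a", "alac")})
print(send_file(rx, bytes(10_000), "m4a", "alac"))   # True
print(send_file(rx, bytes(10_000), "ogg", "vorbis"))  # False
```

After the burst, only a control channel (play/pause/seek) would need to stay open; playback runs from `rx.buffer`.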

Given the "race to sleep" argument, I presume that for the average use-case of "headphones streaming pre-buffered M4As from your phone", this third approach would actually be a lot less battery-draining than pinging the receiver with new frames of audio every few hundred milliseconds. You'd get a few seconds of intensive streaming, but then the transceivers on both ends could just go to sleep until the next song is about to play.

Of course, back when the Bluetooth Audio spec was written, something the size of AirPods couldn't have had room to support a 40MB DRAM buffer + external hardware parse-and-decode of M4A/ALAC/etc. But they certainly could today!

iknowstuff 6 days ago

While we're at it, it'd be great if we could avoid remuxing, e.g., FaceTime audio (which is AAC) and notification sounds into a single stream before sending it to Bluetooth. It would be nice to avoid the latency and just shove the raw AAC from FaceTime into the headset, and, when a notification ping arrives, send that as a separate audio stream, maybe with a different codec.

derefr 6 days ago

Yeah, the ultimate Bluetooth audio protocol would probably be a meta-protocol, combining the two ideas I mentioned with a MIDI-like timecoded sequencing protocol. You'd pre-buffer one or more notification sound effects onto the receiver, registering them with audio-session-specific IDs; begin streaming the live audio (the stream gets an ID); and then use the control sequencing stream to say "mix in a copy of registered-stream N [the ping sfx] at time T." (And the sender could then cut the sfx off early with another such command, if it wanted.)
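
A receiver-side sketch of those three commands (register, mix in at time T, cut off early), assuming time is counted in samples of the live stream and everything is 16-bit PCM held in plain lists. `Mixer`, `Voice` and the method names are hypothetical, and finished voices are never cleaned up, so this is illustrative only:

```python
from dataclasses import dataclass, field

@dataclass
class Voice:
    clip_id: int
    start: int  # sample time at which this copy starts
    stop: int   # sample time at which it stops (exclusive)

@dataclass
class Mixer:
    clips: dict[int, list[int]] = field(default_factory=dict)  # pre-buffered sfx
    voices: list[Voice] = field(default_factory=list)          # scheduled copies

    def register(self, clip_id: int, samples: list[int]) -> None:
        """Pre-buffer a sound effect on the receiver under a session-specific ID."""
        self.clips[clip_id] = samples

    def mix_at(self, clip_id: int, t: int) -> None:
        """Mix in a copy of registered clip `clip_id` at sample time t."""
        self.voices.append(Voice(clip_id, t, t + len(self.clips[clip_id])))

    def cut_at(self, clip_id: int, t: int) -> None:
        """Cut off any copy of the clip that is still playing at time t."""
        for v in self.voices:
            if v.clip_id == clip_id and v.start <= t < v.stop:
                v.stop = t

    def render(self, live: list[int], t0: int) -> list[int]:
        """Mix scheduled clips into a block of live PCM that starts at sample t0."""
        out = []
        for i, sample in enumerate(live):
            t = t0 + i
            for v in self.voices:
                if v.start <= t < v.stop:
                    sample += self.clips[v.clip_id][t - v.start]
            out.append(max(-32768, min(32767, sample)))  # clamp to 16-bit range
        return out

m = Mixer()
m.register(7, [100, 100, 100, 100])
m.mix_at(7, t=2)
m.cut_at(7, t=4)
print(m.render([0] * 6, t0=0))  # [0, 0, 100, 100, 0, 0]
```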

mrheosuper 5 days ago

We basically did that in an old project. We needed to transfer audio from our device to a phone, but our device was BLE-only, and LE Audio was not mature enough.

So we defined a custom BLE service and blasted the audio file through it.
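
For anyone trying something similar, a sketch of the chunking side, assuming the file goes out as GATT notifications whose payload can be at most ATT_MTU - 3 bytes. The 247-byte MTU default, the 2-byte sequence prefix and the function names are assumptions for illustration, not the format that project used; the total length is assumed to be sent separately so a missing final packet can be detected:

```python
import struct

ATT_HEADER = 3  # a notification's ATT opcode + attribute handle
SEQ_HEADER = 2  # hypothetical custom 16-bit sequence number

def chunk_for_ble(data: bytes, att_mtu: int = 247) -> list[bytes]:
    """Split data into notification payloads of at most att_mtu - 3 bytes."""
    room = att_mtu - ATT_HEADER - SEQ_HEADER
    return [struct.pack("<H", seq & 0xFFFF) + data[off:off + room]
            for seq, off in enumerate(range(0, len(data), room))]

def reassemble(packets: list[bytes], expected_len: int) -> bytes:
    """Rebuild the file; assumes < 65536 packets so sequence numbers never wrap."""
    by_seq = {struct.unpack_from("<H", p)[0]: p[SEQ_HEADER:] for p in packets}
    if sorted(by_seq) != list(range(len(by_seq))):
        raise ValueError("gap in sequence numbers")
    data = b"".join(by_seq[i] for i in range(len(by_seq)))
    if len(data) != expected_len:
        raise ValueError("transfer incomplete")
    return data

audio = bytes(range(256)) * 4
packets = chunk_for_ble(audio)
print(len(packets), max(len(p) for p in packets))  # 5 244
```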