blog post with the background story why this was created: https://asahilinux.org/2025/03/progress-report-6-14/#is-this...
Over 20 years ago I had a Toshiba Tablet PC convertible that had a beam forming array of microphones and it came with software that let you point where you wanted to record from.
The use case was lectures: you could tell the laptop to just record from behind it, pointing the beam in the direction of the professor.
Amazing idea and something I haven't seen since.
In the golden age of mini camcorders, some Sony Handycams had "zoom" microphones which used beam forming to limit gathered sound to roughly the area your sensor sees.
Another great idea.
Oh. They still make similar stuff: https://electronics.sony.com/imaging/imaging-accessories/all...
I feel like my iPhone does it. But not sure. Sound definitely changes when you zoom while recording
They do. They rarely mention it but they do:
https://devstreaming-cdn.apple.com/videos/wwdc/2019/249a0jw9...
The only audio-related content I saw here is slides 124-140, which cover beam-forming, but I didn't see anything about a default beam-forming profile tied to virtual zoom.
On current iPhone Pro (16) you can even select the audio mix you want for recorded video after recording.
This is a feature of iPhone, yes. Believe it came around the 11 (?) but it can really help when recording concerts if you're into that sort of thing.
Funny, that’s exactly when I hate it the most! If you zoom mid clip the sound very audibly changes which is not desirable.
Samsung phones have this as well, can be enabled or disabled in the camera settings.
Mine is too old to test the claim, but knowing that it has at least three microphones on board, it'd be absurd if Apple didn't implement it.
It's pretty computationally cheap, too, as long as you've got the math right and an easy way to choose where to aim the beam.
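To make "computationally cheap" concrete, here's a toy delay-and-sum beamformer in NumPy (my own sketch, not what any vendor actually ships): steering is just one fractional delay per mic plus an average.

```python
import numpy as np

def delay_and_sum(mic_signals, mic_positions_m, angle_rad, fs, c=343.0):
    # Steer a linear mic array toward angle_rad by applying one fractional
    # delay per channel (done in the frequency domain) and averaging.
    n_mics, n = mic_signals.shape
    # Delay that aligns a far-field plane wave arriving from angle_rad.
    delays = mic_positions_m * np.sin(angle_rad) / c
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    out = np.zeros(n)
    for sig, tau in zip(mic_signals, delays):
        shifted = np.fft.rfft(sig) * np.exp(2j * np.pi * freqs * tau)
        out += np.fft.irfft(shifted, n)
    return out / n_mics
```

Steering toward the true arrival angle adds the channels coherently; any other direction partially cancels, which is the whole "beam".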
It's used widely in fancy videoconferencing setups.
The mic array for the room figures out who's talking and isolates the audio from them.
(Videoconferencing in large rooms has long picked the loudest microphone to use at any time, to avoid mixing in noise from other mics, but then the beamforming makes it that much better.)
I'm wondering if that's why those kinds of setups offer good audio if only one person speaks and there are clear pauses between people, but as soon as you have a quick back-and-forth or two people talking, the audio turns into complete mush.
I wonder how that worked. Assuming the microphones were on the screen plane rather than the body, it wouldn't be able to tell the difference between "straight in front" and "straight behind".
Straight in front is likely to be unobstructed while straight behind is likely to be obstructed by computer hardware. Therefore, straight in front is likely to have crisp sound while straight behind is likely to be muffled and/or distorted by reflections.
Yes, but that's not something you can beamform away with a planar array. Especially if your goal is to record what's going on behind the screen. You need something out of plane. Which they may have had! I don't know the details of the hardware.
As someone who made the mistake of putting a webcam cover over the tiny little microphone hole above the screen (it picks up very little besides impact noises now), it wouldn't be hard to have a mic hole facing in both directions to solve that problem
A single mic facing in both directions, or one mic next to the other but facing opposite directions, doesn't really help. You need separation between them in the direction of wave propagation (so in the front-back dimension of the laptop screen in this example) to tell which direction the sound is coming from.
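A quick numerical way to see the front-back ambiguity (a toy example with made-up mic coordinates): mirror a source across the mic plane and the path lengths, and therefore the inter-mic delays, are identical.

```python
import numpy as np

# Mics all in the z = 0 plane (think: the plane of a laptop screen).
mics = np.array([[0.00, 0.00, 0.0],
                 [0.10, 0.00, 0.0],
                 [0.05, 0.20, 0.0]])

front = np.array([0.3, 0.5, 0.4])      # a source in front of the plane
back = front * np.array([1, 1, -1])    # its mirror image behind the plane

# Identical path lengths mean identical arrival times at every mic, so no
# amount of beamforming math on this planar array can tell the two apart.
d_front = np.linalg.norm(mics - front, axis=1)
d_back = np.linalg.norm(mics - back, axis=1)
```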
They're in somewhat random locations, not symmetric and parallel as one might expect.
Sennheiser has a model that is mounted on the ceiling. Haven't seen this live.
https://www.sennheiser.com/en-us/catalog/products/meeting-an...
The attenuation provided by the tablet case / shell is quite significant. I bet they had some extra foam, or something, to make it even stronger. So the "right behind" signal would be heard only if "right in front" is not readily drowning it.
Idea I've had for years but never got around to testing due to lack of compute:
Use a microphone array and LIDAR for ground truth, and train a diffusion model to "imagine" what the world looks like conditioned on some signal transformations of the microphone data only.
Could be used by autonomous vehicles to "see" pedestrians through bushes, early detect oncoming emergency vehicles, hear bicyclists before they are visible, and lots of other good things.
This already exists, it's the domain of inverse problems. Inverse problems consider a forward problem (in this case wave propagation) depending on some physical parameters or domain geometry, and deduce the parameters or geometry from observations.
Conceptually, it's quite simple, you need to derive a gradient of the output error with respect to the sought information. And then use that to minimize the error (= "loss function" or "objective" depending on field terminology), like you do in neural networks.
In many cases, the solution is unfortunately not unique. The choice of emitter and receiver locations is crucial in the case you're interested in.
There's a lot of literature on this topic already; try "acoustic inverse problem" on Google Scholar.
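A toy instance of that recipe (my own illustration, not taken from the literature above): locate a source from arrival times at known receivers by gradient descent on the squared misfit, using the analytic gradient of the forward model.

```python
import numpy as np

c = 343.0  # speed of sound, m/s
receivers = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
true_src = np.array([0.3, 0.7])

def forward(p):
    # Forward problem: predicted arrival time at each receiver.
    return np.linalg.norm(receivers - p, axis=1) / c

t_obs = forward(true_src)  # "measured" data

p = np.array([0.9, 0.1])  # initial guess
for _ in range(50000):
    d = np.linalg.norm(receivers - p, axis=1)
    residual = d / c - t_obs
    # Analytic gradient of the squared-error loss w.r.t. the source position.
    grad = (2 * (residual / (c * d))[:, None] * (p - receivers)).sum(axis=0)
    p = p - 2e3 * grad
```

With four receivers surrounding the source and absolute arrival times, the minimum is unique; drop to time *differences* or move the receivers onto a line and the non-uniqueness the comment mentions shows up immediately.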
So basically a kind of passive echolocation?
I like it. I think you'd need to be in known motion around the area to build up a picture -- I don't think it would work with a microphone just sitting in place.
Sort of!
If you shut your eyes and you hear footsteps to your right, you have a good idea of exactly what you're hearing -- you can probably infer if it's a child or adult, you can possibly infer if they are masculine or feminine shoes, you can even infer formal vs informal attire based on the sound of the shoes (sneakers, sandals, and dress shoes all sound different), and you can locate their angle and distance pretty well. And that's with just your organic 2-microphone array.
I imagine multiple microphones and phase info could do a lot better in accurately placing the objects they hear.
It doesn't need to build an accurate picture of everything, it just needs to be good at imagining the stuff that actually matters, e.g. pedestrians, emergency vehicles. Where the model decides to place a few rustling leaves or what color it imagines a person to be wearing is less relevant than the fact that it decided there is likely a person in some general direction even if they are not visible.
I just think diffusion models are relatively good at coming up with something explainable and plausible for a given condition, when trained on some distribution of data.
Like "oh I hear this, that, and that -- what reality could explain those observations from the distribution of realities that I have seen?"
Sounds like passive radar moved to the acoustic domain. It's a neat thing and there is some open source work around it. However, it's also a good way to run afoul of ITAR; passive radar is still one of those secret-sauce, software-is-a-munition type things.
I have a passive radar. It also is a direction finding radio. I didn't have to jump through any hoops.
On recent devices with on-device NPU, could be combined with RF imaging of nearby activity and structure via WiFi 7 Sensing Doppler radar.
From the Samsung S10 forward, this has been a feature while recording video in zoom mode. I was always really curious how they did it.
My (never finished) master's thesis was about something similar - taking advantage of the fact that (almost) all smartphones have at least two microphones I wanted to locate and separate a speaker in 3D.
A few takeaways:
- The sampling rate is slightly off between devices - approximately ±1 sample per second - not a lot, but you need to take that into account.
- Spectral characteristics in consumer microphones are all over the place - two phones of the same model, right out of the box, will have not only measurable, but also audible differences.
- Sound bounces off of everything, particularly concrete walls.
- A car is the closest thing to an anechoic chamber you can readily access.
- The Fourier transform of a Gaussian is a Gaussian, which is very helpful when you need to estimate the frequency of a harmonic signal (like speech) with a wavelength shorter than half your window, but just barely.
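The clock-drift point is easy to demonstrate (a synthetic sketch, not the thesis code): simulate a device whose ADC clock runs about 1 sample per second fast at 48 kHz (roughly 21 ppm) and realign it by warping the time axis.

```python
import numpy as np

fs = 48000
skew = 1.0 / fs  # remote clock runs ~1 sample/second fast (~21 ppm)

t = np.arange(fs) / fs                      # one second of true time
reference = np.sin(2 * np.pi * 440 * t)     # signal on the reference device
# The fast clock squeezes more samples into each physical second, so the
# drifting device's sample n actually lands at physical time t[n] / (1 + skew):
skewed = np.sin(2 * np.pi * 440 * t / (1 + skew))

# Correction: reinterpret each sample at its true physical time and
# resample back onto the reference grid by linear interpolation.
corrected = np.interp(t, t / (1 + skew), skewed)
```

After one second the uncorrected streams are already a full sample apart; for cross-correlation between devices that offset keeps growing, which is why it has to be modeled.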
> - A car is the closest thing to an anechoic chamber you can readily access.
I recall a youtuber solving the anechoic chamber problem by finding a big empty field - nothing to reflect off of except the ground - and maybe putting some foam below the experiment.
It doesn't kill environmental noise, of course, but it apparently did a very good job of killing reflections from his own instruments.
In my case wind noise disturbed the signal too much. Normally there's additional processing which deals with it, but I was working with (next to) raw data.
Surely a carpeted closet full of clothes is better than a car
I didn't have such a place at the time, but I found one and results weren't as good as in a car.
Sound deadening generally requires mass to work for lower frequencies and the seats absorbed them all nicely. I got some reflections from - I assume - the windows, but they were manageable in comparison. Didn't even produce much of a standing wave when I tried.
> The Fourier transform of a Gaussian is a Gaussian, which is very helpful when you need to estimate the frequency of a harmonic signal (like speech) with a wavelength shorter than half your window, but just barely.
I get the Gaussian link, but can you explain your point in more detail?
The log of a Gaussian is a parabola, which makes finding where exactly a peak in the spectrum lies a question of solving a quadratic equation.
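For the curious, the trick looks like this in practice (my own sketch): with a Gaussian window, take the log magnitude of the three bins around the spectral peak and solve for the vertex of the parabola through them.

```python
import numpy as np

fs, n = 48000, 4096
f_true = 997.3  # deliberately between FFT bins (bin width = fs/n, ~11.7 Hz)
t = np.arange(n) / fs
# Gaussian window: its Fourier transform is Gaussian, so log|X| near the
# peak is a parabola and three bins determine it.
sigma = n / 8
window = np.exp(-0.5 * ((np.arange(n) - n / 2) / sigma) ** 2)
mag = np.abs(np.fft.rfft(window * np.sin(2 * np.pi * f_true * t)))

k = int(np.argmax(mag))
a, b, c = np.log(mag[k - 1 : k + 2])
delta = 0.5 * (a - c) / (a - 2 * b + c)  # vertex of the fitted parabola
f_est = (k + delta) * fs / n             # refined frequency estimate
```

The raw bin center is off by up to half a bin (several Hz here); the quadratic refinement recovers the frequency to a small fraction of a bin.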
My plan was to detect the frequency of the speaker by counting (with weights) which distance between peaks is the most common. I wanted to avoid calculating the power cepstrum as I felt that I was running out of computing power on the devices already[0] - a mistaken belief in the long run, but I was too proud of my little algorithm and how stable it was to let go.
[0] Sending raw sample data to a more powerful machine was out of the question as I wanted to remain within the bandwidth offered by Bluetooth at the time due to power consumption considerations.
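A rough sketch of that peak-spacing idea (my reconstruction from the description above, not the actual thesis algorithm): harmonics sit f0 apart in the spectrum, so the magnitude-weighted most common spacing between spectral peaks estimates f0.

```python
import numpy as np

def pitch_from_peak_spacing(signal, fs):
    # Harmonics of a voiced sound are spaced f0 apart in the spectrum, so
    # the most common (magnitude-weighted) distance between spectral peaks
    # is an estimate of f0 -- without computing a cepstrum.
    n = len(signal)
    mag = np.abs(np.fft.rfft(signal * np.hanning(n)))
    floor = 0.05 * mag.max()
    # Local maxima above a noise floor.
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] > mag[k - 1] and mag[k] > mag[k + 1] and mag[k] > floor]
    votes = {}
    for i in range(len(peaks)):
        for j in range(i + 1, len(peaks)):
            d = peaks[j] - peaks[i]          # spacing in bins
            votes[d] = votes.get(d, 0.0) + mag[peaks[i]] * mag[peaks[j]]
    best = max(votes, key=votes.get)         # weighted most common spacing
    return best * fs / n
```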
Wow, this really puts into perspective how much work has to be put into even the most insignificant details of getting Linux to run on (Apple Silicon) Macs. I say "insignificant" with all due respect because, well, the built-in microphone sees very little use (except if you have forgotten your headset).
Or, to quote the progress report (https://asahilinux.org/2025/03/progress-report-6-14/#is-this...): "This is Apple though. Nothing is ever simple."
The built-in microphone is actually excellent, I often use it even when I have my AirPods Pro in because the sound quality is so much better
If you've got headphones with a wraparound microphone on its own arm then it could be better, but everyday headphones are limited by the position of the microphone
Yeah, no matter how good the microphone actually is on a headset, it uses an ancient codec, so until we get Bluetooth 5.3 everywhere with the LC3 codec we won't actually have good mic input from headphones and headsets. I predict that this is all going to change this year and next year. But the full stack has to support it, from headphones to Bluetooth chips to OS.
Headsets can and do use other codecs already. This is especially true for Enterprise headsets with dongles - these still use Bluetooth but by controlling both sides they can pick codecs.
LE Audio is great though - and is already, as "the full stack" has had support for quite a while... Assuming you don't happen to get your equipment from a certain fruit supplier that is notoriously slow at implementing open standards, almost as if they want to not give you a choice outside buying their own proprietary solutions...
I cannot wait to not take an audio quality hit while the mic is on with the AirPods.
Especially since OSX is terrible at input/output preferences.
If you alt(opt)+click the sound icon in the menu bar you can easily select your inputs and outputs. I really just want airpods with a mic and no audio quality hit so I can use it in simracing so I don't have to have an external mic arm.
It switches back on a whim for the most arbitrary things, though. In Windows the same can happen but I can at least temporarily disable an input if it is doing that.
Doing some things, like disabling an input/output device, an internal keyboard, or a webcam, is almost impossible. Even if there are some ways, they change so often. Let's say you have two cameras and an application that always picks the internal one. I couldn't find a way to disable the internal camera so that this app would pick the only available one.
Ah yeah you're right. Does the "Audio MIDI Setup" Mac utility app help you here at all?
It gets close, but no way to truly pin it still. It effectively does the same thing that System Settings > Sound > Output & Input does but with a better UI making it clearer that you are making a change to the primary. But the change is still just as unpinned as it would be from the other location.
It's so strange (and frustrating) to me that "Bluetooth audio" means "you pass the Bluetooth hardware PCM samples, and it encodes them itself in hardware; or the Bluetooth driver decodes packets in hardware to PCM samples, and then passes them to userspace."
It reminds me of the telephone network, where even though the whole thing is just another packet-switched network these days, the abstraction exposed to the handset is an analogue baseband audio signal.
---
Why can't we get another type of "Bluetooth audio", that works like VoIP does between handsets and their PBXes — where the two devices will:
1. do a little handshake to negotiate a set of hardware-accelerated audio codecs the devices (not the Bluetooth transceivers!) both support, in descending order of quality, constrained by link throughput + noise; and then
2. open a (lossy, in-order) realtime "dumb pipe" data carrier channel, into which both sides shove frames pre-encoded by their separate audio codec chip?
Is this just AVDTP? No — AVDTP does do a capabilities negotiation, sure, but it's a capabilities negotiation about the audio codecs the Bluetooth transceiver chip itself has been extended with support for — support where, as above, userland and even the OS kernel both just see a dumb PCM-sample pipe.
What I'm talking about here is taking audio-codec handling out of the Bluetooth transceiver's hands — instead just telling the transceiver "we're doing lossy realtime data signalling now" and then spraying whatever packets you the device want to spray, encoded through whatever audio-codec DSP you want to use. No need to run through a Bluetooth SIG standardization process for each new codec.
(Heck, presuming a PC/smartphone on the send side, and a sufficiently-powerful smart speaker/TV/sound bar on the receive side, both sides could actually support new codecs the moment they're released, via software updates, with no hardware-acceleration required, doing the codec part entirely on CPU.)
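Step 1 of that hypothetical handshake might look like this (pure sketch; the codec names and bitrates are invented, and no such Bluetooth profile exists today):

```python
# Codec names and bitrates below are invented for illustration.
CODEC_QUALITY = ["lossless", "opus-256", "aac-160", "sbc"]  # best first
BITRATE_KBPS = {"lossless": 900, "opus-256": 256, "aac-160": 160, "sbc": 128}

def negotiate(sender_codecs, receiver_codecs, link_kbps):
    # Both *devices* (not the transceivers) advertise what their own audio
    # DSPs can encode/decode; pick the best mutual codec the link can carry.
    for codec in CODEC_QUALITY:
        if (codec in sender_codecs and codec in receiver_codecs
                and BITRATE_KBPS[codec] <= link_kbps):
            return codec
    return None
```

A noisy link would then rule out a codec even when both sides support it: negotiate({"lossless", "sbc"}, {"lossless", "sbc"}, 300) falls back to "sbc".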
---
Or, if we're talking pie-in-the-sky ideas, how about a completely different type of "Bluetooth audio", not for bidirectional audio streaming at all? One that works less like VoIP, and more like streaming VOD video (e.g. YouTube) does?
Imagine a protocol where the audio source says "hey, I have this 40MB audio file, it's natively in container format X and encoding Y, can you buffer and decode that yourself?" — and then, if the receiver says "yeah, sure", the source just blasts that audio file out over a reliable stream data carrier channel; the receiver buffers it; and then the receiver does an internal streaming decode from its own local buffer from that point forward — with no audio channel open, only a control channel.
Given the "race to sleep" argument, I presume that for the average use-case of "headphones streaming pre-buffered M4As from your phone", this third approach would actually be a lot less battery-draining than pinging the receiver with new frames of audio every few-hundred milliseconds. You'd get a few seconds of intensive streaming, but then the transceivers on both ends could just go to sleep until the next song is about to play.
Of course, back when the Bluetooth Audio spec was written, something the size of AirPods couldn't have had room to support a 40MB DRAM buffer + external hardware parse-and-decode of M4A/ALAC/etc. But they certainly could today!
While we're at it, it'd be great if we could avoid remuxing e.g. facetime audio, which is AAC, and notification sounds, into a single stream before sending it to bluetooth. Would be nice to avoid the latency and just shove the raw AAC from facetime into the headset, and when a notification ping arrives, send that as a separate audio stream with maybe a different codec
Yeah, the ultimate Bluetooth audio protocol would probably be a meta-protocol, combining the two ideas I mentioned with a MIDI-like timecoded sequencing protocol. You'd pre-buffer one or more notification sound effects onto the receiver, registering them with audio-session-specific IDs; begin streaming the live audio (the stream gets an ID); and then use the control sequencing stream to say "mix in a copy of registered-stream N [the ping sfx] at time T." (And the sender could then cut the sfx off early with another such command, if it wanted.)
We basically did that in an old project. We needed to transfer audio from our device to a phone, but our device was BLE only, and LE Audio was not mature enough.
So we defined a custom BLE service and blasted the audio file through it.
I think the bigger issue might be the microphone placement. Humans tend to prefer the sound of closer microphones over ones that are further away (this is one reason headsets w/ a boom arm usually sound better than a built-in microphone.) Having the microphone behind you / to the side (as in the case of an AirPod) is not great either. Of course, audio processing can fix a lot of this.
Are AirPods limited to the Bluetooth spec though? I think they extend it.
I don't know the details, but AirPods Pro sound noticeably terrible and Bluetooth-y. It's almost shocking.
They extend it in some ways, but I'm not sure if they do in this way. They do sound kind of terrible, but I always assumed it was due to the microphones being way back by your ears. I'm not sure though
Everyday headphones are limited by the fact that people often use Bluetooth, and Bluetooth audio is just terrible tech that hasn't improved by much in the last 10 years, and still can't do more than 16 kHz when doing both input and output at the same time.
I think this isn't a problem if you're using Apple headphones with Apple devices, but anything else falls back to crappy BT quality, usually with some kind of terrible ANC to boot.
For me, crappy audio setups and apps trying to do too much audio processing are the primary cause of "Zoom fatigue". I've done a lot of calls over apps that transmit raw, high-quality audio with no processing whatsoever, and the experience is just so much better.
Apple-Apple Bluetooth speech codec is a variation of AAC, I believe. AAC-LD if I remember correctly. But still, having microphones in one's ears is suboptimal. There's a lot of processing required even though the codec is no longer completely awful.
On an unrelated note, I tried doing calls with a stereo mic setup but participants were actually uncomfortable with the ASMR-like effect of the audio.
Plenty of good headsets do beamforming with their microphones as well, just depends on what you're running. Macbook mics are well above average, though, so I agree in most cases they'll be better unless you're picky about your headset mic quality.
This is also a great counterpoint for the folks who constantly complain about Apple hardware being "overpriced". Most laptop mfgs are happy to just solder on whatever tiny $0.50 compatible MEMS mic, put a little toothpick-sized hole in the case, and call it good enough; or add two and rely on whatever generic beam forming the Realtek ALC262 or whatever gives them -- not adapted to their specific mic choice, placement, or case acoustics -- and call it a day.
Apple puts a ton of R&D into making things work well. As another example: Macbooks have been, for 15+ years now, the only laptops that I can trust to actually sleep and conserve battery when I close the lid and slip into a backpack for a few-hr flight. Windows and Linux on laptops seem to have about a 70% chance of either not sleeping, not waking up right (esp with hybrid graphics), or trying to do forced Windows updates and killing the battery, then waking back up to 20+ minutes of waiting for updates to resume / finish with no meaningful progress indicator or way to cancel / delay.
Not everything they do is perfect, and I'm not some huge Apple fanboy, but they do offer a significantly better experience IMO and feel "worth" the premium. It's not as if modern gaming laptops are any cheaper than MBPs, but they certainly feel much jankier, with software and UX to match. As an example, the IEC plug on the power supply of my Asus Zephyrus Duo wiggles enough that it disconnects even with different IEC cables. I've had to wrap some electrical tape around the plug body to get it to be less flaky. Asus Armoury Crate is a terrible, buggy, bloated piece of software that runs about a dozen background processes to deliver a "gamer" UI to...control fans, RGB lights, and usually fail to provide updates. They also have utilities like https://www.asus.com/us/content/screenxpert3/ and "ROG ScreenPad Optimizer" that are largely buggy garbage, but sometimes required to get their proprietary hardware to work properly.
Does Apple gouge users for extra RAM and SSD space? Absolutely, but you're paying for the R&D as much as the actual hardware. I wish they'd just price that into the base models and make upgrades cheaper, but their pricing strategy seems to be lowering the base entry point to something more appealing with "it barely works" levels of spec, while making increasingly ridiculous margins on higher specs -- an additional $4,600 to go from 1TB -> 16TB on the Mac Studio is pretty bold considering consumer QTY=1 pricing on a fast M.2 SSD is around $600 for 8TB, and I'm sure their BOM costs are around the same for 16TB worth of silicon in huge quantities.
> Macbooks have been, for 15+ years now, the only laptops that I can trust to actually sleep and conserve battery when I close the lid and slip into a backpack for a few-hr flight.
Even the cheapest of Chromebooks sleep and resume reliably. I suspect the reason is not purely R&D, but limiting the number of supported devices/chipsets and testing the supported configurations thoroughly. Chromebook OEMs can only manufacture specific hardware combinations blessed by Google, and in exchange Google updates the drivers during the support period.
> the only laptops that I can trust to actually sleep and conserve battery when I close the lid
+1 on this one... I can close my lid (from on) and set my M1 air aside for a few weeks and still have plenty of battery left. I don't use it much when not traveling, it's mostly my desktop, work laptop or phone.
Also +1 on the hardware feel... it's got an above average stiffness, keyboard feel (for what little that's worth) and the best touchpad experience hands down. The screen is also on the higher end (I've seen slightly better in some really expensive laptops). All around, it's a pretty great value on the mid-high range. What I don't like is the aging UI/UX, the variance from other platforms (I use Linux and Windows pretty regularly) and some things that I just find harder on the platform in general.
I don't think I'd ever buy a maxed out Apple product all the same, and I don't use an iPhone or anything else but my laptop. That sometimes makes the ecosystem integrations slightly annoying. That said, my current laptop is still running well, and my prior laptop from over a decade ago is still running fine for my daughter's needs... though she may get my M1 if/when I move to a Framework 13 (Strix Halo).
Keep in mind you can't just upgrade a Mac Studio to 16 TB for $4,800. You can go to 8 TB for $2,400, but to move up to 16 TB you also need to upgrade to the Ultra chip for an additional $1,000, which also necessitates moving up to 96 GB of RAM. So when all is said and done, you're looking at an additional cost of $6,599.
As a photographer, this is a bit maddening.
For what it's worth, you do get a 10Gb NIC option and can just connect to a NAS with lots of fast storage and NVMe caching drives.
Yeah, for the Mac Studio, which is likely to stay in one place, this probably works well. In actuality, I use a Macbook Pro, which has the same pricing issue.
In my experience, the fastest option for this is NFS without encryption, which is only really viable on a local network as it's hecking insecure (sure, wrap it in Wireguard, but now you're slowing it down again) and over Wifi at least, it's definitely slower than using an NVMe drive plugged into the Macbook, at least for 40 MP files coming out of my Fuji.
The external NVMe drive w/ Thunderbolt works... OK. But it's annoying (both physically and in terms of sleep/wake causing dismount warnings, etc.)
> the only laptops that I can trust to actually sleep
They don't actually sleep. Apple remarketed the concept of never sleeping as "Power Nap".
You can choose to have it actively updating the system or not, but it never actually sleeps; it just goes into a ridiculously low power mode. You'll get the same on Surface Pro laptops or Chromebooks, for instance.
Actual sleep only happens when the battery is about to die.
https://support.apple.com/guide/mac-help/turn-power-nap-on-o...
You're confusing sleep with hibernation.
Power Nap is just a fancy name for scheduled wakeups; it was supposed to be more, but my understanding is that this never really materialized.
I'd want hibernation but it's not offered in most laptops any more, to my knowledge.
I might be confused about the name they chose for marketing never sleeping (never doing a full suspend to RAM with CPU shutdown except in special circumstances), as it was announced with Power Nap as the front-facing feature.
Hibernation is still 100% an option on Windows. You can even set it to hibernate when you close a laptop's lid
It's part of the OS but the option can be removed by the OEM. I still haven't found a way to get it on an ASUS laptop, same for Surface Pro.
From MS's doc:
> This option was designed for laptops and might not be available for all PCs. (For example, PCs with InstantGo don't have the hibernate option.)
https://support.microsoft.com/en-us/windows/shut-down-sleep-...
Counterpoint: as a gamer I don't want to waste even a penny on a built-in microphone on my laptop. Maybe nice to have as a last resort, but even then I could just use Discord on my phone.
I just want a headset aux port and I'm GTG. I want my money put into the GPU/CPU/display/keyboard.
Now my macbook pro for work? Yeah; high expectations there for AV quality in terms of joining meetings etc.
I hope you do not take notes or brush dust off the macbook whilst in a video call.
Why, are they not able to reject these types of noise? My X1 doesn't even register typing in a video call
Software noise cancellation is actually kind of amazing. During the pandemic when I was doing 8 hours of video calls a day, I paid for Krisp and it eliminated any background noise pretty much perfectly. One time a very loud fire truck was slowly driving by. It was so loud I couldn't even hear myself think and just stopped talking. People were confused because that noise was eliminated but I was just talking very weirdly ;)
In the interim, they raised the price and added a ton of bloat so I don't use it anymore. (The bloat killed it, not the price. And the popup that's like "you're so stupid that you can't even figure out how to enable Krisp Speaker, you idiot". I'm well aware of how to enable it, but I have chosen not to, as I do not want to heavily process the audio that I'm listening to. Only emitting. "Don't ask again" would have probably made them an extra $110 at least.)
Pretty much the same boat. Early 2020 I was spending so much time on calls headphones were just tiring. So it started as "get a microphone + speaker setup that doesn't echo" and just kind of spiraled into a half decade of incremental improvements.
Don't know that I've had anything as loud as a fire truck, but more than a few times I've had a 75lb dog a few feet away from me barking like mad, whining at me, playing by throwing a cow femur up in the air and letting it crash down on the vinyl floor, etc and apologized about the noise only to have people look at me funny and tell me they didn't hear anything but that explains why I seemed like I was having trouble speaking.
I think the only time I had anyone say anything about anything was when I accidentally had an air conditioner blowing directly on my microphone. They couldn't hear it, but my voice was coming through a little less crisp than usual as the noise cancellation was trying to remove the constant, high volume white noise.
Don't know what OS you're on, but on Linux I can definitely recommend Easy Effects (https://flathub.org/apps/com.github.wwmm.easyeffects). Been using RNNoise + Speex along with some other filtering for quite a while now to great effect.
One thing I found worked _really_ well if you're already using an external microphone of some sort--using the webcam microphone as part of a noise gate. On top of the filtering and existing gating, my audio only opens if my webcam _also_ picks up sound of sufficient volume. Lets me keep the microphone in front of my face fairly sensitive while still entirely eliminating echo and most off-axis sounds.
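For anyone wanting to replicate that webcam-as-key idea, the gating logic itself is simple (a sketch of the setup described above, not any particular plugin's algorithm):

```python
import numpy as np

def dual_mic_gate(main, webcam, fs, frame_ms=20, thresh=0.01):
    # A frame of the main mic passes only if BOTH mics see RMS energy above
    # the threshold, so off-axis sounds reaching only one mic stay muted.
    frame = int(fs * frame_ms / 1000)
    out = np.zeros_like(main)
    for start in range(0, len(main) - frame + 1, frame):
        seg = slice(start, start + frame)
        main_rms = np.sqrt(np.mean(main[seg] ** 2))
        key_rms = np.sqrt(np.mean(webcam[seg] ** 2))
        if main_rms > thresh and key_rms > thresh:
            out[seg] = main[seg]
    return out
```

In a real setup you'd want per-mic thresholds (the webcam mic is much less sensitive) and attack/release smoothing so the gate doesn't click, but the AND of two energy detectors is the core of it.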
I constantly hear a harsh swoosh when people wipe stuff or drag their palms across their macbooks.
Wow not my experience at all.
The MBP mic is generally preferable to most headset boom mics in my experience with good noise reduction. You also get the benefit of not picking up extraneous mouth noises (gum chewing, coffee slurping, whatever)
I feel like 99% of people I conference with use regular headphones + MBP mic
Main problem with that setup is not being able to hear your own voice in the headphones (feedback, or whatever that's called) which can be annoying sometimes if using NC headphones
> feedback, or whatever that's called
Monitoring.
There are umpteen ways to do that, and I find headsets themselves do it the most poorly of all (if they have the feature at all).
> The MBP mic is generally preferable to most headset boom mics
Another benefit is not paying the '90s GSM handsfree BT profile codec pain (at the cost of A2DP having slightly higher latency)
> Monitoring
It's called sidetone. Headsets do it so your ears don't feel clogged and to avoid subconscious yelling.
Some headsets let you adjust it either through a regular Sidetone volume control or some dedicated app. Soundcards also often have this feature in the form of a Mic output volume control, done in hardware to reduce latency.
A significant difference in headset quality is in sidetone latency. The heavier the DSP processing required to get a reasonable mic output, the harder it is to hit latency targets. Headset SoCs have dedicated hardware for this; a user-space solution like the one Apple pulls off on their laptops would not be able to meet a usable latency target.
> Another benefit is not paying the '90s GSM handsfree BT profile codec pain
LE Audio includes the LC3 codec, solving this once and for all.
In the meantime while this rolls out, various alternate codecs exist that are fairly widely supported. This is especially true when using fancier headsets with a dedicated bluetooth dongle as they have more flexibility when it comes to codecs and compatibility.
Actually my complaint relates to open-office designs: the MacBook mic picks up louder people from across the room. So if I use headphones and the MBP mic, other people will hear random noise blurbs from anywhere in the office.
If you click the orange microphone icon in the menu bar while it’s in use it lets you switch to a mode that only captures your voice
I don't think I recall having a meeting with anyone using plain headphones with the laptop mic instead of a headset of some kind. Wired headphones without a mic are somewhat unusual nowadays to begin with outside audiophile circles.
AirPods of various versions are common, as are many other buds. Enterprise headsets like those from EPOS (the old Sennheiser Communications) and Jabra (with or without boom) and speakerphones are common in corporate settings; casual headsets (e.g., Sony, Bose) and wired gaming headsets are common at home.
Well it is simple if you use the whole package as delivered (although Apple has been straying off the road it paved for quite a while now).
The point is, everything they make is vertically integrated. They want to deliver a feature (like Airdrop or Continuity), they will cut across the stack to get it done. If you go the DIY route (which is what Asahi is effectively all about), you get to also DIY the missing software pieces.
The upside is that the entire ecosystem gets to benefit from that work (see e.g. the new DSP in PipeWire). PC hardware is generally crap, and so is Apple's if you omit these extra bits. But "the whole package" sets the bar quite a bit higher. I want to see the FOSS ecosystem meet that bar.
The three-mic array is also found in Intel-based Retina MacBooks, so this might also be useful for proper audio support on that older hardware. (Some early Retina MacBook Pros have a two-mic array only, but most have the full three-mic array.)
I always set my microphone to the MacBook's even when wearing headphones, because the quality is incredibly good even in noisy environments. In Zoom I also turn on "Original sound for musicians" if I'm in a quiet location. So much more natural sound.
Because most mics are still using Bluetooth 5.0 I use the microphone on my Mac even when I'm wearing a headset. Otherwise, it puts me into a weird codec mode of ancient history where I get downgraded to a low bit rate and even my audio input to my ears sounds horrible then. So I always use the Mac microphone when possible.
It's more annoying on Linux where you have to manually switch... at least most apps in windows/mac will automagically put my headset in the correct mode.
I always prefer headset too, but I did find it striking how good the audio quality of the built in mic was compared to headset when I tried it once..
I exclusively use the built-in microphone for work meetings. I don't even have any other work-issued microphone unless we count my phone.
You can get surprisingly good results from cheap laptop hardware (as well as fancier hardware like an MBP) using software DSP techniques. One of the things I'm pleased about is that quite a bit of Asahi's audio work is just as applicable to generic laptops as it is to Macs.
I already use the Bankstown bass harmonics synthesis plugin developed for Asahi and a convolution EQ on a cheap HP laptop, with startlingly impressive results, using the Pipewire plugin chain autoload feature also developed for Asahi.
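For anyone curious what that autoload glue looks like, here is a hypothetical, trimmed filter-chain fragment in PipeWire's config syntax. The plugin URI and node names are placeholders for illustration, not the real Asahi config:

```
context.modules = [
    {   name = libpipewire-module-filter-chain
        args = {
            node.description = "Speakers (bass enhanced)"
            filter.graph = {
                nodes = [
                    # placeholder URI; use the one from the Bankstown docs
                    { type = lv2 name = bass plugin = "urn:example:bankstown" }
                ]
            }
            capture.props  = { node.name = "effect_input.bass" media.class = Audio/Sink }
            playback.props = { node.name = "effect_output.bass" }
        }
    }
]
```

The nice part is that the chain is just config: swap in a convolver or EQ node and the same mechanism works on any laptop.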
I suspect there are quite a few use cases for this beamformer outside of the Asahi ecosystem as well.
Regarding the SIMD optimizations, the authors may want to look into faer. I haven't had a great experience with its underlying library pulp, as I'm trying to do things that go beyond its linear algebra roots, but if the goal is primarily to accelerate linear algebra operations, I think it will go well.
I've got a blog post and associated podcast on Rust SIMD in the pipeline, we'll touch on this.
Github repo https://github.com/chadmed/triforce
> the microphone array found in the following Apple Silicon laptops:
> MacBook Pro 13" (M1/M2)
> MacBook Air 13" (M1/M2)
> MacBook Pro 14" (M1 Pro/Max, M2 Pro/Max)
> MacBook Pro 16" (M1 Pro/Max, M2 Pro/Max)
> MacBook Air 15" (M2)
Does it mean M2/M3 don't have a similar array of microphones, or rather that they're just not tested?
I'm even curious if this is only supported on Linux or MacOS as well - not sure if apple provides dedicated microphone stream for each mic?
It's made just for Asahi Linux. MacOS does some very similar beamforming math behind the scenes, so it just presents you with a single unified mic.
They list M2 devices. M3 is just not supported by Asahi Linux, so not being listed is orthogonal to whether M3 has mics like this.
MacOS has its own software deep within the system for handling this; it's only exposed as a normal microphone to application software.
There is a more general discussion on the latest Asahi Linux progress report.
> Unfortunately, PDM mics are very omnidirectional and very sensitive. We cannot get by without some kind of beamforming.
https://asahilinux.org/2025/03/progress-report-6-14/
Also, it turned out that some previous work done for the speaker output was reused here for mic input.
> Thanks to the groundwork laid in PipeWire and WirePlumber for speaker support, wiring up a DSP chain including Triforce for the microphones was really simple. We just had to update the config files, and let WirePlumber figure out the rest!
Much like with the speakers, Apple are trying way too hard to be fancy here
Could the author of this package comment on this statement? I'd be really interested in their opinion of the speaker implementation. What's overly complicated there? The hardware? The software?
As a MBP user and hobbyist audio guy I've been really impressed with the implementation of those speakers, particularly on the larger MBP models.
But I'm just a hobbyist and don't have any knowledge of them other than the driver arrangement (tweeter + dual opposed woofers). It certainly seems like they're pulling the same tricks used by "good" bluetooth speaker designers in order to wring acceptable perf and bass extension from teeny tiny speakers (adaptive EQ etc)
Getting reasonable speaker support in Asahi Linux was a big deal. Part of the problem is that limiting the power usage to prevent overheating requires sophisticated DSP. Without that, you get very limited volume output within safe limits.
Probably the best overview to find out more is here: https://github.com/AsahiLinux/asahi-audio
wow I'm surprised overheating is the bottleneck, I would've assumed clipping would damage the drivers before that
Yup. A little more detail on the overheating part in particular is here: https://github.com/AsahiLinux/speakersafetyd
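The gist of that safety model, as a toy sketch (all constants are made up for illustration; the real daemon uses per-speaker calibration data and a proper physical model):

```python
# Toy version of the idea behind speaker safety limiting: estimate voice-coil
# temperature from recent output power and duck the gain before it overheats.
AMBIENT_C = 25.0
MAX_COIL_C = 80.0
HEATING = 0.5     # degrees per unit of power per step (made up)
COOLING = 0.05    # fraction of excess heat shed to ambient per step (made up)

def simulate(powers, temp=AMBIENT_C):
    """Track coil temperature over per-step output powers.

    Returns (final_temp, allowed_gains), where each gain in [0, 1] is how
    much volume the limiter would permit at that step.
    """
    gains = []
    for p in powers:
        # First-order thermal model: heat in from power, heat out to ambient.
        temp += HEATING * p - COOLING * (temp - AMBIENT_C)
        # Duck the gain as the estimate approaches the thermal limit.
        headroom = max(0.0, (MAX_COIL_C - temp) / (MAX_COIL_C - AMBIENT_C))
        gains.append(min(1.0, headroom))
    return temp, gains
```

Without something like this running, the only safe option is a conservative fixed volume cap, which is exactly the "very limited volume output" problem.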
Fascinating, thanks for sharing! Very surprised they do this at the software level.
> Much like with the speakers, Apple are trying way too hard to be fancy here
It is just a reference to the fact that Apple laptop speakers have been waaay above anything the competition uses, and this has been true for multiple generations. I had a MBP from 2014 and multiple friends were astonished at the sound when we watched a movie on the go. Same with the M4 MBP: sound quality from the speakers is at a level you probably don't actually need.
I feel like this must be some kind of a language barrier thing - the dev’s name appears to be Spanish, so English may not be their native language. And I think that most native English speakers - as demonstrated by multiple comments asking about it in this thread - would interpret “trying too hard to be fancy” as implying “because you can get similar high-quality results without using such sophisticated techniques”; but it seems like you’re saying (and this makes sense) they meant “because getting such high-quality results is overkill for a consumer laptop”.
Language is fascinating - I can convince myself with enough effort that the latter is just as valid as the former, given the literal meaning of the words, but my linguistic intuition is screaming at me that it’s wrong. How does someone ever learn that? How would a textbook ever explain it?
Agree with you, I was confused why everybody else interpreted it in a different way. I'm not Spanish but German, and not a native speaker, so the language barrier thing might be a good explanation.
> It is just a reference that Apple Laptop speakers have been waaay above anything the competition uses
More like the opposite. The MacBook speakers are absolutely rubbish, just like all laptop speakers (there's only so much you can do when constrained to a laptop body). The reason why MacBooks sound good is entirely god-tier signal processing which manages to extract extraordinary performance out of some decidedly very ordinary speakers.
https://github.com/AsahiLinux/asahi-audio#why-this-is-necess...
Not sure what you are saying (or just ranting?). MBP speakers are the opposite, as in the rest of non-Apple laptops have way better-sounding speakers? That is definitely not my experience at all.
If they are all rubbish together, well, they are laptop speakers, and you have to treat them as such. Still, there is nothing preventing one set of laptop speakers from being objectively better than another.
They're saying that the physical speakers inside the MacBooks body are not what makes them sound good (and that the physical speakers are on par with other manufacturers) — it's the fancy, custom post-processing that does.
Quote from their own link: "In the case of Apple Silicon machines, Apple has taken things one step further by including actually good speakers on most modern Macs"
In my experience MBP 2015 sound is pretty thin and high frequencies are prone to clipping at even a moderate volume – soprano vocal parts suffer from this quite a bit. Of course for most uses that’s not a big problem and I’m sure the sound is still much better than that of many other laptops though. But the M series MBP speakers are a crazy improvement.
My guess (without value judgement) is he was referring to the fact that they don't really work without such software
How's hardware supposed to work without software?
Here's a similar situation with the macbook pro's speakers, from the Asahi Linux team (scroll down to "Audio Advances"): https://asahilinux.org/2022/11/november-2022-report/
Similarly they can't be used very effectively without special, complex software that involves physical simulation of the speaker hardware. Doing things this way allows them to reach an amazing level of compactness + volume, but at the cost of complexity
If Apple intended to support platform openness, they'd likely have made such software available to hackers. But they never enthusiastically encouraged that, so people like the Asahi team are left to reverse-engineer and reinvent everything they need that lives in software
With a hardware DSP? It's gonna have software in it, but doing this kind of processing in the upper most top level OS stack is certainly a choice.
It seems like a good choice. It’s computationally extremely light and you can update it much more easily with new features (they actually did this once - to let you change the beamforming mode in the menu bar)
It is also notoriously time-sensitive, however. While the hardware can likely ensure synchronization between the mics, processing in the OS itself necessarily means buffering for a significant period so you don't risk draining the pipe in a non-realtime system.
Seems like a common pattern lately that Apple's hardware people continue to be top notch while the software group is slacking.
That's not at all the takeaway. macOS has the requisite software built-in; the hardware is designed in such a way that it requires software assistance to function, which is a choice that has advantages and disadvantages. The OP exists for situations where you aren't running Apple's own beamforming software on this hardware (to my understanding)
I don't think that's really fair here? The comment suggests the hardware doesn't work well without relatively complex software to support it, which seems to be the case on macos. That suggests the software group are keeping up their end at least.
I have a feeling that this package is for folks that want to run Linux distros on the laptops, and have access to the same capabilities as native MacOS.
I'm confused too. These days, "spatial audio" on speakers (different from on headphones) and beamforming mics is starting to feel standard, at least on premium hardware.
Dumb, noisy, cramped, unbalanced audio just doesn't cut it anymore.
if you think fake 5.1ch sounds better (not just better for enjoying action movies), you've never had exposure to a >$99 pair of bookshelf speakers with a non-USB-powered class D amp. change my mind.
Huh? Who's talking about bookshelf speakers?
This is about laptop speakers that just pass audio directly through, vs. laptop speakers that process the audio including spatially. Yes, it sounds dramatically better. And it's not just about "fake 5.1" but even just mono or stereo.
External speakers are a totally different conversation.
For the software to perform beamforming it must be provided the discrete microphone inputs, as opposed to being provided some sort of pre-mixed feed. As such, why is Apple "trying way too hard to be fancy here" if you can just use one of those mics? Or is the alternative that they do the "beamforming" in hardware regardless of the OS?
> if you can just use one of those mics?
They're extremely omnidirectional and very sensitive. With a single mic with no beamforming you get basically all of the sounds from every part of the room, including and especially horribly loud sounds from (eg.) the keyboard and mouse.
Apple selected their microphones based on the constraints their system had (beam formed array) rather than the "usual" laptop microphone which is physically not very sensitive and highly directional towards the front of the laptop, and in turn, those microphones are not particularly useful without beam forming.
Other laptops with beamformed arrays simply don't expose the raw mics to userland, by doing the beamforming in firmware, but this of course comes with its own set of issues.
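For the curious, the simplest non-adaptive flavor of this, delay-and-sum, fits in a few lines: pick a steering angle, delay each mic by the extra distance sound travels to reach it, and average. This is a stdlib Python sketch of the textbook technique, not Triforce's adaptive algorithm, and the geometry (a linear array with known spacing) is an assumption:

```python
import math

C = 343.0        # speed of sound in air, m/s
RATE = 48_000    # sample rate, Hz

def delay_and_sum(channels, mic_x, angle_deg):
    """Steer a linear mic array toward angle_deg (0 = straight ahead).

    channels: one list of samples per microphone
    mic_x:    microphone positions along the array axis, in metres
    """
    sin_t = math.sin(math.radians(angle_deg))
    # Extra samples of travel time to each mic for a source at that angle.
    delays = [round(x * sin_t / C * RATE) for x in mic_x]
    base = min(delays)              # shift so every delay is non-negative
    delays = [d - base for d in delays]

    n = len(channels[0])
    out = []
    for i in range(n):
        acc = 0.0
        for ch, d in zip(channels, delays):
            j = i - d
            if 0 <= j < n:
                acc += ch[j]
        out.append(acc / len(channels))
    return out
```

Sound from the steered direction lines up and sums coherently; sound from elsewhere arrives misaligned and partially cancels. With real arrays you'd want fractional (interpolated) delays, since laptop mic spacings work out to only a few samples at 48 kHz.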
> Other laptops with beamformed arrays simply don't expose the raw mics to userland
Not always true, back in the Windows XP days (!!!) some laptops would expose the array to software and let the user configure where the mics record from.
It is unfortunate that user control has been polished out of modern systems in exchange for "it just kind of works".
Avoiding an extra coprocessor and/or avoiding a patent dispute, like they did with the speakers (which differ from a Harman Kardon patent by not having a discrete chip implementing it)
> This is an attempt at a beamformer armed only with first year undergrad level engineering maths and some vague idea of the principles gleaned from various webpages and PDFs
Not certain if OP is saying they are currently an undergrad, but impressive if so
It would be great if this were implemented in a way that other manufacturers could easily adopt, so that building mic arrays would immediately make them useful.
I would be surprised if Apple didn't have patents on their mic array, meaning that another manufacturer would ideally prefer if their setup is different and incompatible to reduce the chance of accidental patent infringement.
I'd search to see, but reading patents is an info-hazard which increases your chance of infringing, so I've quit reading them entirely.
Maybe they're doing something new, but beamforming microphone arrays can be found in just about any brand of laptop if you go high end enough.
I do think most such devices will present themselves as less capable than they actually are (I.E. just a stereo input) for maximum OS compatibility, but the technique isn't Apple exclusive as far as I know.
> beamforming microphone arrays can be found in just about any brand of laptop if you go high end enough.
Are you sure? I’ve never heard a laptop microphone better than the MacBook. Maybe they do beamform and there’s other issues, but
Maybe they can still install the array, and we can simply "apt-get install illegal-package".
But all joking aside, there is a tremendous amount of literature on the mathematics of beamforming. I'd be surprised if any of it is patented in a way that isn't circumventable.
There is a customer who has deployed beamforming microphones for decades. They do however have a somewhat different goal and medium.
Yes, I'm sure they have some patents because that's what big companies do/have to do. But the basic idea has been around for a long time, not just in audio but also in microwave space/domain. So I'm sure there's plenty of prior art.
ok noob here - what can i use this thing for? a better desktop-only voice app?
is there a reason apple hasn't exposed a higher level api for this given the hardware (mic array) looks like it's already sufficient in macs?
This is how Apple addresses audio hardware; they do something similar for the speakers. Instead of trying to make speakers that have the desired frequency response or microphones that produce the desired signal, they let the analog hardware do whatever it does.
Then in software they use digital signal processing. For speakers they modify what gets sent to the hardware so that the actual output then does match the frequency response, and for the microphones they do this work to extract the desired signal.
If Linux addressed the speakers as is, you would get unpleasant sound, and if it read the microphones as is, it would get a lot of noise. That is why Asahi had to add digital signal processing to the audio input and output, to get the "correct" audio.
It does mean the processing is specific to the analogue audio hardware in each of the different Mac models.
The processing could be done in additional hardware, but why bother when you have a very good CPU that can do the work.
> For speakers they modify what gets sent to the hardware so that the actual output then does match the frequency response
As I understand, this is not a magic pill: it probably won't help to pull out frequencies which are suppressed by 30-40 dB and I assume that if the frequency response graph is too wavy (lot of narrow peaks and dips), it won't help either.
Also, you need to have calibration files to use this method, right?
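To put numbers on the 30-40 dB point (the figures are illustrative, not from any real calibration):

```python
# What "filling in" a dip of a given depth actually demands of the driver.
def db_to_amplitude(db):
    return 10 ** (db / 20)

def db_to_power(db):
    return 10 ** (db / 10)

# A 40 dB dip needs 100x the amplitude, i.e. 10,000x the power, which is
# far past what a tiny laptop driver can deliver without distortion or damage.
print(db_to_amplitude(40))  # 100.0
print(db_to_power(40))      # 10000.0
```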
Yes you need calibration files for supported models. You can see the details and explanation at the asahi audio repository. They also criticize the MacOS curves, and point out how some Windows vendors are doing the same DSP approach.
By the way, I now realize that simply adding an equalizer before the amp might not be enough; speakers typically produce different sound in different directions, so for perfect sound you'd need to somehow track the location of the head and adjust the filter curves.
Interesting, does that mean Mac speakers may be great for certain sounds, but not others?
I mean, Apple uses high quality speakers to begin with, as far as laptops go. I'm sure they're not making 40 dB corrections, that would be ginormous.
Yes, I would be very surprised if they weren't using specific calibrations for each model. That's pretty basic.
Apple did it as a software function so it's not in hardware, hence this implementation for people wanting to run (presumably) Asahi Linux.
Your question was non-specific, so I'm guessing a bit at what you're asking, because some of it is already answered in the docs... but conceptually it's similar to how GPS triangulation works, only in the other direction (information flows from the source point, the speaker in this case, to the mic array) and with audio waves instead of RF waves. Each mic has a slightly different view of the incoming audio, and using the timing between them, you can use the waveform that one mic records to figure out what's too early or too late to be audio from directly in front of the laptop. And then delete that audio, leaving just the audio from the speaker directly in front of the laptop.
eg
A ------ MIC1 --- B --- MIC2 ------ C
any sound coming from A will be picked up by MIC1 well before MIC2, same for sounds coming from C. If you delete that audio from the incoming waveform, you have beamforming. And thus much better audio noise filtering.
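Putting illustrative numbers on the diagram (the 2 cm spacing is an assumption, not the real MacBook geometry):

```python
C = 343.0        # speed of sound, m/s
RATE = 48_000    # sample rate, Hz
SPACING = 0.02   # assumed 2 cm between MIC1 and MIC2

# A source on-axis at A (or C) reaches the near mic a full SPACING / C
# earlier than the far one; that's the largest possible arrival difference.
max_delay_s = SPACING / C
max_delay_samples = max_delay_s * RATE
print(f"{max_delay_s * 1e6:.1f} us = {max_delay_samples:.2f} samples")
```

That works out to under 3 samples at 48 kHz, which is why real implementations need careful sub-sample timing rather than just shifting whole samples.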
And as it says in the link, Apple decided to implement this is software, not hardware, so you'd need to reimplement it if you're not using macos.
It's a component of Asahi Linux. It's integrated and enabled by default if you have the right laptop.
Interesting: with a microphone array like this, you should be able to emit a signal and record it in a way that measures local geometry, like hand positions.
It’s funny how the author has the chutzpah to simultaneously insult Apple while also failing to replicate what they have done.
You misread. It's more like grudging admiration that Apple took the proprietary software beamforming route. It's a remark to its technical function but acknowledges that because of its closed implementation, the microphones just cannot be used outside of the macOS ecosystem without additional effort (like in this repo).
Which, as I'm sure you agree, is unfortunate and at least deserving of some (minor) reprisal.
I can't speak for this implementation, but on MacOS, the beamforming is amazing. When used in a noisy office or cafe environment it eliminates background noise so well that I can always tell whether a colleague is using it or a worse headphone mic.
I was sitting at a Starbucks next to a VERY noisy street on a Google Meet call on an M1 Air with USB-C AirPods (the cheap $19 ones) and I asked the person on the other end if they could hear me at all. To my surprise they couldn't hear any noise, just my voice. No idea which part of the whole setup achieved this, but I feel like stuff like AI has some applications that can blow you away, as opposed to putting the damn thing in everything!
That could definitely be Google Meet. I think it does some pretty fancy AI background noise reduction.
For sure. The Apple hardware is going to make your voice sound better/richer/clearer to begin with, and then Meet's AI is great at removing background noises entirely.
In comparison, if you're on Meet with a crappy mic, it will still remove background noise, but your voice will still sound crappy. I.e. like a crappy mic in a quiet room.
Due to some unfortunate circumstances I had a customer call on Google Meet once while walking across Paris. I was barely understandable even while holding the earphones' mic in front of my mouth...
Good hardware definitely beats software trying to make something out of nothing: can't make directionality out of 1 mic, so Google Meet couldn't filter out background noise in that situation. Though it didn't help that these USB-C DACs seem to all be terrible (I tried several with the best findable reviews) compared to any old headphone jack where the device's internal DAC just worked
In that situation you’re better off using Meet audio only & holding the phone as if doing a video call - the background noise cancellation modern phones do is very impressive but it only really works if you use the phone as a phone & hold it up to your ear / mouth.
Yeah, maybe I should have forfeited being on video for the sake of audio, didn't think it would be this bad. I do attribute at least half the problem to the removal of headphone jacks though, I don't remember this being that big a deal with the regular DAC in any old phone
Yeah, earbud noise cancellation may as well be non-existent unfortunately.
You’d think a phone could do cancellation between the bud microphone and its own microphones; maybe once the audio data from the bud has been pushed through Bluetooth audio compression there’s no longer enough information to do that effectively?
Correction here: it wasn’t Apple's AirPods, it was Bose QuietComfort over-ears, IIRC. That’s why I could hear the other person. But I think they could hear me because of maybe both Meet and the good mic array.
I'm really enjoying this trend of minimal dependencies, but I'm not taking off my tinfoil hat yet.
This is one of the cooler features of Apple Vision Pro, it does such good beamforming for the wearer's mouth that someone could be screaming next to you or blasting music, and other parties on Zoom or FaceTime will not hear them.
I wonder if there's a way to do this in reverse for people who use the speakerphone or play a video in a restaurant.
Is this akin to a phased array RF antenna (like the Starlink dish) but for audio?
Yes, except the output is something that has to sound "subjectively good" after all the DSP, vs rf beamforming where you have a very easy metric (dropped packets) that you can optimize the beamforming direction with.
> Much like with the speakers, Apple are trying way too hard to be fancy here, and implement an adaptive beamformer in userspace to try and isolate the desired signal from background noise.
Might be fancy, but it does make for surprisingly good audio from a laptop.
Indeed. I can't help but think that anyone thinking Apple is trying too hard to be fancy on something like "audio quality from microphone in a laptop" doesn't quite grasp what Apple's about.
There are many advantages to vertical integration as regards end-user-experience.
Honestly, with speakers it was mainly a patent avoidance thing (patent on essentially the same thing but done with dedicated hardware, doing it with software on "application processor" bypassed the patent claims)
A lot of similar stuff is done in firmware on x86 laptops, to the point that both AMD and Intel now share considerable portion of the stack, with both using Xtensa cores for DSP, with Sound Open Firmware as SDK. When I use built-in microphone array on my laptop, it's parsed through the DSPs "transparently" to end user.
But technically you can load your own firmware there.
Usually you can't load your own SoF firmware, on most hardware it has to be signed by Intel, with exceptions like Chromebooks, where you have to sign it with a "community" key that is publicly available. There was talk of a way for device owners to add keys, but that isn't implemented yet.
"Time Domain Fixed Beamformer (TDFB)" -- https://thesofproject.github.io/latest/algos/tdfb/time_domai... might be relevant here.
If it was just patent avoidance, why aren’t there any non-Apple laptops matching their sound quality? Both the microphones and the speakers are some of the best audio I’ve ever encountered.
Aren't there? I haven't had any trouble with background noise in calls from my ThinkPad, which also does some microphone array trickery as far as I can tell. Unfortunately the drivers for Linux are nowhere near as good so the extra processing the Intel driver does isn't useful for my day to day experience, but I've never had any quality issues.
Apple does have some excellent audio engineers for the speakers, although these days the difference isn't as stark as it was five or ten years ago.
Of course you need to get a good Windows laptop to get any such quality, and many people and companies seem to only bother spending money on premium laptops if they're made by Apple.
Is it? I mean, compared to some laptops where I explicitly was not interested in paying extra for audio, sure. Especially with them being older than "standard" presence of audio coprocessor on board.
Compared to the two new-ish AMD laptops? For the rare use case that warrants using built in speakers and mic, I see no real difference. Maybe latest macs are better, but... Usually the only use of built in speakers and mic are as last chance backup, or watching movies in bad conditions. Otherwise it's always a proper headset or standalone speakers
Yes, it is. Please name a windows laptop with great speakers and mics.
It’s night and day.
Mind you, I haven't used Macs since last intel ones, but my current workhorse of Zephyrus 14 2023 has at least comparable mics to last intel 15" and 16", and better speakers. T14g5 AMD I have from one contract has slightly worse speakers than that but comparable to the Macbooks I used (if not slightly better, the Zephyrus just has a whole grade higher amplifier setup with 4 speakers). And doesn't vary sound based on where it's placed :V
I haven't bought for built-in sound quality in the past (or ever, it's my backup's backup after all), but I do remember lots of laptops offered with a Harman-Kardon sound system, including a hardware implementation of the kind of dynamic compensation that M-series MacBooks do in software. Except also usually with way beefier speakers. Microphone arrays came in later arguably, but that is more correlated with the availability of audio coprocessors - the T470 had a not-great twin mic (the Mac was better there), but new ones easily handle it beyond my needs.
Way above "last choice backup solution" that "built in speakers and mic" are used for by me.
Is it the same story with the Apple touchpad? Is the fancy palm rejection implemented completely in software?
No idea - audio just happens to be something I once looked into because claims about superiority of apple software solution on M-chip macbooks to the speaker quality made me look more in depth.
It's not just good, I found it to be way better than a standalone shotgun mic connected via USB. I researched this for WFH and found a lot of people saying you were going to spend hundreds to replicate the quality in a more "professional" mic setup. Super impressive.
Does it record a fixed point, or does it do something fancy like using the camera to attempt tracking the user's movement? Just curious, and I don't have access to a modern Mac. The article seems to imply that it's focusing on a fixed point.
No idea, I believe it's just a fixed point. Personally I use it while sitting in front of my Mac about 1-2 feet from my face. I've done tests, it's better than every other form of audio input I have available, including standalone shotgun mic, Airpods Max, Airpods Pro V2, etc.
As someone looking to replicate it from a pro mic setup, what do people recommend?
I've been trying to record audio in my noisy server room but only deepfilternet is able to deal with the fan noise in the background.
Biggest thing is you need a nice mic that's very close to your face, like you might see on a twitch stream. Good noise isolation via a directional mic off-camera is quite difficult/expensive apparently.
I bought a røde wireless mic which definitely helped. It gives deepfilternet enough good signal:noise to work reasonably well, but I was hoping there was an even better solution.
A lot of dialogue in movies is dubbed for this very reason, it’s very hard to not pick up noise. What you can control is how close the mic is to you (which is why news reporters rely on hand-held mics, not just the boom mic) and the pattern of the mic: a cardioid, hypercardioid or shotgun mic facing away from the noise source would pick up less than an omnidirectional one (which is why the mics you see in studios are not used on a loud stage—not only are they fragile and expensive, they also tend to be omnidirectional).
There's definitely an ADR (dubbing) component to movies, but it's not very much these days. (In comparison to decades ago.)
Instead, sound engineers spend weeks cleaning up spoken dialog by hand in spectrogram editors. It's honestly astounding the magic they can do, but it's also labor-intensive and therefore expensive. They're literally individually EQ-ing every vowel, splicing consonants from one word to another... it's wild.
I think the title should say "for asahi linux", else it's misleading.
of course Apple has this implemented.
Incomplete or 'not 100% obvious' is not really 'misleading'. Titles don't say everything about a story or we wouldn't need stories.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
Does it not also run on macOS? It could be useful if you want to tweak usage of the mic array yourself, rather than relying on proprietary magic that gives you the output it deems best
Well, if it's available on crates.io, I guess nobody will think that it's from Apple. Also, it could conceivably be used in other software besides Asahi too...
> Much like with the speakers, Apple are trying way too hard to be fancy here, and implement an adaptive beamformer in userspace to try and isolate the desired signal from background noise.
That’s a rather unfair characterization. I’ve found the array to work very well in practice. It’s hardly trying too hard.
They are atrocious, IME. I continually get near muted. I.e., if I record the signal, my voice is there, but extremely faint. Unusable for VC audio, and I've moved completely to a headset mic because of it.
I have no idea what any of this means
I find this kind of thing a good case for LLMs as they can dumb down the technical jargon:
From Gemini:
```
Imagine you're trying to record someone talking in a noisy room using your MacBook's built-in microphones. This software acts like a super-smart filter:
* It knows where the microphones are: Apple laptops have multiple tiny microphones.
* It listens to all of them at once: It takes the input from all the microphones.
* It figures out where the person talking is: It analyzes the sound to find the direction of the voice.
* It focuses on that voice: It boosts the sound coming from that direction.
* It quiets down the other noises: It reduces the sound from other directions, like background chatter.
So, instead of getting a muddy recording with lots of noise, you get a clearer recording of the person you want to hear. Basically, it makes your MacBook's microphones sound much better in noisy environments. And it's designed to work within audio programs that use a specific plugin format called LV2.
```