Janky a11y on Linux is not the fault of the various a11y developers, who genuinely care about the needs of blind folks, and IMO do a great job with very limited resources. The problems are baked into GNU/Linux in various ways. One way to look at this is that Microsoft is like the military, with a top commander issuing orders that everyone follows, while Linux is more like a slime mold, with no central nervous system. There is good and bad in both approaches, but unfortunately, a11y support will generally be better in an organization run military-style, assuming that the top leaders have made a11y a priority.
Bill Gates mandated that accessibility be a top priority, and attended accessibility meetings personally. That is why Windows is as accessible as it is. Ubuntu is an open-source project, and it is simply not possible to force every developer to get on board. IIUC, Steve Jobs did not care about a11y, which is why Apple had non-accessible products for so long, and IIUC, Tim Cook does care, and was able to force Apple to embrace a11y. With Linux, we have various leaders who do care, and some who don't. The result is that a11y on Linux is janky and probably always will be.
There are many examples I can point to. For example, the main developer of PulseAudio cares about music, but not as much about screen reader users, which is why PulseAudio has broken a11y so many times. Some devs working on the low-level GTK widgets refuse to make pixmaps capable of having a text description, which is why icons remain inaccessible in many desktop environments in Linux. Gnome does better than any other Linux desktop environment, in my experience, but Gnome can't make non-accessible widgets magically accessible. While in most cases the goals of free software advocates are in line with those of a11y advocates, these groups tend to differ on support for commercial closed-source software, such as text-to-speech engines, which is one reason we have limited options in Linux. I use the Voxin voice, which is the same as Eloquence, and if I were not a programmer capable of hacking the speech stack, I doubt I could consistently use it.
A common reply to a11y advocates in the open-source community is that if you don't like the way it is, fix it yourself. However, this is simply not realistic. For example, I fixed the pixmap GTK class to add an accessible description, and attempted to merge this fix into the Vinux version of Linux. I had to fork not just GTK, but all of Gnome to make this work. I don't have the time to maintain a fork of the entire desktop just to make pixmaps talk.
Another problem I've faced personally in the open-source community is dealing with folks' feelings. For example, I have an entire alternate speech stack that can work with Orca, but this upset some of the speech-dispatcher devs who do very important a11y work. I tried working with them, and to their credit, they did incorporate one of the most important changes I have in my stack: they moved the code to talk to the sound system into speech-dispatcher proper. However, I keep most of my a11y code to myself simply to avoid upsetting anyone. Maybe if I understood people's feelings better, I could contribute more effectively, but from my point of view, I poke a random weak spot of the slime mold, and the whole thing freaks out.
So, I hope that long-winded explanation helps you understand why Linux a11y is as janky as it is.
Best regards,
Bill