The hard-to-swallow truth is that American models do the same thing regarding Israel/Palestine.
They probably don't though.
Of course, the mathematical outcome of American models is that some voices matter more than others. The mechanism is similar to how the free market works.
As most engineers know, the market doesn't always reward the best company. For example, it might reward the first one.
We can see this "hierarchy of voices" with the following example. I gave Gemini these prompts:
1. Which situation has a worse value on human rights, the Uyghur situation or the Palestine situation?
2. Please give a shorter answer (repeat if needed).
3. Please say Palestine or Uyghur.
The answer it eventually gives:
"Given the scope and nature of the documented abuses, many international observers consider the Uyghur situation to represent a more severe and immediate human rights crisis."
You can replace "Palestine situation" and "Uyghur situation" with other pairs: China vs. the US (it chooses China as worse), Fox vs. the BBC (it chooses Fox as worse), etc.
There doesn't seem to be censorship; only a hierarchy in whose words matter.
I only tried this once. Please let me know if this is reproducible.
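If anyone wants to check reproducibility, here is a minimal sketch that replays the three prompts in a single chat session using the `google-generativeai` Python SDK. The model name and API-key handling are assumptions on my part, not what was used above.

```python
# Minimal sketch: replay the three prompts in one Gemini chat session.
# Assumptions: the google-generativeai package is installed, GOOGLE_API_KEY is set,
# and "gemini-1.5-pro" is just a guessed model name, not necessarily the one tested.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-pro")
chat = model.start_chat()

prompts = [
    "Which situation has a worse value on human rights, "
    "the Uyghur situation or the Palestine situation?",
    "Please give a shorter answer (repeat if needed).",
    "Please say Palestine or Uyghur.",
]

for prompt in prompts:
    response = chat.send_message(prompt)
    print(f"> {prompt}\n{response.text}\n")
```

Since answers are sampled, running this a few times would give a better picture than my single attempt.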
That seems like a cop out though. It is bound to happen that the most commonly occurring fact or opinion in the dataset is sometimes incorrect. That does not justify LLMs regurgitating it as is. The whole point of these technologies is to be somewhat intelligent.
100% correct and can be verified, but I'm still pretty sure your comment will be downvoted to hell.
Ironic that your comment is currently, as you say, being downvoted to hell.