I have two thoughts that keep jumping out at me from this. This criticism isn't meant solely for Material 3, but it does seem like a good example.
1. Ever since "mobile first" started being rapidly shoved on us (and as a side note, god, our industry seems to love bandwagoning onto the new shiny stuff), I've noticed the slow but, with a north star like that, inevitable decline and neglect of desktop interfaces. Viewing this website on desktop is a wonderful illustration and validation of that fear (though definitely take that with a grain of salt, as it's heavily subject to confirmation bias).
2. The over-reliance on data. I am a big believer in data and data-driven decision making, but I think far too often we outsource our thinking to the data without ever questioning the data or our own methods for collecting and analyzing it. I don't know anywhere near enough about how they gathered this to suggest that the data might be flawed, but I have seen (many times) reasonable, thinking people look at data and place complete trust in it without stopping to realize that at some point that data was defined and collected by another person. Even if the data is rock solid, there also rarely seems to be any thought given to the possibility of misinterpreting it, or to the possibility that the data doesn't provide useful insights in isolation. Some of the worst products I've used were the most "data driven," hyper-optimized to maximize whatever the chosen metrics were. This seems especially subject to the fallacies of micro vs. macro when trying to optimize for populations over individual experiences. Likewise, some of the best products I've used were built with little to no data, and progressively got worse the more they were optimized for "engagement" or whatever the goal was.
Now all that said, take my thoughts with a grain of salt, because I am tired of having the apps I use constantly change their UIs on me. If it were just one app it would be bad enough, but when you have to use a dozen or more, and every one of them ships some radical update every 6 to 12 months, with typically zero user control over when that happens, it becomes maddening.
I'm right there with you. I loathe the "embigification" disguised as mobile first for desktop experiences. Mice are precise and allow for dense design (which I prefer).
Re the data point, what an amateur stance from the Google research team... "found the button 4x faster" as their "look at how much better it is!" metric? Make the button take up 90% of the screen and you'll get the same result but even FASTER, WOW, such productivity! What terrible methodology.
I also can't help but notice how much usable information space has now been gobbled up from left to right; hope you enjoy writing emails in tiny bubbles.
Also, the new problem they just invented is that it's now harder to decipher what is a UI element vs. a graphic/decoration. I am all for seeing some risk taking, but I'm not sure I agree with the basis for "why this is a good direction."
Google has been taking a lot of Ls on the design side IMO; every new guideline push makes Google products feel big and clunky. The best example is the Google Fonts website: the previous version was a work of art, and now it's just awful (functionally and aesthetically, IMO).
Could not agree more, especially "I loathe the "embigification" disguised as mobile first for desktop experiences. Mice are precise and allow for dense design (which I prefer)."
It really is utterly ridiculous how much scrolling we have to do on desktop with these modern apps. Scrolling is a paper cut IMHO. There are obviously good cases for having to scroll, but we should rarely if ever have to scroll just to see menu options! I've built a lot of "modern" websites and built desktop UI apps back in the day too, so I understand the challenges of trying to build responsive UIs that work on different screen sizes, but optimizing for the tiny screen and almost completely ignoring massive screens isn't the answer.
This post explains the methodology:
Thank you, that's a helpful post.
Don't feel obligated, but if you're willing I'd be interested to hear more about the demographics of the sample. For example, how did you find the participants? How varied were their backgrounds? Was there an even distribution of tech and non-tech people? A mix of blue collar and white collar?
Lastly I do want to say that although some of the feedback has been harsh, I do think what you guys accomplished was impressive!
Thanks :)
I create the tools that our researchers use to run the experiments; I typically don't run the experiments myself. I wouldn't want to misspeak or say something non-public and have it be picked up in the press, so I'll only respond at a very high level.
In quantitative research (which is to say, showing a survey to hundreds of participants), there are what are called participant panels. Companies go recruit people to take surveys. The companies get paid for this - some of the money goes to incentivize participants, and some the companies keep as profit. Amazon's Mechanical Turk, UserTesting, Cint, and Prolific are examples of participant panels and/or the companies that run them.
We package the experiment as a web app and give it to the provider. They go show it to the requested number of participants, whose responses we log and analyze.
In quantitative research, there's a thing called "power analysis," which tells you how many participants you need to have statistically significant answers to your questions. The more ways you want to be able to slice the data, the more participants you need.
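To make "power analysis" concrete, here's a back-of-the-envelope sketch. This is not how any real experiment here was powered - the success rates, the effect size, and the choice of statsmodels are all just illustrative assumptions:

```python
# Rough illustration of a power analysis: how many participants per group
# would we need to detect a difference between two task-success rates?
# All numbers here are made up for the example.
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

baseline_rate = 0.60   # hypothetical success rate with the old design
new_rate = 0.70        # hypothetical success rate with the new design

effect_size = proportion_effectsize(new_rate, baseline_rate)  # Cohen's h
n_per_group = NormalIndPower().solve_power(
    effect_size=effect_size,
    alpha=0.05,   # acceptable false-positive rate
    power=0.80,   # 80% chance of detecting the effect if it's really there
)
print(f"~{n_per_group:.0f} participants per group")  # roughly 180 for these made-up numbers
```

The point is just that the required sample size falls out of the math once you decide how small a difference you care about and how confident you want to be - and every extra way you want to slice the data repeats that requirement.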
Participant panels vary in quality. Ideally, a panel is made up of honest people who want to be helpful and who represent the population you're trying to model.
You can imagine that a stay-at-home mom who's killing time while the kids are at school might be a very good participant. She's someone who might use your product in real life, and her primary motivation is to give you her honest response so you make the thing she might use better for her. The financial incentive is a thank you for her time, but she's not chasing it.
You can also imagine someone who's trying to chain together these incentives to form an income stream - the online equivalent of a food delivery person. That person's primary motivation is to get through the task as quickly as possible to maximize the number of incentives he receives. He might always choose "A" when asked for his preference between two alternatives, not because he likes A, but because it's faster to not move the mouse. (This is called "straight-lining.") That person would be a bad participant. We try to detect this and screen that person out.
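As a toy illustration of the kind of check that catches this (purely hypothetical - real screening uses richer signals like response times and trap questions, and I'm not describing any actual pipeline):

```python
# Toy straight-lining check: flag respondents whose answers barely vary
# across a block of preference questions. The threshold is arbitrary here.
from collections import Counter

def flag_straight_liners(responses: dict[str, list[str]], max_share: float = 0.9) -> set[str]:
    """Return participant IDs whose single most common answer dominates their responses."""
    flagged = set()
    for participant_id, answers in responses.items():
        if not answers:
            continue
        most_common_count = Counter(answers).most_common(1)[0][1]
        if most_common_count / len(answers) >= max_share:
            flagged.add(participant_id)
    return flagged

responses = {
    "p1": ["A", "B", "A", "B", "A", "A", "B", "A"],   # varied: looks fine
    "p2": ["A", "A", "A", "A", "A", "A", "A", "A"],   # always "A": suspicious
}
print(flag_straight_liners(responses))  # {'p2'}
```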
Panels compete on quality. For a long time, Mechanical Turk had a reputation for having a preponderance of young Indian men who were trying to game the system. You'd have to design your experiment so the fastest way to complete it was to be honest, to try to dissuade cheating. (There are whole forums of Mechanical Turk workers trading scripts etc. to try to complete as many experiments as possible.) Even if you get honest responses, there's still a problem of representation. Unless the population you're modeling is mostly young Indian men, that panel's opinions might not match your users.
Age, gender, and location are basic demographics that are frequently used to stratify data, so I'm using them as examples here, but to your point - there are a lot of different factors that might impact how representative someone is of a population.
There's a challenge to all of this (and again, I'm writing this in one draft, off the top of my head - there are surely others): panels are made up of a finite number of people, and the more specifically you want to slice by demographics, the more participants you need (power analysis).
Using the demographics you listed as example filters, let's go from a generic to a specific population:
- People
- Young people
- Young women
- Young Japanese women
- Young tech-savvy Japanese women
- Young affluent tech-savvy Japanese women
- Young rural affluent tech-savvy Japanese women
(Assume that we assigned a quantifiable threshold to each adjective, so e.g. "young" means "under 35.")
A participant panel is going to have many thousands of people, but how many young, rural, affluent, tech-savvy, Japanese women does it have? How many people does your power analysis say you need to speak confidently about the opinions of the people in the group? How many experiments do you want to run that need the opinions of that group?
The more you filter a panel, the longer it takes to complete an experiment. If you just need 300 people, you can get your data back in a few hours. If you need 300 people who meet a specific demographic profile, it's going to take substantially longer.
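A rough sketch of why that is, with completely invented incidence rates (and assuming the filters are independent, which in reality they usually aren't):

```python
# Back-of-the-envelope: if each filter keeps only a fraction of the panel,
# the eligible pool shrinks multiplicatively. All rates are invented.
panel_size = 50_000

filters = {
    "young (under 35)": 0.30,
    "women": 0.50,
    "Japan": 0.05,
    "tech-savvy": 0.40,
    "affluent": 0.20,
    "rural": 0.15,
}

eligible = panel_size
for name, keep_rate in filters.items():
    eligible *= keep_rate
    print(f"after '{name}': ~{eligible:,.0f} eligible")

# With these made-up rates the pool drops from 50,000 to a handful of people,
# which is why highly specific quotas take much longer to fill.
```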
Over time, that problem turns into panel exhaustion. You want the panel to be representative of your users, and people who have been in a lot of similar experiments might be less representative of your users. There was another comment that was concerned about the representation of women over 70. Say there are 50 active participants in a panel who are women over 70 and your power analysis says you need 10 before you can estimate their preferences (again, hypothetical numbers I am making up). As soon as you give another experiment to that panel, the likelihood that you're going to have repeat responses from women over 70 goes up. Pretty soon, all your experiments are asking the opinions of the same small group of people.
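Using those same hypothetical numbers, you can see how fast the overlap problem shows up:

```python
# If a panel has 50 active women over 70 and each experiment needs 10 of
# them, how likely is it that a second experiment re-samples someone from
# the first? (Hypothetical numbers from the comment above.)
from scipy.stats import hypergeom

panel_subgroup = 50       # active women over 70 in the panel
previously_sampled = 10   # used in the first experiment
new_sample = 10           # needed for the next experiment

# Probability the new draw contains zero previously sampled people
p_no_overlap = hypergeom.pmf(0, panel_subgroup, previously_sampled, new_sample)
print(f"P(at least one repeat respondent) = {1 - p_no_overlap:.2f}")  # ~0.9
```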
To caveat one last time: I'm just a guy who works with the researchers cited in these articles. I'm not the one running the experiments or deciding how the data gets sliced. I've intentionally used hypotheticals and obscure demographic intersections because I don't want to imply anything about how the actual experiments are run, just to give a broad overview of the kinds of problems you encounter when you work in this space.
Research is the art+science of studying a subset of people to estimate the behavior of people at large, because it's not practical to ask everyone everything, all the time. Part of the art is figuring out which demographics are the most impactful to the things you want to measure, because as you add intersections, the quantities of data you need to speak credibly about those intersections explode.
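One last illustrative sketch of that explosion (the dimensions and the per-cell minimum are made up, not anything real):

```python
# Why intersections explode: if you need n respondents per cell to speak
# confidently about that cell, the total sample scales with the product of
# the number of levels in each dimension.
per_cell = 100                      # hypothetical minimum per cell from a power analysis
dimensions = {
    "age bands": 4,
    "genders": 2,
    "regions": 6,
}

cells = 1
for levels in dimensions.values():
    cells *= levels

print(f"{cells} cells x {per_cell} per cell = {cells * per_cell:,} participants")
# 48 cells x 100 per cell = 4,800 participants
```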