lisper 5 days ago

I've been running my own spam filter for many years now based on this super-simple heuristic: My filter looks at my outgoing mail, and any mail received from an address I've sent mail to, or with a subject that has appeared in my outgoing mail (possibly with a "re:" prefix) is marked as non-spam. Everything else goes in spam, and any spam message from an address I've never received mail from before is marked as unread. I get hundreds of spams per day, but only about a dozen from new addresses. It takes me about ten seconds to scan them for non-spam cold calls, which are extremely rare. The other source of false positives is things like subscription confirmations, but since I know to expect those, I know to look for them, and they are always at the top of the spam folder.

I put this initial system in place expecting to have to augment it later with a more traditional content-based filter, but this simple heuristic works so well I've never felt the need to implement that additional step.
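A minimal Python sketch of the rule described above, assuming each incoming message is reduced to a (sender, subject) pair and that "re:" prefixes are the only subject variation worth normalizing. The class and method names are hypothetical, not taken from the poster's actual filter.

```python
import re
from dataclasses import dataclass, field
from typing import Iterable, Set, Tuple

# One or more leading "re:" prefixes, case-insensitive ("Re: RE: hello").
_REPLY_PREFIX = re.compile(r"^(\s*re\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Drop reply prefixes and case so "Re: Lunch" matches "lunch"."""
    return _REPLY_PREFIX.sub("", subject).strip().lower()


@dataclass
class SentMailFilter:
    sent_to: Set[str] = field(default_factory=set)        # addresses I have written to
    sent_subjects: Set[str] = field(default_factory=set)  # normalized outgoing subjects
    seen_senders: Set[str] = field(default_factory=set)   # addresses that have mailed me

    def record_outgoing(self, recipients: Iterable[str], subject: str) -> None:
        """Call for every message I send; this is the only training the filter gets."""
        self.sent_to.update(r.lower() for r in recipients)
        norm = normalize_subject(subject)
        if norm:  # an empty subject would otherwise whitelist every subjectless message
            self.sent_subjects.add(norm)

    def classify(self, sender: str, subject: str) -> Tuple[str, bool]:
        """Return (folder, mark_unread) for one incoming message."""
        sender = sender.lower()
        first_contact = sender not in self.seen_senders
        self.seen_senders.add(sender)
        if sender in self.sent_to or normalize_subject(subject) in self.sent_subjects:
            return "inbox", True
        # Everything else is spam; only mail from brand-new senders stays unread.
        return "spam", first_contact
```

There is no content analysis at all: the allow-list is built entirely from `record_outgoing`, which is why the number of unread spam messages tracks the number of new senders rather than the total spam volume.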

EGreg 5 days ago

Someone posted advice on X that really helped me clean up my inbox:

Add a filter that looks for the word "Unsubscribe" and automatically puts matching emails in a "Promotional" category or something similar. Also apply the filter to existing emails, and let it run for a minute.

Try it now! And comment if it reduced your inbox to like 2% of what it was :)
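A rough Python equivalent of that filter rule, using only the standard library `email` package. Besides the literal word "Unsubscribe" in a text part, it also checks the `List-Unsubscribe` header (defined in RFC 2369), which is an extra signal not mentioned in the advice above; the function name is hypothetical.

```python
from email import message_from_string
from email.message import Message


def is_promotional(msg: Message) -> bool:
    """Approximate the "contains Unsubscribe" filter rule for one parsed message."""
    # Extra signal (not part of the original advice): bulk senders usually add
    # a List-Unsubscribe header, defined in RFC 2369.
    if msg.get("List-Unsubscribe") is not None:
        return True
    # The rule itself: look for the word in every text/* part of the body.
    for part in msg.walk():
        if part.get_content_maintype() != "text":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:  # unknown charset name in the header
            text = payload.decode("utf-8", errors="replace")
        if "unsubscribe" in text.lower():
            return True
    return False


if __name__ == "__main__":
    raw = "From: deals@example.com\nSubject: Sale\n\nTo stop these emails, unsubscribe here.\n"
    folder = "Promotional" if is_promotional(message_from_string(raw)) else "Inbox"
    print(folder)  # Promotional
```

"Apply the filter to existing emails" then amounts to running the same predicate over every message already in the mailbox and moving the matches.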

ndriscoll 5 days ago

I've commented here before that it is obvious to me that Gmail makes no effort to combat spam anymore. Unsubscribe links are legally required in the US and are generally present in spam, yet this obvious heuristic isn't used. I would expect basically any trained filter to pick up on it, so my assumption is that they actually have rules that intentionally allow spam.

I get emails that literally say "This is an email advertisement". These are presumably being blasted out to tons of mailboxes. How does a model not notice this?

im3w1l 5 days ago

An advertisement is only spam if it's unsolicited. If you forget to uncheck the box "yes send me promotional offers and deals" when signing up, it's not spam according to that definition.

ndriscoll 5 days ago

If you're making people opt-out and setting them up to "forget" to do so, then you are spamming them, but even under that definition, I'd estimate that over 99% of what I'm calling spam still qualifies. A large amount of it is from businesses I've never interacted with, so obviously unsolicited.

EGreg 5 days ago

Isn't that against the GDPR? Couldn't they get hit with large fines in Europe for every recipient?

lisper 5 days ago

A mail service run by an advertising company fails to filter out advertising emails? I'm shocked. Shocked!

lisper 5 days ago

I tried that a long time ago and the problem with it was that it produced a lot of false positives for me because I subscribe to a lot of Google Groups.

EGreg 5 days ago

Can you also make a negative condition, X but not Y?

lisper 5 days ago

Of course. But the problem is that the more complicated you make your filtering logic, the harder it becomes to maintain. I was constantly discovering new exceptions to my ever-more-complicated rules, which is why I eventually gave up on that whole approach.
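One way to express "X but not Y" for the Google Groups case, as a Python sketch: a message is promotional if its body mentions unsubscribing, unless it came from a Google Group. Messages are modeled as plain dicts, and recognizing Google Groups traffic by a `List-Id` header containing `googlegroups.com` is an assumption; all names here are hypothetical. The exceptions list is where the maintenance burden described above accumulates.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]  # hypothetical: lowercase header names plus a "body" key
Predicate = Callable[[Message], bool]


def says_unsubscribe(msg: Message) -> bool:
    return "unsubscribe" in msg.get("body", "").lower()


def from_google_group(msg: Message) -> bool:
    # Assumption: Google Groups mail has a List-Id like "<fans.googlegroups.com>".
    return "googlegroups.com" in msg.get("list-id", "").lower()


def x_but_not_y(x: Predicate, exceptions: List[Predicate]) -> Predicate:
    """Match when x holds and none of the exception predicates do."""
    return lambda msg: x(msg) and not any(e(msg) for e in exceptions)


# Every newly discovered exception gets appended to this list.
promotional_rule = x_but_not_y(says_unsubscribe, [from_google_group])
```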

kees99 5 days ago

I'm using something very similar, except incoming messages from never-seen-before senders are greylisted instead:

https://en.wikipedia.org/wiki/Greylisting_(email)

95% of spammers never retry.
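A minimal in-memory Python sketch of classic greylisting, keyed on the (client IP, envelope sender, envelope recipient) triplet described in the linked article and consulted before DATA. The 300-second minimum delay, the 4-hour expiry and the reply text are illustrative assumptions, not values from the thread; a real deployment would hook this into the MTA's policy interface.

```python
import time
from typing import Callable, Dict, Set, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)
TEMPFAIL = "451 4.7.1 Greylisted, please try again later"


class Greylist:
    """In-memory greylist consulted at RCPT TO time, i.e. before DATA."""

    def __init__(self, min_delay: float = 300.0, expiry: float = 4 * 3600.0,
                 clock: Callable[[], float] = time.time):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.expiry = expiry        # forget unretried attempts after this long
        self.clock = clock
        self.first_seen: Dict[Triplet, float] = {}
        self.passed: Set[Triplet] = set()  # triplets that have retried successfully

    def check(self, ip: str, sender: str, recipient: str) -> str:
        key = (ip, sender.lower(), recipient.lower())
        now = self.clock()
        if key in self.passed:
            return "250 OK"
        seen = self.first_seen.get(key)
        if seen is None or now - seen > self.expiry:
            # First attempt (or a stale one): remember it and ask for a retry.
            self.first_seen[key] = now
            return TEMPFAIL
        if now - seen < self.min_delay:
            # Retried too quickly; keep deferring.
            return TEMPFAIL
        # A patient retry, which legitimate MTAs perform and most spammers don't.
        del self.first_seen[key]
        self.passed.add(key)
        return "250 OK"
```

The injectable `clock` exists only so the timing can be exercised without actually waiting.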

lisper 5 days ago

The problem with greylisting is that it delays subscription confirmation emails when you sign up for a new service. I found that to be more trouble than it was worth. YMMV.

kees99 5 days ago

For greylisting that sends 451 before DATA, that is indeed a known problem.

My server sends 451 after DATA and keeps a copy of each greylisted message as a marked-as-read entry in a separate folder. Those copies are deleted after a few hours, or moved out after a successful delivery retry.
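A Python sketch of this after-DATA variant, under the assumption that "moved out" means the held copy is delivered to the inbox when the sender's retry succeeds, so the retried message is not stored a second time. Folders are modeled as an in-memory dict and list, the read flag is left out, and the class name, 300-second delay and 4-hour hold window are hypothetical.

```python
import time
from typing import Callable, Dict, List, Set, Tuple

Triplet = Tuple[str, str, str]  # (client IP, envelope sender, envelope recipient)
TEMPFAIL = "451 4.7.1 Greylisted, please try again later"


class HoldingGreylist:
    def __init__(self, min_delay: float = 300.0, hold_for: float = 4 * 3600.0,
                 clock: Callable[[], float] = time.time):
        self.min_delay = min_delay
        self.hold_for = hold_for
        self.clock = clock
        self.held: Dict[Triplet, Tuple[float, str]] = {}  # the separate "greylisted" folder
        self.inbox: List[str] = []
        self.passed: Set[Triplet] = set()

    def purge(self) -> None:
        """Delete held copies whose sender never retried within hold_for."""
        now = self.clock()
        for key in [k for k, (t, _) in self.held.items() if now - t > self.hold_for]:
            del self.held[key]

    def end_of_data(self, ip: str, sender: str, recipient: str, message: str) -> str:
        """Called after DATA, once the full message has been received."""
        self.purge()
        key = (ip, sender.lower(), recipient.lower())
        now = self.clock()
        if key in self.passed:
            self.inbox.append(message)
            return "250 OK"
        if key not in self.held:
            # First attempt: keep a copy the user can already read, then tempfail.
            self.held[key] = (now, message)
            return TEMPFAIL
        first_seen, held_copy = self.held[key]
        if now - first_seen < self.min_delay:
            return TEMPFAIL
        # Successful retry: move the held copy to the inbox (the retried copy is
        # not stored again) and whitelist the triplet.
        del self.held[key]
        self.inbox.append(held_copy)
        self.passed.add(key)
        return "250 OK"
```

Because the body is already in hand when the 451 goes out, a subscription confirmation is visible in the holding folder immediately instead of waiting for the sender's retry.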

lisper 5 days ago

That's a good idea. I was using an off-the-shelf greylister that didn't work that way, but I might implement that strategy now that I'm doing everything myself.