taeric 10 days ago

I think the main difficulty is that it is a paragraph-level optimization and not a line-level one. Right? Otherwise, it seems like you could get pretty far by defining a metric that looks at connected whitespace between adjacent lines, with a higher penalty for connected spaces that have been stretched. (That is, if the spaces between some words were expanded to make the line edges look pretty, those will be more visible as rivers when they stack.)
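As a rough sketch of that kind of metric (not anything a real typesetter is known to use): assume each line is reduced to a list of inter-word gaps, each with a horizontal start, end, and a stretch factor where 1.0 means natural width. The names `Gap`, `river_score`, and `stretch_weight` are made up for illustration.

```python
from typing import List, Tuple

# Hypothetical representation: (x_start, x_end, stretch).
# stretch == 1.0 means the gap is at its natural width.
Gap = Tuple[float, float, float]


def overlap(a: Gap, b: Gap) -> float:
    """Horizontal overlap between two gaps, or 0.0 if they don't overlap."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def river_score(lines: List[List[Gap]], stretch_weight: float = 2.0) -> float:
    """Sum the overlap of gaps on adjacent lines, weighting stretched gaps more."""
    score = 0.0
    for upper, lower in zip(lines, lines[1:]):
        for g in upper:
            for h in lower:
                w = overlap(g, h)
                if w > 0:
                    # Extra stretch beyond natural width on either gap
                    # makes the stacked space count for more.
                    extra = (g[2] - 1.0) + (h[2] - 1.0)
                    score += w * (1.0 + stretch_weight * extra)
    return score
```

A lower score would mean less stacked whitespace. This only looks at directly adjacent lines and treats overlap as all-or-nothing geometry, so it is a starting point rather than a full river detector.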

And, yes, there are some line-level concerns that could lead to a paragraph getting reworked. Ending on a single word is an easy example. That is still something you can easily evaluate at the line level.

crazygringo 10 days ago

I think the difficulties are: how close do spaces need to be to be considered connected? Rivers aren't only perfectly vertical. To what degree do they need to maintain the same angle across consecutive lines? How much can they wiggle? And a river is still visible across 10 lines even if one line in the middle doesn't have the space, so the metric needs to be able to handle breaks in contiguity.
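To make those knobs concrete, here is a minimal sketch that assumes each line is reduced to the horizontal centers of its gaps. `max_drift` (how far a river may shift per line) and `max_skip` (how many gapless lines it may jump over) are hypothetical parameters, and choosing their values is exactly the open question.

```python
from typing import List


def longest_river(
    centers: List[List[float]],
    max_drift: float = 1.5,
    max_skip: int = 1,
) -> int:
    """Return the number of lines spanned by the longest chain of gaps.

    centers[i] holds the x-centers of the gaps on line i. A gap can
    continue a chain from any of the previous (1 + max_skip) lines if it
    has drifted at most max_drift per line crossed.
    """
    starts: List[List[int]] = []  # starts[i][k]: first line of the best chain ending at gap k
    best = 0
    for i, line in enumerate(centers):
        row = []
        for c in line:
            start = i  # a gap on its own is a chain of one line
            for j in range(max(0, i - 1 - max_skip), i):
                for m, p in enumerate(centers[j]):
                    if abs(c - p) <= max_drift * (i - j):
                        start = min(start, starts[j][m])
            row.append(start)
            best = max(best, i - start + 1)
        starts.append(row)
    return best
```

Note that this bounds drift per step but does not require a consistent angle, so a zigzag counts the same as a straight diagonal. Penalizing changes of direction would need extra state per chain.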

There's no problem with paragraph-level optimizations inherently. Reducing raggedness is paragraph-level and that's comparatively easy. The problem is the metric in the first place.

taeric 10 days ago

I wouldn't try to consider spaces individually, I don't think? Rather, I'd look at the amount of space involved. We aren't talking about fixed-width typesetting, after all, so you will have more space after punctuation and such. Rather than try to enumerate the different options, though, you almost certainly have some model of how much "space" is in a section. Try different model weights for how much to penalize different amounts of connected space and see how well each one optimizes.

Or, maybe not? I'll note that the vast majority of "rivers" I've seen in texts coincide quite heavily with punctuation. Even the example in this article has 5/8 lines using a comma to show the river, with the other lines having what seems to be obviously stretched space between words to use more of the line. Maybe enumerating the different reasons for space would be enough?

Granted, this also calls out how dependent you almost certainly are on the font being used.
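The "try different weights" idea could be sketched roughly like this. It assumes each candidate layout is summarized by two made-up features (connected space at natural width, and connected space that was stretched), and that you have pairs of layouts where a reader has said which one looks better. All names here are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical features per layout: (plain_connected, stretched_connected)
Layout = Tuple[float, float]
# A judged pair: (layout the reader preferred, layout the reader rejected)
Pair = Tuple[Layout, Layout]


def cost(layout: Layout, w: float) -> float:
    """Penalty for a layout; stretched connected space counts w times as much."""
    plain, stretched = layout
    return plain + w * stretched


def agreement(pairs: List[Pair], w: float) -> int:
    """Count the pairs where the preferred layout gets the lower cost."""
    return sum(1 for better, worse in pairs if cost(better, w) < cost(worse, w))


def best_weight(pairs: List[Pair], candidates: Sequence[float]) -> float:
    """Pick the candidate weight that agrees with the most reader judgments."""
    return max(candidates, key=lambda w: agreement(pairs, w))
```

The features themselves would depend heavily on the font's metrics, so any weight found this way would probably only be meaningful for the font it was tuned on.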