crazygringo 10 days ago

I think the difficulties are these: how close do spaces need to be to count as connected? Rivers aren't always perfectly vertical. To what degree do they need to keep the same angle across consecutive lines? How much can they wiggle? And a river is still visible across 10 lines even if one line in the middle lacks the space, so detection needs to handle breaks in contiguity.
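
To make those questions concrete, here is a rough Python sketch of one way a detector could chain gaps down the page. It is an illustration under my own assumptions, not an established algorithm: each line is reduced to a list of hypothetical `(x_start, x_end)` gap intervals, and `max_drift` (how close counts as "connected"), `max_bend` (how much the angle may change or wiggle between steps), `max_skip` (how many lines may lack the space) and `min_lines` are made-up knobs whose values would need tuning.

```python
Gap = tuple[float, float]  # (x_start, x_end) of one inter-word space on a line


def find_rivers(
    lines: list[list[Gap]],
    max_drift: float = 3.0,  # max horizontal shift per line to count as "connected"
    max_bend: float = 2.0,   # max change in that shift between steps (angle / wiggle)
    max_skip: int = 1,       # how many lines in a row may lack the space
    min_lines: int = 4,      # shortest chain reported as a river
) -> list[list[tuple[int, int]]]:
    """Greedily chain gaps down the page; a river is a list of (line, gap) indices."""

    def mid(g: Gap) -> float:
        return (g[0] + g[1]) / 2

    # best[(i, j)] = (longest chain ending at gap j of line i, slope of its last step)
    best = {}
    for i, gaps in enumerate(lines):
        for j, g in enumerate(gaps):
            chain, slope = [(i, j)], None
            # Look at the previous line, or further back to bridge a break.
            for back in range(1, max_skip + 2):
                k = i - back
                if k < 0:
                    break
                for m, h in enumerate(lines[k]):
                    step = (mid(g) - mid(h)) / back  # horizontal drift per line
                    if abs(step) > max_drift:
                        continue  # too far apart to be the same river
                    prev_chain, prev_slope = best[(k, m)]
                    if prev_slope is not None and abs(step - prev_slope) > max_bend:
                        continue  # direction changes too sharply
                    if len(prev_chain) + 1 > len(chain):
                        chain, slope = prev_chain + [(i, j)], step
            best[(i, j)] = (chain, slope)

    # Report long chains that are not contained in an even longer one.
    long_chains = [c for c, _ in best.values() if len(c) >= min_lines]
    return [c for c in long_chains if not any(set(c) < set(o) for o in long_chains)]
```

Keeping only one predecessor per gap is greedy, so this can miss some rivers; the point is mostly to show how each open question above turns into a parameter that someone has to pick.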

There's nothing inherently wrong with paragraph-level optimization. Reducing raggedness is a paragraph-level problem, and it's comparatively easy. The hard part is defining the metric in the first place.
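
For comparison, a common raggedness objective (the sum of squared leftover space on every line except the last) can be minimized over a whole paragraph with a short dynamic program. This is a simplified relative of TeX's Knuth-Plass line breaking, not that algorithm itself; the sketch assumes integer word widths, a fixed space width, and that every word fits on a line by itself.

```python
import math


def min_raggedness_breaks(
    widths: list[int], line_width: int, space: int = 1
) -> tuple[float, list[int]]:
    """Pick line breaks minimizing the sum of squared slack on all lines but the last.

    Assumes every word fits on a line by itself. Returns (cost, starts), where
    starts[k] is the index of the first word on line k.
    """
    n = len(widths)
    # cost[i]: best cost for setting words i..n-1; nxt[i]: first word of the next line
    cost = [math.inf] * n + [0]
    nxt = [n] * (n + 1)
    for i in range(n - 1, -1, -1):
        length = -space  # no space before the first word on a line
        for j in range(i, n):
            length += space + widths[j]
            if length > line_width:
                break
            slack = line_width - length
            line_cost = 0 if j == n - 1 else slack * slack  # last line is free
            if line_cost + cost[j + 1] < cost[i]:
                cost[i], nxt[i] = line_cost + cost[j + 1], j + 1

    # Walk the stored choices to recover where each line starts.
    starts, i = [], 0
    while i < n:
        starts.append(i)
        i = nxt[i]
    return cost[0], starts
```

On word widths `[3, 2, 2, 5]` with a line width of 6, filling lines greedily gives `[3 2] [2] [5]` at cost 16, while the paragraph-level choice is `[3] [2 2] [5]` at cost 10. The search is the easy part; what's missing for rivers is an equally crisp cost function.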

taeric 10 days ago

I don't think I'd try to consider spaces individually? Rather, I'd look at the total amount of space involved. We aren't talking about fixed-width typesetting, after all, so there will be more space after punctuation and such. Rather than trying to enumerate the different cases, though, you almost certainly already have some model of how much "space" is in a section. Try different model weights for how much to penalize different amounts of connected space, and see how well the different models optimize.
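
One way to read this suggestion, sketched in Python under my own assumptions: instead of tracking individual spaces, mark every column that falls inside an inter-word gap, then penalize vertical runs of that whitespace. `exponent` and `min_run` are hypothetical model weights to experiment with; raising `exponent` punishes long connected runs much more than short ones.

```python
def connected_space_penalty(
    lines: list[list[tuple[int, int]]],
    width: int,
    exponent: float = 2.0,  # hypothetical weight: >1 punishes long runs disproportionately
    min_run: int = 2,       # runs shorter than this are ignored
) -> float:
    """Score vertically connected whitespace without tracking individual spaces.

    lines[i] holds half-open [start, end) column ranges of the gaps on line i.
    """
    # blank[i][x] is True when column x of line i lies inside some gap.
    blank = [[False] * width for _ in lines]
    for i, gaps in enumerate(lines):
        for start, end in gaps:
            for x in range(max(start, 0), min(end, width)):
                blank[i][x] = True

    penalty = 0.0
    for x in range(width):
        run = 0
        # One extra iteration past the last line flushes a run that reaches the bottom.
        for i in range(len(lines) + 1):
            if i < len(lines) and blank[i][x]:
                run += 1
                continue
            if run >= min_run:
                penalty += run ** exponent
            run = 0
    return penalty
```

Because it only looks straight down each column, this version ignores slanted rivers; widening each gap by a column or two before scoring would be one crude way to tolerate small drift.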

Or maybe not? I'll note that the vast majority of "rivers" I've seen in texts coincide heavily with punctuation. Even the example in this article has 5 of 8 lines using a comma to form the river, and the other lines seem to have obviously stretched spaces between words to fill out the line. Maybe enumerating the different reasons for space would be enough?
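
Enumerating reasons for space could start as simply as labelling each gap. This hypothetical `classify_gaps` assumes you know the width of the gap after each word and the font's natural word-space width; the punctuation set and the 10% tolerance are arbitrary choices, not anything from the article.

```python
PUNCTUATION = set(",.;:!?")  # assumed set of marks that get extra space after them


def classify_gaps(
    words: list[str], gap_widths: list[float], natural: float, tol: float = 0.1
) -> list[str]:
    """Label the gap after each word as 'punctuation', 'stretched', or 'normal'.

    gap_widths[k] is the width of the space following words[k]; natural is the
    font's unstretched word space, and tol is the fractional stretch ignored.
    """
    labels = []
    for word, w in zip(words, gap_widths):  # the last word has no gap after it
        if word and word[-1] in PUNCTUATION:
            labels.append("punctuation")
        elif w > natural * (1 + tol):
            labels.append("stretched")
        else:
            labels.append("normal")
    return labels
```

Counting how many gaps in a candidate river carry the "punctuation" or "stretched" label would test the observation about commas directly. The `natural` parameter is also exactly where font dependence enters.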

Granted, this also highlights how dependent you almost certainly are on the font being used?