You aren't wrong, but I stand by my claim. For one, plenty of things are genuinely difficult optimization problems that people don't give a passing thought to.
But, more importantly, the amount of cycles that would be needed to text-wrap most websites is effectively zero. Most websites are simply not typesetting the volumes of text that would be needed for this to be a concern.
Happy to be shown I'm flat wrong on that. What sites are you envisioning this taking a lot of time for?
> But, more importantly, the amount of cycles that would be needed to text-wrap most websites is effectively zero.
I've measured this, and no, it's not. What you're missing is the complexities of typesetting Unicode and OpenType, where GSUB/GPOS tables, bidi, ruby text, etc. combine to make typesetting quite complex and expensive. HarfBuzz is 290,000 lines of code for a reason. Typesetting Latin-only text in Times New Roman is quick, sure, but that doesn't cut it nowadays.
Apologies; I meant that the additional cycles to justify text are effectively zero compared to the rest of the stack for most sites. Clearly it is work, so not literally zero. And yes, proper text handling is huge.
I would wager you can find scenarios where it is a large number. My question is whether those show up on sites people actually use?
Taken seriously, these would all be reasons not to do many of the things Unicode does. And yet here we are.
That all said, if you have measurements, please share. Happy to be proven wrong.
Some complexities:
- handling soft ("shy") hyphens and hyphenation when splitting long words -- working out where a hyphen is readable, and then how that affects the available space, takes time to compute if justified text is not to leave large blocks of whitespace, especially around long words;
- handling text effects like making the first letter large, or the first word(s) larger, as is done in various books;
- reflow due to any changes resulting from text rendering/formatting (e.g. if applying kerning or hyphenation results in/does not result in text wrapping);
- impact of things like kerning and ligatures (e.g. ffi) on text width -- including determining whether a word can be split in the middle of one of these or not;
- combining characters, emoji sequences (joined with zero-width joiners), paired regional-indicator (flag) characters -- including determining valid pairs to decide if/where a split is allowed -- etc.;
- mixed direction text (left to right and right to left) handling and positioning;
- the ruby text mentioned above (e.g. https://www.w3.org/International/articles/ruby/styling.en.ht...) -- dealing with both the main text wrapping and the above/below text wrapping, both of which could happen;
- for Chinese/Japanese/Korean ensuring that characters within a word don't have extra space, as those languages don't use spacing to delimit words;
- other things affecting line height such as sub/super script text (mathematical equations, chemical symbols, references, etc.).
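For a concrete taste of the combining-character/flag-pair point above, a toy sketch (illustrative only; real engines implement full Unicode grapheme segmentation per UAX #29, and this covers just two of its rules):

```python
# Toy sketch: is it safe to break a line between text[i-1] and text[i]?
# Handles only combining marks and regional-indicator (flag) pairs; a
# real implementation also tracks run parity, ZWJ sequences, etc.
import unicodedata

def is_regional_indicator(ch):
    # Flags are encoded as pairs of REGIONAL INDICATOR SYMBOL letters.
    return 0x1F1E6 <= ord(ch) <= 0x1F1FF

def can_break_before(text, i):
    """True if a line break just before text[i] is safe."""
    if i <= 0 or i >= len(text):
        return False
    if unicodedata.combining(text[i]):  # never orphan a combining mark
        return False
    if is_regional_indicator(text[i - 1]) and is_regional_indicator(text[i]):
        return False                    # never split a flag pair
    return True
```

So "e" + combining acute may not be split apart, and a two-character flag pair is treated as an unbreakable unit.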
This isn't really disagreeing with me? I can even add to it: for a fun example, where to hyphenate a word can depend on its use in the sentence -- "rec-ord" the noun versus "re-cord" the verb. (Strictly, I'm probably wording that poorly?)
Like, I am largely aware that it is a hard problem. So is rendering text in general. The added difficulty of justifying text is still not something I expect to impact the vast majority of sites. If you are willing to break your content into multiple pages, I'd hazard it isn't a significant chunk of time for most content.
Are there edge cases? Absolutely! Most of these are isolated in impact, though. And I do think a focus on whole-content optimization is clouding a lot of people's view here. You are not doing yourself any favors by re-optimizing a full book every time a chapter changes.
There is also the assumption that you have to find the absolute best answer for justifying text. Why? One that scores low enough on a penalty function is just fine. It's akin to the difficulty of finding all tilings of a board versus finding a single tiling, or a single solution to the N-Queens problem versus all solutions. If you just want one solution, you don't need the raw compute necessary to get them all.
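To sketch the "low enough penalty" idea (the names and the penalty formula here are mine, not any real engine's): do a cheap greedy first-fit pass, and accept its result whenever every line stays under a penalty budget, reserving any expensive global search for when it doesn't:

```python
# Sketch: greedy line breaking accepted when "good enough", instead of
# always running a globally optimal Knuth-Plass-style search.

def badness(line_words, width):
    """Penalty for stretching one justified line (squared slack)."""
    text_len = sum(len(w) for w in line_words) + len(line_words) - 1
    slack = width - text_len
    return slack * slack

def greedy_wrap(words, width, max_badness=100):
    lines, current = [], []
    for word in words:
        width_with = sum(len(w) for w in current + [word]) + len(current)
        if current and width_with > width:
            lines.append(current)
            current = [word]
        else:
            current.append(word)
    if current:
        lines.append(current)
    # Accept the cheap answer if every line (except the last) is within
    # budget; only otherwise would you fall back to global optimization.
    ok = all(badness(line, width) <= max_badness for line in lines[:-1])
    return lines, ok
```

For ordinary prose at ordinary measures, the cheap pass passes the budget check almost every time, which is the point.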
With the current state of websites and how many resources they waste, any text wrapping is probably not an issue at all. :)
I can hardly open any website without some anti-bot check burning my CPU to the ground for half a minute or so (if it doesn't manage to crash my Firefox entirely in the process, like Cloudflare's does). I'd rather wait 0.2s for text wrapping than that, that's for sure. :)
Any page with dynamic text. If the calculation takes a moderate amount of time, that cost will accumulate if the page layout reflows a lot.
Only if the entire text has to be optimized as a whole? Most dynamic-text sites do not have to do that. Most are a series of individual "card"-like things that can be justified internally, but are not justified with regard to anything else on the page.
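A sketch of what I mean (using Python's `textwrap` as a stand-in for a real justification pass; the function names are mine): each card wraps against its own width, so editing one card only reflows that card, never the whole page:

```python
# Sketch: per-card layout. Each card is an independent wrapping problem,
# so a change to one card never forces re-justifying the others.
import textwrap

def layout_cards(cards, width=30):
    """Wrap every card once; cards maps card id -> text."""
    return {card_id: textwrap.wrap(text, width) for card_id, text in cards.items()}

def update_card(laid_out, cards, card_id, new_text, width=30):
    """Edit one card and reflow only that card's lines."""
    cards[card_id] = new_text
    laid_out[card_id] = textwrap.wrap(new_text, width)
```

The total cost scales with the size of the card you touched, not with the size of the page.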
Quick example would be https://standardebooks.org/ebooks/mark-twain/the-innocents-a...
Try zooming in and out with text-wrap: pretty vs text-wrap: wrap
I... uh, wouldn't consider a text dump of a full novel being typeset in one go a good example when talking about typical sites?
Sure, but HTML isn't only used in a browser context. I have a severely under-powered ereader that I use to read epubs (which are HTML). It already takes ten seconds to paginate on first open and on font size changes. I can't imagine how long it would take to non-naively wrap lines.
I don't know why you'd expect an ereader to do a full-text optimization of a book, though? Pick the starting line and lay out to the next screen's breaking point. If needed, check whether you left an orphan for the next page and try to pull it onto the current screen.
Are there ereaders that have to typeset the entire text of what you are reading? What is the advantage of making the task harder?
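Roughly what I'd expect an ereader to do, sketched (names illustrative, assuming already-wrapped lines): lay out one screenful from the current start line, with a small widow tweak, and never touch text past the break:

```python
# Sketch: screen-at-a-time pagination. Only the visible screen is laid
# out; the return value tells you where the next screen starts.

def paginate_screen(lines, start, lines_per_screen):
    """Return (lines for this screen, start index of the next screen)."""
    end = min(start + lines_per_screen, len(lines))
    # Widow tweak: if the next screen would hold a single leftover line,
    # move one line there so no screen shows a lone straggler.
    if len(lines) - end == 1 and end - start > 1:
        end -= 1
    return lines[start:end], end
```

Work is proportional to one screen, not to the book, and the returned index is all the state you need to render the next page.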
KOReader typesets the whole book at once. It is needed in order to get page counts right, for example.
Even if that is the case, how often does it have to redo this? If the page counts are to be stable, I'm assuming the page numbers are too? At which point we are back to this not being something I would expect to slow things down appreciably in the vast majority of actual uses.
Still, I should acknowledge you provided an example. It surprises me.
It needs to rerender everything whenever you change any setting that affects typesetting. This used to be quite annoying when trying out fonts or finding the best value for some setting, but recently they implemented a better way: it first renders the current fragment (usually a chapter), releases the UI so that you can read, and renders the rest in the background. Page counts and some other things are broken in the meantime.
That better way makes a ton of sense, and is what I would expect to be the default. Getting page numbers right is a flex that just doesn't feel necessary. As I said, I would expect even faster renders if it presented content first. That would edge-case into unstable page counts, but I struggle to care about that? Make it an optional setting and be done with it. Especially as I prefer resizing to keep the current start of the page I'm looking at -- something obviously not guaranteed on a resize.
> it has to redo this how often?
As often as the font size changes.
So, never for most reads? :)
Even the few times I do change text size on my e-readers are largely mistakes. Having gestures to resize is frustrating in the extreme.
Eh, I don't really have a dog in the fight. When I'm out and about I just read on my phone, which my oral surgeon says is too small for my eyes; I haven't asked my ophthalmologist for advice on my dental implants, but I have been reading as much off screens as off paper since the late 1980s, so any kind of sense I might once have had for the aesthetics of typography must surely have been strangled in the crib.
When I'm home, I read books.
Won't this end up in Apple iBooks or whatever it's called now? Most novels can be a megabyte or more of text, pretty much all of it needing to be wrapped.
It seems more likely that Apple would've adapted this from the proven technology that they currently use for Apple Books and everything else, TextKit (which first appeared in OpenStep). https://developer.apple.com/videos/play/wwdc2021/10061/
> Apple Books and everything else
Can't speak to Apple Books, but at least Pages.app (and iWork in general) use a separate text engine from TextKit, focused on higher fidelity at the cost of performance -- optical kerning, etc. (Terminal.app also does not use TextKit.)
Doubtful.
OpenStep used Display PostScript and was written in Objective-C; WebKit is written in C++.
Rendering text on the web is a different animal altogether.
I was under the impression that when we got new CSS in Safari, those same features ended up in Books in the next software cycle. It wouldn't make sense to give it a different rendering engine... but then I've never been able to find much information anywhere about which epub readers use which rendering engines.
I mean, not wrong. But optimizing over a megabyte's worth of text is almost certainly not going to take a lot of time, especially as there will be chapter stops. Such that we are down to, what, 100K of text per chapter to lay out?
Again, I won't claim it is absolutely free. But it is almost certainly negligible compared to the processing power involved in any of the other things we are talking about.