Item 43626252 - HN

taeric • 10 days ago

I find myself laughing at "Many developers are understandably concerned about the performance of text-wrap: pretty." I just can't bring myself to believe there is a meaningfully sized group of developers that have considered the performance of text-wrapping.

10

queuebert • 10 days ago

Text wrapping is actually a difficult optimization problem. That's part of the reason LaTeX has such good text wrapping -- it can spend serious CPU cycles on the problem because it doesn't do it in real time.

taeric • 10 days ago

You aren't wrong; but I stand by my claim. For one, plenty of things are actually difficult optimization problems that people don't give any passing thought to.

But, more importantly, the amount of cycles that would be needed to text-wrap most websites is effectively zero. Most websites are simply not typesetting the volumes of text that would be needed for this to be a concern.

Happy to be shown I'm flat wrong on that. What sites are you envisioning this will take a lot of time for?

pcwalton • 10 days ago

> But, more importantly, the amount of cycles that would be needed to text-wrap most websites is effectively zero.

I've measured this, and no, it's not. What you're missing is the complexities of typesetting Unicode and OpenType, where GSUB/GPOS tables, bidi, ruby text, etc. combine to make typesetting quite complex and expensive. HarfBuzz is 290,000 lines of code for a reason. Typesetting Latin-only text in Times New Roman is quick, sure, but that doesn't cut it nowadays.

taeric • 10 days ago

Apologies, the additional cycles to do justified text is effectively zero compared to the rest of the stack for most sites. Clearly, it is work, so not actually zero. And, yes, proper text handling is huge.

I would wager you can find scenarios where it is a large number. My question is if there are sites people use?

Seriously taken, these would all be reasons not to do many of the things Unicode does. And yet here we are.

That all said, if you have measurements, please share. Happy to be proven wrong.

rhdunn • 9 days ago

Some complexities:

- handling shy hyphens/hyphenation when splitting long words -- working out where to hyphenate so it is readable, then how that affects the available space takes time to compute to ensure that justified text doesn't result in large blocks of whitespace esp. for long words;

- handing text effects like making the first letter large, or the first word(s) larger, like is done in various books;

- reflow due to any changes resulting from text rendering/formatting (e.g. if applying kerning or hyphenation results in/does not result in text wrapping);

- impact of things like kerning and digraphs (e.g. ffi) on text width -- including determining if a word can be split in the middle of one of these or not;

- combining characters, emoji (with zero-width non-joiners), flag Unicode character pairs (including determining valid pairs to determine if/where to split on), etc.;

- mixed direction text (left to right and right to left) handling and positioning;

- the mentioned Ruby text (e.g. https://www.w3.org/International/articles/ruby/styling.en.ht...) -- dealing with both the main text wrapping and the above/below text wrapping, both of which could happen;

- for Chinese/Japanese/Korean ensuring that characters within a word don't have extra space, as those languages don't use spacing to delimit words;

- other things affecting line height such as sub/super script text (mathematical equations, chemical symbols, references, etc.).

taeric • 9 days ago

This isn't really disagreeing with me? I can even add to it, for a fun example, knowing where to hyphenate a word can depend on its use in a sentence. (Strictly, I'm probably wording that poorly?)

Like, I am largely aware that it is a hard problem. So is just rendering text, at large. The added difficulty for justifying text is still not something I expect to impact the vast majority of sites. If you are willing to break your content into multiple pages, I hazard it isn't a significant chunk of time for most content.

Are there edge cases? Absolutely! Most of these are isolated in impact, though. And I do think a focus on whole content optimization is clouding a lot of people's view here. You are not doing yourself any favor by optimizing a full book every time a chapter changes.

There is also the idea that you have to find the absolute best answer for justifying text. Why? One that is low enough on a penalty score is just fine. Akin to the difficulty in finding all tilings of a board, versus just finding a single tiling for a board. Or a single solution to the N-Queens question, versus all solutions. If you just want a single solution, you don't need the raw compute necessary to get them all.

binaryturtle • 10 days ago

With the current state of Websites and how much resources they waste any text wrapping is probably not an issue at all. :)

I hardly can open any website w/o some anti-bot check burning my CPU to the ground for 1/2 minute or something (if it doesn't manage to entirely crash my Firefox in the process like cloudflare). I rather would wait for 0.2s text wrapping than that, that's for sure. :)

cobertos • 10 days ago

Any page with dynamic text. If the calculation takes a moderate amount of time, that will accumulate if the page layout reflows a lot.

taeric • 10 days ago

Only if the entire text has to be optimized as a whole? Which, most dynamic text sites do not have to do this. Most dynamic sites will be a series of individual "card" like things that could be justified internally, but are not justified with regard to anything else on the page.

contact9879 • 10 days ago

Quick example would be https://standardebooks.org/ebooks/mark-twain/the-innocents-a...

Try zooming in and out with text-wrap: pretty vs text-wrap: wrap

taeric • 10 days ago

I... uh, wouldn't consider a text dump of a full novel getting completely typeset as a good example to consider when talking about sites?

contact9879 • 10 days ago

sure, but html isn't only used in a browser context. I have a severely under-powered ereader that I use to read epubs (html). It already takes ten seconds to paginate on first open and font size changes. I can't imagine how long it would take to non-naively wrap lines

taeric • 10 days ago

I don't know why you'd expect an ereader to do a full text optimization of a book, though? Pick the starting line and layout to the next screen's breaking point. If needed, consider if you left an orphan for the next page and try to adjust that into the current screen.

Are there ereaders that have to typeset the entire text of what you are reading? What is the advantage of making the task harder?

tjader • 10 days ago

KOReader typesets the whole book at once. It is needed in order to get page counts right, for example.

taeric • 10 days ago

Even if that is the case, it has to redo this how often? If the page counts are to be stable, I'm assuming so are page numbers? At which point we are back to this not being something I would expect to slow things down appreciably on the vast majority of actual uses.

Still, I should acknowledge you provided an example. It surprises me.

tjader • 9 days ago

It needs to rerender everything whenever you change any setting that affects typesetting. This used to be quite annoying when trying out fonts or find the best value for some setting, but recently they implemented a better way so that it first renders the current fragment (usually a chapter), releases the UI so that you can read, and renders the rest in the background. Page counts and some other stuff are broken in this meantime.

taeric • 9 days ago

That better way makes a ton of sense, and is what I would expect to be default. Getting page numbers is a flex that just doesn't feel necessary. As I said, I would expect even faster renders if it did present content first. Would edge case into unstable page counts, but I struggle to care on that? Make it an optional setting, and be done with it. Especially as I prefer resizing to keep the current start of page I'm looking at. Something obviously not guaranteed in a resize.

throwanem • 10 days ago

> it has to redo this how often?

As often as the font size changes.

taeric • 10 days ago

So, never for most reads? :)

Even the few times I do change text size on my e-readers are largely mistakes. Having gestures to resize is frustrating, in the extreme.

throwanem • 8 days ago

Eh, I don't really have a dog in the fight. When I'm out and about I just read on my phone, which my oral surgeon says is too small for my eyes; I haven't asked my ophthalmologist for advice on my dental implants, but I have been reading as much off screens as off paper since the late 1980s, so any kind of sense I might once have had for the aesthetics of typography must surely have been strangled in the crib.

When I'm home, I read books.

NoMoreNicksLeft • 10 days ago

Won't this end up in Apple iBooks or whatever it's called now? Most novels can be a megabyte or more of text, pretty much all of it needing to be wrapped.

CharlesW • 10 days ago

It seems more likely that Apple would've adapted this from the proven technology that they currently use for Apple Books and everything else, TextKit (which first appeared in OpenStep). https://developer.apple.com/videos/play/wwdc2021/10061/

addaon • 10 days ago

> Apple Books and everything else

Can't speak to Apple Books, but at least Pages.app (and iWork in general) use a separate text engine from TextKit, focused on higher fidelity at the cost of performance -- optical kerning, etc. (Terminal.app also does not use TextKit.)

alwillis • 10 days ago

Doubtful.

OpenStep used Display Postscript and was written in Objective-C; WebKit is written in C++.

Rendering text on the web is a different animal all together.

NoMoreNicksLeft • 9 days ago

I was under the impression that when we got new css in Safari, in the next software cycle those same features ended up in Books. It wouldn't make sense to give it a different rendering engine... but then I've never been able to find much in the way of which epub readers used which rendering engines anywhere.

taeric • 10 days ago

I mean, not wrong. But optimizing over a megabyte's worth of text is almost certainly not going to take a lot of time. Especially as there will be chapter stops. Such that we are down to, what 100k of text per chapter to layout?

Again, I won't claim it is absolutely free. It is almost certainly negligible in terms of processing power involved with any of the things we are talking about.

porphyra • 10 days ago

But with modern hardware, running the dynamic programming solution to this optimization problem takes a trivial amount of cycles* compared to rendering your typical React webapp.

* for most webpages. Of course you can come up with giant ebooks or other lengthy content for which this will be more challenging.

ta988 • 10 days ago

even on 5 pages documents LaTeX can spend a surprising amount of time

taeric • 10 days ago

What five page documents? I've seen it zip through far larger texts on the regular.

And is most of the effort from LaTeX in paragraph layout? Most slowness there, is in Mathematica typesetting, I would think.

lttlrck • 10 days ago

Is Latex text wrapping known to be well optimized?

taeric • 10 days ago

The main algorithm most any folks know by name for doing this was created for it, so sorta? I don't know that it is necessarily better than any closed source options. I was under the impression it is the baseline good, though.

__david__ • 9 days ago

Its author was supposedly pretty good at algorithms. I think he may have even written a book or four about them. So I suspect it’s decently optimized.

throw0101d • 9 days ago

> That's part of the reason LaTeX has such good text wrapping -- it can spend serious CPU cycles on the problem because it doesn't do it in real time.

Is that the reason the Microsoft Word team tells themselves as well?

We have multi-core, multi-gigahertz CPUs these days: there aren't cycles to spare to do this?

queuebert • 9 days ago

You would think, but Word has some serious efficiency problems that I can't explain. For one, it is an order of magnitude slower at simply performing word counts than tools like wc or awk. Besides that, the problem does not parallelize well, due to the long-range dependency of line breaks.

Zooming in a bit, Word also does not kern fonts as well as LaTeX, so it might be missing some logic there that would trickle down into more beautiful word spacing and text flow.

toomim • 9 days ago

It's a O(n^2) to O(n!) problem, not O(n), so it doesn't scale linearly with cpu cores.

taeric • 9 days ago

Sorta? For one, you don't always have to do full site optimization for this problem. As such, the "N" you are working over is going to be limited to the number of breaks within a section you are looking at. And you can probably divide work across the sections quite liberally on a page.

Yes, there can be edge cases where optimizing one section causes another section to be resized. My gut is that that is the exception, not the norm. More, for most of the resizes that will lead folks to do, it will largely result in a "card" being moved up/down in such a way that the contents of that card do not need to be re-optimized.

Yes, you could make this even harder by optimizing what the width of a section should be and flowing another section around it. But how many sites actually do something like that?

jcelerier • 9 days ago

to be honest as a LaTeX user on a very beefy CPU I regularly have 30s+ of build times for larger documents. I doubt Word users would want that. A simple 3 page letter without any graphics is a couple seconds already.

taeric • 9 days ago

I'd hazard most of that 30s+ build time is not in the line wrapping algorithm. Most slow documents are either very heavy in math typesetting, or loaded with many cross references needing multiple passes through TeX to reconcile.

To be clear, just because I would hazard that, does not mean I'm right. Would love to see some benchmarks.

xhkkffbf • 9 days ago

Mine are slow because of large images. If I set "draft" mode, it speeds up dramatically. I don't know why latex needs to read the entire image file but it seems to do that.

throw0101c • 9 days ago

> to be honest as a LaTeX user on a very beefy CPU I regularly have 30s+ of build times for larger documents.

A lot of the time folks are sitting and thinking and things are idle. Perhaps Word could 'reflow' text in the background (at least the parts that are off screen)? Also, maybe the saved .docx could perhaps have hints so that on loading things don't have be recalculated?

taeric • 9 days ago

Oh dear lord, the last thing I ever wanted Word to do was to try and reflow things. I can't be the only person that tried to move an image only to have it somehow disappear into the ether while Word tried to flow around it. :D

int_19h • 10 days ago

Lest we forget, TeX is almost 50 years old now, so what constitutes "serious CPU cycles" has to be understood in the context of hardware available at the time.

setopt • 10 days ago

TeX is still slow to compile documents on my current device (MacBook M1), especially when compared to e.g. Typst. I can only imagine how slow it would have been on a 40yo computer.

xigoi • 10 days ago

How much of that time is spent on text wrapping?

jgalt212 • 10 days ago

Computerphile did a nice video on this.

https://www.youtube.com/watch?v=kzdugwr4Fgk

The Kindle Text Problem - Computerphile

jkmcf • 10 days ago

That's true, but I think the OP is commenting on the state of FE development :)

taeric • 10 days ago

Largely, yes. I also challenge if it would be measurable for the size of most sites.

Typesetting all of wikipedia? Probably measurable. Typesetting a single article of wikipedia? Probably not. And I'd wager most sites would be even easier than wikipedia.

frereubu • 10 days ago

You've phrased your comment as if it's a counterpoint to OP, but it's not - both can be true (and from personal experience OP is absolutely right).

watersb • 9 days ago

The best study on the optimization of line breaking algorithms is now only on the Internet Archive. Lots of examples.

"Line Breaking", xxyxyz.org

https://web.archive.org/web/20171021044009/http://xxyxyz.org...

_moof • 10 days ago

Same. I read that and think, oh NOW you all are worried about performance?

zigzag312 • 10 days ago

I once did a naive text wrapping implementation for a game and with a longer text it caused performance to drop way below 60 FPS.

This was on a 4.5 GHz quad core CPU. Single threaded performance of todays top CPUs is only 2-3x faster, but many gamers now have 144Hz+ displays.

pcwalton • 10 days ago

Remember the days of Zuck saying "going with HTML5 instead of native was our biggest mistake"? Though hardware improvements have done a lot to reduce the perceptible performance gap between native and the Web, browser developers haven't forgotten those days, and layout is often high in the profile.

jcelerier • 9 days ago

I have to consider the performance of rendering text literally all time, even without wrapping. This is one of the most gluttonous operations when rendering a UI if you want, say, 60 fps on a raspberry pi zero.

dominicrose • 10 days ago

Basically everything that comes built-in a browser has to be perform well in most use-cases and most devices. We don't want an extra 5% quality at the cost of degraded performance.

0cf8612b2e1e • 10 days ago

Open any random site without an ad blocker and it is clear that nobody cares about well optimized sites.

Telemakhos • 10 days ago

Very likely the site is well optimized. It's optimized for search engines, which is why we found the site in the first place, which is in turn the reason I said "very likely" in the first sentence: we come upon web sites not truly randomly but because someone optimized them for search ranking. It also appears from your "without an ad blocker" that the site may be optimized for ad revenue, monetizing our visit as much as possible. There's probably optimization of tracking you in order to support those ads, too.

What you're complaining about is that the site is not optimized for your reading enjoyment. The site is probably quite well optimized, but your reading enjoyment was not one of the optimizer's priorities. I think we agree about how annoying that is and how prevalent, so the news that new typographical features are coming seems to me like good news for those of us who would appreciate more sites that prioritize us the readers over other possible optimization strategies.

taeric • 9 days ago

I want to believe you. I just can't bring myself to agree, anymore. Most sites are flat out not optimized, at all. Worse, many of them have instrumentation buckled on to interface with several different analytics tools.

And to be clear, most sites flat out don't need to be optimized. Laying out the content of a single site's page is not something that needs a ton of effort put into it. At least, not a ton in comparison to the power of most machines, nowadays.

This is why, if I open up GMail in the inspector tab, I see upwards of 500+ requests in less than 10 seconds. All to load my inbox, which is almost certainly less than the 5 megs that has been transferred. And I'd assume GMail has to be one of the more optimized sites out there.

Now, to your point, I do think a lot of the discussion around web technologies is akin to low level assembly discussions. The markup and script layout of most sites is optimized for development of the site and the creation of the content far more than it is for display. That we have moved to "webpack" tricks to optimize rendering speaks to that.

lcnPylGDnU4H9OF • 10 days ago

The developer in such a case is only allowed to care as much as the PM.

alwillis • 10 days ago

I just can't bring myself to believe there is a meaningfully sized group of developers that have considered the performance of text-wrapping.

You're kidding, right? There are a ton of non-trivial edge cases that have to be considered: break points, hyphenation, other Latin-based languages, etc.

From a Google engineer's paper describing the challenges: https://docs.google.com/document/d/1jJFD8nAUuiUX6ArFZQqQo8yT...

    Performance Considerations
    While the `text-wrap: pretty` property is an opt-in to accept slower
    line breaking, it shouldn’t be too slow, or web developers can’t use
     them due to their performance restrictions.
    
    The pinpoint result when it is enabled for all web_tests is in this CL.
    Complexity
    The score-based algorithm has different characteristics from the
    bisection algorithm. The bisection algorithm is O(n * log w) where n is
    the number of lines and w is the sum of spaces at the right end. The
    score-based algorithm is O(n! + n) where n is the number of break
    opportunities, so it will be slower if there are many break
     opportunities, such as when hyphenation is enabled.
    
    Also, computing break opportunities are not cheap; it was one of
    LayoutNG's optimizations to minimize the number of computing break
     opportunities. The score-based algorithm will lose the benefit.

    Last 4 Lines
    Because computing all break opportunities is expensive, and computing
    the score is O(n!) for the number of break opportunities, the number of
    break opportunities is critical for the performance. To minimize the
    performance impact, the implementation caches 4 lines ahead of the
     layout.
    Before laying out a line, compute line breaking of 4 lines ahead of the
     layout.
    If it finds the end of the block or a forced break, compute the score
     and optimize line breaks.
    Otherwise layout the first line from the greedy line breaking results,
     and repeat this for the next line.
    The line breaking results are cached, and used if the optimizer decided
     not to apply, to minimize the performance impact.
    
    Currently, it applies to the last 4 lines of each paragraph, where
    “paragraph” is content in an inline formatting context separated by
     forced breaks.
    The Length of the Last Line
    Because the benefit of the score-based line breaking is most visible
    when the last line of the paragraph is short, a performance
    optimization is to kick the optimizer in only when the last line is
     shorter than a ratio of the available width.
        
    Currently, it applies only when the last line is equal to or less than ⅓
    of the available width.
    Checking if the last line has only a single word
    Checking if the last line has only a single word (i.e. no break
     opportunities) requires running the break iterator, but only once.

taeric • 9 days ago

You mistake my comment to mean it isn't hard. It is absolutely a difficult problem with serious edge cases. There are people that have studied it quite heavily.

They are still not a sizeable number in comparison to the number of devs that have enabled different text wrap options. Most of which do not give much thought to a setting that did not appreciably slow things down at all.

mdhb • 9 days ago

I think this is the kind of nonsense the Safari team tells themselves so that they can continue to ship absolutely fucking nonsense features like this while ignoring anything that encroaches on the idea of the web becoming a meaningful competitor to their walled garden where they are able to rip off every single person involved every time a transaction occurs.