If you were working with code that was proprietary, you probably shouldn't of been using cloud hosted LLMs anyways, but this would seem to seal the deal.
I think you probably mean "shouldn't have". There is no "shouldn't of".
Which gives you an opening for the excellent double contraction “shouldn’t’ve”
The letter H deserves better.
The funniest part is that in that contraction the first apostrophe does denote the elision of a vowel, but the second one doesn’t, the vowel is still there! So you end up with something like [nʔəv], much like as if you had—hold the rotten vegetables, please—“shouldn’t of” followed by a vowel.
Really, it’s funny watching from the outside and waiting for English to finally stop holding it in and get itself some sort of spelling reform to meaningfully move in a phonetic direction. My amateur impression, though, is that mandatory secondary education has made “correct” spelling such a strong social marker that everybody (not just English-speaking countries) is essentially stuck with whatever they have at the moment. In which case, my condolences to English speakers, your history really did work out in an unfortunate way.
> phonetic
A phonetic respelling would destroy the languages, because there are too many dialects without matching pronunciations. Though rendering historical texts illegible, a phonemic approach would work: https://en.wiktionary.org/wiki/Appendix:English_pronunciatio... But that would still mean most speakers have 2-3 ways of spelling various vowels. There are some further problems with a phonemic approach: https://alexalejandre.com/notes/phonetic-vs-phonemic-spellin...
Here's an example of a phonemic orthography, which is somewhat readable (to me) but illustrates how many diacritics you'd need. And it still spells the vowel in "ask" or "lot" with the same ä! https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
> A phonetic respelling would destroy the languages, because there are too many dialects without matching pronunciations.
Not only that, but since pronunciation tends to diverge over time, it will create a never-ending spelling-pronunciation drift where the same words won't be pronounced the same in, e.g. 100-200 years, which will result in future generations effectively losing easy access to the prior knowledge.
> since pronunciation tends to diverge over time, it will create a never-ending spelling-pronunciation drift
Once you switch to a phonetic respelling this is no longer a frequent problem. It does not happen, or at least happens very rarely with existing phonetic languages such as Turkish.
In the rare event that the pronunciation of a sound changes in time, the spelling doesn't have to change. You just pronounce the same letter differently.
If it's more than one sound, well, then you have a problem. But it happens in today's non-phonetic English as well (such as "gost" -> "ghost", or more recently "popped corn" -> "popcorn").
> Once you switch to a phonetic respelling this is no longer a frequent problem
Oh, but it does. It's just that the standard is held as the official form of the language and dialects are killed off through standardized education etc. To do this in English would e.g. force all Australians, Englishmen etc. to speak like an American (when in the UK different cities and social classes have quite divergent usage!) This clearly would not work and would cause the system to break apart. English exhibits very minor diglossia, as if all Turkic peoples used the same archaic spelling but pronounced it their own ways, e.g. tāg, kök, quruq, yultur etc. which Turks would pronounce as dāg, gök, yıldız etc. but other Turks today say gurt for kurt, isderik, giderim okula... You just say they're "wrong" because the government chose a standard (and Turkic peoples outside of Turkey weren't forced to use it).
As a native English speaker, I'm not even sure how to pronounce "either" (how it should be done in my dialect) and seemingly randomly reduce sounds. We'd have to change a lot of things before being able to agree on a single right version and slowly making everyone speak like that.
> dialects are killed off through standardized education etc.
Sorry, I didn't mean that it would be a smooth transition. It might even be impossible. What I wrote above is (paraphrasing myself) "Once you switch to a phonetic respelling [...] pronunciation [will not] tend to diverge over time [that much]". "Once you switch" is the key.
> To do this in English would e.g. force all Australians, Englishmen etc. to speak like an American
Why? There is nothing that prevents Australians from spelling some words differently (as we currently do, e.g. colour vs color, or tyre vs tire).
There's no particular reason why e.g. Australian English should have the same phonemic orthography as American English.
Nor is it some kind of insurmountable barrier to communication. For example, Serbian, Croatian, and Bosnian are all varieties of the same language with some differences in phonemes (like i/e/ije) and the corresponding differences in standard orthographies, but it doesn't preclude speakers from understanding each other's written language any more than it precludes them from understanding each other's spoken language.
> Serbian, Croatian and Bosnian
are based on the exact same Štokavian dialect, ignoring Kajkavian, Čajkavian, Čakavian and Torlakian dialects. There is _no_ difference in standard orthography, because yat reflexes have nothing to do with national boundaries. Plenty of Serbs speak Ijekavian, for example. Here is a dialect map: https://www.reddit.com/media?url=https%3A%2F%2Fi.redd.it%2Fc...
Your example is literally arguing that Australian English should have the same _phonetic_ orthography, even. But Australian English must have the same orthography or else Australia will no longer speak English in 2-3 generations. The difference between Australian and American English is far larger than between modern varieties of naš jezik. Australians code-switch when talking to foreigners while Serbs and Croats do not.
> There is _no_ difference in standard orthography, because yat reflexes have nothing to do with national boundaries
But there is, though, e.g. "dolijevati" vs "dolivati". And sure, standard Serbian/Montenegrin allows the former as well, but the latter is not valid in standard Croatian orthography AFAIK. That this doesn't map neatly to national borders is irrelevant.
If Australian English is so drastically different that Australians "won't speak English in 2-3 generations" if their orthography is changed to reflect how they speak, that would indicate that their current orthography is highly divergent from the actual spoken language, which is a problem in its own right. But I don't believe that this is correct - Australian English content (even for domestic consumption, thus no code switching) is still very much accessible to British and American English speakers, so any orthography that would reflect the phonological differences would be just as accessible.
By tautology, if you split the language, you split the language. Different groups will exhibit divergent evolution.
> current orthography is highly divergent from the actual spoken language, which is a problem in its own right
The orthography is no more divergent from an Australian's speech than from an American's, let alone a Londoner's or an Oxfordian's. But why would it be a problem?
The need for regular re-spelling and problems it introduces are precisely my point.
Consider three English words that have survived over the multiple centuries and their respective pronunciation in Old English (OE), Middle English around the vowel shift (MidE) and modern English, using the IPA: «knight», «through» and «daughter»:
«knight»: [knixt] or [kniçt] (OE) ↝ [kniçt] or [knixt] (MidE) ↝ [naɪt] (E)
«through»: [θurx] (OE) ↝ [θruːx] or [θruɣ] (MidE) ↝ [θruː] (E)
«daughter»: [ˈdoxtor] (OE) ↝ [ˈdɔuxtər] or [ˈdauxtər] (MidE) ↝ [ˈdɔːtə] (E)
It is not possible for a modern English speaker to collate [knixt] and [naɪt], [θurx] and [θruː], [ˈdoxtor] and [ˈdɔːtə] as the same word in each case. Regular re-spelling results in a loss of linguistic continuity, particularly over a span of a few centuries or more.
Interesting, just how much the Old English words sound like modern German: Knecht, durch and Tochter. Even after 1000 years have elapsed.
Modern German didn't undergo the Norman Conquest, a mass influx of West African slaves, or an Empire on which the Sun never set, so it is much more conservative. The incredible thing about the Norman Conquest, linguistically speaking, is that English survived at all.
The great vowel shift happened in the 16th century and is responsible for most of these changes. The original grammatical simplification (loss of cases etc.) between roughly 1000 and 1300 is difficult to ascribe, as something similar happened in continental Scandinavian languages (and the Swedes had their own vowel dance!) But the shift in words themselves came much after (and before empire).
English also shows a remarkable variation in pronunciation of words even for a single person. I don't know of any other language where, even in careful formal speech, words can just change pronunciation drastically based on emphasis. For example, the indefinite article "a" can be pronounced as either [ə] (schwa, for the weak form) or "ay" (strong form). "the" can be "thə" or "thee". Similar things happen with "an", "can", "and", "than", "that" and many, many other such words.
We had a spelling reform or two already; unfortunately they were stupid, e.g. the b in "doubt" has never been pronounced in English. https://en.m.wiktionary.org/wiki/doubt
That said, phonetic spelling reform would of course privilege the phonemes as spoken by whoever happens to be most powerful or prestigious at the time (after all, the only way it could possibly stick is if it's pushed by the sufficiently powerful), and would itself fall out of date eventually anyway.
> but the second one doesn’t, the vowel is still there!
Isn't the "a" in "have" elided along with the "h?"
Shouldn't've = Should not have
What am I missing?
Even though the vowel "a" is dropped from the spelling, if you actually say it out loud, you do pronounce a vowel sound when you get to that spot in the word, something like "shouldn'tuv", whereas the "o" in "not" is dropped from both the spelling and the pronunciation.
The pronounced vowel is different than the 'a' in 'have'. And the "h" is definitely elided.
Many English dialects elide "h" at the beginning even when nothing is contracted. The pronounced vowel is different mostly because it's unstressed, and unstressed vowels in English generally centralize to schwa or nearly so.
Don’t worry about us. English is truly a horrible language to learn, and I feel bad for anyone who has to learn it.
Also I have always liked this humorous plan for spelling reform: https://guidetogrammar.org/grammar/twain.htm
The node for it on Everything2 makes it a little bit easier to follow with links to the English word. https://everything2.com/title/A+Plan+for+the+Improvement+of+...
So, it's something like:
For example, in Year 1 that useless letter "c" would be dropped to be [replased](replaced) either by "k" or "s", and likewise "x" would no longer be part of the alphabet.
It becomes quite useful in the later sentences as more and more reformations are applied. English being particularly difficult is just a meme; only the orthography is confusing.
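For fun, the Year 1 rule quoted above is mechanical enough to sketch in a few lines of Python. This is only a rough illustration under assumptions of mine, not anything from the linked text: "c" becomes "s" before e, i or y and "k" everywhere else, "ch" is left untouched, and "x" becomes "ks". The function name `year_one` is made up.

```python
# Rough sketch of the "Year 1" rule: drop "c" (replaced by "k" or "s") and drop "x".
# Assumptions (mine): soft "c" is "c" before e/i/y; "ch" is left alone; "x" -> "ks".
SOFT_FOLLOWERS = {"e", "i", "y"}


def year_one(text: str) -> str:
    out = []
    for i, ch in enumerate(text):
        # Look one character ahead to decide between "s" and "k".
        nxt = text[i + 1].lower() if i + 1 < len(text) else ""
        low = ch.lower()
        if low == "c" and nxt != "h":
            new = "s" if nxt in SOFT_FOLLOWERS else "k"
        elif low == "x":
            new = "ks"
        else:
            out.append(ch)
            continue
        # Preserve capitalisation of the replaced letter.
        out.append(new.capitalize() if ch.isupper() else new)
    return "".join(out)


print(year_one("replaced"))  # replased
print(year_one("case"))      # kase
```

It ignores real pronunciation entirely (a hard "c" before "e", as in "soccer", would come out wrong), which is sort of the point of the joke.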
English spelling is pretty bad, but spoken English isn't terrible, is it? It's the most popular second language.
English is rather complex phonologically. Lots of vowels for starters, and if we're talking about American English these include the rather rare R-colored vowels - but even without them things are pretty crowded, e.g. /æ/ vs /ɑ/ vs /ʌ/ ("cat" vs "cart" vs "cut") is just one big WTF to anyone whose language has a single "a-like" phoneme, which is most of them. Consonants have some weirdness as well - e.g. a retroflex approximant for a primary rhotic is fairly rare, and pervasive non-sibilant coronals ("th") are also somewhat unusual.
There are certainly languages with even more spoken complexity - e.g. 4+ consonant clusters like "vzdr" typical of Slavic - but even so spoken English is not that easy to learn to understand, and very hard to learn to speak without a noticeable accent.
You never realize how many weird rules, weird exceptions, ambiguities, and complete redundancies there are in this language until you try to teach English, which will also probably teach you a bunch of terms and concepts you've never heard of. Know what a gerund is? Then there's things we don't even think about that challenge even advanced foreign learners, like when you use which articles: the/a.
English popularity was solely and exclusively driven by its use as a lingua franca. As times change, so too will the language we speak.
Every real, non-constructed language has weird rules, weird exceptions, ambiguities, and complete redundancies. English is on the more difficult end but it's not nearly the most difficult. I'm not sure how it got to be perceived as this exceptionally tough language just because pronunciation can be tough. Other languages have pronunciation ambiguities too...
The thing is that English takes in words from other languages and keeps doing so, which means that there are several phonetic systems in use already. It's just that they use the same alphabet so you can't tell which one applies to which word.
There are occasional mixed horrors like "ptarmigan", which is a Gaelic word which was Romanized using Greek phonology, so it has the same silent p as "pterodactyl".
There's no academy of the English language anyway, so there's nobody to make such a change. And as others have said, the accent variation is pretty huge.
That used to be the case, but "shouldn't of" is definitely becoming more popular, even if it seems wrong. Languages change before our eyes :)
Why not? Assuming you believe you can use any cloud for backup or Github for code storage.
IIUC one reason is that prompts and other data sent to 3rd party LLM hosts have the chance to be funneled to 4th party RLHF platforms, e.g. Sagemaker, Mechanical Turks, etc. So a random gig worker could be reading a .env file the intern uploaded.
What do you mean by "chance"? It's clear that if users have not opted out of model training, their data will be used. If they have opted out, it won't be. And most users are in the first bucket.
Just because training on data is opt-out doesn't mean businesses can't trust it. It's not the best for users' privacy, though.
I think it's fair to question how proprietary your data is.
Like there's the algorithm by which a hedge fund is doing algorithmic trading, they'd be insane to take the risk. Then there's the code for a video game, it's proprietary, but competitors don't benefit substantially from an illicit copy. You ship the compiled artifacts to everyone, so the logic isn't that secret. Copies of similar source code have leaked before with no significant effects.
AFAIK, the actual trading algorithms themselves aren’t usually that far from what you can find in a textbook, their efficacy is mostly dictated by market conditions and the performance characteristics of the implementation / system as a whole.
This very much "depends".
Many algo strategies are indeed programmatically simple (e.g. use some sort of moving average), but the parametrization and how it's used is the secret sauce and you don't want that information to leak. They might be tuned to exploit a certain market behavior, and you want to keep this secret since other people targeting this same behavior will make your edge go away. The edge can be something purely statistical or it can be a specific timing window that you found, etc.
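To make the "programmatically simple" part concrete, here is a minimal sketch of a moving-average crossover signal in Python. It is a textbook toy, not anyone's actual strategy; the function names and the `fast`/`slow` window parameters are my own illustrative choices. The point is that the code is trivial, while the values you pick for the windows (and when you act on the signal) are the part nobody publishes.

```python
from typing import List, Optional


def simple_moving_average(prices: List[float], window: int) -> List[Optional[float]]:
    """Trailing average over `window` prices; None until enough data exists."""
    out: List[Optional[float]] = []
    running = 0.0
    for i, price in enumerate(prices):
        running += price
        if i >= window:
            running -= prices[i - window]  # drop the price that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


def crossover_signals(prices: List[float], fast: int, slow: int) -> List[int]:
    """+1 while the fast average is above the slow one, -1 while below, else 0."""
    fast_ma = simple_moving_average(prices, fast)
    slow_ma = simple_moving_average(prices, slow)
    signals = []
    for f, s in zip(fast_ma, slow_ma):
        if f is None or s is None or f == s:
            signals.append(0)  # not enough data yet, or no clear direction
        else:
            signals.append(1 if f > s else -1)
    return signals


# Hypothetical parameters: fast=2, slow=3 are placeholders, not tuned values.
print(crossover_signals([1, 2, 3, 4, 5, 4, 3, 2, 1], fast=2, slow=3))
```

Changing `fast` and `slow` changes when the signal flips, which is exactly the kind of parametrization a fund would keep private.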
It's a bit like saying that a Formula 1 engine is not that far from what you'd find in a textbook. While it's true that it shares a lot of properties with a generic ICE, the edge comes from a lot of proprietary research that teams treat as secret and definitely don't want competitors to find out.
Most (all?) hedge funds that use AI models explicitly run in-house. People do use commercial LLMs, but in cases where the LLMs are not run in-house, it's against the company policy to upload any proprietary information (and generally this is logged and policed).
A lot of the use is fairly mundane and basically replaces junior analysts. E.g. it's digesting and summarizing the insane amounts of research that is produced. I could ask an intern to summarize the analysis on platinum prices over the last week, and it'll take them a day. Alternatively, I can feed in all the analysis that banks produce to an LLM and have it done immediately. The data fed in is not a trade secret really, and neither is the output. What I do with the results is where the interesting things happen.