zozbot234 10 days ago

> Maybe Cycorp's knowledge base will be made generally accessible at some point, so that it can be used to train LLMs.

More likely, it will be made increasingly irrelevant as open alternatives to it are developed instead. The Wikipedia folks are working on some sort of openly developed interlingua that can be edited by humans, in order to populate Wikipedias in underrepresented languages with basic encyclopedic text. (Details very much TBD, but see https://en.wikipedia.org/wiki/Abstract_Wikipedia and https://meta.wikimedia.org/wiki/Abstract_Wikipedia) This will probably be roughly as powerful as the system OP posits at some point in the article, one that can generate text in both English and Japanese but only if fed with the right "common sense" to begin with. It's not clear exactly how useful logical inference on such statements might turn out to be, but the potential will definitely exist for something like that too, if it's found to be genuinely worthwhile in some way.

Rochus 10 days ago

> made increasingly irrelevant as open alternatives to it are developed instead

It's certainly interesting what these projects are going for, but they're unlikely to be an "open alternative": the degree of formalization and rigor achieved by Cyc's higher-order logic specification is likely not achievable by statistical learning, and a symbolic approach could hardly get there in less time than Cyc took.

yowzadave 10 days ago

It would be very surprising if the results from this approach were superior to simply machine-translating the entries from another language—because e.g. English already has so much content and contributor activity, and LLMs are already very good at translating. I can’t imagine you’d get more than a fraction of people’s interest in authoring entries in this abstract language.

yorwba 10 days ago

LLMs are good at translating between languages that have significant amounts of written content on the internet. There are few languages in this category that do not already have correspondingly large Wikipedias.

There are plenty of languages with millions of speakers that are only rarely used in writing, often because some other language is enforced in education. If you try to use an LLM to translate into such a language, you'll just get garbage.

It's very easy for a hand-crafted template to beat an LLM if the LLM can't do the job at all.
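
To make the template point concrete, here is a minimal sketch of rendering one language-neutral fact into two languages. Everything in it is an assumption for illustration (the `LABELS` and `TEMPLATES` tables, the `Q_TOKYO`-style identifiers, and the `render` function); it is not how Abstract Wikipedia is actually designed, since those details are still TBD.

```python
# Hypothetical per-language labels for the entities a fact can mention.
LABELS = {
    "Q_TOKYO": {"en": "Tokyo", "ja": "東京"},
    "Q_JAPAN": {"en": "Japan", "ja": "日本"},
}

# Hand-crafted sentence templates: one per predicate, per target language.
TEMPLATES = {
    "capital_of": {
        "en": "{subject} is the capital of {object}.",
        "ja": "{subject}は{object}の首都である。",
    },
}


def render(fact, lang):
    """Render a (predicate, subject, object) fact as a sentence in `lang`."""
    predicate, subject, obj = fact
    template = TEMPLATES[predicate][lang]
    return template.format(
        subject=LABELS[subject][lang],
        object=LABELS[obj][lang],
    )


fact = ("capital_of", "Q_TOKYO", "Q_JAPAN")
print(render(fact, "en"))
print(render(fact, "ja"))
```

Adding a new language here means writing one template per predicate and one label per entity, which a handful of speakers can do even when there is almost no written corpus for a model to learn from.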

yellowapple 10 days ago

https://www.wikidata.org/wiki/Wikidata:Main_Page, for those curious about the interlingua in question.

zozbot234 10 days ago

Strictly speaking, Wikidata is an existing project which only provides a rather restrictive model for its assertions; they are not fully compositional, thus are quite far from being able to express general encyclopedic text, especially in a way that can be 'seamlessly' translated to natural language. It does provide a likely foundation for these further planned developments, though.
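
A rough sketch of what "not fully compositional" means here, under my own simplifying assumptions: `FlatStatement` and `Expr` are hypothetical types made up for this comment, not Wikidata's real data model (which also has qualifiers and references). The point is only that a flat statement's value is an entity or literal, while a compositional representation lets one expression be an argument of another.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class FlatStatement:
    """Subject-property-value assertion; the value is never another statement."""
    subject: str
    prop: str
    value: str


@dataclass(frozen=True)
class Expr:
    """Compositional node: arguments may be plain strings or nested Exprs."""
    head: str
    args: Tuple[Union["Expr", str], ...]


def depth(node):
    """Nesting depth of an expression; plain strings count as depth 0."""
    if isinstance(node, str):
        return 0
    return 1 + max((depth(arg) for arg in node.args), default=0)


# A flat statement can say "Tokyo is the capital of Japan" ...
flat = FlatStatement("Tokyo", "capital_of", "Japan")

# ... but wrapping a whole assertion inside another one needs nesting,
# e.g. "it is not the case that Kyoto is the capital of Japan".
nested = Expr("not", (Expr("capital_of", ("Kyoto", "Japan")),))

print(depth(nested))
```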