Foundation models

From creating images to telling jokes – foundation models and the next generation of AI

If you’ve seen photos of a teapot shaped like an avocado, or read a well-written article that veers off on slightly weird tangents, you may have been exposed to a new trend in artificial intelligence (AI).

Machine-learning systems with names like DALL-E, GPT and PaLM are making a splash with their incredible ability to generate creative work.

These systems are known as “foundation models”, and they are not all hype and party tricks. So how does this new approach to AI work? And will it be the end of human creativity and the start of a deep-fake nightmare?

1. What are foundation models?

Foundation models work by training a single huge system on large amounts of general data, then adapting the system to new problems. Earlier models tended to start from scratch for each new problem.

DALL-E 2, for example, was trained to match images (such as a photo of a pet cat) with captions that describe them (“Mr. Fuzzyboots the tabby cat relaxing in the sun”) by scanning hundreds of millions of examples. Once trained, the model knows what cats (and other things) look like in pictures.
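The matching idea can be sketched with a toy example: map both images and captions to vectors, then pick the image whose vector is most similar to the caption’s. The embeddings below are made up for illustration; DALL-E 2’s real training learns such vectors from hundreds of millions of image-caption pairs, and its architecture is far more complex.

```python
import math

def cosine_similarity(a, b):
    """Similarity between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical, hand-made embeddings (real ones are learned, and much longer)
image_embeddings = {
    "photo_of_tabby_cat": [0.9, 0.1, 0.2],
    "photo_of_koala":     [0.1, 0.8, 0.3],
}
caption = [0.85, 0.15, 0.25]  # "Mr. Fuzzyboots the tabby cat relaxing in the sun"

# The best-matching image is the one whose vector is closest to the caption's
best = max(image_embeddings,
           key=lambda name: cosine_similarity(image_embeddings[name], caption))
print(best)  # photo_of_tabby_cat
```

Because the same vectors work in both directions, a model trained this way can also be run “backwards” — searching for (or generating) an image to suit a new caption.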

But the model can also be used for many other interesting AI tasks, such as generating new images from a caption alone (“Show me a koala dunking a basketball”) or editing images based on written instructions (“Make it look like this monkey is paying taxes”).

2. How do they work?

Foundation models run on “deep neural networks”, which are loosely inspired by how the brain works. These involve sophisticated mathematics and enormous computing power, but they boil down to a very sophisticated kind of pattern matching.

For example, by examining millions of example images, a deep neural network can associate the word “cat” with pixel patterns that often appear in images of cats – like soft, fuzzy, furry blobs of texture. The more examples the model sees (the more data it is shown), and the larger the model (the more “layers” or “depth” it has), the more complex these patterns and correlations can be.
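The “pattern matching” idea can be illustrated with a drastically simplified sketch: classify a tiny “image” by which stored pixel pattern it most resembles. Real deep networks learn millions of patterns composed across many layers; the four-number patterns and labels below are invented purely for illustration.

```python
def distance(a, b):
    """Squared distance between two pixel patterns: smaller = more alike."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Hypothetical "learned" patterns (a real network's are learned from data
# and vastly higher-dimensional)
learned_patterns = {
    "cat": [0.8, 0.7, 0.9, 0.6],  # e.g. fuzzy, high-texture patches
    "car": [0.2, 0.1, 0.3, 0.2],  # e.g. smooth, low-texture patches
}

new_image = [0.7, 0.8, 0.8, 0.5]

# Pick the label whose stored pattern the new image matches best
label = min(learned_patterns,
            key=lambda k: distance(learned_patterns[k], new_image))
print(label)  # cat
```

More data means more (and better) stored patterns; more depth means patterns can be built out of other patterns – edges into fur, fur into cats.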

Foundation models are, in a sense, just an extension of the “deep learning” paradigm that has dominated AI research for the past decade. However, they exhibit unprogrammed or “emergent” behaviours that can be both surprising and novel.

For example, Google’s PaLM language model appears able to produce explanations for complicated metaphors and jokes, going beyond simply imitating the kinds of data it was originally trained on.

A user interacts with the PaLM language model by typing questions; the AI system responds with typed answers.

3. Access is limited – for now

The scale of these AI systems is hard to fathom. PaLM has 540 billion parameters – meaning that even if everyone on the planet memorised 50 numbers, we still wouldn’t have enough storage to hold the whole model.
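The arithmetic behind that comparison is easy to check (assuming a world population of roughly 8 billion people):

```python
# Back-of-the-envelope check on PaLM's scale
palm_parameters = 540_000_000_000   # 540 billion parameters
population = 8_000_000_000          # ~8 billion people (assumed)
numbers_each = 50                   # numbers memorised per person

total_memorised = population * numbers_each   # 400 billion
shortfall = palm_parameters - total_memorised # 140 billion

print(total_memorised < palm_parameters)  # True: humanity still falls short
```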

The models are so huge that training them requires massive amounts of computing and other resources. One estimate put the cost of training OpenAI’s GPT-3 language model at around $5 million.

As a result, only big tech companies such as OpenAI, Google and Baidu can afford to build foundation models at the moment. These companies limit who can access the systems, which makes economic sense.

Usage restrictions can reassure us that these systems will not be used for nefarious purposes (such as generating false news or defamatory content) anytime soon. But it also means independent researchers are unable to interrogate these systems and share findings in an open and accountable way. We therefore do not yet know all the implications of their use.

4. What will these models mean for the “creative” industries?

Other foundation models will be produced in the years to come. Smaller models are already being released in open source forms, tech companies are beginning to experiment with licensing and commercializing these tools, and AI researchers are working hard to make the technology more efficient and accessible.

The remarkable creativity exhibited by models such as PaLM and DALL-E 2 demonstrates that creative professional jobs could be impacted by this technology sooner than originally anticipated.

Conventional wisdom has always said that robots will displace “blue collar” jobs first. “White-collar” work was supposed to be relatively immune to automation – especially professional work that required creativity and training.

Deep-learning AI models already exhibit superhuman accuracy on tasks such as reading x-rays and detecting the eye condition macular degeneration. Foundation models could soon provide cheap, “good enough” creativity in fields such as advertising, copywriting, stock imagery and graphic design.

The future of professional and creative work might be a little different than we expected.

5. What this means for legal evidence, news and media

Foundation models will inevitably affect the law in areas such as intellectual property and evidence, as we will not be able to assume that creative content is the result of human activity.

We will also have to confront the challenge of misinformation and disinformation generated by these systems. We already face enormous disinformation problems, as we see in the Russian invasion of Ukraine and the emerging issue of deep-fake images and videos, but foundation models are poised to supercharge those challenges.


As researchers who study the effects of AI on society, we believe foundation models will bring enormous transformations. They are tightly controlled (for now), so we probably have a little time to understand their implications before they become a huge problem.

The genie isn’t quite out of the bottle yet, but the foundation models are a very big bottle – and inside is a very clever genie.

Aaron Snoswell is Postdoctoral Fellow, Computational Law & AI Accountability, Queensland University of Technology; Dan Hunter is Executive Dean of the Faculty of Law at Queensland University of Technology. This article is republished from The Conversation under a Creative Commons license. Read the original article.