The gestalt of AI creations

Using creational neural networks to inspire aesthetic creation

14 min read · Dec 20, 2018

Artificial intelligence can be divided into different fields of application. One of them is content creation.
The majority of AI-created content is dismissed for its uninteresting, predictable, trivial or simply bad results. It is regarded as highly interpolated.
Perhaps we are currently in the stone age of creations through AI.
While I agree with the general opinion and wouldn’t claim the outcomes are genius, I have also noticed that an AI creation can only be as good as its design, and I would try not to blame only the lack of resources.

In the past two years there has been an outburst of AI-created content, using deep learning techniques to train neural networks to re-create something in all kinds of formats. These formats are meant for human consumption and interpretation.

The magic happened when we produced the first creational audio and visual outcomes and started to listen to them. Musical and visual art cannot be judged binarily as good or bad. It lives in between. And so since our trivial AI can now produce things that are somewhere on a large scale between 0 and 1 where there is no wrong and right, it can finally be allowed to exist.

Neural networks have been built to compose new pieces of music.

Source: Performance RNN by Magenta-Tensorflow

Neural networks have been created to draft essays — or tweets:

Source: Automatic Donald Trump Tweet Generator

The act of abstraction

The thought process we go through when we challenge ourselves to abstract a piece of information is to use our intellect to tell the same story in fewer words,

to draw the same picture in fewer strokes.
To try to understand what can be removed, what has to stay — where the essence lies.

Challenging a neural network to abstract leads to the same procedure: it has to recreate the same information in a smaller representation. The closest functional equivalent is compression by size reduction.

Taking an input image of 10 Mbit and turning it into a single piece of information, “house”, is a form of abstraction.
To abstract means to trade a loss of raw information for a clear summary.

(Pooling method, Source: Fei-Fei Li, Andrej Karpathy, and Justin Johnson. 2016. Convolutional neural networks CS231n.)

In a convolutional neural network, information is reduced by convolution and pooling layers (at a lower level: matrix multiplications followed by downsampling), where 1000 pieces of data are compressed into 500 pieces that “try” to contain the same information as the original 1000. Layer by layer, this method lets us go from 1 million pieces of data in the form of raw pixels to a single output category, like a house.
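To make the pooling step more tangible, here is a minimal sketch of 2x2 max pooling in plain Python. The function name `max_pool_2x2` and the tiny 4x4 "image" are made up for illustration; real networks use optimized library layers and learned convolution filters before this step.

```python
def max_pool_2x2(grid):
    """Downsample a 2D list by keeping only the maximum of each 2x2 block."""
    rows, cols = len(grid), len(grid[0])
    pooled = []
    for r in range(0, rows - 1, 2):
        pooled_row = []
        for c in range(0, cols - 1, 2):
            block = (grid[r][c], grid[r][c + 1],
                     grid[r + 1][c], grid[r + 1][c + 1])
            pooled_row.append(max(block))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 4x4 grayscale patch: 16 values go in ...
image = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

# ... and only 4 values come out.
pooled = max_pool_2x2(image)
print(pooled)  # [[6, 2], [2, 9]]
```

Three out of every four values are thrown away, and only the strongest response of each block survives. Stacking such steps is one way a network gets from many pixels to very few numbers.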

1 million data pieces sound like a house?
Look at this image: here are 800,000 pixels and you identify a house.
This skill of image recognition has been translated into the digital world by science and engineering, using machine instructions that tell single and parallel processing units to run programs with the ability to identify.

Once networks were able to identify images, researchers discovered that they could invert the process: give a neural network a category like “house” and have it return a custom-created image of what it thinks a “house” looks like.

After this worked, they started to wonder: what if we were able to alter the AI’s reality by only showing it a very specific selection of input data and then later asked it to draw a “house”? The result would be heavily dependent on this input data.

The concept of carefully selecting input data to match a certain artistic style and then asking the AI to create outputs is called “style transfer”, and this is the technology I decided to dig deeper into.

Most style transfer neural networks up to this point have been trained to recreate larger input images, and they have produced outcomes like these:

(From the left: Ostagram, Prisma AI, Deepart and Wizart AI)

In 2016 my colleagues Tim Suchanek, Alexander Tonn, Marc Mengler and I spent 9 months researching style transfer neural networks and publishing them as a consumer app via the mobile app stores. So with hundreds of neural networks and a couple of terabytes of rendered images still lying around, the archives were a great basis for a closer inspection.

From an aesthetic viewpoint the outcomes were always dependent on the gestalt of the input image. The style was supposed to affect the gestalt, but in fact it never did, except on a granular level that never really affected the overall gestalt of the output image.

The formulated thoughts* of neural networks were too limited for an entertaining dialogue, due to the primitive nature of the neural networks and to technical limitations.
* a thought in this case can be any outcome produced by a neural network

But could the same be assumed for unformulated thoughts?
For thoughts that juuuust have left the region of intangibility?

What if those unformulated thoughts are on the same level as the abstract, hardly graspable thoughts produced by humans?

Translated into a visual language, this could mean that a sketch or raw basic forms are able to express the most important aspects of an image.

Gestalt Psychology

…the human mind (perceptual system) forms a percept or “gestalt”, the whole has a reality of its own, independent of the parts.
– Source: Wikipedia

According to what is known as Gestalt theory, we tend to understand and perceive things as a whole. This perception of an image as a whole is strongly influenced by the forms we find in it. These forms are identified through contrast, color, three-dimensionality and so on.

Paul Rand (1914–1996), designer of the IBM logo among other things, once published a piece called “The language of form”, categorizing the attributes of form. He tried to capture visual gestalt in words and illustrations and to communicate it to other people, hoping it would help them make better design.

I was fascinated by the potential of AI being able to impact the gestalt of images when creating new ones — potentially introducing new gestalts we haven’t seen yet.

But the style transfer neural networks were not originally created to affect the gestalt of an image, so I wondered if I could tweak them and create situations where gestalt-impacting outcomes occur.

Aesthetics — the taste of beauty

From a consumer perspective, aesthetics are artistic outcomes that have the ability to entertain us. The problem with every system for creation is that the entertainment factor of its outcomes drops the moment we get used to the system.

And it sure happens really fast when a stylized outcome created by AI seems like a mere change of the color palette.

The challenge was to find an extreme point in these style transfer neural networks that allows us to be entertained, because we have reached a point in the system where it gives us something we want but didn’t ask for.
Much like good music and art: we know we love it, but we could never describe what we want before we see it. And while artists are the masters of the gestalt of their works, they could never predict that they will create a masterpiece before it’s more or less crystallized and finalized.

During my studies at the HfG Karlsruhe on the aesthetics of neural networks, led by Prof. Matteo Pasquinelli (2016–2017), our group of students created various outcomes with neural networks and discussed them together.
When creating new music pieces from MIDI datasets, we discovered that we often found tiny bits and pieces within larger outcomes that were quite interesting. We described them as local minima with expressive outcome.

Technical Environment

To access and control neural networks as a human, basic infrastructure has to be set up and plugged together. Local computing resources are often not sufficient, and therefore a connection to a cloud instance has to be established.

Interface interaction and command

The style transfer neural networks used here are based on the CNNMRF implementation by Chuan Li and Michael Wand. My colleague Alexander Tonn has developed new ways to give pre-trained style transfer renderings a stronger gestalt impact: a pyramid upscaling methodology, plus additional fine-tunable commands to balance the upscaling process, which requires a couple of iterations on the same image.
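The general idea of pyramid upscaling can be sketched as a coarse-to-fine loop: start tiny, enlarge, refine, repeat. The following is only an illustration of that loop under my own assumptions and not the actual implementation. All names (`upscale_nearest`, `refine_placeholder`, `pyramid_render`) are hypothetical, and the refinement step, which in a real pipeline would be a style transfer pass, is replaced by a stand-in that just copies the image.

```python
def upscale_nearest(img, factor=2):
    """Enlarge a 2D list by repeating every pixel `factor` times in both directions."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def refine_placeholder(img):
    """Hypothetical stand-in for one style transfer pass; it only copies the image."""
    return [row[:] for row in img]


def pyramid_render(seed, levels, refine=refine_placeholder):
    """Coarse-to-fine loop: double the resolution, then refine, `levels` times."""
    img = [row[:] for row in seed]
    sizes = [len(img)]
    for _ in range(levels):
        img = refine(upscale_nearest(img))
        sizes.append(len(img))
    return img, sizes


seed = [[0, 255], [255, 0]]  # tiny 2x2 starting image
result, sizes = pyramid_render(seed, levels=3)
print(sizes)  # [2, 4, 8, 16]
```

The point of such a loop is that decisions made at the smallest level, where form is decided, are carried upwards, while each later pass only adds detail on top.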

Information Loss

As mentioned earlier, a small number of data points, say 50, can be used to try to capture the information of 1 million data points. The 50 data points cannot capture all of the information, so they are forced to stick to the essence: the neural network’s best guess at what the gestalt could be, based on the little information it is allowed to keep.

At the microscopic scale the decisions made are critical and the information given is greatly reduced, which makes for an extreme situation with unpredictable outcomes.

When looking at different output sizes of the same image and style, you can see that the smaller an image gets, the more the style impacts its form.
This is simply a consequence of technical limitations and of the design of the concept and implementation.

I noticed that very small outputs might be the sweet spot for these neural networks.

Neural networks of style generation are creating alternative forms of the pixel — perhaps the beginning of an organic digital era?

It is a program that is forced to make decisions. The extreme situation and the limited complexity of the network produce outcomes that are predictable by system design, yet inspiring for human perception.

What does it give us?
Sometimes the lack of options makes for creative solutions, especially in fields where right and wrong cannot be defined in binary terms and there is only a spectrum of possible outcomes. These solutions of chaotic character can be refreshing and entertaining.

Notice how much the shape changes from one image to the next.

These are selected results from ~100 self-made, pre-trained neural networks: the ones with a particularly interesting gestalt that I found interesting for visual consumption.

The workflow up to this point is a non-organic approach to creating organic outputs. That is basically the flaw in the design of creational AI that will never make it competitive with human capabilities. I don’t want to compare, but I am biased by who and what I am.

Now, it’s really hard to look at 30px images, so I’ve upscaled them to get a better viewing format. Obviously the extra information is missing, so the pixelation confused my eyes when looking at them.

An output image with an interesting gestalt, seen as consumable media with us as the consumers, has to be the right size to look at. The problem is that the moment I resize it, everything gets pixelated, because there is no further information in the image.
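The pixelation problem can be shown with a few lines of plain Python. This is a toy sketch with invented numbers and hypothetical helper names (`downscale_mean`, `upscale_nearest`, `mean_abs_error`), not the resizing code used for the images in this article: an image is shrunk by averaging blocks and then blown up again by repeating pixels.

```python
def downscale_mean(img, factor):
    """Shrink a square 2D list by averaging each factor x factor block."""
    size = len(img) // factor
    return [
        [
            sum(img[r * factor + dr][c * factor + dc]
                for dr in range(factor) for dc in range(factor)) / factor ** 2
            for c in range(size)
        ]
        for r in range(size)
    ]


def upscale_nearest(img, factor):
    """Enlarge a 2D list by repeating every pixel (this is what causes pixelation)."""
    return [
        [px for px in row for _ in range(factor)]
        for row in img
        for _ in range(factor)
    ]


def mean_abs_error(a, b):
    """Average absolute difference between two equally sized 2D lists."""
    diffs = [abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def distinct_values(img):
    return len({px for row in img for px in row})


# Hypothetical 4x4 gradient standing in for a photograph.
original = [
    [0, 10, 20, 30],
    [10, 20, 30, 40],
    [20, 30, 40, 50],
    [30, 40, 50, 60],
]

small = downscale_mean(original, 2)    # [[10.0, 30.0], [30.0, 50.0]]
restored = upscale_nearest(small, 2)   # back to 4x4, but blocky

print(distinct_values(original), distinct_values(restored))  # 7 3
print(mean_abs_error(original, restored))                     # 5.0
```

The enlarged image has the old dimensions again, but it only contains the three values that survived the shrinking. Resizing cannot bring back what was averaged away.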

Humans are particularly good at filling in gaps in their minds. Whether it’s a car blocked from view by a tree or an unfinished sentence… we use our imagination to fill in any lack of information.

Utilizing human skills, I had the idea to discover what artists would see in the small and abstract gestalt form of an image — and if it could be an interesting way to influence them in their creation process.

I abstracted the form by drawing the outlines of each of the selected outcomes.

Experiment

What would an artist see in such an image, and make of it, if he knew nothing about how the result had been made?
The idea was to use a refined image instead of the pixelated image (which would cause confusion) and ask artists to use their imagination to add made-up information to the image.
I published jobs on a freelance platform where artists offer their painting services.

Asking artists to draw what they see in the abstract sketch

The results were… interesting. Of course the quality was limited by my small budget and by the effort I put into finding talented artists willing to help me.

For some outcomes, like the abstract painting above, I was quite unsure whether the artist had actually produced it for me.

Maybe a bit afraid to transform the input?

I proceeded with the artist who seemed to think most freely about the task and tried to put him in the right mindset to draw the next artworks. My idea was to be able to see a correlation between a gestalt and the way a person capable of imagination and painting identifies and associates. The initial identification of an artifact that could be a categorizable object leads to the further buildup of the image.

It was really hard to put these back together, like puzzling.

Look at each image and try to imagine where the artist could have had the first spark of idea. Which tiny piece?
What would you see? (I know this is too late to ask since you’re now biased)

When you begin making sense of a sensory input that lacks information, you’re imagining sense.
Go back and try it with one of the sketches.
Note how you force yourself to identify a detail and then grow the story around this detail.
I wish I could run a psychological study on this, but it’s not my field. The outcome could reveal interesting insights into how we fill in the blanks when we’re confronted with a lack of information.

And putting the final results against the original photograph shows an even more stunning difference between input and output:

Conclusions

1
This work shows there is potential in instrumentalizing neural networks for inspiring aesthetic creation.

2
Information loss through compression can be used to create unpredictable outcomes.

3
Intuitively I would say that what’s missing in the technological design of creating anything is organic development: growth from small to large, like cells dividing or plants growing.
As long as we don’t build our AI that way, it will never be comparable to humans. Since these systems have been designed by humans, I would say that in all their current imperfections they are a fitting representation of our knowledge and lack of knowledge. I believe that with the continuous accumulation of knowledge about how to create machines that create, we come closer and closer to systems that are capable of delivering outputs we want, but weren’t able to specify.

4
The reason these neural network renderings are deterministic is that a computation is not subject to environmental noise such as time, radiation, quantum interference etc.
Furthermore, the neural networks used here are not designed to evolve while rendering a new output.
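As a toy illustration of this last point, consider a single made-up layer with fixed weights. Everything here (`tiny_layer`, the weights, the pixel values, the noise level) is an assumption chosen for the example and has nothing to do with the networks used for the renderings. With the same input and the same weights the result is identical on every run, and it only changes once noise is injected on purpose.

```python
import random


def tiny_layer(inputs, weights):
    """One linear layer: each output is a weighted sum of all inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]


# Hypothetical fixed weights and input pixels.
weights = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
pixels = [0.2, 0.4, 0.6]

first = tiny_layer(pixels, weights)
second = tiny_layer(pixels, weights)
print(first == second)  # True: same input, same weights, same output

# Simulated "environmental noise": small random disturbances on the input.
rng = random.Random(0)
noisy_pixels = [x + rng.gauss(0, 0.1) for x in pixels]
noisy = tiny_layer(noisy_pixels, weights)
print(noisy == first)   # False: the disturbance changes the outcome
```

A network that does not change its weights while rendering behaves like `tiny_layer` here: it has no way to surprise itself.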