AI-Generated Music

Gone are the days when non-tech industries considered AI a myth. Today, AI is real and it is everywhere. Not long ago, composing music meant that composers had to build every composition themselves, with the help of their computer, crafting tunes and melodies that audiences would love to listen to. Once deep neural networks made their breakthrough in the late 2000s and early 2010s, it was only a matter of time before AI disrupted the music industry as well. So, is AI going to replace musicians, composers and even singers? Let's not rush to conclusions!

By the end of this article, you will know the answers to the following:

  • How is AI disrupting the music industry?
  • Which competitors are currently in the market?
  • How does AI actually generate music?
  • Is AI going to replace musicians?

How is AI disrupting the music industry?

AI-generated music is not just a flashy demo of toy software that produces random tunes. It is real music. In fact, AI music is a fast-growing market in its own right, with several players already in it, for example IBM Watson Beat, Google Magenta, Amper and AIVA. Each has its own technology driving the music generation, and they differ in the user interface they offer and in the type of output they produce. Some produce MIDI (Musical Instrument Digital Interface) data, which is not finished audio, so turning it into a usable track requires some knowledge of music production. Others, such as Google Magenta, require programming knowledge to convert their output into audio.
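To make the MIDI-versus-audio distinction concrete: MIDI stores note events (which key, when, for how long), not sound, so it has to be rendered by some instrument before anyone can hear it. The sketch below is a deliberately crude renderer that assumes a bare sine wave as the "instrument" and a hypothetical `NoteEvent` structure; real tools use samplers and synthesizers. It relies on the standard MIDI tuning convention that note 69 is A4 at 440 Hz.

```python
import math
from typing import List, NamedTuple

class NoteEvent(NamedTuple):
    """A simplified, MIDI-like note event (hypothetical structure for illustration)."""
    pitch: int        # MIDI note number, 0-127 (60 = middle C, 69 = A4)
    start: float      # onset time in seconds
    duration: float   # length in seconds

def midi_to_hz(pitch: int) -> float:
    # Equal temperament: A4 (MIDI 69) = 440 Hz; each semitone is a factor of 2**(1/12).
    return 440.0 * 2 ** ((pitch - 69) / 12)

def render(events: List[NoteEvent], sample_rate: int = 8000) -> List[float]:
    """Render note events into raw amplitude samples using plain sine waves."""
    total = max(e.start + e.duration for e in events)
    samples = [0.0] * int(total * sample_rate)
    for e in events:
        freq = midi_to_hz(e.pitch)
        first = int(e.start * sample_rate)
        last = min(len(samples), int((e.start + e.duration) * sample_rate))
        for n in range(first, last):
            t = (n - first) / sample_rate
            samples[n] += 0.3 * math.sin(2 * math.pi * freq * t)
    return samples

# A C-major arpeggio: three note events, half a second each.
melody = [NoteEvent(60, 0.0, 0.5), NoteEvent(64, 0.5, 0.5), NoteEvent(67, 1.0, 0.5)]
audio = render(melody)
print(len(melody), "events ->", len(audio), "samples")  # 3 events -> 12000 samples
```

Three small events become thousands of samples; that rendering step is the part a MIDI-only tool leaves to the user.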

Most of these AI composition tools use deep neural network architectures that extract patterns from the music data they are trained on. The model is given massive amounts of data, which it processes through its neural network to learn the relationships between chords, frequencies, beats and notes, and how these elements combine to form melodies. Some systems also include hard-coded, rule-based components that generate music according to music theory. None of these tools is limited to a single genre: they are trained on all kinds of music, be it hard rock, synth-pop or hip hop.
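The core idea of "learn patterns from data, then generate" can be shown without a neural network. The sketch below swaps in a much simpler technique, a first-order Markov chain: it only counts which note follows which in a tiny, made-up training set and samples new melodies from those counts. Deep networks learn far richer patterns over much longer contexts, so treat this as an analogy, not as how any of the products above work.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List

def learn_transitions(melodies: List[List[str]]) -> Dict[str, Counter]:
    """Count how often each note is immediately followed by each other note."""
    table: Dict[str, Counter] = defaultdict(Counter)
    for melody in melodies:
        for current, nxt in zip(melody, melody[1:]):
            table[current][nxt] += 1
    return table

def generate(table: Dict[str, Counter], start: str, length: int, seed: int = 0) -> List[str]:
    """Sample a new melody, picking each next note in proportion to the learned counts."""
    rng = random.Random(seed)
    melody = [start]
    while len(melody) < length:
        followers = table.get(melody[-1])
        if not followers:  # dead end: this note was never followed by anything in training
            break
        notes, weights = zip(*followers.items())
        melody.append(rng.choices(notes, weights=weights)[0])
    return melody

# Toy "training data" (made up for illustration): note names only, no rhythm.
corpus = [
    ["C", "D", "E", "C", "E", "G"],
    ["E", "D", "C", "D", "E", "E"],
    ["G", "E", "D", "C", "D", "C"],
]
table = learn_transitions(corpus)
print(dict(table["C"]))        # {'D': 3, 'E': 1}
print(generate(table, "C", 8))
```

Every transition in a generated melody was seen in the training data, which is both the strength (it sounds "in style") and the limit (it cannot look further back than one note) of this simple model.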

Competitors in the AI Music space

Amper Music

Amper Music is one of the easiest-to-use AI music generators currently available. It asks the user how the music should sound, for example which genre and which mood, and then runs its model, which has learned all about that type of music from the data it was trained on.
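Amper's internals are not described here, so this is only a generic illustration: a common way for a generative model to take user choices such as genre and mood into account is to encode them as a numeric "conditioning" vector fed to the model alongside its usual input. The option lists and function names below are hypothetical.

```python
from typing import List

# Hypothetical option lists; a real product defines its own.
GENRES = ["cinematic", "hip hop", "rock", "synth-pop"]
MOODS = ["calm", "dark", "upbeat"]

def one_hot(value: str, options: List[str]) -> List[float]:
    """Encode a categorical choice as a vector with a single 1.0."""
    if value not in options:
        raise ValueError(f"unknown option {value!r}; expected one of {options}")
    return [1.0 if option == value else 0.0 for option in options]

def conditioning_vector(genre: str, mood: str) -> List[float]:
    """Concatenate the genre and mood encodings into one vector for the model."""
    return one_hot(genre, GENRES) + one_hot(mood, MOODS)

print(conditioning_vector("rock", "upbeat"))
# [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0]
```

In a conditional model, a vector like this is typically appended to the input at every step or used to bias internal activations, which lets one trained network produce different styles on request.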

The best part about Amper is that it outputs audio directly rather than just MIDI. This means users can drop the result straight into their application, with no need to write any code or know much about music production. Once the music is generated, users can tweak it to their liking by changing its frequencies, beats and tone. The individual instrument layers can also be extracted and then muted or changed.
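Muting or changing a layer is easy to picture once you treat each instrument layer (a "stem") as its own track of samples: the final mix is their sample-by-sample sum, muting an instrument leaves its stem out, and changing its level scales it. The sketch below assumes exactly that simplified model with tiny made-up stems; it is not Amper's actual processing.

```python
from typing import Dict, List, Optional, Set

def mix_stems(
    stems: Dict[str, List[float]],
    gains: Optional[Dict[str, float]] = None,
    muted: Optional[Set[str]] = None,
) -> List[float]:
    """Sum per-instrument tracks sample by sample, applying gains and mutes."""
    gains = gains or {}
    muted = muted or set()
    length = max(len(track) for track in stems.values())
    mix = [0.0] * length
    for name, track in stems.items():
        if name in muted:
            continue                    # a muted stem contributes nothing
        gain = gains.get(name, 1.0)     # default: unity gain
        for i, sample in enumerate(track):
            mix[i] += gain * sample
    return mix

# Tiny made-up stems (four samples each) standing in for real audio layers.
stems = {
    "drums": [0.5, 0.0, 0.5, 0.0],
    "bass":  [0.25, 0.25, 0.25, 0.25],
    "piano": [0.0, 0.125, 0.0, 0.125],
}
print(mix_stems(stems))                                          # [0.75, 0.375, 0.75, 0.375]
print(mix_stems(stems, muted={"drums"}))                         # [0.25, 0.375, 0.25, 0.375]
print(mix_stems(stems, gains={"piano": 2.0}, muted={"drums"}))   # [0.25, 0.5, 0.25, 0.5]
```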

Flow Machines

Flow Machines is a research, development and social-implementation project that aims to expand the creativity of music creators. The AI behind Flow Machines helps composers make music using music-based rules derived from analyzing many kinds of music.

The main aim of Flow Machines is not to replace human composers. Instead, it suggests melodies and tones that fit the genre and mood the user wants, and the user can then adapt those suggestions to their liking.
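As a loose illustration of "rules derived from analyzing music", and not of Flow Machines' actual method, the sketch below analyzes some example melodies to find which pitch classes they use, treats that set as a rule, and then nudges a user's draft melody onto it. The example data, the "keep the most common pitch classes" rule and the function names are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set

def learn_pitch_classes(melodies: Iterable[List[int]], keep: int = 7) -> Set[int]:
    """Return the `keep` most common pitch classes (0 = C, 1 = C#, ..., 11 = B)."""
    counts = Counter(pitch % 12 for melody in melodies for pitch in melody)
    return {pc for pc, _ in counts.most_common(keep)}

def snap(melody: List[int], allowed: Set[int]) -> List[int]:
    """Move each note to the nearest pitch whose class is allowed (ties go downward)."""
    snapped = []
    for pitch in melody:
        for offset in (0, -1, 1, -2, 2):
            if (pitch + offset) % 12 in allowed:
                snapped.append(pitch + offset)
                break
        else:
            snapped.append(pitch)  # nothing allowed nearby: leave the note unchanged
    return snapped

# Made-up C-major example melodies as MIDI note numbers.
corpus = [[60, 62, 64, 65, 67, 69, 71, 72], [67, 65, 64, 62, 60], [64, 67, 72, 71, 69, 67]]
allowed = learn_pitch_classes(corpus)
print(sorted(allowed))                          # [0, 2, 4, 5, 7, 9, 11]
print(snap([61, 63, 66, 68, 70], allowed))      # [60, 62, 65, 67, 69]
```

The output is only a suggestion: the user keeps the final say, which matches the assistive role described above.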

AIVA

AIVA (Artificial Intelligence Virtual Artist) is another AI-based tool that can create original music out of thin air. Its machine-learning models learn from existing compositions and convert what they have learned into mathematical representations, which are then used to compose completely new music.
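AIVA's exact encoding is not described here, so as a generic example of turning music into numbers a model can learn from, the sketch below builds a piano roll: a grid with one row per time step and one column per pitch, where 1 means the pitch is sounding. The (pitch, start, length) note format, the time resolution and the one-octave pitch range are arbitrary assumptions for illustration.

```python
from typing import List, Tuple

def to_piano_roll(notes: List[Tuple[int, int, int]], steps: int,
                  low: int = 60, high: int = 72) -> List[List[int]]:
    """Convert (pitch, start_step, length_steps) notes into a binary time-by-pitch grid.

    Pitches outside [low, high) are ignored in this simplified sketch.
    """
    roll = [[0] * (high - low) for _ in range(steps)]
    for pitch, start, length in notes:
        if not low <= pitch < high:
            continue
        for t in range(start, min(start + length, steps)):
            roll[t][pitch - low] = 1
    return roll

# A C-major chord (C, E, G) held for two steps, followed by a single D.
notes = [(60, 0, 2), (64, 0, 2), (67, 0, 2), (62, 2, 1)]
for row in to_piano_roll(notes, steps=3):
    print(row)
# [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
# [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
# [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Once music is a grid of numbers like this, chords and melodies become patterns a network can learn and reproduce.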

AIVA comes with multiple pre-built genres, such as rock, classic rock, metal and hip hop, and users can tweak a number of settings and parameters to shape the output and give feedback to the AI system. Below is one of the pieces generated by AIVA's AI. Go check it out, and don't get freaked out!

On the Edge – AI-generated rock music composed by AIVA

How does AI actually generate music?

Okay, so we have seen how AI is helping musicians generate music automatically. But is it all magic happening inside a black box? No, it's all mathematics: neural networks grinding through millions of calculations to produce an output that is eventually converted to audio. Many of these models are built on Recurrent Neural Networks (RNNs). Why RNNs? Because vanilla feed-forward neural networks are poor at handling sequential or temporal data, whereas an RNN carries a hidden state from one time step to the next. RNNs, however, suffer from the vanishing gradient problem, which comes from their structure: the same weights are applied at every time step, so the gradient flowing back through many steps is multiplied by a long chain of factors. When those factors are smaller than 1, the gradient shrinks toward zero and the network struggles to learn long-range dependencies. LSTMs were designed to overcome this shortcoming.
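A quick numerical sketch of the vanishing gradient, using a hypothetical one-unit RNN with update h_t = tanh(w * h_{t-1} + x_t). By the chain rule, the gradient of the last state with respect to the first is the product of the per-step derivatives w * (1 - h_t^2); the weight w = 0.9 and the constant input 0.5 are arbitrary choices.

```python
import math
from typing import List

def gradient_through_time(w: float, inputs: List[float], h0: float = 0.0) -> float:
    """Return d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1} + x_t)."""
    h = h0
    grad = 1.0
    for x in inputs:
        h = math.tanh(w * h + x)
        grad *= w * (1.0 - h * h)  # chain rule: d h_t / d h_{t-1} = w * tanh'(.) = w * (1 - h_t**2)
    return grad

for steps in (5, 20, 50):
    g = gradient_through_time(w=0.9, inputs=[0.5] * steps)
    print(f"{steps:>2} steps: gradient = {g:.2e}")
```

Each step multiplies the gradient by a factor well below 1, so it decays exponentially with sequence length; the signal from 50 steps back is effectively zero. LSTMs ease this by routing information through an additively updated cell state.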

When it comes to neural networks for generating music, there are many different models, but two of the most common are WaveNet and LSTM (Long Short-Term Memory) networks. Let's first take a look at how WaveNet models work.

WaveNet Model

WaveNet is a deep neural network architecture developed by Google's DeepMind. It works much like a language model in NLP, which generates the next word in a sentence by taking the preceding words as input: given a sequence of audio samples, WaveNet predicts the next sample. The inputs to the model are the amplitude values of the raw audio waveform, i.e. time-series data, and the output is a probability distribution over the next (quantized) amplitude value. Because each new sample is predicted from the previous ones and then fed back in as input, this is known as an autoregressive model.
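The autoregressive loop itself is simple: predict one sample from recent history, append it, repeat. In the sketch below the trained network is replaced by a hypothetical toy predictor, the exact recurrence x[t] = 2cos(w)x[t-1] - x[t-2], which continues a sine wave from its last two samples. A real WaveNet sees a much longer context and samples from a predicted distribution instead of returning a single number.

```python
import math
from typing import Callable, List, Sequence

def generate_autoregressive(
    predict_next: Callable[[Sequence[float]], float],
    seed: List[float],
    num_new: int,
    receptive_field: int,
) -> List[float]:
    """Grow a sample sequence one step at a time, each step conditioned on the past."""
    samples = list(seed)
    for _ in range(num_new):
        context = samples[-receptive_field:]   # the model only sees recent history
        samples.append(predict_next(context))  # each prediction becomes input for the next step
    return samples

# Toy stand-in for a trained network (an assumption, not WaveNet):
# sin((n+1)w) = 2cos(w)sin(nw) - sin((n-1)w), so two past samples determine the next one.
w = 2 * math.pi * 440 / 16000  # a 440 Hz tone at a 16 kHz sample rate

def toy_model(context: Sequence[float]) -> float:
    return 2 * math.cos(w) * context[-1] - context[-2]

audio = generate_autoregressive(toy_model, seed=[0.0, math.sin(w)], num_new=100, receptive_field=2)
print(max(abs(audio[n] - math.sin(w * n)) for n in range(len(audio))))  # tiny: rounding error only
```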

The WaveNet model is built mainly from stacks of causal, dilated 1-D convolutions. Here, causal means that the output at time t depends only on elements at time t and earlier in the previous layer, never on future samples. Dilation means that the kernel skips input values with a fixed step, leaving gaps between its taps; increasing the dilation from layer to layer lets the receptive field of the network grow exponentially with depth while the number of parameters stays small.
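The two properties can be checked directly with a plain-Python sketch of a single causal dilated convolution (no learned weights, no gating or residual connections, so it is far from a full WaveNet layer). The receptive-field helper assumes dilations doubling per layer, 1, 2, 4, ..., 512, as in the WaveNet paper, with kernel size 2.

```python
from typing import List

def causal_dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - k*dilation], treating x[<0] as 0 (left padding).

    Output t therefore depends only on inputs at times t, t-d, t-2d, ... and never the future.
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for k, weight in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                total += weight * x[idx]
        y.append(total)
    return y

def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Number of input samples (including the current one) a stack of layers can see."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

dilations = [2 ** i for i in range(10)]   # 1, 2, 4, ..., 512
print(receptive_field(2, dilations))      # 1024

# Causality check: an impulse at t = 3 only affects outputs at t >= 3.
impulse = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
print(causal_dilated_conv1d(impulse, kernel=[1.0, 0.5], dilation=2))
# [0.0, 0.0, 0.0, 1.0, 0.0, 0.5, 0.0, 0.0]
```

Ten layers with only two taps each already see 1,024 past samples; without dilation, the same stack would see just 11.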

LSTM Model

The input to LSTM models is similar to that of WaveNet models: the amplitude samples are fed to the LSTM one time step at a time, and at each step it computes a hidden vector that it passes on to the next time step. Conceptually, the same cell is unrolled over time, with the output of one step fed into the next. At any given time t, the hidden vector h_t is calculated from the current input x_t and the hidden state h_(t-1) from the previous step. This helps the LSTM capture the sequential information present in the input. Besides the hidden state, an LSTM cell keeps a separate cell state (its memory), controlled by three main gates: the input gate, the forget gate and the output gate. Together, these gates let the LSTM retain useful information, discard irrelevant information through the forget gate, and pass what matters on to the next step.
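Below is a minimal sketch of the standard LSTM update, written for a single unit (scalar input, scalar hidden and cell state) so that each gate is one line. The parameter names and values are arbitrary, untrained assumptions; real music models use vectors, many units, and learned weights.

```python
import math
from typing import Dict, List, Tuple

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x: float, h_prev: float, c_prev: float, p: Dict[str, float]) -> Tuple[float, float]:
    """One time step of a single-unit LSTM; returns the new (hidden, cell) state."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate: how much of c_prev to keep
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate: how much new info to write
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate: how much of the cell to expose
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate values to write
    c = f * c_prev + i * g     # cell state: keep part of the old memory, add part of the new
    h = o * math.tanh(c)       # hidden state handed to the next time step
    return h, c

def run_lstm(xs: List[float], p: Dict[str, float]) -> List[float]:
    """Unroll the same cell over a sequence, carrying (h, c) from step t-1 to step t."""
    h, c = 0.0, 0.0
    hs = []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        hs.append(h)
    return hs

# Arbitrary, untrained parameters chosen only so the code runs.
params = {"wf": 0.5, "uf": 0.1, "bf": 1.0,
          "wi": 0.8, "ui": 0.2, "bi": 0.0,
          "wo": 0.6, "uo": 0.3, "bo": 0.0,
          "wg": 1.0, "ug": 0.4, "bg": 0.0}
hidden = run_lstm([0.0, 0.3, 0.6, 0.3, 0.0, -0.3], params)  # a few made-up amplitude samples
print([round(h, 3) for h in hidden])
```

Because the cell state is updated by multiplication with the forget gate plus an added term, rather than being squashed through a nonlinearity at every step, gradients can flow across many steps more easily than in a plain RNN.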

Before you go

We saw how AI has made strides in the music industry and how software companies are already competing to bring the most cutting-edge technology to users. But the question keeps looming: is AI going to replace humans? No! None of the AI tools and models we discussed above is designed to replace humans. Their aim is to help music composers and producers work more effectively and to introduce melodies they might not have come up with themselves. These systems also rely on humans to give feedback on their outputs. They can also help musicians who lack formal music-theory knowledge, and therefore cannot bring the music they imagine to life, with the AI doing much of the generation. In fact, the companies are developing this software with humans at the centre. Musicians are not going anywhere; they simply have an exciting new tool at their disposal. AI may well be able to compose music entirely on its own for uses such as background music in videos, but it will still be a long time before an AI wins a Grammy for the best music ever made on Earth!

Chayan is a creative Data Scientist with an eye for detail. An everyday learner and blogger, he is eager to share knowledge and support the Data Science community. Connect with him on LinkedIn to get in touch, and don't forget to check out his Medium blogs.

Data Science | Machine Learning | Tech Blogger – upGrad

Opinions expressed by contributors are their own.

About Chayan Kathuria

