Seven emotions in one program

Musicians have always been particularly meticulous about sound, knowing how much it shapes the perception of music. Back in the day, classical composers could only choose between, say, a violin and a flute to properly emphasize a dramatic shift in a melody, whereas modern musicians and sound engineers have learned to make their pieces richer and more expressive by working with the sound itself (e.g., equalizing, adding audio effects, or editing recordings).

To make writing and editing easier, Garri Proshian has developed a special software called Face Music Control, which he defended as his graduation project in the Bachelor's program Computer Technologies in Design. With this app, users can manipulate various musical instruments in real time using only their emotions. This way, the app automates the editing process and produces music that matches users' expectations. As noted by the developer, the instruments in the app don't obey users' commands but rather empathize with them.

Face Music Control is based on a face detector, a preprocessing module that prepares facial images for classification, and a neural network that can recognize seven basic emotions: anger, disgust, fear, happiness, sadness, surprise, and a neutral state. The network is trained on FER2013, one of the largest and most widely used facial expression datasets, which contains 35,887 images.
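The article doesn't describe the network's exact architecture, but a minimal sketch of a FER2013-style classifier in Keras might look like the following (the layer sizes are assumptions; only the 48×48 grayscale input and the seven output classes follow from the dataset itself):

```python
# Minimal sketch of a FER2013-style emotion classifier (assumed architecture,
# not the author's actual network). FER2013 images are 48x48 grayscale.
import tensorflow as tf
from tensorflow.keras import layers, models

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

def build_model():
    return models.Sequential([
        layers.Input(shape=(48, 48, 1)),
        layers.Conv2D(32, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu", padding="same"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(len(EMOTIONS), activation="softmax"),  # 7 emotion classes
    ])

model = build_model()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(x_train, y_train, validation_data=(x_val, y_val), epochs=30)
```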

The app works as follows: while a person plays a musical instrument in front of a webcam, their facial expressions are monitored by a detector that locates the face and preprocesses the image. After that, the neural network identifies the user's emotion, converts the result into MIDI data, and sends it to recording, editing, and storage software, which changes the sound of the instrument depending on the emotion recognized.
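The project's source code isn't published in the article, but the loop described above can be sketched roughly as follows, assuming OpenCV for capture and face detection, the classifier from the previous sketch, and the mido library for sending MIDI control-change messages to the DAW (all concrete choices here, such as CC number 1 and the model file name, are illustrative):

```python
# Rough sketch of the described webcam -> emotion -> MIDI loop (illustrative only).
import cv2
import mido
import numpy as np
from tensorflow.keras.models import load_model

EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

model = load_model("fer2013_cnn.h5")   # trained classifier (placeholder path)
face_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
midi_out = mido.open_output()          # virtual/loopback MIDI port seen by the DAW
cap = cv2.VideoCapture(0)              # webcam

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_detector.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        face = cv2.resize(gray[y:y + h, x:x + w], (48, 48)) / 255.0
        probs = model.predict(face.reshape(1, 48, 48, 1), verbose=0)[0]
        emotion = int(np.argmax(probs))
        # Map the recognized emotion index to a MIDI control-change value (0-127)
        value = int(emotion * 127 / (len(EMOTIONS) - 1))
        midi_out.send(mido.Message("control_change", control=1, value=value))
    if cv2.waitKey(1) == 27:           # Esc to quit
        break
```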

Notably, emotions can be assigned to any sound parameter. For instance, the MIDI Mapping feature in Ableton Live makes it possible to pair each emotion with a sound parameter one by one when setting up Face Music Control. The app can also be used with different musical instruments. However, electric instruments such as guitars or violins connected via TRS adapters require an external sound card to avoid delays, while MIDI and acoustic instruments require USB-MIDI adapters and recording microphones, respectively.
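The article doesn't say which parameters are bound by default; one plausible way to express such a mapping in code is a simple lookup table, where the CC numbers and target effects below are placeholders that a user would actually assign in Ableton Live's MIDI Mapping mode:

```python
# Hypothetical emotion -> MIDI CC mapping; the real assignments are made by the
# user in the DAW, so the control numbers and values here are placeholders.
import mido

EMOTION_TO_CC = {
    "anger":     {"control": 20, "value": 127},   # e.g. distortion amount
    "disgust":   {"control": 21, "value": 100},   # e.g. bit-crush mix
    "fear":      {"control": 22, "value": 90},    # e.g. tremolo depth
    "happiness": {"control": 23, "value": 110},   # e.g. chorus/brightness
    "sadness":   {"control": 24, "value": 80},    # e.g. reverb send
    "surprise":  {"control": 25, "value": 127},   # e.g. delay feedback
    "neutral":   {"control": 26, "value": 0},     # dry signal
}

def messages_for(emotion: str):
    """Build the control-change message(s) to send for a recognized emotion."""
    m = EMOTION_TO_CC[emotion]
    return [mido.Message("control_change", control=m["control"], value=m["value"])]
```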

As for future plans, Garri is going to improve his app by, firstly, making it more user-friendly and porting it to a compiled language. Soon, users won't have to install Python or download additional software before getting started; all they will have to do is open an .exe file. He also wants to increase the accuracy of the neural network so that it can identify more than one emotion at a time, for instance, surprise and happiness. And, finally, the third aspect is multimodality: by using neural interfaces and brain rhythm detection, the developer expects to analyze human states and emotions more accurately. The updated version will be tested by members of ITMO's musical club Живой Звук (Live Sound – Ed.) in September.

Facial expressions in music

Similar features can be found in FaceOSC (written in C++), which tracks a face using facial landmarks and responds to several parameters, such as how wide the mouth or either eye is open, whether the right or left eyebrow is raised, or whether the nostrils are flared. All of these can be used to change the music's tone.
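FaceOSC publishes its tracking data as OSC messages, so any client can listen to them and drive sound parameters. A small receiver in Python using the python-osc library might look like the sketch below; the /gesture/... addresses and default port 8338 follow FaceOSC's usual message scheme but should be verified against the version in use:

```python
# Minimal listener for FaceOSC-style OSC messages (addresses and port assumed).
from pythonosc.dispatcher import Dispatcher
from pythonosc.osc_server import BlockingOSCUDPServer

def on_mouth_height(address, height):
    # Scale mouth openness into a 0-127 MIDI-style value for a sound parameter
    print(f"{address}: {height:.2f} -> cc value {min(127, int(height * 20))}")

def on_eyebrow(address, raise_amount):
    print(f"{address}: eyebrow raise {raise_amount:.2f}")

dispatcher = Dispatcher()
dispatcher.map("/gesture/mouth/height", on_mouth_height)
dispatcher.map("/gesture/eyebrow/left", on_eyebrow)
dispatcher.map("/gesture/eyebrow/right", on_eyebrow)

BlockingOSCUDPServer(("127.0.0.1", 8338), dispatcher).serve_forever()
```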

Another emotion-driven musical device, built with the vvvv visual programming language and a Kinect motion sensor, also relies on facial points, yet, unlike the other apps mentioned, it can only identify a smile and different degrees of mouth openness.

Keep your mouth wide open

With GarageBand, Apple lets users turn their smartphones or tablets into a recording studio. Its developers also believe that Face ID can make it easier for musicians to play keyboards or the guitar: all they need to do is play an instrument in front of their device while opening and closing their mouth; the wider the mouth is open, the higher the sound produced.
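Expressed as code, that idea is just a linear mapping from mouth openness to pitch. The sketch below is purely illustrative of the principle; GarageBand's internal logic and value ranges are not public:

```python
# Illustrative mapping only; the note range is an arbitrary assumption.
def mouth_to_pitch(openness: float, low_note: int = 48, high_note: int = 84) -> int:
    """Map normalized mouth openness (0.0-1.0) to a MIDI note number."""
    openness = max(0.0, min(1.0, openness))
    return round(low_note + openness * (high_note - low_note))

assert mouth_to_pitch(0.0) == 48   # C3 when the mouth is closed
assert mouth_to_pitch(1.0) == 84   # C6 when it is fully open
```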

MIDI Mouth Controller is another example of mouth-controlled software. Unlike GarageBand, this is a standalone app for Android and iOS that uses the phone's camera to track the user's face and then transmits the result as MIDI data to a computer.

Other similar programs by independent developers are also available, for example, face-midi. Written in Python and fully open source, the program can be adapted by anyone to their needs. Just like the apps mentioned above, face-midi reads facial points and changes the music according to the user's mouth movements.
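face-midi's own repository is the authoritative reference for how it works; as an illustration of the general approach rather than that project's exact implementation, mouth openness can be read from face landmarks with MediaPipe and turned into MIDI like this (indices 13 and 14 are the inner-lip points in MediaPipe's Face Mesh, and the scaling factor is an arbitrary assumption):

```python
# Sketch of the general facial-landmark -> MIDI idea, not face-midi's actual code.
import cv2
import mido
import mediapipe as mp

midi_out = mido.open_output()
face_mesh = mp.solutions.face_mesh.FaceMesh(max_num_faces=1)
cap = cv2.VideoCapture(0)

while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    results = face_mesh.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    if results.multi_face_landmarks:
        lm = results.multi_face_landmarks[0].landmark
        # Distance between inner-lip landmarks 13 (upper) and 14 (lower)
        # approximates how wide the mouth is open.
        openness = abs(lm[13].y - lm[14].y)
        value = min(127, int(openness * 1000))   # crude scaling to MIDI range
        midi_out.send(mido.Message("control_change", control=1, value=value))
```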