
How Spotify Uses Design To Make Personalization Features Delightful

Every day, teams across Spotify leverage AI and machine learning to apply our personalization capabilities on a large scale, leading to the features, playlists, and experiences Spotify users have come to know and love. And when you spend your days working with emerging technologies, it’s easy to get transfixed by complicated new advancements and opportunities. So how do our forward-thinking teams ensure they can tackle this technical work while also prioritizing the experience of our users? 

That’s a question constantly on the mind of Emily Galloway, Spotify’s Head of Product Design for Personalization. Her team’s role is to design content experiences that connect listeners and creators. This requires understanding our machine learning capabilities as they relate to personalization, and leveraging them in ways that are engaging, simple, and fun for our users.

“Design is often associated with how something looks. Yet when designing for content experiences, we have to consider both the pixels and decibels. It’s more about how it works and how it makes you feel,” Emily explains to For the Record. “It’s about being thoughtful and intentional—in a human way—about how we create our product. I am a design thinker and a human-centric thinker at my core. People come to Spotify to be entertained, relaxed, pumped up, and informed. They come for the content. And my team is really there to think about that user desire for personalized content. What are we recommending, when, and why?”

The Personalization Design team helps create core surfaces like Home and Search, along with much-loved features like Discover Weekly, Blend, and DJ. So to better understand just how to think about the design behind each of these, we asked Emily a few questions of our own.

How does design thinking work to help us keep our listeners in mind?

When you work for a company, you know too much about how things work, which means you are not the end user. Design helps us solve problems by thinking within the user’s mindset. It’s our job to be empathetic to our users. We have to put ourselves in their shoes and think about how they experience something in their everyday life. A big thing to keep in mind is that when people use Spotify, their phones are often in their pockets, and they look at the screen in quick, split-second moments.

Without design, the question often becomes, “How do we do something technically?” For those of us working at Spotify, we understand how or why we’re programming something technically in a certain way, but users don’t understand that—nor should they have to. What they need is to experience the product positively, to get something out of it. We’re accountable for creating user value. We really are there to keep the human, the end user, at the forefront. 

Without this thinking, our products would be overcomplicated. Things would be confusing and hard to use, from a functionality perspective. Good design is about simplicity and should largely remain invisible. 

But design is also additive: It adds delight. That’s what I love about projects like DJ or Jam that are actually creating connection and meaning. Design is not afraid to talk about the emotional side—how things make you feel. 

How does design relate to personalization?

Personalization is at the heart of what we do, and design plays an important role in personalization.  

Historically, Spotify’s personalization efforts happened across playlists and surfaces like Home and Search. But over time we utilized new technologies to drive more opportunities for personalization. What started as a Hack Week project back in the day became Discover Weekly, our first successful algorithmically driven playlist. That paved the way for Blend, which was designed for a more social listening experience, and more recently for DJ, our new experience that harnesses the power of AI and editorial expertise to help tell artists’ stories and better contextualize their songs. DJ uses an AI voice that makes personalization possible like never before, and it’s a whole new way for our listeners to experience Spotify’s personalization.

When designing personalized experiences like these, we must think “content first,” knowing people come to Spotify for the content. Design ultimately makes it feel simple and human and creates experiences that users love. If recommendations are a math problem, then resonance is a design problem.

But we also have to have what I like to call “tech empathy”—empathy for the technology itself. My team, which is a mix of product designers and content designers, has to understand how the technology works to design our recommendations for the programming. Personalization designers need to understand the ways in which we’re working with complex technology like machine learning, generative AI, and algorithms. Our designers need to consider what signals we’re getting that will allow our recommendations to get better in real time and over time. And when a recommendation is wrong, or a user just wants a different mood, we need to design mechanisms for feedback and control. That really came into play when we developed our AI DJ.

Tell us the story of the inception of DJ.

We’re always trying to create more meaningful connections between listeners and creators in new and engaging ways. And we use technology to deliver this value. DJ is the perfect example of how we’re driving deeper, more meaningful connections through technology.

Prior to generative AI, pulling off a “trusted friend DJ” would have required thousands of writers, voice actors, and producers, something that wasn’t technically, logistically, or financially possible. Now, new technologies have unlocked quality at scale. Xavier “X” Jernigan’s voice and personality deliver on our mission of creating more meaningful connections for hundreds of millions of people. Generative AI made the once impossible feel magical.

To bring DJ to life, we answered some core experiential questions, knowing we were taking listeners on a journey with both familiar and unfamiliar music. We asked questions such as: What does it mean to give context to listening? How do we visualize AI in a human way? You can see this in how the DJ introduces itself in a playful way—owning that it’s an AI that doesn’t set timers or turn on lights.

We also put a lot of thought into how we designed the character, since it is more than a voice. 

Ultimately, we really wanted to lean into making it feel like a trusted music guide with an approachable personality. So much of our brand is human playfulness, so we made a major decision to acquire Sonantic and create a more realistic, friendly voice. That led to Xavier training the model to be our first voice. His background and expertise made him the perfect choice.

With new technologies like generative AI, what are some of the new ways you’re thinking about your team and their work?

I’m challenging our team to think differently about the intersection of design and generative AI. We keep coming back to the conclusion that we don’t need to design all that differently, because our first principles still hold true. For example, we are still taking a content-first approach, and we continue to strive for clarity and trust. We’ve realized that tech advancements are accelerating faster than ever, which makes design’s role more important than ever.

Because there’s so much more complexity out there with generative AI, it means the human needs must be kept in mind even more. At the end of the day, if our users aren’t interested in a product or they don’t want to use it, what did we create it for? 

Emerging technology inspires you to think differently and to look from different angles. The world is trying to figure this out together, and at Spotify we’re not using technology to use technology. We’re using technology to deliver joy and value and meet our goals of driving discovery and connections in the process.

Rachel Bittner on Basic Pitch: An Open Source Tool for Musicians


Music creation has never been as accessible as it is now. Gone are the days of classical composers, sheet music, and prohibitively expensive studio time when only trained, bankrolled musicians had the opportunity to transcribe notes onto a page. As technology has changed, so too has the art of music creation—and today it is easier than ever for experts and novices alike to compose, produce, and distribute music. 

Now, musicians use a computer-based digital standard called MIDI (pronounced “MID-ee”). MIDI acts like sheet music for computers, describing which notes are played and when—in a format that’s easy to edit. But creating music from scratch, even using MIDI, can still be very tedious. If you play piano and have a MIDI keyboard, you can create MIDI by playing. But if you don’t, you must create it manually: note by note, click by click. 
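As a concrete illustration of that manual, note-by-note process, here is a small sketch using the open source pretty_midi library (our choice for the example, not something named in this article): each MIDI note is just a pitch number, a start time, an end time, and a velocity.

```python
# Illustrative sketch only: entering a short phrase "note by note" with the
# open source pretty_midi library. The output file name is a placeholder.
import pretty_midi

pm = pretty_midi.PrettyMIDI()
piano = pretty_midi.Instrument(program=0)  # program 0 = Acoustic Grand Piano

# A C major arpeggio: MIDI pitches 60 (C4), 64 (E4), 67 (G4), half a second each.
for i, pitch in enumerate([60, 64, 67]):
    piano.notes.append(
        pretty_midi.Note(velocity=100, pitch=pitch, start=i * 0.5, end=(i + 1) * 0.5)
    )

pm.instruments.append(piano)
pm.write("arpeggio.mid")  # an editable, computer-readable "score"
```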

To help solve this problem, Spotify’s machine learning experts trained a neural network to predict MIDI note events when given audio input. The network is packaged in a tool called Basic Pitch, which we just released as an open source project.

“Basic Pitch makes it easier for musicians to create MIDI from acoustic instruments—for example, by singing their ideas,” says Rachel Bittner, a research manager at Spotify who is focused on applied machine learning on audio. “It can also give musicians a quick ‘starting point’ transcription instead of having to write down everything manually, saving them time and resources. Basically, it allows musicians to compose on the instrument they want to compose on. They can jam on their ukulele, record it on their phone, then use Basic Pitch to turn that recording into MIDI. So we’ve made MIDI, this standard that’s been around for decades, more accessible to more creators. We hope this saves them time and effort while also allowing them to be more expressive and spontaneous.”
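For readers who want to try that workflow in code, the open source release documents a small Python API along these lines. The sketch below follows that documented usage (the pip package is basic-pitch); the audio file name is a placeholder, and exact signatures may vary between releases.

```python
# A minimal sketch of the documented basic-pitch Python API
# (pip install basic-pitch). "ukulele_jam.wav" is a placeholder recording.
from basic_pitch.inference import predict

model_output, midi_data, note_events = predict("ukulele_jam.wav")

# midi_data is a pretty_midi.PrettyMIDI object, ready to edit or load into a DAW.
midi_data.write("ukulele_jam.mid")

# note_events lists the individual detected notes (start, end, pitch, ...).
print(f"Detected {len(note_events)} notes")
```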

For the Record asked Rachel to tell us more about the thinking and development that went into Basic Pitch and other machine learning efforts, and how the team decided to open up the tool for anyone to access and innovate on.

Help us understand the basics. How are machine learning models being applied to audio?


On the audio ML (machine learning) teams at Spotify, we build neural networks—like the ones that are used to recognize images or understand language—but ours are designed specifically for audio. Similar to how you ask your voice assistant to identify the words you’re saying and also make sense of the meaning behind those words, we’re using neural networks to understand and process audio in music and podcasts. This work combines our ML research and practices with domain knowledge about audio—understanding the fundamentals of how music works, like pitch, tone, tempo, the frequencies of different instruments, and more.
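As a rough, external illustration of that kind of domain knowledge, the sketch below uses the open source librosa library (not Spotify’s internal stack) to pull a few of the musical quantities mentioned above, such as tempo and pitch content, out of a recording; the file name is a placeholder.

```python
# Illustrative only: extracting tempo, pitch-class content, and frequency
# content from audio with the open source librosa library.
import librosa

y, sr = librosa.load("track.wav")  # waveform and sample rate; placeholder file

tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)  # rough tempo estimate
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)           # pitch-class energy over time
mel = librosa.feature.melspectrogram(y=y, sr=sr)          # perceptually scaled spectrogram

print("estimated tempo (BPM):", tempo)
print("chroma frames:", chroma.shape, "mel frames:", mel.shape)
```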

What are some examples of machine learning projects you’re working on that align with our mission to give “a million creators the opportunity to live off their art”?

Spotify enables creators to reach listeners and listeners to discover new creators. A lot of our work helps with this in indirect ways—for example, identifying tracks that might go well together on a playlist because they share similar sonic qualities like instrumentation or recording style. Maybe one track is already a listener’s favorite and the other one is something new they might like.

We also build tools that help creative artists actually create. Some of our tech is in Soundtrap, Spotify’s digital audio workstation (DAW), which is used to produce music and podcasts. It’s like having a complete studio online. And then there’s Basic Pitch, a stand-alone tool for converting audio into MIDI that we just released as an open source project, along with an online demo, so anyone can use it to transcribe the musical notes in a recording (including voice, guitar, or piano).

Unlike similar ML models, Basic Pitch is not only versatile and accurate at doing this, but it’s also fast and computationally lightweight. So the musician doesn’t have to sit around forever waiting for their recording to process. And on the technological and environmental side, it uses way less energy—we’re talking orders of magnitude less—compared to other ML models. We named the project Basic Pitch because it can also detect pitch bends in the notes, which is a particularly tricky problem for this kind of model. But also because the model itself is so lightweight and fast.
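Because the transcription comes back as a pretty_midi object (per the project’s documentation), detected pitch bends can be inspected directly. A minimal sketch, continuing the earlier example and assuming pitch-bend output is enabled:

```python
# Minimal sketch: inspecting pitch bends in the MIDI returned by the earlier
# predict() call. Field names follow the pretty_midi library; whether bends
# are written out depends on the tool's settings.
for instrument in midi_data.instruments:
    for bend in instrument.pitch_bends:
        # bend.pitch is the bend amount (-8192 to 8191); bend.time is in seconds
        print(f"pitch bend {bend.pitch:+d} at {bend.time:.2f}s")
```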

What else makes Basic Pitch a unique machine learning project for Spotify?

I mentioned before how computationally lightweight it is—that’s a good thing. In my opinion, the ML industry tends to overlook the environmental and energy impact of its models. Usually with ML models like this—whether it’s for processing images, audio, or text—you throw as much processing power as you can at the problem as the default method for reaching some level of accuracy. But from the beginning, we had a different approach in mind: we wanted to see if we could build a model that was both accurate and efficient. If you have that mindset from the start, it changes the technical decisions you make about how you build the model. Not only is our model as accurate as (or even more accurate than) similar models, but since it’s lightweight, it’s also faster, which is better for the user, too.

What’s the benefit of open sourcing this tool?

It gives more people access to it since anyone with a web browser can use the online demo. Plus, we believe the external contributions from the open source community help it evolve as software to create a better, more useful product for everyone. For example, while we believe Basic Pitch solves an important problem, the quality of the MIDI that our system (and others’) produces is still far from human-level accuracy. By making it available to creators and developers, we can use our individual knowledge and experience with the product to continue to improve that quality. 

What’s next for Basic Pitch in this area?

There’s so much potential for what we can do with this technology in the future. For example, Basic Pitch could eventually be integrated into a real-time system, allowing a live performance to be automatically accompanied by other MIDI instruments that “react” to what the performer is doing.

Additionally, we shared an early version of Basic Pitch with Bad Snacks, an artist-producer who has a YouTube channel where she shares production tips with other musicians. She’s been playing around with Basic Pitch, and we’ve already made improvements based on her feedback, such as fixing how the online demo handles MIDI tempo, to make it work better in a musician’s workflow. We partnered with her to use Basic Pitch to create an original composition, which she released as a single on Spotify. She even posted a behind-the-scenes video on her channel showing how she used Basic Pitch to create the track. The violin solo section is particularly cool.

But it’s not just artists and creators that we’re excited about. We’re equally looking forward to seeing what everyone in the open-source developer community has been doing with it. We expect to discover many areas for improvement, along with new possibilities for how it could be used. We’re proud of the research that went into Basic Pitch and we’re happy to show it off. We’ll be even happier if musicians start using it as part of their creative workflows. Share your compositions with us!

Created a cool track using Basic Pitch? Share it on Twitter with the hashtag #basicpitch and tag the team @SpotifyEng.