Jun 19, 2026
What Is Diegetic Sound? A Filmmaker's Guide to Audio
Understand what diegetic sound is and elevate your video projects. Explore examples, subtypes, and expert tips for compelling audio design.
Yaro
Diegetic sound is any sound that characters in a story can hear, such as dialogue, footsteps, or a car radio. Non-diegetic sound is different. It's audio only the audience hears, like a film's score.
If you make YouTube videos, TikToks, short films, explainers, or branded social content, you've probably felt this choice without naming it. You leave in the coffee shop noise and the room suddenly feels real. You add a dramatic music bed over the top and the same clip starts feeling guided, shaped, and a little more cinematic.
That difference matters more than most beginner guides admit. Knowing what diegetic sound is helps you make better editing calls, cleaner music choices, and smarter licensing decisions. It also keeps you from treating every piece of audio like background filler when some sounds should feel like they live inside the scene itself.
The Sounds Your Characters Can Hear
A simple way to understand diegetic sound is to ask one question: would the character hear this with their own ears if they were standing in the scene?
If the answer is yes, it's diegetic. A person speaking to camera. A bus braking outside the frame. A phone buzzing on the table. Music coming from earbuds, a TV set, or a car stereo. These sounds belong to the story world.
If the answer is no, the sound usually belongs to the audience alone. That's where score, narrator-style voice-over, and many dramatic music beds live.
The everyday creator version
This isn't just film-school vocabulary. Social creators use it constantly.
A travel vlogger walking through a market might keep the vendor chatter, scooter noise, and street musician in the edit. That's diegetic sound. A beauty creator might add soft ambient music under the montage that no one in the room could hear. That's non-diegetic.
The reason this matters is practical. Diegetic sound gives your video place, texture, and believability. It tells the viewer, “you're here.” Non-diegetic sound tells the viewer, “feel this.”
Practical rule: If you want viewers to feel present in the moment, protect the sounds that belong inside the moment.
Why beginners get tripped up
New creators often assume diegetic means “recorded live on set.” That's not quite right. A footstep, door slam, crowd murmur, or radio track can be added later and still count as diegetic if the audience understands it as coming from the scene's world.
That's why sound design is more flexible than it first appears. You're not just documenting reality. You're building a believable audio reality for the viewer.
Three common examples make the idea stick:
- Dialogue in frame: Two people talking in a kitchen.
- Ambient environment: Wind, traffic, birds, air conditioning, crowd noise.
- Source music: A song playing from a visible or implied device inside the scene.
Once you hear sound this way, your edits change. You stop asking only “does this sound good?” and start asking “who is this sound for, the character or the audience?”
Diegetic vs Non-Diegetic Sound Explained
The term comes from diegesis, a Greek word meaning “narration” or “the act of telling,” and film theory later used it to describe the story world itself. In practical film terms, the core split is simple: if characters can hear it, it is diegetic; if only viewers hear it, it is non-diegetic, as explained in this overview of diegesis and film sound.
That sounds academic until you turn it into a test you can use while editing.
The ear test
Think of the camera as not just a lens but a position inside a world. Then ask:
- Can the people in the scene hear this sound?
- Is the sound coming from something that exists in that world?
- If you removed the picture, would the sound still feel connected to the scene?
If yes, you're probably dealing with diegetic sound.
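The ear test can also be written down as a small decision rule. The Python sketch below is only an illustration: the function name, the three yes/no inputs, and the "borderline" label are assumptions made for this example, not standard film terminology.

```python
def ear_test(characters_can_hear, source_exists_in_world, connected_without_picture):
    """Score the three ear-test questions and return a rough label."""
    yes_count = sum([characters_can_hear, source_exists_in_world, connected_without_picture])
    if yes_count == 3:
        return "probably diegetic"
    if yes_count == 0:
        return "probably non-diegetic"
    # Mixed answers are the borderline cases, so the code refuses to decide.
    return "borderline: check the context"


# Hypothetical sounds from one scene, with the questions answered by hand.
print(ear_test(True, True, True))     # kettle whistling on the stove
print(ear_test(False, False, False))  # orchestral score under the scene
print(ear_test(False, True, True))    # a character's private thought
```

The mixed case is left undecided on purpose. Those are the sounds where your edit, not a rule, has to make the intention clear.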
A classic example helps. In a space fantasy, the hum of a weapon, the hiss of a door, and the dialogue between characters are diegetic. The sweeping orchestral music under the scene is not. If you want a refresher on how a score works differently from in-world sound, this guide on what a musical score is in a film is useful.
Diegetic vs. Non-Diegetic Sound At a Glance
The purpose is different
Neither type is better. They do different jobs.
Diegetic sound grounds the viewer. It tells you where you are, how close something is, and what the environment feels like. Non-diegetic sound guides interpretation. It can add tension, irony, warmth, or pace without pretending it exists inside the scene.
A scene can look polished and still feel fake if the audio perspective doesn't match what the character would hear.
That's why creators often get more impact by lowering the music bed and letting in-world sound lead for a few seconds. A subway announcement, a skateboard wheel, a hiss from a pan, or a laptop notification often does more storytelling work than another layer of generic background music.
Exploring the Subtypes of Diegetic Sound
Most explainers stop at “characters can hear it.” That's where confusion starts. Some of the hardest calls involve edge cases like inner thoughts, off-screen sources, or music that shifts its role during a scene. Borderline examples such as voice-over, internal thoughts, and music that moves between source music and score come up often enough that recent guidance addresses them directly in this discussion of diegetic sound edge cases.
On-screen and off-screen sound
Some diegetic sounds come from a source you can see. A drummer in frame. A kettle whistling on the stove. A creator shutting a car door on camera.
Others are off-screen but still belong to the same world. You hear a dog barking from another room. You hear traffic before the street appears. You hear your subject's friend call from behind the camera.
Off-screen doesn't make a sound non-diegetic. It just means the source is implied rather than shown.
Internal sound
This is where people hesitate. A memory, a muffled subjective effect, or a private thought can feel “inside” the story even if no other character hears it.
For creators, the useful question is not whether a theory textbook would debate it. The useful question is whether the sound represents the character's lived experience inside the scene. If it does, treat it carefully and make the intention clear through editing.
Examples:
- A ringing tone after impact: You're hearing what the character perceives.
- A recalled line in an echo: You're inside a memory.
- A private thought track: This can be borderline and needs clear context.
Source music and shifting sound
Source music is music that comes from a physical source in the scene, like a phone speaker, earbuds, a bar jukebox, or a live band.
That category matters because music can move. A song may begin as something playing in the room, then swell into a fuller cinematic layer for the audience. Once you notice that move, you start hearing how editors guide attention and emotion with a lot more precision.
Powerful Examples in Film and Social Video
One of the clearest modern examples is the use of playlist-driven scenes in films where a character listens to music that also shapes the rhythm of the edit. At first, the song feels like part of the character's world. Then the mix or presentation can make it feel larger, more stylized, and more audience-facing. That's one reason this topic matters beyond definitions.
For social video, the same principle works on a smaller scale.
Social examples that actually help
A food creator records onions hitting a hot pan, a knife on a cutting board, and the low kitchen room tone. Those sounds are often more persuasive than a music-first edit because they make the viewer feel present at the stove.
A street interview creator leaves in passing sirens, shoe noise, and the little mic handling sounds around cuts. Used carefully, those details create location and immediacy.
A comedy creator shoots a skit where a song is playing softly from a phone on the couch. That track isn't just decoration. It becomes part of the scene, so the character reactions feel anchored.
Why this matters on short-form platforms
TikTok, Shorts, and Reels reward speed, but viewers still respond to audio cues that feel real. Diegetic sound helps short videos feel less like templates and more like moments that happened.
Try these swaps:
- Travel content: Keep market chatter, train station announcements, and bag rustle instead of covering every second with music.
- Fitness content: Let jump rope hits, breath, and floor contact carry a few cuts.
- Desk setup or study content: Keyboard taps, chair movement, and lamp clicks can create atmosphere without saying a word.
When a short video feels grounded, viewers usually aren't thinking “nice diegetic sound.” They just believe the moment faster.
You don't need a film budget to get that effect. You need intention. Decide which sounds prove the scene is real, and let those sounds survive the edit.
How to Capture and Use Diegetic Sound
A creator films a coffee pour for TikTok. The shot looks good, but the edit feels flat until the cup clink, steam hiss, and small room hum come back in. That is the job of diegetic sound. It gives the image a body.
Good results usually come from one of two methods. You record the sound while filming, or you add and shape it later in editing. Both count, as long as the final sound belongs to the world on screen.
For a broader workflow view, this guide to audio for video production helps connect capture, editing, and final delivery.
Capture what the scene needs
Start with the sounds that prove the action happened.
If someone sits in a chair, the chair should speak. If a hand picks up keys, the keys should have weight. New creators often focus on dialogue and forget the small mechanical noises around it, but those details are what make a scene feel lived in.
A simple capture order helps:
- Record clear dialogue first: If speech matters, protect that track before chasing atmosphere.
- Grab room tone next: Let the space breathe for a few seconds without talking or movement.
- Collect action details: Footsteps, fabric movement, taps, clicks, bags, dishes, doors, and other sounds tied to visible actions.
- Get one safety take: Record a little extra of the main sound in case the first version gets covered by wind, traffic, or handling noise.
Phone shooters can do this too. Move closer to the sound source, pause before and after the action, and record one clean pass of the sound by itself. A close recording of sneakers on pavement or a zipper closing will usually beat trying to fix a weak version later.
Build diegetic sound in post
Post-production is where many creators get confused, so keep the rule simple. If the character could hear it in that moment, it can still be diegetic even if you added it later.
That means you can replace weak footsteps, strengthen a door close, add a cleaner phone vibration, or lay in soft traffic under a street scene. Foley, edited location sound, and added effects all work if they match the space, timing, and action on screen.
A practical workflow looks like this:
- Cut the spoken audio or main sync sound first.
- Add the base environment, such as room tone, street wash, or kitchen air.
- Layer action sounds where the edit needs clarity.
- Check that each added sound feels attached to something visible or implied in the scene.
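If it helps to see the layering order as arithmetic, here is a minimal Python sketch. It treats each sound as a plain list of sample values and sums them at a chosen start position and gain. The function name, the tiny sample lists, and the gain values are all made up for illustration; a real editor does this on its timeline for you.

```python
def mix_layers(layers, length):
    """Sum layers into one track. Each layer is (samples, start_index, gain)."""
    track = [0.0] * length
    for samples, start, gain in layers:
        for offset, sample in enumerate(samples):
            position = start + offset
            if 0 <= position < length:
                track[position] += sample * gain
    return track


# Hypothetical, very short "recordings" so the numbers stay readable.
dialogue = [0.5, 0.5, 0.5, 0.5]
room_tone = [0.1] * 6
door_close = [0.8, 0.4]

track = mix_layers(
    [
        (dialogue, 0, 1.0),    # main sync sound goes in first, at full level
        (room_tone, 0, 0.5),   # base environment sits underneath
        (door_close, 4, 0.7),  # action sound lands where the edit needs it
    ],
    length=6,
)
print([round(value, 2) for value in track])
```

The last workflow step, checking that each sound feels attached to something in the scene, stays a listening job. No sum can do that for you.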
The goal is believability, not perfection. Social video especially benefits from sounds that feel present and specific, even when they are lightly sweetened in post.
Using licensed music as a diegetic element
Music inside the scene works like any other prop. If viewers see a phone, car stereo, Bluetooth speaker, laptop, or store PA, the music can come from that source and remain diegetic.
This matters for YouTube and TikTok creators because the creative decision and the legal decision are separate. You can license a track from a catalog such as LesFM and use it as in-world music if your license covers that platform and project type. Then your editing job is to make the song behave like real source audio, not like a polished soundtrack dropped on top.
Treat it like sound with a location. Lower it when the camera moves away. Let voices, doors, and object noise compete with it. Roll off some top and low end if the music is supposed to come from a small speaker. Those choices help the audience accept that the characters are hearing it too.
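Rolling off the top and low end is a band-limiting step, and a rough version fits in a few lines of Python. The sketch below chains two simple one-pole filters. The 300 Hz and 4000 Hz cutoffs are illustrative guesses, not measurements of any real phone speaker, and a proper EQ plugin will sound better. The point is only to show what "narrower" means.

```python
import math


def low_pass(samples, cutoff_hz, sample_rate):
    """One-pole low-pass: softens content above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev += alpha * (x - prev)
        out.append(prev)
    return out


def high_pass(samples, cutoff_hz, sample_rate):
    """One-pole high-pass: thins out content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, prev_in, prev_out = [], 0.0, 0.0
    for x in samples:
        prev_out = alpha * (prev_out + x - prev_in)
        prev_in = x
        out.append(prev_out)
    return out


def small_speaker(samples, sample_rate, low_cut_hz=300.0, high_cut_hz=4000.0):
    """Band-limit a track. The default cutoffs are assumptions for this sketch."""
    thinned = high_pass(samples, low_cut_hz, sample_rate)
    return low_pass(thinned, high_cut_hz, sample_rate)


def sine(freq_hz, sample_rate, count):
    """Generate a test tone so the effect can be measured."""
    return [math.sin(2.0 * math.pi * freq_hz * n / sample_rate) for n in range(count)]


def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


RATE = 48000
for freq in (50, 1000, 12000):
    level = rms(small_speaker(sine(freq, RATE, 4800), RATE))
    print(f"{freq} Hz test tone -> RMS {level:.3f}")
```

Run on three test tones, the midrange tone should come through much stronger than the deep bass or the high treble, which is roughly what a small speaker does to a song.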
Mixing and Licensing Tips for YouTube Creators
You upload a vlog scene where music is playing from a phone on the table. The song is licensed, the edit is clean, and the shot works. Then the audio gives it away. The track sounds too full, too wide, and too polished for a tiny speaker, so viewers hear it as background score instead of part of the room.
Make the source believable
Diegetic music works like any on-screen object. A lamp casts light based on where it sits. A speaker should behave the same way with sound.
A phone, car stereo, hallway speaker, and café PA do not reproduce music in the same way. If you use one clean full-range track for all of them, the audience may not know the technical reason, but they will feel that something is off.
Use the mix to tell the viewer where the sound is coming from:
- Narrow the frequency range: Small speakers usually lose deep bass and crisp top end.
- Match the room: Add reverb or reflections that fit the visible space.
- Show distance with tone and level: Farther sources get quieter and less clear.
- Let the world interfere: Doors, movement, walls, and camera position should change how the music feels.
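The distance point has a handy rule of thumb behind it: in open space, level drops by about 6 dB each time the distance to the source doubles. The Python sketch below assumes that simple inverse-distance rule. Real rooms add reflections, so treat the output as a starting level to adjust by ear. The function names and the 1 metre reference are choices made for this example.

```python
import math


def distance_gain(distance_m, reference_m=1.0):
    """Linear gain for a source heard from distance_m, relative to reference_m."""
    if distance_m <= 0 or reference_m <= 0:
        raise ValueError("distances must be positive")
    # Inside the reference distance, hold the gain at 1.0 instead of boosting.
    return min(1.0, reference_m / distance_m)


def gain_to_db(gain):
    """Convert a linear gain to decibels."""
    return 20.0 * math.log10(gain)


for metres in (1, 2, 4, 8):
    gain = distance_gain(metres)
    print(f"{metres} m: gain {gain:.3f} ({gain_to_db(gain):.1f} dB)")
```

Level is only half of distance. The "less clear" half usually means pairing this gain with a gentler top end as the source moves away.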
A good test is simple. Ask, "Would the character hear this exactly this way if they were standing there?" That question keeps diegetic sound practical instead of abstract.
Know what your license covers
Creators often mix up two separate decisions. One is storytelling. The other is permission.
A track can be fully licensed and still function either way in your edit:
- Diegetic music, like a song coming from a laptop in a tutorial or vlog
- Non-diegetic music, like a music bed under a montage
- Both in one video, if it starts as source music in the room and later expands into score
That distinction matters on YouTube and TikTok because the platform rules do not decide the storytelling role for you. Your license tells you whether you can use the track on that platform, in monetized content, for client work, or in ads. Your mix tells the audience whether the music belongs inside the scene.
If you need a plain-language breakdown, this guide on how to license music for YouTube covers the practical checks to make before you publish.
Ask two questions before export. "Am I allowed to use this track here?" and "Does it sound like it lives inside this scene?"
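Those two questions can be kept as a small pre-export checklist. The Python sketch below is hypothetical throughout: the dictionary keys, the platform names, and the example terms are invented for illustration and do not describe any real license, so always read the actual license text.

```python
def license_problems(license_terms, planned_use):
    """Return a list of reasons the planned use is not covered (empty if fine)."""
    problems = []
    if planned_use["platform"] not in license_terms["platforms"]:
        problems.append("platform not covered: " + planned_use["platform"])
    for flag in ("monetized", "client_work", "ads"):
        if planned_use.get(flag, False) and not license_terms.get(flag, False):
            problems.append("use not covered: " + flag)
    return problems


def ready_to_export(license_terms, planned_use, sounds_like_it_lives_in_scene):
    """Both questions must pass: permission first, then the listening check."""
    return not license_problems(license_terms, planned_use) and sounds_like_it_lives_in_scene


# Hypothetical license terms and a hypothetical upload plan.
example_license = {"platforms": {"youtube"}, "monetized": True}
vlog_upload = {"platform": "youtube", "monetized": True}
tiktok_ad = {"platform": "tiktok", "ads": True}

print(license_problems(example_license, vlog_upload))
print(license_problems(example_license, tiktok_ad))
print(ready_to_export(example_license, vlog_upload, sounds_like_it_lives_in_scene=True))
```

The permission check is the only part code can answer. Whether the track sounds like it lives inside the scene is still something you decide by listening.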
A simple final check before export
Play the scene once with your eyes closed.
You should be able to tell where the music is, how far away it feels, and whether a person in the shot could hear it. If the song feels pasted on top, too loud for the space, or emotionally detached from the room, it is probably reading as non-diegetic even if you placed a speaker on screen.
That is where social video creators get a real advantage from understanding diegetic sound. You can make licensed music feel native to the moment instead of edited over it. The result is more believable scenes, clearer intent, and fewer uploads where the sound design breaks the illusion.
If you need music for videos where sound has to support the scene, not overpower it, LesFM offers a catalog creators can license for different publishing needs. That can be useful whether you want a track to play like source music inside a scene or sit outside the action as a more traditional score.