Media File Size Calculations

From MusicTechWiki

Revision as of 18:36, 4 August 2023 by BruceTambling (talk | contribs)

These calculations will help you estimate the size of audio and video files. The main factors that determine the quality of an audio or video file are:

  1. CODEC
  2. Bitrate (also called data rate)

Content is King but audio QUALITY matters!

MP3 File Size Calculations

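  • MP3 is considered a "delivery CODEC format" and is NOT used for "production" or "content CREATION" of professional, master-quality work.
  • Formula: Kbps x 1,000 = bits per second; bits per second / 8 = Bytes per second; Bytes per second x 60 seconds = Bytes per minute; Bytes per minute x 60 minutes = Bytes per hour. The table below uses decimal units (1 KB = 1,000 Bytes, 1 MB = 1,000 KB).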
Bitrate | File size per second | File size per minute | File size per hour
8 Kbps | 1 KB | 60 KB | 3.6 MB
16 Kbps | 2 KB | 120 KB | 7.2 MB
32 Kbps | 4 KB | 240 KB | 14.4 MB
40 Kbps | 5 KB | 300 KB | 18.0 MB
48 Kbps | 6 KB | 360 KB | 21.6 MB
56 Kbps | 7 KB | 420 KB | 25.2 MB
64 Kbps | 8 KB | 480 KB | 28.8 MB
80 Kbps | 10 KB | 600 KB | 36.0 MB
96 Kbps | 12 KB | 720 KB | 43.2 MB
112 Kbps | 14 KB | 840 KB | 50.4 MB
128 Kbps | 16 KB | 960 KB | 57.6 MB
160 Kbps | 20 KB | 1.20 MB | 72.0 MB
192 Kbps | 24 KB | 1.44 MB | 86.4 MB
224 Kbps | 28 KB | 1.68 MB | 100.8 MB
256 Kbps | 32 KB | 1.92 MB | 115.2 MB
320 Kbps | 40 KB | 2.40 MB | 144.0 MB
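The formula above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original page: the function name `mp3_size_bytes` is made up for this example, and it assumes a constant bitrate and decimal units (1 KB = 1,000 Bytes), as the table does.

```python
def mp3_size_bytes(bitrate_kbps: float, seconds: float) -> float:
    """Estimated size in Bytes of a constant-bitrate stream (hypothetical helper)."""
    bits_per_second = bitrate_kbps * 1000    # 1 Kbps = 1,000 bits per second
    bytes_per_second = bits_per_second / 8   # 8 bits per Byte
    return bytes_per_second * seconds


if __name__ == "__main__":
    for kbps in (128, 192, 320):
        per_minute_mb = mp3_size_bytes(kbps, 60) / 1_000_000
        per_hour_mb = mp3_size_bytes(kbps, 3600) / 1_000_000
        print(f"{kbps} Kbps: {per_minute_mb:.2f} MB per minute, {per_hour_mb:.1f} MB per hour")
```

Real MP3 files also carry tags and, with variable bitrate encoding, an average rather than fixed rate, so actual sizes will differ slightly from this estimate.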

PCM File Size Calculations

Here are some file size calculations for common PCM audio settings. PCM stands for Pulse Code Modulation and commonly uses the file extensions .wav or .cda. There are quite a few other combinations of bits per sample and samples per second which may be used. We have included calculations for the most common mono (one channel) and stereo (two channel) settings.

Mono

Formula:

Bits per sample x samples per second = bits per second of mono. Bits per second / 8 = Bytes per second; Bytes per second x 60 seconds = Bytes per minute; Bytes per minute x 60 minutes = Bytes per hour of mono.

Settings | Bitrate | File size per second | File size per minute | File size per hour
16 bit, 44.1 kHz | 705.6 Kbps | 88.2 KB | 5.292 MB | 317.52 MB
16 bit, 48 kHz | 768 Kbps | 96 KB | 5.760 MB | 345.60 MB
24 bit, 48 kHz | 1,152 Kbps | 144 KB | 8.640 MB | 518.40 MB
24 bit, 96 kHz | 2,304 Kbps | 288 KB | 17.280 MB | 1.0368 GB

Stereo

Formula:

Bits per sample x samples per second x 2 channels = bits per second of stereo. Bits per second / 8 = Bytes per second of stereo; Bytes per second x 60 seconds = Bytes per minute of stereo; Bytes per minute x 60 minutes = Bytes per hour of stereo.

Settings | Bitrate | File size per second | File size per minute | File size per hour
16 bit, 44.1 kHz | 1,411.2 Kbps | 176.4 KB | 10.584 MB | 635.04 MB
16 bit, 48 kHz | 1,536 Kbps | 192 KB | 11.520 MB | 691.20 MB
24 bit, 48 kHz | 2,304 Kbps | 288 KB | 17.280 MB | 1.0368 GB
24 bit, 96 kHz | 4,608 Kbps | 576 KB | 34.560 MB | 2.0736 GB
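The mono and stereo formulas differ only in the channel count, so one small function covers both. The sketch below is illustrative only: the function names are invented for this example, and it counts raw sample data only, ignoring the small header a real .wav file adds.

```python
def pcm_bytes_per_second(bit_depth: int, sample_rate_hz: int, channels: int = 1) -> float:
    """Bytes of raw PCM audio per second (hypothetical helper)."""
    bits_per_second = bit_depth * sample_rate_hz * channels
    return bits_per_second / 8


def pcm_size_bytes(bit_depth: int, sample_rate_hz: int, channels: int, seconds: float) -> float:
    """Bytes of raw PCM audio for a recording of the given length."""
    return pcm_bytes_per_second(bit_depth, sample_rate_hz, channels) * seconds


if __name__ == "__main__":
    # CD-quality stereo: 16 bit, 44.1 kHz, 2 channels
    per_minute_mb = pcm_size_bytes(16, 44_100, 2, 60) / 1_000_000
    # 24 bit, 96 kHz mono for one hour
    per_hour_gb = pcm_size_bytes(24, 96_000, 1, 3600) / 1_000_000_000
    print(f"16 bit / 44.1 kHz stereo: {per_minute_mb:.3f} MB per minute")
    print(f"24 bit / 96 kHz mono: {per_hour_gb:.4f} GB per hour")
```

Passing `channels=6` gives a rough figure for 5.1 surround, which the tables above do not list.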

Apple ProRes Videos

This video format is higher quality than the MP4s on YouTube or those typically recorded on your phone, although recent iPhone Pro models can also record video in ProRes format. ProRes files are HUGE. They look better than MP4 and are better for editing and color correction. You might think of ProRes vs. MP4 video in a similar way as audio WAV files vs. MP3. The audio embedded in ProRes video can be lossless, hi-res audio. The audio in MP4 videos, by contrast, is almost always compressed with the lossy AAC CODEC, which is in the same quality class as MP3. In other words, the audio in a typical MP4 will sound like an MP3 rather than like a lossless master.

Data rates

ProRes supports different data rates and different resolutions. All ProRes 422 variants use 4:2:2 chroma subsampling at 10-bit color depth. ProRes 4444 and 4444 XQ sample color in the 4:4:4 scheme with a color depth of 10 or 12 bits, and can optionally include an alpha channel.
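To see what the subsampling schemes mean for the amount of data per pixel before compression, here is a small sketch. It describes the uncompressed pixel data that feeds the encoder, not the ProRes bitstream itself; the names `SAMPLES_PER_PIXEL` and `bits_per_pixel` are made up for this example.

```python
# Average number of samples stored per pixel: one luma sample plus the two
# chroma channels, which 4:2:2 stores at half horizontal resolution.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0}


def bits_per_pixel(scheme: str, bit_depth: int, alpha_bit_depth: int = 0) -> float:
    """Uncompressed bits per pixel for a subsampling scheme (hypothetical helper)."""
    return SAMPLES_PER_PIXEL[scheme] * bit_depth + alpha_bit_depth


if __name__ == "__main__":
    print("4:2:2 at 10 bit:", bits_per_pixel("4:2:2", 10), "bits per pixel")
    print("4:4:4 at 12 bit:", bits_per_pixel("4:4:4", 12), "bits per pixel")
    print("4:4:4 at 12 bit with 16-bit alpha:", bits_per_pixel("4:4:4", 12, 16), "bits per pixel")
```

The gap between 20 and 36 bits per pixel is one reason the 4444 variants have higher target data rates than the 422 variants at the same resolution.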

Resolution (points) | fps (Hz) | ProRes 422 Proxy (Mbit/s) | ProRes 422 LT (Mbit/s) | ProRes 422 (Mbit/s) | ProRes 422 HQ (Mbit/s) | ProRes 4444, without alpha (Mbit/s) | ProRes 4444 XQ, without alpha (Mbit/s)
720 × 576 | 50i, 25p | 12 | 28 | 41 | 61 | 92 | 138
1280 × 720 | 25p | 19 | 42 | 61 | 92 | 138 | 206
1440 × 1080 | 50i, 25p | 32 | 73 | 105 | 157 | 236 | 354
1920 × 1080 | 50i, 25p | 38 | 85 | 122 | 184 | 275 | 415
1920 × 1080 | 50p | 76 | 170 | 245 | 367 | 551 | 826
2048 × 1536 | 25p | 58 | 131 | 189 | 283 | 425 | 637
2048 × 1536 | 50p | 117 | 262 | 377 | 567 | 850 | 1275
3840 × 2160 | 25p | 151 | 342 | 492 | 737 | 1106 | 1659
3840 × 2160 | 50p | 303 | 684 | 983 | 1475 | 2212 | 3318
4096 × 2160 | 25p | 162 | 365 | 524 | 786 | 1180 | 1769
4096 × 2160 | 50p | 323 | 730 | 1049 | 1573 | 2359 | 3539
5120 × 2880 | 25p | 202 | 456 | 655 | 983 | 1475 | 2212
5120 × 2880 | 50p | 405 | 912 | 1311 | 1966 | 2949 | 4424
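To turn the data rates in the table into storage requirements, divide by 8 to get Bytes and multiply by the duration. The sketch below does this for the 1920 × 1080, 25p row. The function name and the dictionary are made up for this example, and because ProRes is variable bitrate, the results are estimates based on target rates rather than exact file sizes.

```python
def gigabytes_per_hour(mbit_per_s: float) -> float:
    """Approximate storage in GB (decimal) for one hour at a given data rate."""
    bytes_per_second = mbit_per_s * 1_000_000 / 8
    return bytes_per_second * 3600 / 1_000_000_000


# Target data rates (Mbit/s) for 1920 x 1080 at 25p, copied from the table above.
RATES_1080P25 = {
    "ProRes 422 Proxy": 38,
    "ProRes 422 LT": 85,
    "ProRes 422": 122,
    "ProRes 422 HQ": 184,
    "ProRes 4444": 275,
    "ProRes 4444 XQ": 415,
}

if __name__ == "__main__":
    for name, rate in RATES_1080P25.items():
        print(f"{name}: about {gigabytes_per_hour(rate):.1f} GB per hour")
```

An hour of ProRes 422 at this resolution comes to roughly 55 GB of video data, before any audio tracks are added.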

ProRes 422

Key features

  • 8K, 5K, 4K, UHD, 2K, HD (up to 1920×1080), & SD resolutions
  • 4:2:2 chroma subsampling
  • up to 12-bit sample depth [6]
  • I frame-only encoding
  • Variable bitrate (VBR) encoding
  • At HD resolution and 60i: Normal 147 Mbit/s, High-Quality 220 Mbit/s, ProRes (LT) 100 Mbit/s, and ProRes Proxy 45 Mbit/s
  • At SD resolution and 29.97 fps: Normal 42 Mbit/s and High-Quality 63 Mbit/s
  • Fast encoding and decoding (both at full size and half size)

ProRes 4444 and ProRes 4444 XQ

ProRes 4444 and ProRes 4444 XQ are lossy video compression formats developed by Apple Inc. for use in post-production and include support for an alpha channel.

ProRes 4444 was introduced with Final Cut Studio (2009)[7] as another in the company's line of intermediate codecs, intended for editing material but not for final delivery. It shares many features with the other (422) codecs of Apple's ProRes family but provides better quality than 422 HQ in color detail.[8] It has a target data rate of approximately 330 Mbit/s for 4:4:4 sources at 1920x1080 and 29.97 fps.

ProRes 4444 XQ was introduced with Final Cut Pro X version 10.1.2 in June 2014. It has a target data rate of approximately 500 Mbit/s for 4:4:4 sources at 1920x1080 and 29.97 fps, and requires OS X v10.8 (Mountain Lion) or later.

Key features

  • 8K, 5K, 4K, 2K, HD (up to 1920×1080), & SD resolutions[9]
  • 4:4:4 chroma subsampling
  • Up to 12-bit sample depth for video
  • Variable bitrate (VBR) encoding
  • Alpha channel support at up to 16-bit sample depth

ProRes RAW

In April 2018 Apple released ProRes RAW. It is built upon the same technology as other ProRes codecs, but is directly applied to the raw data coming from the sensor, thus delaying the debayering process to the post-production stage. ProRes RAW therefore aims at quality and better color reproduction, rather than performance.

How to Accurately Calculate Video File Size

Video file size can be a tricky thing. How large is the one you just recorded? This complex storage format holds a lot of information, and there are many reasons why you may want to check its size. To get the most accurate calculation, we need to start by dispelling a common myth, namely that resolution determines file size. In fact:

Video file size depends on the bitrate, not on the video resolution.

Bitrate is the most important factor in determining video file size. Technically speaking, you can have a 4K video with a lower bitrate than a 720p video. In that case, the 4K video would look poor but take less space on disk than the 720p video. And if your video contains audio? That track has its own bitrate as well.

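File Size (in Bytes) = Bitrate x duration / 8, with the bitrate in bits per second and the duration in seconds. The bitrate here is that of the encoded stream, so the compression ratio is already reflected in it.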

Here is a reference chart taken from sample videos found on YouTube and elsewhere on the internet:

Resolution | Bitrate | File size per minute | Recording duration per GB
4K (UHD) | 20 Mbps | 84 MB | 12 minutes
1080p (FHD) | 5 Mbps | 20 MB | 50 minutes
720p (HD) | 1 Mbps | 5 MB | 3.5 hours
480p (SD) | 500 Kbps | 2 MB | 8 hours

The above table is for heuristic estimation and reference only. The sizes were taken from real sample files and do not follow exactly from the listed bitrates. Many other factors influence the actual video file size, such as compression ratio, variable bitrate, and color depth.
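For comparison with the chart, the size of a constant-bitrate stream can be computed directly. This is a rough sketch with made-up function names, assuming decimal units (1 GB = 1,000 MB) and video data only. Because the chart was built from sample files, its figures will not match this arithmetic exactly.

```python
def video_size_mb(bitrate_mbps: float, seconds: float) -> float:
    """Size in MB of a stream at a constant bitrate (hypothetical helper)."""
    return bitrate_mbps * seconds / 8   # megabits -> megabytes


def minutes_per_gb(bitrate_mbps: float) -> float:
    """How many minutes of video fit into 1 GB (1,000 MB) at this bitrate."""
    mb_per_minute = video_size_mb(bitrate_mbps, 60)
    return 1000 / mb_per_minute


if __name__ == "__main__":
    for label, mbps in (("4K (UHD)", 20), ("1080p (FHD)", 5), ("720p (HD)", 1), ("480p (SD)", 0.5)):
        print(f"{label}: {video_size_mb(mbps, 60):.2f} MB per minute, "
              f"{minutes_per_gb(mbps):.1f} minutes per GB")
```

At a steady 20 Mbps, one minute works out to 150 MB, which shows how much a real, variable-bitrate file can deviate from its nominal rate.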

Bitrate = Frame Size x Frame Rate

Although the original intent was to write about video file size, read along if you would like to learn more about video overall. This guide purposefully hides complex details to simplify the understanding of the most common terms and their usage.

A Glossary of Terms

  • Frame: Any static picture you see on your screen while playing or pausing a video is called a frame. Frames are presented consecutively so that things appear to move on the screen, which is why video is also called moving pictures. A frame behaves just like a photo and has all the same attributes, such as color depth and dimensions. A 1080p or full HD video has frames of 1920×1080 pixels, with each pixel storing 8-bit RGB (Red, Green, Blue) color data per channel and maybe some more. The frames are usually presented at a constant rate called the frame rate.
  • Frame Rate: The number of frames presented on screen per second, expressed in FPS (frames per second). A typical video can have 15 to 120 frames per second; 24 FPS is used in movies and 30 FPS is common on TV. The frame rate should not be confused with shutter speed, an in-camera setting that determines the amount of motion blur in film production. More FPS means smoother playback but a bigger file. An uncompressed full HD frame with 8-bit RGB color takes about 6.2 MB (1920 x 1080 pixels x 3 Bytes). At 30 frames per second, raw HD video needs about 6.2 MB x 30 ≈ 187 MB of storage per second, or around 670 GB per hour of raw footage. That's a lot of disk space even today, and many storage drives can't even write that fast. However, you usually won't need that much space, thanks to compression and lossy encoding (a quality compromise to save disk space). Compression reduces the space required to store similar frames that have few moving parts, such as a landscape scene with little or no motion between frames. Since the amount of motion can change drastically within most videos, some encoders support a variable bit rate, consuming more than the average when needed and less when the scene is mostly static.
  • Encoding: Encoding is the process of digitizing analog video streams, for example taking the electrical feed from the camera and storing the content in a .mov file. The process may happen in hardware or software. Many digital cameras encode video natively, without needing any additional software post-processing, and the result requires less storage space. The conversion between different file formats is called transcoding. The two terms have different meanings but are often used interchangeably, since digital cameras have largely eliminated the need for a separate encoding step.
  • Codec: A codec is the program responsible for encoding and compressing the video and audio tracks. A lossless raw encoder may not compress the data at all and hence needs a lot of storage space to store every bit of the video feed. A lossy codec such as H.264 can store the same video in a fraction of the file size. Different codecs are used to achieve a balance between quality and storage space.
  • H.264, aka AVC (Advanced Video Coding), developed jointly by the ITU-T and the MPEG group, is currently the internet's most popular video codec. It is widely supported by mobile devices, web browsers, and operating system vendors, so playback thankfully no longer requires many different formats like in the old days. MP3 and AAC (Advanced Audio Coding), both MPEG standards, are the most popular audio codecs on the internet; AAC was popularized by Apple. Now that the MP3 patents have expired, AAC is the generally recommended successor. A newer video codec, H.265, aka High-Efficiency Video Coding or HEVC, is now available as the successor to H.264. H.265 provides better compression, at the cost of more demanding encoding and decoding. This codec is being promoted by streaming video pioneers such as Netflix and YouTube to improve streaming video quality and experience, especially on slower connections.
  • Containers: Often called file formats, such as MP4, MOV, AVI, WMV, MKV, and WebM. There are a lot of different container formats. MP4 is very popular on the web, and WebM is an open container format actively promoted by Google for royalty-free internet use. The container is a file format that describes how the tracks (video/audio/subtitles) are stored inside the file. The choice of container is largely a matter of preference, and each is typically paired with well-known codecs that work well with it. Some containers allow streaming video playback, while others require the file to be downloaded entirely before playback. Since these container formats support different feature sets and may require licensing agreements and royalty payments by the manufacturer, vendors tend to prefer one format over another.
  • MP4: MP4 (MPEG-4 Part 14) is a well-known internet container/file format supported by a wide range of devices such as mobile phones and digital cameras. This container can store multiple video, audio, and subtitle tracks plus other metadata, whereas an MP3 file holds only an audio track and a limited set of metadata. A variant of this format supports progressive streaming, which makes it the most preferred format for internet video playback.
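The raw-footage arithmetic from the Frame and Frame Rate entries can be checked with a short sketch. The function names are invented for this example, and it assumes 8-bit RGB with no padding or alpha channel. For 1920×1080 this gives roughly 6.2 MB per frame, the same order of magnitude as the rough per-frame figure quoted in the Frame Rate entry. The 5 Mbps encoded bitrate used for the compression ratio is the 1080p value from the reference chart earlier on this page.

```python
def raw_frame_bytes(width: int, height: int, channels: int = 3, bits_per_channel: int = 8) -> int:
    """Bytes needed for one uncompressed frame (hypothetical helper)."""
    return width * height * channels * bits_per_channel // 8


def raw_bytes_per_second(width: int, height: int, fps: int) -> int:
    """Bytes per second of uncompressed 8-bit RGB video."""
    return raw_frame_bytes(width, height) * fps


def compression_ratio(raw_bits_per_second: float, encoded_bits_per_second: float) -> float:
    """How many times smaller the encoded stream is than the raw one."""
    return raw_bits_per_second / encoded_bits_per_second


if __name__ == "__main__":
    frame = raw_frame_bytes(1920, 1080)
    per_second = raw_bytes_per_second(1920, 1080, 30)
    per_hour_gb = per_second * 3600 / 1_000_000_000
    ratio = compression_ratio(per_second * 8, 5_000_000)
    print(f"One raw frame: {frame / 1_000_000:.2f} MB")
    print(f"Raw video at 30 FPS: {per_second / 1_000_000:.1f} MB per second, {per_hour_gb:.0f} GB per hour")
    print(f"Compression ratio at 5 Mbps: about {ratio:.0f}:1")
```

A ratio in the hundreds illustrates why lossy codecs such as H.264 are essential for storing and delivering video.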

Above are the main factors used for determining the file size.

  • HDR: High dynamic range. Modern TVs and cameras are able to capture greater detail in images and video of scenes that contain both bright and dark objects. In traditional SDR (standard dynamic range), images were either bright or dark depending on the contrast applied. The HDR format can, however, capture more information per pixel (32 bits) and let the display decide the actual contrast at the time of presentation. This method requires double the amount of storage and some advanced compression techniques, which can affect the final file size when applied.
  • Audio: Some containers allow multiple audio tracks to be embedded in the video file. Hence the size of the video also depends on the number of audio tracks and their bitrates. A bitrate of 192 Kbps is considered good-quality audio for stereo sound.
  • Encryption: Video security mechanisms such as DRM (Digital Rights Management) use encryption to restrict playback of the content to authorized devices. For example, Netflix allows you to play its videos only if you have an active membership. This is often done to implement licensing and prevent piracy. This protection usually increases the file size due to the added metadata.
  • Video streaming: Video streaming is the process of watching a video over a network without having to download the entire video file. It usually begins by buffering (downloading some metadata and the portion of the video currently being watched) and supports seeking, so parts that are not watched need not be downloaded. Streaming provides a smoother watching experience and requires less network bandwidth and disk storage.
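As the Audio entry notes, every embedded audio track adds its own bitrate to the total. A minimal sketch of that sum follows. The function name and the example bitrates are illustrative only, and the estimate ignores container overhead and any DRM metadata.

```python
def total_size_mb(video_kbps: float, audio_tracks_kbps: list[float], seconds: float) -> float:
    """Approximate file size in MB for one video track plus audio tracks (hypothetical helper)."""
    total_kbps = video_kbps + sum(audio_tracks_kbps)
    bytes_per_second = total_kbps * 1000 / 8
    return bytes_per_second * seconds / 1_000_000


if __name__ == "__main__":
    # Ten minutes of 5 Mbps video with two 192 Kbps stereo audio tracks.
    size = total_size_mb(5000, [192, 192], 10 * 60)
    print(f"Estimated size: {size:.1f} MB")
```

In this example the two audio tracks account for about 29 MB of the roughly 404 MB total, so at typical video bitrates the audio is a small but not negligible share.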