Fine Tuned Video Compression in iOS (Swift 4, no dependencies)

TLDR; Skip to the “How do I compress an H264 video?” section if you know what “bitrate” and “codec” mean. Skip to the end of the article if you’re just looking for the code.

Chances are there will be a time you’ll need to capture a video within your app, compress it to a respectable size, then send it to your server to fulfill some feature. Maybe your app is a video sharing platform where people showcase their talent. Or maybe your app overlays special effects on people’s faces. The more users your app has, the more bandwidth your server will require, not to mention the storage space. In an age where most iPhones are more powerful than some desktop gaming computers and bundled with industrial-grade cameras, the disk space a single video consumes grows tremendously each year. Unfortunately, cloud providers’ price tags for storage and bandwidth are not falling at a matching rate. As the obsessive optimization guru, you know there must be a way to put all the horsepower in our phones to work. Below is a line-by-line explanation of how to do that.

What sets the size of a video file?

A video file is actually two files taped together, assuming the video is not 3D and the audio is not multi-language. One is a silent slideshow of pictures and the other is the audio to be played along with that slideshow. Both of these parts contribute to the final file size, and the single factor that determines the size of each is a thing called “the bitrate”.

Bitrate is the number of bits (8 bits = 1 byte) required per second of a video/audio stream. If you know the video and audio bitrates as well as the duration in seconds, you can accurately calculate the final file size even if you have no idea what the video resolution or playback frame rate is. In other words, two video files of the same duration compressed with the same bitrates (even if one is HD and the other is not) will ALWAYS consume roughly the same amount of storage space.

The final formula to calculate how many MBs to expect after the compression is simply:

 

File Size (MB) = Duration (seconds) × (Video Bitrate + Audio Bitrate) / (8 × 1024 × 1024)

 

The calculation does not include the space required by the metadata and file headers but most of the time, their contributions are negligible.

The unit of bitrate is bits per second, or bps. 1000 bps equals 1 kilobit per second (kbps), 1000 kbps equals 1 megabit per second (Mbps), you get the point.
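To make the formula concrete, here is a minimal Swift sketch (the helper name estimatedFileSizeInMB is just an illustration, not part of the tutorial’s code) that estimates the output size from the bitrates:

// Rough output size estimate in MB, ignoring metadata and container overhead.
// Bitrates are in bits per second (bps).
func estimatedFileSizeInMB(durationInSeconds: Double,
                           videoBitrate: Double,
                           audioBitrate: Double) -> Double {
    return durationInSeconds * (videoBitrate + audioBitrate) / (8 * 1024 * 1024)
}

// A 60 second clip at 768 kbps video + 80 kbps audio comes out to roughly 6 MB:
let estimate = estimatedFileSizeInMB(durationInSeconds: 60,
                                     videoBitrate: 768_000,
                                     audioBitrate: 80_000) // ≈ 6.07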

Who chooses this bitrate thing?

It is chosen every time a video is compressed, by whoever compresses it. Initially, when the video is captured by the camera, the software inside the camera assumes a high, general-purpose bitrate to maximize quality. The consumer of that initial compression (you) is then free to re-compress as desired. If you decide to leave the initial video as it is, it is guaranteed to consume more storage for the same duration with each new iPhone/iPad release.

Bitrate represents the amount of information a video file is allowed to store. The more information there is, the better the visuals and the audio. A video with twice the frame rate of an otherwise identical video will require twice the bitrate (ignoring the audio) to look the same. A video with twice the width and height (in pixels) will require at least four times the bitrate to avoid sacrificing color detail.

As the developer, knowing why the video is captured, you have the power to decide how much quality you are willing to sacrifice to save storage and bandwidth. In many cases, the only valid option is to try as many different bitrates as you can and compare the results for the best compromise.

If bitrate is the king, why are there so many codecs/algorithms/standards?

Because some codecs work better at high bitrates but suck at lower values. Some work really well if the video isn’t shaky. The choice of codec is not a quantitative decision, but a qualitative one. On mobile, if you are not willing to include external libraries to support other standards, your best option on most phones tends to be H264.

How do I compress an H264 video?

In this tutorial, we are going to use Swift 4 with no external dependencies. The native AVFoundation framework includes all the tools we need. The resulting code will work on iOS 9 and greater. Once we fully implement it, we will have a single free function that, when called, compresses a single video file with a given configuration and notifies us of success or error through callbacks. All the work will be done in a background thread, so our app’s UI will remain responsive enough to do other work such as showing loading animations.

Let’s make a list of things we’d like our function to be able to do.

  1. Grab a video for a given URL
  2. Ask for a destination URL for the compressed video
  3. Optionally adjust the video resolution
  4. Fix orientation issues caused by device cameras (e.g. mirror the video for the front camera)
  5. Set compression parameters (bitrate, sample rate etc)
  6. Notify with a callback once the compression is complete
  7. Notify with a callback if an error occurs
  8. Allow canceling a started compression and notify the cancel with a callback

We are going to take a test-driven approach and start with a function interface that doesn’t compile yet but expresses everything we listed above. Then we will implement the missing types and functionality one by one to realize our goal.

func compressh264VideoInBackground(
    videoToCompress:      URL, 
    destinationPath:      URL, 
    size:                 CompressionSize?, 
    compressionTransform: CompressionTransform, 
    compressionConfig:    CompressionConfig, 
    completionHandler:    @escaping (URL)->(), 
    errorHandler:         @escaping (Error)->(), 
    cancelHandler:        @escaping ()->()
) -> CancelableCompression {
    // implementation
}

We used the native URL type to ask for file paths.

We used @escaping blocks for our callbacks. They are @escaping because they will be called later, after our function returns, in a background thread to be precise. The completion callback will be notified with a URL parameter representing the compressed file path, and the error callback will be notified with an object conforming to the Error protocol.

We used four not-yet-existing types, which we will implement to represent:

  1. the video resolution: CompressionSize? (nil for not adjusting)
  2. orientation fixes: CompressionTransform
  3. fine tuning: CompressionConfig
  4. and the cancel operation: CancelableCompression.

We could have used CGSize for the resolution, but its width and height values are of type CGFloat, which clashes with the fact that the width and height of a video can only be integers. A named tuple will do the trick here.

typealias CompressionSize = (width: Int, height: Int)
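For example, asking for a 720p output resolution is just:

let hd720: CompressionSize = (width: 1280, height: 720)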

As you may already know, it is sometimes desirable to mirror video captured by the front camera so it doesn’t visually distract users. There is also the small catch that if a video captured by the back camera is shot in portrait mode instead of landscape, the initially captured file expresses this orientation as EXIF metadata instead of literally rearranging the pixels.

EXIF is the method cameras use to attach additional information to captured media. Some video players honor the instructions given in the EXIF and adjust the video accordingly, but sometimes they don’t. When that happens, your portrait video may appear rotated by 90 degrees and look funny. One conservative way to handle such errors is to apply the transformation described in the EXIF to the pixel data directly, then remove the EXIF entirely.

We will leave this decision of fixing video orientation to our users via an enumeration.

enum CompressionTransform {
    case keepSame
    case fixForBackCamera
    case fixForFrontCamera
}
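If you’d rather inspect the source video before deciding which case to pass, AVFoundation exposes this EXIF-style orientation as an affine transform on the video track. A minimal sketch (the helper name needsOrientationFix is only an illustration, not part of the tutorial’s code):

// Returns true if the first video track carries a non-identity preferredTransform,
// i.e. players are expected to rotate/mirror it at playback time.
func needsOrientationFix(in asset: AVAsset) -> Bool {
    guard let track = asset.tracks(withMediaType: .video).first else {
        return false
    }
    return track.preferredTransform != .identity
}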

There are many configurable compression parameters other than bitrate. This tutorial will only deal with the ones that are really relevant. We will also provide a default configuration as a quality-of-life improvement. Curious readers can find information about the missing parameters in the AVFoundation documentation, as the approach is quite straightforward to extend.

struct CompressionConfig {
    let videoBitrate: Int
    let videomaxKeyFrameInterval: Int
    let avVideoProfileLevel: String
    let audioSampleRate: Int
    let audioBitrate: Int
    
    static let defaultConfig = CompressionConfig(
        videoBitrate: 1024 * 750, // in bps
        videomaxKeyFrameInterval: 30,
        avVideoProfileLevel: AVVideoProfileLevelH264High41,
        audioSampleRate: 22050,
        audioBitrate: 80000 // in bps
    )
}

As you may have already guessed, videoBitrate and audioBitrate reflect what they are named after. For example, the music you buy from iTunes has an audio bitrate of 256 kbps. You can adjust it to your liking if you find 80 kbps too low.

YouTube suggests a video bitrate of roughly 5 Mbps for 720p videos if you’d like them to look good on all kinds of devices. Since a useful approach is to try multiple bitrates and compare the results for the best visuals, a fairly low bitrate of 768 kbps (1024 times 750 bps) is a good starting point for incremental changes.

Audio sample rate is the number of audio samples played back by your speakers/headphones per second. The higher the value, the better the digital nature of the audio is hidden from the listener. Most of the time, 22050 sounds good enough on cheap headphones. 44100 is the standard for CD audio. 48000 is the sample rate used by DVDs. Blu-rays sometimes go as high as 96000, but for mobile it is almost always overkill to use such high values. The unit of audio sample rate is Hertz (Hz).
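If the defaults don’t match your needs, you can build your own configuration with the same struct. The numbers below are only a hypothetical starting point for experimentation, not a recommendation:

let highQualityConfig = CompressionConfig(
    videoBitrate: 2 * 1024 * 1024,      // ~2 Mbps video
    videomaxKeyFrameInterval: 30,
    avVideoProfileLevel: AVVideoProfileLevelH264High41,
    audioSampleRate: 44100,             // CD-quality sample rate
    audioBitrate: 128_000               // 128 kbps audio
)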

Modern compression algorithms exploit the fact that a video taken by a person cannot jump across long distances or angles between frames. It is quite common for videos to show content captured by a moving or rotating camera rather than random, unrelated still pictures. If two consecutive video frames contain similar scenes, the compression algorithm can eliminate the duplicate pixels, saving a lot of space.

 

Picture 1: Comparison of two video samples, one is uncompressed, the other is compressed. (source: https://sintwo.wordpress.com/2014/11/25/the-art-of-datamoshing/)

 

The waterfall and the man in the picture above are the only moving objects in the scene. Thus, the surrounding rocks needn’t be stored multiple times in the video data. After the initial key frame, the consecutive frames are stored as sparse pixel arrays.

This technique works really well for small cuts of the content, but every once in a while a full picture of the scene should be included in the video to avoid unnecessary artifacts. To limit this behavior, videomaxKeyFrameInterval specifies how many sparse frames we are allowed to use before we insert a full one. For example, with an interval of 30 on a 30 fps video, a full key frame is written roughly once per second. The higher the value, the more artifacts we may see, but the less time it takes to compress. The lower the value, the more time compression may take, and some video players can stutter when whole frames are drawn too close to each other. It is a tradeoff between compression/playback performance and visual quality, rather than between storage space and visual quality as in the bitrate case.

Our last configuration parameter is avVideoProfileLevel, which sets the compatibility level of our video. The H264 standard defines many profile levels to capture different devices’ capabilities, ranging from really low-end, low-performance devices to really powerful 4K TV processors. The High 4.1 profile level is a common denominator among a vast number of smartphones, so we choose AVVideoProfileLevelH264High41 as the default in our implementation. You can find the available options in the AVFoundation documentation.

Lastly, we’d like to define a wrapper around a single boolean to allow interrupting a previously started compression. We will return an instance of it to the caller so they can signal the background thread to cancel the compression loop.

class CancelableCompression {
    var cancel = false
}

We will need some utilities before we get into the implementation: an error type to express unexpected behavior during compression, a small extension to convert angle values from degrees to radians, and a global dispatch queue for our background thread.

fileprivate let compressQueue = DispatchQueue(
    label: "compressQueue", 
    qos: .background
)

extension Int {
    fileprivate var degreesToRadiansCGFloat: CGFloat { 
        return CGFloat(Double(self) * Double.pi / 180) 
    }
}

struct CompressionError: LocalizedError {
    let title: String
    let code: Int
    
    // Surface the title through LocalizedError so callers get a
    // readable message from error.localizedDescription
    var errorDescription: String? { return title }
    
    init(title: String = "Compression Error", code: Int = -1) {
        self.title = title
        self.code = code
    }
}

We are ready to implement our compression function. The rest of the tutorial is simply plumbing. We will configure and attach instances of AVFoundation classes to each other, set some flags, queue a while loop that compresses video frames one by one onto the background thread, and notify our caller via the provided callbacks.

First, insert the following imports at the beginning of your file to make the native types available.

import AVFoundation
import AssetsLibrary
import Foundation
import QuartzCore
import UIKit

Let’s remember our previously defined interface.

func compressh264VideoInBackground(
    videoToCompress:      URL, 
    destinationPath:      URL, 
    size:                 CompressionSize?, 
    compressionTransform: CompressionTransform, 
    compressionConfig:    CompressionConfig, 
    completionHandler:    @escaping (URL)->(), 
    errorHandler:         @escaping (Error)->(), 
    cancelHandler:        @escaping ()->()
) -> CancelableCompression {

We will close the curly brace at the end of the article.

Let’s define the thread-local variables required during the compression by combining them in a custom nested type.

class CompressionContext {
    var cgContext: CGContext?
    var pxbuffer: CVPixelBuffer?
    let colorSpace = CGColorSpaceCreateDeviceRGB()
}

Apple uses a simple naming convention inside its media frameworks. Types prefixed with CV stand for Core Video, meaning they are video related. Types prefixed with CI stand for Core Image, meaning they are image, hence pixel, related. Types prefixed with CG stand for Core Graphics, Apple’s 2D drawing framework.

CGContext is required to draw each frame into our destination pixel buffer (the GPU-backed CIContext we allocate later handles the transforms such as the orientation fix). Since we will compress video frames one by one, we only require one of these, which we will recycle multiple times.

CVPixelBuffer is required to store temporary pixel data for a single frame. Again, since we compress frames one by one, we only require one of these and recycle it multiple times.

All of our pixels will be opaque, so we only need an RGB color space.

Our compression loop will read the source video frame by frame, manipulate each frame according to the specified transformation (to resize, rotate, etc.), and write it to a destination buffer (pxbuffer). Then the framework will compress the buffer. If we have a simple utility function that does this for a single frame, we can call it as many times as we like in a loop! The function below does exactly that.

func getCVPixelBuffer(
    _ i: CGImage?, 
    compressionContext: CompressionContext) -> CVPixelBuffer? {
      
    // Unwrap our image to get rid of safety checks (assume frame is ready)
    weak var image = i!
    let imageWidth = image!.width
    let imageHeight = image!.height
    
    let attributes : [AnyHashable: Any] = [
        kCVPixelBufferCGImageCompatibilityKey : true as AnyObject,
        kCVPixelBufferCGBitmapContextCompatibilityKey : true as AnyObject
    ]
        
    // If first frame, allocate the buffer, otherwise recycle the previous
    if compressionContext.pxbuffer == nil {
        CVPixelBufferCreate(kCFAllocatorSystemDefault,
                            imageWidth,
                            imageHeight,
                            kCVPixelFormatType_32ARGB,
                            attributes as CFDictionary?,
                            &compressionContext.pxbuffer)
    }
        
    // Draw frame to newly allocated buffer
    if let _pxbuffer = compressionContext.pxbuffer {
        let flags = CVPixelBufferLockFlags(rawValue: 0)
        CVPixelBufferLockBaseAddress(_pxbuffer, flags)
        let pxdata = CVPixelBufferGetBaseAddress(_pxbuffer)
            
        // If first frame, allocate a context
        if compressionContext.cgContext == nil {
            compressionContext.cgContext = CGContext(
                data: pxdata,
                width: imageWidth,
                height: imageHeight,
                bitsPerComponent: 8,
                bytesPerRow: CVPixelBufferGetBytesPerRow(_pxbuffer),
                space: compressionContext.colorSpace,
                bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue)
        }
            
        if let _context = compressionContext.cgContext, let image = image {
            _context.draw(
                image, 
                in: CGRect(x: 0, y: 0, width: imageWidth, height: imageHeight)
            )
        }
        else {
            CVPixelBufferUnlockBaseAddress(_pxbuffer, flags)
            return nil
        }
        
        CVPixelBufferUnlockBaseAddress(_pxbuffer, flags)
        return _pxbuffer
    }
        
    return nil
}

We now have a way to copy pixels from a video frame to a buffer. Let’s instantiate the utilities to read frames from a source video and write them into a destination file. To do that, we will use the AVURLAsset, AVAssetReader and AVAssetWriter classes.

// Asset, Output File
let avAsset = AVURLAsset(url: videoToCompress)
let filePath = destinationPath

do {
// Reader and Writer
let writer = try AVAssetWriter(
    outputURL: filePath, 
    fileType: AVFileType.mp4
)
let reader = try AVAssetReader(asset: avAsset)

We need to extract the existing audio and video tracks from our source asset to be able to use the metadata inside them.

// Tracks
let videoTrack = avAsset.tracks(
    withMediaType: AVMediaType.video
).first!
let audioTrack = avAsset.tracks(
    withMediaType: AVMediaType.audio
).first!
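Note that the force unwraps above will crash if the asset has no audio track (a silent screen recording, for example). If you need to handle that case, you could swap each force unwrap for a guard; a sketch, not part of the article’s final listing:

// Hypothetical defensive variant: report the failure through the error
// callback instead of crashing when a track is missing.
guard let videoTrack = avAsset.tracks(withMediaType: AVMediaType.video).first else {
    errorHandler(CompressionError(title: "Source has no video track"))
    return CancelableCompression()
}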

We need to define how to write frames to the destination file. This involves all the codec configuration we previously discussed. If any error occurs during the setup, we will warn the user and return prematurely. Below is the necessary plumbing to do that.

// Video Output Configuration
let videoCompressionProps: Dictionary<String, Any> = [
    AVVideoAverageBitRateKey : compressionConfig.videoBitrate,
    AVVideoMaxKeyFrameIntervalKey : compressionConfig.videomaxKeyFrameInterval,
    AVVideoProfileLevelKey : compressionConfig.avVideoProfileLevel
]
   
let videoOutputSettings: Dictionary<String, Any> = [
    AVVideoWidthKey : size == nil ? videoTrack.naturalSize.width : size!.width,
    AVVideoHeightKey : size == nil ? videoTrack.naturalSize.height : size!.height,
    AVVideoCodecKey : AVVideoCodecType.h264,
    AVVideoCompressionPropertiesKey : videoCompressionProps
]
let videoInput = AVAssetWriterInput(
    mediaType: AVMediaType.video, 
    outputSettings: videoOutputSettings
)
videoInput.expectsMediaDataInRealTime = false

// OpenGL pixel formats must be 4 bytes wide,
// so we use 32RGBA and ignore the alpha at the end
let sourcePixelBufferAttributesDictionary: Dictionary<String, Any> = [
    String(kCVPixelBufferPixelFormatTypeKey) : Int(kCVPixelFormatType_32RGBA),
    String(kCVPixelFormatOpenGLESCompatibility) : kCFBooleanTrue
]
let pixelBufferAdaptor = AVAssetWriterInputPixelBufferAdaptor(
    assetWriterInput: videoInput, 
    sourcePixelBufferAttributes: sourcePixelBufferAttributesDictionary
)
    
videoInput.performsMultiPassEncodingIfSupported = true
guard writer.canAdd(videoInput) else {
    errorHandler(CompressionError(title: "Cannot add video input"))
    return CancelableCompression()
}
writer.add(videoInput)

We need to define how to write audio samples to the destination file. This involves all the codec configuration we previously discussed. If any error occurs during the setup, we will warn the user and return prematurely. Below is the necessary plumbing to do that.

// Audio Output Configuration
var acl = AudioChannelLayout()
acl.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo
acl.mChannelBitmap = AudioChannelBitmap(rawValue: UInt32(0))
acl.mNumberChannelDescriptions = UInt32(0)
    
let acll = MemoryLayout<AudioChannelLayout>.size
let audioOutputSettings: Dictionary<String, Any> = [
    AVFormatIDKey : UInt(kAudioFormatMPEG4AAC),
    AVNumberOfChannelsKey : UInt(2),
    AVSampleRateKey : compressionConfig.audioSampleRate,
    AVEncoderBitRateKey : compressionConfig.audioBitrate,
    AVChannelLayoutKey : NSData(bytes:&acl, length: acll)
]
let audioInput = AVAssetWriterInput(
    mediaType: AVMediaType.audio, 
    outputSettings: audioOutputSettings
)
audioInput.expectsMediaDataInRealTime = false
    
guard writer.canAdd(audioInput) else {
    errorHandler(CompressionError(
        title: "Cannot add audio input"
    ))
    return CancelableCompression()
}
writer.add(audioInput)

As you can see above, we hard-coded the audio codec as AAC and assumed two audio channels (stereo). If this is different in your case, feel free to extend it to your preference. Do not forget that the audio configuration parameters must be of type UInt. This is a compatibility constraint caused by AVFoundation being an Objective-C framework that expects unsigned values here.

We need to define how to read frames from the input file. It is very unlikely that you will find an H264 video that doesn’t use the kCVPixelFormatType_422YpCbCr8_yuvs pixel format, so we hardcoded our constants as is. If your videos use some other format, you may extend the configuration to your liking. If any error occurs during the setup, we will warn the user and return prematurely. Below is the necessary plumbing to do that.

// Video Input Configuration
let videoOptions: Dictionary<String, Any> = [
    kCVPixelBufferPixelFormatTypeKey as String : 
        UInt(kCVPixelFormatType_422YpCbCr8_yuvs),
    kCVPixelBufferIOSurfacePropertiesKey as String : [:]
]
let readerVideoTrackOutput = AVAssetReaderTrackOutput(
    track: videoTrack, 
    outputSettings: videoOptions
)
    
readerVideoTrackOutput.alwaysCopiesSampleData = true
    
guard reader.canAdd(readerVideoTrackOutput) else {
    errorHandler(CompressionError(
        title: "Cannot add video output"
    ))
    return CancelableCompression()
}
reader.add(readerVideoTrackOutput)

We need to define how to read audio samples from the input file. If any error occurs during the setup, we will warn the user and return prematurely. Below is the necessary plumbing to do that.

// Audio Input Configuration
let decompressionAudioSettings: Dictionary<String, Any> = [
    AVFormatIDKey: UInt(kAudioFormatLinearPCM)
]
let readerAudioTrackOutput = AVAssetReaderTrackOutput(
    track: audioTrack, 
    outputSettings: decompressionAudioSettings
)
    
readerAudioTrackOutput.alwaysCopiesSampleData = true
    
guard reader.canAdd(readerAudioTrackOutput) else {
    errorHandler(CompressionError(title: "Cannot add audio output"))
    return CancelableCompression()
}
reader.add(readerAudioTrackOutput)

We have to decide whether we will write the frames as they are or transform them according to the desired orientation fix. Below is the configuration for that.

// Orientation Fix for Videos Taken by Device Camera
var appliedTransform: CGAffineTransform
switch compressionTransform {
case .fixForFrontCamera:
    appliedTransform = CGAffineTransform(
        rotationAngle: 90.degreesToRadiansCGFloat
    ).scaledBy(x:-1.0, y:1.0)
case .fixForBackCamera:
    appliedTransform = CGAffineTransform(
        rotationAngle: 270.degreesToRadiansCGFloat
    )
case .keepSame:
    appliedTransform = CGAffineTransform.identity
}

Note that scaling by -1.0 along an axis corresponds to mirroring along that axis. If there is any other transformation you’d like to apply to your video, you can add more cases to this block, as in the sketch below.
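As an illustration, if you added a hypothetical upsideDown case to the CompressionTransform enum (it is not part of the tutorial’s code), the switch above would only need one more arm:

// Hypothetical extra arm for the switch above,
// assuming the enum gains an `upsideDown` case:
case .upsideDown:
    appliedTransform = CGAffineTransform(
        rotationAngle: 180.degreesToRadiansCGFloat
    )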

Let’s mark the beginning of our compression.

// Begin Compression
reader.timeRange = CMTimeRangeMake(kCMTimeZero, avAsset.duration)
writer.shouldOptimizeForNetworkUse = true
reader.startReading()
writer.startWriting()
writer.startSession(atSourceTime: kCMTimeZero)

The lines above do not actually start the compression. They simply tell the framework to lock in the session state so that we can loop over the frames effectively. We will do the copying by hand, and the framework will compress the copied frames one by one.

We are ready for compression. The rest of the implementation will pass the remaining work to the background thread and return immediately to the caller. All the progress will be notified via provided callbacks.

Let’s allocate our OpenGL and compression contexts and loop the frames.

// Compress in Background
let cancelable = CancelableCompression()
compressQueue.async {
    // Allocate OpenGL Context to Draw and Transform Video Frames
    let glContext = EAGLContext(api: .openGLES2)!
    let context = CIContext(eaglContext: glContext)
    let compressionContext = CompressionContext()
        
    // Loop Video Frames
    var frameCount = 0
    var videoDone = false
    var audioDone = false
        
    while !videoDone || !audioDone {

We need to make sure everything is going well on the reader and writer side. If the reader fails during the loop, chances are the file is corrupted in the middle or was deleted before the compression completed. If the writer fails, the device may have run out of disk space.

        // Check for Writer Errors (out of storage etc.)
        if writer.status == AVAssetWriterStatus.failed {
            reader.cancelReading()
            writer.cancelWriting()
            compressionContext.pxbuffer = nil
            compressionContext.cgContext = nil
    
            if let e = writer.error {
                errorHandler(e)
                return
            }
        }
            
        // Check for Reader Errors (source file corruption etc.)
        if reader.status == AVAssetReaderStatus.failed {
            reader.cancelReading()
            writer.cancelWriting()
            compressionContext.pxbuffer = nil
            compressionContext.cgContext = nil
        
            if let e = reader.error {
                errorHandler(e)
                return
            }
        }

Even if none of the possible device errors occur, our caller may still decide to cancel the compression.

        // Check for Cancel
        if cancelable.cancel {
            reader.cancelReading()
            writer.cancelWriting()
            compressionContext.pxbuffer = nil
            compressionContext.cgContext = nil
            cancelHandler()
            return
        }

So there is no error and nobody has cancelled; in that case we can start reading pixels. AVAssetReader reads pixel data on an internal private thread. If that thread lags behind ours, a frame may not have all of its data ready yet. When that happens, we should wait a little by not reading any data for a few milliseconds (~1ms). Otherwise we may continue the copy.

            // Check if enough data is ready for encoding a single frame
            if videoInput.isReadyForMoreMediaData {
                // Copy a single frame from source to destination
                // with the applied transforms
                if let vBuffer = readerVideoTrackOutput.copyNextSampleBuffer(), 
                    CMSampleBufferDataIsReady(vBuffer) {
                        frameCount += 1
                        print("Encoding frame: ", frameCount)
         
                        autoreleasepool {
                            let presentationTime = 
                                CMSampleBufferGetPresentationTimeStamp(vBuffer)
                            let pixelBuffer = 
                                CMSampleBufferGetImageBuffer(vBuffer)!
                
                            CVPixelBufferLockBaseAddress(
                                pixelBuffer, 
                                CVPixelBufferLockFlags(rawValue:0)
                            )
                
                            let transformedFrame = CIImage(
                                cvPixelBuffer: pixelBuffer
                            ).transformed(by: appliedTransform)
                
                            let frameImage = context.createCGImage(
                                transformedFrame, 
                                from: transformedFrame.extent
                            )
                            let frameBuffer = getCVPixelBuffer(
                                frameImage, 
                                compressionContext: compressionContext
                            )
                
                            CVPixelBufferUnlockBaseAddress(
                                pixelBuffer, 
                                CVPixelBufferLockFlags(rawValue: 0)
                            )
                
                            _ = pixelBufferAdaptor.append(
                                frameBuffer!, 
                                withPresentationTime: presentationTime
                            )
                        }
                } else {
                    // Video source is depleted, mark as finished
                    if !videoDone {
                        videoInput.markAsFinished()
                    }
                    videoDone = true
                }
            }

As you can see, this is where we call the helper we defined earlier, which copies a single frame into the destination buffer. Note that we apply the transformation to the CIImage we read before we send it to the destination. If we wanted to transform or modify it further, this is the place to do it. Core Image puts many other image manipulation tools at your disposal, as the sketch below illustrates.
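For instance, a hedged sketch of feeding each frame through a Core Image filter (CISepiaTone here, purely as an illustration) right before creating the CGImage could look like this:

// Hypothetical extra step inside the autoreleasepool block:
// run the transformed frame through a Core Image filter.
let sepia = CIFilter(name: "CISepiaTone")!
sepia.setValue(transformedFrame, forKey: kCIInputImageKey)
sepia.setValue(0.8, forKey: kCIInputIntensityKey)
let filteredFrame = sepia.outputImage ?? transformedFrame
// ...then pass filteredFrame instead of transformedFrame to createCGImage.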

We should check the same “not ready yet” possibility for the audio as well. Otherwise we are okay.

        if audioInput.isReadyForMoreMediaData {
            // Copy a single audio sample from source to destination
            if let aBuffer = readerAudioTrackOutput.copyNextSampleBuffer(),
                CMSampleBufferDataIsReady(aBuffer) {
                    _ = audioInput.append(aBuffer)
            } else {
                // Audio source is depleted, mark as finished
                if !audioDone {
                    audioInput.markAsFinished()
                }
                audioDone = true
            }
        }

Here is the aforementioned 1ms break before the next iteration.

        // Let background thread rest for a while
        Thread.sleep(forTimeInterval: 0.001)
    } // end while

If our loop ends successfully, we can write all the remaining data to the destination file and notify our caller.

    // Write everything to output file
    writer.finishWriting(completionHandler: {
        compressionContext.pxbuffer = nil
        compressionContext.cgContext = nil
        completionHandler(filePath)
    })
} // end queue.async

The code above marks the end of the job we sent to the background thread. Below, the cancel wrapper we allocated before scheduling the compression is returned to the caller. Since our background thread strongly captured the wrapper before we returned, the thread will see the caller’s intention about whether it should cancel or not.

    // Return a cancel wrapper for users to
    // let them interrupt the compression
    return cancelable
} // end do

Lastly, we catch any error thrown during the configuration phase. If one is thrown, we return prematurely and notify our caller with it.

catch {
    // Error During Reader or Writer Creation
    errorHandler(error)
    return CancelableCompression()
}

Our implementation is complete. We can close the function-level curly brace.

} // end function

Final Result and Usage

If everything checks out and the code compiles, the usage example below should work as expected. You can find an example project in the links at the end of the article, as well as a single-file implementation of everything we discussed.

// Get source video
let videoToCompress = // any valid URL pointing to device storage

// Declare destination path and remove anything exists in it
let destinationPath = URL(
    fileURLWithPath: NSTemporaryDirectory()
).appendingPathComponent("compressed.mp4")
try? FileManager.default.removeItem(at: destinationPath)

// Compress
let cancelable = compressh264VideoInBackground(
    videoToCompress: videoToCompress,
    destinationPath: destinationPath,
    size: nil, // nil preserves the original resolution
    // e.g. size: (width: 1280, height: 720)
    compressionTransform: .keepSame,
    compressionConfig: .defaultConfig,
    completionHandler: { [weak self] path in
        // use path
    },
    errorHandler: { e in
        print("Error: ", e)
    },
    cancelHandler: {
        print("Canceled.")
    }
)

// To cancel the compression, set the cancel flag to true
// and wait for the cancel handler to be invoked
cancelable.cancel = true
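In a real screen you would typically keep the returned wrapper around and flip the flag from a UI action. A hypothetical sketch, assuming currentCompression is a stored property on your view controller holding the value returned above:

// var currentCompression: CancelableCompression?
// currentCompression = compressh264VideoInBackground( ... )

@IBAction func cancelButtonTapped(_ sender: UIButton) {
    currentCompression?.cancel = true
    // The cancelHandler will fire from the compression loop shortly after.
}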

What else can I do?

The code provided can actually be used to compress standalone audio as well. The only modification you need is to remove everything related to video, including the writer configuration, tracks, OpenGL contexts and pixel buffers.

The code keeps working even if the device is locked, provided you execute it after calling application.beginBackgroundTask() as described in Apple’s Background Execution docs. The only catch is that it will run in a low-performance mode, so you will only be able to compress short videos without exceeding the allowed background execution time limit. A sketch of that pattern is shown below.
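A minimal sketch of that pattern, assuming the call happens somewhere with access to UIApplication.shared and reusing the videoToCompress/destinationPath values from the usage example above:

// Request extra background execution time before kicking off the compression.
// A production app would also cancel the compression from the expiration handler.
let backgroundTask = UIApplication.shared.beginBackgroundTask(expirationHandler: nil)

// Keep `cancelable` around if you also want to support cancellation.
let cancelable = compressh264VideoInBackground(
    videoToCompress: videoToCompress,
    destinationPath: destinationPath,
    size: nil,
    compressionTransform: .keepSame,
    compressionConfig: .defaultConfig,
    completionHandler: { path in
        // Upload or persist the result, then give the borrowed time back.
        UIApplication.shared.endBackgroundTask(backgroundTask)
    },
    errorHandler: { _ in
        UIApplication.shared.endBackgroundTask(backgroundTask)
    },
    cancelHandler: {
        UIApplication.shared.endBackgroundTask(backgroundTask)
    }
)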

Conclusion

Apple, despite providing really powerful tools to accomplish complex tasks, sometimes lets you shoot yourself in the foot. This article was an attempt to prevent that for people who plan to work with the AVFoundation framework. Feel free to leave a comment if any part of it doesn’t make sense. If you like articles like this, we would appreciate your constructive feedback so we can write more. Until next time, peace!

Give me the code!

Example Project: https://github.com/testfairy-blog/VideoCompressionTutorial

All the code in this article: https://github.com/testfairy-blog/VideoCompressionTutorial/blob/master/VideoCompressionTutorial/VideoCompression.swift

Credits

  1. Photo by JJ Ying on Unsplash.

Sources

  1. https://developer.apple.com/documentation/avfoundation/media_assets_playback_and_editing/asset_file_import_and_export
  2. https://developer.apple.com/library/archive/samplecode/ReaderWriter/Listings/Objective_C_AVReaderWriterOSX_AAPLDocument_m.html#//apple_ref/doc/uid/DTS40011124-Objective_C_AVReaderWriterOSX_AAPLDocument_m-DontLinkElementID_5
  3. https://medium.com/@rodrigo_freitas/a-simple-ios-view-to-play-video-b7ee05f9ce6a
  4. https://sintwo.wordpress.com/2014/11/25/the-art-of-datamoshing/