Why MoCap Matters
The Common (Mis)Understanding
Motion Capture (MoCap) is a wondrous technology suspended in a limbo of misunderstanding. Despite the countless workflows and opportunities it enables for productions of all sizes, MoCap's book cover is almost too easy to read, leaving it shelved far too often when it could otherwise be saving time, money, and headaches all over.
To the general public, MoCap is viewed as "Those suits that track actors' movements and put them into a computer!", and while that's not an inaccurate assessment, this narrowed understanding bleeds over into Film & Game Industry professionals who don't have personal familiarity with Motion Capture and what it truly enables for their projects.
This is best summed up anecdotally by my experience with a Producer who, upon leaving a MoCap demonstration event, turned to me and said, "That was all very impressive, but what would you actually use it for?" The question left me somewhat perplexed, given all we'd just witnessed.
And after asking others in attendance, I realised they shared an expectation that this MoCap stuff was only really helpful for "CGI characters", "big budgets", "SciFi stuff", and "VFX heavy projects".
This is a common problem faced by all new technologies: how do we communicate their significance and convenience to those who don't work hands-on with the stuff?
At the end of the day, does a Producer really need to know the ins and outs of what MoCap can enable?
I'd argue yes: regardless of your budget or scale, MoCap is one of those technologies you should be able to wield a rounded understanding of. Not to personally make use of MoCap in your next project, but to make better informed decisions on when and where to use its workflows, save your budget, and more efficiently create higher quality work.
As you'll soon see, your projects are likely already wading through areas that rub shoulders with MoCap, and almost certainly some of your Post-Production Crew are already using these workflows unbeknownst to you.
This article should provide you - a MoCap novice - with that much of an understanding. It's a long read, I won't lie to you - about 10,000 words - but you can skip around to the sections that are most relevant to you.
The most important areas to read if you're in a hurry are the Significance Recap, which gives you some grounded foundational knowledge, and the Next Steps conclusion, which provides some guidance in the right direction for your journey. You can then jump back to check out specific workflows as they appeal, or become relevant, to you.
Contents
A Brief Significance Recap: What MoCap does & why that's helpful
MoCap Terminology, Tracking Types, Common Brands
Type A Workflows: The VFX stuff you're expecting
Type B Workflows: The Post-Production stuff you're missing
Type C Workflows: The emergent & novel workflows you're not considering
Next Steps: For the solo creative, industry talent, and established Producers
You can leave a comment with questions or follow up below, thanks to everyone who supports these articles on Patreon.
A Brief Significance Recap
What MoCap Does
Rest assured you won't need an intimate understanding of the mathematics behind Motion Capture data; instead, let's make the fundamental mechanism crystal clear.
MoCap is simply keeping track of where things are in the real-world, in 3D space... 4D if we're counting time.
Here’s a scenario for you! You're sitting in a cafe, the barista places your coffee on the table in front of you. You pick up the cup and take a sip. Then you put the cup back down on the table.
Now you probably didn't put the coffee back down in the exact same place, but that doesn't really matter to you.
But how would you describe where it is now compared to where it started?
Maybe "Slighty towards me.", "A bit to the right.", or even "An inch towards the edge of the table.".
We're not great at describing the accuracy of these things because it doesn't really matter to us.
Computers, however, are fantastic at precision and can tell us exactly that the coffee cup is now 21mm South and 7mm East of where you first picked it up, and that you rotated it -48 degrees around the Y-axis... whatever that means.
Fundamentally this tracking of individual objects, or "points", is all MoCap ever really does for us.
This is why MoCap suits used to look like they were covered in ping-pong balls.
Those balls are just the points being tracked, they're a lot safer and more convenient than using dozens of coffee cups.
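If you'd like to see just how simple this data really is, here's a minimal sketch in Python, with invented coordinates, of what a tracked point boils down to: timestamped 3D positions, from which the computer can report exact displacement.

```python
import math

# A tracked point is just a name plus timestamped 3D positions (in mm).
# These coordinates are invented purely for illustration.
track = {
    "coffee_cup": [
        (0.00, (120.0, 0.0, 340.0)),  # (time in seconds, (x, y, z))
        (4.25, (127.0, 0.0, 319.0)),  # after the cup is put back down
    ]
}

(t0, p0), (t1, p1) = track["coffee_cup"]
dx, dy, dz = (b - a for a, b in zip(p0, p1))
distance = math.sqrt(dx**2 + dy**2 + dz**2)

print(f"Moved {dx:+.0f}mm in X and {dz:+.0f}mm in Z over {t1 - t0:.2f}s")
print(f"Total displacement: {distance:.1f}mm")
```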
But why is that useful?
Granted, MoCap so far doesn't seem too impressive, but the key to MoCap's magic is that we can combine it with other data, other mediums, that are even less impressive!
When we film something, what's unfortunately happening is we're compressing our world down into a flat 2D image made of nothing but colours.
How would you describe what's in this photo?
Maybe "Daytime cityscape", "There's a person standing on a street", "There's some buildings behind and some cars disappearing into the distance".
Hey, we found something we are great at!
Computers, however, have no idea what is in this photo, they can tell us every colour in every pixel left to right, up and down... but that's still just 2D colour information. They have no idea how far any of these objects are supposed to be from the camera...
They are completely blind to depth!
So if MoCap is able to keep track of any point in 3D space... there must be a way of using that 3D information to restore depth to our filmed image?
But what am I talking about? Computers don't watch movies, people do!
Why does it matter if a computer can understand depth or not?
Demystifying Depth
It might be surprising to learn that depth has been a crucial factor in editing for all of film history.
The most basic, standard, prolific techniques - so common as to be unspoken, or not even considered "effects" anymore - depend entirely upon depth.
The quickest way to speed-run your understanding of this is a stereotypical greenscreen situation:
Greenscreen is actually a very basic technique for capturing depth information in a 2D image.
While it's not as detailed as MoCap might be, we can think of the green in the greenscreen as that part of the 2D image shouting "I'm reaaaalllllyyyyyy far away!", while the rest of the image says more calmly "I'm here, everything is fine."
The computer can instantly understand this depth information, and separate those parts of the image into two separate depths!
Allowing editors/compositors to easily insert other images between the two of them!
Greenscreens, then, can be considered a "depth pass" with only two possible values for each pixel the camera sees, "Near" or "Far", for anything caught in the field of view of the camera.
Whereas MoCap will preserve precise, detailed depth information, even when the object/actor being MoCapped is somewhere other than where the camera is pointing!
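To ground that contrast, here's a minimal sketch, assuming a numpy image array and a crude green-dominance test rather than a production keyer, of how little "depth" a greenscreen actually encodes: exactly two values per pixel.

```python
import numpy as np

def greenscreen_depth_pass(rgb: np.ndarray) -> np.ndarray:
    """Turn an (H, W, 3) float image into a two-value 'depth pass'.

    0.0 = Near ("I'm here, everything is fine.")
    1.0 = Far  ("I'm reaaaalllllyyyyyy far away!")

    A crude green-dominance test stands in for a real keyer.
    """
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    is_green = (g > 0.5) & (g > r * 1.4) & (g > b * 1.4)
    return np.where(is_green, 1.0, 0.0)

# Usage: a 2x2 test frame, two green pixels, two "foreground" pixels.
frame = np.array([[[0.1, 0.9, 0.1], [0.8, 0.2, 0.2]],
                  [[0.3, 0.3, 0.3], [0.1, 0.8, 0.2]]])
print(greenscreen_depth_pass(frame))
```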
In practice this means that tasks that would otherwise be incredibly difficult, time-consuming, or impossible to pull off within a reasonable budget are immediately solvable, because we have preserved the 3D position / depth information from the real world.
From quick fog and lens blur effects, to compositing interactions between shots filmed at different locations, to managing full-body replacements, we'll get into all the workflows you can imagine shortly.
But first, let's brush up on some terminology so you can more easily communicate with others in-the-know.
MoCap Terminology
This is a skippable section, but worth skimming if you need to discuss MoCap with other professionals working with the technology.
Tracking
Motion Capture typically refers to tracking a person, but can also include objects - tables, props, etc.
Performance Capture has, in most English-speaking industries, become the more accepted term for the Motion Capture of a human performance (typically also capturing the facial performance), to properly confer credit on to the performer.
Facial Capture is MoCap that works as described previously, but to track dozens of points on a performer's face, allowing the computer to recreate their facial performance in detail in 3D.
Outside-In Tracking is MoCap that operates, on a technical level, through a perimeter of tracking cameras that need a clear unobstructed view of the object they're trying to track. It's Outside-In because those cameras are on the outside, pointing IN at the object we're tracking.
The Volume is a term you may have heard in the context of Virtual Production; it refers to the space within that perimeter of an Outside-In MoCap system. Whatever space is within the bounds of an Outside-In MoCap system's tracking is inside the volume!
Inside-Out Tracking is MoCap that operates, on a technical level, through sensors within each point that is being tracked. Unlike Outside-In Tracking, this method typically doesn't suffer from obstruction issues, but often outputs far less accurate tracking data.
Solving is the term used for when the computer tries to make sense of the tracking data. For example, if we use MoCap on a human actor, we might track one point on each foot, knee, hand, elbow, and shoulder, and a few on the hips, torso, and head. Those are still just points with 3D coordinates; the computer needs to take those points, and our prompting of which point relates to which body part, to figure out (to solve) how our human performer fits in the position described by the points.
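As a toy illustration of solving, with entirely hypothetical marker names and coordinates, here's how labelled points might be reduced to joint positions for a skeleton:

```python
# Hypothetical labelled MoCap markers: name -> (x, y, z) in metres.
markers = {
    "LHIP": (-0.10, 0.95, 0.00), "RHIP": (0.10, 0.95, 0.00),
    "LKNEE": (-0.11, 0.52, 0.02), "LANKLE": (-0.12, 0.09, 0.01),
}

# Our "prompting": which markers describe which body part.
joint_map = {
    "hips": ["LHIP", "RHIP"],
    "left_knee": ["LKNEE"],
    "left_ankle": ["LANKLE"],
}

def solve_joints(markers, joint_map):
    """A toy 'solve': each joint sits at the average of its markers.
    Real solvers also enforce bone lengths and joint limits."""
    joints = {}
    for joint, names in joint_map.items():
        pts = [markers[n] for n in names]
        joints[joint] = tuple(sum(axis) / len(pts) for axis in zip(*pts))
    return joints

print(solve_joints(markers, joint_map))
```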
Camera Tracking is a term that already exists within the world of editors and VFX artists for software that tries to figure out how a camera was moved when it filmed whatever footage they're working on. But it also has a meaning in the MoCap world for actually... tracking the camera while filming.
AI MoCap refers to software that uses AI to analyse footage and derive what the MoCap data should have been if it had been captured. This is typically less accurate than Outside-In or Inside-Out, but far more affordable.
Brandnames To Know
OptiTrack manufactures high-end tracking cameras for Outside-In MoCap.
Rokoko manufactures Inside-Out MoCap products, including full body, face, and hand capture. They also provide an AI MoCap software solution with multi-angle support.
Wonder Dynamics is a leader in AI MoCap software and all-in-one pipeline solutions.
Vicon manufactures high-end tracking equipment for Outside-In and Inside-Out MoCap projects.
Ncam offers an optical solution for Camera Tracking.
HTC Vive is a consumer-grade Virtual Reality gaming system, used by many as an introduction to Camera Tracking and simple (object, not human) MoCap.
PIFuHD is a human-digitisation framework upon which much of today's AI MoCap tooling has been built.
Type A Workflows - What You're Expecting
Type A Workflows are those you're probably already aware of and imagining when you think of MoCap.
Full CGI Character
What makes a CGI character convincing? A lot of things you don't need to know the minutiae of…
But simply consider: which aspects of an actor's performance might you need to make a CGI character's performance equally convincing?
Body position and arm movements are a good start, but what about finer details?
How far their chest expands when they breathe? Probably not, that sounds excessive.
But whether or not they're smiling? That sounds reasonable.
MoCap allows you to track a performance... anywhere... in your trailer, quarantined in a hotel, or hanging off the side of a cliff, and then apply that information to a CGI character, an alien, a robot, etc.
And all that position data can be used by the computer to make them fit flawlessly into the filmed scene you're dreaming up!
Basic Steps:
A method of MoCap, anything from an OptiTrack Volume to a Rokoko Suit, to a cleverly positioned iPhone equipped with some MoCap software.
A CGI character model, the more human in relative size and limb dimensions, the easier the next steps will be.
Rig the CGI model, this is the technical step where Animators and Technical Artists take a CGI character and tell the computer "this part is an arm", "this part is a hand", "this part should bend here at the elbow".
Apply the MoCap data to the Rigged CGI Model, this can work immediately if you're lucky, or require some "clean up" by animators. Think of a chef cleaning and dressing the plate after their rookie has rushed to serve a meal.
Composite the Animated CGI Model into your scene, depending on your project this might be as simple as your editor dropping it in a timeline, or involve more detailed Rendering passes from 3D software.
Without MoCap:
A Full CGI Character without MoCap needs to be animated "by hand" by animators, which might sound cheaper, easier, and quicker. But you're not actually avoiding MoCap, you're just pushing the MoCap out-of-sight, out-of-mind. Animators never animate from nothing, they reference, typically based upon videos of themselves performing the desired action as close as possible.
You could view their process as a very labour-intensive version of MoCap.
Instead of the MoCap cameras and sensors saving all the MoCap data of the performance instantly, these animators are filming themselves, trying to figure out what the positioning and timing of their movements were, and then inputting it all manually into their software.
Obviously Animators are also adding layers of artistry to the final product, MoCap isn't replacing Animators. It's accelerating the first steps of their job, saving them time, and therefore saving you money.
Character Replacement
Character replacement is an ever-present, and often invisible, effect in Post-Production.
This is most obvious with stunts that would be too dangerous or expensive to perform for real, such as a character falling through a 100th-storey window and crashing into a car below. Perhaps you have footage of your actor running up to the window and jumping through the empty window frame onto a pad. You also have footage of your actor standing up from the wrecked car after the fall. So this Character Replacement needs to seamlessly connect those two filmed pieces of footage.
There is no difference between a Full CGI Character and a Character Replacement, except for the need for this replacement to perfectly line up with the real actor/character they're replacing (and not look unconvincingly different).
Once again, MoCap saves us a huge headache in Post Production by having exact data on where that actor was, how they were standing, how fast they were moving, where their arms were, and how it all fit in 3D space relative to the window.
Basic Steps:
A method of MoCap, anything from an OptiTrack Volume to a Rokoko Suit, to a cleverly positioned iPhone equipped with some MoCap software.
A CGI Character Replacement model, these can be modelled/sculpted by a 3D Artist, but it's far quicker and more accurate to take a "scan" of the actor. Basically a lot of photos of the actor from every angle that a computer can rebuild into their likeness.
Rig the CGI model, this is the technical step where Animators and Technical Artists take a CGI character and tell the computer "this part is an arm", "this part is a hand", "this part should bend here at the elbow".
Apply the MoCap data to the Rigged CGI Model, but unlike the Full CGI Character, there are empty moments in time that need their own Animation (such as the fall from the window and the crash landing into the car). This MoCap data, however, provides a very accurate beginning and end that the Animators can work to.
Composite the Animated CGI Model into your scene, this most often will involve a detailed export from your Animation Rendering software (that retains the 3D depth information) that is then neatly layered into your timeline.
Without MoCap:
Character Replacement without MoCap is all done by eyeballing it, and is even more difficult than dealing with a Full CGI Character. When an Animator is making a CGI character based upon their own reference performance, there's a lot of leeway; they can't truly be "wrong" because there's no "right" way to perform.
But with a Character Replacement, the 3D positioning of the animation must be perfect to match with the real footage for a convincing transition and continuation of the performance. This isn't impossible, but it is time-consuming, and seen as a particularly tedious piece of work handed down to junior animators on the team.
The situation here remains the same as with a Full CGI Character, MoCap isn't essential, but it will dramatically assist your Post Production team, and enable you to get more hours of actual Animation from the total hours you're paying for.
Crowd Simulations
What have crowds got to do with MoCap? Just hire some Extras, right?
Crowd Simulations are a go-to way to add more life to larger scale scenes that might otherwise be looking a little dull, and they're really not difficult.
Stadium shots, some gliding establishing shots over a city, the blurry background behind your characters doing a walk and talk, even some animals in a field, can all be made just that little bit better with some Crowd Simulations.
Unlike our previous two examples, Crowd Simulations are all about what the audience is not paying attention to, so thankfully we don't need to be providing unique MoCap data to every person in a crowd. Instead we can use MoCap to capture a few different takes of someone cheering, walking, shuffling, coughing, and then distribute them to however many CGI people/creatures in a crowd we need!
Basic Steps:
A method of MoCap, anything from an OptiTrack Volume to a Rokoko Suit, to a cleverly positioned iPhone equipped with some MoCap software.
An assortment of CGI models that will act as members (tech. term is "agents") of the crowd, they can be of various sizes and shapes, all that matters is we ensure they can accept the MoCap data.
Rig the models, in the other examples we only needed to Rig one model, but with a crowd we are going to have multiple, so to ensure the MoCap data works with all of them we need Rigs that... share a language, or are set up in the same way. This is a technical step carried out by the Animators. The short explanation is that an Animation (like a character jumping) can't simply be copied to a different character; the computer needs more detailed information than "this is the arm", "this is the leg", it's more complex math like "what is the elbow rotation relative to the forearm length?". These problems are... not avoided, but solved more smoothly by ensuring the models are all Rigged as similarly as possible.
Apply the MoCap data to the Rigged Characters, usually to keep this looking convincing with large crowds your Animators will also set them up with some randomness and timed sequences (see the sketch after this list). Instead of Agent-1980 waving forever in the background they might [Wave-3sec]-[Stand-2sec]-[Scratch-4sec]-[Stand-5sec]-[CheckPhone-12sec]. This variation makes them less noticeable, and all seemingly unique members of a crowd in the background.
Composite the Crowd Simulations, typically these are just one element in the greater scene being created, so they will be passed on to the team compositing the environment. But in smaller productions they may immediately go to the editor to drop into the timeline.
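To picture the randomised sequencing step mentioned above, here's a minimal sketch, with invented clip names and durations, that hands each crowd agent its own repeatable chain of idle animations so no two agents move in sync:

```python
import random

# Hypothetical library of MoCapped idle clips: name -> duration in seconds.
idle_clips = {"Wave": 3, "Stand": 2, "Scratch": 4, "StandLong": 5, "CheckPhone": 12}

def build_sequence(agent_id: int, total_seconds: int) -> list[str]:
    """Give each agent a reproducible but unique chain of idle clips."""
    rng = random.Random(agent_id)  # seed by agent so renders are repeatable
    sequence, elapsed = [], 0
    while elapsed < total_seconds:
        clip = rng.choice(list(idle_clips))
        sequence.append(f"[{clip}-{idle_clips[clip]}sec]")
        elapsed += idle_clips[clip]
    return sequence

print("Agent-1980:", "-".join(build_sequence(1980, 26)))
print("Agent-1981:", "-".join(build_sequence(1981, 26)))
```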
Without MoCap:
Crowds can be overlooked entirely without MoCap, or more likely they are included after some late Post-Production design decision and are populated with MoCap data downloaded from online libraries (Free or Paid), or Animated by Animators referencing their own performances.
This is one of those "cheap-to-know, expensive-to-not" situations. If you have a cheap MoCap suit, or free software, and have a day's worth of "idle animations" recorded (tech. term for animations that aren't doing anything special, such as standing, thinking, shifting in place), your studio can keep those saved for use in all future projects. Then, when a crowd or a single random background character might be handy, those animations can be forwarded to the Post-Production team with a much shorter turnaround task of assigning them to a crowd at the back of that shot.
It again simply reduces the workload, and allows you to spend your time and budget more efficiently.
Type B Workflows - What You're Unknowingly Using
These Type B Workflows you are less likely to realise involve MoCap, but your team may have been using them without you even noticing!
Actor Tracking
Possibly the most common Post Production / VFX task is to change a detail of an actor.
This can be changing something about their outfit that was forgotten while filming and messed up continuity ("Unfold that sleeve!"), it could be an intentional effect ("Their eyes should glow at night!"), or touching up some wear and tear that occurred throughout the day ("You can see their lipstick is worn off whenever they open their mouth!").
And realistically the VFX Artist assigned this task is given nothing to work with other than the filmed footage itself, so they use all manner of tricks and tools to recreate that lost depth information from the footage. How did the camera move? How much is the head rotating here? How far away are they? All to rebuild a facsimile of the information that could have been recorded accurately with MoCap on the day (and sometimes is, but never makes it to the VFX Artist).
Basic Steps:
A method of MoCap, an OptiTrack Volume or Rokoko Suit may be impractical for most scenes, but an AI/software-based MoCap solution can be used from a camera with a clear line-of-sight to the Actors, such as a phone or consumer-grade action camera placed above the main camera.
Partner the MoCap Data with its footage, when the camera footage is saved and its metadata (tech. term for extra information like Shoot Day, Camera, Shot No., Take No.) recorded at the end of the day, so too should the MoCap Data be (see the sidecar sketch after this list). If this is a second MoCap-view camera for AI/software MoCap, this footage should also be saved with the corresponding metadata of the primary footage.
Inform Data Management, avoid leaning into micromanagement, but ensure that whomever is in charge of Data Management is aware of the existence of the MoCap Data, and the priority of keeping it organisationally accessible to Post-Production based upon its metadata.
When a Post-Production team member is assigned a task that requires MoCap data, they can access it directly and apply it to their effect.
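One lightweight way to partner MoCap data with its footage, sketched here with hypothetical file names and metadata fields, is a JSON "sidecar" saved next to each clip so Post-Production can always locate the tracking data from the clip itself:

```python
import json
from pathlib import Path

def write_mocap_sidecar(clip_path: str, mocap_path: str, metadata: dict) -> Path:
    """Save a .mocap.json sidecar next to the footage so the MoCap
    data can always be located from the clip itself."""
    sidecar = Path(clip_path).with_suffix(".mocap.json")
    sidecar.write_text(json.dumps({"mocap_file": mocap_path, **metadata}, indent=2))
    return sidecar

# Hypothetical shoot-day metadata, mirroring what the camera team records.
write_mocap_sidecar(
    "A001_C012_0815.mov",
    "mocap/A001_C012_0815_track.fbx",
    {"shoot_day": 3, "camera": "A", "shot": "12C", "take": 4},
)
```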
Without MoCap:
Realistically these effects are always being performed with a level of MoCap, it's just a lower-quality and time-consuming step for the VFX Artist. Although the tools are constantly improving and being propelled by AI in recent years, the main filmed footage often uses a camera angle that isn't... particularly easy to work with. The effect will still get done, it will just take more time, more effort, and more guess-work than is necessary for something that could've been fixed with a cheap camera and some extra storage on the day of filming.
Grading
Color Grading stands proud as one of those steps in the pipeline that has an undeniable importance. You know what someone is talking about if they say "Color Grading", but if you say "Match Moving" you should be prepared for some blank faces.
Color Grading is basically the beautification step, making all those colors pop, the shadows match the tone of the film, the sky look just right, and everything matches together beautifully. The people who work primarily as Color Graders can seem like true magicians when you see a before and after, but what they do isn't as simple as saying "Make that corner brighter!".
If the camera is moving and there's a particular object that needs to be darker, we know that the computer can't tell where that object is; the computer only knows 2D information. So the Color Grader, like a VFX Artist, uses lots of fancy software to reverse-engineer the layout of the filmed scene in 3D and take their best guess.
Perhaps more impressive is what they can do with a character's face: one of the best tools of a Color Grader is a "Face Tracker", a piece of software that calculates the 3D face shape, positioning, and movement of a character from a bit of footage. They can then adjust everything you can imagine, and a few things you wouldn't believe.
Basic Steps:
A method of MoCap, as with our Actor Tracking workflow, a consumer-grade camera should be plenty for this level of detail on characters. A method of MoCap for any objects or light sources that are moving within the scene would also be helpful.
Scene Layout with measurements, particularly denoting the relative positioning of any lights, modifiers, characters, and cameras.
(Opt.) Light Passes are additional pieces of footage that have a limited lighting setup. Imagine you are filming with 3 lights (Light A, B, and C), simply turn them all off except for Light A. Film a short take (or a full camera move if there is one) with only that light on, and then repeat for only light B and only light C. This can be used by the Color Grader (and VFX Artists) for isolating certain effects in the main footage!
(Opt.) Matte Passes / Matte Plates are a helpful trick for providing the Color Grader with extra information. Just as the Light Passes are helpful for literally shining a light and saying "Only look at this part!" we can use Chroma Mattes (like a green screen), or Luma Mattes (Bright and Dark) to store more information in another extra take.
(Opt.) Luma Depth Plates aren't common on large scale productions, but basically involve turning off all the lights except for a bright key light centred around the camera. As the light falls off into the distance of the scene, this provides a way of capturing the depth of the camera angle that will be more accurate than what the Color Grader's software can reverse engineer.
Data Management again is key here, as it was with the Actor Tracking example, to ensure both the MoCap Data/Footage is available to the Post Production and Color Grading teams, and any additional "Passes/Plates" are correctly extracted, tagged with MetaData, and accessible should they be useful.
Without MoCap:
Color Graders are used to working without any extra MoCap data provided, unless the scene comes from 3D/CGI software, which can provide much more detailed depth information than we're aspiring to here. So when asked about, or even provided with, these options they will most likely hand-wave them away and stick to what they're comfortable with.
But if you get in the habit of filming extra Matte Passes and linking them with the footage provided to them, your extra 2 minutes on each shot rapidly toggling through lighting setups won't necessarily save you any billable hours with your Color Grader - but it may just bump up their creative potential.
The face-tracking MoCap tools available to Color Graders through their software aren't something you will surpass easily without entering the realm of full 3D face replacements, as touched upon with "Character Replacement" above. But in some cases that is a necessary workflow, and again the savings in having that actor's MoCap data come from saving the Animators the task of recreating it by hand, or worse, the Color Grader having to animate it by hand - you're supposed to be paying them to grade!
Camera Tracking
Camera Tracking is a universal tool, used by everyone from Editors to Color Graders to VFX Specialists. This is the MoCap of the camera movement itself. If it's not immediately clear why that matters: if we want to make... any changes at all to what's been filmed (like removing a C-Stand left in the frame), we need to know if, where, and when the camera is moving. If we don't, and the camera moves, then our change won't know to move WITH the camera (so our hidden C-Stand will suddenly move back into view).
There are many many tools to automate this process from footage, and in some scenarios (bright interiors, nothing is moving, and there are lots of straight lines to reference) they can be very accurate. But many elements complicate this, lenses distort the image, heat shifts the air, dust and fog act as distractions, and maybe you're filming something that's also in motion!
If we could safely, and accurately say "The camera started here, moved like this, and stopped here." we could rest assured that everything will stay lined up without having to lift a finger! And that's exactly what MoCap can provide.
Basic Steps:
A method of camera MoCap, this can be the same complex OptiTrack setups, or something simpler like a phone that is recording gyroscope and accelerometer data, or even another camera positioned high above to precisely record the movement of the main camera.
A scene reference point, this is often simply the starting position of the camera, but a common mistake when recording camera MoCap is having the movement and NOT the knowledge that the starting position was "1.23 meters above, 0.5 meters back, and 0.14 meters right of the corner of the table."
A Rigged 3D Camera, this is a technical term for a way of applying our MoCap data to our footage in the computer, this sounds like deep VFX stuff but is within the grasp of Editors in their typical timeline-based software / NLE.
Applying the 3D Camera Motion to the footage, this is automatically achieved by a Rigged 3D Camera, but it is worth noting that other information, such as the camera's lens settings at the time of recording, must be available for it to work flawlessly. A minimal sketch of this step follows below.
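Here's that sketch: a minimal, hypothetical example of turning recorded camera MoCap samples into a per-frame camera pose that an NLE or 3D package could consume. Production tools interpolate between samples and fold in lens data; a nearest-sample lookup keeps the idea visible.

```python
# Recorded camera MoCap: (time s, (x, y, z) m, (pan, tilt, roll) degrees).
# Values are invented; a real recording has many samples per second.
samples = [
    (0.00, (1.23, 0.50, 0.14), (0.0, 0.0, 0.0)),
    (0.50, (1.20, 0.52, 0.30), (2.0, -1.0, 0.0)),
    (1.00, (1.10, 0.55, 0.55), (5.0, -2.5, 0.0)),
]

def camera_at_frame(frame: int, fps: float = 24.0):
    """Return the nearest recorded camera pose for a given footage frame."""
    t = frame / fps
    return min(samples, key=lambda s: abs(s[0] - t))

sample_time, position, rotation = camera_at_frame(12)  # frame 12 at 24fps = 0.5s
print(f"Frame 12 -> position {position}, rotation {rotation}")
```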
Without MoCap:
Realistically this is all possible without MoCap, but even MORE realistically speaking... it's never as accurate, and it always wastes hours of your VFX Artists' time. Especially when they use their typical camera tracking techniques and get it 99% correct - so correct they can't tell it's not 100% - and then a change is requested in the VFX effect assigned to them... and that change makes it horribly clear that their 99% tracking isn't good enough, and they have to start all over again.
All these moments of wasted time and needless backtracking are kept quiet, many layers removed from a Producer or Director, and that makes sense. But it also makes sense that it all adds up to a lot of wasted time and money that very little extra effort when filming could have saved.
If you're on the indie-end, your camera might even have some basic built-in inertial tracking data that is recorded with your footage! Simply by being aware, and then taking the extra step to extract it and send it to your VFX artists could save them their next headache.
Compositing 3D Elements
Most problems that are hand-waved away with an equally vague statement of "Just fix that in post!" end up involving some level of 3D tracking. Sky replacements, continuity mismatches, lighting mistakes, or timing errors all land in the laps of an editor or VFX artist who must solve them with what little data they have been provided. And as all Editing/Compositing boils down to telling a computer what depth each element sits at from the camera so it displays properly, a significant portion of any Post-Production worker's time is spent accurately imparting that information.
Perhaps the most unseen but essential example is "depth of field", the technical term for how objects far behind (or in front of) the focus character in a scene appear quite blurry. If the 3D Elements don't have accurate 3D tracking information they will pop out and grab attention, seemingly too focused compared to their surroundings; but if it's done right you'd never know any work had been done at all! This is known to VFX Artists as "the curse of invisible effects": for so many effects, if they're done well, no-one knows they ever happened.
As a result the Producer, Director, or anyone outside of the VFX team may see the footage they've paid for from the Post-Production studio and not understand where the money was spent, or why it seemed to take so long. This becomes a problem when requests are made by VFX Supervisors for additional information, data... MoCap Tracking on the day of filming; those who haven't worked hands-on in VFX are completely unaware of how much time (and therefore budget) is being wasted by saying "No, you guys got it done perfectly well last time without any of that.".
Basic Steps:
As with Camera Tracking, a method of camera MoCap, this can be the same complex OptiTrack setups, or something simpler like a phone that is recording gyroscope and accelerometer data, or even another camera positioned high above to precisely record the movement of the main camera.
A Scene Reference Point (Measurements of the Camera's Start Position), accompanied by an accurate map of the Scene Layout, particularly denoting the relative positioning of any lights, modifiers, characters, and cameras.
(opt.) A quick HDRI, this is a technical term for a 360 degree photo that captures much more light information than a normal photo (Basically storing a thousand brightness values instead of a dozen). These can be instant with a cheap action camera, and your VFX Supervisor will know where best to take them, but if you don't have a VFX Supervisor - just take them from where the actor will be standing, or where any 3D Elements will be composited in.
As is becoming a pattern, inform Data Management of this priority, and ensure the HDRIs, Camera Tracking data, and Scene Reference Points are all going to be fitted with the appropriate metadata and made organisationally accessible to Post-Production.
The HDRI, Camera MoCap, and Scene Layout will then be used by your Editor and/or VFX Team by applying them through a 3D Camera Rig; this is a technical thing you don't need to have a deep familiarity with, but suffice to say it enables immediate, accurate positioning of the 3D Elements relative to the Camera Footage in a 3D Scene. And the HDRI will do 80% of the work of making them beautiful and correct with a click of a button.
Without MoCap:
Speaking about 3D Elements generally is difficult because it is such a wide-reaching field that encapsulates everything in VFX. But the general pillars we can't avoid are that we have a camera, with real footage, that we need to edit and composite to add/remove things from. The more information we have about that camera, and what it was recording, the easier this process will be.
Typically if a VFX artist is assigned a clip and told to include a 3D element, they will begin by using software to try and calculate information about the camera: what was its "focal length", where was the "depth of field" set? These aren't even MoCap details yet, these are facts known to everyone in the Camera Department on the day of shooting that should be included in the metadata of the clip sent to the VFX Artist, but so often are not. Then they must use more software, inputting their best estimates of the camera information, to calculate an estimate of the camera movement and position. And then another layer of software calculations to try and separate depth - how far is that car, that wall, that character from the camera?
All this preparation, with increasingly muddied accuracy step-to-step, can be accelerated with correct metadata and skipped almost in its entirety with MoCap data from on-set. This can seem complicated compared to the examples we've covered before, and typically these are elements overseen and asked for by the VFX Supervisor. But if you don't have one, or should they not feel particularly empowered on your set, it's worth knowing about and pushing for some of these additions to save your budget and improve your creation in the long run.
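For a sense of how basic some of this missing metadata really is, here's a minimal sketch deriving a camera's horizontal field of view from just two numbers the Camera Department already knows on the day: focal length and sensor width.

```python
import math

def horizontal_fov(focal_length_mm: float, sensor_width_mm: float) -> float:
    """Horizontal field of view in degrees, from standard lens geometry."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

# A 35mm lens on a Super 35 sensor (approx. 24.9mm wide).
print(f"{horizontal_fov(35.0, 24.9):.1f} degrees")  # roughly 39.2 degrees
```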
Motion Graphics
I've separated this from 3D Elements because I'm sure to many established Producers, Motion Graphics seem an entirely different discipline from VFX and 3D work. I remember so clearly a decade ago how one particular VFX Software Package became famous as the unparalleled leader in Motion Graphics, and was spoken about as if that was all it could do.
If you're unsure of what I mean by Motion Graphics, imagine any news program, broadcast TV with a sports section, any text on the screen, floating windows and images next to presenters. Big, blocky, obvious graphics that aren't trying to convince anyone they really existed at the time of filming (as opposed to a CGI fire effect, or replacement sky). Because these are so simple looking, it's easy to separate them in your mind as an easier effect, and they can be... so long as we know the depth of everything involved.
Motion Graphics are generated within a computer, so they come with their own MoCap data built in! We know their depth, but the footage we're mixing them with still suffers from the same problems that we have with any other use-case. If you hope to have a real filmed element at a CLOSER depth than the Motion Graphic, or should there be any movement in the real filmed element's camera, or if you want to have the Motion Graphic move as if in that real filmed 3D world... MoCap again is being relied upon.
Basic Steps:
A method of camera MoCap, this can be the same complex OptiTrack setups, or something simpler like a phone that is recording gyroscope and accelerometer data, or even another camera positioned high above to precisely record the movement of the main camera.
A method of MoCap for any objects or characters you're planning to move in front of (CLOSER to the camera than) the Motion Graphic elements. If an object has a shifting silhouette, such as hair or fur, an additional step to highlight the silhouette with a chroma-key background behind it would be even more helpful, but that is outside the scope of this article.
A Scene Reference Point (Measurements of the Camera's Start Position), accompanied by an accurate map of the Scene Layout, particularly denoting the relative positioning of any lights, modifiers, characters, and cameras.
Inform Data Management to ensure the Camera Tracking data and Scene Reference Points are all going to be fitted with the appropriate metadata and made organisationally accessible to Post-Production.
The Camera MoCap and Scene Layout will then be used by your Editor and/or VFX Team by applying them through a 3D Camera Rig; this is a technical thing you don't need to have a deep familiarity with, but suffice to say it enables immediate, accurate positioning of the 3D Elements relative to the Camera Footage in a 3D Scene.
Without MoCap:
Motion Graphics without any MoCap involves, perhaps more so than other effects, a laborious frame-by-frame (tech. term for a task manually performed on every individual still image/frame in a film; there are typically between 24 and 60 of these for every second of footage) process of correcting the computer's interpretation of depth in the scene.
Because Motion Graphics usually move around the scene, appearing and disappearing, and are sometimes interacted with by characters, they must match the scene's 3D environment accurately. This requires the same calculations as with any other 3D Elements for camera lens data, focus data, positioning data, movement, all of which must be calculated by the Motion Graphic Artist from the footage they are provided. Once they have then used that data to create a fake 3D Camera, and replicated the scene to the best of their ability, they can finally begin animating the Motion Graphics they were hired to create.
There are plenty of Motion Graphics that don't enter into 3D, and are purely 2D Animations, so MoCap may seem especially unnecessary here. But even the creation of those 2D Motion Graphics will often involve a 3D workflow from the Motion Graphic Artist working on it, despite your final product you receive back being 2D. If nothing else, the minimal inclusion of Camera Tracking data, and a measured Scene Layout highlighting the distance to elements relevant to the Motion Graphics, can add a huge head-start for their process, and shift the bulk of their paid time onto actually creating you some great graphics!
Type C Workflows - Novel Workflows
Type C Workflows are those I'm expecting to be new, novel ideas to most of you; some of them don't even relate to the finished product, as in today's world MoCap can offer many benefits much earlier in the process.
Rapid PreVis
As was reinforced during the global lockdowns, the filmmaking industries strongly rely on sharing a physical space together, getting on a location and physically planning and solving problems. Virtual Production is a recent development that, amongst other things, enables many of our Pre-Production queries and concerns to be addressed quickly, remotely, and collaboratively in a shared space.
When you need to block out actions in an environment, test camera positions, or even check the feasibility of using certain equipment, the answers are informed by the location itself and your crew's problem-solving skills. MoCap can be used as a key ingredient in a Virtual Production workflow to transport your key problem solvers into a Virtual Environment, to answer those same questions collaboratively without ever visiting the location for real.
Basic Steps:
Acquire a 3D scan of your filming location, this can be captured by your Location Scout with a "Photogrammetry" app on their phone, with a LiDAR scanner, or simply by taking many, many photos of the location from every possible angle.
Create a 3D Replica of your filming location from the 3D scanned information, this could be handled by a PreVis Artist, or through your VFX Supervisor if you do not have a VAD (Virtual Art Department) or similar department. This results in an accurate 3D model of your location that can be viewed in realtime on your phone or computer through typical 3D software.
Open a shared collaborative workspace showcasing the 3D Replica Location, the most popular software solution for this currently is Unreal Engine, though other Game Engines and realtime asset viewers have similar capabilities. This allows anyone on your team, located anywhere with an internet connection to access the same 3D Replica of the Location.
A MoCap solution for the equipment involved in your problem solving, if you're testing camera positions you'll want to accurately track it with something like an OptiTrack setup that updates the camera in the 3D Replica Location in real time. These systems will typically also involve further inputs to communicate the lens data to the computer, such as focal length.
(opt.) A Virtual Reality Headset is the preferred method for viewing this Virtual Environment, while it is possible to view it on a phone or a computer, when a VR Headset is worn the collaborators feel like they are in the real environment, and the problem solving is more natural.
(opt.) A MoCap solution for the collaborators while they wear their VR Headsets. This may be starting to sound quite SciFi, but by having everyone wearing VR Headsets and having their MoCap sent to the Virtual Environment in real time, the result is everyone can see everyone else in real time - as if they were really in the environment.
Without MoCap:
Normally the minutiae that will be solved by this process can, and will, be solved when the Camera Department arrives on the location. The industry standard doesn't take advantage of MoCap in any way here, but by using it you can accelerate your turnarounds and reduce your shooting schedule dramatically - anecdotes range from 30-50% of shooting time saved.
The common description from Cinematographers after having prepared in this way is that when it comes time to actually film on location it feels more like doing a re-shoot, because they've already hammered out all the kinks, and they know the moves beat for beat.
The MoCap used here can even expand to include performers for these workflows, or stand-ins at the very least, allowing rehearsal of action, and complex moving shots to be orchestrated on a much lighter (and travel-free) budget.
Digital Extra Placement
This is no different to the Crowd Simulations discussed in the Type A section, except for the quantity and quality of the digital performer. Crowd "Agents" are typically small, far out of focus, and not at all intended to grab attention despite numbering in the 10s or 100s. However, Digital Extras are higher quality, intentionally placed, and close enough to draw attention despite being perhaps the only one in the scene.
Digital Extras work best when the filming has been undertaken with consideration for their later inclusion, meaning exact measurements, camera metadata, and HDRIs have been captured that will later be used to enhance their realism. This is a slight increase in workload compared to hiring a human extra (excluding catering, management etc.), but this workflow far surpasses human extras in flexibility and cost.
Instead of the extra being captured in the camera footage, restricting any later decisions about their performance (at the cost of additional hours for the Post-Production team to remove/edit them), the placement, number, and even existence of the Digital Extras can be chosen at any point for no additional hassle.
Basic Steps:
A MoCap solution for the camera, this can be the same complex OptiTrack setups, or something simpler like a phone that is recording gyroscope and accelerometer data, or even another camera positioned high above to precisely record the movement of the main camera.
A detailed Scene Layout map with measurements, particularly detailing the lights, cameras, characters, moving objects, and potential Extra locations.
An HDRI captured from the intended (or possible) location of any extras.
Clear communication with your Data Management team to ensure each of these items is tagged with the correct Metadata and made accessible to your Post-Production team.
A MoCap suit for performance capture of the Digital Extras, this can be as easy as a Rokoko suit or as accessible as a phone video run through AI tracking software. This can be performed, and captured, anywhere at any time before or even during Post-Production.
A Rigged Realistic CGI Human Asset, these can be purchased, modelled, or scanned from real people, and will typically be rigged by an Animator or Technical Artist so the MoCap performance can be applied realistically onto the asset.
Compositing of the Digital Extra onto the footage will be performed by a VFX Artist with access to each of the previously mentioned pieces of data. Or if the Digital Extra performance was exported by the VFX Artist it may be directly composited by your Editor in the timeline.
Without MoCap:
This is not a process to replace Extras as performers, nor entirely remove them from locations.
There are times when a scene is filmed and the thought occurs later that "this really could use someone else in the back there to give it life", or the opposite in an emotionally charged scene, "that extra is drawing our eye a little when we should be hooked to the lead here".
Digital Extras just make those moments easier to solve: instead of being committed to whatever was on film with very little room to pivot, a few extra steps are considered during filming to ensure no-one's hands are tied. Should those steps not have been taken, your Post-Production team likely wouldn't even attempt to use a full CGI character as a Digital Extra - the HDRI alone provides 80% of the realism.
The go-to solution without this workflow would be to schedule a pick-up shoot with an extra on a greenscreen, where the crew will try to recreate lighting that matches the scene as close as possible. Needless to say this is far more complex, expensive, time consuming, and inaccurate than having just recorded everything you needed in the first place.
Filming for AI MoCap on a Budget
AI Motion Capture has been booming recently, and with good cause, it's been a few years of progress finally coming to a head. AI Motion Capture refers to fancy software that can look at video of someone doing... anything really... and figure out how their body was really positioned in 3D space for the duration of the video. If that's not clear, imagine footage of someone waving and walking towards the camera; if you've used a MoCap suit you can tell exactly how far away they are from the camera and how high they raised their arm, and AI MoCap does a very good job of magically (and mathematically) coming to that same conclusion without ever needing the suit.
The wrong conclusion to take from this is "Well what do we need MoCap suits for anymore then!"; the AI solutions have their own downsides, and at the very least AI MoCap is an Outside-In MoCap solution, which means it needs a clear line-of-sight to the performer.
This should be intuitive, what if you put a lens cap over the lens? No matter how good the AI is getting, it's not going to figure that one out.
This principle of a clear, helpful view carries over to the type of camera angle being used in a performance; AI MoCap works best on full-body shots, with undistorted lenses, where the performer's feet are on the ground.
This doesn't mean you can't use AI MoCap for close-ups or seated shots; the better realisation to take away is that there is no reason the AI MoCap needs to come from the same camera you're using to film!
Basic Steps:
A second camera, a phone is good enough, positioned wide to have a clear unobstructed view of the performer's full body during the take.
Synchronised timecode, or at least an audio slate, to make the two videos effortless to synchronise (see the sync sketch after this list).
A detailed Scene Layout map with measurements, particularly detailing the lights, cameras, characters. The most essential element to note is the orientation and distance between the two cameras, as the camera being used for the MoCap may be pointing at an entirely different angle to the main camera.
An AI MoCap Software Solution (not all of these technically use AI for the heavy-lifting MoCap work), this could be Rokoko Video, Wonder Dynamics, or PIFuHD. This software can take the secondary footage intended for MoCap and output accurate 3D data of the performers as if they had been MoCapped.
With the Scene Layout informing the offset between the MoCap camera and the main camera, your VFX Artist will be able to correctly adjust and apply the AI MoCap data into its correct position relative to the footage.
Depending on your intended MoCap use case, the next step should follow as normal for that workflow. This AI MoCap process is merely an alternative for capturing that initial performer/human MoCap data - the following steps remain unchanged.
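For the audio-slate route, here's a minimal sketch, assuming both cameras recorded audio loaded as mono numpy arrays at the same sample rate, of estimating the offset between the two recordings by cross-correlation:

```python
import numpy as np

def sync_offset_seconds(audio_a: np.ndarray, audio_b: np.ndarray,
                        sample_rate: int = 48000) -> float:
    """Estimate the offset between two recordings of the same slate clap.
    A positive result means camera B started recording that much later."""
    correlation = np.correlate(audio_a, audio_b, mode="full")
    lag = int(np.argmax(correlation)) - (len(audio_b) - 1)
    return lag / sample_rate

# Toy check at an 8kHz sample rate: one 'clap' impulse, with camera B
# starting to record 0.25 seconds after camera A.
clap_a = np.zeros(8000); clap_a[4000] = 1.0
clap_b = np.roll(clap_a, -2000)  # the clap lands earlier in B's recording
print(f"{sync_offset_seconds(clap_a, clap_b, 8000):.3f}s")  # ~0.250s
```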
Without AI MoCap:
Without AI, this MoCap could of course be captured with any other MoCap method. The real breakthrough with this technology is the lack of barriers, cost, or expertise needed to make use of it!
Independent films, advertising spots, student films, kids in their backyard would more often than not find a MoCap suit a little outside their reach. And even if they could get their hands on one, so much time would have to be dedicated to ensuring it is done right to be of any use at all.
AI MoCap sidesteps the risk and eliminates the barrier to entry: anyone can record that second camera angle, and maybe it ends up being incredibly useful and enabling new MoCap-fuelled workflows! Or it doesn't, and they never use it, but it cost them no extra time or resources beyond recording another angle on their phone and writing down where they left it.
By removing the cost and expertise requirement of dabbling in MoCap, newcomers can skip the hurdles and just experiment with all the previous workflows we've discussed and feel out for themselves where the benefits, money savers, and quality multipliers really are hiding.
Modern Retargeting
Retargeting is a technical term you don't really need to know. Remember how we had to tell the computer that "these MoCap points mean Left Arm", "these MoCap points mean Left Elbow"?
When we do the same on a CGI character, we can think of that as "Targeting", so we say "This part of the CGI Character is the Left Elbow!", "This part is the Left Forearm, and should bend at the Left Elbow".
It's a complicated, essential, and precise step for Animators, typically performed by Technical Artists (the task itself is called Rigging).
When we find we need to do this whole process AGAIN, that's where retargeting comes in. If we'd told the computer everything about one CGI character ("This is the Left Arm" etc.), but then decided to include another CGI character, we'd have to start from scratch. Retargeting allows us to describe multiple CGI characters in the same language, so we can take the same MoCap data and Animations that we've already "targeted" to one, and REtarget them to the others.
The concept you should understand is that you can inexpensively reuse the one performance, the one action, on any number of unique CGI characters for negligible additional effort beyond animating the first (see the sketch below)! If you're animating an extra walking in the background, why not make that 6 unique extras? Or if you're using a MoCap setup for a main character, use some of that same data on lower-quality background characters to fill in the frame at minimal extra cost.
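Here's that sketch: the core of retargeting with invented bone names and simple rotation copying. Real tools, like Unreal Engine 5's retargeting system, also compensate for differing proportions and rest poses; the shared "language" is the part shown here.

```python
# One frame of a "targeted" animation: bone -> rotation in degrees (x, y, z).
source_anim = {
    "LeftArm": (10.0, 0.0, 45.0),
    "LeftForeArm": (0.0, 5.0, 80.0),
    "Spine1": (2.0, 0.0, 0.0),
}

# The retargeting map: the same language, described for a second rig.
bone_map = {
    "LeftArm": "upperarm_l",
    "LeftForeArm": "lowerarm_l",
    "Spine1": "spine_01",
}

def retarget(frame: dict, bone_map: dict) -> dict:
    """Copy each bone's rotation onto its mapped counterpart.
    Real retargeters also compensate for differing proportions
    and rest poses; the name mapping is the part shown here."""
    return {bone_map[b]: rot for b, rot in frame.items() if b in bone_map}

print(retarget(source_anim, bone_map))
```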
Basic Steps:
A Rigged, Animated CGI Character in any 3D Software, this could be anything from a free tool like Blender to an industry standard like Maya or even Houdini.
Another CGI Character of similar form, meaning the same number of arms and legs; though it is still possible to retarget with variations, the closer they are the easier it will be.
Rig the new CGI Character to be compatible with your original, exactly what this means varies technically depending on the method of Retargeting preferred by your Animator / Technical Artist. Although humans have dozens of bones in our spines, Animators tend to give them fewer, and some Retargeting workflows require the exact same number of bones, while others (like in Unreal Engine 5) don't require such strict similarity.
Apply your Animation onto the newly Rigged CGI Character, as the rigging process only needs to be done once, any animation can now be applied at this step. However some refinements will typically need to be made by your Animator on a case-by-case basis to ensure they work as intended.
Without Retargeting:
For years Retargeting was actually considered an impossible dream, or unrealistic at best, until a Game Development Studio allowed resources to be spent on R&D tools to save money on an ongoing franchise that wanted dozens of unique characters, with different heights and builds, all animating uniquely and affordably.
Retargeting simply does the first 90% of the work that an Animator would otherwise need to mind-numbingly recreate by hand over a few hours for every asset. It has completely changed the math when it comes to calculating the cost of a CGI character. Instead of having to pay a set number of hours PER character, the bulk of the cost and effort comes with that first character.
Additional budget can therefore be spent on more varied, or higher quality, Animations instead of character setup, while lower-budget productions can now more realistically consider multiple CGI characters.
Next Steps
So with a bit more of an understanding, where can you take MoCap next?
For the Solo Creative
If you're a solo artist, perhaps a student or just an interested teen, MoCap may have seemed a little out of reach, or out of your scope.
That's not the case, and hopefully you're now able to see a path to embrace what's available at your level and set your sights a little higher.
MoCap doesn't exist solely at the high-budget higher-complexity level, don't limit yourself to what others have done before, and their lower expectations of the solo filmmaker. Start experimenting with the free AI MoCap tools, experiment with how that data can be useful to you. Perhaps you find a more abstract way to interpret it than I've laid out here, but the more familiar you are with these tools that other, older, more established creatives might wave away, the better positioned you are to set yourself apart and rocket ahead of the rest of us!
Prompts:
Experiment with AI MoCap, film yourself, film a group of your friends.
Aim to include one out-of-focus CGI Extra, powered by your own AI MoCap, in your next film.
Next time you film a locked-off (unmoving camera-on-tripod) shot, take the extra time to film some Light Passes/Mattes as mentioned in the "Grading" section, and see what creative uses you can make of them.
Industry Talent
When you're already in a slot in a creative industry, your room to move can be quite limited and your voice somewhat muted depending on your level. However your ears are always open to new opportunities, and you should always be ready and willing to widen your understanding so you can more easily assess those opportunities. And maybe even find that opportune time to float the suggestion, the solution, that perfectly solves the problem before you.
Whether you're in the camera department, a gaffer's assistant, or a post-production artist, you know better than me where your influence lies, and where the decisions are made that impact your work. But it is always better to do, to demonstrate, rather than to simply offer up an idea; the only way I've effectively communicated Virtual Production to people is by physically dragging them into an LED Volume or slapping a VR Headset on their heads.
However MoCap overlaps with your niche of the industry, obtain a sample of it and make use of it on your own time; demonstrate how you would use it, or how it can be used for you. If you're in lighting and you start using rangefinders to record each light's exact position relative to the camera for each setup, and pass that off to the on-set VFX Supervisor on your next gig, they'll be absolutely wonderstruck.
Prompts:
Consider what MetaData or Scene Layout information you could record and supply for each setup.
Explore the AR/VR/Virtual Production PreVis tools that can apply to you, iPhone and iPad apps are an accessible entrance point (that have built in gyroscope MoCap).
If you are in an appropriate position you may suggest capturing extra visual information that may be helpful for MoCap purposes in post-production. This could be a Gaffer toggling through the light setups on their iPad during the slate pre-roll, or even a Grip dashing behind some objects with a chroma-key-panel during a rolling take. There's a lot of waiting around on a set, and there are always opportunities to make use of that time (in a non-disruptive way) to help the production run more smoothly.
Established & Emerging Producers
To the Producers out there, you're the reason I wrote this article so thank you, I hope you found it somewhat enlightening.
First and foremost, you should come away with an understanding and acknowledgement that your past projects and current workflows are almost certainly using post-production elements that could be expedited by MoCap, and are likely suffering from expensive workarounds that only arose from not having that MoCap data. Your VFX Supervisor has been protecting you from this knowledge, they see it as a battle not worth fighting, so it's up to you to reach out and float these ideas, and don't be surprised if they are initially reluctant to be fully honest with you...
Many standard workflows in the VFX Industry are known to be... not as quick as they should be, or even an objectively subpar way of doing something. But it's done that way because if the VFX team relies upon metadata, or on-set information, making it from Production all the way to them... typically it won't actually make it there even if it is recorded. But you're Producers, you're making things happen; organising and interfacing departments is your area, so let's change that stereotype.
Many of the MoCap solutions I've touched upon here are nonintrusive, and will cause no disruption to your Production schedule, nor delays on the day of shooting. Yes, MoCap suits and OptiTrack systems are an extra expense, but there are plenty of workable options between that level of excess and no effort whatsoever.
Prompts:
Reach out to your preferred Color Graders and VFX Artists, even on a personal basis, and ask what MoCap information would make their lives easier. You may be surprised how little of the metadata you assume is being recorded and passed on actually makes it to them.
Have a discussion with whomever is in charge of your Data Management and Ingest; learn what metadata is falling through the gaps, and what procedures can be put in place to ensure additional camera, lighting, and positioning information can be recorded seamlessly.
If you have a VFX Supervisor, encourage them to be more forward about what they would like to see recorded, and maybe changed, during Production. If they're not already on-set, suggest it, and if they are, empower their voice as befits a HoD - too often VFX Supervisors stay silent as witnesses to what they know will cost hundreds of hours of time in Post.
Aim to incorporate a small MoCap element in your next (small) project, whether it's an ad, corporate, or film, so you can emphasise its importance this one time, elevate everyone's familiarity with the workflows, and have a conversation with your VFX and Color Grading team afterwards to debrief on its effectiveness.
Make a habit of asking random members of different departments what they're currently enthusiastic about, what they've just heard about, what they want to try. Too often the biggest time savers, or lingering problems in a production don't make it high enough to reach the Producer's ears. Equally the latest technologies and breakthrough workflows don't get taken onboard until years after they were possible. Keep yourself ready, and willing to listen, and embrace what's exciting others in your industry.