Technical Paper Title: AVATAR–The Cast of Motion Capture Technology
Authors: D. Sneha Lakshmi & Y. Lakshmi Praveena, 2nd BTech, IT
College: Prakasam Engineering College, Kandukur
This paper presents a detailed description of the process of motion capture, whereby sensor information from a performer is transformed into an articulated, hierarchical rigid-body object. We describe the gathering of the data, the real-time construction of a virtual skeleton which a director can use for immediate feedback, and the offline processing which produces the articulated object. This offline process involves a robust statistical estimation of the size of the skeleton and an inverse kinematic optimization to produce the desired joint angle trajectories. Additionally, we discuss a variation on the inverse kinematic optimization which can be used when the standard approach does not yield satisfactory results for the special cases when joint angle consistency is desired between a group of motions. These procedures work well and have been used to produce motions for a number of commercial games.
Motion capture is a popular process for generating human animation. In this paper we describe the process whereby magnetic sensor information is transformed into an animated human figure. Our emphasis is on the data collection and processing that goes into determining an animation of a hierarchical 3D rigid-body skeleton.
Human motion capture techniques may be categorized according to the intended degree of abstraction imposed between the human actor and the virtual counterpart. Highly abstracted applications of motion capture data, analogous to puppetry, are primarily concerned with motion character, and only secondarily concerned with fidelity or accuracy. Beyond an initial calibration to ensure that the 'puppet' or animated figure can be adequately manipulated, the most significant calibration takes place in the minds of the animators or puppeteers who learn to manipulate the figure indirectly as a puppet, rather than as a direct representation of themselves. Such applications commonly require the development of a unique capture procedure to take into account the characteristics of the puppet and its range of motion, and often rely on a combination of multiple actors, multiple input devices and procedural effects. Furthermore, they often depend on real-time electromagnetic or electro-mechanical motion capture devices.
At the other end of the spectrum, efforts to accurately represent human motion depend on limiting the degree of abstraction to a feasible minimum. These projects typically attempt to approximate human motion on a rigid-body model with a limited number of rotational degrees of freedom. This work is not restricted to real-time systems, and is often conducted with non-real-time techniques such as optical tracking as well as with electromagnetic and electro-mechanical systems. It requires that close attention be paid to actual limb lengths, offsets from sensors on the surface of the body to skeleton, error introduced by surface deformation relative to the skeleton and careful calibration of translational and rotational offsets to a known reference posture.
Motion capture systems are commercially available, and the two main types of systems are optical and magnetic. At the present time neither has a clear advantage over the other, but magnetic systems are significantly cheaper. Considerable literature exists on using and editing motion capture data in animation.
Motion capture for use in animation has been surveyed, and various descriptions of the end product of its use have appeared, but beyond descriptions of the various algorithms for inverse kinematics there has been little attention given to processing the data for use by inverse kinematics routines. Other work gives an alternative technique to inverse kinematics for going from sensors on an actor to an animated articulated figure. The goal of producing an articulated rigid body is critical if additional motions which depend on dynamics are to be added, either from dynamical simulation or spacetime constraints. Additionally, accurate motion analysis is important to the biomechanics community.
The present work deals with the data processing discussed in prior work, but gives a detailed presentation of the processing techniques needed for a system which uses inverse kinematics as the base routine for producing the articulated figure. Inverse kinematic techniques are used because they have the potential to avoid rotational error propagation, which may result in unacceptable positions of end effectors when, for example, there is interaction with props. In the first part of our paper, we present the basic motion capture process, including sensor attachment and the derivation and inference of rotational degrees of freedom (DOF). Note that this phase is accomplished in real time, so that a director can view the basic quality of the captured motion. Next we determine the skeleton lengths from the recorded data, and generate an inverse kinematic solution. In particular, we discuss our approach to the problems of noisy sensor data caused by a limited number of sensors, sensor slip, and sensor noise. We note that an advantage of our technique over many commercial systems is that it generates data which can be easily sub-sampled. Finally, we present additional steps which can be taken when our inverse kinematics optimization step fails to find a realistic or consistent solution between similar motions in a motion capture session.
2 Basic Motion Capture Process
Our motion capture data is generated from an Ascension Motion Star system and is input directly into a 3D modelling and animation program, Softimage, at capture time. Data is sampled at up to 144 Hz. This high sampling rate is advisable when fast motions, such as sports motions, are captured; using slower sampling rates for such motions can often produce problems in the inverse kinematics phase, since there is less frame-to-frame coherence. Actors are suited using from 13 to 18 six-DOF sensors. The typical location for the sensors of an actor is shown in Fig. 1. Our motion capture method is designed to take advantage of (a) the real-time capability of the electromagnetic capture system which allows for careful initial calibration and periodic re-calibration of sensor offsets to the virtual skeleton (i.e., the non-kinematically constrained rotation and translation data), (b) animation tools in Softimage which allow fine control over secondary structures used as a source of derived or inferred data, and (c) the ability of statistical analysis and inverse kinematics to discard gross errors and outlying data, and to fit a hierarchical rigid body with a reduced set of DOFs to the data.
2.1 Sensor Placement
Our typical capture configuration relies primarily on the pelvis, forearms, head, and lower legs; for each of these, six DOFs are captured. These body segments are chosen for the degree to which they define the position and posture of the figure and for their comparative advantages as anchor points for the sensors. The data sampled for these segments are considered primary data, and are not processed beyond translating their six DOFs to their respective rotation points. Data for additional body segments are considered secondary and are inferred from the primary data. In particular, a 3D virtual skeleton is constructed that provides translational and rotational constraints enabling us to conveniently infer such things as the rotation of a virtual limb about its longitudinal axis, based on the orientation of a dependent limb.
Figure 1: On the left is a 13-sensor configuration; the gray dots show sensors on the back of the body. On the right is an 18-sensor configuration.
2.2 Measuring and Building the Skeleton
Our production concerns dictate that the process of preparing an actor for a capture session, and building and calibrating a virtual skeleton, be as convenient as possible. Our method largely automates the process of measuring rotational offsets, and requires a comparatively small set of manual adjustments for the final calibration. It does, however, require systematic hand measurements of translational offsets for all sensors for which translation data is used. The need for such measurements can be reduced by relying on methods based solely on the rotations of sensors secured to each body segment, and a single global translation. However, the tendency of rotation-based techniques to propagate error can make them unwieldy for tracking motions that rely on the precise placement of the hands and feet, such as self-referential motions and motions that depend on extensive interaction with props. Prior to securing the sensors, the actor's limbs are carefully measured. After the sensors are in place, their translational offsets are measured according to a coordinate system based on an arbitrary reference posture or "zero position." All measurements are rounded to the nearest 0.25 inches but are assumed to be somewhat less accurate. A skeletal model based on the measured limb lengths and offsets, and posed in the zero position, is then generated programmatically.
This model maps the data from each sensor to that sensor's corresponding "null model." A null model is a node in the virtual skeleton which has no geometry. Null models are used to introduce an offset between the motion of two linked objects. For each capture sensor there is a null model that holds its rotational and translational offset to the virtual skeleton. The translational offsets are assumed to be approximately correct. The rotational offsets are arbitrary and are assumed to be wrong, as the sensors are oriented on the body according to practical concerns such as cable management and sensor stability. Before a hierarchical relationship is established between the captured input and the joint centers, a single key frame of rotation data is recorded with the actor standing in the zero position, thus setting the offset to the frame of captured data. Fine calibration is necessary to account for any error in the measurements of the limbs and the translational offsets, and to correct for the degree to which the actor cannot perfectly assume the theoretical zero position of the skeleton. This fine calibration is accomplished by manually adjusting the translations and rotations of the null model in an interactive calibration mode that allows manipulation of scene elements while the capture device and drivers are running. A simple set of routine motions is generally sufficient to identify any necessary refinements to the calibration; the arms may not line up, the hands may not come together, the legs may not be parallel, and so on. In practice, the fine calibration is primarily confined to adjustments to the offsets for six sensors: those on the lower legs, the lower arms, the pelvis, and the chest. The resulting skeleton closely approximates the actor's motions and tends to localize error caused by sensor migration.
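The role of a null model can be sketched in a few lines. The following is a minimal illustration, not the Softimage setup itself: it assumes rotations are stored as 3x3 matrices, that the translational offset is expressed in the sensor's own frame, and that the joint frame is the identity in the zero position. All function names and numbers are hypothetical.

```python
import math

def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def matvec(a, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def rot_z(deg):
    """Rotation about the z axis, used here only to build test data."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def calibrate_rotational_offset(sensor_rot_zero):
    """Rotational offset from one key frame recorded in the zero position.

    Assumption: the joint frame is the identity in the zero position, so the
    offset is simply the inverse (transpose) of the sensor rotation.
    """
    return transpose(sensor_rot_zero)

def joint_from_sensor(sensor_pos, sensor_rot, trans_offset, rot_offset):
    """Map one six-DOF sensor sample to a joint position and orientation."""
    shift = matvec(sensor_rot, trans_offset)   # offset carried by the sensor
    joint_pos = [p + d for p, d in zip(sensor_pos, shift)]
    joint_rot = matmul(sensor_rot, rot_offset)
    return joint_pos, joint_rot

# A sensor strapped on at an arbitrary 30 degree twist reads 75 degrees after
# the limb itself has turned 45 degrees; the offset removes the twist.
rot_offset = calibrate_rotational_offset(rot_z(30))
joint_pos, joint_rot = joint_from_sensor(
    [0.0, 0.0, 1.0], rot_z(75), [0.05, 0.0, 0.0], rot_offset)
```

In this sketch the fine calibration described above would correspond to small manual edits of `trans_offset` and `rot_offset` while the data is streaming.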
At capture time, data is recorded for the sensors and used to drive the virtual skeleton. The next stage
is the optimization step. The data used in this step is not the sensor data, but data which represents the
translation and rotation for each joint.
Given a set of data based on the virtual skeleton described above, our goal is to construct an articulated, hierarchical rigid-body model. The model to which we will fit length and rotational data is shown in Fig. 3, and contains 38 joint degrees of freedom and six degrees of freedom at the root, located in the center of the pelvis, for positioning and orienting the entire figure. Our first task is to extract the best limb lengths from the motion capture data. Once the scale of the segments is determined, an inverse kinematics solution is calculated to determine the joint angles for the figure. Our inverse kinematics routine uses penalty functions to constrain the joint angles to approximate a human's range of motion.
3.1 Optimizing the skeleton
As mentioned previously, motion capture data is noisy and often contains gross errors. The source of the noise is primarily the magnetic sensors themselves, although we note that in our experience optical data is as noisy. We determine the size of the skeleton by determining the distances between the translated joint locations over a motion or repertoire of motions. Using the simple arithmetic mean to compute these distances results in answers unduly distorted by the gross errors, and editing the data by hand to remove outliers is impractical. As an example, gross errors in fast motions such as throwing may, for a frame or two, give a distance between the elbow and wrist of over 3 meters. Thus a robust statistical procedure which can minimize or reject the influence of these outliers is employed.
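The effect of a few gross errors on the mean, and the benefit of a robust estimate, can be illustrated with a small sketch. The text does not name the estimator, so the code below assumes a Huber-type M-estimator of location with a scale fixed from the median absolute deviation; the tuning constant, function name, and sample distances are all hypothetical.

```python
import statistics

def huber_location(samples, k=1.345, iterations=50, tol=1e-12):
    """Huber-type M-estimate of location via iterative reweighting.

    The scale is fixed from the median absolute deviation (MAD), so samples
    far from the bulk of the data receive weights that shrink with distance.
    """
    mu = statistics.median(samples)
    mad = statistics.median([abs(x - mu) for x in samples])
    scale = 1.4826 * mad
    if scale == 0.0:
        return mu  # at least half the samples coincide with the median
    for _ in range(iterations):
        weights = []
        for x in samples:
            r = abs(x - mu)
            weights.append(1.0 if r <= k * scale else k * scale / r)
        new_mu = sum(w * x for w, x in zip(weights, samples)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu

# Hypothetical elbow-to-wrist distances in meters over ten frames; two frames
# contain gross errors of over 3 m, as in the throwing example.
distances = [0.262, 0.259, 0.261, 0.260, 0.263, 3.10, 0.258, 0.261, 0.260, 3.40]

naive_length = statistics.mean(distances)   # pulled far away by the outliers
robust_length = huber_location(distances)   # stays close to the bulk of the data
```

An estimator of this kind bounds the influence of each outlier rather than rejecting it outright, which is why a small residual bias toward the bad frames remains.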
3.2 Inverse Kinematics
Once the hierarchical model has been determined using robust statistical analysis of the data, each frame of data must be analyzed to produce a set of joint angles. Additionally, these joint angles should form reasonable piecewise-linear DOF curves which can be sampled at any time, not just the original frame times.
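Sampling a piecewise-linear DOF curve at an arbitrary time amounts to linear interpolation between the two neighboring frames. The sketch below assumes a curve stored as sorted frame times and matching joint angle values, and clamps outside the recorded range; the names and numbers are illustrative only.

```python
from bisect import bisect_right

def sample_dof_curve(times, values, t):
    """Evaluate a piecewise-linear DOF curve at time t (clamped at the ends)."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t) - 1          # segment containing t
    alpha = (t - times[i]) / (times[i + 1] - times[i])
    return values[i] + alpha * (values[i + 1] - values[i])

def resample(times, values, new_times):
    """Sub-sample (or re-sample) the curve at a new list of times."""
    return [sample_dof_curve(times, values, t) for t in new_times]

# Hypothetical joint angle curve in degrees, recorded at four frame times.
frame_times = [0.0, 0.1, 0.2, 0.3]
angles = [0.0, 10.0, 30.0, 20.0]

midway = sample_dof_curve(frame_times, angles, 0.15)   # between frames 1 and 2
coarse = resample(frame_times, angles, [0.0, 0.2])     # every second frame
```

This only gives sensible values when consecutive frames describe nearby rotations, which is why frame-to-frame discontinuities in the rotational data defeat sub-sampling.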
Piecewise-linearity is extremely useful if the data is to be sub-sampled. In contrast, many commercial datasets contain discontinuities in the rotational data from frame to frame, which make sub-sampling impossible. Our primary and secondary data yield information about many areas of the body, giving us a highly constrained kinematic problem. This problem can be solved using a non-linear optimization technique, which seeks to minimize the deviation between the recorded data and the hierarchical model. We use a modification of a previously presented technique to produce our DOF curves in the form of piecewise-linear functions. If the data is relatively free of noise and the skeleton is well formed, this technique works well. It can produce poor results if these conditions are not present. Robust statistics helps to ensure these conditions by producing the best skeleton and by marking data points which are considered outliers. Since a hierarchical description of a skeleton is a biological simplification and since non-linear optimization is hard, the analysis can still fall into unsatisfactory local minima if the starting guess for the optimization is far from the desired solution.
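The shape of such an optimization can be shown on a toy problem. The sketch below is not the 38-DOF solver: it assumes a planar two-link arm, made-up limb lengths and joint limits, a quadratic penalty for leaving the limits, and plain gradient descent with backtracking in place of a quasi-Newton method, purely to stay dependency-free.

```python
import math

UPPER_ARM, FOREARM = 0.30, 0.26           # hypothetical limb lengths (m)
LIMITS = [(-1.5, 3.0), (0.0, 2.6)]        # hypothetical shoulder, elbow limits (rad)

def forward(theta):
    """Elbow and wrist positions of a planar two-link arm."""
    s, e = theta
    elbow = (UPPER_ARM * math.cos(s), UPPER_ARM * math.sin(s))
    wrist = (elbow[0] + FOREARM * math.cos(s + e),
             elbow[1] + FOREARM * math.sin(s + e))
    return elbow, wrist

def penalty(theta, weight=10.0):
    """Quadratic penalty that is zero inside the joint limits."""
    total = 0.0
    for angle, (lo, hi) in zip(theta, LIMITS):
        total += max(0.0, lo - angle) ** 2 + max(0.0, angle - hi) ** 2
    return weight * total

def objective(theta, data):
    """Squared deviation between model and recorded joint positions."""
    model = forward(theta)
    deviation = sum((m[k] - d[k]) ** 2
                    for m, d in zip(model, data) for k in range(2))
    return deviation + penalty(theta)

def gradient(f, x, h=1e-6):
    """Central-difference gradient of f at x."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def solve_frame(data, guess, iterations=500):
    """Minimize the objective for one frame, starting from a guess."""
    f = lambda th: objective(th, data)
    x = list(guess)
    for _ in range(iterations):
        g = gradient(f, x)
        g2 = sum(gi * gi for gi in g)
        if g2 < 1e-18:
            break
        fx, step = f(x), 1.0
        while step > 1e-8:                 # backtracking line search
            trial = [xi - step * gi for xi, gi in zip(x, g)]
            if f(trial) <= fx - 1e-4 * step * g2:
                x = trial
                break
            step *= 0.5
        else:
            break                          # no acceptable step was found
    return x

recorded = forward([0.6, 0.9])             # synthetic, noise-free "data"
angles = solve_frame(recorded, [0.3, 0.5])
```

Because the objective is not convex, the answer depends on `guess`; a start far from the true pose can settle in a different minimum, which is the failure mode described above.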
Our internal representation of rotations is as XYZ Euler angles. This representation was originally used because it provided simplicity in our code. However, Euler angles provide only a local parameterization of the group of rotations in R3, and thus have singularities. While our technique works well for many motions, a global parameterization, such as quaternions, would be better. We mitigate this problem in two ways. The first technique is simple: use the result of frame i as the starting guess for frame i+1. For many motions, this technique is perfectly valid. It suffers if the data and the skeleton are mismatched near where the skeleton goes through a singularity or where the data points are too far apart in time for a given motion's velocity. Additionally, it will suffer if it never finds a good solution for an initial frame. Over-the-shoulder reaching, fast motions, and motions where the arm is extended to its limit are examples. If this happens, the solution can jump over to another local minimum and stay there. This behavior is not desirable.
Figure 4: Walk motion
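The singularity of an Euler angle parameterization is easy to exhibit numerically. The sketch below assumes one common XYZ convention (rotate about X, then Y, then Z, in fixed axes), which may differ from the convention used in our code; the function names are illustrative.

```python
import math

def rot_x(a):
    c, s = math.cos(a), math.sin(a)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]

def rot_y(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def rot_z(a):
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def euler_xyz(x, y, z):
    """Rotation matrix for the assumed convention R = Rz(z) Ry(y) Rx(x)."""
    return matmul(rot_z(z), matmul(rot_y(y), rot_x(x)))

def max_difference(a, b):
    """Largest absolute entry-wise difference between two matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))

# At y = 90 degrees only the difference x - z matters, so two different
# angle triples describe the same rotation.
singular_gap = max_difference(euler_xyz(0.2, math.pi / 2, 0.5),
                              euler_xyz(0.4, math.pi / 2, 0.7))

# Away from the singularity the same two triples give different rotations.
regular_gap = max_difference(euler_xyz(0.2, 0.3, 0.5),
                             euler_xyz(0.4, 0.3, 0.7))
```

Near such a configuration the optimizer can trade one angle against another at no cost, so consecutive frames may receive very different angle values for nearly identical poses.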
Analyzing a walk motion of 6.7 seconds duration at 30 frames per second required 306 seconds on a Pentium 133 machine, with 4389 BFGS iterations for satisfactory convergence of the solution. A selected frame showing the fit of the skeleton (yellow) to the data (black) is shown in Fig. 10 (see Appendix). Notice that the fit is extremely good and shows only a slight discrepancy in the left arm. The resulting walk motion is shown in Fig. 4.
A further refinement of the motion capture analysis presented here is to use motions to bootstrap one another by providing good starting guesses to the BFGS optimization. The assumption for this technique is that many motions of similar structure are to be analyzed, and that motions of a similar structure will have similar joint angle curves. Such a data set might include reaches, runs, walks, etc. These sets are likely to be a part of any motion capture session.
Assume there is a motion M, a set of DOF curves, for a motion of type T. If we have a motion capture dataset for another motion of this type, the joint angles for this motion will be similar to those for motion M. The main difference between the solution for the new desired motion ~M and M will be a time warping to account for differences in the relative timing between M and the captured data. Thus a scaling in time on the data sets is needed. We mark a set of correspondence times, or key times, in M and in the data set. We time-warp M and then use that as the starting guess for the inverse kinematics optimization described earlier. Thus, this technique will not propagate errors, whereas in the previous technique a bad starting guess may result in a bad solution, which can propagate from frame to frame. As a result, similar motions will make similar use of their joints when analyzed. This similar joint use is important when these motions are later processed by other techniques. Note that this technique requires operator intervention to mark the key times, and thus it is only employed for groups of datasets where the aforementioned method did not work.
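The time warp can be sketched as follows. The form of the warp is not specified above, so this example assumes a piecewise-linear map between the marked key times; the reference curve, key times, and function names are hypothetical.

```python
from bisect import bisect_right

def interp(xs, ys, x):
    """Piecewise-linear interpolation through (xs, ys), clamped at the ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    alpha = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + alpha * (ys[i + 1] - ys[i])

def warped_guess(ref_times, ref_values, ref_keys, data_keys, data_times):
    """Starting guess for one DOF at each data frame time.

    Each data time is mapped to a reference time through the key-time
    correspondences, and the reference DOF curve is sampled there.
    """
    guess = []
    for t in data_times:
        ref_t = interp(data_keys, ref_keys, t)     # data time -> reference time
        guess.append(interp(ref_times, ref_values, ref_t))
    return guess

# Reference motion M: one shoulder DOF in degrees over a 1.0 s reach.
ref_times = [0.0, 0.5, 1.0]
ref_values = [0.0, 40.0, 10.0]

# The new capture reaches more slowly (peak at 1.0 s) and returns quickly.
ref_keys = [0.0, 0.5, 1.0]
data_keys = [0.0, 1.0, 1.5]

start = warped_guess(ref_times, ref_values, ref_keys, data_keys,
                     [0.0, 0.5, 1.0, 1.25, 1.5])
```

Each frame then starts its optimization from the warped reference value rather than from the previous frame's result, so a poor solution at one frame is not carried forward to the next.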
The reach motion shown in Fig. 5 was analyzed using the same technique as the walk.
We have presented a detailed account for taking human performance sensor data and producing animations of articulated rigid bodies. Our technique involves using geometric modelling to translate rotation data to joint centers, a robust statistical procedure to determine the optimal skeleton size, and an inverse kinematics optimization to produce desired joint angle trajectories. We presented a variant of the inverse kinematics optimization to be used when an initial approach has not yielded satisfactory results, for the special cases when joint angle consistency is desired between several motions. This procedure has been used to produce a number of motions for various commercial games and has been found to work well. Fig. 11 shows the motion capture process at the various stages of processing: the first figure shows the capture phase, where an actor is interacting with a prop (a model of an Indy car); the second shows the articulated rigid body skeleton obtained after inverse kinematics optimization; the third shows this skeleton again repositioned with the prop; the fourth shows a full rendering of the character and the prop. Notice that the inverse kinematics optimization has not changed the location of the end effectors significantly, since they are still able to interact with the prop. Finally, we remark that if any animator intervention is required, this intervention will occur in Fig 3 and Fig 4.
Above Figure: Shoulder DOFs for the medium reaching motion
Above Figure: Shoulder DOFs for the low reaching motion, using the medium reach as a reference guess.
The least satisfactory aspect of the motion capture method described here is the gross over-simplification of the motion of the spine. This aspect is not an inherent limitation of the method and can be improved by capturing or inferring data for the abdomen, thorax and neck as distinct segments. The need to obtain reasonably accurate translational offsets for sensors and to carefully calibrate the virtual skeleton has required the development of an efficient systematic approach to measuring the skeleton and the sensor positions. This approach allows us to successfully capture complex motions involving good registration between the virtual actor and the virtual representations of props in the capture space. However, it requires a fairly high degree of production preparedness for its efficient execution. This process is a likely candidate for a more general and robust solution.
Changing our internal representation of rotations from Euler angles to quaternions would likely help our inverse kinematics processing, particularly for fast sports motions such as throwing. Better models of the human body, including more accurate skeletons and more realistic joint angle constraints, will yield better analysis of motion capture data. In the statistical analysis of the data, it is likely that a "redescending" estimator, which has the ability to reject outliers outright, would produce better results, and this will be explored further. Optimization techniques such as active set optimization, rather than penalty-based ones, may give better adherence to joint angle constraints. Additionally, methods for self-calibration and other approaches which reduce the operator workload and shorten the time to produce an animation will be a major avenue of research.
Figure: Fit between skeleton (yellow) and data (black)