Video digitization has more decisions to make

As Keely pointed out in the interview, decisions on video formats are more complicated than those for still images. Video formats for preservation purposes must be lossless or uncompressed. Because uncompressed files require enormous storage, lossless compression is usually more desirable. In addition to selecting a codec, several parameters are critical to encoding, such as frame rate, frame resolution, aspect ratio, interlace mode, pixel format (e.g., a variety of YUV formats), color space, and so on. All of those decision points also apply when producing presentation versions, with one more important consideration: bit rate, which is essential for delivering content across an ecosystem of varying devices and network bandwidths.
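
To make the storage argument concrete, here is a rough back-of-the-envelope sketch in Python. The function names, the sample frame size, and the figure of 16 bits per pixel (8-bit 4:2:2 YUV) are my own illustrative assumptions, not settings from Keely's interview, and the estimate ignores audio and container overhead.

```python
def uncompressed_bytes_per_second(width, height, bits_per_pixel, fps):
    """Approximate data rate of uncompressed video (picture data only)."""
    bits_per_frame = width * height * bits_per_pixel
    return bits_per_frame * fps / 8


def gigabytes_per_hour(bytes_per_second):
    """Convert a byte rate to decimal gigabytes (10**9 bytes) per hour."""
    return bytes_per_second * 3600 / 1e9


# Assumed example: 720x486 standard-definition frame, 8-bit 4:2:2 YUV
# (16 bits per pixel on average), 30 frames per second.
rate = uncompressed_bytes_per_second(720, 486, 16, 30)
print(f"{rate / 1e6:.1f} MB/s, {gigabytes_per_hour(rate):.1f} GB per hour")
```

Under these assumptions even standard-definition video comes to about 21 MB per second, or roughly 76 GB per hour, before any lossless compression is applied. Changing the frame size, pixel format, or frame rate in the call shows how quickly each encoding decision multiplies the storage bill.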

In audio preservation there is more consensus on preferred formats such as Broadcast Wave, whereas in the video sphere there is less agreement on codec or container format. The recent move by some facilities at the Library of Congress to adopt MXF-wrapped JPEG 2000 as the video preservation format might be a sign of the future. It will be interesting to track how influential LC’s decision is on other libraries.

I think the greater turbulence in the video world is due to the fact that video technologies are still rapidly evolving. Not too long ago, people had just started to accept HD. Now the industry is pushing 2K or 4K. By contrast, still-image and audio technologies are relatively mature and stable.

Besides file formats and technology issues, the challenges librarians have to deal with in video digitization are basically the same as in other types of digitization. As mentioned in Liu’s paper, librarians have to determine the priority of what is to be digitized, the standards to be used, copyright policies, and so on. I believe prioritization is particularly challenging in video digitization because there is already an enormous amount of born-video material, and it is expected to grow exponentially. Moreover, the cost of video preservation is far higher.

Finally, I would like to say a little about containers. Many people have already used a container without knowing it. For example, the popular AVI and MOV are actually container formats. Containers are able to carry content encoded by different codecs, so an MOV file may contain AVC content encoded by an H.264 codec or just an uncompressed RGB stream. The question is why we want this additional layer. The major motivation is cross-platform interchange. Theoretically, a container can wrap any data conforming to its specification. In addition to video frames or audio samples, a container can carry “payloads” for packaging, transporting, and presenting purposes. The aforementioned MXF format adopted by LC is an SMPTE standard widely used in professional video and audio media. The wiki page compares an array of container formats well.
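
The container/codec split can be pictured with a small toy model in Python. This is only a sketch to illustrate the idea: the `Stream` and `Container` classes are hypothetical names of mine, not part of any real media library, and the audio codecs (AAC, PCM) are assumed examples that were not discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class Stream:
    kind: str   # "video", "audio", or "data"
    codec: str  # how the content inside this stream is encoded


@dataclass
class Container:
    wrapper: str                                  # e.g. "MOV", "AVI", "MXF"
    streams: list = field(default_factory=list)   # the wrapped streams

    def video_codecs(self):
        """Return the codecs of the video streams carried by this container."""
        return [s.codec for s in self.streams if s.kind == "video"]


# Two files with the same wrapper but very different contents.
compressed = Container("MOV", [Stream("video", "H.264"), Stream("audio", "AAC")])
uncompressed = Container("MOV", [Stream("video", "uncompressed RGB"), Stream("audio", "PCM")])

print(compressed.wrapper == uncompressed.wrapper)
print(compressed.video_codecs(), uncompressed.video_codecs())
```

Both objects report the same wrapper, yet their video streams are encoded in entirely different ways. That is why a file extension such as .mov says little on its own about whether a file is suitable for preservation.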



4 responses to “Video digitization has more decisions to make”

  1. Hsikai, thanks for this exceedingly informative post and breakdown of the state of the art with regard to video. The storage issues related to keeping raw (read: uncompressed/lossless) video for the WUA digital library were among the first things I know Keely had to deal with when she came on board. Their commitment to offering video in formats suitable for people who may be making films means that they are unable to just offer compressed, lower-quality files and call it good. This brings into play all of the issues you brought up, and then there’s the added issue that, with a library that is collecting content from the general public, the video starts life at a range of quality levels, from snapped on someone’s flip phone all the way up to shot with a Canon 7D.

    Your explanation of container formats is extremely helpful. Thanks so much for the link!

  2. Thanks for your wealth of information. One question I have, which you brought up briefly, concerns proprietary formats, like Blu-ray and HD. Are there any ways that digital librarians could have access to these formats, or any reason we might need to have them? Are they only useful for physical containers, or are different kinds of information encoded in them that make them not only ideal, but essential, for the content recorded? Maybe I’m not asking this right because I’m not very familiar with them. Do you know what I mean?

    • Hi, Eric:
      Thanks for liking my post.
      For Blu-ray discs, the three standard codecs used are MPEG-2, AVC, and VC-1. Stereoscopic 3D clips use MVC. Most Blu-ray software allows users to pick their preferred codecs, which ship with the tool itself. A market leader among Blu-ray authoring tools is Blu-Print, but there are far less expensive tools for simple authoring and replication purposes. Libraries might use them to aggregate clips and burn them to Blu-ray discs for delivery on request, for example.

  3. I really liked your post. It was very informative. I agree with you when it comes to video preservation. The types of videos and their formats are changing rapidly, and that means that the ways we preserve these materials are changing too. Keely explained in her video that material, especially for preservation, is saved in the format that is most stable, meaning that it can be accessed easily in the future since it is a more common format and less likely to change. With videos this could be difficult, as things are changing more quickly than with other types of material, like audio or text.
