PubMedia CMS feature request
(This post began life as an email thread, but maybe needs to be more public so here it is. Edited and expanded for obsessive clarity…)
It strikes me as somewhat simple (OK, maybe not exactly simple) to develop a Drupal-based CMS with enough commonly needed features for public radio/TV stations. You’d have your pre-built data types, skinnable templates, forms, and possibly a set of pre-defined roles and workflows. All nicely documented, etc.
But what we all really want is a system that knows about media files. You could upload (or link to) a media object, and the CMS would extract its available metadata. The system would then save that metadata in its database for processing and display in various ways. On web pages where media is published, the system would display its media type, length, bitrate, framerate, whatever. Then of course we’d be adding other metadata by hand, like title, subject, author, keywords, description, etc., as we add media content to the website. Ideally, the system would be able to automatically read ID3, MXF, and EXIF metadata for both technical and descriptive information. The idea is to automate the capturing of metadata as much as possible.
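To make the "extract what the file already knows" idea concrete, here is a minimal sketch in Python (not Drupal's PHP) that uses only the standard library. The function name `extract_metadata` and the field names in the returned record are my own inventions for illustration. It only digs into WAV files, because that is the one audio container the standard library parses; a real system would need other libraries for MP3, video, and so on.

```python
import mimetypes
import os
import wave


def extract_metadata(path):
    """Collect the technical metadata a file can tell us about itself."""
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the extension
    record = {
        "filename": os.path.basename(path),
        "filesize": os.path.getsize(path),  # size in bytes
        "mimetype": mime_type,              # None if the extension is unknown
    }
    # WAV headers carry channel count, sample rate, and sample width.
    if path.lower().endswith(".wav"):
        with wave.open(path, "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            record.update(
                channels=wav.getnchannels(),
                sample_rate=rate,
                bit_depth=wav.getsampwidth() * 8,
                duration_seconds=frames / rate,
            )
    return record
```

The returned dictionary is the sort of record the CMS could store alongside the hand-entered title, subject, and description.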
For web pages, we’d probably want to display mostly descriptive metadata, and not things like sampling rate, bit depth, color format, etc.
But for RSS feeds we need some of that technical metadata like filesize and mimetype.
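RSS 2.0 carries that technical metadata in the `enclosure` element, whose `url`, `length` (size in bytes), and `type` (MIME type) attributes map directly onto what the CMS would have captured. A minimal Python sketch, assuming the values are already in hand; the function name `rss_item` and the example.org URL in the usage note are hypothetical:

```python
import xml.etree.ElementTree as ET


def rss_item(title, url, filesize, mimetype):
    """Build one RSS <item> whose <enclosure> describes a media file."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(
        item,
        "enclosure",
        {"url": url, "length": str(filesize), "type": mimetype},
    )
    return ET.tostring(item, encoding="unicode")
```

Calling `rss_item("Example episode", "http://example.org/audio/episode.mp3", 12345678, "audio/mpeg")` returns the item as a string ready to drop into a feed's `channel`.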
And here’s the good part: If we capture enough information about our media objects, we can easily express it as “shareable metadata” via PBCore-compliant XML and other standard schemas. So the CMS becomes a powerful tool for creating a large index of public media. We can then write applications to search that index at a very fine level of detail. Think Technorati, only focused entirely on media objects expressed as detailed XML records.
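As a rough sketch of what "express it as XML" could look like, the Python below turns a captured record into a PBCore-style document. Treat the details as assumptions: the element names follow my understanding of PBCore 2.x and should be checked against the published schema, the `example-cms` source value is made up, and the output leaves out the namespace declaration and is not validated, so it is not claimed to be PBCore-compliant as written.

```python
import xml.etree.ElementTree as ET


def pbcore_record(identifier, title, description, location,
                  mimetype, filesize, duration):
    """Serialize one media object as a PBCore-style XML string (unvalidated)."""
    source = {"source": "example-cms"}  # hypothetical identifier source
    doc = ET.Element("pbcoreDescriptionDocument")
    ET.SubElement(doc, "pbcoreIdentifier", source).text = identifier
    ET.SubElement(doc, "pbcoreTitle").text = title
    ET.SubElement(doc, "pbcoreDescription").text = description
    # Technical metadata lives on the instantiation (the actual file).
    inst = ET.SubElement(doc, "pbcoreInstantiation")
    ET.SubElement(inst, "instantiationIdentifier", source).text = identifier
    ET.SubElement(inst, "instantiationDigital").text = mimetype
    ET.SubElement(inst, "instantiationLocation").text = location
    ET.SubElement(
        inst, "instantiationFileSize", {"unitsOfMeasure": "bytes"}
    ).text = str(filesize)
    ET.SubElement(inst, "instantiationDuration").text = duration
    return ET.tostring(doc, encoding="unicode")
```

The point of the split is that descriptive fields sit at the document level while the file-specific fields hang off the instantiation, so one program could have several files (MP3, video, transcript) described under the same record.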
At WILL we currently catalog media objects (as I call them) using our CMS, but there’s no automatic extraction of anything. We have to key in all the data. But once that’s done, the output looks like this:
http://will.illinois.edu/metadata/pbcore/pf2008-04-17-a
Seems to me this is the beginning of a system-wide super API that doesn’t depend on any central organization, and is truly open source.
Existing open source PHP functions for automated metadata extraction could be integrated into a Drupal-like CMS. The PHP ID3 functions allow for reading and manipulating ID3 tags; the PHP Exif functions can extract all kinds of metadata from JPEGs and TIFFs. Similar functions may already exist for video files.
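To show how little is involved in the simplest case, here is a Python sketch (not the PHP functions mentioned above) that reads an ID3v1 tag by hand. ID3v1 is a fixed 128-byte block at the end of the file that starts with the bytes `TAG`. The function name `read_id3v1` is hypothetical, and the sketch ignores ID3v2, which is what most current MP3s carry and which needs a proper library.

```python
import os


def read_id3v1(path):
    """Return ID3v1 fields from the last 128 bytes of a file, or None."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        if f.tell() < 128:
            return None  # too small to hold a tag
        f.seek(-128, os.SEEK_END)
        block = f.read(128)
    if block[:3] != b"TAG":
        return None  # no ID3v1 tag present

    def text(raw):
        # Fields are fixed width, padded with NUL bytes or spaces.
        return raw.split(b"\x00", 1)[0].decode("latin-1").strip()

    return {
        "title": text(block[3:33]),
        "artist": text(block[33:63]),
        "album": text(block[63:93]),
        "year": text(block[93:97]),
        "comment": text(block[97:127]),
    }
```

Anything this returns could pre-fill the title and author fields in the CMS form, leaving the cataloger to correct rather than retype.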
If we have a CMS that understands how to read existing metadata from the digital objects we feed it, we’re halfway to building an online digital asset management system. More on that in Part Two…
Jack Brighton
Hey John, though I am not using Drupal, these specs and needs are right on with our open source project Django Newsroom. I’d love to help flesh these details out and invent the wheel so others don’t have to.
Comment by mandric — December 8, 2008 @ 2:47 pm
Apologies, meant to address Jack not John.
Comment by Milan Andric — December 8, 2008 @ 2:58 pm