LUMIERA.clone/doc/devel/rfc/DelectusShotEvaluator.txt

398 lines
17 KiB
Text
Raw Normal View History

[grid="all"]
`------------`-----------------------
*State* _Parked_
*Date* _2008-09-21_
*Proposed by* link:nasa[]
-------------------------------------
Delectus Shot Evaluator
-----------------------
This is a brain dump about the shot evaluator subproject.
Description
~~~~~~~~~~~
Brainstorm on Delectus
~~~~~~~~~~~~~~~~~~~~~~
Some (many) of the ideas presented herein come from the various parties
involved in the Lumiera discussion list and IRC channel #lumiera.
http://lists.lumiera.org/pipermail/lumiera/2008-September/000053.html[] -- the
main discussion thread
Additionally, a lot of great concepts for how to streamline the interface are
derived in part from link:KPhotoAlbum[].
I use tags, keywords, and metadata almost interchangeably, with the exception
that metadata includes computer generated metadata as well. These are not tags
in the conventional sense -- they don't have to be text. In fact the planned
support (please add more!) is:
* Text -- both simple strings (tags) and blocks
* Audio -- on the fly (recorded from the application) or pregenerated
* Video -- same as audio
* Link -- back to a Celtx or other document resource, forward to a final cut,
URL, etc
* Still image -- inspiration image, on set details, etc
* ID -- such as the serial number of a camera used, the ISBN of a book to be
cited, etc
As such, the tags themselves can have metadata. You can see where this is
going...
Also, the tags are applied to "clips" -- which I use interchangeably between
source material imported into the application and slice of that material that
tags are applied to. Any section of a video or audio source can have tags
applied to it.
Two key functions: assign metadata and filter by metadata.
clips are one thing; but in reality most clips are much longer than their
interesting parts. Especially for raw footage, the interesting sections of a
clip can be very slim compared to the total footage. Here is a typical workflow
for selecting footage:
. Import footage.
. Remove all footage that is technically too flawed to be useful.
. Mark interesting sections of existing clips, possibly grouped into different
sections.
. Mark all other footage as uninteresting.
. Repeat 3-4 as many times as desired.
Some key points:
* Import and export should be as painless and fast as possible.
* Technically flawed footage can be both manual and computer classified.
* In some cases (e.g. documentaries, dialog) audio and video clips/footage can
follow different section processes.
It is possible to use video from footage with useless audio or use audio
from footage with useless video.
* "Interesting" is designed to be broad and is explained below.
* steps 2-5 can be performed in parallel by numerous people and can span many
different individual clips.
In simple editors like Kino or iMovie, the fundamental unit used to edit video
is the clip. This is great for a large number of uses, such as home videos or
quick Youtube postings, but it quickly limits the expressive power of more
experienced engineers in large scale productions (which are defined for the
purposes of this document to include more than 2 post-production crew members).
The clip in those editors is trimmed down to include only the desired footage,
and these segments are coalesced together into some sort of coherent mess.
The key to adequate expressive power is as follows:
* Well designed, fast metadata entry. Any data that can be included should by
default, and ideally the metadata entry process should run no less than
about 75% as fast as simple raw footage viewing. Powerful group commands
that act on sections of clips and also grouping commands that recognize the
differences between takes and angles (or individual mics) enhance and speed
up the process.
* Good tools to classify the metadata into categories that are actually
useful. Much of the metadata associated with a clip is not actively used in
any part of the footage generation.
* Merging and splicing capabilities. The application should be smart enough to
fill in audio if the existing source is missing. For example, in a recent
project I was working on a camera op accidently set the shotgun mike to test
mode, ruining about 10% of the audio for the gig. I was running sound, and
luckily I had a backup copy of the main audio being recorded. This
application should, when told that these two are of the same event at the
same time, seamlessly overlay the backup audio over the section of the old
audio that has been marked bad and not even play the bad audio. This is just
background noise, and streamlining the immense task of sorting through
footage needs to be simplified as much as possible.
* Connection to on site documentation and pre-production documentation. When
making decisions about what material to use and how to classify it, it is
essential to use any tools and resources available. The two most useful are
onsite documentation (what worked/didn't work, how the weather was, pictures
of the setup, etc all at the shoot) and pre-production (what the ideal scene
would be, what is intended, etc). Anything else that would be useful should
be supported as well.
* Be easily accessible when making the final cut. Lumiera is, if the
application gets up to speed, going to serve primarily to render effects,
finalize the cut, and fine tune what material best fits together. Any
metadata, and certainly any clipping decisions, should be very visible in
Lumiera.
* Notes, notes, notes! The application should support full multimedia notes.
These differ from (4) in that they are generated during the CLASSIFICATION
process, not before. This fits in with (5) as well -- Lumiera should display
these notes prominently on clip previews. The main way for multiple parties
to communicate and even for a single person to stay organized is to add in
notes about tough decisions made and rationale, questionable sections, etc.
These notes can be video, audio, text, etc from one of the clips, from the
machine used to edit (such as using a webcam or microphone), or over the
network (other people's input).
Too technically flawed
^^^^^^^^^^^^^^^^^^^^^^
A clip is said to be too technically flawed if it has no chance of making it to
the final product whatsoever. This does not, however, preclude its use
throughout the post-production process; for example, part of a clip in which
the director describes his vision of the talent's facial expression in a
particular scene is never going to make it into the final product, but is
invaluable in classifying the scene. In this case, the most reasonable place to
put the clip would be as a multimedia note referenced by all takes/angles of
the scene it refers to.
As mentioned above, flawed video doesn't necessarily mean flawed audio or
vice-versa.
Interesting
^^^^^^^^^^^
An "interesting" clip is one that has potential -- either as a metadata piece
(multimedia note, talent briefing, etc) or footage (for the final product OR
intermediary step). The main goal of the application is to find and classify
interesting clips of various types as quickly as possible.
Parallel Processing
^^^^^^^^^^^^^^^^^^^
Many people, accustomed to different interfaces and work styles, should be able
to work on the same project and add interactive metadata at the same time.
Classification interface
++++++++++++++++++++++++
The classification interface is divided into two categories: technical and
effective. Technical classification is simply facts about a clip or part of a
clip: what weather there is, who is on set, how many frames are present, the
average audio level, etc. Effective classification allows the artist to express
their feelings of the subjective merits (or failures) of a clip.
DCMS
^^^^
The project is organized around a distributed content management system which
allows access to all existing materials at all times. Content narrowing allows
for a more digestible amount of information to process, but everything is
non-destructive; every change to the clip structure and layout is recorded,
preferably with a reason as to why it was necessary or desired.
Content narrowing
^^^^^^^^^^^^^^^^^
With all of the information of an entire production available from a single
application, information overload is easy. Content narrowing is designed to fix
that by having parts of individual clips, metadata, or other files be specific
to one aspect of the overall design. This allows for much more successful use
of the related information and a cleaner, streamlined layout. As an example,
metadata involving file size has no effect whatsoever on the vast majority of
most major decisions -- the answer is almost always "whatever it takes." Thus,
it would not appear most of the time. Content narrowing means that it is easy
to add back footage -- "widen the view" one step, add it back, and "narrow the
view" again.
Multiple cuts
^^^^^^^^^^^^^
There is no need to export a final cut from this application; it merely is the
first step in the post-production chain. It is the missing link between
receiving raw footage from the camera and adding the well executed scenes to
the timeline. What should come out of the application is a classification of
Situational, take, and instance tagging
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is VERY powerful. The first step to using the application is to mark which
scenes are the same in all source clips -- where same means that they contain
sections which would both not run. This can include multiple takes, different
microphones or camera angles, etc. The key to fast editing is that the
application can edit metadata for the situation (what is actually going on IN
THE SCENE), take (what is actually going on IN THIS SPECIFIC RUN), and instance
(what is actually going on IN THIS CLIP). If editing a situation, the other
referenced clips AUTOMATICALLY add metadata and relevant sections. This can be
as precise and nested as desired, though rough cuts for level one editing
(first watchthrough after technically well executed clips have been selected)
and more accurate ones for higher levels is the recommended method.
Subtitling
^^^^^^^^^^
This came up on the discussion list for Lumiera, and it will be supported,
probably as a special tag.
nasa's Laws of Tagging
^^^^^^^^^^^^^^^^^^^^^^
. There is always more variety in data than tags. There are always more
situations present in the data than can be adequately expressed with any
(reasonable) number of tags. This is OK. All that is needed is the minimum
set of unique tags to progress to the next cycle without losing editing
intent or the ability to rapidly evaluate many situations.
. Many tags are used many times. "Outdoors" will be a very, very common tag; so
will "redub." If conventional names are decided upon and stuck to, it is
significantly easier to map the complex interactions between different
content situations.
. Avoid compound tags. Do not have "conversation_jill_joe" as a tag; use
"conversation," "jill," and "joe" instead. It is very easy to search for
multiple tags and very hard to link data that doesn't use overlapping tags.
The interface -- random idea
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is not meant to be a final interface design, just something I wrote up to
get ideas out there.
key commands
mutt/vim-style -- much faster than using a mouse, though GUI supported.
Easy to map to joystick, midi control surface, etc.
Space stop/start and tag enter Tab (auto pause) adds metadata special Tracks
have letters within scenes -- Audio[a-z], Video[a-z], Other[a-z] (these are not
limits) -- or names. Caps lock adds notes. This is really, really fast. It
works anywhere. This means that up to 26 different overlapping metadata
sections are allowed.
Prompting Prompting for metadata is a laborious, time-consuming process. There
is no truly efficient way to do it. This application uses a method similar to
link:KPhotoAlbum[]. When the space key is held and a letter is pressed, the tag
that corresponds to that letter is assigned to the track for the duration of
the press. (If the space is pressed and no other key is pressed at the same
time, it stops the track.) For example, suppose that the following mapping is
present:
o = outside
x = extra
p = protagonist
c = closeup
Then holding SPACE over a section and pressing one of these keys would assign
the tag to the audio AND video of the section over which the space was held. If
instead just the key is pressed (without space being held), that tag is
assigned to the section over which it is held. This is very fast and maps well
to e.g. PS3 controller or MIDI control.
If LALT is held down instead of SPACE, the audio is effected instead. If RALT
is held, just the video is effected.
In order to support scenario/take/clip tagging:
The default is situation. If the keybinding to x is:
x = t:extra ; effect only take
x = ts:extra ; effect take and scenario
x = c:extra ; extra only visible in this clip!
x = tc:extra ; this take and clip show the extra
etc
Other keyargs (the part in front of the colon) can be added to account for
other uses (e.g. l = all taken on the same location).
Tab is pressed to add metadata mappings. Tab is pressed to enter metadata edit
mode; this pauses video. Then press any key to map; and type the tag to
associate (with space, multiple tags can be added.). The following specials are
defined:
[:keyarg:]:TAG is special tagging for scenario/take/clip.
!TAG removes TAG if it is present. This is useful because it allows huge
sections of the clip to be defined as a certain tag, then have parts
removed later.
a:TAG applies TAG only to the audio.
v:TAG applies TAG only to the video.
p:PATH adds a link to PATH as a special tag.
(This will have a nice GUI as well, I just will always use the keyboard method
so I am describing it first. Mapping configurations can be stored in a
separate file, as a user config, or in the specific project.)
If ESC is pressed, all currently ranged tags are ended.
Finally, if single_quote is pressed without SPACE or {L,R}ALT down, it marks an
"interesting location." Pressing SHIFT+single_quote goes to the next
"interesting location" and pressing CNTRL+' goes to the previous "interesting
location." This allows for very quick review of footage.
Comments
--------
Rating - Quantitative Rating as well as Qualitative Tagging
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The importance/value of the video for various factors uses, can vary through
the video. It would be helpful to have the ability to create continuous ratings
over the entire track. Ratings would be numerical. Automatic clip
selection/suggestion could be generated by using algorithms to compute the
usefulness of video based on these ratings (aswell as "boolean
operations"/"binary decisions" done with tags). The ratings could be viewed
just like levels are - color coded and ovelayed on track thumbnails.
- Tree 2008-10-25
link:MultiView[] - useful for concurrent ratings input
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
It would be convenient to have an ability to view the different tracks (of the
same scene/time sequence) at once, so the viewer can input their ratings of the
video "on the fly", including a priority parameter that helps decide which
video is better than what other video.See the GUI brainstorming for a viewer
widget, and key combinations that allow both right and left hand input, that
could be used for raising/lowing ratings for up to six tracks at once.
- Tree 2008-10-25
I like the idea of rating clips (or rather, takes) a lot. It would be cool to
include both "hard," "relative," and "fuzzy" rating. Hard is an exactly defined
value (scaled 0-1) that puts the clip in an exact location in the queue.
Relative means that one is higher or lower rated than another. Fuzzy is a
slider which is approximate value, and there is some randomness. The best part
is that these can be assigned to hardware sliders/faders. Pressure sensitive
buttons + fuzzy ratings = really easy entry interface. Just hit as hard as
needed! Multiple tracks at once also an astounding idea. I could image some
sort of heap (think binary heap, at least for the data structure) which
determines the priorities and decides which clips are played. Then the highest
rated clips are played first, down to the worst.
- link:NicholasSA[] 2009-01-04
Possible Collaboration with the people from Ardour?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I guess if the thing can do all the things we talked about here, it would be
perfectly suitable for sound classification too, and maybe could fill another
gap in FOSS: Audio Archival Software, like this:
http://www.soundminer.com/SM_Site/Home.html[] (which is very expensive)...
maybe the Ardour people would be interested in a collaboration on this?
I like the suggestion of sound classification with a similar (or, even better,
identical) evaluator. link:SoundMiner[] looks interesting, but like you say
very expensive. I'm a sound guy, so I feel your pain...
- link:NicholasSA[] 2009-01-04
Parked
~~~~~~
Decided on Developer meeting, until someone wants to investigate this further.
Do 14 Apr 2011 02:52:30 CEST Christian Thaeter
Back to link:/documentation/devel/rfc.html[Lumiera Design Process overview]