
Website Navigation Generator
============================
:Author: Hermann Voßeler
:Date: 2/2011


This page contains documentation and notes regarding the +menugen.py+ script,
written 2/2011 during our effort to finally get the new Lumiera website online.
The link:http://git.lumiera.org/gitweb?p=website-staging;a=blob;f=menugen.lua;h=aad2129d7f4ed3f3b35b2fc3ac2a63a9f1bfb62d;hb=menugen[initial draft version] was written by _cehteh_ in Lua.


**************************************************************************
The purpose of the +*menugen*+ script is to maintain the navigation menu
on the Lumiera website semi-automatically. In the usual setup, this script
is triggered from a _Git push_ -- it walks the web subdirectories and
discovers menu entries. The generated HTML page contains both visible
elements and JavaScript snippets to display and highlight the menu
on the client side appropriately.
**************************************************************************

Overview: how it works
----------------------
Menu generation and display involve several parts working together:

. the +build_website.sh+ script is triggered as a Git post-receive hook whenever new
  commits are transferred to the website Git repository. After discovering new
  Asciidoc source files and generating the corresponding HTML files, the
  menu generator script is invoked
. the +menugen+ Python script walks the subdirectories to discover possible
  menu contents. It visits Asciidoc source files (`*.txt`) and picks up

  - the location / URL
  - the title
  - special `//MENU:` directives embedded in Asciidoc comments

. after building a complete menu tree (actually a DAG), this data structure
  is walked to generate output HTML into a `menu.html` file in website root.
. the page template (`page.conf`) for generated Asciidoc pages contains an
  +<IFrame>+ to display this `menu.html`
. when loading `menu.html`, some JavaScript elements generated into the body
  alongside the visible content will execute, populating a lookup table
  in client-side memory with the menu entries and their parent
  dependencies. Each individual menu entry has an attached unique ID, originally
  generated by the server-side +menugen+ script. The client-side JavaScript always
  addresses elements directly through these IDs, mostly ignoring the actual DOM
  structure
. whenever a new webpage is loaded, the `onload` handler on the +<IFrame>+ (or
  a similar mechanism) invokes the +markPageInMenu()+ JavaScript function, which
  addresses the IFrame by its ID `inavi`, and calls into the JavaScript located
  there. This script in turn finds the menu entry corresponding to the current
  page with the help of the lookup table mentioned above; this makes it possible to
  highlight the current page and to fold all other branches of the menu, keeping the
  visible part small enough to fit on a single page
. folding and highlighting changes are done by manipulating the style of these
  elements; the actual presentation is mostly controlled by a `menu.css`
. any further JavaScript functions used to operate the menu are located in
  the statically served `menu.js` -- the generated menu contains only the
  ``moving parts''
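
The discovery step described above can be sketched roughly as follows. This is
only an illustration and not the code of +menugen.py+: the function names
+scan_page()+ and +discover()+, the regular expression and the case-insensitive
matching of the directive marker are assumptions made for this example.

[source,python]
--------------------------------------------------------------------------
import re
from pathlib import Path

# Assumed pattern: the real directive grammar of menugen.py may differ.
MENU_DIRECTIVE = re.compile(r'^//MENU:\s*(.+?)\s*$', re.IGNORECASE)


def scan_page(text):
    """Return (title, directives) found in one Asciidoc source text."""
    lines = text.splitlines()
    title = None
    directives = []
    for i, line in enumerate(lines):
        match = MENU_DIRECTIVE.match(line)
        if match:
            directives.append(match.group(1))
        elif title is None and line.startswith('= '):
            # one-line title syntax
            title = line[2:].strip()
        elif (title is None and line.strip() and i + 1 < len(lines)
              and set(lines[i + 1].strip()) == {'='}):
            # two-line title: text underlined with '=' characters
            title = line.strip()
    return title, directives


def discover(webroot):
    """Yield (url, title, directives) for every *.txt below webroot."""
    root = Path(webroot)
    for src in sorted(root.rglob('*.txt')):
        title, directives = scan_page(src.read_text(encoding='utf-8'))
        url = src.relative_to(root).with_suffix('.html').as_posix()
        yield url, title, directives
--------------------------------------------------------------------------

Each tuple carries the three pieces of information listed above: the location,
the title and the embedded directives.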

Configuring menu generation
---------------------------
While, generally speaking, the script was written to remove the need to care
for the menu most of the time, there are numerous extension points and configuration
options to deal with special cases. Adjustments can be made at several levels:

* the +menugen+ Python script contains an embedded set of _predefined menu entries,_
  forming the backbone of the generated menu. The use of this feature is optional
  and can be enabled with the `-p` or `--predefined` switch. These predefined
  configuration steps are done in a function +addPredefined()+ right at the top;
  the configuration is written in the style of an _internal DSL_ and should be
  fairly self-explanatory.
* when discovering Asciidoc page sources, special `//MENU:` directives are
  processed (`//` marks an Asciidoc comment). The remainder of such a line
  is always parsed as a single directive; in case of a parsing error a warning
  is printed and the line will be ignored. The individual directives mostly
  correspond to similar functions usable in the aforementioned internal DSL;
  actually both kinds of configuration have the same effect: they attach
  some modification command to the menu element in question. Note especially
  that such directives can modify the discovery of further pages -- pages
  can be attached, excluded, ordered; and the discovery can be redirected
  to another subdirectory.
* the actual code generation is mostly based on Python template code contained
  in a separate script +menuformat.py+ -- located alongside the main menu generator
  script. This code generation is driven by a classical recursive tree visitation
  over the menu data structure built up thus far; code generation hooks are called
  on each tree leaf and when entering and leaving inner nodes (submenu nodes).
* the highlighting is done by the client side JavaScript in +js/menu.js+ --
  mostly just by _adding or removing CSS classes_ dynamically. The actual styling
  of the menu entries is thus largely independent of the menu generation (but of
  course the CSS selectors must line up with the basic structure of the generated
  code). The current version of this CSS stylesheet makes heavy use of _contextual
  selectors_ and the general cascading mechanism to build up the final style; e.g.
  the indentation according to the menu level is done by attaching a style based
  on the number of nested HTML elements.
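
The recursive tree visitation with code generation hooks might look like the
following minimal sketch. The classes +MenuNode+ and +HtmlFormatter+, the hook
names and the exact HTML emitted are hypothetical and do not reproduce
+menuformat.py+. Note also that this sketch escapes all variable text with
+html.escape()+, which the current script does not do (see the known issues).

[source,python]
--------------------------------------------------------------------------
from html import escape


class MenuNode:
    """Hypothetical stand-in for the menu node of the real script."""

    def __init__(self, node_id, label=None, url=None):
        self.id = node_id
        self.label = label or node_id
        self.url = url or node_id + '.html'
        self.children = []


class HtmlFormatter:
    """Collects output lines; one hook per visitation event."""

    def __init__(self):
        self.lines = []

    def leaf(self, node, path, depth):
        self._emit(depth, '<li id="%s"><a href="%s">%s</a></li>'
                   % (escape(path), escape(node.url), escape(node.label)))

    def enter_submenu(self, node, path, depth):
        # headline entry, followed by the container carrying the ID
        self._emit(depth, '<li>%s</li>' % escape(node.label))
        self._emit(depth, '<ul id="%s">' % escape(path))

    def leave_submenu(self, node, path, depth):
        self._emit(depth, '</ul>')

    def _emit(self, depth, text):
        self.lines.append('  ' * depth + text)


def visit(node, formatter, parent_path='', depth=0):
    """Depth-first walk, calling the hooks on leaves and inner nodes."""
    path = parent_path + '/' + node.id if parent_path else node.id
    if not node.children:
        formatter.leaf(node, path, depth)
        return
    formatter.enter_submenu(node, path, depth)
    for child in node.children:
        visit(child, formatter, path, depth + 1)
    formatter.leave_submenu(node, path, depth)
--------------------------------------------------------------------------

Keeping the hooks in a separate formatter object means a plaintext variant only
needs another formatter class, while the walk itself stays unchanged.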


Summary of menu placement directives
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With the term _placement directives_ we denote all the adjustments and configuration
possible either through the internal DSL for the predefined menu structure, or through
the `//Menu:` lines in the individual pages.

addressing menu nodes
^^^^^^^^^^^^^^^^^^^^^

Each menu entry corresponds to a menu node in the internal data structure. In the most
general case, this structure is a _Directed Acyclic Graph_, because a node might be
hooked up below several different parent nodes. In this case, such a node will also
be visited multiple times for code generation -- once for each parent it is
attached to. Amongst these parent nodes, the first parent node attached is called
the _primary parent_, because this first attachment of a node defines the _logical
path_ uniquely describing this node. Note, this logical path can be different to
the actual web paths / URLs generated, and also be different to the file system
path where the source file resides. It is just defined by the chain of parent
nodes leading to the root of the menu data structure.

The leaf element of this logical menu path is called the _ID_ of the node. Typically
this ID corresponds to the filename without the extension. But for the code generation
and the client-side JavaScript, the full menu path is used as the HTML `id` attribute,
because -- generally speaking -- only the full menu path denotes an element unambiguously.

When working with nodes, and especially when writing placement directives in the individual
source files, in most cases it is not necessary to specify the full menu path of a node.
Actually, nodes can be addressed by any path suffix, and even just by the bare node ID.
But when there is an ambiguity, simply the first node found is picked. Because nodes have
a unique identity, this can sometimes yield rather weird results. To minimise the
danger of ambiguities, the _discovery_ of source pages always addresses the menu node
to be populated with the full menu path.
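
The following sketch illustrates addressing by path suffix and the "first node
found" rule. The class shown here is a simplified, hypothetical registry; only
the name +NodeIndex+ is taken from the script, while its interface and the
component-wise comparison are assumptions for this example.

[source,python]
--------------------------------------------------------------------------
class NodeIndex:
    """Hypothetical registry of full menu paths, in registration order."""

    def __init__(self):
        self._paths = []

    def register(self, full_path):
        if full_path not in self._paths:
            self._paths.append(full_path)

    def find(self, address):
        """Return the first registered path ending with the given components."""
        wanted = address.split('/')
        for path in self._paths:
            if path.split('/')[-len(wanted):] == wanted:
                return path
        return None


index = NodeIndex()
index.register('root/project/faq')
index.register('root/documentation/faq')

# the bare ID is ambiguous here: the node registered first wins
first_hit = index.find('faq')
# a longer suffix resolves the ambiguity
exact_hit = index.find('documentation/faq')
--------------------------------------------------------------------------

With two pages named `faq`, the bare ID silently resolves to whichever node was
registered first, which is why discovery always uses the full menu path.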

configuration example
^^^^^^^^^^^^^^^^^^^^^

[source,python]
--------------------------------------------------------------------------
def addPredefined():
    root = Node(TREE_ROOT, label='Lumiera')                                <1>
    proj = root.linkChild('project')                                       <2>
    proj.linkChild('faq')

    proj.prependChild ('screenshots')                                      <3>
    proj.putChildLast ('press')
    proj.putChildAfter('faq', refPoint=Node('screenshots'))                <4>

    proj.link('http://issues.lumiera.org/roadmap', label="Roadmap (Trac)") <5>
    Node('rfc').sortChildren()
--------------------------------------------------------------------------
<1> the _root node_ by convention uses a special ID token. Additional
    fields of the node object can be given as named parameters. Here
    we define the visual menu label to be ``Lumiera''
<2> a child node `root/project` is attached. Note: this node will
    later be picked up, when the actual page discovery delves down
    into the 'project' subdirectory and encounters an 'index.txt'
    there. Index files are always searched _within_ the directory;
    they may be called `index.txt` or use the same name as the
    enclosing directory.
<3> this placement directive defines that a node `screenshots`
    shall be prepended at the start of the list. Because such a node
    doesn't yet exist, a new node `root/project/screenshots` is
    created as a side-effect.
<4> this directive places an entry after another entry, which is
    assumed to exist by the time this directive is finally applied.
    All placement directives get applied in order of definition,
    just before the output for a given node is generated.
    Note also the constructor syntax +Node(\'screenshots')+: here
    the constructor just acts as a general factory function; either
    it creates a new node, or it fetches an existing node with matching
    node path from the internal +NodeIndex+.
<5> here we create a submenu entry in the project menu, featuring
    an external link. The ID of that menu node will be derived from
    the name in the URL (here `roadmap`) -- it can be defined explicitly
    if necessary (+id=...+).
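
The deferred application of placement directives mentioned in the notes above
can be sketched as follows. This is a simplified, hypothetical model: children
are plain ID strings instead of node objects, and the +applyPlacements()+ method
is a name invented for this example.

[source,python]
--------------------------------------------------------------------------
class MenuNode:
    """Hypothetical node: records placement commands, applies them later."""

    def __init__(self, node_id):
        self.id = node_id
        self.children = []        # child IDs in display order
        self._pending = []        # deferred placement commands

    def _move(self, child, position):
        # remove the child if already present, then insert it again;
        # the position is computed on the list *after* the removal
        if child in self.children:
            self.children.remove(child)
        self.children.insert(position(self.children), child)

    def prependChild(self, child):
        self._pending.append(lambda: self._move(child, lambda kids: 0))

    def putChildLast(self, child):
        self._pending.append(lambda: self._move(child, len))

    def putChildAfter(self, child, ref):
        # raises ValueError on application if the ref entry does not exist
        self._pending.append(
            lambda: self._move(child, lambda kids: kids.index(ref) + 1))

    def applyPlacements(self):
        """Run all recorded commands in order of definition."""
        for command in self._pending:
            command()
        self._pending.clear()


proj = MenuNode('project')
proj.children.append('faq')
proj.prependChild('screenshots')
proj.putChildLast('press')
proj.putChildAfter('faq', 'screenshots')
order_before = list(proj.children)     # nothing has been applied yet
proj.applyPlacements()
order_after = list(proj.children)
--------------------------------------------------------------------------

Because the commands only run right before output, a directive may refer to an
entry that gets attached later, as long as it exists by the time of application.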


supported placement directives
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^


[options="header", width="70%",cols="^m,<m,s<",frame="topbot",grid="all"]
`------------------------`------------------------------------`---------------------------------------------------
  internal DSL            Asciidoc source                      effect
  Node(<id>)              -- discover `id.txt` --              create new node or retrieve existing node
  linkChild(id)                                                basic function for attaching child node
  linkParent(id)                                               basic function to attach below parent
  putChildLast(id)        [attach] child <id>                  move child to current end of list
  appendChild(id)         [append] child <id>
  putChildFirst(id)                                            move child to current list start
  prependChild(id)        prepend [child] <id>
  putChildAfter(id,ref)   [attach|put] child <id> after <ref>  move child after the given ref entry
  link(url[,id][,label])  [child <id>] link ::<url>[<label>]    attach an entry, holding an external link
  Node(<id>,label=<lbl>)  label|title <lbl>                    define the visible text in the menu entry
  sortChildren()          sort [children]                      sort all children currently in list
  enable(False)           off|disable|deactivate               make node passive; any children/parents added later are ignored
  enable([True])          on|active|activate                   make node active again (this is the default)
  detach()                detach                               cut away any parents and children, disable the node
  discover(srcdirs=...)   include dir <token>[,<token>]        instead of current dir, retrieve children from other dirs (relative)
  discover(includes=...)  include <token>[,<token>]            explicitly use the listed elements as children
  discover(excludes=...)  exclude <token>[,<token>]            after discovering, filter names matching the <token> (without extension)
--------------------------------------------------------------------------------------------------------------------


commandline options
^^^^^^^^^^^^^^^^^^^
The behaviour of the +menugen+ script can be influenced by some options:

predefined:: using the built-in predefined nodes
scan::    discover nodes
debug::   dump data structure after discovery
text::    generate plaintext version of the menu
webpage:: actually generate HTML / JavaScript

A positional parameter denotes the start directory for discovery (default is the current
directory). This directory is also assumed to be the web root; any URLs are generated
relative to it.

Design and Implementation notes
-------------------------------
The initial observation was that actually we're parsing and processing some kind of
_Domain Specific Language_ here. Thus the general advice for such undertakings does
apply: we should try to handle the actual language just as a thin layer on top of
some kind of _semantic model_. In our case, this model is the menu tree to be generated,
while the actual ``syntax tree'' is the real filesystem, holding Asciidoc files with
embedded comments. Thus, the semantic model was developed first, and separately from the
syntax of the specifications; it was tested to generate suitable HTML and CSS.

The syntactic elements were then added as a collection of parser or matcher objects,
each with the ability to recognise and implement one kind of placement specification.
Each such +Placement+ subclass exposes an +acceptVerb()+ function for handling invocations
of the internal DSL functions, and an +acceptDSL()+ function to parse and accept a
`//Menu:` line from some Asciidoc source file. This approach makes adding further
configuration options simple.
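
A matcher object of this kind might be structured like the sketch below. The
method names +acceptVerb()+ and +acceptDSL()+ follow the text, but their
signatures, the return convention (+None+ for "not recognised"), the class
+PutChildAfter+, the regular expression and the helper +parse_directive()+ are
assumptions made for illustration.

[source,python]
--------------------------------------------------------------------------
import re


class PutChildAfter:
    """Hypothetical placement spec: move child <id> after entry <ref>."""

    # covers the form "[attach|put] child <id> after <ref>"
    SYNTAX = re.compile(
        r'^(?:(?:attach|put)\s+)?child\s+(\S+)\s+after\s+(\S+)$',
        re.IGNORECASE)

    def __init__(self, child, ref):
        self.child = child
        self.ref = ref

    @classmethod
    def acceptVerb(cls, verb, *args):
        """Handle a call from the internal DSL; None means 'not mine'."""
        if verb == 'putChildAfter' and len(args) == 2:
            return cls(*args)
        return None

    @classmethod
    def acceptDSL(cls, spec):
        """Parse the text following the directive marker; None on mismatch."""
        match = cls.SYNTAX.match(spec.strip())
        return cls(*match.groups()) if match else None


def parse_directive(spec, placements=(PutChildAfter,)):
    """Try each known placement kind; warn and ignore unparseable lines."""
    for kind in placements:
        placement = kind.acceptDSL(spec)
        if placement is not None:
            return placement
    print('WARNING: unable to parse menu directive: %r' % spec)
    return None
--------------------------------------------------------------------------

Under this layout, supporting a further directive amounts to writing one more
class and adding it to the tuple of known placements.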

Another interesting question is to what extent the actual path handling and file discovery
logic should be configurable. My reasoning is that any attempts towards greater flexibility
are mostly moot, because we can't overcome the fact that this *is* logic to be cast into
program code. Extension points or strategy objects would just tear the actual code apart
and thus make it harder to read. Thus I confined myself to making just the index file name
and the file extensions configurable.


Known issues
~~~~~~~~~~~~

* for the sake of simplicity, there is _one_ generated container HTML element
  per menu entry. In case this entry is a submenu, the `<ul>`-element is
  used, _not_ the preceding headline `<li>` -- this is due to the fact
  that this submenu entry is going to be collapsed eventually, but has
  the side-effect of highlighting _only_ that submenu block, _not_ the
  preceding headline.
* the acceptable DSL syntax needs to be documented manually; there is
  no way to generate this information. Doing so would require adding
  specific information methods to the Placement subclasses, and it would
  result in duplicated information between the regular expressions
  and the information returned by such methods.
  This was deemed dangerous.
* the +\_\_repr\_\_+ of the Placement subclasses is not a _representation_
  but rather a +\_\_str\_\_+ -- but unfortunately the debugger in PyDev
  invokes +\_\_repr\_\_+.
* the startdir for automatic discovery is a global variable.
* when, through the use of redirection, the same file is encountered
  multiple times during discovery, it is processed repeatedly, each time
  associated with another node. This happens because, on discovery, the node-ID
  is generated as +parentPath/fileID+, to avoid mixing up similarly named
  files in different directories. (The NodeIndex allows retrieving
  a node just by its bare ID, without path, anyway.)
* no escaping: currently any variable text is written to the generated
  HTML without any sanitising or escaping. This might be a security issue,
  especially because Git pushes immediately trigger menu generation.
* the method Node.matches() is implemented sloppily: it uses just a mutual
  postfix match, while actually it should line up full path components and
  check equality on components, starting from the path end. This cheesy
  implementation can yield surprising side-effects: e.g. a not-yet attached
  node `\'end'` could match a new menu page `\'documentation/backend'`.
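
The difference between the two matching strategies from the last item can be
demonstrated with a small sketch. Both functions are written for this page as
an illustration; neither is the actual code of +Node.matches()+.

[source,python]
--------------------------------------------------------------------------
def sloppy_match(path_a, path_b):
    """Mutual postfix match on the raw strings."""
    return path_a.endswith(path_b) or path_b.endswith(path_a)


def component_match(path_a, path_b):
    """Compare whole path components, starting from the path end."""
    parts_a = path_a.split('/')
    parts_b = path_b.split('/')
    shorter = min(len(parts_a), len(parts_b))
    return parts_a[-shorter:] == parts_b[-shorter:]


# 'end' is a string suffix of 'backend', but not a path component of it
sloppy = sloppy_match('end', 'documentation/backend')
strict = component_match('end', 'documentation/backend')
--------------------------------------------------------------------------

Here +sloppy+ is +True+ while +strict+ is +False+; for a genuine suffix such as
`backend` both functions agree.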