Idiopidae: code is code; prose is prose.
Idiopidae is my attempt at finally releasing something that
makes it easier for technical documentation authors to write.
The purpose of Idiopidae is to keep the code in the code,
and the prose in the prose, and then merge the two together
based on very light comments in the source.
You can see the original HTML of this file as well as the
final output to compare the two.
Concepts
Idiopidae works on the idea that in your “prose” file you’ll
put include statements as comments, and in your “code” file
you’ll put export statements to mark off regions of code that
need to be named.
When you run the idio Python script on your prose file, it
follows the include statements and loads the file and section
you specify into the output. It also formats the included code
with the Pygments library to produce nice typesetting
(currently defaults to HTML).
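To make the two kinds of comments concrete, here is a minimal sketch in Python 3 of a marked-up code file and of the lookup an include performs. The directive spelling (`### @export "name"`, `### @end`) follows the tokens in the grammar later in this document. The `extract_sections` helper, its simplified regular expression, and the sample file are hypothetical stand-ins, not Idiopidae's own parser. Unlike the real Builder, which gives unnamed regions a numeric section, this sketch ignores lines outside a named export.

```python
import re

# Hypothetical code file with two exported regions.
SAMPLE = '''\
import sys
### @export "greet"
def greet(name):
    return "Hello, " + name
### @end
### @export "main"
print(greet(sys.argv[1]))
### @end
'''

# Simplified directive matcher: comment leader, " @", then export NAME or end.
DIRECTIVE = re.compile(
    r'^[ \t]*(?:###|//|\*)+ @(?:export[ \t]+"([^"]+)"|(end))[ \t]*$'
)

def extract_sections(text):
    """Return {section name: [lines]} for every named export region."""
    sections = {}
    current = None
    for line in text.splitlines():
        m = DIRECTIVE.match(line)
        if m is None:
            # Ordinary line: keep it only while inside a named region.
            if current is not None:
                sections[current].append(line)
        elif m.group(2):
            # "### @end" closes the named region.
            current = None
        else:
            # '### @export "name"' opens a new named region.
            current = m.group(1)
            sections[current] = []
    return sections

sections = extract_sections(SAMPLE)
print(sections["greet"])
```

A prose file would then pull in a region with a comment such as `### @include "sample.py" "greet"`, where `sample.py` is again a made-up file name.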
This file you’re reading right now is simply a Textile
prose file that includes and describes Idiopidae’s source.
The process for creating it was:
> cd doc
> webgen
> idio output/index.html > output/test.html
The source is available from a Bazaar repository at:
http://www.zedshaw.com/repository/zapps
It is currently just a demo of Zapps, but it
will be moved into its own project folder as soon as it is good
enough to use and distribute.
The Runtime
It’s best if we start with the runtime.py file, which is responsible for
using the IdiopidaeParser to process the files. It starts off with your
typical boilerplate code, but I like the with statement, so I pull it in
from __future__:
    from __future__ import with_statement

    # Copyright (C) Zed A. Shaw, licensed under the GPLv3

    import idiopidae
    from pygments import highlight
    from pygments.formatters import HtmlFormatter
    from pygments.lexers import guess_lexer_for_filename, guess_lexer

Next we need to keep track of stuff:
    class Builder:
        """Used by IdiopidaeParser to construct the data structure of
        a parsed document. Composer then uses this to unify a file
        against a directory of other files to produce an output."""

        def __init__(self):
            self.index = 1
            self.line = 1
            self.current = {"command": "export", "section": self.index, "lines": []}
            self.statements = [self.current]
            self.exports = {}
            self.sections = []

Now, there are three methods that the parser uses heavily
during the parsing phase to chunk up a document into the
proper structure for later analysis:
        def include(self, file, sections):
            """ Creates a new include statement which lines are next appended to."""
            self.next_statement({"command": "include", "file": file, "section": sections, "lines": []})

        def export(self, section):
            """ Creates a new export statement which lines are next appended to."""
            if not section:
                self.index += 1
                section = self.index
            self.next_statement({"command": "export", "section": section, "lines": []})

        def append(self, text):
            """ Appends a line to the current statement with line numbers."""
            self.current["lines"].append((self.line, text))
            self.line += 1

These aren’t used by callers so much as by the IdiopidaeParser
and the Composer. These methods then use:
        def next_statement(self, statement):
            """Just slaps this new statement onto the list of existing
            statements and then sets the current one for appending
            the lines."""
            self.append_current_export()
            self.current = statement
            self.statements.append(self.current)

To swap into the next statement and:
        def append_current_export(self):
            """When a new export statement is hit, this updates the
            internals that track sequential export statements
            for later analysis."""
            if self.current["command"] == "export":
                section = self.current["section"]
                self.exports[section] = self.current
                self.sections.append(section)

To append each export to a list of exports found.
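As a worked example, the sketch below drives a condensed copy of the Builder methods shown above by hand, in the order the parser would call them, and then inspects the result. It is written for Python 3, and the simulated input file and its section name are made up for illustration.

```python
class Builder:
    """Condensed copy of the Builder methods shown above."""

    def __init__(self):
        self.index = 1
        self.line = 1
        self.current = {"command": "export", "section": self.index, "lines": []}
        self.statements = [self.current]
        self.exports = {}
        self.sections = []

    def include(self, file, sections):
        self.next_statement({"command": "include", "file": file,
                             "section": sections, "lines": []})

    def export(self, section):
        if not section:
            self.index += 1
            section = self.index
        self.next_statement({"command": "export", "section": section, "lines": []})

    def append(self, text):
        self.current["lines"].append((self.line, text))
        self.line += 1

    def next_statement(self, statement):
        self.append_current_export()
        self.current = statement
        self.statements.append(self.current)

    def append_current_export(self):
        if self.current["command"] == "export":
            section = self.current["section"]
            self.exports[section] = self.current
            self.sections.append(section)


# Simulate parsing this hypothetical file:
#   import sys
#   ### @export "main"
#   print(sys.argv)
#   ### @end
#   # trailing comment
doc = Builder()
doc.append("import sys")
doc.export("main")
doc.append("print(sys.argv)")
doc.export(None)               # what the grammar does for "@end"
doc.append("# trailing comment")
doc.append_current_export()    # what the Document rule does at end of input

print(doc.sections)                    # [1, 'main', 2]
print(doc.exports["main"]["lines"])    # [(2, 'print(sys.argv)')]
```

Note that only ordinary lines advance the line counter, so the directive lines themselves are not numbered. Unnamed regions before and after the named export end up as sections 1 and 2.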
The process we’re describing involves the IdiopidaeParser
using the Builder under the direction of the Composer:
    class Composer:
        """Uses idiopidae.parse to parse the given file into a
        builder, and then spits out the results using the self.process()
        method."""

        def __init__(self, name):
            """ Creates a composer for one file that needs include processing."""
            self.name = name

A Composer is built for each file by a simple loop in the idio
script, which acts as the executable that users run:
    #!/usr/bin/env python

    import runtime
    import sys

    for file in sys.argv[1:]:
        c = runtime.Composer(file)
        print c.process()

First we have how a file is loaded and parsed by
the composer:
        def load(self, name):
            """Does the actual parsing of a file into a Builder."""
            with open(name) as file:
                text = file.read() + "\n\0"
            return idiopidae.parse('Document', text)

which is actually used by the process method:
        def process(self):
            """Performs a full processing of the file returning a string
            with all the @include sections replaced."""
            self.builder = self.load(self.name)
            results = []
            for st in self.builder.statements:
                if st["command"] == "export":
                    results.append(self.format(st["lines"]))
                elif st["command"] == "include":
                    lines = self.include(st["file"], st["section"])
                    try:
                        lexer = guess_lexer_for_filename(st["file"], lines[0][1])
                    except:
                        lexer = None
                    results.append(self.format(lines, lexer, numbered=True))

            return "\n".join(results)

This is the most complex method since it is where all
the real work is being done. It loads the file we
want to compose and goes through all of its statements.
Any statement that’s an export is just printed out, but
any statement that’s an include is processed with another
call to include and then format to get the text:
        def format(self, lines, lexer = None, numbered=False):
            """Given a set of (#,"") line tuples it will return a
            string with line numbers or not."""
            if numbered:
                text = "\n".join(["%5d: %s" % l for l in lines])
            else:
                text = "\n".join([l[1] for l in lines])

            if lexer:
                return highlight(text, lexer, HtmlFormatter())
            else:
                return text

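For a feel of what the numbered branch produces, here is a small Python 3 sketch of just the line-joining step, with the Pygments highlighting left out so that it runs with the standard library alone. `join_lines` is a hypothetical name for this cut-down version, and the sample tuples are made up.

```python
def join_lines(lines, numbered=False):
    """Join (line number, text) tuples, optionally prefixing each number."""
    if numbered:
        # "%5d" right-aligns the number in a five-character column.
        return "\n".join("%5d: %s" % pair for pair in lines)
    return "\n".join(text for _, text in lines)


lines = [(9, "import runtime"), (10, "import sys")]
print(join_lines(lines, numbered=True))
#     9: import runtime
#    10: import sys
print(join_lines(lines))
# import runtime
# import sys
```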
The include method is actually very simple:
        def include(self, file, section):
            try:
                return self.load(file).lines_for(section)
            except KeyError:
                print "ERROR: Key '%s' not in file '%s'" % (section, file)
                print "SECTIONS: %s" % self.builder.exports
                return [(1,'### @include "%s" "%s"' % (file,section))]

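The `lines_for` method that include calls on the loaded Builder is not among the listings shown here. Since the caller catches `KeyError` and prints the `exports` mapping, one plausible shape is a plain dictionary lookup on `exports`. The Python 3 sketch below is a guess at that behaviour under that assumption, not the actual source, and `MiniBuilder` is a hypothetical stand-in.

```python
class MiniBuilder:
    """Hypothetical stand-in holding only the exports mapping."""

    def __init__(self, exports):
        # section name or number -> statement dict, the way
        # Builder.append_current_export fills Builder.exports
        self.exports = exports

    def lines_for(self, section):
        """Assumed behaviour: return the (number, text) tuples of one export.

        Raises KeyError for an unknown section, which is the error the
        include method's handler is written to catch.
        """
        return self.exports[section]["lines"]


builder = MiniBuilder({
    "main": {"command": "export", "section": "main",
             "lines": [(2, "print(sys.argv)")]},
})
print(builder.lines_for("main"))

try:
    builder.lines_for("missing")
except KeyError as err:
    print("no such section:", err)
```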
And that’s all of idiopidae except the parser, which we’ll
go over next.
The Parser
The parser is the key to how Idiopidae works, and it uses
the Zapps parser generator that I adopted recently. It
shows that you can easily crank out little parsers for little
languages that are fast enough for real work.
Since most people don’t get parsers, you would do well to
use bzr to grab the code and study
how this file is translated into the idiopidae.py file.
Every parser generator has three main components: code stuff,
tokens, and grammar rules. For Idiopidae there’s not much
code stuff other than the import of the runtime:

    # Copyright (C) Zed A. Shaw, licensed under the GPLv3

    import runtime

    %%

Then we just start off the parser declaration, which will
be turned into a class named idiopidae.IdiopidaeParser
that you can run:
    parser IdiopidaeParser:

Now, we need to have a bunch of tokens which we want to
either discard as just visual aids for the user, or keep
as input data:
        token WS: "[ \t]+"
        token NUMBER: "[0-9]+"
        token STRING: '\'([^\\n\'\\\\]|\\\\.)*\'|"([^\\n"\\\\]|\\\\.)*"'
        token EOD: "\\0"
        token EOL: "(\\n|\\r\\n)"
        token END: "end"
        token ID: "[a-zA-Z][a-zA-Z\-_0-9]+"
        token INCLUDE: "include"
        token EXPORT: "export"
        token STARTER: "[ \t]*(###|//|\\*)+ @"
        token NOT_STARTER: "([^#]|[^//]|[^\\*])"
        token JUNK: "[^\\n]*"

You can’t tell from the above list what is dropped and what
is kept; for that you have to look in the grammar. The trick
is that we define all the base “words” or tokens and then use the
grammar to sift through them to pull out what is considered Junk
or a Statement:
        rule Section:
            ID {{ return ID }}
            | NUMBER {{ return atoi(NUMBER) }}
            | STRING {{ return STRING[1:-1] }}
        rule File: STRING {{ return STRING[1:-1] }}
        rule Include:
            INCLUDE WS File WS Section {{ self.doc.include(File, Section) }}
        rule Export: EXPORT WS Section {{ self.doc.export(Section) }}
        rule Command: Include | Export | END {{ self.doc.export(None) }}
        rule Statement: STARTER Command (WS)* EOL
        rule Junk: (
            NOT_STARTER JUNK EOL {{ self.doc.append(NOT_STARTER + JUNK) }}
            | EOL {{ self.doc.append('') }}
            )
        rule Line: Statement | Junk
        rule Document:
            {{ self.doc = runtime.Builder() }}
            (Line)*
            EOD {{ self.doc.append_current_export(); return self.doc }}

More on reading this later.
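Until then, one way to get a feel for the Statement and Junk split is to try the STARTER pattern on a few lines. The Python 3 sketch below uses the standard `re` module with the pattern from the token list, unescaped from the grammar's string form. It only approximates the generated scanner: the real Statement rule also requires a valid Command after the starter. The `is_statement` helper and the sample lines are made up for illustration.

```python
import re

# STARTER from the token list: optional indent, a comment leader
# (###, // or *) repeated one or more times, then " @".
STARTER = re.compile(r"[ \t]*(###|//|\*)+ @")

def is_statement(line):
    """True when the line begins the way an Idiopidae directive does."""
    return STARTER.match(line) is not None

samples = [
    '### @export "main"',
    '    // @include "runtime.py" "imports"',
    ' * @end',
    '# an ordinary comment',
    'x = 1  ### @not at the start',
]
for line in samples:
    print("Statement" if is_statement(line) else "Junk     ", "|", line)
```

The first three samples are classified as statements and the last two as junk, because the pattern has to match from the start of the line.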