Idiopidae: code is code; prose is prose.
Idiopidae is my attempt at finally releasing something that
makes it easier for technical documentation authors to write.
The purpose of Idiopidae is to keep the code in the code,
and the prose in the prose, and then merge the two together
based on very light comments in the source.
You can see the original HTML of this file as well as the
final output to compare the two. You can also look
at a version that’s done before wiki rendering, which shows
how Idiopidae automatically detects that the sample.page
file is a text file and formats its output accordingly.
Concepts
Idiopidae works on the idea that in your “prose” file you’ll
put include statements as comments, and in your “code” file
you’ll put export statements to mark off regions of code that
need to be named.
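As a rough illustration of what those comments look like, here is a minimal sketch in Python 3. The marker syntax (`### @export "name"` and `### @end`) is inferred from the STARTER token and the Export rule in the grammar listed in the parser section; the `split_sections` helper, the two regular expressions, and the sample source are hypothetical and are not part of Idiopidae.

```python
import re

# Approximation of STARTER followed by the Export and END commands.
# These patterns are assumptions derived from the grammar, not Idiopidae code.
EXPORT = re.compile(r'^[ \t]*(?:###|//|\*)+ @export[ \t]+"?([^"\s]+)"?')
END = re.compile(r'^[ \t]*(?:###|//|\*)+ @end[ \t]*$')


def split_sections(source):
    """Return {section_name: [lines]} for the named export regions."""
    sections = {}
    current = None
    for line in source.splitlines():
        match = EXPORT.match(line)
        if match:
            # A new named region starts on the line after the marker.
            current = match.group(1)
            sections[current] = []
        elif END.match(line):
            current = None
        elif current is not None:
            sections[current].append(line)
    return sections


SOURCE = '''import sys
### @export "greeting"
def greet(name):
    return "Hello, %s" % name
### @end
print(greet(sys.argv[1]))
'''

sections = split_sections(SOURCE)
```

A prose file would then presumably pull that region in with a comment along the lines of `### @include "hello.py" "greeting"`, following the Include rule in the same grammar.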
When you run the idio Python script on your prose file, it
follows the include statements and loads the file and section
you specify into the output. It also formats the included code
with the Pygments library
to produce nice typesetting (currently defaults to HTML).
This file you’re reading right now is simply a Textile
prose file that includes and describes Idiopidae’s source.
The process for creating it was:
> cd doc
> webgen
> idio output/index.html > output/test.html
The source is available from a Bazaar repository at:
http://www.zedshaw.com/repository/zapps
It is currently just a demo of Zapps, but it
will be moved into its own project folder as soon as it is good
enough to use and distribute.
The Runtime
It’s best if we start with the runtime.py file, which is responsible for
using the IdiopidaeParser to process the files. It starts off with
typical boilerplate, plus a __future__ import because I like the with
statement:
from __future__ import with_statement

# Copyright (C) Zed A. Shaw, licensed under the GPLv3

import idiopidae
from pygments import highlight
from pygments.formatters import get_formatter_for_filename, get_formatter_by_name
from pygments.lexers import guess_lexer_for_filename, get_lexer_by_name
Next we need a class that keeps track of the statements as they are parsed:
class Builder(object):
    """Used by IdiopidaeParser to construct the data structure of
    a parsed document.  Composer then uses this to unify a file
    against a directory of other files to produce an output."""

    def __init__(self):
        self.index = 0
        self.line = 1
        self.current = {
            "command": "export", "section": self.next_anonymous(),
            "language": None,
            "lines": []}
        self.statements = [self.current]
        self.exports = {}
        self.sections = []
Now, there are four methods that the parser uses heavily
during the parsing phase to chunk up a document into the
proper structure for later analysis:
    def include(self, file, section, format):
        """ Creates a new include statement which lines are next appended to."""
        self.next_statement({
            "command": "include",
            "file": file,
            "section": section,
            "format": format,
            "language": None,
            "lines": []
        })

    def export(self, section, language):
        """ Creates a new export statement which lines are next appended to."""
        if not section: section = self.next_anonymous()

        self.next_statement({
            "command": "export",
            "section": section,
            "language": language,
            "lines": []})

    def end(self):
        """Just a method that ends a section to start the
        next anonymous one."""
        self.export(None, None)

    def append(self, text):
        """ Appends a line to the current statement with line numbers."""
        self.current["lines"].append((self.line, text))
        self.line += 1
These aren’t used by callers so much as by the IdiopidaeParser
and the Composer. These methods in turn use next_statement and
next_anonymous:
    def next_statement(self, statement):
        """Just slaps this new statement onto the list of existing
        statements and then sets the current one for appending
        the lines."""
        self.append_current_export()
        self.current = statement
        self.statements.append(self.current)

    def next_anonymous(self):
        """Increments the anonymous section counter for tracking
        sections without names."""
        self.index += 1
        return str(self.index)
The next_statement method swaps in the new statement, but first it calls:
    def append_current_export(self):
        """When a new export statement is hit, this updates the
        internals that track sequential export statements
        for later analysis."""
        if self.current["command"] == "export":
            section = self.current["section"]
            self.exports[section] = self.current
            self.sections.append(section)
This records each finished export in the exports dictionary and in the ordered list of sections.
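To make the bookkeeping concrete, here is a condensed sketch of the Builder methods listed above, written for Python 3 and driven by hand the way the parser would drive it. `MiniBuilder` is a hypothetical stand-in that leaves out include statements; it is not the class from runtime.py.

```python
class MiniBuilder:
    """Condensed stand-in for Builder (exports only, no includes)."""

    def __init__(self):
        self.index, self.line = 0, 1
        # Every document starts inside an anonymous export section.
        self.current = {"command": "export", "section": self.next_anonymous(),
                        "language": None, "lines": []}
        self.statements = [self.current]
        self.exports, self.sections = {}, []

    def next_anonymous(self):
        self.index += 1
        return str(self.index)

    def append_current_export(self):
        # Record the statement being closed, if it is an export.
        if self.current["command"] == "export":
            section = self.current["section"]
            self.exports[section] = self.current
            self.sections.append(section)

    def next_statement(self, statement):
        self.append_current_export()
        self.current = statement
        self.statements.append(statement)

    def export(self, section, language):
        self.next_statement({"command": "export",
                             "section": section or self.next_anonymous(),
                             "language": language, "lines": []})

    def end(self):
        self.export(None, None)

    def append(self, text):
        self.current["lines"].append((self.line, text))
        self.line += 1


builder = MiniBuilder()
builder.append("import sys")          # lands in anonymous section "1"
builder.export("main", "python")      # closes "1", opens "main"
builder.append("print(sys.argv)")
builder.end()                         # closes "main", opens anonymous "2"
builder.append("# trailing code")
builder.append_current_export()       # what the Document rule does at EOD
```

After this run, `builder.sections` lists the sections in the order they were closed, and each entry in `builder.exports` carries its `(line number, text)` tuples.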
The process we’re describing involves the IdiopidaeParser
using the Builder under the direction of the Composer:
class Composer(object):
    """Uses idiopidae.parse to parse the given file into a
    builder, and then spits out the results using the self.process()
    method."""

    def __init__(self):
        self.includes = {}
        self.loads = {}
It is driven by a simple loop in the idio script, which
acts as the executable for users to run:
#!/usr/bin/env python

import runtime
import sys

c = runtime.Composer()

for file in sys.argv[1:]:
    print c.process(file)
First we have how a file is loaded and parsed by
the composer:
    def load(self, name):
        """Does the actual parsing of a file into a Builder and caches the results
        into self.loads for faster calls later."""
        if not self.loads.has_key(name):
            with open(name) as file:
                text = file.read() + "\n\0"
            self.loads[name] = idiopidae.parse('Document', text)
        return self.loads[name]
which is actually used by the process method:
    def process(self, name):
        """Performs a full processing of the file returning a string
        with all the @include sections replaced."""
        self.builder = self.load(name)
        results = []
        for st in self.builder.statements:
            if st["command"] == "export":
                self.append_export(results, st)
            elif st["command"] == "include":
                self.append_include(results, name, st)
        return "\n".join(results)

    def append_include(self, results, name, st):
        key = "%s/%s/%s" % (name, st["file"], st["section"])

        if self.includes.has_key(key):
            # look it up in the cache instead of processing it again
            text = self.includes[key]
        else:
            lines, firsts = self.include(st["file"], st["section"])
            lexer = self.resolve_lexer(st, firsts)
            format = self.resolve_format(name, st)
            text = self.format(lines, lexer, format, numbered=True)
            self.includes[key] = text

        results.append(text)

    def append_export(self, results, st):
        results.append(self.format(st["lines"]))

    def resolve_lexer(self, st, firsts):
        """Responsible for resolving the lexer that should be used on the
        section of code.  It will use the one specified in the export, and
        then try to guess based on the file name/extension and the first line
        of the text file."""
        file, lang = st["file"], st["language"]

        if lang:
            return get_lexer_by_name(lang)
        try:
            return guess_lexer_for_filename(file, firsts)
        except:
            return get_lexer_by_name("text")

    def resolve_format(self, file, st):
        """Resolves formats that are specified based on either the
        file name/extension or an explicitly given format."""
        # TODO: let them specify options too, probably from some yaml
        if st["format"]:
            return get_formatter_by_name(st["format"])
        else:
            try:
                return get_formatter_for_filename(file)
            except:
                return get_formatter_by_name("text")
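The control flow of process and append_include can be sketched without the parser or Pygments. The version below is a simplified Python 3 approximation: `compose`, `join_lines`, and the `parsed_files` dictionary are hypothetical stand-ins for the Composer, its format method, and the Builders that load would return, and highlighting and line numbering are left out.

```python
def join_lines(lines):
    """Join (line_number, text) tuples into plain text."""
    return "\n".join(text for _number, text in lines)


def compose(statements, parsed_files):
    """Walk a prose file's statements and splice in included sections.

    statements   -- list of statement dicts shaped like the Builder's
    parsed_files -- {file name: {section name: [(number, text), ...]}},
                    an assumed stand-in for already parsed source files
    """
    results = []
    cache = {}
    for st in statements:
        if st["command"] == "export":
            # Prose passes straight through.
            results.append(join_lines(st["lines"]))
        elif st["command"] == "include":
            key = (st["file"], st["section"])
            if key not in cache:
                cache[key] = join_lines(parsed_files[st["file"]][st["section"]])
            results.append(cache[key])
    return "\n".join(results)


prose = [
    {"command": "export", "lines": [(1, "Here is the code:")]},
    {"command": "include", "file": "hello.py", "section": "main", "lines": []},
    {"command": "export", "lines": [(2, "Done.")]},
]
parsed_files = {"hello.py": {"main": [(7, "print('hi')")]}}

output = compose(prose, parsed_files)
```

The real method differs in that it keys the cache on the prose file name as well, and it runs each included section through the resolved lexer and formatter.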
The process method is the most complex one, since it is where all
the real work is being done. It loads the file we
want to compose, and goes through all of its statements.
Any statement that’s an export is just printed out, but
any statement that’s an include is processed by a
call to include and then format to get the text:
    def format(self, lines, lexer = None, format = None, numbered=False):
        """Given a set of (#,"") line tuples it will return a
        string with line numbers or not."""
        # TODO: need to figure out if the format has line numbers and do that instead
        if numbered:
            text = "\n".join(["%5d: %s" % l for l in lines])
        else:
            text = "\n".join([l[1] for l in lines])

        if format and lexer:
            return highlight(text, lexer, format)
        else:
            return text
The include method is actually very simple:
    def include(self, file, section):
        """Loads the requested section and returns those lines and the first
        few lines of the whole file for guessing the format.  Also does some
        caching of the requested sections, firsts, and loaded files."""

        try:
            target = self.load(file)
            if not target:
                print "!!!! ERROR: Failed to parse file %s (see above for error)" % file
                raise RuntimeError("ERROR: Failed to parse %s (see output)" % file)
            else:
                lines = target.lines_for(section)
                firsts = self.format(target.lines_for(target.sections[0]), numbered=False)

            return lines, firsts
        except KeyError:
            raise KeyError("ERROR: Key '%s' not exported or included in file '%s'" % (section, file))
And that’s all of Idiopidae except the parser, which we’ll
go over next.
The Parser
The parser is the key to how Idiopidae works, and it uses
the Zapps parser generator that I adopted recently. It
shows you can easily crank out little parsers for little
languages that are fast enough for real work.
Since most people don’t get parsers, you would do well to
use bzr to grab the code and study
how this file is translated into the idiopidae.py file.
Every parser generator has three main components: code stuff,
tokens, and grammar rules. For Idiopidae there’s not much
code stuff other than the import of the runtime:
# Copyright (C) Zed A. Shaw, licensed under the GPLv3

import runtime

%%
Then we just start off the parser declaration, which will
be turned into a class named idiopidae.IdiopidaeParser
that you can run:
parser IdiopidaeParser:
Now, we need to have a bunch of tokens which we want to
either discard as just visual aids for the user, or keep
as input data:
    token WS: "[ \t]+"
    token NUMBER: "[0-9]+[0-9\.]*"
    token STRING: '\'([^\\n\'\\\\]|\\\\.)*\'|"([^\\n"\\\\]|\\\\.)*"'
    token EOD: "\\0"
    token EOL: "(\\n|\\r\\n)"
    token END: "end"
    token ID: "[a-zA-Z][a-zA-Z\-_0-9]+"
    token INCLUDE: "include"
    token EXPORT: "export"
    token STARTER: "[ \t]*(###|//|\\*)+ @"
    token NOT_STARTER: "([^#]|[^//]|[^\\*])"
    token JUNK: "[^\\n]*"
You can’t tell from the above list what is dropped and what
is kept; for that you have to look at the grammar. The trick
is that we define all the base “words” or tokens and then use the
grammar to sift through them and pull out what is considered Junk
or a Statement:
    rule Section:
        ID {{ return ID }}
        | NUMBER {{ return NUMBER }}
        | STRING {{ return STRING[1:-1] }}
    rule Language: WS ID {{ return ID }}
    rule Format: WS ID {{ return ID }}
    rule File: STRING {{ return STRING[1:-1] }}
    rule Include:
        INCLUDE WS File WS Section Format? {{ self.doc.include(File, Section, Format) }}
    rule Export: EXPORT WS Section Language? {{ self.doc.export(Section, Language) }}
    rule Command: Include
        | Export
        | END {{ self.doc.end() }}
    rule Statement: STARTER Command (WS)* EOL
    rule Junk: (
        NOT_STARTER JUNK EOL {{ self.doc.append(NOT_STARTER + JUNK) }}
        | EOL {{ self.doc.append('') }}
        )
    rule Line: Statement | Junk
    rule Document:
        {{ self.doc = runtime.Builder() }}
        (Line)*
        EOD {{ self.doc.append_current_export(); return self.doc }}
More on reading this later.
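In the meantime, a few of the token patterns can be tried out directly with Python's re module. This sketch assumes that Zapps hands the token strings to a regular expression engine that behaves like `re`, which the listing does not confirm; the `is_statement_start` helper is hypothetical.

```python
import re

# The ID, NUMBER and STARTER patterns, with the string escaping
# from the token list already resolved.
ID = re.compile(r"[a-zA-Z][a-zA-Z\-_0-9]+")
NUMBER = re.compile(r"[0-9]+[0-9\.]*")
STARTER = re.compile(r"[ \t]*(###|//|\*)+ @")


def is_statement_start(line):
    """True when the line opens with something STARTER accepts."""
    return STARTER.match(line) is not None


# Section names given as bare identifiers need at least two characters.
long_id_ok = ID.fullmatch("init-2") is not None
single_letter_ok = ID.fullmatch("x") is not None

# Dotted numbers such as 1.2 are accepted as section names.
dotted_number_ok = NUMBER.fullmatch("1.2") is not None

hash_marker = is_statement_start("### @export init")
slash_marker = is_statement_start("    // @end")
star_marker = is_statement_start(" * @include 'a.py' main")
single_hash = is_statement_start("# @export init")
```

Under that assumption, a lone `#` comment is not treated as a statement, so ordinary Python comments that happen to contain `@export` pass through as plain lines.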