Working on a Python pipeline that transforms media annotations from the Active Archives video wiki (a simplified derivative of the SubRip “SRT” text format) into HTML with embedded RDFa metadata. A good opportunity to use the GNU make utility, a tool designed to run shell commands (typically to compile software) based on a data-flow paradigm of (file) dependencies.

SRT => HTML

The first transformation is a script that converts the SRT into HTML. Given some input “test.srt”, the new file “test.html” can be generated with the command:

srt2annotation.py test.srt test.html

So in makefile form, test.html is a target with test.srt as a prerequisite:

test.html : test.srt
	srt2annotation.py test.srt test.html

or schematically:

desired-output <= required-input
	command(s) to realize / produce the desired output based on input

Now when I run:

make

It responds by performing the following command, and creating test.html from the srt input.

srt2annotation.py test.srt test.html

With no target given, make picks the first rule in the makefile, sees that test.html does not exist yet, and runs the command to produce it from test.srt. Now, crucially, if I repeat the make command, it reports:

make: `test.html' is up to date.

However, if I change test.srt and then run make, the change is detected (test.srt now has a newer modification time than test.html) and srt2annotation.py is run again to update the HTML file. To test this, programmers typically use the touch command, which updates a file's modification time without changing its contents:

touch test.srt
make
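
The up-to-date check that make performs here comes down to comparing file modification times. A minimal Python sketch of that idea, assuming plain files on disk (the function name needs_rebuild is made up for illustration and is not part of make or of the pipeline):

```python
import os


def needs_rebuild(target, prerequisites):
    """Return True if target is missing or older than any prerequisite.

    Hypothetical helper that only mimics the basic timestamp comparison;
    a missing prerequisite raises FileNotFoundError here, whereas make
    would first look for a rule to build it.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    # One newer prerequisite is enough to make the target out of date.
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)
```

Right after touch test.srt, a call such as needs_rebuild("test.html", ["test.srt"]) should return True, which corresponds to make deciding to run srt2annotation.py again.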

HTML(+RDFa) => RDF

Now, adding the next stage in the pipeline: rdfaextract.py reads the RDFa embedded in the HTML and produces an RDF file in XML format.

test.rdf: test.html
	rdfaextract.py test.html test.rdf
 
test.html : test.srt
	srt2annotation.py test.srt test.html

A very expressive feature of make is its implicit rules. These encode general procedures so that not every transformation has to be stated explicitly; for example, the knowledge that (any) C source file can be transformed into a corresponding object file with the cc (C compiler) command is built into make by default. Custom implicit rules (pattern rules) can be defined to describe other kinds of transformations using a special wildcard character (%). So the above can be generalized to:

%.rdf: %.html
	rdfaextract.py $< $@
 
%.html: %.srt
	srt2annotation.py $< $@

This states that any filename ending in “.rdf” can be produced from the file of the same name ending in “.html” by running rdfaextract.py. The special variables $< and $@ are replaced by the names of the (first) prerequisite and the target respectively. Now, however, running make on its own fails because there is no default target: pattern rules are never picked as the default goal. You can instead specify a target explicitly on the command line:

make test.rdf

And it runs the following:

srt2annotation.py test.srt test.html
rdfaextract.py test.html test.rdf
rm test.html
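
The run above can be modelled roughly in Python. This is only a sketch of the chaining idea, not of how make is actually implemented: the names RULES and plan are invented for illustration, timestamps are ignored, and a file counts as intermediate simply when it had to be built but was not requested.

```python
# The two pattern rules from the makefile above:
# (target suffix, prerequisite suffix, command name).
RULES = [
    (".rdf", ".html", "rdfaextract.py"),
    (".html", ".srt", "srt2annotation.py"),
]


def plan(requested, existing):
    """List the commands a make-like tool might run (hypothetical model).

    requested: list of target names given on the command line
    existing:  set of file names already present on disk
    """
    commands = []
    built = set()
    intermediates = []

    def build(name):
        if name in existing or name in built:
            return
        for target_suffix, prereq_suffix, tool in RULES:
            if name.endswith(target_suffix):
                stem = name[: -len(target_suffix)]  # the part matched by %
                prereq = stem + prereq_suffix       # plays the role of $<
                build(prereq)                       # follow the chain first
                commands.append(f"{tool} {prereq} {name}")  # name is $@
                built.add(name)
                if name not in requested:
                    intermediates.append(name)
                return
        raise ValueError(f"no rule to make target {name!r}")

    for name in requested:
        build(name)
    if intermediates:
        # Files built only as stepping stones are removed at the end.
        commands.append("rm " + " ".join(intermediates))
    return commands
```

Calling plan(["test.rdf"], {"test.srt"}) yields the same three lines as the output above, including the final rm, because test.html was built along the way without being asked for.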

Interestingly, it “cleans up” by removing the intermediate HTML file, which make created only as a stepping stone towards the requested target. In this case that is probably not the desired behaviour, but it is an interesting possibility all the same. A quick fix is to request both targets explicitly:

make test.html test.rdf

This does the same as before without removing the HTML. In addition to the % wildcard of implicit rules, the wildcard and patsubst functions can be used to build variables that automatically list all available input files (*.srt) and, by substitution, all possible derivative files (the corresponding .rdf files). In the following makefile, the all rule is listed first so that it becomes the default target. Once again, the HTML files are considered intermediate and are deleted, because the all rule only requests the RDF files.

allsrt = $(wildcard *.srt)
allrdf = $(patsubst %.srt,%.rdf,$(allsrt))
 
all: $(allrdf)
 
%.rdf: %.html
	rdfaextract.py $< $@
 
%.html: %.srt
	srt2annotation.py $< $@
 
clean:
	rm -f *.html
	rm -f *.rdf
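
For readers more at home in Python, the two variable definitions at the top of this makefile correspond roughly to the following. It is a loose analogy rather than a reimplementation of make's functions: patsubst_suffix and derived_targets are hypothetical names, only the simple "%old to %new" suffix case is handled, and the file list is sorted here purely to keep the result predictable.

```python
import glob
import os


def patsubst_suffix(old, new, names):
    """Swap suffix `old` for `new`, like $(patsubst %old,%new,names).

    Names that do not end in `old` are passed through unchanged,
    as patsubst does for words that do not match the pattern.
    """
    return [n[: -len(old)] + new if n.endswith(old) else n for n in names]


def derived_targets(directory="."):
    """Return (allsrt, allrdf) for the .srt files found in `directory`."""
    pattern = os.path.join(directory, "*.srt")  # like $(wildcard *.srt)
    allsrt = sorted(os.path.basename(p) for p in glob.glob(pattern))
    allrdf = patsubst_suffix(".srt", ".rdf", allsrt)
    return allsrt, allrdf
```

The second list is what the all rule depends on, so every .srt file dropped into the directory becomes a new target without any edit to the makefile.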

Now when I run make in a directory with 5 srt files, it produces the following:

srt2annotation.py advanderhoef1.01.srt advanderhoef1.01.html
rdfaextract.py advanderhoef1.01.html advanderhoef1.01.rdf
srt2annotation.py advanderhoef1.02.srt advanderhoef1.02.html
rdfaextract.py advanderhoef1.02.html advanderhoef1.02.rdf
srt2annotation.py advanderhoef2.01.srt advanderhoef2.01.html
rdfaextract.py advanderhoef2.01.html advanderhoef2.01.rdf
srt2annotation.py advanderhoef2.02.srt advanderhoef2.02.html
rdfaextract.py advanderhoef2.02.html advanderhoef2.02.rdf
srt2annotation.py test.srt test.html
rdfaextract.py test.html test.rdf
rm advanderhoef1.01.html advanderhoef1.02.html advanderhoef2.01.html advanderhoef2.02.html test.html

Again it has the smarts to clean up the HTML afterwards. Since I want to keep the HTML files as well, the fix is to include them in the “all” rule:

allsrt = $(wildcard *.srt)
allhtml = $(patsubst %.srt,%.html,$(allsrt))
allrdf = $(patsubst %.srt,%.rdf,$(allsrt))
 
all: $(allhtml) $(allrdf)
 
%.rdf: %.html
	rdfaextract.py $< $@
 
%.html: %.srt
	srt2annotation.py $< $@
 
clean:
	rm -f *.html
	rm -f *.rdf

Make Manual: http://www.gnu.org/software/make/manual/make.html