Working on a Python pipeline that transforms media annotations from the Active Archives video wiki (written in a simplified derivative of the SubRip “SRT” subtitle format) into HTML with embedded RDFa metadata. A good opportunity to use the GNU make utility, a tool that produces and executes shell commands (typically to compile software) based on a data-flow paradigm of (file) dependencies.
SRT => HTML
The first transformation is a script that converts the SRT into HTML. Given some input “test.srt”, the new file “test.html” can be generated with the command:
srt2annotation.py test.srt test.html
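The actual srt2annotation.py is not shown in this post, and the Active Archives format only derives from SRT, so what follows is a hypothetical sketch (srt2annotation_sketch.py is an illustrative name). It assumes standard SRT blocks (an optional index line, a “start --> end” timing line, then text) and writes one plain HTML div per annotation, putting the timings in made-up data-start/data-end attributes instead of the real RDFa markup. It keeps the same command-line shape: input file first, output file second.

```python
#!/usr/bin/env python3
"""Hypothetical SRT-to-HTML converter sketch (not the actual srt2annotation.py).

Usage: srt2annotation_sketch.py input.srt output.html
"""
import html
import re
import sys

# A standard SRT timing line, e.g. "00:00:01,000 --> 00:00:04,500".
# The end time is optional here; the Active Archives variant may differ.
TIMING = re.compile(
    r"^(\d{2}:\d{2}:\d{2}[,.]\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2}[,.]\d{3})?\s*$"
)


def parse_srt(text):
    """Return a list of (start, end, body) tuples; end is None when absent."""
    annotations = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        # Drop an optional numeric index line, as used in standard SRT.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        match = TIMING.match(lines[0].strip())
        if match is None:
            continue  # not a timed block; this sketch simply skips it
        annotations.append((match.group(1), match.group(2), "\n".join(lines[1:])))
    return annotations


def to_html(annotations):
    """Render one <div> per annotation, escaping text and attribute values."""
    parts = ["<!DOCTYPE html>", "<html><body>"]
    for start, end, body in annotations:
        attrs = f'data-start="{html.escape(start)}"'
        if end is not None:
            attrs += f' data-end="{html.escape(end)}"'
        parts.append(f'<div class="annotation" {attrs}>{html.escape(body)}</div>')
    parts.append("</body></html>")
    return "\n".join(parts)


def convert(src_path, dst_path):
    """Read an SRT file and write its HTML rendering to dst_path."""
    with open(src_path, encoding="utf-8") as src:
        annotations = parse_srt(src.read())
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write(to_html(annotations) + "\n")


if __name__ == "__main__":
    if len(sys.argv) == 3:
        convert(sys.argv[1], sys.argv[2])
    else:
        print("usage: srt2annotation_sketch.py input.srt output.html", file=sys.stderr)
```

Like the real script, it only reads one file and writes another, which is all make needs to know about it.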
So in makefile form, test.html is a target with test.srt as a prerequisite (the command line, or recipe, must be indented with a tab character):
test.html: test.srt
	srt2annotation.py test.srt test.html
or schematically:
desired-output: required-input
	command(s) to produce the desired output from the input
Now when I run:
make
make responds by running the following command, creating test.html from the SRT input:
srt2annotation.py test.srt test.html
By default, make builds the target of the first rule in the makefile, here test.html, using test.srt as its input. Now, crucially, if I repeat the make command, it reports:
make: `test.html' is up to date.
However, if I change the test.srt file and then run make, the change is detected (test.srt is now newer than test.html) and srt2annotation.py is run again to update the HTML file. To test this without actually editing the file, programmers typically use the touch command, which simply updates a file's modification time:
touch test.srt
make
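make's decision can be modelled, roughly, as a timestamp comparison: a target is rebuilt if it does not exist, or if any prerequisite was modified more recently than it. The sketch below is a simplified model under that assumption (needs_rebuild and demo are illustrative names, not part of make). The demo function replays the experiment above in a temporary directory, setting modification times explicitly with os.utime instead of calling touch.

```python
import os
import tempfile


def needs_rebuild(target, prerequisites):
    """Simplified model of make's out-of-date check (timestamps only).

    A target must be rebuilt if it does not exist, or if any prerequisite
    has a modification time newer than the target's.
    """
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(p) > target_mtime for p in prerequisites)


def demo():
    """Replay the test.srt / test.html experiment in a throwaway directory."""
    with tempfile.TemporaryDirectory() as d:
        srt = os.path.join(d, "test.srt")
        html = os.path.join(d, "test.html")
        open(srt, "w").close()
        missing = needs_rebuild(html, [srt])  # test.html does not exist yet
        open(html, "w").close()
        # Set times explicitly rather than relying on filesystem timestamp resolution.
        os.utime(srt, (1000, 1000))
        os.utime(html, (2000, 2000))
        fresh = needs_rebuild(html, [srt])  # the "is up to date" case
        os.utime(srt, (3000, 3000))  # the equivalent of `touch test.srt`
        stale = needs_rebuild(html, [srt])
    return missing, fresh, stale


print(demo())  # prints (True, False, True)
```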
HTML (+RDFa) => RDF
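Now, adding in the next stage of the pipeline: rdfaextract.py reads the embedded RDFa of the HTML and produces an RDF file in XML format. With its rule listed first, test.rdf becomes the default target, and make builds test.html beforehand because it is a prerequisite:
test.rdf: test.html
	rdfaextract.py test.html test.rdf
test.html: test.srt
	srt2annotation.py test.srt test.html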
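The order of the two commands follows from the dependency graph: make handles a target's prerequisites before running its own recipe. A minimal sketch of that depth-first traversal, assuming every target is missing (so timestamps can be ignored); the RULES dictionary and plan function are hypothetical and simply restate the two rules above.

```python
# Hypothetical restatement of the two explicit rules: target -> (prerequisites, recipe).
# Dicts keep insertion order, so the first entry plays the role of the default goal.
RULES = {
    "test.rdf": (["test.html"], "rdfaextract.py test.html test.rdf"),
    "test.html": (["test.srt"], "srt2annotation.py test.srt test.html"),
}


def plan(target, rules, visited=None):
    """Return recipes in the order a make-like tool would run them.

    Assumes every target is missing, so timestamps are ignored. Prerequisites
    are handled depth-first before the target's own recipe.
    """
    if visited is None:
        visited = set()
    if target in visited or target not in rules:
        return []  # already planned, or a source file such as test.srt
    visited.add(target)
    prerequisites, recipe = rules[target]
    commands = []
    for prereq in prerequisites:
        commands.extend(plan(prereq, rules, visited))
    commands.append(recipe)
    return commands


default_goal = next(iter(RULES))
for command in plan(default_goal, RULES):
    print(command)
```

Asking for test.html alone would stop after the first command, just as make test.html would.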
A very expressive feature of make is implicit rules, which encode general procedures so that not every transformation has to be stated explicitly. For example, the knowledge that (any) C source file can be transformed into a corresponding object file with the cc (C compiler) command is built into make by default. Custom implicit rules, known as pattern rules, can be defined to describe other kinds of transformations using the special wildcard character %. So, the above can be generalized to:
%.rdf: %.html
	rdfaextract.py $< $@
%.html: %.srt
	srt2annotation.py $< $@
This states that any file name ending in “.rdf” can be produced from the same name ending in “.html” by running rdfaextract.py (and likewise “.html” from “.srt” with srt2annotation.py). The automatic variables $< and $@ are replaced by the names of the first prerequisite and the target, respectively. However, running plain make now fails because no default target is selected: pattern rules are never chosen as the default goal, and no explicit target remains. Instead, you can specify a target explicitly on the command line:
make test.rdf
And it runs the following:
srt2annotation.py test.srt test.html
rdfaextract.py test.html test.rdf
rm test.html
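Each of these commands comes from expanding a pattern rule: the part of the target name matched by % (the stem) is substituted into the prerequisite pattern, and then $< and $@ are substituted into the recipe. A simplified sketch of that expansion, with hypothetical function names; real make also handles directory parts, multiple prerequisites and other automatic variables, which are left out here.

```python
def match_stem(target_pattern, name):
    """Return the stem that '%' matches in target_pattern, or None.

    As in make's pattern rules, the stem must be non-empty. Directory
    handling is left out of this sketch.
    """
    prefix, _, suffix = target_pattern.partition("%")
    if (
        len(name) > len(prefix) + len(suffix)
        and name.startswith(prefix)
        and name.endswith(suffix)
    ):
        return name[len(prefix):len(name) - len(suffix)]
    return None


def expand_rule(target_pattern, prereq_pattern, recipe, target):
    """Expand a one-prerequisite pattern rule for a concrete target.

    Returns (prerequisite, command), or None if the rule does not apply.
    """
    stem = match_stem(target_pattern, target)
    if stem is None:
        return None
    prereq = prereq_pattern.replace("%", stem, 1)
    # $@ -> target name, $< -> first prerequisite (plain text substitution).
    command = recipe.replace("$@", target).replace("$<", prereq)
    return prereq, command


print(expand_rule("%.rdf", "%.html", "rdfaextract.py $< $@", "test.rdf"))
```

Note that for a name such as advanderhoef1.01.rdf the stem is everything before the final “.rdf”, so the extra dots in the file names are harmless.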
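Interestingly, make “cleans up” by removing the intermediate HTML file: test.html was created only as a step in a chain of implicit rules and was not itself requested, so make treats it as intermediate and deletes it afterwards. In this case that is probably not the desired behaviour, but it is an interesting possibility. A quick fix is to request both targets explicitly: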
make test.html test.rdf
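The difference between the two invocations can be captured by a rough model: files that make creates only as links in a chain of pattern rules, and that were not requested as goals, are deleted at the end. The sketch below is a simplified dry run under that assumption (simulate, find_rule and PATTERN_RULES are hypothetical names); it returns the commands instead of executing them and ignores timestamps. In real make a file also stops being intermediate if the makefile mentions it explicitly, and the manual describes the special target .SECONDARY for keeping such files.

```python
# Hypothetical pattern rules mirroring the makefile: (target, prerequisite, recipe).
PATTERN_RULES = [
    ("%.rdf", "%.html", "rdfaextract.py $< $@"),
    ("%.html", "%.srt", "srt2annotation.py $< $@"),
]


def stem_of(pattern, name):
    """Return the non-empty stem matched by '%' in pattern, or None."""
    prefix, _, suffix = pattern.partition("%")
    if len(name) > len(prefix) + len(suffix) and name.startswith(prefix) and name.endswith(suffix):
        return name[len(prefix):len(name) - len(suffix)]
    return None


def find_rule(name, existing):
    """Return (prerequisite, command) for the first usable rule, or None.

    A rule is usable if its prerequisite exists or can itself be made,
    which is how a chain such as .srt -> .html -> .rdf is discovered.
    """
    for target_pat, prereq_pat, recipe in PATTERN_RULES:
        stem = stem_of(target_pat, name)
        if stem is None:
            continue
        prereq = prereq_pat.replace("%", stem)
        if prereq in existing or find_rule(prereq, existing) is not None:
            return prereq, recipe.replace("$@", name).replace("$<", prereq)
    return None


def simulate(goals, existing):
    """Dry run: list the commands for goals, ending with deletion of intermediates."""
    existing = set(existing)
    commands, intermediates = [], []

    def build(name):
        if name in existing:
            return  # timestamps are ignored in this sketch
        rule = find_rule(name, existing)
        if rule is None:
            raise ValueError(f"no rule to make {name}")
        prereq, command = rule
        build(prereq)
        commands.append(command)
        existing.add(name)
        if name not in goals:
            intermediates.append(name)  # made only as a link in the chain

    for goal in goals:
        build(goal)
    if intermediates:
        commands.append("rm " + " ".join(intermediates))
    return commands


for line in simulate(["test.rdf"], ["test.srt"]):
    print(line)
```

With the goals ["test.html", "test.rdf"] the same model produces the two commands without the final rm, because test.html is now a requested goal.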
Requesting both targets this way does the same as before, but without removing the HTML. In addition to the % wildcard of pattern rules, the wildcard and patsubst functions can be used to create variables that automatically list all available input files (*.srt) and, by substitution, all of the derived files (the corresponding .rdf files). In the following makefile, the all rule is listed first so that it becomes the default goal. Once again, because the all target requests only the .rdf files, the HTML files are treated as intermediate and deleted.
allsrt = $(wildcard *.srt)
allrdf = $(patsubst %.srt,%.rdf,$(allsrt))
all: $(allrdf)
%.rdf: %.html
	rdfaextract.py $< $@
%.html: %.srt
	srt2annotation.py $< $@
clean:
	rm -f *.html
	rm -f *.rdf
.PHONY: all clean
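To see what these variables expand to, the following sketch mimics $(wildcard) with Python's glob module and $(patsubst) with a small stem substitution. The function names are illustrative, and the result is sorted for a stable order (a choice of the sketch, not a claim about make's ordering). As with make's patsubst, words that do not match the pattern are passed through unchanged.

```python
import glob
import os
import tempfile


def wildcard(pattern):
    """Rough analogue of $(wildcard pattern): names of existing files matching a glob.

    Sorted here for a stable order; this is a choice of the sketch.
    """
    return sorted(glob.glob(pattern))


def patsubst(pattern, replacement, words):
    """Analogue of $(patsubst pattern,replacement,text) for a list of words.

    '%' in pattern matches any run of characters within a word; the first '%'
    in replacement is replaced by that stem. Non-matching words are kept as-is.
    """
    if "%" not in pattern:
        return [replacement if word == pattern else word for word in words]
    prefix, _, suffix = pattern.partition("%")
    result = []
    for word in words:
        if len(word) >= len(prefix) + len(suffix) and word.startswith(prefix) and word.endswith(suffix):
            stem = word[len(prefix):len(word) - len(suffix)]
            result.append(replacement.replace("%", stem, 1))
        else:
            result.append(word)
    return result


def demo(names):
    """Create empty files in a temporary directory and compute allsrt/allhtml/allrdf."""
    with tempfile.TemporaryDirectory() as d:
        for name in names:
            open(os.path.join(d, name), "w").close()
        matches = wildcard(os.path.join(glob.escape(d), "*.srt"))
        allsrt = [os.path.basename(p) for p in matches]
    allhtml = patsubst("%.srt", "%.html", allsrt)
    allrdf = patsubst("%.srt", "%.rdf", allsrt)
    return allsrt, allhtml, allrdf


print(demo(["test.srt", "advanderhoef1.01.srt", "notes.txt"]))
```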
Now when I run make in a directory with five .srt files, it runs the following:
srt2annotation.py advanderhoef1.01.srt advanderhoef1.01.html
rdfaextract.py advanderhoef1.01.html advanderhoef1.01.rdf
srt2annotation.py advanderhoef1.02.srt advanderhoef1.02.html
rdfaextract.py advanderhoef1.02.html advanderhoef1.02.rdf
srt2annotation.py advanderhoef2.01.srt advanderhoef2.01.html
rdfaextract.py advanderhoef2.01.html advanderhoef2.01.rdf
srt2annotation.py advanderhoef2.02.srt advanderhoef2.02.html
rdfaextract.py advanderhoef2.02.html advanderhoef2.02.rdf
srt2annotation.py test.srt test.html
rdfaextract.py test.html test.rdf
rm advanderhoef1.01.html advanderhoef1.02.html advanderhoef2.01.html advanderhoef2.02.html test.html
Again make cleverly cleans up the HTML afterwards. Since I want to keep the HTML files as well, the fix is to include them in the “all” rule:
allsrt = $(wildcard *.srt)
allhtml = $(patsubst %.srt,%.html,$(allsrt))
allrdf = $(patsubst %.srt,%.rdf,$(allsrt))
all: $(allhtml) $(allrdf)
%.rdf: %.html
	rdfaextract.py $< $@
%.html: %.srt
	srt2annotation.py $< $@
clean:
	rm -f *.html
	rm -f *.rdf
.PHONY: all clean
Make Manual: http://www.gnu.org/software/make/manual/make.html