Where some ideas are stranger than others...

When women offer our experiences as truth, as human truth, all maps change.
- Ursula K. Le Guin

Webmaster was in on:
2024-03-23


Including Surprises, Part Two (2023-12-11)

[Image: Snapshot of the ur-code used to test C and C-variant compilers where the coder's first language is english.]

The saga of working out a portable means to provide file processing capabilities for html files similar to those provided by bbedit continues. *nix systems generally have a wide range of built-in utilities that may be run from the command line in a terminal window, so these are the primary candidates to take over the job. I have already mentioned a couple of these, the C preprocessor and make, besides the scripting language perl. Of course, that is hardly scratching the surface of what is available, as anyone who has watched an application installer run in a terminal window has seen. The title bar of the terminal emulator window flashes with the names of the different utilities running at different parts of the installation process. There are a fascinating number of unusual-looking names that are actually banal in origin, like "clang", which is just short for "c language family [compiler] front end." The standard selection available from the shell is easy to look up; for instance, the commands invoking the utilities in the macosx version of bash are neatly summarized at ss64.com. Some of what I have found myself using is available from outside the shell, including my cross-platform scripting favourite perl. Still others are not necessarily available by default on a macosx system unless the xcode command line tools are installed, but on other *nix systems they are usually all present, for example make, the C preprocessor, gcc, and so on.
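
Since the C preprocessor keeps coming up, a minimal sketch of pressing it into service on an html file may be worth setting down here. The file names are invented for the example, and the flags are the important part, because plain cpp assumes it is reading C.

    # page.in.html is a hypothetical source file containing lines like
    #   #include "header.html"
    # -P suppresses the linemarkers cpp normally writes into its output,
    # and -traditional-cpp keeps it from treating the // in any urls as
    # the start of a C++ style comment and deleting the rest of the line.
    cpp -P -traditional-cpp page.in.html > page.html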

So far, the way things are coming together, I have found myself working up a combination of perl, awk, sed, m4, and make with the standard file management tools available from bash. Early versions of scripts of this kind are inevitably less efficient, although in this case my strong suspicion is that, not being very experienced yet with several of these utilities, I have taken an approach more reminiscent of a Heath Robinson or Rube Goldberg machine than anything else. Nevertheless, the prototyping is instructive and working well enough to show that things are on the right track. The most difficult element so far has turned out to be automating the collection of the names of the files to be processed and generating the names for the new files. For more common computer programming and scripting tasks, this is not so difficult. The main program or script files are often few in number, and since the programmer starts work aware that in due time they will use a makefile, they will often have a template ready to fill out. Certainly, if I had started out with using make as a preprocessor helper in mind, then I could have updated the file lists by hand over time. That sounds like a way to miss things though, and I likely would have lost patience and worked out a way to automate the task. In other words, at the moment I am having fun and learning in terms of code options and approaches, and will ultimately settle down and do what most people do in this situation: write some perl script or adapt some free/libre perl scripts that are already available.
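
For what it is worth, the name generation part can be sketched in a few lines of bash, assuming a made up convention in which files awaiting preprocessing end in .in.html:

    # Collect every source file matching the hypothetical *.in.html
    # convention, then derive each production name by trimming the
    # extra extension with the shell's own suffix removal.
    for src in *.in.html; do
        out="${src%.in.html}.html"
        printf 'would process %s into %s\n' "$src" "$out"
    done

make can do the equivalent with its wildcard function and suffix substitution, which is precisely what spares the programmer from maintaining file lists by hand.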

One of the trickier elements of the task at hand is that I am working between a couple of systems, and one of them has older versions of tools that cannot do things like edit a file "in place" without generating a backup file of the original, and a visible backup at that (not a dot file). This is counter to the usual approach to updating html files, where all the edits happen in the pre-production version, which is pushed out to production once finalized. I have backups covered by other means. Generating backups is of course not a bad thing at all, and for a multi-step process having a handful of snapshots is a great setup if something goes direly wrong, or simply for debugging. The end result though is that on the older system I need to write some code to clean up after the preprocessing and remove those extra files. The newer system will generate some of the same files, but not all of them, so that difference needs to be accounted for. People who work on installers know all about this sort of challenge, and deal with far hairier versions of it than this html example. Overall though, this has been the least of the tricky aspects of the coding process. Finding decent documentation and properly working out the capabilities of each of the utilities is the most challenging part, on top of managing such quirks as comments and character escaping differing between them. I am still not getting into the horrors of quoting and double quoting, which, I agree with wiser heads than mine, should not be nested if it can be avoided, because nesting renders the code difficult to read and its behaviour often unpredictable.
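
The in-place editing difference itself comes down to one flag. A sketch, with an invented substitution: the older BSD sed insists on a backup suffix after -i, while GNU sed does not mind one, so supplying a suffix everywhere and cleaning up afterwards keeps the command portable.

    # Both GNU sed and the older BSD sed accept a suffix attached
    # directly to -i, so this form runs unchanged on both systems;
    # the visible page.html.bak left behind is removed afterwards.
    sed -i.bak 's/old phrase/new phrase/g' page.html
    rm -f page.html.bak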

The specific tools new to me that I have been learning and working with are: m4, a macro processor; make, a program recompilation manager; sed, a text stream editor; and awk, a text processing language. These all have manuals and other introductory materials written and provided as part of the GNU Project, and except for the first two I know they have associated books published by familiar firms like O'Reilly and Peachpit Press. The GNU materials are a wonderful resource, but suffer from a difficult problem common to programming and scripting books, manuals, tutorials, and web pages. They regularly start with material so basic it is often not worth actually running yourself, then skip completely over intermediate examples and escalate rapidly into examples and techniques that are undeniably relevant while also impossible to understand as yet. I am a bit surprised how often the writers forget to note, even briefly, character escaping and commenting practices, because these are not standardized. They often seem standardized because so many utilities and languages have adopted C-style comments and PCRE-style character escaping (which may be C-style, actually). A few have not, maybe because of their archaic origins, although I suspect more often because of the processing level they work at and the data they have to pass to other tools. Writing up teaching materials for scripting and programming is a real challenge, so much so that Kernighan and Ritchie's The C Programming Language is still highly revered as a classic of exposition. Unfortunately a great many people seem to have thrown up their hands and left learners to struggle along with a search engine and close encounters with fora like stackoverflow or askubuntu. It is no easy task to answer queries in a text-only format in the first place, and inevitably there are times when a question triggers a flurry of argument about programming philosophy (seriously, I am not making an arch reference to flame wars). Interesting, but not necessarily an answer to the question in and of itself.
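
As a tiny illustration of the non-standardized commenting the manuals tend to skip over, here is a sketch using two of the tools named above, with the input text made up for the purpose:

    # sed and awk both borrow the shell's # to-end-of-line comments:
    echo 'hello' | awk '{ print toupper($0) }  # an awk comment'
    # m4 is the odd one out: its # comments are copied through to the
    # output untouched, and the dnl macro is what actually discards text.
    printf 'dnl this line disappears from the output\nhello\n' | m4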

All that said, I also appreciate from personal experience with other computer programming and scripting languages that sometimes getting anything done is sheer torture until the shape of the language becomes clearer. For instance, regular expressions are utterly maddening for use on serious tasks until the learner has figured out variable substitution properly. They are not difficult to use once the syntax is worked out, in the sense of representing it properly in typed code. This is actually where many manuals begin the drop from beginner coverage into the rapid escalation towards expert coverage. At times the authors have so deeply internalized how a thing like variable definition and use for substitution works in older text processing languages like sed and awk that they forget to provide a couple of examples to illustrate how it is done. Older languages often do things a bit differently than their descendants and cousins, and may also have a non-obvious necessary tweak or two for command line versus script usage. Looping is another tricky and highly instructive area that can be quite difficult to work out adequately in the early stages of learning a language. Once internalized, these are the very things that give a feeling for the language so that it is easier to compose in it. Perhaps official computer science majors develop an ability to go straight from reading a description to using the language without any examples to start from, although I find that hard to believe.
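
Since this is exactly the sort of example I found myself wishing for, here is a hedged sketch of getting a shell variable into awk and sed, with the pattern and input invented for the occasion:

    pattern='moon'
    # awk takes outside values in through -v, which keeps the program
    # itself safely wrapped in single quotes:
    echo 'moonspeaker' | awk -v pat="$pattern" '$0 ~ pat { print "awk matched" }'
    # sed has no equivalent option, so the shell has to interpolate the
    # variable, which forces the script into double quotes:
    echo 'moonspeaker' | sed -n "/${pattern}/p"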

Copyright © C. Osborne 2024
Last Modified: Saturday, March 23, 2024 13:15:01