Friday, December 21, 2007

Unix makes Computer Science easy

As I've learned more Computer Science over the years, I've grown to realize that Unix packs a whole lot of CS power into a bunch of seemingly small commands with simple interfaces.

Consider the diff utility. It figures out the smallest number of changes required to transform one file into another. It uses some pretty heavy computer science, but the end user just tells it which files to compare and gets back the differences.

Or how about bc and dc? Arbitrary precision arithmetic was a bit of a trick in those days, but with Unix, you just run "bc" and type in your expressions, including exponential or trigonometric functions, and out pop your answers.

External sorting takes some skill and work to pull off. To sort a file that's ten times the size of available RAM, you need to sort lots of little chunks of the file and merge them together over multiple passes. Unix's sort utility just does it for you, creating temporary files as needed and hiding all of the complexity from you. It even does variable length records without batting an eye.

Once you've sorted your text file, and want to find something in it quickly, Unix lets you do a binary search in a file with variable length records with just a single command. look. And you don't even have to know what log(n) means.

Shell scripting provides something similar to functional programming, with files treated as lists of lines. The map, reduce, and filter operations from functional programming are made available in things like sed, wc, and grep.

Awk can do all three. Patterns without actions are a filter. Actions that do output perform mappings. Actions that accumulate information to be spat out by an END block do reductions.

For programmers, things get even better. Topological sorting is where you take a bunch of things, and a bunch of rules saying that one thing must come before another thing, and come up with an ordering of the things that satisfies all of the rules. It's kind of useful when you need to figure out what order in which to run all of the commands that build your program. make handles it all in a way that completely hides the complexity of the topological sort from the programmer.

Compilers, and parsing in particular, used to be deep magic, understood fully by very few people. lex and yacc made it easy for anyone to write a few regular expressions and a grammar and get out a parser (especially if they owned a copy of The Unix Programming Environment), without any need to worry about the ins and outs of finite state machines or push down automata or a whole bunch of stuff in the dragon book.

And all of this was available in Unix Seventh Edition. In 1979. On a computer that only gave each program 64k of memory in which to run.

[edit: Changed to reflect the fact that I was talking more about parsing than compiling in general. From what I have been able to gather from old books on compilers, parsing used to dominate the compiler writing effort much more than it does nowadays, but the distinction is still a valid one. Thanks to simen and Blackheart for pointing this out in their comments on reddit.]

16 comments:

asahel said...

"And all of this was available in Unix Seventh Edition. In 1979. On a computer that only gave each program 64k of memory in which to run."

Wow. So, what exactly has improved in Unix since then? All I can think of is (i) GUI, and (ii) support for all sorts of weird PC hardware, instead of tape drives and monochrome text monitors.

jsnx said...
This post has been removed by the author.
jsnx said...
This post has been removed by the author.
jsnx said...
This post has been removed by the author.
jsnx said...

In response to asahel -- I think you're missing the point. All these utilities have gotten better with time, and the support system for them -- kernels, filesystems -- have gotten better, too. Other operating systems -- the ones that are still in wide use, anyway -- haven't caught up yet. Does the taskbar tray rate as a major improvement to you?

madmetrics said...

Excellent post. I code a lot, but I had never really thought about how well *nix connects the types of basic operations you learn early on in CS to the higher-level programming most of us do on a daily basis.

apatheticagnostic said...

I wouldn't say that Unix makes comp-sci easy - but it does make it easy to do things that are founded on ideas in computer science.

Rather, the Unix environment makes implementing the expression of an idea from computer science easier.

That said, it really is amazing just how powerful a lot of the tools are that are at your fingertips in a unix environment.

tike said...

In response to jsnx -- yes, the taskbar is a major improvement. Any kind of GUI is a major improvement. You think programming/using Unix is easy? It's only easy if you already know the operating system intimately. Design and usability are JUST AS important as functionality (if not more), especially when dealing with novices.

Kamil said...

For me it makes problem solving (for computer science and engineering) enjoyable. Thanks for the post!

sjs said...

"In response to jsnx -- yes, the taskbar is a major improvement. Any kind of GUI is a major improvement."

No one has discovered how to provide the power and flexibility of the command line via GUI yet. A GUI is a great complement to the command line.

"You think programming/using Unix is easy? It's only easy if you already know the operating system intimately."

So there's a learning curve, big deal. If you work with computers you sure as hell better be willing to learn to use them. There is nothing wrong with having to learn something to be professional and competent. I'll take the more powerful system over the one that's easier to use without learning anything any day of the week.

"Design and usability are JUST AS important as functionality (if not more), especially when dealing with novices."

No offense, but who really cares about the novice that much? You learn for a short while and use forever after that Design and usability are just as important as functionality, which is exactly why the command line is still the best tool for the job in many cases!

Glomek said...

apatheticagnostic: My point exactly. Something that has been gnawing at my mind for a few years is this contradiction between my view of classic Unix as "lots of small utilities that each do one thing well" and my growing realization that in order to write all of those utilities myself, I would need to use some fairly advanced techniques. This post is, in a sense, an attempt at reconciling those two views inside my own head, and an expression of admiration for the fact that the folks at Bell Labs managed to make so much advanced stuff be so easy to use.

tag said...

Nice article. I agree -- the Unix tools show us how to do functional programming and parallel processing without a wretched mutex in sight.

Here's an article I wrote showing how to do set theory using Unix tools:

He sells shell scripts to intersect sets

vasudevram said...

Very good post. I agree. Also, the reason why "UNIX makes computer science easy" is straightforward - UNIX was developed by some of the top computer scientists at Bell Labs - and they set out to make a system that would make it easy for them to do their other work. Being experts, they applied a lot of computer science concepts to the design of UNIX. And of course, UNIX in turn became one of the best platforms for further computer science research, in ways too numerous to list here. Just one example - there are so many programming languages - both mainstream and more esoteric/research-type languages - freely available on UNIX, that one can do a lot of research using them, and the development of these languages is itself a kind of CS research - into programming languages.

Vasudev Ram
Dancing Bison Enterprises
Software consulting and training
Biz site:
http://www.dancingbison.com
Blog (on software innovation): http://jugad.livejournal.com
Quick and easy PDF creation toolkit:
http://www.dancingbison.com/products.html

pixelbeat said...

Thanks for the insightful post.
I wrote about the functional properties of unix filters here

Also I wrote about unix CLI set operations here

Maht said...

> No one has discovered how to provide the power and flexibility of the command line via GUI yet. A GUI is a great complement to the command line.

Yes they have

http://plan9.bell-labs.com/plan9/

It's what the Unix people did when they decided to start from scratch with their new knowledge.

Coursework said...
This post has been removed by a blog administrator.