The section on input buffering was pretty short. Basically, buffering input saves you time and is more convenient than jumping around an input stream a lot. Nothing groundbreaking.
The chapter on regexes probably would be groundbreaking if you weren’t, you know, an actually programmer. There’s a ton of problems, I’m going to do a few here until I have to go.
3.3.2 Describe the languages denoted by the following regular expressions: (hard problem)
- a(a|b)*a
– the language of all strings of as and bs that both start and end with a - ((e|a)b*)*
– equivalent to ((a?)b*)*, which is just (a?b*)*. a?b* is the set of all strings of bs that start with zero or one as. The regex is zero or more of those. I’m pretty sure this is equivalent to all strings of as and bs. - (aa|bb)*((ab|ba)(aa|bb)*(ab|ba)(aa|bb)*)*
– I don’t have a good way to explain in english what language this matches, but it’s the kind of regex people get punched over.
3.3.3. In a string of length n, how many of the following are there?
- Prefixes
– n+1 (empty string counts) - Suffixes
– n + 1 (empty string counts) - Proper prefixes
– n – 1, except the empty string has exactly 0 proper prefixes - Substrings (hard problem)
– I think this is the answer: 2 + (SUM (i = 0..n) (n – i) – 2). the (ni-1) – 2 is because you want to exclude empty strings and the full substring from being recounted, and the 2+ at the beginning is to count those once. I don’t have a closed form handy. - Subsequences (hard problem)
I’m actually not going to try this one, just mention that (a) it’s irrelevant and (b) it’s a math problem.
I just typed up the answers to most of the rest of this section and wordpress ate the post. So those are gone 😦
Next up: Recognition of tokens!