I'm fairly certain that most programmers out there are familiar with regular expressions (hereafter "regex"), and for the most part, regex seems to be pretty consistent across programming languages.

All of the following strings match the same expressions:




Now, I use search-and-replace a lot, both in programming and in writing, and I find that for some odd reason, the people who build editors feel an incessant need to reinvent the wheel.

In Notepad++, most of the regex is quite nice. It doesn't support the {n}/{n,m} quantifiers (used to denote exactly n matches or between n and m matches) or the ? quantifier (used to denote 0 or 1 matches), which makes our string above unsearchable. However, the regex is still pretty standard, and despite the fact that it's missing a few things, it's pretty easy to nail on the first try. (Something like [ac]+b*[0-5]+ would be valid, for example. Note, however, that there are no surrounding /^$ characters.)
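For comparison, here is a minimal sketch of how the same pattern and the missing quantifiers behave in a standard regex library, using Python's `re` module. The test strings are made up for illustration, and the last pattern shows one possible workaround for a missing {n}: simply repeating the sub-pattern.

```python
import re

# The example pattern from the paragraph above.
pattern = re.compile(r"[ac]+b*[0-5]+")

matches_full = bool(pattern.fullmatch("acab3"))   # "aca" + "b" + "3"
rejects_missing = pattern.fullmatch("b3") is None  # [ac]+ needs at least one a or c

# The quantifiers the paragraph says are unavailable:
optional_ok = bool(re.fullmatch(r"ab?c", "ac"))        # ? means 0 or 1 "b"
exact_ok = bool(re.fullmatch(r"[0-5]{2}", "05"))       # {n} means exactly n
range_rejects = re.fullmatch(r"a{2,3}", "aaaa") is None  # {n,m} caps at m

# Workaround when {n} is missing: write the sub-pattern out n times.
repeated_ok = bool(re.fullmatch(r"[0-5][0-5]", "05"))

print(matches_full, rejects_missing, optional_ok, exact_ok, range_rejects, repeated_ok)
```

Writing the sub-pattern out by hand works for small fixed counts, but it gets unwieldy quickly, which is exactly why the quantifiers exist.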

In Visual Studio, the regex is still pretty nice, but it doesn't support the simple symbols you might expect from Notepad++ or programming languages. For example, there are no +*? characters. Instead, @ matches 0+ occurrences, # matches 1+, and you can search for an exact number of matches using ^n, where n is a number. You might think that this could be used as a substitute for ?, like so: ([a]^0) | ([a]^1). However, Visual Studio does not allow the use of ^0. So, our string above is again invalid, and you really have to look up VS's search-and-replace syntax to use it at all.
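To make the mapping concrete, here is a rough sketch of a translator from the quantifiers described above to standard regex syntax. The function name `vs_to_standard` is hypothetical, and the mapping is taken only from this post's description, not from any verified Visual Studio documentation. It also ignores escaping, so a literal @ or # in the pattern would be translated too.

```python
import re

def vs_to_standard(pattern: str) -> str:
    """Translate the quantifiers described in the post into standard regex.

    Assumed mapping (from the post, not verified against Visual Studio):
    @ -> *, # -> +, ^n -> {n}. Escaped or literal @ and # are not handled.
    """
    # ^n (exactly n repetitions) becomes {n}
    pattern = re.sub(r"\^(\d+)", r"{\1}", pattern)
    # @ (zero or more) becomes *, # (one or more) becomes +
    return pattern.replace("@", "*").replace("#", "+")

converted = vs_to_standard("[ac]#b@[0-5]^2")
print(converted)
print(bool(re.fullmatch(converted, "acb05")))
```

A real translator would need a proper tokenizer to respect escapes and character classes; this only shows that the two notations express the same ideas with different symbols.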

The last place I use regex is for my writing, which I do in Microsoft Word. They call their system the "wildcard system", which in my experience would mean that * matches any number of any character and ? matches exactly one of any character. Instead, their "wildcard system" is actually a stupid form of regex. Again, they don't support +, but rather @ as the one-or-more quantifier. * is the standard zero-or-more quantifier, which is good, and they do support the {n,m} (matching n to m occurrences) quantifier. Luckily they have a drop-down list of all the insertable special characters, but here's where it gets stupid...

Their "Manual line break" search-and-replace tag doesn't work. Seriously. They use the shortcut ^l, which bugs me for two reasons. One, that looks like a capital I or a | (the pipe; that thing by your return key). Two, it doesn't conform to the regex standard: \n. Did it never occur to anyone that they could use \n, \t, and so on? But instead, a tab is ^t and a newline is really ^13 (13 being the ASCII code for a carriage return).
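For contrast, this is roughly what the standard escapes look like in an ordinary regex library. It's a small sketch using Python's `re` module on a made-up sample string: it normalizes Windows-style line endings and then replaces tabs with spaces.

```python
import re

# Made-up sample text with tabs and CRLF line endings.
text = "name\tvalue\r\nnext\tline\r\n"

# \r\n is a carriage return (ASCII 13) followed by a line feed (ASCII 10).
normalized = re.sub(r"\r\n", "\n", text)

# \t is a tab; replace each one with four spaces.
spaced = re.sub(r"\t", "    ", normalized)

print(repr(normalized))
print(repr(spaced))
```

The same \n and \t escapes work in most regex flavors, which is the whole appeal: learn them once and use them everywhere.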

So let me ask you this: these tools are written by programmers, so why don't they just link their search-and-replace directly to a regex library?
