Regular Expressions

Recently, one of my projects involved inserting a list of U.S. states into an options list, along with contact phone numbers that would then be passed to another script.

The client sent me a list as a text file, looking something like this (I’ve changed all the numbers to something more generic, for obvious reasons, and the list I got from the client was much longer):

Kansas 555-123-4567
Kentucky 555-123-4567
Louisiana 555-123-4567
Maine 555-123-4567
Maryland 555-123-4567

From here, I have to make some workflow decisions on how to get this into a form object with the proper syntax. What tool do I use? Do I take the time to do it manually, or should I automate it somehow?

For the tool, I chose BBEdit because I decided to automate the process, although Dreamweaver can also do this to a certain extent.

When you bring up the Find dialog, you’ll notice a very innocent looking button that has a lot of power when used to its full extent.

In BBEdit it’s labeled “Use Grep”.

In Dreamweaver it’s labeled “Use regular expression”.

They do pretty much the same thing, apart from some minor differences in syntax, which I’ll get into later.

What is Grep/regular expressions?

Let’s say you want to find something on a page - like a number. You just don’t know what number, or you’d like to have every number. The best way to search for it is with an expression that represents “number”.

This looks like \d in grep language. The d simply stands for “digit”, and the backslash in front tells grep to treat it as a pattern rather than as a literal letter d.

Try doing a search for \d on a page with the “Use Grep” option checked. You’ll see that the search finds every single digit on the page.
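If you would like to try this outside an editor, here is a minimal sketch in Python, whose re module understands the same \d shorthand. The sample line is just one entry in the format of the list above, and the variable names are my own.

```python
import re

# One sample line in the same format as the list above.
page = "Kansas 555-123-4567"

# \d matches a single digit, so every digit becomes its own match.
digits = re.findall(r"\d", page)

print(digits)
print(len(digits), "digits found")
```

Each digit comes back as a separate match, which is the same thing the editor shows when it highlights the digits one at a time.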

Starting to see where this is going? Let’s look over more of the syntax, and how we can combine things to find what we’re looking for.

Basic Syntax

So we know \d is a digit. What if we want more digits in a row than just one?

Easy: if you want to match one or more digits, you add “+” right after the \d, looking like this: “\d+”.

\d{n} will give you a match to a specific number of digits (replacing ‘n’ with a real number). So, “\d{3}” will match three digits in a row, such as every area code on the page.
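Here is a rough Python sketch of the two quantifiers side by side, assuming the counted form is written \d{n} as it is in Python’s re module. The sample line is illustrative.

```python
import re

# One sample line in the same format as the client's list.
line = "Maine 555-123-4567"

# \d+ grabs each unbroken run of digits, however long it is.
runs = re.findall(r"\d+", line)

# \d{3} grabs exactly three digits at a time. Note that it also
# picks up the first three digits of the four-digit group.
triples = re.findall(r"\d{3}", line)

print(runs)     # ['555', '123', '4567']
print(triples)  # ['555', '123', '456']
```

A counted match on its own is not picky about where the three digits sit, which is one reason the full phone number pattern later spells out the dashes as well.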

There’s a lot of good information already available on the specific syntaxes you can use, so I’ll leave that up to you and Google. In the meantime, let’s shift gears a bit and do some group hugs.

Matching Patterns

When building an effective find/replace routine, you can actually get pretty sophisticated about managing the patterns and how they get replaced. You can replace entire chunks of text, or just move one bit of text after another, which was one of my goals for the phone number list.

I knew that my list looked like this: a state name, a space, three digits, a dash, three more digits, another dash, and then four digits. By using parentheses ( ), I can break apart the search into the chunks that I need to use for replacement. That gives me something more like this:

(state name, space)(\d{3}-\d{3}-\d{4})

For the state name I use [^\d]\w+, which means one character that is not a digit, followed by one or more word characters. I can then add it to my phone number pattern:

([^\d]\w+)(\d{3}-\d{3}-\d{4})
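To see what each pair of parentheses captures, here is a small Python sketch. It assumes the pattern is written with its backslashes and digit counts spelled out, and it uses an entry with the space already removed, since the pattern itself has no space in it.

```python
import re

# The two groups: a state name, then a phone number.
pattern = re.compile(r"([^\d]\w+)(\d{3}-\d{3}-\d{4})")

# A sample entry with the space already stripped out.
match = pattern.search("Kentucky555-123-4567")

state = match.group(1)
phone = match.group(2)

print(state)  # Kentucky
print(phone)  # 555-123-4567
```

Group 1 is whatever the first pair of parentheses matched and group 2 is whatever the second pair matched, which is what makes the replacement step possible.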

So, great - I can now match every State/Phone combo I have. What should I do with it now?

Now, we start replacing things. You can replace patterns easily with just two keystrokes for each pattern.

In BBEdit, it goes like this: replacing with \1\2 will give me exactly what I searched for, and \2\1 will reverse them, putting the phone number first and then the state. \1\2\1 will give me my pattern, then add the state again at the end. See how it works?

In Dreamweaver, it’s the same, but write it with dollar signs instead: $1$2, $2$1, and $1$2$1 for the above examples.
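The same three replacements can be tried in Python, which happens to use the backslash style for group references in its re.sub function. This is only a sketch, and the sample entry is illustrative.

```python
import re

pattern = re.compile(r"([^\d]\w+)(\d{3}-\d{3}-\d{4})")
entry = "Maine555-123-4567"

# \1 is the state name, \2 is the phone number.
same = pattern.sub(r"\1\2", entry)       # back to what was matched
swapped = pattern.sub(r"\2\1", entry)    # phone number first
doubled = pattern.sub(r"\1\2\1", entry)  # state repeated at the end

print(same)     # Maine555-123-4567
print(swapped)  # 555-123-4567Maine
print(doubled)  # Maine555-123-4567Maine
```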

So, cutting right to the chase scene, here’s what I used to turn my long list of states and phone numbers into a proper list of options for an html form:

Step 1. Strip the spaces so that they don’t trip up my patterns:

Find: \s (to find spaces, tabs, and line breaks)
Replace: (left blank; this replaces each match with nothing, which is the same as deleting it)

My list instantly becomes a long string of characters:

Kansas555-123-4567Kentucky555-123-4567Louisiana555-123-4567Maine555-123-4567Maryland555-123-4567

Step 2. Run the main routine and create my html:

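Find: ([^\d]\w+)(\d{3}-\d{3}-\d{4})
Replace: <option value="\2">\1</option>\r

One click, and that string turns into this:

<option value="555-123-4567">Kansas</option>
<option value="555-123-4567">Kentucky</option>
<option value="555-123-4567">Louisiana</option>
<option value="555-123-4567">Maine</option>
<option value="555-123-4567">Maryland</option>

Remember, this is just the beginning of what you can do with this. Once you start including more heavy-duty query commands, all the world’s power will be at your fingertips. Or at least you’ll have more time to plan world domination.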
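For anyone who would rather script the two steps than run them in an editor, here is a rough Python sketch of the same routine. The exact option markup (the phone number as the value, the state name as the label) is an assumption made for illustration, as are the variable names.

```python
import re

# The sample list from the top of the article.
raw = """Kansas 555-123-4567
Kentucky 555-123-4567
Louisiana 555-123-4567
Maine 555-123-4567
Maryland 555-123-4567"""

# Step 1: strip all whitespace (spaces, tabs, line breaks).
packed = re.sub(r"\s", "", raw)

# Step 2: rewrite each state/phone pair as an option element.
# Assumed markup: phone number as the value, state as the label.
options = re.sub(
    r"([^\d]\w+)(\d{3}-\d{3}-\d{4})",
    r'<option value="\2">\1</option>\n',
    packed,
)

print(options)
```

One thing to watch for with this approach: because Step 1 removes every space, a two-word state name would come out joined together, so names like that would need a touch-up afterwards.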