Using regular expressions to make life easier.

An entry from the archives.

I was going through some of my old material and happened to come across an entry that I wrote about using Regular Expressions. Since the post still has merit, I’d like to add it here for your enjoyment.

Work smarter, not harder - Original Post from March 3rd, 2005:

Recently, one of the projects I was working on was to insert a list of U.S. states into an options list, with contact phone numbers that would then be passed to another script.

The client sent me a list as a text file, looking something like this (I’ve changed all the numbers to something more generic, for obvious reasons, and the list I got from the client was much longer):

Kansas 555-123-4567
Kentucky 555-123-4567
Louisiana 555-123-4567
Maine 555-123-4567
Maryland 555-123-4567

From here, I have to make some workflow decisions on how to get this into a form object with the proper syntax. What tool do I use? Do I take the time to do it manually, or should I automate it somehow?

For the tool, I chose BBEdit because I decided to automate the process, although Dreamweaver can also do this to a certain extent.

When you bring up the Find dialog, you’ll notice a very innocent looking button that has a lot of power when used to its full extent.

In BBEdit it’s labeled “Use Grep”.

And in Dreamweaver, “Use regular expression”.

They pretty much do the same thing, with the exception of some minor differences in language, which I’ll get into later.

What is Grep/regular expressions?

Let’s say you want to find something on a page - like a number. You just don’t know what number, or you’d like to have every number. The best way to search for it is with an expression that represents “number”.

This looks like \d in grep language. The d simply stands for “digit”.

Try doing a search for \d on a page with the Grep button checked. You’ll see that the results bring back every single instance of a digit on the page.
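The same search can be tried outside an editor. Here is a minimal sketch using Python's re module (my choice for illustration, not part of the original BBEdit workflow), where the digit class is written \d. The sample text is made up.

```python
import re

# Sample line in the same shape as the client list (made-up number).
sample = "Kansas 555-123-4567"

# \d matches exactly one digit, so findall returns every digit separately.
digits = re.findall(r"\d", sample)
print(digits)
```

Each digit comes back as its own match, which is the same thing the editor shows when it highlights one hit per digit.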

Starting to see where this is going? Let’s look over more of the syntax, and how we can combine things to find what we’re looking for.

Basic Syntax

So we know \d is a digit. What if we want more digits in a row than just one?

Easy: if you want to match one or more digits, add “+” right after the \d, like this: “\d+”.

“\d{n}” will match a specific number of digits (replace ‘n’ with a real number). So “\d{3}” will match three digits in a row, such as every area code on the page, although it will also match any other run of three digits.
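To see how the two quantifiers behave, here is a small Python sketch (Python's re module is my stand-in for the editor's grep engine; the sample line is made up). It also shows that an exact count such as \d{3} picks up any run of three digits, so isolating the area code takes a little more context.

```python
import re

line = "Kansas 555-123-4567"

# \d+ matches one or more digits in a row, as long a run as possible.
runs = re.findall(r"\d+", line)

# \d{3} matches exactly three digits in a row. It also grabs the first
# three digits of the four-digit group at the end.
threes = re.findall(r"\d{3}", line)

# Spelling out the whole phone shape and capturing only the first group
# isolates the area code.
area = re.search(r"(\d{3})-\d{3}-\d{4}", line).group(1)

print(runs, threes, area)
```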

There’s a lot of good information already available on the specific syntaxes you can use, so I’ll leave that up to you and Google. In the meantime, let’s shift gears a bit and do some group hugs.

Matching Patterns

When building an effective find/replace routine, you can actually get pretty sophisticated about managing the patterns and how they get replaced. You can replace entire chunks of text, or just move a bit of text after another, which was one of my goals for the phone number list.

I knew that my list looked like this: a state name, a space, three digits, a dash, three more digits, another dash, and then four digits. By using parentheses ( ), I can break the search apart into the chunks that I need for replacement. That gives me something more like this:

(state name, space)(\d{3}-\d{3}-\d{4})

See how I started building in the grep language? Easy as pie. But I still need a more effective way of getting the state name into the pattern, since obviously “state name” won’t work. I know that “\w” is any word character. This does include numbers, so I need to be careful with my usage here. In this case, I just want to match letters until the first number. So I need an exception wildcard. The wildcard is a caret “^”, and it needs to be inside brackets to mean “anything except this”. So to search for all letters until I hit a number, I come up with a pattern that resembles this:

([^\d]\w+)

 

I can then add it to my phone number pattern:

([^\d]\w+)(\d{3}-\d{3}-\d{4})
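Here is a rough Python sketch of the two groups at work. It assumes the space between state and number has already been removed, and that the phone part is written as three, three, and four digits separated by dashes. Python's re module is my stand-in for the editor's grep engine.

```python
import re

# Group 1: one non-digit character followed by word characters (the state).
# Group 2: the phone number, 3-3-4 digits separated by dashes.
pattern = re.compile(r"([^\d]\w+)(\d{3}-\d{3}-\d{4})")

match = pattern.search("Kansas555-123-4567")
state, phone = match.groups()

print(state)
print(phone)
```

One detail worth knowing: \w+ is greedy and at first swallows the leading digits too, then gives them back one at a time until the phone group can match. That backtracking is why the state name ends cleanly at the first digit.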

Inserting content between patterns.

So, great - I can now match every State/Phone combo I have. What should I do with it now?

Now, we start replacing things. You can replace patterns easily with just two keystrokes for each pattern.

In BBEdit, it goes like this: replacing with \1\2 will give me exactly what I searched for, and \2\1 will reverse them, phone number first, then state. \1\2\1 will give me my pattern, then add the state again at the end. See how it works?

In Dreamweaver, it’s the same, but write it with dollar signs instead: $1$2, $2$1, and $1$2$1 for the above examples.
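The three replacements can be tried in Python as well, which happens to use the same backslash-number style as BBEdit. This is only an illustrative sketch with a made-up entry, assuming the two-group pattern with the phone part written as \d{3}-\d{3}-\d{4}.

```python
import re

pattern = r"([^\d]\w+)(\d{3}-\d{3}-\d{4})"
entry = "Kansas555-123-4567"

# \1\2 puts both groups back in their original order.
same = re.sub(pattern, r"\1\2", entry)

# \2\1 swaps them: phone number first, then state.
swapped = re.sub(pattern, r"\2\1", entry)

# \1\2\1 repeats the state after the phone number.
doubled = re.sub(pattern, r"\1\2\1", entry)

print(same, swapped, doubled)
```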

So, cutting right to the chase scene, here’s what I used to turn my long list of states and phone numbers into a proper list of options for an html form:

Step 1. Strip the spaces so that they don’t trip up my patterns:

Find: \s (to find spaces, tabs, and line breaks)
Replace: (left blank, which replaces each match with nothing, the same as deleting it)

My list instantly becomes a long string of characters:

Kansas555-123-4567Kentucky555-123-4567Louisiana555-123-4567Maine555-123-4567Maryland555-123-4567
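For anyone following along in code, this whitespace-stripping step can be sketched in Python (my choice of language for illustration; the three-line input is a shortened, made-up version of the list):

```python
import re

raw = """Kansas 555-123-4567
  Kentucky 555-123-4567
  Louisiana 555-123-4567
"""

# \s matches spaces, tabs, and line breaks, so replacing every match
# with an empty string collapses the list into one long string.
packed = re.sub(r"\s", "", raw)
print(packed)
```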

Step 2. Run the main routine and create my html:

Find: ([^\d]\w+)(\d{3}-\d{3}-\d{4})
Replace: \r

One click, and that string turns into this:
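The exact markup depends on the replacement string. As a rough sketch in Python, assuming a hypothetical template that puts the phone number in the option's value and the state name in its label (that template is my assumption for illustration, and it uses "\n" where the editor replace uses \r), the step could look like this:

```python
import re

packed = "Kansas555-123-4567Kentucky555-123-4567"
pattern = r"([^\d]\w+)(\d{3}-\d{3}-\d{4})"

# Hypothetical replacement template: phone as the value, state as the label,
# followed by a line break so each option lands on its own line.
template = r'<option value="\2">\1</option>' + "\n"

options = re.sub(pattern, template, packed)
print(options)
```

With that assumed template, each state/phone pair becomes one option element on its own line.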

Remember, this is just the beginning of what you can do with this. Once you start including more heavy-duty query commands, all the world’s power will be at your fingertips. Or at least you’ll have more time to plan world domination.

Comments

1. jordan

Jan 27th, 2006

I absolutely despise regular expressions, because the things are so incredibly complicated, and I’m very error-prone when I write them. I have used them a little to write a BBCode parser, and it was semi-functional.

Regardless, I’ll leave you with the inevitable jwz quote:

<cite>Some people, when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.</cite>

2. Anton

Jan 27th, 2006

I once heard someone say the same things about computers…  “incredibly complicated”, “error-prone”, “semi-functional”...

As it is, if you are not comfortable with them, go on about what you normally do.  This is simply something I like to do when faced with editing 60 static html pages that are expected to be online in less than an hour.

Plus, I test my expressions regularly (pun intended)... :) So, they are rarely error-prone. And now, I absolutely cannot live without them.

3. jordan

Jan 27th, 2006

I’m not saying they aren’t useful, because it’s obvious that they are. I’m just saying that I don’t personally like them. :)

4. luxuryluke

Mar 27th, 2006

…I don’t like doing situps.

5. Bob H

Apr 13th, 2006

I must say I LOVE regular expressions. Use them in DW several times a week, at least. Or when it doesn’t do it the way I want, I jump over to NoteTab (I’m a PC guy—but definitely not PC ). Same thing with Photoshop actions…Love ‘em! Anytime I can sit back and let my computer do my work for me, I’m all in!