[geeks] bash string matching

Shannon Hendrix shannon at widomaker.com
Wed Jun 4 20:24:39 CDT 2008


On Jun 4, 2008, at 18:59 , Anthony Ortenzi wrote:

> Shannon Hendrix said:
>> BTW: for those who don't know, the [[ ]] above is just a shortened if
>> statement, the same as:
>>
>> if [[ "$STRING" == *geek* ]]
>> then
>> 	echo geek
>> fi
>>
>
> Two things...
>
> First, the [[ ]] isn't the "if", the && is.

I don't recall saying the [[ ]] was an if, but rather the whole  
expression is a shortened variation on a full if statement.

See below for the problem and the solution.  It's a bash/shell bug and  
design issue that you just have to work around.

> Second, regarding quotes, my man page for bash 2.05 contains mention of
> quote removal.
>
> [[ expression ]]
>    Return  a  status  of  0 or 1 depending on the evaluation of the
>    conditional expression expression.  Expressions are composed  of
>    the  primaries  described  below  under CONDITIONAL EXPRESSIONS.
>    Word splitting and pathname expansion are not performed  on  the
>    words  between  the  [[  and  ]]; tilde expansion, parameter and
>    variable expansion, arithmetic expansion, command  substitution,
>    process substitution, and quote removal are performed.

Ha!  Run a bash script with tracing turned on and you'll see the man  
page is lying to you, in more than one area.

You are also misinterpreting what the man page is saying.  They are  
talking about word splitting and pathname expansion not happening on  
expanded values between [[ and ]], as they do between [ and ].

That's because [ is traditionally /bin/[, an alias for /bin/test (bash  
also has a builtin [ that follows the same ordinary-command rules),  
while [[ is built into the shell syntax and has a lot of other  
extensions and different rules too.
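
A quick way to see the difference on a given system, as a rough sketch: type -t is a bash builtin that prints a one-word category for a name. The variable names single and double are just for illustration.

```bash
# Ask bash how it resolves each name.
single=$(type -t '[')     # ordinary command: its arguments are expanded and split first
double=$(type -t '[[')    # handled by the parser: the shell reads the contents itself
echo "[  is a $single"
echo "[[ is a $double"
```

Only [[ is handled by the parser; [ goes through normal command-line processing whether the shell runs its own builtin or the external binary.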

Also, the man page is misleading because bash *DOES* do splitting  
inside of [[ and ]], and you can verify that by tracing a shell script.

Maybe that is a bug, but what happens is this:

[[ "$STRING" == *one world order* ]]

...is interpreted as this:

	[[ "$STRING" == *one

...which is broken syntax.  bash seems to terminate parsing the RHS  
expression on the first space, or even some other characters, in spite  
of what the man page says.
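
To check this without killing your own script, one option is to hand the test to a child bash and look only at whether it succeeded. This is a minimal sketch; the sample value of STRING and the variable name unquoted are made up for the example.

```bash
# STRING is exported so the child shell sees the same value.
export STRING="talk of one world order again"

# Run the unquoted test in a child bash: a parse error there cannot abort this script.
if bash -c '[[ "$STRING" == *one world order* ]]' 2>/dev/null; then
    unquoted="accepted"
else
    unquoted="rejected"
fi
echo "unquoted spaces: $unquoted"
```

Drop the 2>/dev/null to read the error message bash prints for the right-hand side.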

It looks to me like a bash bug, but then it could be done to make sure  
[[ is compatible with the old fake [ syntax.

The old Bourne shells were essentially broken because commands like  
[ were not really commands at all, they were just designed to fake it,  
and make it look like part of the language, except that being external  
they were subject to parameter expansion.

That has made it hellish to create new shells that are Bourne  
compatible.  There is no grammar for Bourne shell either, because its  
syntax was fabricated on-the-fly and largely by whim, and externally  
extended with ugly hacks like /bin/[.

Anyway, the way you got around this bug in the old shells was this:

	[ "string" = "quoted stuff with spaces" ]

Unfortunately, quoting breaks the pattern engine in bash.

If you do this:

	[[ "$STRING" == "*stuff with spaces*" ]]

...then bash auto-escapes a lot of it, which breaks the matching engine.

I don't know if that's a bug or a compatibility hack.  Anyone have any  
idea?

In other words, bash does something like this:

	[[ "$STRING" == '\*\ stuff\ with\ spaces\*' ]]

You can see it if you run bash with tracing turned on.
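
The effect is easy to reproduce. The sketch below (sample strings and variable names are my own) compares a fully quoted right-hand side against a string that merely contains the phrase, and then against a string that is character-for-character identical to the quoted text.

```bash
STRING="some stuff with spaces here"

# Quoted right-hand side: every character, including the *, is taken literally.
if [[ "$STRING" == "*stuff with spaces*" ]]; then
    quoted="match"
else
    quoted="no match"
fi

# The same quoted text does match a string that is literally identical to it.
if [[ "*stuff with spaces*" == "*stuff with spaces*" ]]; then
    literal="match"
else
    literal="no match"
fi

echo "substring test: $quoted"
echo "literal test:   $literal"
```

Running the script with bash -x prints the escaped form of the right-hand side on stderr.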

Anyway, here is the solution:

	[[ "$STRING" == *[[:space:]]stuff[[:space:]]with[[:space:]]spaces* ]]

The meta patterns keep you from having to put spaces in there, so the  
RHS expression is interpreted as a single word and doesn't break the  
[[ syntax.
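
Here is that pattern wrapped in a small function so it can be tried against a few inputs. Treat it as a sketch: contains_phrase is a hypothetical helper name and the sample strings are invented.

```bash
# Succeeds when $1 contains "stuff with spaces" preceded by whitespace,
# with any single whitespace character between the words.
contains_phrase() {
    [[ "$1" == *[[:space:]]stuff[[:space:]]with[[:space:]]spaces* ]]
}

for s in "some stuff with spaces here" $'some stuff\twith spaces' "stuffwithspaces"; do
    if contains_phrase "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

Two things to keep in mind: [[:space:]] also matches tabs and newlines, and because the pattern begins with *[[:space:]], a string that starts directly with "stuff" will not match.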

The same fix works for =~ regex matching too.
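
A sketch of the regex version, assuming a bash new enough to have =~ (3.0 or later). The function name matches_phrase is made up for the example.

```bash
# Succeeds when $1 contains the phrase anywhere; the regex is not anchored.
matches_phrase() {
    [[ "$1" =~ stuff[[:space:]]with[[:space:]]spaces ]]
}

for s in "some stuff with spaces here" "stuff with spaces" "stuffwithspaces"; do
    if matches_phrase "$s"; then
        echo "match:    $s"
    else
        echo "no match: $s"
    fi
done
```

Unlike the glob form, this one does not require whitespace in front of "stuff", since nothing in the regex asks for it.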

Ugly, but it works.

It seems like this could have been handled better, because it does  
suck to create strings like that.

Anyone else found another way around this?




-- 
"Where some they sell their dreams for small desires."


