[geeks] bash string matching
Shannon Hendrix
shannon at widomaker.com
Wed Jun 4 20:24:39 CDT 2008
On Jun 4, 2008, at 18:59 , Anthony Ortenzi wrote:
> Shannon Hendrix said:
>> BTW: for those who don't know, the [[ ]] above is just a shortened if
>> statement, the same as:
>>
>> if [[ "$STRING" == *geek* ]]
>> then
>> echo geek
>> fi
>>
>
> Two things...
>
> First, the [[ ]] isn't the "if", the && is.
I don't recall saying the [[ ]] was an if, but rather the whole
expression is a shortened variation on a full if statement.
See below for the problem and the solution. It's a bash/shell bug and
design issue that you just have to work around.
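
A minimal sketch of the two equivalent forms, assuming bash and a made-up value for STRING:

```bash
#!/usr/bin/env bash
# STRING is the variable from the example above; this value is made up.
STRING="a geek walks in"

# Short form: echo runs only when [[ ]] returns status 0.
[[ "$STRING" == *geek* ]] && echo geek

# Long form: the same test spelled out as a full if statement.
if [[ "$STRING" == *geek* ]]
then
    echo geek
fi
```

One difference worth knowing: when the test fails, the && form leaves a non-zero exit status behind, while the if form returns 0.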
> Second, regarding quotes, my man page for bash 2.05 contains mention of
> quote removal.
>
> [[ expression ]]
> Return a status of 0 or 1 depending on the evaluation of the
> conditional expression expression. Expressions are composed of
> the primaries described below under CONDITIONAL EXPRESSIONS.
> Word splitting and pathname expansion are not performed on the
> words between the [[ and ]]; tilde expansion, parameter and
> variable expansion, arithmetic expansion, command substitution,
> process substitution, and quote removal are performed.
Ha! Run a bash script with tracing turned on and you'll see the man
page is lying to you, in more than one area.
You are also misinterpreting what the man page is saying. They are
talking about word splitting and pathname expansion not happening
between [[ and ]], as they do between [ and ].
That's because [ is really just a command (historically /bin/[, an
alias for /bin/test, though bash also ships it as a builtin), while [[
is built into the shell syntax and has a lot of other extensions and
different rules too.
Also, the man page is misleading because bash *DOES* do splitting
inside of [[ and ]], and you can verify that by tracing a shell script.
Maybe that is a bug, but what happens is this:
[[ "$STRING" == *one world order* ]]
...is interpreted as this:
[[ "$STRING" == *one
...which is broken syntax. bash seems to terminate parsing the RHS
expression on the first space, or even some other characters, in spite
of what the man page says.
It looks to me like a bash bug, but then it could be done to make sure
[[ is compatible with the old fake [ syntax.
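
The failure can be reproduced with a short sketch like the one below. It assumes a bash binary on the PATH and uses a made-up sample string. The test runs in a child shell because the unquoted spaces are rejected when the command is parsed, which would otherwise abort the calling script as well.

```bash
#!/usr/bin/env bash
# Run the test in a child shell so its parse error cannot abort this script.
# The sample string passed as $1 is made up for illustration.
if bash -c '[[ "$1" == *one world order* ]]' _ "one world order now" 2>/dev/null
then
    status=0
else
    status=$?    # exit status of the failed child shell
fi
echo "exit status: $status"
```

The status is non-zero even though the sample string contains the phrase, because the expression never gets as far as being evaluated.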
The old Bourne shells were essentially broken because commands like
[ were not really commands at all, they were just designed to fake it,
and make it look like part of the language, except that being external
they were subject to parameter expansion.
That has made it hellish to create new shells that are Bourne
compatible. There is no grammar for Bourne shell either, because its
syntax was fabricated on-the-fly and largely by whim, and externally
extended with ugly hacks like /bin/[.
Anyway, the way you got around this bug in the old shells was this:
[ "string" = "quoted stuff with spaces" ]
Unfortunately, quoting breaks the pattern engine in bash.
If you do this:
[[ "$STRING" == "*stuff with spaces*" ]]
...then bash auto-escapes a lot of it, which breaks the matching engine.
I don't know if that's a bug or a compatibility hack. Anyone have any
idea?
In other words, bash does something like this:
[[ "$STRING" == \*stuff\ with\ spaces\* ]]
You can see it if you run bash with tracing turned on.
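
A small sketch of the effect, assuming bash and a made-up sample string: with the whole right-hand side quoted, the asterisks are compared as literal characters, so the test becomes a plain string comparison.

```bash
#!/usr/bin/env bash
STRING="some stuff with spaces here"   # made-up sample value

# Quoted RHS: the asterisks are literal, so this does not match.
if [[ "$STRING" == "*stuff with spaces*" ]]
then
    quoted="match"
else
    quoted="no match"
fi
echo "quoted pattern: $quoted"

# The quoted form only matches a string that is the pattern text itself.
[[ "*stuff with spaces*" == "*stuff with spaces*" ]] && echo "literal asterisks match"
```

Running the same script with bash -x should show the escaped form of the pattern in the trace.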
Anyway, here is the solution:
[[ "$STRING" == *[[:space:]]stuff[[:space:]]with[[:space:]]spaces* ]]
The meta patterns keep you from having to put spaces in there, so the
RHS expression is interpreted as a single word and doesn't break the
[[ syntax.
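
As a quick check of the pattern above, here is a sketch assuming bash and two made-up sample strings:

```bash
#!/usr/bin/env bash
STRING="some stuff with spaces here"   # made-up sample value

# Character classes stand in for the spaces, so the pattern stays one word.
if [[ "$STRING" == *[[:space:]]stuff[[:space:]]with[[:space:]]spaces* ]]
then
    result="match"
else
    result="no match"
fi
echo "with leading text: $result"

# The leading [[:space:]] requires whitespace before "stuff".
START="stuff with spaces here"         # made-up sample value
if [[ "$START" == *[[:space:]]stuff[[:space:]]with[[:space:]]spaces* ]]
then
    start_result="match"
else
    start_result="no match"
fi
echo "at start of string: $start_result"
```

Note that [[:space:]] also matches tabs and newlines, so the pattern is a little looser than a literal space, and the leading class means a string that begins with "stuff" will not match.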
The same fix works for =~ regex matching too.
Ugly, but it works.
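
For the =~ case, a minimal sketch assuming bash 3.0 or later (bash 2.05 has no =~ operator) and a made-up sample string:

```bash
#!/usr/bin/env bash
STRING="some stuff with spaces here"   # made-up sample value

# The regex is unanchored, so no leading or trailing wildcard is needed.
if [[ "$STRING" =~ stuff[[:space:]]with[[:space:]]spaces ]]
then
    regex_result="match"
else
    regex_result="no match"
fi
echo "regex: $regex_result"
```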
It seems like this could have been handled better, because it does
suck to create strings like that.
Anyone else found another way around this?
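
For comparison, a sketch of three other forms that appear to behave the same way in bash. The sample string and the PATTERN variable name are made up for illustration.

```bash
#!/usr/bin/env bash
STRING="some stuff with spaces here"   # made-up sample value

# Quote only the literal text and leave the asterisks outside the quotes.
[[ "$STRING" == *"stuff with spaces"* ]] && echo "partial quoting: match"

# Backslash-escape each space so the pattern remains a single word.
[[ "$STRING" == *stuff\ with\ spaces* ]] && echo "escaped spaces: match"

# Keep the pattern in a variable (PATTERN is a hypothetical name) and
# expand it unquoted; no word splitting happens inside [[ ]].
PATTERN='*stuff with spaces*'
[[ "$STRING" == $PATTERN ]] && echo "pattern variable: match"
```

Unlike the character-class version, these match a literal space only, not tabs or newlines.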
--
"Where some they sell their dreams for small desires."