regex for empty string after a space?

Posted: Thu Feb 02, 2012 3:24 am
by Matafou
Hi, a little question for Emacs Lisp regex experts. I use the special constructs \b \B \< \>, which match the empty string at particular positions. But I am stuck on the following:

Is it possible to define a regex that matches an empty string after a space?

Or, as a workaround in my case, is it possible to define a regex that matches the empty string at the beginning of anything but a space (word, punctuation, etc.)?

Best regards,

P.

Re: regex for empty string after a space?

Posted: Fri Feb 03, 2012 8:26 pm
by nuntius
I don't understand the question.
Could you post some examples of expected matches?

Re: regex for empty string after a space?

Posted: Sat Feb 04, 2012 3:18 am
by ramarren
As far as I can tell the engine Emacs uses for regular expressions doesn't include Perl-style look-around assertions which are necessary for this sort of thing.

Depending on exactly what you are doing it might be possible to implement equivalent functionality, or even use shell-command-on-region to call Perl.

Re: regex for empty string after a space?

Posted: Sun Feb 05, 2012 6:20 am
by edgar-rft
Every regular expression matching a space also matches the empty string after the space, but there is no way to find out where the empty string comes from. From the viewpoint of a regexp engine (no matter what programming language) there are empty strings before and after every character in the string, so for the string "ABC" the regexp engine sees:

<start-of-string><empty-string>A<empty-string>B<empty-string>C<empty-string><end-of-string>
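
This picture can be checked directly: the empty regexp matches at every index of "ABC", including the one just past the last character. A minimal sketch; `my-empty-match-positions` is a hypothetical helper name used only for illustration, not part of Emacs.

```elisp
;; Hypothetical helper: collect every index at which the empty
;; regexp "" matches in STRING.
(defun my-empty-match-positions (string)
  "Return the list of indices where the empty regexp matches in STRING."
  (let ((positions '()))
    ;; Try each start index from 0 up to and including the length.
    (dotimes (i (1+ (length string)))
      (when (eq (string-match "" string i) i)
        (push i positions)))
    (nreverse positions)))

(my-empty-match-positions "ABC")   ; => (0 1 2 3)
```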

That's the reason why the ELisp manual writes:

\b - matches the empty string, but only at the beginning or end of a word.

\B - matches the empty string, but not at the beginning or end of a word.

The reason for this somewhat strange-sounding definition is that, from the viewpoint of the regexp engine, there are empty strings before and after every character in the string, and the only way to find out whether an empty string occurs at the beginning or end of a word is to look at the characters on either side of it.
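
As a small illustration of the two definitions, the sketch below asks where \b and \B first match in "foo bar". It assumes the standard syntax table, in which letters are word constituents and the space is not; the sample string is arbitrary.

```elisp
;; Use the standard syntax table so the result does not depend on
;; the major mode of whatever buffer happens to be current.
(with-syntax-table (standard-syntax-table)
  (list (string-match "\\b" "foo bar")      ; before "f": start of a word
        (string-match "\\b" "foo bar" 1)    ; next boundary: end of "foo"
        (string-match "\\B" "foo bar")      ; between "f" and "o"
        (string-match "\\B" "foo bar" 3)))  ; between "b" and "a"
;; => (0 3 1 5)
```

In each case the match itself is empty; only the characters next to the position decide whether \b or \B succeeds there.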

The question is: which empty string do you want to match?

The particular problem in practice is that after concatenating two strings, the regexp engine has no chance to find out where the concatenation happened and if an empty string was concatenated or not. In such situations the only way is to use lists or vectors of strings instead of concatenation.

To match the empty string between a space and a non-space character or the empty string after a last space character in a string you could use:

(string-match " \\([^ ]\\|\\'\\)" "ABC")   ; => nil (no space character)
(string-match " \\([^ ]\\|\\'\\)" " ABC")  ; => 0
(string-match " \\([^ ]\\|\\'\\)" "A BC")  ; => 1
(string-match " \\([^ ]\\|\\'\\)" "AB C")  ; => 2
(string-match " \\([^ ]\\|\\'\\)" "ABC ")  ; => 3
(string-match " \\([^ ]\\|\\'\\)" "A B C") ; => 1
(string-match " \\([^ ]\\|\\'\\)" "A  B ") ; => 2
(string-match " \\([^ ]\\|\\'\\)" "A   B") ; => 3

Each call returns the index of the space that comes immediately before the empty string in question, that is, the last space of a run of spaces.
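
If the position of the empty string itself is wanted rather than the position of the space, one option is to read the start of the parenthesized group from the match data. A sketch under that assumption; `my-pos-after-space` is a hypothetical name, not an existing function.

```elisp
;; Hypothetical helper built on the same regexp.
(defun my-pos-after-space (string)
  "Return the index just after a space that precedes a non-space or the end.
Return nil if STRING contains no such space."
  (when (string-match " \\([^ ]\\|\\'\\)" string)
    ;; Group 1 starts right after the matched space.
    (match-beginning 1)))

(my-pos-after-space "ABC")    ; => nil
(my-pos-after-space " ABC")   ; => 1
(my-pos-after-space "A  B ")  ; => 3
(my-pos-after-space "ABC ")   ; => 4
```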

Is that what you wanted?

- edgar

Re: regex for empty string after a space?

Posted: Thu May 17, 2012 7:08 am
by Matafou
Thanks all for your answers. I know that what I ask is not possible with standard regular expressions. But \b and the others are meant to allow more control over what is *matched* and go a bit beyond the power of plain regexes.

When writing Emacs modes it is particularly important to get exactly the right matched string. The point of these empty-string constructs in Emacs regexps is to be able to accept a string depending on what is around it *without including what's around it in the matched text*. In particular this gives good behavior with functions like re-search-forward, which then stops at the right place.

Shy groups (written \(?:...\)) are sometimes OK, but not always.

For example, in the following code kindly proposed by edgar-rft:

> (string-match " \\([^ ]\\|\\'\\)" " ABC") => 0

the problem is that this does not match the empty string: it also matches the character following the space (i.e. "A"). Therefore, if I do (re-search-forward thisregex), I end up with point *after* the A, and not before it.
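
To make the behavior concrete, here is a sketch that runs the search in a temporary buffer and records where point ends up. It also shows one possible workaround, assuming it is acceptable to move point explicitly after the search: jump back to the start of group 1. The function name `my-search-after-space-demo` is hypothetical, and this does not make the regexp itself match an empty string.

```elisp
;; Hypothetical demo function; returns (POINT-AFTER-SEARCH POINT-AFTER-FIXUP).
(defun my-search-after-space-demo ()
  "Search \" ABC\" for a space followed by a non-space or the end."
  (with-temp-buffer
    (insert " ABC")
    (goto-char (point-min))
    (let ((regexp " \\([^ ]\\|\\'\\)")
          after-search after-fixup)
      (when (re-search-forward regexp nil t)
        ;; Point is now after the "A", because "A" is part of the match.
        (setq after-search (point))
        ;; Move back to the start of group 1, i.e. just after the space.
        (goto-char (match-beginning 1))
        (setq after-fixup (point)))
      (list after-search after-fixup))))

(my-search-after-space-demo)   ; => (3 2), buffer positions start at 1
```

The match data still covers the space and the "A", so code that relies on (match-end 0) would need the same adjustment.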

My explanation may be unclear, sorry. If you have ideas, I am interested.

Thanks again for your time anyway.