regex question

Joseph Rosevear

2024-09-28 05:28:38 UTC

Hello,

This may be off topic, but let me try anyway.

I've been editing HTML. I used to write my comments (incorrectly) like
this:

<! comment.>

I asked my AI friend, Pi, for help and he said do this in vim:

:%s/<!\s*(.*)>/<!-- \1-->/g

I've used vi, so I tried that and vim. I don't know if it matters.
Anyway, I got "No match found."

I could do other sorts of search and replaces, but not using the (.*) and
\1 combination.

This was frustrating, because Pi kept telling me it should work. But it
didn't. I'm wondering if I'm missing something?

Of course my goal is to simplify the editing of the above bad comment,
changing it to:

<!-- comment.-->

Help me if you can?

-Joe

Petri Kaukasoina

2024-09-28 08:55:26 UTC

Post by Joseph Rosevear
:%s/<!\s*(.*)>/<!-- \1-->/g

Try sed:
sed -i.bak 's/<!\( .*\)>/<!--\1-->/g' *.html

Or, if you prefer vi:
:%s/<!\( .*\)>/<!--\1-->/g

They won't work if the comment is more than one line long.
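
A minimal sketch for trying the sed approach on a sample line before touching real files. The function name fix_comment is hypothetical. The pattern is an assumption that differs from the command above in two ways: the space sits outside the group, and [^>]* replaces .* so that two comments on one line are rewritten separately.

```shell
# Hypothetical helper: rewrite "<! text>" as "<!-- text-->" on each line of stdin.
fix_comment() {
    # \( \) is a capture group in sed's basic regex syntax; \1 is what it captured.
    # [^>]* stops at the first '>', so each comment on a line is matched on its own.
    sed 's/<! \([^>]*\)>/<!-- \1-->/g'
}

printf '%s\n' 'KBUY <! comment.> DNEN' | fix_comment
```

Because the pattern requires a space after "<!", lines such as "<!DOCTYPE html>" are not touched.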

Joseph Rosevear

2024-10-11 06:49:25 UTC

Post by Petri Kaukasoina

Post by Joseph Rosevear
:%s/<!\s*(.*)>/<!-- \1-->/g

sed -i.bak 's/<!\( .*\)>/<!--\1-->/g' *.html
:%s/<!\( .*\)>/<!--\1-->/g
They won't work if the comment is more than one line long.

Thanks. I got it working in JOE (Joe's Own Editor), which is my
preferred editor. That helped, using a familiar tool. I didn't know it
could do regex. Here's what I used:

^ku         go to the top of the file
^kf         find
<!\y>       the thing to find
r           replace
<!--\1-->   the replacement thing
y           "yes" do the replace, and find the next match

I decided to remove the leading space from the search. I noticed that
you put it in the parentheses like this:

\( .*\)

Perhaps I'll need that some day.

Note that in JOE "\y" is shorthand for "\(.*\)".

Thanks.

Eric Pozharski

2024-09-29 14:03:21 UTC

Post by Joseph Rosevear
This may be off topic, but let me try anyway.

Yes, yes it is. comp.editors would be glad to feature yet another
vi v. vim war (it's a thing).

Post by Joseph Rosevear
I've been editing HTML. I used to write my comments (incorrectly)
<! comment.>
:%s/<!\s*(.*)>/<!-- \1-->/g
I've used vi, so I tried that and vim. I don't know if it matters.
Anyway, I got "No match found."
I could do other sorts of search and replaces, but not using the (.*)
and \1 combination.
This was frustrating, because Pi kept telling me it should work. But
it didn't. I'm wondering if I'm missing something?

Yes, ':h /magic' (we are still in vim-context) would clean that up. See
for yourself:

source:

LFOT PLFA FYRK VCCB
KBUY <!KCVR UKBB> DNEN
<!ETHB KJCX> KSNA LHBM
LFSL DFOD <!DNMK KWLR>
<!CWOC OPBN DIOD KOLE>
PAKN DIAE <!> KATW DGLN
KNBQ KGGE RPUB FMNH

Either

:%s/<!\s*\(.*\)>/<!-- \1-->/g

or

:%s/\v\<!\s*(.*)\>/<!-- \1-->/g

would do what you want. Watch for backslashes.

*SKIP* [ 4 lines 1 level deep]

p.s. Acquiring vi-zoo is deep down on my TODO.lst . Thus I can't say
how to fix hallucinations.

p.s.s. Yes, People vs. /magic war should be featured on comp.editors .

--
Torvalds' goal for Linux is very simple: World Domination
Stallman's goal for GNU is even simpler: Freedom

Mike Spencer

2024-09-30 07:40:35 UTC

Post by Joseph Rosevear
Hello,
This may be off topic, but let me try anyway.
I've been editing HTML. I used to write my comments (incorrectly) like
<! comment.>
:%s/<!\s*(.*)>/<!-- \1-->/g
I've used vi, so I tried that and vim. I don't know if it matters.
Anyway, I got "No match found."

I've never used regex's in vi/vim so beware.

In perl, ".*" is greedy: in the above locution it will run on to the
last '>' on the line, not stop at the first closing '>', so two
comments on one line get merged into one.

I think Perl would say,

s/<!\s*([^>]+)>/<!-- $1-->/;

where "[^>]" is "any char not '>'".

I think this might fail on multi-line comments if you tell perl to read
the input file line by line (perl -n or while(<>) ).

A workaround for that is to have a perl script read a whole file into
a single variable $foo, then do the substitution on that var.

$foo =~ s/<!\s*([^>]+)>/<!-- $1-->/sg;

where modifier 's' is "Treat the string as single line." and 'g' is
"globally match the pattern repeatedly in the string".

I guess that would fail if you have any '>' chars within your
comments. Hopefully not.

Post by Joseph Rosevear
I could do other sorts of search and replaces, but not using the (.*) and
\1 combination.
This was frustrating, because Pi kept telling me it should work. But it
didn't. I'm wondering if I'm missing something?
Of course my goal is to simplify the editing of the above bad comment,

--
Mike Spencer Nova Scotia, Canada

Mike Spencer

2024-10-01 20:13:18 UTC

Post by Mike Spencer

Apparently wrong. :-\

From the command line:

perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'

This fails. "<! " is converted to "<!- ", not "<!-- ".

???

Post by Mike Spencer
I think this might fail on multi-line comments if you tell perl to read
the input file line by line (perl -n or while(<>) ).
A workaround for that is to have a perl script read a whole file into
a single variable $foo, then do the substitution on that var.
$foo =~ s/<!\s*([^>]+)>/<!-- $1-->/sg;
where modifier 's' is "Treat the string as single line." and 'g' is
"globally match the pattern repeatedly in the string".

In a script:

#!/usr/bin/perl

$x = '';

while(<>)
{ $x = $x . $_;
}

$x =~ s/<!\s*([^>]+)>/<!-- $1-->/sg;

print $x;

the regex works as I wrote that it should.

Sorry. It was late and I didn't test my own advice. Now trying to
figure out why the first one fails.

CAVEAT: You probably don't want to change the "<!DOCTYPE..." line if
you have one. "<!D" matches "/<!\s*" so depending on the text
within your existing misconfigured comments, you might use "/<!\s+".

--
Mike Spencer Nova Scotia, Canada

How to avoid recursive wrongness in Usenet posts: Don't post. :-)

Eli the Bearded

2024-10-01 22:09:19 UTC

Post by Mike Spencer

Post by Mike Spencer
I've never used regex's in vi/vim so beware.

They do differ from Perl. How different depends on version and config.
Elvis will be different from Vim. Vim has options to change this, etc.
Regexps are very sensitive to regexp engine in use.

Post by Mike Spencer

Post by Mike Spencer
I think Perl would say,
s/<!\s*([^>]+)>/<!-- $1-->/;
where "[^>]" is "any char not '>'".

Apparently wrong. :-\
perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'

That command line is working perfectly for me. I would do it
differently, but that works.

$ echo '<! huh? >' | perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'
<!-- huh? -->
$ echo '' | perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'

$

You can shorten the one liner by using '-p' instead of '-n':

perl -p -e 's/<!\s*([^>]+)>/<!-- $1-->/;'

The '-p' is 'loop with printing' and '-n' is 'loop not printing'.

As for the regex, as my second example hints at, I'd change it to
require the leading whitespace, or at least not to alter existing
comments.

# require at least one space version
perl -pe 's/<!\s([^>]+)>/<!-- $1-->/'

# don't require a space, but don't alter previous output
perl -pe 's/<!\s*([^->][^>]*)>/<!-- $1-->/'

Of course, the full HTML comment rules are notoriously a mess.

https://www.rfc-editor.org/rfc/rfc1866#section-3.2.5

Mostly, so long as $1 does not contain a double hyphen, the "don't
require a space, but don't alter previous output" will generate correct
output.

Elijah
------
cannot guess how Mike Spencer's command was outputting single hyphens

Sylvain Robitaille

2024-10-03 13:48:07 UTC

Post by Mike Spencer

Post by Mike Spencer
I think Perl would say,
s/<!\s*([^>]+)>/<!-- $1-->/;
where "[^>]" is "any char not '>'".

Apparently wrong. :-\
perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'
This fails. "<! " is converted to "<!- ", not "<!-- ".
...
Sorry. It was late and I didn't test my own advice. Now trying to
figure out why the first one fails.

As best I can tell, the shell is grabbing the "!-" sequence as a
history reference, and apparently leaving behind only the "!". Try your
first approach, and let it fail, then using command history, review the
command. Notice that it now says "... <!- ...". That's at least what
I'm seeing with tcsh.

: elvira[syl] ~; echo '<! comment.>' |perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'
<!- comment.-->

(press the "up" cursor key ...)

: elvira[syl] ~; echo '<! comment.>' |perl -n -e 's/<!\s*([^>]+)>/<!- $1-->/;print;'

If instead I escape the "!", it's handled as you intended:

: elvira[syl] ~; echo '<! comment.>' | perl -n -e 's/<!\s*([^>]+)>/<\!-- $1-->/;print;'
<!-- comment.-->

I hope that this helps.

--
----------------------------------------------------------------------
Sylvain Robitaille ***@therockgarden.ca
----------------------------------------------------------------------

Mike Spencer

2024-10-07 05:58:55 UTC

Post by Sylvain Robitaille

Post by Mike Spencer
perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'
This fails. "<! " is converted to "<!- ", not "<!-- ".
...
Sorry. It was late and I didn't test my own advice. Now trying to
figure out why the first one fails.

Yes. Good catch. I do most command line stuff in emacs shell where
command history comes from emacs, not the shell. If I do as you
suggest in an xterm, just what you say happens!

Post by Sylvain Robitaille
: elvira[syl] ~; echo '<! comment.>' |perl -n -e 's/<!\s*([^>]+)>/<!-- $1-->/;print;'
<!- comment.-->
(press the "up" cursor key ...)
: elvira[syl] ~; echo '<! comment.>' |perl -n -e 's/<!\s*([^>]+)>/<!- $1-->/;print;'
: elvira[syl] ~; echo '<! comment.>' | perl -n -e 's/<!\s*([^>]+)>/<\!-- $1-->/;print;'

I hope that this helps.

TYVM.

--
Mike Spencer Nova Scotia, Canada