> Hi,
> there's a DocBook XML file which I want to modify. The file contains
[quoted text clipped - 42 lines]
>
> Tom
Have you tried a pattern of "(<mediaobject)(.*)(</mediaobject>)"? You
can then use a replacement along the lines of "<!-- PathToImage
-->$1$2$3". I'd also use Pattern.MULTILINE | Pattern.DOTALL when
building the pattern.
Hope that helps.
Pan
======================================================================
TechBookReport Java http://www.techbookreport.com/JavaIndex.html
bauer@b3s.de - 24 May 2005 17:22 GMT
> Have you tried a pattern of "(<mediaobject)(.*)(</mediaobject>)". You
> can then use a replacement along the lines of "<!-- PathToImage
> -->$1$2$3". I'd also use Pattern.MULTILINE | Pattern.DOTALL when
> building the pattern.
>
> Hope that helps.
Not really ... this results in the same problem I already described.
Instead of \1\2\3 being replaced by the matching groups, I only get this
special character (it looks like a square and cannot be displayed here).
By the way, I noticed that you used $1$2$3. That is Perl, right? In Java
it would be \1\2\3, or am I wrong?
You can try it yourself. Save the following content to a file:
<chapter>
<title>Chapter 1</title>
<sect1>
<title>Section 1</title>
<para>
Test Test Test Test Test Test Test Test Test
</para>
<mediaobject>
<imageobject>
<imagedata fileref="image.svg" format="SVG"/>
</imageobject>
</mediaobject>
<para>
Test Test Test Test Test Test Test Test Test
</para>
</sect1>
</chapter>
Read this file with
public String readPlain( File file ) throws Exception
{
    String content = new String();
    String line = new String();
    BufferedReader brd = new BufferedReader( new FileReader( file ) );
    while ( ( line = brd.readLine() ) != null )
        content += line + "\r\n";
    brd.close();
    return content;
}
and then apply a
content = Pattern.compile( "(<mediaobject)(.*)(</mediaobject>)",
Pattern.MULTILINE|Pattern.DOTALL).matcher(
content).replaceAll("<!-- Test -->\1\2\3");
Tom
bauer@b3s.de - 24 May 2005 17:28 GMT
Damn Java regex!!! It is $1$2$3. That was the point: I used the wrong
syntax for backrefs. But the Java API 1.4.2 documentation for
java.util.regex.Pattern says:
Back references
\n Whatever the nth capturing group matched
So what ... ?!?
TechBookReport - 24 May 2005 17:43 GMT
> Damn Java regex !!! It is $1$2$3. That was the point. I used the wrong
> syntax for backrefs. But in Java API 1.4.2 under
[quoted text clipped - 4 lines]
>
> So what ... ?!?
Did you escape the backslashes? Also, the funny square character is
probably the \r\n you are using. Try
System.getProperty("line.separator") instead.
Pan
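For reference, the line.separator suggestion could be sketched roughly as follows. This is only a sketch, not the original poster's code: the class name ReadPlainDemo is made up, the method takes a Reader instead of a File so it can be tried without a file on disk, and StringBuilder requires Java 5 or later (StringBuffer would be the 1.4 equivalent).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper class, named only for this illustration.
class ReadPlainDemo {

    // Reads all lines and re-joins them with the platform line separator.
    // readLine() strips the original line terminators, so they are re-added here.
    static String readPlain(Reader source) throws IOException {
        String eol = System.getProperty("line.separator");
        StringBuilder content = new StringBuilder();
        BufferedReader brd = new BufferedReader(source);
        try {
            String line;
            while ((line = brd.readLine()) != null) {
                content.append(line).append(eol);
            }
        } finally {
            brd.close(); // close even if reading fails
        }
        return content.toString();
    }

    public static void main(String[] args) throws IOException {
        String text = readPlain(new StringReader("<chapter>\n</chapter>"));
        System.out.print(text);
    }
}
```

To read from a file as in the original method, pass new FileReader(file) as the source.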
bauer@b3s.de - 24 May 2005 18:08 GMT
TechBookReport wrote:
> > Damn Java regex !!! It is $1$2$3. That was the point. I used the wrong
> > syntax for backrefs. But in Java API 1.4.2 under
[quoted text clipped - 8 lines]
> probably the \r\n you are using. Try
> System.getProperty("line.separator") instead.
No, the funny square char is not the \r\n, because if it were, it would
appear on every line, independent of the regex code. I'm on Windows and
the app runs only on this system, but you are right, it is better to use
getProperty("line.separator").
I guess the funny square is some Unicode character (\1=0x01?) that I get
when I use \1 without escaping the backslash.
But that doesn't matter anymore, my problem is solved. Thanks for your
help.
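The guess about \1 can be checked with a small sketch like the one below (the class and method names are made up for illustration). In a Java string literal, "\1" is an octal escape for the single character U+0001, so the regex engine never sees a backslash. Escaping the backslash does not help in a replacement string either: there, a backslash only makes the following character literal, so "\\1" yields a plain "1". Only $1 refers to the group.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the original posts.
class BackrefEscapeDemo {
    static final String INPUT = "<mediaobject>\n<imageobject/>\n</mediaobject>";
    static final Pattern P =
        Pattern.compile("(<mediaobject)(.*)(</mediaobject>)", Pattern.DOTALL);

    // "\1\2\3" is compiled by javac into the three control characters U+0001..U+0003.
    static String withOctalEscapes() {
        return P.matcher(INPUT).replaceAll("<!-- Test -->\1\2\3");
    }

    // "\\1" reaches the matcher as backslash + '1', which it treats as a literal '1'.
    static String withEscapedBackslashes() {
        return P.matcher(INPUT).replaceAll("<!-- Test -->\\1\\2\\3");
    }

    // "$1$2$3" is the group-reference syntax for replacement strings.
    static String withGroupReferences() {
        return P.matcher(INPUT).replaceAll("<!-- Test -->$1$2$3");
    }

    public static void main(String[] args) {
        System.out.println((int) "\1".charAt(0));      // prints 1
        System.out.println(withEscapedBackslashes());  // prints <!-- Test -->123
        System.out.println(withGroupReferences());     // comment followed by the original element
    }
}
```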
Alan Moore - 24 May 2005 22:54 GMT
>Have you tried a pattern of "(<mediaobject)(.*)(</mediaobject>)". You
>can then use a replacement along the lines of "<!-- PathToImage
>-->$1$2$3". I'd also use Pattern.MULTILINE | Pattern.DOTALL when
>building the pattern.
If there can be more than one mediaobject element in a document, you
need to use a reluctant dot-star:
"<mediaobject.*?</mediaobject>"
Otherwise, it will match everything from the first opening tag to the
last closing tag. Even if there's only one such element, it will
probably be more efficient this way.
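A quick way to see the difference is to count matches in a document with two elements. The sketch below uses a made-up class name and sample string, and assumes Pattern.DOTALL so that the dot can cross line breaks:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing greedy and reluctant quantifiers.
class ReluctantDemo {
    static final String DOC =
        "<mediaobject>\nA\n</mediaobject>\n"
        + "<para>text</para>\n"
        + "<mediaobject>\nB\n</mediaobject>\n";

    // Counts how many times the regex matches, with DOTALL enabled.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex, Pattern.DOTALL).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // Greedy: one match running from the first opening tag to the last closing tag.
        System.out.println(countMatches("<mediaobject.*</mediaobject>", DOC));   // prints 1
        // Reluctant: one match per element.
        System.out.println(countMatches("<mediaobject.*?</mediaobject>", DOC));  // prints 2
    }
}
```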
You don't really need to use capturing parentheses, since you're
re-inserting the whole match; just use $0:
str = str.replaceAll("(?s)<mediaobject.*?</mediaobject>",
"<!-- PathToImage -->$0");
The JDK regex package uses the same syntax as Perl WRT
backreferences--"\n" within the regex and "$n" in the replacement
string--except that it uses $0 instead of $& for the whole match, and
doesn't emulate the other dollar-plus-punctuation variables: $`, $',
and $+.
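Both forms can be tried side by side in a short sketch (class and method names are invented for this example). One assumption to flag: String.replaceAll compiles the pattern without any flags, so the embedded (?s) flag is used here to let the dot match line breaks inside a multi-line element.

```java
// Hypothetical demo class showing "$0" in a replacement and "\1" inside a regex.
class DollarZeroDemo {

    // "$0" re-inserts the whole match, so no capturing groups are needed.
    static String tagMediaObjects(String doc) {
        return doc.replaceAll("(?s)<mediaobject.*?</mediaobject>",
                              "<!-- PathToImage -->$0");
    }

    // Inside the regex, "\\1" (backslash-one after Java string escaping) is a
    // backreference to group 1; in the replacement, the same group is "$1".
    static String collapseRepeatedWord(String s) {
        return s.replaceAll("\\b(\\w+) \\1\\b", "$1");
    }

    public static void main(String[] args) {
        System.out.println(tagMediaObjects("<mediaobject>\nx\n</mediaobject>"));
        System.out.println(collapseRepeatedWord("Test Test again"));  // prints Test again
    }
}
```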