> I'm trying to work with XML files for storing my data, which is for an
> MP3 player program. The structure of the file is relatively simple:
[quoted text clipped - 19 lines]
> go about removing this excess junk from the file, as it disrupts the
> rest of the program?
I'm assuming you want to remove the <song> node (you don't explicitly
say this, and it's not clear what "this excess junk" refers to).
Instead of removing oldNode, why don't you remove the song node
itself? Going through your code, line by line:
> NodeList theNodes = doc.getElementsByTagName("song");
theNodes is now a list of all the song nodes in the file.
> for (int i=0; i<theNodes.getLength(); i++) {
We're going to do something with each song node in the file.
> Node oldNode = theNodes.item(i).getFirstChild();
We're going to get the first child of each song node in the file (i.e.
the text node, "ACDC - Hell's Bells.mp3").
> Node oldParent = oldNode.getParentNode();
We're going to get the parent of the text node, i.e. the song node
again (but why do this when we already had access to the song node from
the for loop?)
> oldParent.removeChild(oldNode);
We're removing the text node from the song node. So <song>ACDC -
Hell's Bells.mp3</song> becomes <song></song>, or equivalently, <song/>.
> }
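To make the walkthrough above concrete, here is a minimal sketch that runs the same loop against a small in-memory document. The class name EmptySongDemo, the parse() helper and the two-song sample string are made up for illustration; only the loop body comes from the original post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class EmptySongDemo {
    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // The loop from the original post: it removes each song's first child
    // (its text node), not the song element itself.
    static void removeFirstChildren(Document doc) {
        NodeList theNodes = doc.getElementsByTagName("song");
        for (int i = 0; i < theNodes.getLength(); i++) {
            Node oldNode = theNodes.item(i).getFirstChild();
            Node oldParent = oldNode.getParentNode();
            oldParent.removeChild(oldNode);
        }
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<subdir><song>a.mp3</song><song>b.mp3</song></subdir>");
        removeFirstChildren(doc);
        NodeList songs = doc.getElementsByTagName("song");
        // Both song elements are still in the tree ...
        System.out.println("song elements left: " + songs.getLength());
        // ... but they have lost their text, which serializes as <song/>.
        System.out.println("first song has children: " + songs.item(0).hasChildNodes());
    }
}
```

The song elements survive because removeChild() is called on the song node with the text node as its argument, so only the text disappears.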
- Oliver
jackroofman@gmail.com - 10 Apr 2007 19:35 GMT
> <jackroof...@gmail.com> wrote in message
>
[quoted text clipped - 57 lines]
>
> - Oliver
Sorry I wasn't entirely clear about what I wanted to do. I want to
remove those "<song/>" nodes that get left over, or find a way to
prevent them from being left in the first place. After looking over
your suggestion, I tried removing the redundancies in my code, simply
going with:
for (int i=0; i<theNodes.getLength(); i++) {
    Node oldNode = theNodes.item(i);
    Node oldParent = oldNode.getParentNode();
    oldParent.removeChild(oldNode);
}
The problem is that this only removes every other <song> element. For
example, if I had
<mp3player>
  <subdir location="\Music\Rock">
    <song>ACDC - Hell's Bells.mp3</song>
    <song>ACDC - Hell's Bells (live).mp3</song>
    <song>ACDC - Thunderstruck.mp3</song>
    <song>ACDC - Who Made Who.mp3</song>
  </subdir>
</mp3player>
and then ran that code, the result of the file is still not what I'm
going for. The original code would result in:
<mp3player>
  <subdir location="\Music\Rock">
    <song/>
    <song/>
    <song/>
    <song/>
  </subdir>
</mp3player>
and the modified code posted in this reply results in:
<mp3player>
  <subdir location="\Music\Rock">
    <song>ACDC - Hell's Bells (live).mp3</song>
    <song>ACDC - Who Made Who.mp3</song>
  </subdir>
</mp3player>
This leads me to the conclusion that there's a fundamental flaw in my
logic and/or my understanding of the inner workings of XML and Nodes.
What's really interesting is that, if I run the code again, the
resulting file becomes:
<mp3player>
  <subdir location="\Music\Rock">
    <song>ACDC - Who Made Who.mp3</song>
  </subdir>
</mp3player>
And another run yields:
<mp3player>
<subdir location="\Music\Rock">
</subdir>
</mp3player>
But no matter what, those blank lines won't go away. What is it about
the behavior of XML files that I'm missing?
jackroofman@gmail.com - 10 Apr 2007 19:52 GMT
It seems I've been defeated by the automatic formatting of the posts.
With each removal of the <song> nodes, a blank line is left. That is,
the last example above has four blank lines, not just one. I have
managed to avoid the whole issue of removing every other line; it
readjusts the count with each Node removed, so once Node 1 is removed
and the i variable goes on to 2, it's actually acting on what WAS the
third Node. A simple i--; fixed that, but the excessive blank lines
still remain. What's the best way to go about removing those?
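The every-other-node effect described above comes from the NodeList being live: it shrinks as nodes are removed. As a minimal sketch (the class name LiveNodeListDemo, the parse() helper and the SAMPLE string are assumptions made up for this illustration), the following compares the forward loop with a backward loop, which is one way to avoid adjusting i by hand:

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class LiveNodeListDemo {
    // Made-up sample with four songs, mirroring the example in the post.
    static final String SAMPLE =
            "<subdir><song>1</song><song>2</song><song>3</song><song>4</song></subdir>";

    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Forward loop: after each removal the list shifts left, so the node
    // that slides into index i is skipped when i is incremented.
    static int removeForward(Document doc) {
        NodeList songs = doc.getElementsByTagName("song");
        for (int i = 0; i < songs.getLength(); i++) {
            Node song = songs.item(i);
            song.getParentNode().removeChild(song);
        }
        return doc.getElementsByTagName("song").getLength();
    }

    // Backward loop: removing the last item never changes the index of
    // the items that are still to be visited.
    static int removeBackward(Document doc) {
        NodeList songs = doc.getElementsByTagName("song");
        for (int i = songs.getLength() - 1; i >= 0; i--) {
            Node song = songs.item(i);
            song.getParentNode().removeChild(song);
        }
        return doc.getElementsByTagName("song").getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("left after forward loop:  " + removeForward(parse(SAMPLE)));
        System.out.println("left after backward loop: " + removeBackward(parse(SAMPLE)));
    }
}
```

With four songs, the forward loop leaves the second and fourth behind, which matches the output shown in the earlier post, while the backward loop removes all of them.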
Oliver Wong - 10 Apr 2007 21:56 GMT
> It seems I've been defeated by the automatic formatting of the posts.
Shows up fine here. If you're using Google Groups, try using the "raw"
view to see what other Usenet users actually see.
> With each removal of the <song> nodes, a blank line is left. That is,
> the last example above has four blank lines, not just one. I have
[quoted text clipped - 3 lines]
> third Node. A simple i--; fixed that, but the excessive blank lines
> still remain. What's the best way to go about removing those?
"Best" way is probably to define a DTD or Schema that explicitly states
the structure of your XML format, including the places where whitespace
is NOT significant and the places where it IS significant. For example,
in your <song> tag, whitespace IS significant. The elements <song>some file
name.mp3</song> and <song>some  file name.mp3</song> (two spaces after
"some") point to two different files on the file system.
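As a rough sketch of the DTD route: if the DTD says an element may only contain other elements, a validating parser can be asked to drop the whitespace between them via DocumentBuilderFactory.setIgnoringElementContentWhitespace(true). The DTD below is an assumption about what the file format might look like, embedded as an internal subset so the example is self-contained; the class name IgnorableWhitespaceDemo and its helpers are likewise made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class IgnorableWhitespaceDemo {
    // Assumed DTD: mp3player and subdir have element-only content, so the
    // whitespace between their children is not significant; song holds
    // character data, so whitespace inside it is kept.
    static final String XML =
            "<?xml version=\"1.0\"?>\n"
          + "<!DOCTYPE mp3player [\n"
          + "  <!ELEMENT mp3player (subdir*)>\n"
          + "  <!ELEMENT subdir (song*)>\n"
          + "  <!ATTLIST subdir location CDATA #REQUIRED>\n"
          + "  <!ELEMENT song (#PCDATA)>\n"
          + "]>\n"
          + "<mp3player>\n"
          + "  <subdir location=\"\\Music\\Rock\">\n"
          + "    <song>ACDC - Thunderstruck.mp3</song>\n"
          + "    <song>ACDC - Who Made Who.mp3</song>\n"
          + "  </subdir>\n"
          + "</mp3player>\n";

    static Document parse(String xml, boolean ignoreWhitespace) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The parser needs the content model from the DTD, so validation is on.
        factory.setValidating(true);
        factory.setIgnoringElementContentWhitespace(ignoreWhitespace);
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    // Number of direct children (elements and text nodes) of the first subdir.
    static int subdirChildCount(Document doc) {
        return doc.getElementsByTagName("subdir").item(0).getChildNodes().getLength();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("children, whitespace kept:    " + subdirChildCount(parse(XML, false)));
        System.out.println("children, whitespace ignored: " + subdirChildCount(parse(XML, true)));
    }
}
```

With the whitespace ignored at parse time, removing a song node later leaves no stray whitespace-only text node behind in the tree, because none was ever created.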
Failing that, you can always write a hack:
<SSCCE>
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.SAXException;

public class XMLTest {
    public static void main(final String[] args) throws Exception {
        final Document document = inputXML();
        doProcessing(document);
        stripWhitespace(document.getDocumentElement());
        outputXML(document);
    }

    // Parses songs.xml from the current directory into a DOM document.
    private static Document inputXML() throws ParserConfigurationException,
            SAXException, IOException {
        final DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
        final DocumentBuilder builder = factory.newDocumentBuilder();
        final Document document = builder.parse(new File("songs.xml"));
        return document;
    }

    // Removes every <song> element. The NodeList is live, so always
    // removing item(0) until the list is empty avoids index juggling.
    private static void doProcessing(final Document document) {
        final NodeList songNodes = document.getElementsByTagName("song");
        while (songNodes.getLength() > 0) {
            final Node songNode = songNodes.item(0);
            songNode.getParentNode().removeChild(songNode);
        }
    }

    // Serializes the document to standard out.
    private static void outputXML(final Document document) throws
            TransformerFactoryConfigurationError, TransformerConfigurationException,
            TransformerException {
        final TransformerFactory tFactory = TransformerFactory.newInstance();
        final Transformer transformer = tFactory.newTransformer();
        final DOMSource source = new DOMSource(document);
        final StreamResult result = new StreamResult(System.out);
        transformer.transform(source, result);
    }

    // Recursively removes text nodes that contain only whitespace.
    private static void stripWhitespace(Node e) {
        NodeList children = e.getChildNodes();
        // Collect first, remove afterwards, so the live list is not
        // modified while it is being iterated.
        List<Node> childrenToRemove = new ArrayList<Node>();
        for (int i = 0; i < children.getLength(); i++) {
            final Node currElement = children.item(i);
            if (currElement.getNodeType() == Node.TEXT_NODE) {
                Text t = (Text) currElement;
                if (t.getData().trim().length() == 0) {
                    childrenToRemove.add(t);
                }
            }
        }
        for (Node n: childrenToRemove) {
            e.removeChild(n);
        }
        for (int i = 0; i < children.getLength(); i++) {
            stripWhitespace(children.item(i));
        }
    }
}
</SSCCE>
This program expects your data to be in the songs.xml file, and outputs
the results to standard out.
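Notice I also provide two alternative tricks for removing elements
without resorting to "i--;", which I think will be bug prone: the while
loop in doProcessing() that always removes item(0), and the
childrenToRemove list in stripWhitespace() that collects the nodes first
and removes them afterwards.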
You'll probably want to modify stripWhitespace() to do something a bit
more intelligent than just obliterating any text nodes which contain
only whitespace.
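One possible refinement, offered only as a sketch: select the whitespace-only text nodes with XPath and spare those that sit directly inside an element where whitespace is significant. Treating <song> as the only such element is an assumption based on the discussion above, and the class name SelectiveStripDemo, its helpers and the sample input (including the contrived whitespace-only song name) are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class SelectiveStripDemo {
    // Hypothetical helper: parses an XML string into a DOM document.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    // Removes whitespace-only text nodes, except those whose parent is a
    // <song> element (assumed to be the one place where whitespace matters).
    // Returns the number of text nodes removed.
    static int stripLayoutWhitespace(Document doc) throws Exception {
        NodeList hits = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//text()[normalize-space(.) = '' and not(parent::song)]",
                doc, XPathConstants.NODESET);
        // Copy the matches before touching the tree.
        List<Node> toRemove = new ArrayList<>();
        for (int i = 0; i < hits.getLength(); i++) {
            toRemove.add(hits.item(i));
        }
        for (Node n : toRemove) {
            n.getParentNode().removeChild(n);
        }
        return toRemove.size();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<mp3player>\n  <subdir>\n    <song> </song>\n  </subdir>\n</mp3player>");
        System.out.println("text nodes removed: " + stripLayoutWhitespace(doc));
    }
}
```

The indentation between elements goes away while the content of each <song> is left untouched, even when it happens to be blank.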
- Oliver