Java Forum / General / May 2006
Problem Writing Binary Data Stream To File
bmcdougald@hotmail.com - 17 May 2006 17:34 GMT I have written a servlet that makes an HTTP connection to our report repository system and returns data to the calling browser in either text or binary format. The binary formats returned are either Adobe PDFs or Excel spreadsheets. This is working well: it presents data correctly in a user's browser and/or initiates an HTTP download session in the browser, which, again, stores the data correctly as text or binary.
Now, I want to write a batch-driven back-end Java program (not a servlet) that will call the report repository via the same method as above, but store the data as a file (text or PDF) in a directory somewhere on the server. I can get the text data portion of the program to write to a flat file correctly, but the PDF causes Adobe Reader to fail when the file is opened. When I inspect the .pdf file to see the contents, it is all text, not binary as you would expect.
I used the same method for reading bytes in this function that I used in my servlet. The only difference is that the DataOutputStream object pipes to a file instead of the HttpServletResponse object in the servlet, and I'm not setting any response headers with MIME types and such.
What am I missing?
Here is my function code:
import java.io.*;
import java.net.*;

static void callUrl(String sType, String sSession, String sRptName, String sRid,
                    String sIndexes, String sIPAddr, String sFolder) {
    byte[] buffer = new byte[8192]; // 8k page
    boolean binaryFlag = false;
    BufferedInputStream in;
    BufferedReader ir;
    String sUrl;
    String inString;
    String sFilename = "", sExt = "";
    sUrl = "http://" + sIPAddr + "/webaccess/bmc-ctd-wa-cgi.exe?0=report&sid=" + sSession
         + "&rid=" + sRid + "&index=" + sIndexes
         + "&mode=External&errorflowelem=onerrorxml%2Etxt";
    try {
        URL url = new URL(sUrl);
        URLConnection uc = url.openConnection();
        uc.setDoOutput(true);
        uc.setDoInput(true);
        uc.setAllowUserInteraction(false);
        /* Is Report a PDF - set binary flag true */
        if (sType.equals("P")) {
            binaryFlag = true;
            sFilename = sRptName + ".pdf";
            sExt = "PDF";
            sUrl += "&18=Txt_P_2_Pdf_D";
        }
        /* Is Report a TXT file - set binary flag false */
        if (sType.equals("T")) {
            binaryFlag = false;
            sExt = "TXT";
            sFilename = sRptName + ".TXT";
        }
        sFilename = "x:/users/GS/" + sFolder + "/" + sExt + "/" + sFilename;
        File aFile = new File(sFilename);
        aFile.createNewFile();
        /* Build data stream pipe to output file */
        DataOutputStream myStream = new DataOutputStream(new FileOutputStream(aFile));
        if (binaryFlag == true) {
            /* write binary data to output file *** NOT WORKING *** */
            in = new BufferedInputStream(url.openStream());
            while (true) {
                int nBytes = in.read(buffer);
                if (nBytes < 0) break; // EOF ?
                myStream.write(buffer, 0, nBytes); // write binary data to file
            }
            in.close();
        } else {
            /* write plain text data to output file *** WORKING *** */
            ir = new BufferedReader(new InputStreamReader(url.openStream()));
            while ((inString = ir.readLine()) != null) {
                myStream.writeChars(inString);
            }
            ir.close();
        }
        myStream.flush();
        myStream.close();
    } catch (MalformedURLException malformed) {
        System.out.println("Malformed URL");
        malformed.printStackTrace();
    } catch (IOException ioe) {
        System.out.println("BAD IO");
        ioe.printStackTrace();
    } catch (Exception e) {
        System.out.println("Exception");
        e.printStackTrace();
    }
}
Rhino - 17 May 2006 19:17 GMT I haven't looked at your code; I just read the description of your problem. You may want to consider using iText to compose your PDF. I've used it for a few things and it creates PDFs that are easily read by Adobe Acrobat and/or browsers. It's also quite easy to use and well supported by the developers. You can find out more at http://www.lowagie.com/iText/. I should mention that I haven't tried to access the documents created with iText from a servlet, but I'd be very surprised if there was a problem in doing that.
Why re-invent the wheel?
-- Rhino
bmcdougald@hotmail.com - 17 May 2006 19:42 GMT Thanks, I'll look at this product.
However, just to clarify, the data stream coming from the report repository system is already in either binary PDF or ASCII TXT format. My standalone program just redirects that data stream into a file, rather than passing it along to a browser as my servlet does.
Rhino - 17 May 2006 21:36 GMT I just suggested iText because, if you use it, you shouldn't have the reading problems you're getting now. I can imagine that may be too radical a solution for you and that you'd prefer to fix the code you posted. Unfortunately, I don't have enough experience with the kind of thing you're doing to figure out what's going wrong for you.
-- Rhino
Matt Humphrey - 18 May 2006 03:01 GMT Nothing jumps out at me as to why this is not working. Aside from the possibility that sType does not really equal "P", this case raises some questions for me. How much data is written to the PDF compared with the size of the real PDF? (Say, if you download from the URL directly and use Save As.) Do you get exactly the right output file names? What does the Content-Type header say the PDF is? When you say the PDF looks like text (which it can), how closely does it resemble the real PDF? Is it totally different text (and what does it say?) or are some characters corrupt? Are \r and \n preserved correctly?
I would try out your code, but without knowing exactly what your parameters are I can't really test it. Is the web site publicly accessible? I wonder whether the web server is returning a different result because some other content-acceptance header or parameter is not being specified.
Cheers, Matt Humphrey matth@ivizNOSPAM.com http://www.iviz.com/
bmcdougald@hotmail.com - 18 May 2006 16:54 GMT When I download the file via the Browser/Servlet the PDF is approx 72K. When I create the .pdf through the standalone, it is approx 116K and all text.
I do get the right output file names, so the "P" flag is working.
The text in the PDF is formatted correctly. However, when I do a "type" command on the contents of the 72K file, I get the PDF header followed by a bunch of nonsensical characters, as expected with a binary file. When I type the 116K version, I see the actual text with no PDF header.
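Since the good 72K file shows the PDF header and the 116K file does not, a quick automated check for that header can tell which kind of file a batch run produced. A well-formed PDF normally begins with the bytes "%PDF-" (followed by a version such as 1.4). The sketch below assumes that signature sits at the very start of the file; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

public class PdfSniffer {
    // Signature that a PDF file normally starts with.
    private static final byte[] PDF_MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    // True if the given leading bytes begin with "%PDF-".
    static boolean startsWithPdfHeader(byte[] head) {
        if (head.length < PDF_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PDF_MAGIC.length; i++) {
            if (head[i] != PDF_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    // Reads only the first few bytes of a saved file and checks them.
    static boolean fileLooksLikePdf(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[PDF_MAGIC.length];
            int n = in.readNBytes(head, 0, head.length); // Java 9+
            return startsWithPdfHeader(Arrays.copyOf(head, n));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfSniffer <file>");
            return;
        }
        boolean ok = fileLooksLikePdf(Paths.get(args[0]));
        System.out.println(ok ? "PDF header found" : "no PDF header");
    }
}
```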
This portion of the website is not publicly available at the moment.
I wrote a quick program to open a good .pdf file as an input stream and write it back out to another file using my binary I/O routine, and it works fine. The output file is binary, the same file size, and opens perfectly in Adobe.
Matt Humphrey - 18 May 2006 17:21 GMT This suggests that the web server is giving you different content or somehow transforming the content. I'm not really up to date on what the possibilities are for that problem. When you run your program, can you read off the content header of what it thinks it is sending you? Try getContentEncoding() and getContentType() and see what they say.
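A minimal way to follow that suggestion is to print both headers from the URLConnection and compare the media type with application/pdf. Content-Type values can carry parameters (for example "; charset=..."), so the hypothetical mediaType helper below strips them before comparing; the class name and the command-line usage are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URLConnection;
import java.util.Locale;

public class ResponseHeaders {
    // "Application/PDF; charset=..." -> "application/pdf"; null -> "".
    static String mediaType(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String type = (semi >= 0) ? contentType.substring(0, semi) : contentType;
        return type.trim().toLowerCase(Locale.ROOT);
    }

    // getContentType()/getContentEncoding() connect implicitly if needed.
    static void report(URLConnection uc) {
        String contentType = uc.getContentType();
        System.out.println("Content-Type:     " + contentType);
        System.out.println("Content-Encoding: " + uc.getContentEncoding());
        if (!"application/pdf".equals(mediaType(contentType))) {
            System.out.println("Warning: response is not labelled application/pdf");
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java ResponseHeaders <report-url>");
            return;
        }
        report(URI.create(args[0]).toURL().openConnection());
    }
}
```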
Cheers, Matt Humphrey matth@ivizNOSPAM.com http://www.iviz.com/
bmcdougald@hotmail.com - 18 May 2006 21:12 GMT This could be true. In my servlet, I am intercepting the data stream from the repository system. I then turn around and use the HttpServletResponse to stream the data down to the browser. I also set the content type and header properties of this object according to whether it is PDF or plain text.
bmcdougald@hotmail.com - 18 May 2006 21:45 GMT Content-Type is coming down as text/plain for the PDF, encoding is null. Don't know why, it's the same URL I'm calling from my servlet. For grins, maybe I should try using the getContentType from my servlet and see what it is getting.
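One thing visible in the callUrl code at the top of the thread: `URL url = new URL(sUrl)` runs before the `if (sType.equals("P"))` block appends `&18=Txt_P_2_Pdf_D` to sUrl. Changing sUrl afterwards does not change the URL object, so that parameter is never sent, and the standalone request is not actually the same URL as the servlet's (assuming the servlet includes it). If, as its name suggests, that parameter is what asks the repository for PDF output, a text/plain response would be consistent with it being missing. The sketch below builds the complete query string first and only then creates the URL; buildReportUrl and the sample argument values are hypothetical.

```java
import java.net.URI;
import java.net.URL;

public class ReportUrl {
    // Assemble the full request string, including the PDF parameter,
    // before any URL object is created. Host path and parameter names
    // are copied from the posted callUrl code.
    static String buildReportUrl(String sType, String sIPAddr, String sSession,
                                 String sRid, String sIndexes) {
        StringBuilder sb = new StringBuilder("http://")
                .append(sIPAddr)
                .append("/webaccess/bmc-ctd-wa-cgi.exe?0=report&sid=").append(sSession)
                .append("&rid=").append(sRid)
                .append("&index=").append(sIndexes)
                .append("&mode=External&errorflowelem=onerrorxml%2Etxt");
        if ("P".equals(sType)) {
            sb.append("&18=Txt_P_2_Pdf_D"); // assumed to request PDF output
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample values; only now is the URL object created.
        String sUrl = buildReportUrl("P", "10.0.0.1", "SESSION", "RID", "IDX");
        URL url = URI.create(sUrl).toURL();
        System.out.println(url);
    }
}
```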
Oliver Wong - 19 May 2006 22:01 GMT How does the "actual text" compare to the semantic contents of the PDF file? Is it gibberish, or is it the text of the PDF, without formatting and images and all that extra stuff?
I didn't fully follow the OP's original problem, but as part of the HTTP protocol, the browser specifies what types of content it's capable of handling, and the web server can customize its output based on that.
So, for example, a web server might check whether the browser claims it can support PNG, and if so, send its images as PNG files. If not, it could convert the PNG file to a JPG file on the fly before transmitting it (and then cache the JPG files to speed up further requests).
It's conceivable that the webserver might check if the browser reports PDF as one of its allowed types and if not, to actually parse the contents of the PDF file, and convert it to an ASCII file.
However, I read something about "running it stand alone" which didn't make sense to me in this context, so perhaps this isn't what's going on at all. =)
- Oliver
Matt Humphrey - 20 May 2006 00:10 GMT <prelude snipped />
The OP's program reads either text or PDF documents from an existing web server.
I had to look it up, but I think content negotiation is done by having the browser specify an "Accept" header which contains the desired types. It's a sophisticated scheme, but I think it can simply be set to "application/pdf" to get the right result.
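A sketch of that idea with plain java.net: set the Accept header with setRequestProperty and then read from that same URLConnection through getInputStream(). In the posted code, uc is configured but the data is read through url.openStream(), which opens a second, separate connection, so nothing set on uc would reach the server. Whether this particular repository honours Accept is an assumption; prepare and saveTo are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class AcceptHeaderDownload {
    // Creates (but does not yet connect) a connection that asks for the given type.
    static URLConnection prepare(String url, String accept) throws IOException {
        URLConnection uc = URI.create(url).toURL().openConnection();
        uc.setRequestProperty("Accept", accept);
        uc.setAllowUserInteraction(false);
        return uc;
    }

    // Reads from uc itself, so the request properties set above are used.
    // Bytes are copied unchanged, with no character decoding.
    static long saveTo(URLConnection uc, Path target) throws IOException {
        try (InputStream in = uc.getInputStream();
             OutputStream out = Files.newOutputStream(target)) {
            byte[] buffer = new byte[8192];
            long total = 0;
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.out.println("usage: java AcceptHeaderDownload <report-url> <output-file>");
            return;
        }
        URLConnection uc = prepare(args[0], "application/pdf");
        long bytes = saveTo(uc, Paths.get(args[1]));
        System.out.println(bytes + " bytes saved; Content-Type=" + uc.getContentType());
    }
}
```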
Actually, what you've described is what I think is happening, especially since the OP reported that the web server's response to the PDF request has content type "text/plain". I'm just not experienced with content negotiation details.
Cheers, Matt Humphrey matth@ivizNOSPAM.com http://www.iviz.com/
EJP - 18 May 2006 03:06 GMT I can't see anything wrong with your binary I/O code on a quick inspection, but what I am really wondering is why do you bother with having two I/O routines when you could just use the binary version for both? That way you can debug it with text as well as binary PDF.
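Using one byte-for-byte routine would also sidestep two quirks of the posted text branch: BufferedReader.readLine() strips the line terminators, and DataOutputStream.writeChars() writes every char as two bytes (high byte first), so the "working" text file is not a copy of what the server sent. A minimal sketch of a single copy routine for both report types follows; the class name is hypothetical, and on Java 9+ InputStream.transferTo(OutputStream) does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

public class StreamCopy {
    // Copies every byte unchanged: no charset decoding, no line-ending changes.
    // Suitable for both the TXT and the PDF reports.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] report = "LINE 1\r\nLINE 2\r\n".getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream copied = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(report), copied);

        // What the posted text branch produces: readLine() has already dropped
        // the "\r\n", and writeChars() emits two bytes per char.
        ByteArrayOutputStream viaChars = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(viaChars);
        dos.writeChars("LINE 1");
        dos.writeChars("LINE 2");
        dos.flush();

        System.out.println(report.length + " " + copied.size() + " " + viaChars.size()); // prints "16 16 24"
    }
}
```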
iText doesn't have anything to do with this as you already have the PDF.
Andy Flowers - 20 May 2006 06:43 GMT Have you considered using the Jakarta HttpClient classes? http://jakarta.apache.org/commons/httpclient/