Java Forum / General / March 2006
Writing multiple files with one stream
davidjdoherty@gmail.com - 21 Mar 2006 16:52 GMT
Hi,
I have a bit of a performance problem. I have 5 live servers that are synchronised with the same files. When I upload, I upload the same files to each of the 5 servers. Regularly, I need to upload a large number (about 50) of small files (3K each) to all 5 servers. 3 of the 5 servers are very quick and it only takes a few seconds, but for the slow servers (based in Hong Kong and Australia; I'm in the UK) it can take 30 mins. This is because my program is opening up a new stream for each file, and this connection time is what is causing the slow transfer.
Is there any way to open only one stream, but write multiple files across to it?
I thought about using the java ZIP API to zip the files, upload the file, and then unzip them. What is the performance like when you unzip a folder sitting in a remote directory?
Is there an easy way?
Cheers, Dave
Oliver Wong - 21 Mar 2006 17:12 GMT
> Hi,
> I have a bit of a performance problem. I have 5 live servers that are [quoted text clipped - 5 lines]
> mins. This is because my program is opening up a new stream for each file, and this connection time is what is causing the slow transfer.
This doesn't make sense to me. You first claim that the slow transfers are due to the servers themselves being slow, then you later claim that the slow transfers are due to opening up a new stream for each file. If the problem were actually the latter case, then all servers would be perceived to be equally slow. So which is it?
> Is there anyway to only open one stream, but write multiple files across to it?
You can write anything you want to a stream; the question is whether the server on the other side knows what to do with the data in that stream.
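To make Oliver's point concrete: if you control both ends, one way to put several files on a single stream is to prefix each file with its name and length, so the receiver knows where one file ends and the next begins. This is a minimal sketch of a hypothetical format (`MultiFileFraming`, name/length/bytes, an empty name as the end marker), not a standard protocol, and it only helps if something on the server side runs the matching reader.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class MultiFileFraming {

    // Writes each entry as: UTF-encoded name, 8-byte length, raw bytes.
    // An empty name marks the end of the batch (a convention invented for this sketch).
    public static void writeFiles(OutputStream out, Map<String, byte[]> files) throws IOException {
        DataOutputStream data = new DataOutputStream(new BufferedOutputStream(out));
        for (Map.Entry<String, byte[]> e : files.entrySet()) {
            data.writeUTF(e.getKey());
            data.writeLong(e.getValue().length);
            data.write(e.getValue());
        }
        data.writeUTF("");
        data.flush(); // flush but do not close: the caller owns the stream
    }

    // Reads entries back, in order, until the end-of-batch marker is seen.
    public static Map<String, byte[]> readFiles(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(new BufferedInputStream(in));
        Map<String, byte[]> files = new LinkedHashMap<>();
        while (true) {
            String name = data.readUTF();
            if (name.isEmpty()) {
                break;
            }
            // The int cast is fine for small files; files over 2 GB would need streaming.
            byte[] content = new byte[(int) data.readLong()];
            data.readFully(content);
            files.put(name, content);
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> batch = new LinkedHashMap<>();
        batch.put("a.txt", "first file".getBytes(StandardCharsets.UTF_8));
        batch.put("b.txt", "second file".getBytes(StandardCharsets.UTF_8));

        // An in-memory buffer stands in for the network connection.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        writeFiles(wire, batch);
        Map<String, byte[]> received = readFiles(new ByteArrayInputStream(wire.toByteArray()));
        for (Map.Entry<String, byte[]> e : received.entrySet()) {
            System.out.println(e.getKey() + ": " + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

Running `main` round-trips two small files through one stream and prints `a.txt: first file` and then `b.txt: second file`.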
> I thought about using the java ZIP API to zip the files, upload the file, and then unzip them. What is the performance like when you unzip a folder sitting in a remote directory?
The performance of a program depends (among other things) on the computer running that program. Typically, zipping a file, sending it and unzipping it is faster than sending the uncompressed file, but it depends on the compressibility of the files involved.
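For reference, bundling many small files into one archive with `java.util.zip.ZipOutputStream` looks roughly like this. A minimal sketch: `ZipBundle` and its method names are hypothetical, the test data is deliberately repetitive (a best case for compression), and `String.repeat` needs Java 11 or later. How much smaller real files get depends, as Oliver says, on how compressible they are.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class ZipBundle {

    // Packs every (name, content) pair into one in-memory ZIP archive.
    public static byte[] zip(Map<String, byte[]> files) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            for (Map.Entry<String, byte[]> e : files.entrySet()) {
                zip.putNextEntry(new ZipEntry(e.getKey()));
                zip.write(e.getValue());
                zip.closeEntry();
            }
        } // closing the ZipOutputStream writes the archive's central directory
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Map<String, byte[]> files = new LinkedHashMap<>();
        long rawTotal = 0;
        for (int i = 0; i < 50; i++) {
            // 50 files of 3072 bytes each, made of one repeated line
            byte[] content = "line of repetitive text\n".repeat(128).getBytes(StandardCharsets.UTF_8);
            files.put("file" + i + ".txt", content);
            rawTotal += content.length;
        }
        byte[] archive = zip(files);
        System.out.println("raw bytes: " + rawTotal + ", zipped bytes: " + archive.length);
    }
}
```

`main` prints `raw bytes: 153600` followed by the archive size. Note that one archive only saves per-file round trips if something on the remote side does the unzipping.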
> Is there an easy way?
It's not clear what options are available to you. You haven't specified, for example, what control you have over the five servers.
- Oliver
Chris Uppal - 21 Mar 2006 17:35 GMT
> I have a bit of a performance problem. I have 5 live servers that are synchonised with the same files. When I upload I am uploading the same [quoted text clipped - 4 lines]
> mins. This is because my program is opening up a new stream for each file, and this connection time is what is causing the slow transfer.
How are you uploading the files, and what sort of access do you have to the servers ? Can you execute a program on them ? If not then you are probably hosed...
-- chris
davidjdoherty@gmail.com - 21 Mar 2006 20:46 GMT
I have access to the servers via 5 mapped network drives on Windows. So when I upload the files I'm just opening up file output streams that refer to "S:\filename", etc...
I could write a server-side program that allows a client to open a connection and send multiple files over that connection, but that would involve convincing other people to give me this sort of access, which could be arduous. I was hoping for a quicker (or lazier) fix.
Oliver: I referred to two of the servers as slow because they are located far away. The servers themselves are fast, but operations such as opening a folder through my mapped network drive are slow because of the overhead of creating a connection to such a distant location. (Two of the other servers are onsite, and the third is just a couple of miles away, so operations on them are noticeably quicker.)
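If that access can be arranged, the server-side program can be small. A minimal sketch, assuming plain TCP on a port of your choosing and a simple length-prefixed framing (file name, length, bytes, with an empty name ending the batch) invented for this example; `BatchReceiver`, `receiveOnce` and `sendBatch` are hypothetical names. It has no authentication, encryption or real file-name validation, so treat it as an illustration rather than something to deploy. It uses `java.nio.file` (Java 7+).

```java
import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class BatchReceiver {

    // Server side: accepts one connection and saves every framed file into targetDir.
    // Returns the number of files received.
    public static int receiveOnce(ServerSocket server, Path targetDir) throws IOException {
        try (Socket client = server.accept();
             DataInputStream in = new DataInputStream(
                     new BufferedInputStream(client.getInputStream()))) {
            int count = 0;
            while (true) {
                String name = in.readUTF();
                if (name.isEmpty()) {
                    return count; // end-of-batch marker
                }
                byte[] content = new byte[(int) in.readLong()];
                in.readFully(content);
                // Keep only the final name component (a sketch, not full validation:
                // names such as ".." would still need rejecting).
                Files.write(targetDir.resolve(Paths.get(name).getFileName()), content);
                count++;
            }
        }
    }

    // Client side: one connection, many files.
    public static void sendBatch(String host, int port, Path... files) throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(
                     new BufferedOutputStream(socket.getOutputStream()))) {
            for (Path file : files) {
                byte[] content = Files.readAllBytes(file);
                out.writeUTF(file.getFileName().toString());
                out.writeLong(content.length);
                out.write(content);
            }
            out.writeUTF("");
            out.flush();
        }
    }

    public static void main(String[] args) throws Exception {
        Path source = Files.createTempDirectory("outgoing");
        Path target = Files.createTempDirectory("incoming");
        Path a = Files.write(source.resolve("a.txt"), "hello".getBytes());
        Path b = Files.write(source.resolve("b.txt"), "world".getBytes());

        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free local port
            Thread sender = new Thread(() -> {
                try {
                    sendBatch("localhost", server.getLocalPort(), a, b);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            sender.start();
            System.out.println("received " + receiveOnce(server, target) + " files");
            sender.join();
        }
    }
}
```

`main` sends two files over a single local connection and prints `received 2 files`. The connection cost is paid once per batch instead of once per file.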
Oliver Wong - 21 Mar 2006 21:07 GMT
>I have access to the servers via 5 mapped network drives on windows. So when I upload the files I'm just opening up file output streams that [quoted text clipped - 12 lines]
> located onsite, and the third is just a couple miles away so operations are very noticeably quicker).
I wish you had mentioned the mapped network drives earlier. It makes a big difference, and is probably not what most people envisioned from your original description. Zipping will make things worse in your situation. Let's say a file is 5 MB when uncompressed, and 1 MB when compressed.
The "Don't zip it" solution involves you sending the 5MB file to the remote folder. And that's it. So total bandwidth used is 5MB.
The "Zip it" solution involves you sending the 1MB file to the remote folder. Then, because the unzipping runs on your own machine, you re-read that 1MB back locally. Then you write the uncompressed version, which is 5MB, back to the server. Total bandwidth used is 7MB.
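Oliver's arithmetic generalises: with u the uncompressed size and c the compressed size, copying directly moves u bytes over the link, while the zip route moves c + c + u, so over a mapped drive it loses whenever c is greater than zero. A tiny sketch using his example figures; `ShareBandwidth` is just an illustrative name.

```java
public class ShareBandwidth {

    // Bytes crossing the link when files are written straight to the mapped drive.
    public static long direct(long uncompressed) {
        return uncompressed;
    }

    // Bytes crossing the link when the archive is uploaded, read back by the local
    // machine (where the unzip runs), then written out uncompressed to the share.
    public static long viaZip(long uncompressed, long compressed) {
        return compressed + compressed + uncompressed;
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        // Oliver's example: 5 MB uncompressed, 1 MB compressed.
        System.out.println("direct:  " + direct(5 * mb) / mb + " MB");
        System.out.println("via zip: " + viaZip(5 * mb, mb) / mb + " MB");
    }
}
```

`main` prints 5 MB for the direct copy and 7 MB for the zip route.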
The term "overhead" is usually used to mean "additional costs due to the strategy I'm using". In your case, these are not "additional" costs, but actually the MAIN cost of the task you're trying to perform; namely, sending files across a network, or getting a list of files in a remote folder.
- Oliver
davidjdoherty@gmail.com - 22 Mar 2006 10:50 GMT
Yeah, I got the feeling that I wouldn't be able to zip it and have the remote machine unzip it for me. I was hoping that there might have been some way to do it. I guess I'll have to see if I can run a server on those remote machines.
Sorry if I was using the term overhead incorrectly. I was using the term in its general meaning and not its specific meaning in terms of data communication over networks. If you want to create a syntax to differentiate between different meanings of the same word I am happy to use it. Maybe something like: overhead[general], overhead[data communication over networks]. This way we can avoid pedantic discussions in the future. Sorry to be so rude, but if you understood me anyway, then what was the point of bringing it up?
Thanks for the help, Dave
Oliver Wong - 22 Mar 2006 16:42 GMT
> Sorry to be so rude, but if you understood me anyway, then what was the point of bringing it up?
In the particular message you are referring to, I had enough context to "guess at" what you really meant. However, if I hadn't pointed out my confusion to you, you might, in a future message, have used words incorrectly again but this time NOT provided sufficient context for people to guess at what you really meant, and thus ended up confusing everybody. And when people are confused about what you're saying, it's more difficult for them to help you. From my point of view, I was doing you a favour, by making it easier for people to understand you, and thus making it easier for you to get the help you want.
A similar thing happened to me a while ago. I was talking about encryption channels, and I referred to the stuff through which the communication happened as "mediums". Someone then pointed out to me that "mediums" is the plural of the term for a person who communicates with the dead, and that I probably meant "media". Yeah, it was a bit embarrassing for me, but he was right: I had used the wrong word. Because of the context (encryption channels), it was clear what I meant, but in another context, it might not have been so clear. So I appreciated him correcting me.
I'm not saying you HAVE to appreciate what I did, or that you "owe" me a favor in return or anything. I'm just trying to explain how I saw the situation, and that I didn't mean any harm.
- Oliver
Chris Uppal - 22 Mar 2006 11:46 GMT
> I have access to the servers via 5 mapped network drives on windows. So when I upload the files I'm just opening up file output streams that refer to "S:\filename", etc...
Hmm. But how is the connection made? Is the remote "drive" a WebDAV drive, an SMB/CIFS connection (a normal Windows/Samba shared drive), an FTP connection, or what ? And is the connection layered over something like SSL ? I ask because a setup+transfer time of over half a minute per 3K file doesn't make any sense at all. And it makes even less sense if the connection is layered over SSL, since the connection then should be established only once (at the TCP level) and the other stuff all multiplexed over that. It sounds as if you might have an undiagnosed problem with your network, in which case the right thing is to fix that.
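One way to look for the kind of network problem Chris suspects, before changing anything else, is to time the open, write and close of each small file separately. If almost all of the time goes into opening or closing, that suggests per-file round trips on the share rather than raw bandwidth. A minimal sketch: `UploadTimer` and the file names are hypothetical; pass a test folder on the mapped drive (for example something under `S:`) as the first argument, otherwise it writes to a local temp folder. It uses `java.nio.file` (Java 7+).

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class UploadTimer {

    // Writes one file and returns {openNanos, writeNanos, closeNanos}.
    public static long[] timeOneFile(Path target, byte[] content) throws IOException {
        long t0 = System.nanoTime();
        OutputStream out = new FileOutputStream(target.toFile()); // create/open the file
        long t1 = System.nanoTime();
        long t2;
        try {
            out.write(content);
            t2 = System.nanoTime();
        } finally {
            out.close(); // on a network share, close may wait for the server
        }
        long t3 = System.nanoTime();
        return new long[] {t1 - t0, t2 - t1, t3 - t2};
    }

    public static void main(String[] args) throws IOException {
        // Placeholder target: a folder on the mapped drive, given as the first argument.
        Path dir = args.length > 0 ? Paths.get(args[0]) : Files.createTempDirectory("timing");
        byte[] payload = new byte[3 * 1024]; // roughly the 3K files in question
        for (int i = 0; i < 5; i++) {
            long[] t = timeOneFile(dir.resolve("timing-test-" + i + ".tmp"), payload);
            System.out.printf("file %d: open %d ms, write %d ms, close %d ms%n",
                    i, t[0] / 1_000_000, t[1] / 1_000_000, t[2] / 1_000_000);
        }
    }
}
```

Comparing the output for a nearby server against one of the distant ones should show which phase accounts for the difference.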
> I could write a server side program that allows a client to open a connection and send multiple files over that connection, but that would involve me convincing other people to allow me this sort of access, which could be arduous. I was hoping for a quicker (or lazier fix).
You might find it easier to persuade the powers-that-be if their alternative is for /them/ to do some work to fix the network problem ;-)
It might be worth checking that you are not attempting to transport all 50 files simultaneously (which could be overloading something). Other than that (and assuming you are stuck with the network as is) there is no way that you can fix this unilaterally.
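If the upload code does fire off all the copies at once, one way to bound it is a small fixed thread pool per destination. A minimal sketch: `BoundedUploader` and `upload` are hypothetical names, temp directories stand in for the five mapped drive roots, and `perServerLimit` (1 means strictly one file at a time per server) is a tuning guess, not a known-good value. It uses `java.nio.file` (Java 7+).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class BoundedUploader {

    // Copies every file to every destination directory. Each destination gets its own
    // pool of perServerLimit threads, so no server sees more than perServerLimit
    // copies at once, and a slow server does not delay copies to the fast ones.
    public static void upload(List<Path> files, List<Path> destinations, int perServerLimit)
            throws InterruptedException, ExecutionException {
        List<ExecutorService> pools = new ArrayList<>();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (Path dest : destinations) {
                ExecutorService pool = Executors.newFixedThreadPool(perServerLimit);
                pools.add(pool);
                for (Path file : files) {
                    pending.add(pool.submit(() -> {
                        try {
                            Files.copy(file, dest.resolve(file.getFileName()),
                                    StandardCopyOption.REPLACE_EXISTING);
                        } catch (IOException e) {
                            throw new UncheckedIOException(e);
                        }
                    }));
                }
            }
            for (Future<?> f : pending) {
                f.get(); // waits, and rethrows any copy failure as an ExecutionException
            }
        } finally {
            for (ExecutorService pool : pools) {
                pool.shutdown();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        Path source = Files.createTempDirectory("source");
        List<Path> files = new ArrayList<>();
        for (int i = 0; i < 50; i++) {
            files.add(Files.write(source.resolve("file" + i + ".txt"), new byte[3 * 1024]));
        }
        // Stand-ins for the five mapped drives.
        List<Path> servers = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            servers.add(Files.createTempDirectory("server" + i));
        }
        upload(files, servers, 1);
        System.out.println("copied " + files.size() + " files to " + servers.size() + " destinations");
    }
}
```

This only removes self-inflicted contention; it will not make each individual open on a distant share any faster.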
-- chris
Chris Uppal - 22 Mar 2006 19:26 GMT
I wrote:
> And it makes even less sense if the connection is layered over SSL, since the connection then should be established only once (at the TCP level)
In the interest of precision (and since we seem to have got into a discussion of clear language[*]) I should correct myself. What I meant was "if the connection is /tunnelled/ over SSL [...etc]". Layering would imply a new SSL connection for each real connection; tunnelling (as in a VPN) would not.
-- chris
([*] which always interests me)
Rhino - 21 Mar 2006 19:08 GMT
> Hi,
> I have a bit of a performance problem.
Oh dear! Hmm, maybe some Viagra?
Sorry, couldn't help myself :-)
-- Rhino