Java Forum / General / April 2006
concurrent 1.5
Homer - 03 Apr 2006 21:24 GMT
Hi All,
I am trying to write multithreaded code to read a directory and move all files to another directory (well, and some other stuff, but let's talk about the core part). What I am trying to do is:
1- Read the directory.
2- Create threads for moving each file.
3- Each thread will put the file name somewhere so that other threads won't try to move the same file.
4- Limit the number of threads to 10, but don't kill/create threads when they are done (recycle them).
Is the concurrent package good for doing something like that, or would I be better off writing my own?
Any example, class, point to start?
Thanks in advance,
Homer
Remon van Vliet - 03 Apr 2006 21:50 GMT
> Hi All,
> [quoted text clipped - 17 lines]
>
> Homer
I can't really think of a good reason to do this with more than 1 "move my files" thread. Why do you require 10? It certainly won't speed up the process. Anyway, if you insist on this, simply have all your move threads grab filenames off a single blocking queue filled by your "main" thread (in other words, fill it with the results from reading the directory). This will ensure they won't move a file twice or skip files. The java.util.concurrent package provides some utility classes for thread concurrency but it's not directly related to the solution for your problem.
Remon van Vliet
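For readers following along, the "single queue, fixed number of recycled worker threads" idea can be sketched with a fixed thread pool. This is only an illustration, not code from the thread: the class name `PooledMover` and the method `moveAll` are made up, and it uses `java.nio.file`, which arrived in Java 7 and was not available on J2SE 5. A pool created with `Executors.newFixedThreadPool` keeps its work in an internal blocking queue and reuses its threads, so each file is handed to exactly one worker.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not from the thread.
class PooledMover {
    /** Moves every regular file in sourceDir into targetDir; returns how many moves succeeded. */
    static int moveAll(Path sourceDir, Path targetDir, int threads)
            throws IOException, InterruptedException {
        // Step 1: read the directory once, on the calling ("main") thread.
        List<Path> files;
        try (Stream<Path> listing = Files.list(sourceDir)) {
            files = listing.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        Files.createDirectories(targetDir);

        // Steps 2-4: a fixed pool recycles its threads and queues the pending moves.
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Callable<Path>> tasks = new ArrayList<>();
            for (Path file : files) {
                tasks.add(() -> Files.move(file, targetDir.resolve(file.getFileName())));
            }
            int moved = 0;
            // invokeAll blocks until every task has finished or failed.
            for (Future<Path> result : pool.invokeAll(tasks)) {
                try {
                    result.get();
                    moved++;
                } catch (ExecutionException e) {
                    System.err.println("Move failed: " + e.getCause());
                }
            }
            return moved;
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling `PooledMover.moveAll(source, target, 10)` would give the limit of 10 from the original question; whether 10 threads beats 1 depends on the disks and network involved, as the replies below discuss.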
Matt Humphrey - 04 Apr 2006 13:59 GMT
>> Hi All,
>> [quoted text clipped - 21 lines]
> files" thread. Why do you require 10? It certainly wont speed up the
> process.
I'm curious what you're basing this assertion on. Even though the operation is disk-bound, it seems to me that giving the disk scheduler multiple requests may reduce latency and maintain saturation.
Cheers,
Matt Humphrey matth@ivizNOSPAM.com http://www.iviz.com/
Alex Hunsley - 04 Apr 2006 14:27 GMT
>>> Hi All,
>>> [quoted text clipped - 25 lines]
> is disk-bound, it seems to me that giving the disk scheduler multiple
> requests may reduce latency and maintain saturation.
It may reduce latency, but you may end up with more seeks, with each seek resulting in only a small number of bytes being accessed. It all of course depends on what optimisations are happening unseen in your particular system... I see what you're getting at though. But even with that in mind, 10 threads wouldn't be helping much. 2 threads perhaps (if > 1 were beneficial). I personally think 1 thread is best though, unless you have particular knowledge about what is going on with disks etc...
Alex Hunsley - 04 Apr 2006 14:18 GMT
(piggy-backing on Remon's reply, as I can't see the original post...)
>> Hi All,
>> [quoted text clipped - 17 lines]
>>
>> Homer
Don't do this. This is a classic example of thread overkill - there is no benefit to using multiple threads to move a load of files. In fact, as you have noticed, it only adds hassle - you have to worry about concurrency issues. What do you imagine that multiple threading will add to this task?
Also, moving multiple files simultaneously will probably be *slower* than if you just moved them one at a time from one thread. Why is this? Well, even just on a physical level, you will have the read/write head of the hard disk jumping about, reading and writing itty-bitty bits of files for each thread[1]. Whereas if you just had it doing one file at a time, it can handle decent sized chunks of file at a time - it would be quicker (less seek time) and it would also entail less wear and tear on your hard disk seek mechanism. Also, at the Java level, the JVM is keeping extra state for each thread and time is wasted on context switching between threads, so it's making everything less efficient that way.
Maybe you're thinking of doing this because you've heard that threads are often used to do I/O - this is correct. But that tends to be just one thread, doing some I/O on a disk, while other threads are free to do computation and generally orchestrate things. So, in essence, it might make sense for your program to have multiple threads, but I would limit it to one thread only that actually does the file moving. A good example of when your program would use multiple threads: suppose you want your file mover program to have a GUI that shows progress. In this case, you'd want a separate thread that did the file moving, or else you'd clog up the GUI (event dispatch thread) and your app would become unresponsive.
One of the other classic misuses of threads is when people want to write a game, and they think: "oh, games involve independent entities doing their own thing in a world, so I should use a thread for each game entity". Threads aren't really applicable here though. I think the misguided person is thinking that because game entities have their own properties (sense of state) and they all do things simultaneously in the game, the magic word "threads" pops up. But it's usually a mistake to go down this route. In particular, a games engine usually wants each entity to have its own 'go' at moving and updating its state, all in strict lock-step, so that at the end of each game 'update' loop, every entity has had exactly one chance of updating itself. If you did try to use multiple threads for different game entities, various entities would possibly proceed through the world at different rates (depending on how lengthy their update code was!). To stop this you could use synchronisation etc. to make everything work in lock-step. But then why bother with threading each entity in the first place? For a regular game, this would just be wrong. For some things it might be more applicable - e.g. simulating lifeforms, and wanting the slowness of their 'thought processes' (i.e. update code length) to be reflected in the simulation. But this is not usually the case...
Oops, I'm going off on one. Enough.
[1] ... maybe. Some of it could be affected by optimisations that could be made anywhere from the JVM to the hard disk controller....
Alex Hunsley - 04 Apr 2006 14:32 GMT
> (piggy-backing on Remon's reply, as I can't see the original post...)
> [quoted text clipped - 25 lines]
> concurrency issues.
> What do you imagine that multiple threading will add to this task?
Little postscript here - I actually thought, wrongly, that the OP was talking about either copying files, or moving files to a different hard disk. In other words, something that would involve reading and writing lots of bytes for some files. But if it's a file move from one place to another on the same disk/partition, most systems don't actually move data; they only change where the link to that data is in the file system. (Like link/unlink in the *nix world.) Anyway, still not convinced that > 1 thread would help much...
Homer - 04 Apr 2006 15:52 GMT
Here is the story:
The source and destination of each file are different in my case (or can be), and some are mapped drives (maybe over a slow connection). Some of those files are huge (200 MB) and some are very small (<5 KB). Unfortunately, delivery of the small files is time sensitive. They arrive around 3:30PM and need to be transferred for another process in less than two minutes. Now imagine I have a single-threaded program with a couple of those big ones in the queue and one small one behind all of them.
That's why I am interested in Multi-Threading in this case.
Homer - 04 Apr 2006 15:59 GMT
I should also add that a while ago I used a similar strategy to write an FTP Gateway. I know that using this method for FTP makes far more sense than for a hard drive, but I was wondering if I can use the same idea for this case too. The FTP Gateway code was quite successful, even though I was not aware of the concurrent package at that time.
Alex Hunsley - 04 Apr 2006 16:41 GMT
> I should also add that while ago I used a similar strategy to write an
> FTP Gateway. I know that using this method for FTP makes far more sense
> than Hard Drive but I was wondering if I can use the same idea for this
> case too. The FTP Gateway code was quite successful though that I was
> not aware of concurrent package in that time.
Yup, a multithreaded strategy does make sense for an FTP/multiuser comms application! I wouldn't just throw lots of threads at your file copying problem - I would use two or three, in a structured way, as described in t'other message.
Alex Hunsley - 04 Apr 2006 16:44 GMT
> I should also add that while ago I used a similar strategy to write an
> FTP Gateway. I know that using this method for FTP makes far more sense
Btw Homer, the concurrent package is pretty handy, yes. Note that J2SE5 includes a java.util.concurrent package, which I think is based on Doug Lea's package; if you're using J2SE5, just use java.util.concurrent. See the note at the top of this page: http://gee.cs.oswego.edu/dl/classes/EDU/oswego/cs/dl/util/concurrent/intro.html
Alex Hunsley - 04 Apr 2006 16:39 GMT
> Here is the story:
> [quoted text clipped - 7 lines]
>
> That's why I am interested in Multi-Threading in this case.
Ah, ok, this makes more sense. So your requirement is that the system can be copying files happily, but when it notices that a high-priority file has appeared, it pauses the other non-essential copying activity until the important stuff is done.
First idea that comes to mind: Have two threads. The system can be copying normal files across from a file queue in a 'main' thread. As soon as an important file appears, the main thread pauses its activity (in mid file-copy, if need be)[1], and a 'priority' thread starts copying the important files across. When the priority thread is finished, the 'main' thread can continue.
You could always have three threads - the extra one to be watching the source folder, and controlling what the two copying threads get up to. This would have the extra benefit of moving controlling logic out of a file copier thread, and then your two copying threads ('main' and 'priority') could be instances of the same object, which copies files in a file queue.
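A rough sketch of that three-part layout follows, under some loud assumptions: the class `TwoLaneCopier`, the `smallFileLimit` threshold and the size-based notion of "priority" are all hypothetical, and `java.nio.file` is a Java 7+ API. It also simplifies the idea: the bulk lane is not paused when an important file shows up; the priority file simply travels on its own thread so it never waits behind a large copy. The caller of `submit` plays the role of the controlling thread.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class, not from the thread.
class TwoLaneCopier {
    // Two copier threads of the same kind, each draining its own file queue.
    private final ExecutorService bulkLane = Executors.newSingleThreadExecutor();
    private final ExecutorService priorityLane = Executors.newSingleThreadExecutor();
    private final Path targetDir;
    private final long smallFileLimit; // assumed rule: files up to this size are "priority"

    TwoLaneCopier(Path targetDir, long smallFileLimit) {
        this.targetDir = targetDir;
        this.smallFileLimit = smallFileLimit;
    }

    /** Called by the controlling thread: routes a file to the lane that should copy it. */
    Future<Path> submit(Path file) throws IOException {
        ExecutorService lane = Files.size(file) <= smallFileLimit ? priorityLane : bulkLane;
        Callable<Path> copyTask = () -> Files.copy(
                file, targetDir.resolve(file.getFileName()), StandardCopyOption.REPLACE_EXISTING);
        return lane.submit(copyTask);
    }

    /** Lets queued copies finish, then allows both lane threads to exit. */
    void shutdown() {
        bulkLane.shutdown();
        priorityLane.shutdown();
    }
}
```

If the two lanes competing for the same slow link turns out to matter, the pause-and-resume behaviour described above would need to be added on top; this sketch does not attempt it.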
Of course, another option is just have one thread at any one time, and give it the ability to remember where it was in the non-priority file being copied. You could do this in a cheap and cheeky way by using recursion (i.e. file copier calls itself when it notices a more important file, and once done with that, it bottoms out and resumes the original non-priority file). I don't think this is a good design though; messy, with too many concerns all in one place.
[1] A file being paused in mid-copy may cause confusion - with partially copied files existing and maybe messing things up. You may want to have a policy of copying a file to a modified destination name, and only on completion of the copy rename the file to its regular name. For example, while a file 'data.txt' is being copied, it is copied to a destination filename called 'data.txt-PARTIAL'. Upon copy completion, the destination file is renamed by the program to 'data.txt'. This way it's clear to humans (and to scripts, if need be) which are non-complete files.
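The copy-then-rename policy from this footnote could look roughly like the following. It is a minimal sketch: the class name `SafeCopy` and the method `copyThenRename` are invented for illustration, the '-PARTIAL' suffix is taken from the example above, and `java.nio.file` requires Java 7 or later.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper, not from the thread.
class SafeCopy {
    /** Copies source to "<destination>-PARTIAL", then renames it to destination once complete. */
    static Path copyThenRename(Path source, Path destination) throws IOException {
        // e.g. data.txt -> data.txt-PARTIAL in the same directory as the destination
        Path partial = destination.resolveSibling(destination.getFileName() + "-PARTIAL");
        Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
        // The final name only appears after the whole copy has succeeded.
        return Files.move(partial, destination, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

If the copy is interrupted, only the '-PARTIAL' file is left behind, so a downstream process watching for 'data.txt' never picks up a half-written file.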
Homer - 05 Apr 2006 15:10 GMT
Thanks Alex. Very helpful suggestions.