Java Forum / General / March 2007
Filter unique elements in an array algorithm
ck - 07 Mar 2007 20:04 GMT Hi, I would like to get opinion from the group members about efficiency of the program below.
I need to filter an input array of int to obtain an output array which contains distinct elements of the input array. That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should be {1,2,43,4,5,20} The condition being I cannot use Collection framework, just use array to get the most efficient algorithm. Could you guys please tell me if the following algorithm would be considered efficient?
public class ArrayUniqueElements {
public static int [] filter (int [] input) { int [] output = new int[input.length]; output[0]=input[0]; boolean flag=true; int pos=0; // iteration for each element in input for (int i=1;i<input.length;i++){ for(int j=0;j<=i;j++){ if(input[i]==output[j]){ flag=false; break; // exit out of inner loop on first duplicate element } } if(flag){ output[++pos]=input[i]; } else flag=true; } int [] tempOutput = new int[pos]; // This for loop is required to filter out default values from the output for (int i = 0; i < pos; i++) { tempOutput[i]=output[i]; } return tempOutput; } }
-- Ck http://www.gfour.net
Patricia Shanahan - 07 Mar 2007 20:28 GMT > Hi, > I would like to get opinion from the group members about efficiency of [quoted text clipped - 8 lines] > Could you guys please tell me if the following algorithm would be > considered efficient? "guys"?
For long arrays, the Omega(N*N) performance is not very good. A hashing solution would be O(N).
Are you allowed to use System.arraycopy instead of the final loop?
Patricia
John W. Kennedy - 08 Mar 2007 04:05 GMT >> Could you guys please tell me if the following algorithm would be >> considered efficient?
> "guys"? Hey, I've heard women address groups made up entirely of other women as "guys".
 Signature John W. Kennedy "The blind rulers of Logres Nourished the land on a fallacy of rational virtue." -- Charles Williams. "Taliessin through Logres: Prelude"
David Segall - 09 Mar 2007 02:27 GMT >> Could you guys please tell me if the following algorithm would be >> considered efficient? > >"guys"? Yes guys! Here is more information than you wish to know about its usage <http://www.research.umbc.edu/~korenman/wmst/guys.html>. In Australia it is also used to include women, at least by guys under forty.
Kai Schwebke - 07 Mar 2007 20:28 GMT Hello,
ck schrieb:
> Hi, > I would like to get opinion from the group members about efficiency of > the program below. The program is not efficient - it's complexity is O(n^2). The task could be solved with O(n*log n) using some sorting algorithm or even faster with hashing techniques.
> The condition being I cannot use Collection framework, just use array > to get the most efficient algorithm. Don't exclude Collection framework just for performance reasons. A HashSet or TreeSet will solve the task much more efficient than your code.
Kai
Chris Uppal - 07 Mar 2007 21:26 GMT > I need to filter an input array of int to obtain an output array which > contains distinct elements of the input array. > That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should > be {1,2,43,4,5,20} > The condition being I cannot use Collection framework, just use array > to get the most efficient algorithm. If you are allowed to assume that the range of values to be checked is fairly small, then you can get a linear algorithm by using a boolean[] array. How big "fairly small" is depends on your instructor.
-- chris
Wojtek - 07 Mar 2007 21:30 GMT ck wrote :
> Hi, > I would like to get opinion from the group members about efficiency of [quoted text clipped - 4 lines] > That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should > be {1,2,43,4,5,20} Are the values in the int[] always positive?
If so, you could create a boolean array with a size of the maximum positive value of an int (all elements will be false by default), then directly indice the boolean array with the values in the int array setting the value to true. Next traverse the boolean array and count the number of true elements (to get the size of the final array). The final step would be to traverse the boolean array again and store the indice in your int array.
This requires n*3 passes.
 Signature Wojtek :-)
ck - 07 Mar 2007 22:51 GMT > On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote: > Are the values in the int[] always positive? Thank you everyone for sharing opinion about the problem. I was asked to solve this in an interview with time limit of 5 minutes. Had never thought about some problem like this in real life. The length of the list could be anything from units to thousands, and I was not told that the numbers are going to be positive only
> If so, you could create a boolean array with a size of the maximum > positive value of an int (all elements will be false by default), then [quoted text clipped - 5 lines] > > This requires n*3 passes. I actually did not understand this part, I am not sure what is indice.
@ Patricia - Yes I think arraycopy would be a better idea. I am not sure what is "Hashing solution" that you have mentioned.
@ Kai - Not using Collection was a constraint provided by the person who was taking the interview.
I will look for more information on different approaches that you all have suggested.
Thanks
-- Ck http://www.gfour.net
Patricia Shanahan - 07 Mar 2007 23:07 GMT >> On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote: >> Are the values in the int[] always positive? [quoted text clipped - 16 lines] > > I actually did not understand this part, I am not sure what is indice. I would have written "index". Incidentally, this approach is a good idea if the integers are known to all fall in any narrow range, regardless of whether the limits are positive or negative. Just add an offset to each integer before using it as an index.
It may not work so well if the maximum possible range of the integers is large, even if they are all positive.
> @ Patricia - Yes I think arraycopy would be a better idea. I am not > sure what is "Hashing solution" that you have mentioned. Do the equivalent of using HashSet, but write it yourself to avoid java.util. In general, I think it is worth understanding the main algorithms used in java.util. It contributes to understanding their use, and you may sometimes need to code without a nice collections package.
> @ Kai - Not using Collection was a constraint provided by the person > who was taking the interview. > > I will look for more information on different approaches that you all > have suggested. Rather than focusing on our answers to this particular question, which you will probably never meet again, it might be better to spend the time on general study of algorithms and data structures. That would equip you to answer arbitrary questions of this general type.
Patricia
Wojtek - 08 Mar 2007 14:46 GMT Patricia Shanahan wrote :
>> I actually did not understand this part, I am not sure what is indice. > > I would have written "index". Indice is the index value. So I could have said "store the index value".
 Signature Test Sig
Daniel Pitts - 08 Mar 2007 05:13 GMT > > On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote: > > Are the values in the int[] always positive? [quoted text clipped - 30 lines] > -- > Ckhttp://www.gfour.net My solution (in under 5 minuets): // Operational complexity: O(N log N) public int[] uniques(int[] array) { if (array == null || array.length < 2) { return array; } Arrays.sort(array); int top = 1; int last = array[0]; for (int i : array) { if (i != last) { array[top++] = last; } last = i; } final int[] result = new int[top]; System.arraycopy(array, 0, result, 0, top); return result; }
Daniel Pitts - 08 Mar 2007 05:43 GMT > > > On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote: > > > Are the values in the int[] always positive? [quoted text clipped - 51 lines] > > } Oops, there's a bug in that code. Should have been:
public static int[] uniques(int[] array) { if (array == null || array.length < 2) { return array; } Arrays.sort(array); int top = 1; int last = array[0]; for (int i : array) { if (i != last) { // array[top++] = last; // <-- BUG array[top++] = i; } last = i; } final int[] result = new int[top]; System.arraycopy(array, 0, result, 0, top); return result; }
ck - 08 Mar 2007 09:07 GMT On Mar 8, 10:43 am, "Daniel Pitts" <googlegrou...@coloraura.com> wrote:
> Oops, there's a bug in that code. Should have been: > [quoted text clipped - 16 lines] > return result; > } Thanks Daniel, I think there is something wrong with Arrays.sort() method. Let me know if I am wrong instead Consider the code below --- Code --- public void someMethod(){ int [] input = {1,4,-2,3,4,-1}; System.out.println("just input"); printArray(input); System.out.println("Clone input before sorting"); printArray(inputClone); Arrays.sort(inputClone); System.out.println("Clone input after sorting"); printArray(inputClone); } public static void printArray(int [] input){ System.out.print("["); for (int i=0;i<input.length;i++) System.out.print(input[i]+ ", "); System.out.println("]"); } --- Code End ---
I get output as below -- Output -- just input [1, 4, -2, 3, 4, -1, ] Clone input before sorting [1, 4, -2, 3, 4, -1, ] Clone input after sorting [-2, -1, 1, 3, 4, 4, ] <-- bug?? -- output End -- "<-- bug??" is of course not the part of output. Hence I wrote filter2 method as follow which actually fails to give the right output. I think that is due to this probable bug? --- Code --- public static int [] filter2(int [] input){ int [] output= new int[input.length]; int [] inputClone = input.clone(); int pos =0; output[0]=inputClone[0]; for (int i=1;i<inputClone.length;i++){ if (inputClone[i]!=inputClone[i-1]){ output[++pos]=inputClone[i]; } } int [] tempout=new int[pos]; System.arraycopy(output, 0, tempout, 0, pos); return tempout; } --- Code End ---
@ John and @ Particia Any pointers (like recommended online articles or book to brush up these things quickly?) Or at least about "hashing"? I am looking at JDK source code, but haven't been able to make much sense out of the algorithms that they have used. I probably need to practice more, and clear up some fundamentals.
Thanks -- Ck http://www.gfour.net
ck - 08 Mar 2007 09:17 GMT > On Mar 8, 10:43 am, "Daniel Pitts" <googlegrou...@coloraura.com> > wrote: [quoted text clipped - 3 lines] > > if (array == null || array.length < 2) { > > return array; [Clipped ]
> I think there is something wrong with Arrays.sort() method. Let me I think I goofed up by thinking that Arrays.sort() is not giving right output. Sorry about that. I would need to review filter2() method that I have written.
-- Ck http://www.gfour.net
ck - 08 Mar 2007 09:34 GMT > On Mar 8, 10:43 am, "Daniel Pitts" <googlegrou...@coloraura.com> > wrote: [quoted text clipped - 82 lines] > -- > Ckhttp://www.gfour.net ** correction ** Filter2() would be as given below public static int [] filter2(int [] input){ int [] output= new int[input.length]; int [] inputClone = input.clone(); Arrays.sort(inputClone); int pos =0; output[0]=inputClone[0]; for (int i=1;i<inputClone.length;i++){ if (inputClone[i]!=inputClone[i-1]){ output[++pos]=inputClone[i]; } } ++pos; int [] tempout=new int[pos]; System.arraycopy(output, 0, tempout, 0, pos); return tempout; }
-- Ck http://www.gfour.net
Wojtek - 08 Mar 2007 14:44 GMT Wojtek wrote :
> ck wrote : >> Hi, [quoted text clipped - 7 lines] > > Are the values in the int[] always positive? Thought about this some more....
Use two arrays, one for positive integers, the other for negative integers. This would allow for the full range of possible integer values.
And you could do a first pass to grab the highest/lowest numbers in the source array, to limit the size of the arrays you are creating. The extra time to do this may be saved if the values are small (so the compiler does not need to initialize the entire integer range).
 Signature Test Sig
Patricia Shanahan - 08 Mar 2007 14:50 GMT > Wojtek wrote : >> ck wrote : [quoted text clipped - 18 lines] > extra time to do this may be saved if the values are small (so the > compiler does not need to initialize the entire integer range). Remember this is supposed to be efficient. Even on machines with enough memory for two maximum sized arrays, multiple sequential scans of 4GB is going to give the cache replacement algorithms quite a lot of work.
And even a two element input array could contain both extremes.
If the arrays were the only O(N) solution going, I might consider it. In practice, for this problem, I would use the OP's original solution for small input array, and a hash set for large.
Patricia
John W. Kennedy - 08 Mar 2007 03:24 GMT > Hi, > I would like to get opinion from the group members about efficiency of [quoted text clipped - 8 lines] > Could you guys please tell me if the following algorithm would be > considered efficient? No. It will take about .5*n^2 trips through the inner loop, and it's always bad to have a "squared" or "cubed" there.
Patricia has mentioned hashing, which is a good all-round solution. If your array is known to have reasonably small values, though, an even better way is to make an array of boolean, one for each possible value. If it is known to have medium-sized values, it is better yet to make an array of ints with one int for every 32 possible values, each value getting a single bit. (The Bitset class from the collections framework does this automatically.) If the values range up into the tens of millions, though, this usually isn't such a good idea.
 Signature John W. Kennedy "The blind rulers of Logres Nourished the land on a fallacy of rational virtue." -- Charles Williams. "Taliessin through Logres: Prelude"
Free MagazinesGet these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...
|
|
|