Home | Contact Us | FAQ | Search & Site Map | Link to Us
Sign In | Join | Other 45 Sites in Network
HomeAnnouncementsWhite Papers
Discussion GroupsFirst AidDatabasesJavaBeansGUIJava 3DVirtual MachineCORBASecurityToolsGeneral
Java DirectoryOpen Source ProjectsSample Book ChaptersUser GroupsWeb Resources
Related Topics
Databases.NETMore Topics ...

Java Forum / General / March 2007

Tip: Looking for answers? Try searching our database.

Filter unique elements in an array algorithm

Thread view: 
ck - 07 Mar 2007 20:04 GMT
Hi,
I would like to get opinion from the group members about efficiency of
the program below.

I need to filter an input array of int to obtain an output array which
contains distinct elements of the input array.
That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
be {1,2,43,4,5,20}
The condition being I cannot use Collection framework, just use array
to get the most efficient algorithm.
Could you guys please tell me if the following algorithm would be
considered efficient?

public class ArrayUniqueElements {

 public static int [] filter (int [] input) {
   int [] output = new int[input.length];
   output[0]=input[0];
   boolean flag=true;
   int pos=0;
   // iteration for each element in input
   for (int i=1;i<input.length;i++){
     for(int j=0;j<=i;j++){
       if(input[i]==output[j]){
           flag=false;
           break; // exit out of inner loop on first duplicate element
           }
       }
     if(flag){
        output[++pos]=input[i];
     } else
        flag=true;
     }
   int [] tempOutput = new int[pos];
   // This for loop is required to filter out default values from the
output
   for (int i = 0; i < pos; i++) {
        tempOutput[i]=output[i];
    }
   return tempOutput;
 }
}

--
Ck
http://www.gfour.net
Patricia Shanahan - 07 Mar 2007 20:28 GMT
> Hi,
> I would like to get opinion from the group members about efficiency of
[quoted text clipped - 8 lines]
> Could you guys please tell me if the following algorithm would be
> considered efficient?

"guys"?

For long arrays, the Omega(N*N) performance is not very good. A hashing
solution would be O(N).

Are you allowed to use System.arraycopy instead of the final loop?

Patricia
John W. Kennedy - 08 Mar 2007 04:05 GMT
>> Could you guys please tell me if the following algorithm would be
>> considered efficient?

> "guys"?

Hey, I've heard women address groups made up entirely of other women as
"guys".

Signature

John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"

David Segall - 09 Mar 2007 02:27 GMT
>> Could you guys please tell me if the following algorithm would be
>> considered efficient?
>
>"guys"?
Yes guys! Here is more information than you wish to know about its
usage <http://www.research.umbc.edu/~korenman/wmst/guys.html>. In
Australia it is also used to include women, at least by guys under
forty.
Kai Schwebke - 07 Mar 2007 20:28 GMT
Hello,

ck schrieb:
> Hi,
> I would like to get opinion from the group members about efficiency of
> the program below.

The program is not efficient - it's complexity is O(n^2). The task could
be solved with O(n*log n) using some sorting algorithm or even faster
with hashing techniques.

> The condition being I cannot use Collection framework, just use array
> to get the most efficient algorithm.

Don't exclude Collection framework just for performance reasons.
A HashSet or TreeSet will solve the task much more efficient than
your code.

Kai
Chris Uppal - 07 Mar 2007 21:26 GMT
> I need to filter an input array of int to obtain an output array which
> contains distinct elements of the input array.
> That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
> be {1,2,43,4,5,20}
> The condition being I cannot use Collection framework, just use array
> to get the most efficient algorithm.

If you are allowed to assume that the range of values to be checked is fairly
small, then you can get a linear algorithm by using a boolean[] array.  How big
"fairly small" is depends on your instructor.

   -- chris
Wojtek - 07 Mar 2007 21:30 GMT
ck wrote :
> Hi,
> I would like to get opinion from the group members about efficiency of
[quoted text clipped - 4 lines]
> That is if my input array is {1,2,43,5,2,4,5,20,2,1} the output should
> be {1,2,43,4,5,20}

Are the values in the int[] always positive?

If so, you could create a boolean array with a size of the maximum
positive value of an int (all elements will be false by default), then
directly indice the boolean array with the values in the int array
setting the value to true. Next traverse the boolean array and count
the number of true elements (to get the size of the final array). The
final step would be to traverse the boolean array again and store the
indice in your int array.

This requires n*3 passes.

Signature

Wojtek :-)

ck - 07 Mar 2007 22:51 GMT
> On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote:
> Are the values in the int[] always positive?

Thank you everyone for sharing opinion about the problem. I was asked
to solve this in an interview with time limit of 5 minutes. Had never
thought about some problem like this in real life.
The length of the list could be anything from units to thousands, and
I was not told that the numbers are going to be positive only

> If so, you could create a boolean array with a size of the maximum
> positive value of an int (all elements will be false by default), then
[quoted text clipped - 5 lines]
>
> This requires n*3 passes.

I actually did not understand this part, I am not sure what is indice.

@ Patricia - Yes I think arraycopy would be a better idea. I am not
sure what is "Hashing solution" that you have mentioned.

@ Kai - Not using Collection was a constraint provided by the person
who was taking the interview.

I will look for more information on different approaches that you all
have suggested.

Thanks

--
Ck
http://www.gfour.net
Patricia Shanahan - 07 Mar 2007 23:07 GMT
>> On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote:
>> Are the values in the int[] always positive?
[quoted text clipped - 16 lines]
>
> I actually did not understand this part, I am not sure what is indice.

I would have written "index". Incidentally, this approach is a good idea
if the integers are known to all fall in any narrow range, regardless of
whether the limits are positive or negative. Just add an offset to each
integer before using it as an index.

It may not work so well if the maximum possible range of the integers is
large, even if they are all positive.

> @ Patricia - Yes I think arraycopy would be a better idea. I am not
> sure what is "Hashing solution" that you have mentioned.

Do the equivalent of using HashSet, but write it yourself to avoid
java.util. In general, I think it is worth understanding the main
algorithms used in java.util. It contributes to understanding their use,
and you may sometimes need to code without a nice collections package.

> @ Kai - Not using Collection was a constraint provided by the person
> who was taking the interview.
>
> I will look for more information on different approaches that you all
> have suggested.

Rather than focusing on our answers to this particular question, which
you will probably never meet again, it might be better to spend the time
on general study of algorithms and data structures. That would equip you
to answer arbitrary questions of this general type.

Patricia
Wojtek - 08 Mar 2007 14:46 GMT
Patricia Shanahan wrote :
>> I actually did not understand this part, I am not sure what is indice.
>
> I would have written "index".

Indice is the index value. So I could have said "store the index
value".

Signature

Test Sig

Daniel Pitts - 08 Mar 2007 05:13 GMT
> > On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote:
> > Are the values in the int[] always positive?
[quoted text clipped - 30 lines]
> --
> Ckhttp://www.gfour.net

My solution (in under 5 minuets):
// Operational complexity: O(N log N)
public int[] uniques(int[] array) {
   if (array == null || array.length < 2) {
      return array;
   }
   Arrays.sort(array);
   int top = 1;
   int last = array[0];
   for (int i : array) {
      if (i != last) {
         array[top++] = last;
      }
      last = i;
   }
   final int[] result = new int[top];
   System.arraycopy(array, 0, result, 0, top);
   return result;
}
Daniel Pitts - 08 Mar 2007 05:43 GMT
> > > On Mar 8, 2:30 am, Wojtek <nowh...@a.com> wrote:
> > > Are the values in the int[] always positive?
[quoted text clipped - 51 lines]
>
> }

Oops, there's a bug in that code. Should have been:

 public static int[] uniques(int[] array) {
   if (array == null || array.length < 2) {
     return array;
   }
   Arrays.sort(array);
   int top = 1;
   int last = array[0];
   for (int i : array) {
     if (i != last) {
       // array[top++] = last; // <-- BUG
       array[top++] = i;
     }
     last = i;
   }
   final int[] result = new int[top];
   System.arraycopy(array, 0, result, 0, top);
   return result;
 }
ck - 08 Mar 2007 09:07 GMT
On Mar 8, 10:43 am, "Daniel Pitts" <googlegrou...@coloraura.com>
wrote:
> Oops, there's a bug in that code. Should have been:
>
[quoted text clipped - 16 lines]
>     return result;
>   }

Thanks Daniel,
I think there is something wrong with Arrays.sort() method. Let me
know if I am wrong instead
Consider the code below
--- Code ---
public void someMethod(){
int [] input = {1,4,-2,3,4,-1};
System.out.println("just input");
printArray(input);
System.out.println("Clone input before sorting");
printArray(inputClone);
Arrays.sort(inputClone);
System.out.println("Clone input after sorting");
printArray(inputClone);
}
public static void printArray(int [] input){
 System.out.print("[");
 for (int i=0;i<input.length;i++)
 System.out.print(input[i]+ ", ");
 System.out.println("]");
}
--- Code End ---

I get output as below
-- Output --
just input
[1, 4, -2, 3, 4, -1, ]
Clone input before sorting
[1, 4, -2, 3, 4, -1, ]
Clone input after sorting
[-2, -1, 1, 3, 4, 4, ]  <-- bug??
-- output End --
"<-- bug??" is of course not the part of output.
Hence I wrote filter2 method as follow which actually fails to give
the right output. I think that is due to this probable bug?
--- Code ---
public static int [] filter2(int [] input){
int [] output= new int[input.length];
int [] inputClone = input.clone();
int pos =0;
output[0]=inputClone[0];
for (int i=1;i<inputClone.length;i++){
  if (inputClone[i]!=inputClone[i-1]){
    output[++pos]=inputClone[i];
   }
 }
 int [] tempout=new int[pos];
 System.arraycopy(output, 0, tempout, 0, pos);
 return tempout;
}
--- Code End ---

@ John and @ Particia
Any pointers (like recommended online articles or book to brush up
these things quickly?) Or at least about "hashing"? I am looking at
JDK source code, but haven't been able to make much sense out of the
algorithms that they have used. I probably need to practice more, and
clear up some fundamentals.

Thanks
--
Ck
http://www.gfour.net
ck - 08 Mar 2007 09:17 GMT
> On Mar 8, 10:43 am, "Daniel Pitts" <googlegrou...@coloraura.com>
> wrote:
[quoted text clipped - 3 lines]
> >     if (array == null || array.length < 2) {
> >       return array;
[Clipped ]
> I think there is something wrong with Arrays.sort() method. Let me

I think I goofed up by thinking that Arrays.sort() is not giving right
output. Sorry about that. I would need to review filter2() method that
I have written.

--
Ck
http://www.gfour.net
ck - 08 Mar 2007 09:34 GMT
> On Mar 8, 10:43 am, "Daniel Pitts" <googlegrou...@coloraura.com>
> wrote:
[quoted text clipped - 82 lines]
> --
> Ckhttp://www.gfour.net
** correction **
Filter2() would be as given below
    public static int [] filter2(int [] input){
        int [] output= new int[input.length];
        int [] inputClone = input.clone();
        Arrays.sort(inputClone);
        int pos =0;
        output[0]=inputClone[0];
        for (int i=1;i<inputClone.length;i++){
            if (inputClone[i]!=inputClone[i-1]){
                output[++pos]=inputClone[i];
            }
        }
        ++pos;
        int [] tempout=new int[pos];
        System.arraycopy(output, 0, tempout, 0, pos);
        return tempout;
    }

--
Ck
http://www.gfour.net
Wojtek - 08 Mar 2007 14:44 GMT
Wojtek wrote :
> ck wrote :
>> Hi,
[quoted text clipped - 7 lines]
>
> Are the values in the int[] always positive?

Thought about this some more....

Use two arrays, one for positive integers, the other for negative
integers. This would allow for the full range of possible integer
values.

And you could do a first pass to grab the highest/lowest numbers in the
source array, to limit the size of the arrays you are creating. The
extra time to do this may be saved if the values are small (so the
compiler does not need to initialize the entire integer range).

Signature

Test Sig

Patricia Shanahan - 08 Mar 2007 14:50 GMT
> Wojtek wrote :
>> ck wrote :
[quoted text clipped - 18 lines]
> extra time to do this may be saved if the values are small (so the
> compiler does not need to initialize the entire integer range).

Remember this is supposed to be efficient. Even on machines with enough
memory for two maximum sized arrays, multiple sequential scans of 4GB is
going to give the cache replacement algorithms quite a lot of work.

And even a two element input array could contain both extremes.

If the arrays were the only O(N) solution going, I might consider it. In
practice, for this problem, I would use the OP's original solution for
small input array, and a hash set for large.

Patricia
John W. Kennedy - 08 Mar 2007 03:24 GMT
> Hi,
> I would like to get opinion from the group members about efficiency of
[quoted text clipped - 8 lines]
> Could you guys please tell me if the following algorithm would be
> considered efficient?

No. It will take about .5*n^2 trips through the inner loop, and it's
always bad to have a "squared" or "cubed" there.

Patricia has mentioned hashing, which is a good all-round solution. If
your array is known to have reasonably small values, though, an even
better way is to make an array of boolean, one for each possible value.
If it is known to have medium-sized values, it is better yet to make an
array of ints with one int for every 32 possible values, each value
getting a single bit. (The Bitset class from the collections framework
does this automatically.) If the values range up into the tens of
millions, though, this usually isn't such a good idea.

Signature

John W. Kennedy
"The blind rulers of Logres
Nourished the land on a fallacy of rational virtue."
  -- Charles Williams.  "Taliessin through Logres: Prelude"



Free Magazines

Get these publications absolutely FREE for up to 12 months. There are no hidden fees and no obligation. Simply choose a title, complete the application form and submit it. Read more ...

Oracle MagazineNetwork ComputingComputer WorldBio-IT WorldeWeekInformation WeekInfosecurity
 
Sign In
Join
My Latest Posts
My Monitored Threads
My Blog
My Photo Gallery
My Profile
My Homepage

Start New Thread
Enable EMail Alerts
Rate this Thread



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.