Hi,
I am really stuck with my program that will compare two strings and
assign scores between pairs of letters.
My sequences are read in OK:
seq1 = ATCGTCGTA
seq2 = TCGTACTAA
A second file, formatted like so...
A A 1
T T 1
C C 1
G G 1
is now in a 2D char array.
I just cannot think how on earth to get my values out of my array. I
know if I get past this milestone I will be fine (I do not have much
Java experience or knowledge at my disposal - in fact it is all I can do
to understand the code I have written).
thanks,
Anna
// To create a dot plot
import java.io.*;
import java.util.*;
class BlackBox3
{
    public static void main(String[] args)
    {
        int count = 0;
        String allChars = "";
        int loopcount;
        int i1 = 0;
        int i2 = 0;
        int array1 = 0;
        int array2 = 0;
        String t, c1 = "", c2 = "", c3 = "";
        StringTokenizer st;
        boolean flag = false;
        char tempChar;
        String s;
        char anna = 'D';
        char[][] triplet;
        try
        {
            InputStreamReader input = new InputStreamReader(System.in);
            BufferedReader keyboardInput = new BufferedReader(input);
            String File1;
            System.out.print("Enter name of file 1: ");
            File1 = keyboardInput.readLine();
            FileReader file=new FileReader (File1);
            BufferedReader buffer=new BufferedReader(file);
            String seq1;
            seq1 = buffer.readLine();
            System.out.println("Sequence 1: " + seq1);
            String seq2;
            seq2 = buffer.readLine();
            System.out.println("Sequence 2: " + seq2);
            buffer.close();
            String File2;
            System.out.print("Enter name of file 2: ");
            File2 = keyboardInput.readLine();
            FileReader file2=new FileReader (File2);
            BufferedReader buffer2=new BufferedReader(file2);
            t = buffer2.readLine();
            while(t != null)
            {
                st = new StringTokenizer(t);
                for(int stCount = 0; stCount <2; stCount++)
                {
                    s=st.nextToken();
                    if (stCount == 0)
                    {
                        c1 = c1 + s;
                    }
                    else
                    {
                        c2 = c2 + s;
                    }
                    if (count == 0)
                    {
                        count++;
                        allChars = s;
                    }
                    else
                    {
                        flag = false;
                        tempChar = s.charAt(0);
                        for (loopcount = 0; loopcount < allChars.length(); loopcount++)
                        {
                            if(tempChar == allChars.charAt(loopcount))
                            {
                                flag = true;
                            }
                        }
                        if (flag == false)
                        {
                            allChars = allChars + s;
                        }
                    }
                }
                s = st.nextToken();
                c3 = c3 + s;
                t = buffer2.readLine();
            }
            System.out.println(c1 + c2 + c3);
            System.out.println("allChars :" + allChars);
            buffer2.close();
            triplet = new char[allChars.length()][allChars.length()];
            for(int column1 = 0; column1 < allChars.length(); column1++)
            {
                for(int column2 = 0; column2 < allChars.length(); column2++)
                {
                    triplet[column1][column2]='0';
                }
            }
            for(int column3 = 0; column3 < allChars.length(); column3++)
            {
                triplet[column3][column3]=c3.charAt(column3);
            }
            /////////////////////////////////////////////////////////////////////////////////////
            // I am trying to say: when seq1.charAt = O... I don't know, I'm so stuck
            //
            // for(int loop0 = 0; loop0 < seq1.length(); loop0++)
            // {
            //     for(int loop1 = 0; loop1 < seq1.length(); loop1++)
            //     {
            //         for(int loop2 = 0; loop2 < allChars.length(); loop2++)
            //         {
            //             if(seq1.charAt(loop1) == c1.charAt(loop2))
            //             {
            //                 i1 = loop2;
            //             }else{}
            //         }
            //     }
            //     for(int loop3 = 0; loop3 < seq1.length(); loop3++)
            //     {
            //         for(int loop4 = 0; loop4 < allChars.length(); loop4++)
            //         {
            //             if(seq2.charAt(loop3) == c2.charAt(loop4))
            //             {
            //                 i2 = loop4;
            //             }else{}
            //         }
            //     }
            //     System.out.print(triplet[i1][i2]);
            // }
            /////////////////////////////////////////////////////////////////////////////////////
        }
        catch( IOException e ) {System.out.println(e);}
    }
}
Oliver Wong - 24 Jan 2006 19:51 GMT
Might help if you specify how scores are calculated between pairs of
letters. Not all of us are experts in biology. Based on your "second file",
it would seem like every letter gets a score of 1. So would the score then be
equal to the length of the sequence?
Or are you looking for the longest common subsequence between two DNA
strands, and scoring is just one of the metrics you're using for some sort
of heuristic-based search, or what?
- Oliver
Fred Kleinschmidt - 24 Jan 2006 20:23 GMT
You don't say what you mean by "compare two strings", or what "assign a
score" means.
Do you mean that if seq1[i] is A and seq2[i] is G, then you assign score[i]
to be equal to the score (taken from the second file) assigned to the
sequence "AG" ?
Here's a simplistic way:
If you only have 4 different letters (A,T,C,G) then think about assigning 1,
2, 3, 4 to the different letters, and place in an int array, so seq1 becomes
iseq1={1,2,3,4,2,3,4,2,1}, etc. Then add iseq1[i] + 10*iseq2[i] and
compare to a score table created from the letter pairs:
AA becomes 11, so score[11] is set to 1
TT becomes 22, so score[22] is set to 1
AT becomes 21, so score[21] is set to n, where n is from the second file's
line "A T n"
Refine this with a larger multiplier (for example a power of 2 at least as
big as the number of letters) if there are more than 10 different letters.
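Fred's numeric-pair idea can be sketched in Java roughly as follows. This is a minimal sketch, not his code: the class and method names (PairCodeScores, code, pairKey, buildScoreTable, positionScores) are made up for illustration, it hard-codes the A=1, T=2, C=3, G=4 ordering from the post, treats the first letter on each line of the second file as coming from seq1, and scores position i of seq1 only against position i of seq2, so both sequences are assumed to be the same length.

```java
import java.util.Arrays;

public class PairCodeScores {
    // Hypothetical letter ordering, giving codes A=1, T=2, C=3, G=4 as in the post.
    static final String LETTERS = "ATCG";

    // Map one letter to its 1-based code.
    static int code(char letter) {
        int idx = LETTERS.indexOf(letter);
        if (idx < 0) {
            throw new IllegalArgumentException("Unknown letter: " + letter);
        }
        return idx + 1;
    }

    // Combine a seq1 letter and a seq2 letter into one index: iseq1[i] + 10 * iseq2[i].
    static int pairKey(char fromSeq1, char fromSeq2) {
        return code(fromSeq1) + 10 * code(fromSeq2);
    }

    // Build the score table from lines of the form "X Y n"; unlisted pairs stay 0.
    static int[] buildScoreTable(String[] lines) {
        int n = LETTERS.length();
        // The largest possible key is n + 10 * n (44 for four letters).
        int[] score = new int[10 * n + n + 1];
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            score[pairKey(parts[0].charAt(0), parts[1].charAt(0))] = Integer.parseInt(parts[2]);
        }
        return score;
    }

    // Score position i of seq1 against position i of seq2 (equal lengths assumed).
    static int[] positionScores(int[] score, String seq1, String seq2) {
        int[] result = new int[seq1.length()];
        for (int i = 0; i < seq1.length(); i++) {
            result[i] = score[pairKey(seq1.charAt(i), seq2.charAt(i))];
        }
        return result;
    }

    public static void main(String[] args) {
        int[] score = buildScoreTable(new String[] {"A A 1", "T T 1", "C C 1", "G G 1"});
        System.out.println(Arrays.toString(positionScores(score, "ATCGTCGTA", "TCGTACTAA")));
    }
}
```

With the four identity lines and the two example sequences, main prints [0, 0, 0, 0, 0, 1, 0, 0, 1]: only the C/C pair at the sixth position and the A/A pair at the ninth score anything.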

--
Fred L. Kleinschmidt
Boeing Associate Technical Fellow
Technical Architect, Software Reuse Project
annascalise@gmail.com - 24 Jan 2006 21:46 GMT
I should have said that I need to output a dot plot like this...
1 0 0 0
0 1 0 0
0 0 1 0
0 0 0 1
This would be the dot plot resulting from the comparison of ATCG with
ATCG using the scoring...
A A 1
T T 1
C C 1
G G 1
where 0, 1 or any other value specified in file 2 indicates how 'good' a
match letter [i] of sequence 1 is with letter [j] of sequence 2
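Given that definition (row i for letter i of sequence 1, column j for letter j of sequence 2), one way to build the plot is to look each pair up by name rather than by position in a 2D char array. The sketch below is an illustration, not the poster's code: it assumes the first letter on each line of file 2 belongs to sequence 1, that scores are whole numbers, and that any pair not listed scores 0. DotPlot, readScores and dotPlot are hypothetical names, and the file lines are hard-coded instead of being read with a BufferedReader.

```java
import java.util.HashMap;
import java.util.Map;

public class DotPlot {
    // Read "X Y n" lines into a map keyed by the two-letter string "XY".
    static Map<String, Integer> readScores(String[] lines) {
        Map<String, Integer> scores = new HashMap<>();
        for (String line : lines) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 3) {   // skip blank or malformed lines
                scores.put(parts[0] + parts[1], Integer.parseInt(parts[2]));
            }
        }
        return scores;
    }

    // Row i compares seq1.charAt(i) with every letter of seq2; unlisted pairs score 0.
    static int[][] dotPlot(String seq1, String seq2, Map<String, Integer> scores) {
        int[][] plot = new int[seq1.length()][seq2.length()];
        for (int i = 0; i < seq1.length(); i++) {
            for (int j = 0; j < seq2.length(); j++) {
                // The leading "" makes this string concatenation, not char addition.
                String pair = "" + seq1.charAt(i) + seq2.charAt(j);
                plot[i][j] = scores.getOrDefault(pair, 0);
            }
        }
        return plot;
    }

    public static void main(String[] args) {
        Map<String, Integer> scores = readScores(new String[] {"A A 1", "T T 1", "C C 1", "G G 1"});
        int[][] plot = dotPlot("ATCG", "ATCG", scores);
        // Print each row with single spaces between the values.
        for (int[] row : plot) {
            StringBuilder sb = new StringBuilder();
            for (int j = 0; j < row.length; j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                sb.append(row[j]);
            }
            System.out.println(sb);
        }
    }
}
```

For ATCG against ATCG with the four identity lines, main prints the 4 by 4 plot shown above. Keying a HashMap on the two-letter string also picks up off-diagonal lines such as "A T 2", which a table filled only along its diagonal would miss.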
jtowey7@gmail.com - 24 Jan 2006 23:21 GMT
With help of a friend I figured it out, here's the code...
JB
/////////////////////////////////////////////////////////////////////////////////////
char currentDotValue;
for(int loop0 = 0; loop0 < seq1.length(); loop0++)
{
    for(int loop1 = 0; loop1 < seq1.length(); loop1++)
    {
        i1 = allChars.indexOf(seq1.charAt(loop0));
        i2 = allChars.indexOf(seq1.charAt(loop1));
        currentDotValue = triplet[i1][i2];
        System.out.print(currentDotValue);
    }
    System.out.println();
}
/////////////////////////////////////////////////////////////////////////////////////
Oliver Wong - 24 Jan 2006 23:42 GMT
Ah... now it's clear what you wanted.
I think your code (erroneously?) assumes that the first sequence and the
second sequence are identical in length and content.
- Oliver
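One way to act on Oliver's point while keeping the original allChars/triplet approach is to let the inner loop walk seq2 instead of seq1, so rows follow sequence 1 and columns follow sequence 2. This is a hedged sketch with hypothetical names (DotPlotFix, buildTriplet, plot). It also fills triplet by looking up the row and column of each line's two letters, rather than only along the diagonal as BlackBox3 does; it assumes every score is a single character (as c3.charAt implies) and prints '0' for any sequence letter that never appears in the score file.

```java
import java.util.Arrays;

public class DotPlotFix {
    // allChars as BlackBox3 builds it from the example score file.
    static final String ALL_CHARS = "ATCG";

    // c1, c2 and c3 hold columns 1, 2 and 3 of the score file, one character per line.
    // Row = index of the column-1 letter, column = index of the column-2 letter.
    static char[][] buildTriplet(String allChars, String c1, String c2, String c3) {
        int n = allChars.length();
        char[][] triplet = new char[n][n];
        for (char[] row : triplet) {
            Arrays.fill(row, '0');   // unlisted pairs score '0'
        }
        for (int k = 0; k < c3.length(); k++) {
            int rowIdx = allChars.indexOf(c1.charAt(k));
            int colIdx = allChars.indexOf(c2.charAt(k));
            triplet[rowIdx][colIdx] = c3.charAt(k);
        }
        return triplet;
    }

    // Rows follow seq1 and columns follow seq2, so the sequences may differ.
    static String plot(String seq1, String seq2, String allChars, char[][] triplet) {
        StringBuilder out = new StringBuilder();
        for (int loop0 = 0; loop0 < seq1.length(); loop0++) {
            int i1 = allChars.indexOf(seq1.charAt(loop0));
            for (int loop1 = 0; loop1 < seq2.length(); loop1++) {
                int i2 = allChars.indexOf(seq2.charAt(loop1));
                // indexOf returns -1 for a letter missing from the score file
                out.append(i1 < 0 || i2 < 0 ? '0' : triplet[i1][i2]);
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        char[][] triplet = buildTriplet(ALL_CHARS, "ATCG", "ATCG", "1111");
        System.out.print(plot("ATCG", "TCGA", ALL_CHARS, triplet));
    }
}
```

Running main prints 0001, 1000, 0100 and 0010 on four lines, since ATCG is compared against the rotated sequence TCGA; with the original seq1-only inner loop the same call would have printed the identity pattern regardless of seq2.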
Dimitri Maziuk - 25 Jan 2006 17:16 GMT
annascalise@gmail.com sez:
> java experience or knowledge at my disposal - in fact it all i can do
> to understand the code I have wrote)
Heh. I'd suggest you read about FASTA, BLAST, and NLP algorithms
first.
Dima

--
We're sysadmins. Sanity happens to other people. -- Chris King
Oliver Wong - 25 Jan 2006 18:59 GMT
> Heh. I'd suggest you read about FASTA, BLAST, and NLP algorithms
> first.
Not sure that that would be the order I would recommend, given that the
OP seems to be having problems grasping nested for-loops.
- Oliver
Dimitri Maziuk - 26 Jan 2006 17:13 GMT
Oliver Wong sez:
...
>> Heh. I'd suggest you read about FASTA, BLAST, and NLP algorithms
>> first.
>
> Not sure that that would be the order I would recommend, given that the
> OP seems to be having problems grasping nested for-loops.
One can hope she'd give up somewhere in the middle of BLAST and
consider a career in lawnmowing instead.
Dima

--
Q276304 - Error Message: Your Password Must Be at Least 18770 Characters
and Cannot Repeat Any of Your Previous 30689 Passwords -- RISKS 21.37
Luc The Perverse - 26 Jan 2006 23:58 GMT
That is either a very sexist or very negative opinion.
People just starting out in programming need encouragement - and things like
loops can seem exceptionally abstract to someone who has never thought along
those lines.
--
LTP
:)
Dimitri Maziuk - 27 Jan 2006 17:12 GMT
Luc The Perverse sez:
> That is either a very sexist or very negative opinion.
Sexist? And here I thought "he" was the sexist one.
> People just starting out in programming need encouragement - and things like
> loops can seem exceptionally abstract to someone who has never thought along
> those lines
People just starting out in programming should not attempt to
code gene sequence matching algorithms. If you knew anything
about those algorithms, you'd know that.
Dima

--
I have not been able to think of any way of describing Perl to [person]
"Hello, blind man? This is color." -- DPM