Hi all,
Thanks for all the answers. First of all, I looked at all the tools you
suggested. Swish-e looks great; I'm supposed to build a very simple one
like Swish-e, so they are good examples for me.
As Julian said, I started with these steps:
first parse the HTML, then exclude tags and unwanted words, then index
them.
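A minimal Java sketch of those first steps (parse the HTML, drop the tags, drop unwanted words), assuming a crude regex is acceptable for stripping tags and using a small hypothetical stop-word list. `HtmlTokenizer` and `STOP_WORDS` are illustrative names, not part of any library:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: turns raw HTML into the list of words worth indexing.
// Regex tag stripping is only an approximation (no entity decoding, no
// handling of malformed markup); a real HTML parser would be more robust.
class HtmlTokenizer {
    // Hypothetical stop-word list standing in for the "unwanted words".
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "the", "of", "to", "in", "is");

    static List<String> tokenize(String html) {
        // 1. Remove <script>/<style> blocks, then every remaining <...> tag.
        String text = html.replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")
                          .replaceAll("<[^>]*>", " ");
        // 2. Lower-case and split on anything that is not a letter or digit.
        List<String> words = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            // 3. Skip empty fragments and stop words.
            if (!w.isEmpty() && !STOP_WORDS.contains(w)) {
                words.add(w);
            }
        }
        return words;
    }
}
```

For example, `tokenize("<p>A Simple <b>Investigation</b> of the field</p>")` keeps `simple`, `investigation` and `field`.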
The question is how to index. One more point, which I don't know how to
explain well; maybe this example helps: if a page has a word like
"investigation", I have a tool which separates that word into related forms
(investigation, investigate, ...), and I will index that link under
these words.
I'm planning the structure so that if someone searches for "investigation",
the first results will be pages with "investigation", then "investigate", and so on.
I hope I'm clear.
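One way to get that ordering, sketched in Java under stated assumptions: index each page only under the words it actually contains, and expand the query at search time instead. The `variants` map stands in for the poster's word-form tool, and `VariantIndex` is a hypothetical name:

```java
import java.util.*;

// Sketch of an inverted index that ranks exact word matches above matches on
// related word forms. The variants map is a stand-in for an external
// word-form tool (hypothetical data supplied by the caller).
class VariantIndex {
    // word as it appears in a page -> links of pages containing it (insertion order kept)
    private final Map<String, Set<String>> postings = new HashMap<>();
    private final Map<String, List<String>> variants;

    VariantIndex(Map<String, List<String>> variants) {
        this.variants = variants;
    }

    void addPage(String link, List<String> words) {
        for (String w : words) {
            postings.computeIfAbsent(w, k -> new LinkedHashSet<>()).add(link);
        }
    }

    // Pages with the exact word first, then pages with each related form.
    // LinkedHashSet drops duplicates while keeping a page at its first (best) rank.
    List<String> search(String word) {
        Set<String> result = new LinkedHashSet<>(postings.getOrDefault(word, Set.of()));
        for (String v : variants.getOrDefault(word, List.of())) {
            result.addAll(postings.getOrDefault(v, Set.of()));
        }
        return new ArrayList<>(result);
    }
}
```

Keeping the stored words unexpanded is the design choice here: if a page were filed under every variant, the index could no longer tell an exact match from a related form, and that distinction is what the ranking needs.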
So far everything is OK; the question is the indexing algorithm and
what else to index. It shouldn't be too complicated. There may be one more
important thing I should index; again, I'll give an example:
if we search for "simple investigation", the first results should be the pages which
contain "simple investigation", and then the pages with
"simple ... investigation" (both words, but not next to each other).
Only these two criteria are important for me, which is why I need to find
that kind of indexing algorithm.
Thank you
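Before building any index, the two criteria can be checked by brute force over each page's word list (for example, the words left after parsing and filtering). A minimal Java sketch; `PhraseRanker` is a hypothetical name, and it relies on the standard `Collections.indexOfSubList` for the "words next to each other" test and `containsAll` for the "both words somewhere" test:

```java
import java.util.*;

// Two-tier ranking done by brute force over each page's word list (no index):
// pages containing the query words as an exact phrase come first, then pages
// that contain all the query words but not next to each other.
class PhraseRanker {
    static List<String> rank(Map<String, List<String>> pages, List<String> query) {
        List<String> phraseHits = new ArrayList<>();
        List<String> allWordHits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : pages.entrySet()) {
            List<String> words = e.getValue();
            if (Collections.indexOfSubList(words, query) >= 0) {
                phraseHits.add(e.getKey());     // words adjacent and in order
            } else if (words.containsAll(query)) {
                allWordHits.add(e.getKey());    // words present, but apart
            }
        }
        phraseHits.addAll(allWordHits);         // phrase matches stay on top
        return phraseHits;
    }
}
```

This reads every page's words on every search, which may be acceptable for a small collection of pages.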
> >>>> I have almost 100 HTML pages on my local disk, and first I have to
> >>>> index them; then I have to make a simple word search on that index to
> [...]
>
> Julian
Julian Treadwell - 06 Dec 2006 00:37 GMT
One way to allow phrase searching would be to include the word position
in your index table.
So the table structure would be:
field1: word (key)
field2: page # (multi-value)
field3: position (multi-value, linked to page)
So if the user searches for "simple investigation" and your search
program found "simple" on page 100 at position 32 and "investigation" on
page 100 at position 33, it could decide there's a phrase match and list
page 100 at the top of the list.
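Julian's table can be sketched in memory as a map from word to page number to positions. A minimal Java version, assuming integer page numbers and word positions counted from the start of each page; `PositionalIndex` is a hypothetical name:

```java
import java.util.*;

// Julian's table as an in-memory structure:
// word (key) -> page # -> positions of the word on that page.
class PositionalIndex {
    private final Map<String, Map<Integer, List<Integer>>> table = new HashMap<>();

    void add(String word, int page, int position) {
        table.computeIfAbsent(word, w -> new TreeMap<>())
             .computeIfAbsent(page, p -> new ArrayList<>())
             .add(position);
    }

    // Pages where the query words occur at consecutive positions.
    Set<Integer> phrasePages(List<String> query) {
        Set<Integer> result = new TreeSet<>();
        if (query.isEmpty()) return result;
        Map<Integer, List<Integer>> first = table.getOrDefault(query.get(0), Map.of());
        for (Map.Entry<Integer, List<Integer>> e : first.entrySet()) {
            int page = e.getKey();
            for (int start : e.getValue()) {
                if (matchesAt(query, page, start)) {
                    result.add(page);
                    break;                      // one phrase match per page is enough
                }
            }
        }
        return result;
    }

    // True if query word i sits at position start + i on the page, for every i.
    private boolean matchesAt(List<String> query, int page, int start) {
        for (int i = 1; i < query.size(); i++) {
            List<Integer> positions = table.getOrDefault(query.get(i), Map.of()).get(page);
            if (positions == null || !positions.contains(start + i)) return false;
        }
        return true;
    }
}
```

Pages that contain all the words but are not returned by `phrasePages` would then be listed after the phrase matches.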
ibrahimover@gmail.com - 06 Dec 2006 12:40 GMT
Hi, I forgot to say there is a problem:
I'm not allowed to use any DB, so I have to solve this issue with text
files.
I'm planning to make an index file which has something like:
investigate|5
investigation|7
field|56
...
I should keep these words in some order, like a B-tree,
then search for "investigation" in that index; when I find it, I will get
the pointer "7".
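A sketch of writing and reading such a `word|pointer` file in alphabetical order, assuming UTF-8 text and one entry per line; `SortedIndexFile` is a hypothetical name. With almost 100 pages the file is likely small, so reloading it into a `java.util.TreeMap` (a balanced search tree) at startup is one sequential read rather than a rebuild from the HTML:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;

// Writes a "word|pointer" index file in alphabetical order and reads it back
// into a TreeMap for lookups. Hypothetical helper for a small index.
class SortedIndexFile {
    static void save(Path file, Map<String, Integer> index) throws IOException {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Integer> e : new TreeMap<>(index).entrySet()) {
            lines.add(e.getKey() + "|" + e.getValue());   // e.g. "investigation|7"
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }

    static TreeMap<String, Integer> load(Path file) throws IOException {
        TreeMap<String, Integer> index = new TreeMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int bar = line.lastIndexOf('|');
            if (bar < 0) continue;                         // skip malformed lines
            index.put(line.substring(0, bar), Integer.parseInt(line.substring(bar + 1)));
        }
        return index;
    }
}
```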
But the problem is that I don't want to build the B-tree every time, so I guess I
have to know how to implement a B-tree over a text file (how to
add/delete/search on disk instead of in memory), but I'm not sure; I just
thought of this with my limited knowledge.
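If the file should be searched without loading it, one alternative to an on-disk B-tree (a different, simpler technique than the one asked about) is binary search over fixed-width records: record k starts at byte k * RECORD_BYTES, so `RandomAccessFile.seek` can jump straight to the middle. A sketch assuming ASCII words of at most 24 characters and pointers of at most 8 digits (both hypothetical limits); `FixedWidthIndex` is a hypothetical name. Adding or deleting a word means rewriting the file, which may be acceptable for an index that changes rarely:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Sorted entries stored as fixed-width records so that record k starts at
// byte k * RECORD_BYTES; lookups binary-search the file with seek().
// Assumes ASCII words <= WORD_WIDTH chars and pointers <= PTR_WIDTH digits.
class FixedWidthIndex {
    static final int WORD_WIDTH = 24, PTR_WIDTH = 8;
    static final int RECORD_BYTES = WORD_WIDTH + PTR_WIDTH + 1;   // +1 for '\n'

    static void write(String path, SortedMap<String, Integer> entries) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            f.setLength(0);                                       // truncate old contents
            for (Map.Entry<String, Integer> e : entries.entrySet()) {
                // "\n" rather than "%n" so every record is exactly RECORD_BYTES long
                String rec = String.format("%-" + WORD_WIDTH + "s%" + PTR_WIDTH + "d\n",
                                           e.getKey(), e.getValue());
                f.write(rec.getBytes(StandardCharsets.US_ASCII));
            }
        }
    }

    // Returns the pointer stored for word, or -1 if the word is absent.
    static int lookup(String path, String word) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long lo = 0, hi = f.length() / RECORD_BYTES - 1;
            byte[] buf = new byte[RECORD_BYTES];
            while (lo <= hi) {
                long mid = (lo + hi) >>> 1;
                f.seek(mid * RECORD_BYTES);
                f.readFully(buf);
                String rec = new String(buf, StandardCharsets.US_ASCII);
                int cmp = rec.substring(0, WORD_WIDTH).trim().compareTo(word);
                if (cmp == 0) {
                    return Integer.parseInt(rec.substring(WORD_WIDTH, WORD_WIDTH + PTR_WIDTH).trim());
                }
                if (cmp < 0) lo = mid + 1; else hi = mid - 1;
            }
            return -1;
        }
    }
}
```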
Then, in another object file, the structure for "investigation" (like
page#, position, ...) will be the 7th object, so that with one search I can
go directly to the 7th object and get the information about it.
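Because the lines in such an "object" file will have different lengths, one common way to jump straight to an entry is to store each line's byte offset as the pointer instead of its ordinal number. A sketch assuming one line per word in a made-up `page:pos,pos;page:pos` layout; `PostingsFile` is a hypothetical name:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.util.*;

// Postings ("object") file with one line per word, e.g. "100:32,40;7:1".
// Lines differ in length, so each line's starting byte offset serves as the
// pointer, and reading an entry is a single seek.
class PostingsFile {
    // Writes one line per word and returns word -> byte offset of its line.
    static Map<String, Long> write(String path, SortedMap<String, String> postings) throws IOException {
        Map<String, Long> offsets = new TreeMap<>();
        try (RandomAccessFile f = new RandomAccessFile(path, "rw")) {
            f.setLength(0);
            for (Map.Entry<String, String> e : postings.entrySet()) {
                offsets.put(e.getKey(), f.getFilePointer());   // where this line starts
                f.write((e.getValue() + "\n").getBytes(StandardCharsets.UTF_8));
            }
        }
        return offsets;
    }

    // Jumps straight to one line and reads it, without scanning earlier lines.
    static String read(String path, long offset) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            f.seek(offset);
            return f.readLine();   // readLine() treats bytes as single-byte chars; fine for ASCII digits
        }
    }
}
```

The returned offsets would then be the numbers stored after the `|` in the word index file.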
Another way that I thought of is a dictionary listing. I don't know much about
it, but I think it's something like:
invest
---igate(5)
---igation(7)
---igator(88)
etc. Building the index like this the first time would be hard, but later it's easy
to search. But this time I don't know how to save that index to a file.
I guess I'm confused :(
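The structure drawn above is usually called a trie (prefix tree). A minimal Java sketch, with `Trie` as a hypothetical class name. It suggests one possible answer to the saving question: walking the tree depth-first in alphabetical order yields the words already sorted, so the tree can be written out as plain `word|pointer` lines and rebuilt later by inserting them again:

```java
import java.util.*;

// A trie (prefix tree) mapping words to pointers, e.g. "investigation" -> 7.
class Trie {
    private static class Node {
        final TreeMap<Character, Node> children = new TreeMap<>();  // kept in alphabetical order
        Integer pointer;                                            // non-null if a word ends here
    }

    private final Node root = new Node();

    void insert(String word, int pointer) {
        Node n = root;
        for (char c : word.toCharArray()) {
            n = n.children.computeIfAbsent(c, k -> new Node());
        }
        n.pointer = pointer;
    }

    // Pointer for an exact word, or null if the word was never inserted.
    Integer get(String word) {
        Node n = find(word);
        return n == null ? null : n.pointer;
    }

    // All "word|pointer" entries starting with prefix, alphabetically.
    List<String> withPrefix(String prefix) {
        List<String> out = new ArrayList<>();
        Node n = find(prefix);
        if (n != null) collect(n, new StringBuilder(prefix), out);
        return out;
    }

    private Node find(String s) {
        Node n = root;
        for (int i = 0; i < s.length() && n != null; i++) {
            n = n.children.get(s.charAt(i));
        }
        return n;
    }

    // Depth-first walk; visiting children in key order produces sorted output.
    private void collect(Node n, StringBuilder sb, List<String> out) {
        if (n.pointer != null) out.add(sb + "|" + n.pointer);
        for (Map.Entry<Character, Node> e : n.children.entrySet()) {
            sb.append(e.getKey());
            collect(e.getValue(), sb, out);
            sb.setLength(sb.length() - 1);
        }
    }
}
```

`withPrefix("")` returns every entry in sorted order, which is exactly the list of lines to write to the file.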
Andrew Thompson - 06 Dec 2006 13:12 GMT
> Hi, I forgot to say there is a problem:
>
> I'm not allowed to use any DB, so I have to solve this issue with text
> files.
Huh?
Do you mean your boss said "Don't use a database!"
Why would the boss care, so long as it does
not cost anything?
Or, is it that you are teaching yourself Java, and
set the (arbitrary) rule that this code would not
use a database?
OTOH, if this is a college assignment, just how
much do you expect to learn by asking...
"does anybody have simple code to understand
what to index and how to index, and how to search,
written in Java?"
Something else, stranger altogether??
Andrew T.
ibrahimover@gmail.com - 06 Dec 2006 14:36 GMT
Hi, thanks for the answer, even though it doesn't seem helpful to me.
As I said before:
"I'm doing this as an exercise; in fact this isn't the whole exercise, but if I
succeed, most of it will be done. The other parts are just the usual reports etc."
As you can guess, I'm a student, and if you read my last post, I have some ideas
about what to do, but I'm not an expert and I don't want to waste a lot of time trying
useless or poor algorithms. I'm not asking for B-tree code or anything
else; I just want to find the best way to do it, and I guess asking helps me
find it.
If this is not how it goes here, I'm very sorry; I just thought I might
get some ideas and some guidance.
Julian Treadwell - 07 Dec 2006 01:10 GMT
You can use an alphabetically ordered text file instead of a database to
store your word index, but you'll have to read through it sequentially
each time you do a search instead of doing a direct read. But with
modern computer speeds, and an index built from about 100 pages, that won't be noticeable. You'll need to
have a line for each occurrence of each word.
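A sketch of that sequential search, assuming a made-up line layout `word|page|position` and a file sorted by word; `OccurrenceScan` is a hypothetical name. Since the file is sorted, the loop can stop at the first word that sorts after the search word:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.*;

// Sequential scan of an alphabetically sorted occurrence file with one line
// per occurrence in the (assumed) form "word|page|position".
class OccurrenceScan {
    // Returns page -> positions of word on that page.
    static Map<Integer, List<Integer>> find(Reader source, String word) throws IOException {
        Map<Integer, List<Integer>> hits = new TreeMap<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] f = line.split("\\|");
                if (f.length != 3) continue;          // skip malformed lines
                int cmp = f[0].compareTo(word);
                if (cmp < 0) continue;                // not there yet
                if (cmp > 0) break;                   // sorted file: nothing more to find
                hits.computeIfAbsent(Integer.parseInt(f[1]), p -> new ArrayList<>())
                    .add(Integer.parseInt(f[2]));
            }
        }
        return hits;
    }
}
```

In practice the `Reader` would come from something like `Files.newBufferedReader(path)`.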