Hi all,
Could some one help me out with this Q?
If I designed a db (MySQL or PostgreSQL) & it contained more than 5+
million users with all unique user names... how can I speed up db
access ...etc to search a new users name if it has already been taken?
All hints & tips welcome (plus good db books on this matter & some)...
The reason I ask is to know how sites like hotmail, gmail, ebay ...etc
all do it so quickly! But I'd prefer to use either MySQL or
PostgreSQL.
TIA
IchBin - 24 Nov 2005 16:50 GMT
> Hi all,
>
[quoted text clipped - 11 lines]
>
> TIA
Out side of a user name as the primary key or just indexed don't now.
You should look at the 100 percent java database called HSQLDB. I moved
away from MySQL to this DBMS.
Performance wise it runs rings around MySQL, PostgreSQL and most all of
the other dbms's.. Its has a small footprint. You can run it as
embedded, standalone or server. It was just ranked third as a enterprise
databases after Oracle and DB2. All that info is at http://hsqldb.org/
If you want something even faster they have a commercial version called
HXSQL. They have all of the performance information there at
http://www.hxsql.com/

Signature
Thanks in Advance...
IchBin, Pocono Lake, Pa, USA
http://weconsultants.servebeer.com/JHackerAppManager
__________________________________________________________________________
'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
Aerodyne - 25 Nov 2005 12:43 GMT
Thanks IchBin .... but I'm sure even with HSQLDB searching through
millions of unique user names can be optimize in some way or another.
What are the best solutions for this task or do most just leave the
database do it with out a good design.
Hotmail has 150+ million users ... yet they get back to you very
quickly as if the user name is already in use & several suggestions!
I'm just looking for good database design if any or tricks for fast
searching... ie instead of putting the complete city name in a field
(LosAngeles) you just use LA, that way you only have to search for two
characters rather than all 10 for equality or like wise for all other
cities ... just a 3 letter acronym.
Dyreatnews@sun.com - 25 Nov 2005 13:28 GMT
> Thanks IchBin .... but I'm sure even with HSQLDB searching through
> millions of unique user names can be optimize in some way or another.
[quoted text clipped - 9 lines]
> characters rather than all 10 for equality or like wise for all other
> cities ... just a 3 letter acronym.
You are on to something here. I believe it is called hashing :)

Signature
dt
However, experience shows that for many people and many applications a
dose of paranoia is reasonable - Bjarne Stroustrup
IchBin - 25 Nov 2005 15:48 GMT
>> Thanks IchBin .... but I'm sure even with HSQLDB searching through
>> millions of unique user names can be optimize in some way or another.
[quoted text clipped - 11 lines]
>
> You are on to something here. I believe it is called hashing :)
Yes it is called hashing but the Database does all of this thru keys and
indexes under the covers for. Defining these relationship's using PK, FK
and indexes correctly makes queries run so fast.
IMHO - It still depends on the relationship of those keys and indexes to
be defined correctly and the knowledge of correct SQL statement
definition to optimized the use of the fore mentioned keys an indexes.
Also to normalize you database the best you can.
Looking for a state can be optimized by defining it as a unique index or
primary key. Defined as a FK when if referenced by other tables. You do
not want to do any hashing. This is why you have a DBMS. You may want to
google for "Database normalization techniques" also.
Here are some good links for review..
http://en.wikipedia.org/wiki/Hash_table
http://www.datamodel.org/index.html
http://en.wikipedia.org/wiki/Database_normalization
http://www.datamodel.org/NormalizationRules.html

Signature
Thanks in Advance...
IchBin, Pocono Lake, Pa, USA
http://weconsultants.servebeer.com/JHackerAppManager
__________________________________________________________________________
'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
IchBin - 25 Nov 2005 15:56 GMT
>>> Thanks IchBin .... but I'm sure even with HSQLDB searching through
>>> millions of unique user names can be optimize in some way or another.
[quoted text clipped - 32 lines]
> http://en.wikipedia.org/wiki/Database_normalization
> http://www.datamodel.org/NormalizationRules.html
Forgot to saying something about lookup tables or maintenance tables.
Good Example would be a State table.
http://www.datamodel.org/DataModelLookup.html
http://www.datamodel.org/DataModelKeys.html

Signature
Thanks in Advance...
IchBin, Pocono Lake, Pa, USA
http://weconsultants.servebeer.com/JHackerAppManager
__________________________________________________________________________
'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)
Bill Karwin - 25 Nov 2005 18:58 GMT
> Hi all,
>
[quoted text clipped - 9 lines]
> all do it so quickly! But I'd prefer to use either MySQL or
> PostgreSQL.
I'd put an index on the username field and increase the size of the
MySQL key cache. The key cache is a specific block of memory that is
used to store indexes. If this is large enough, MySQL may be encouraged
to store the entire index in memory, so it won't need to access the disk
to use that index.
See also http://dev.mysql.com/doc/refman/5.0/en/myisam-key-cache.html,
and especially about index cache preloading:
http://dev.mysql.com/doc/refman/5.0/en/index-preloading.html
Anyway, that should put the index into system memory, and subsequent
lookups (B-tree lookups on a data structure cached in memory) will be
about as fast as you could hope for, given the technology choice.
I'm not sure if PostgreSQL has a similar tuning option. I'll leave that
research to you. :)
Regards,
Bill K.