Hello,
i am working on a project where users are able to submit any ammount
of large binary files.
Currently all data is stored in a MySql database, but i am uncertain
if it would be better/more efficient/faster to store the binary files
on the filesystem. This would certainly save MySql from a lot of work,
but ensuring data integrity could become complicated.
I am uncertain if it is a good idea to move the binary data out of the
database, so i appreciate any information.
Thanks in advance.
joeNOSPAM@BEA.com - 14 May 2008 15:26 GMT
> Hello,
>
[quoted text clipped - 7 lines]
> database, so i appreciate any information.
> Thanks in advance.
Yes, I'd certainly consider keeping huge piles of binary
files in the file system. Use the DBMS to store records
about the files.
Joe Weinstein at BEA Systems
Dave Miller - 14 May 2008 15:31 GMT
> Hello,
>
[quoted text clipped - 7 lines]
> database, so i appreciate any information.
> Thanks in advance.
AFAIK, the db manipulates blob handles and id not the data itself. If
that's correct, it's doing very_minimally more work than would be done
by the file system if you wrote straight to disk.

Signature
Dave Miller
Java Web Hosting at:
http://www.cheap-jsp-hosting.com/
David Harper - 14 May 2008 18:06 GMT
>> Hello,
>>
[quoted text clipped - 11 lines]
> that's correct, it's doing very_minimally more work than would be done
> by the file system if you wrote straight to disk.
MySQL's MyISAM engine stores the actual blob data in the .MYD files. It
doesn't do anything tricksy with blob locators like some other systems.
David Harper
Cambridge, England
Dave Miller - 14 May 2008 19:52 GMT
>>> Hello,
>>>
[quoted text clipped - 17 lines]
> David Harper
> Cambridge, England
Thanks - I didn't know that. If it's not too late to change my answer -
never mind, I agree with Joe W.

Signature
Dave Miller
Java Web Hosting at:
http://www.cheap-jsp-hosting.com/
Jeffrey H. Coffield - 14 May 2008 21:34 GMT
> Hello,
>
[quoted text clipped - 7 lines]
> database, so i appreciate any information.
> Thanks in advance.
Aside from direct performance questions, also consider what it would
take to back up and restore the database. If the large blobs don't
change much (archiving documents/images) then storing the data in files
and using an incremental backup scheme could be much more efficient.
Jeff Coffield
www.digitalsynergyinc.com
Arne Vajhøj - 15 May 2008 00:50 GMT
>> i am working on a project where users are able to submit any ammount
>> of large binary files.
[quoted text clipped - 10 lines]
> change much (archiving documents/images) then storing the data in files
> and using an incremental backup scheme could be much more efficient.
I think backup is an argument for storing in database.
To do backup and restore of both database and file system that
need to match adds complexity to the procedures.
Arne
Jeffrey H. Coffield - 15 May 2008 02:27 GMT
>>> i am working on a project where users are able to submit any ammount
>>> of large binary files.
[quoted text clipped - 18 lines]
>
> Arne
True, but when there are close to a terabyte of images that are archived
(write once, read seldom) we do an incremental backup of only what's
added. If it was in the database, you would have to back up the whole
thing. If a restore is ever done, the database can be up quickly while
the old images are restored.
Jeff C
Arne Vajhøj - 15 May 2008 03:19 GMT
>>>> i am working on a project where users are able to submit any ammount
>>>> of large binary files.
[quoted text clipped - 22 lines]
> thing. If a restore is ever done, the database can be up quickly while
> the old images are restored.
I see the point.
I do not like the idea of having the database and files not in sync.
But with the scenario you describe, then it may be the necessary
solution.
Arne
Arne Vajhøj - 15 May 2008 00:48 GMT
> i am working on a project where users are able to submit any ammount
> of large binary files.
[quoted text clipped - 5 lines]
> database, so i appreciate any information.
> Thanks in advance.
If it is huge files I would suggest using the file system
(unless you have database software and a real powerhouse of a database
server to handle it - think Oracle or DB2 on p595 or similar).
But for smaller files (let us say below 2 MB to pick an arbitrary
size) on decent hardware (not your old 486), then I would find using
the database attractive.
Performance will be approx. the same, but it will be much
easier to manage both app wise and operational.
Arne
BoBi - 17 May 2008 08:13 GMT
> > i am working on a project where users are able to submit any ammount
> > of large binary files.
[quoted text clipped - 18 lines]
>
> Arne
Hi All,
Thanks a lot all of you. I was struggling with the same question.
Since most of my blobs are well below 2 MB I'll use my mysql db.
:c), BoBi
http://www.dongo.org
Martin Gregorie - 17 May 2008 11:54 GMT
> Thanks a lot all of you. I was struggling with the same question.
> Since most of my blobs are well below 2 MB I'll use my mysql db.
Another datapoint: I'm doing the same (CLOBs in the database) but with
Postgres. It is fast and has been easy to program and use. I have only one
caveat: I'm storing e-mails and have found that some characters found in
mail headers don't play nicely with the backup/restore utility.
The bug has been reported, but I'm working on a custom approach too. I'll
back up the address index tables, which can be modified and carry
additional information such as 'never archive' flags, to a CSV file and
copy the messages to an mbox file. Quite apart from solving the Postgres
restore problem this makes me independent of the database implementation
in case I want to move to Derby. Recovery then consists of using the
normal loader to restore the messages before updating the addresses from
the CSV file.

Signature
martin@ | Martin Gregorie
gregorie. |
org | Zappa fan & glider pilot
Roedy Green - 16 May 2008 17:00 GMT
On Wed, 14 May 2008 05:53:35 -0700 (PDT), mebe
<internetsux@hotmail.com> wrote, quoted or indirectly quoted someone
who said :
> This would certainly save MySql from a lot of work,
What in addition is the database doing for you?
1. arranging transaction rollback -- get rid of a blob not completely
stored.
2. giving you ability to search by multiple tags associated with the
blob.
3. making sure backup includes everything current.
4. if the input data exist in the filesystem, rather than coming in
the wire, you will have an extra disk copy.
5. deal with concurrency problems.
My gut feeling is let the database handle it , and only if you
discover performance problems, experiment with using manual files.
If you don't actually serve the files, then having them separate would
make more sense. You are then just maintaining an index.

Signature
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
Thomas Kellerer - 17 May 2008 13:11 GMT
Roedy Green wrote on 16.05.2008 18:00:
> On Wed, 14 May 2008 05:53:35 -0700 (PDT), mebe
> <internetsux@hotmail.com> wrote, quoted or indirectly quoted someone
[quoted text clipped - 23 lines]
> If you don't actually serve the files, then having them separate would
> make more sense. You are then just maintaining an index.
I vote for storing them in the database as well. Roedy has some very good points
there (although MySQL isn't the best DBMS when it comes to handle concurrency,
but that is a different story).
From my experience there is another advantage when storing the files in the
database: if the number of files gets above a certain number you need to
distribute them across directories. Most filesystems aren't very good in
handling thousands (if not millions) of files in a single directory.
Thomas
Roedy Green - 17 May 2008 16:37 GMT
On Sat, 17 May 2008 14:11:36 +0200, Thomas Kellerer
<FJIFALSDGVAF@spammotel.com> wrote, quoted or indirectly quoted
someone who said :
>Most filesystems aren't very good in
>handling thousands (if not millions) of files in a single directory.
FAT packs it in after even 1000 files per directory. NTFS is much
better. It uses an indexed lookup.
Unix has inodes which I presume requires substantial i/o just to list
the files in a directory.
Any database is going to use some sort of btree with most of tree
cached in RAM.

Signature
Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com
RedGrittyBrick - 17 May 2008 17:50 GMT
>> Most filesystems aren't very good in
>> handling thousands (if not millions) of files in a single directory.
>
> Unix has inodes which I presume requires substantial i/o just to list
> the files in a directory.
To just list the (names of) files in a directory requires no access to
inode tables. File names are stored in directories, not in inode tables.
I think most Unix filesystems (since FFS in '83?) aim to keep metadata
in the same cylinder group as the corresponding data.
> Any database is going to use some sort of btree with most of tree
> cached in RAM.
Filesystems like ext3 use Htree (specialised Btree) and RAM caching.
Not that I'm advocating filesystems over databases!

Signature
RGB