Java Forum / General / October 2007

Sanitize file name

Philipp - 25 Oct 2007 09:06 GMT
Hello,

On some platforms, file names cannot contain certain characters (e.g. on
Windows no ? is allowed in a file name or path).
Is there a way in the API to sanitize a user-supplied string so that it
can be used as a valid filename?
Is there a way to test if a filename is valid on a certain platform?

Thanks Phil
Andrew Thompson - 25 Oct 2007 10:47 GMT
...
>Is there a way to test if a filename is valid on a certain platform?

This example makes for some interesting results, though I
am not sure if it really helps with the problem.  The
programmer would need to specially account for the
'last situation' where the user puts a character in the
name that is used as (or is generally understood to be)
a path separator.  

Irritatingly, although Win's path separator is '\', '/'
will apparently also work (here, on this Win XP pro
box).

<sscce>
import java.io.File;
import java.io.IOException;

class TestFileName {

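 // Prints the canonical path for the given name, or the error
 // message if the platform rejects it while resolving the path.
 static void testFileName(String name) {
   try {
     File f = new File(name);
     // getCanonicalPath() is system-dependent and may throw
     // IOException for names the platform cannot resolve.
     System.out.println( f.getCanonicalPath() );
   } catch(IOException ioe) {
     System.err.println( ioe.getMessage()  + " '" + name + "'");
   }
 }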

 public static void main(String[] args) {
   testFileName("123.txt");
   testFileName("12?3.txt");
   testFileName("12[3.txt");
   testFileName("12{3.txt");
   testFileName("12!3.txt");
   testFileName("12/3.txt");
 }
}
</sscce>

[OP]
D:\projects\123.txt
Invalid argument '12?3.txt'
D:\projects\12[3.txt
D:\projects\12{3.txt
D:\projects\12!3.txt
D:\projects\12\3.txt
Press any key to continue . . .
[\OP]

Signature

Andrew Thompson
http://www.athompson.info/andrew/

Philipp - 25 Oct 2007 13:36 GMT
> ..
>> Is there a way to test if a filename is valid on a certain platform?
[quoted text clipped - 45 lines]
> Press any key to continue . . .
> [\OP]

As far as I know the invalid characters for filenames are:
On Windows \ / : * ? " < > |
On UNIX :

Running your SSCCE with these signs (although in a different order)
gives (on WinXP):

[OP]
Invalid argument '12?3.txt'
D:\workspace\test\123.txt
The filename, directory name, or volume label syntax is incorrect '12:3.txt'
D:\workspace\test\12[3.txt
D:\workspace\test\12{3.txt
D:\workspace\test\12!3.txt
D:\workspace\test\12\3.txt
D:\workspace\test\12;3.txt
D:\workspace\test\12<3.txt
D:\workspace\test\12>3.txt
Invalid argument '12*3.txt'
D:\workspace\test\12"3.txt
The filename, directory name, or volume label syntax is incorrect '12|3.txt'
[/OP]

Note that the getCanonicalPath() method throws IOException for only some
of them. So this does not seem a good method to identify bad chars.
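
For comparison, a direct check against the Windows character list given above could be sketched as follows. This is only an illustration: the class and method names are made up, the list is the one from this post rather than an authoritative one, and (as later replies point out) what is valid really depends on the file system.

```java
class WindowsCharCheck {

    // The characters listed in this post as invalid on Windows:
    // \ / : * ? " < > |
    private static final String FORBIDDEN = "\\/:*?\"<>|";

    /** Returns the index of the first listed character in name, or -1 if there is none. */
    static int firstForbidden(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (FORBIDDEN.indexOf(name.charAt(i)) >= 0) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String[] samples = { "123.txt", "12?3.txt", "12<3.txt", "12[3.txt" };
        for (String s : samples) {
            System.out.println(s + " -> " + firstForbidden(s));
        }
    }
}
```

Unlike getCanonicalPath(), this flags '<', '>' and '"' as well, but it knows nothing about length limits or the file system actually in use.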

The method described in
http://forum.java.sun.com/thread.jspa?threadID=629458&start=0&tstart=0
(thanks Sabine for the link) actually creates the file. Well, that
definitely works, but it's really ugly (IMHO).

Best regards
Phil
Gordon Beaton - 25 Oct 2007 13:50 GMT
> As far as I know the invalid characters for filenames are:
> On Windows \ / : * ? " < > |
> On UNIX :

No, ':' should be valid on Unix, but read on...

AFAIK the only invalid char on traditional Unixy filesystems is '/'
because it's the path component separator, anything else should be ok.
That doesn't mean that other characters are *easy* to use or supported
by every tool though.

Note that what's valid or not is actually file system dependent, so
the answer really depends. For example, you can mount a VFAT, HPFS or
NTFS volume on your unix host, and the names are then limited by VFAT
or HPFS or NTFS as appropriate. There's more to consider than just
valid filename characters: max filename length and file length are
there too, among other things.

Personally I wouldn't try to put this logic into my application; leave
it in the OS where it belongs (and where it will always be correct).
If the user specifies a filename then just use it and be prepared to
fail gracefully.
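
A minimal sketch of "just use it and fail gracefully", assuming the application only needs to write some bytes and report a readable message on failure. The class and method names are hypothetical.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class SaveGracefully {

    /**
     * Tries to write data to the named file.
     * Returns null on success, or a message describing the failure.
     */
    static String trySave(String name, byte[] data) {
        try {
            // The OS decides here whether the name is acceptable.
            FileOutputStream out = new FileOutputStream(new File(name));
            try {
                out.write(data);
            } finally {
                out.close();
            }
            return null;
        } catch (IOException e) {
            return "Could not save '" + name + "': " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        // "." is a directory, so the default run shows the failure path
        // without creating any file.
        String name = args.length > 0 ? args[0] : ".";
        String error = trySave(name, new byte[] { 'h', 'i' });
        System.out.println(error == null ? "Saved " + name : error);
    }
}
```

No character table is needed; whatever rule the mounted file system enforces shows up as the IOException message.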

/gordon

--
Lew - 25 Oct 2007 14:17 GMT
Philipp wrote:
>> As far as I know the invalid characters for filenames are:
>> On UNIX :

> No, ':' should be valid on Unix, but read on...

On my Fedora 7 box:

$ echo foo > temp\:.txt
$ ls *.txt
temp:.txt
$

Signature

Lew

Gordon Beaton - 25 Oct 2007 14:19 GMT
> On my Fedora 7 box:
>
> $ echo foo > temp\:.txt
> $ ls *.txt
> temp:.txt
> $

And there was probably no need to escape the ':' either (but that
depends on your choice of shell).

/gordon

--
Sherman Pendley - 25 Oct 2007 21:22 GMT
> Irritatingly, although Win's path separator is '\', '/'
> will apparently also work (here, on this Win XP pro
> box).

Any programmatic API I can think of that runs on Windows, in any language,
will accept both '/'s and '\'s as path delimiters. The command shell still
wants you to use backslashes, but that's pretty much it.

That being the case, I'd be irritated if '/' *didn't* work - it's a valid
path delimiter on Windows, so there's no reason it should be rejected.

sherm--

Signature

Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net

Philipp - 26 Oct 2007 09:35 GMT
> The command shell still
> wants you to use backslashes, but that's pretty much it.

Not on XP: '/' works fine in the cmd shell.
Sabine Dinis Blochberger - 25 Oct 2007 12:04 GMT
> Hello,
>
[quoted text clipped - 5 lines]
>
> Thanks Phil

There's a possible solution in this thread:
http://forum.java.sun.com/thread.jspa?threadID=629458&start=0&tstart=0
Signature

Sabine Dinis Blochberger

Op3racional
www.op3racional.eu

Stefan Ram - 25 Oct 2007 13:50 GMT
>On some platforms, file names cannot contain certain characters (eg. on
>windows no ? is allowed in a file name and path).
>Is there a way in the API to sanitize a user-supplied string so that it
>can be used as a valid filename?

 I have specified and implemented a conversion for this,
 which is called »Filode«.

 Since one can never know in advance under which FileSystem a
 JVM will be hosted, Filode only assumes that a filename may
 contain characters A-Z of a single case.

http://www.purl.org/stefan_ram/pub/filode

http://www.purl.org/stefan_ram/html/ram.jar/de/dclj/ram/notation/filode/Text.html
Gordon Beaton - 25 Oct 2007 13:56 GMT
>>Is there a way in the API to sanitize a user-supplied string so that it
>>can be used as a valid filename?
[quoted text clipped - 5 lines]
>   JVM will be hosted, Filode only assumes that a filename may
>   contain characters A-Z of a single case.

One really irritating thing an application can do is prevent me from
using the full capabilities offered by my system. Why would you want
to enforce such a limitation? The OS will tell you whether a filename
was valid or not when you try to create a file with it.

/gordon

--
Eric Sosman - 25 Oct 2007 14:26 GMT
>>> Is there a way in the API to sanitize a user-supplied string so that it
>>> can be used as a valid filename?
[quoted text clipped - 9 lines]
> to enforce such a limitation? The OS will tell you whether a filename
> was valid or not when you try to create a file with it.

    On at least some versions of Windows, certain filenames
are valid but surprising.  Try using the file name "con.txt"
and see what happens (on my XP box, "type con.anything" echoes
what's typed at the keyboard).

Signature

Eric Sosman
esosman@ieee-dot-org.invalid

Gordon Beaton - 25 Oct 2007 14:37 GMT
>      On at least some versions of Windows, certain filenames
> are valid but surprising.  Try using the file name "con.txt"
> and see what happens (on my XP box, "type con.anything" echoes
> what's typed at the keyboard).

The same is true of "cat /dev/stdin" on Linux, and the technique is
actually pretty useful for getting a program that requires a filename
to read from stdin or print to stdout.

/gordon

--
Eric Sosman - 25 Oct 2007 16:41 GMT
Gordon Beaton wrote On 10/25/07 09:37,:

>>     On at least some versions of Windows, certain filenames
>>are valid but surprising.  Try using the file name "con.txt"
[quoted text clipped - 4 lines]
> actually pretty useful for getting a program that requires a filename
> to read from stdin or print to stdout.

   Perhaps I wasn't clear enough.  The surprising thing
isn't that devices have names in the file system, but that
Windows "imports" those names to every directory, and also
gives them an unlimited number of aliases in every directory.
"cat /dev/stdin.dat" will tell you it can't find any such
file, while "type con.dat" and "type con.foobar" will both
go straight to the CON: device.

   I had to deal with this once, at a PPOE.  Users could
give their documents whatever names they liked, and could
even have multiple identically-named documents in the same
directory.  Behind the scenes, the product constructed file
names by "mangling" the user-provided names and attaching
various extensions and disambiguating goodies.  When we did
our first Windows port, somebody created a pair of short
documents describing the arguments for and against something,
and assigned them the names "pro" and "con".  These name
stems passed through our mangler largely unchanged, yielding
file names like "PRO.DOC" and "PRO.DC@" -- and I was the
guy who fielded the bug report that resulted when we tried
to store data in "CON.DOC" and "CON.DC@" ...
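
   A name mangler could guard against this with a check along the following lines. This is a sketch, not what the product described above did: the class and method names are invented, and the list (CON, PRN, AUX, NUL, COM1-COM9, LPT1-LPT9) is the commonly documented set of reserved Windows device names, which is an assumption brought in from outside this thread.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

class ReservedNames {

    // Commonly documented Windows device names (assumed, not from this thread).
    private static final Set<String> RESERVED = new HashSet<String>(Arrays.asList(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"));

    /** Returns true if the part of fileName before the first dot is a device name. */
    static boolean isReservedDeviceName(String fileName) {
        int dot = fileName.indexOf('.');
        String stem = dot < 0 ? fileName : fileName.substring(0, dot);
        // Device names are matched case-insensitively, whatever the extension.
        return RESERVED.contains(stem.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String[] samples = { "PRO.DOC", "CON.DOC", "con.dc@", "console.txt" };
        for (String s : samples) {
            System.out.println(s + " -> " + isReservedDeviceName(s));
        }
    }
}
```

   The check looks only at the stem because, as described above, any extension after the device name still ends up at the device.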

Signature

Eric.Sosman@sun.com

Daniel Pitts - 25 Oct 2007 17:51 GMT
> Gordon Beaton wrote On 10/25/07 09:37,:
>>
[quoted text clipped - 28 lines]
> guy who fielded the bug report that resulted when we tried
> to store data in "CON.DOC" and "CON.DC@" ...

Wow, talk about namespace clutter.  You can't even use relative or
absolute paths. Stupid M$.

Signature

Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Daniel Pitts - 25 Oct 2007 17:46 GMT
>>> Is there a way in the API to sanitize a user-supplied string so that it
>>> can be used as a valid filename?
[quoted text clipped - 11 lines]
>
> /gordon

Heh, that was basically my reply. Oh well.

Signature

Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

Roedy Green - 25 Oct 2007 16:56 GMT
>On some platforms, file names cannot contain certain characters (eg. on
>windows no ? is allowed in a file name and path).
>Is there a way in the API to sanitize a user-supplied string so that it
>can be used as a valid filename?
>Is there a way to test if a filename is valid on a certain platform?

See StringTools.isLegal.

This will test if the filename contains only some limited safe set.

What do you do with the bad chars?  Convert them to something else?
Delete them?  Complain to the user?

I suppose you could test if the file exists, and if it does not,
create it, and see if it exists. If not, you have a problem.  Then
delete it.
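
A rough sketch of that create-and-delete probe, using only java.io.File. The class and method names are invented for illustration, and it assumes the caller supplies the directory the file would live in.

```java
import java.io.File;
import java.io.IOException;

class CreateProbe {

    /**
     * Returns true if a file called name already exists in dir or can be
     * created there. A file created by the probe is deleted again; an
     * existing file is never touched.
     */
    static boolean canCreate(File dir, String name) {
        File f = new File(dir, name);
        if (f.exists()) {
            return true; // the name is evidently usable
        }
        try {
            boolean created = f.createNewFile();
            if (created) {
                f.delete(); // clean up the probe file
            }
            return created;
        } catch (IOException e) {
            // The platform refused the name (or the path, or the permissions).
            return false;
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        String[] samples = { "123.txt", "12?3.txt", "12*3.txt" };
        for (String s : samples) {
            System.out.println(s + " -> " + canCreate(dir, s));
        }
    }
}
```

The answer only holds for that directory at that moment, and a false result does not say whether the name, the path or the permissions were at fault.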

If you are going to be moving files from platform to platform, you
want to restrict them ALL to same safe set, e.g.

A-Z  a-z 0-9 .

Then allow the platform specific FileSeparator. e.g. / \ or only allow
/ which seems to work universally.

I would avoid space, particularly lead or trail space.

You might also allow _, but to be safe, I would leave that out too. It
has magic meaning to various people.
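
To make the safe-set idea concrete, here is a small sketch. It is not the StringTools.isLegal code mentioned above; the class and method names are made up, and deleting the bad characters is just one of the options raised earlier in this post.

```java
class SafeSet {

    /** Returns true if name is non-empty and uses only A-Z, a-z, 0-9 and '.'. */
    static boolean isSafe(String name) {
        return name.matches("[A-Za-z0-9.]+");
    }

    /** Deletes every character outside the safe set. The result may be empty. */
    static String strip(String name) {
        return name.replaceAll("[^A-Za-z0-9.]", "");
    }

    public static void main(String[] args) {
        String[] samples = { "report2007.txt", "my file?.txt", "a_b.txt" };
        for (String s : samples) {
            System.out.println(s + " -> safe=" + isSafe(s) + ", stripped=" + strip(s));
        }
    }
}
```

Deleting characters can make two different inputs collide or leave nothing at all, so the caller still has to handle those cases.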

Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 25 Oct 2007 17:04 GMT
On Thu, 25 Oct 2007 15:56:52 GMT, Roedy Green
<see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted
someone who said :

>See StringTools.isLegal.
bundled in http://mindprod.com/products1.html#COMMON11
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Daniel Pitts - 25 Oct 2007 17:45 GMT
> Hello,
>
[quoted text clipped - 5 lines]
>
> Thanks Phil
If you don't need to protect the current system from the user (e.g.,
you're running locally on the user's computer), then let the user enter
whatever they want. If it's not a valid filename, let the file operation
throw the exception.  Of course, you should catch it and display the
appropriate dialog.

On the other hand, just because a string has no "forbidden" characters,
doesn't mean it's a valid file for your purpose.  If it happens to be the
same name as a directory, then reading and/or writing to it will fail in
most cases.  If it happens to be a read-only file, then writing to it
might fail, depending on user privilege.  If the file doesn't exist, but
the parent path is read-only, you won't be able to create.
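
Those cases can be spotted up front with plain java.io.File queries, roughly as sketched below. The class and method names are hypothetical, and the result is only advisory: the actual write can still fail, so the exception handling is needed regardless.

```java
import java.io.File;

class WritePreflight {

    /** Returns null if target looks writable, otherwise a short reason why not. */
    static String whyNotWritable(File target) {
        if (target.isDirectory()) {
            return "is a directory";
        }
        if (target.exists()) {
            return target.canWrite() ? null : "exists but is not writable";
        }
        // The file does not exist yet, so creation depends on the parent.
        File parent = target.getAbsoluteFile().getParentFile();
        if (parent == null || !parent.isDirectory()) {
            return "parent directory does not exist";
        }
        return parent.canWrite() ? null : "parent directory is not writable";
    }

    public static void main(String[] args) {
        File target = new File(args.length > 0 ? args[0] : ".");
        String reason = whyNotWritable(target);
        System.out.println(target + ": " + (reason == null ? "looks writable" : reason));
    }
}
```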

So, in short: if it's not a security issue, let the OS tell you the
validity.  If it *is* a security issue, let the security manager tell
you the validity.

Hope this helps,
Daniel.

Signature

Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>

©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.