Java Forum / General / October 2007
Sanitize file name
Philipp - 25 Oct 2007 09:06 GMT Hello,
On some platforms, file names cannot contain certain characters (e.g. on Windows, ? is not allowed in a file name or path). Is there a way in the API to sanitize a user-supplied string so that it can be used as a valid filename? Is there a way to test if a filename is valid on a certain platform?
Thanks Phil
Andrew Thompson - 25 Oct 2007 10:47 GMT ...
>Is there a way to test if a filename is valid on a certain platform? This E.G. makes for some interesting results, though I am not sure if it really helps with the problem. The programmer would need to specially account for the 'last situation' where the user puts a character in the name that is used as (or is generally understood to be) a path separator.
Irritatingly, although Win's path separator is '\', '/' will apparently also work (here, on this Win XP pro box).
<sscce>
import java.io.File;
import java.io.IOException;

class TestFileName {

    // Print the canonical path for the name, or the error if resolving it fails.
    static void testFileName(String name) {
        try {
            File f = new File(name);
            System.out.println(f.getCanonicalPath());
        } catch (IOException ioe) {
            System.err.println(ioe.getMessage() + " '" + name + "'");
        }
    }

    public static void main(String[] args) {
        testFileName("123.txt");
        testFileName("12?3.txt");
        testFileName("12[3.txt");
        testFileName("12{3.txt");
        testFileName("12!3.txt");
        testFileName("12/3.txt");
    }
}
</sscce>
[OP]
D:\projects\123.txt
Invalid argument '12?3.txt'
D:\projects\12[3.txt
D:\projects\12{3.txt
D:\projects\12!3.txt
D:\projects\12\3.txt
Press any key to continue . . .
[\OP]
 Signature Andrew Thompson http://www.athompson.info/andrew/
Philipp - 25 Oct 2007 13:36 GMT
> ..
>> Is there a way to test if a filename is valid on a certain platform?
[quoted text clipped - 45 lines]
> Press any key to continue . . .
> [\OP]
As far as I know the invalid characters for filenames are:
On Windows \ / : * ? " < > |
On UNIX :
Running your SSCCE with these signs (although in a different order) gives (on WinXP):
[OP]
Invalid argument '12?3.txt'
D:\workspace\test\123.txt
The filename, directory name, or volume label syntax is incorrect '12:3.txt'
D:\workspace\test\12[3.txt
D:\workspace\test\12{3.txt
D:\workspace\test\12!3.txt
D:\workspace\test\12\3.txt
D:\workspace\test\12;3.txt
D:\workspace\test\12<3.txt
D:\workspace\test\12>3.txt
Invalid argument '12*3.txt'
D:\workspace\test\12"3.txt
The filename, directory name, or volume label syntax is incorrect '12|3.txt'
[/OP]
Note that the getCanonicalPath() method throws IOException for only some of them. So this does not seem a good method to identify bad chars.
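Since getCanonicalPath() only rejects some of these, one alternative is to test the string directly against the Windows character list quoted above. This is only a minimal sketch: the class and method names are made up for illustration, and it covers that one list only, not reserved names, length limits, or other file systems.

```java
final class WindowsNameCheck {

    // The characters listed above as invalid in Windows file names: \ / : * ? " < > |
    private static final String WINDOWS_FORBIDDEN = "\\/:*?\"<>|";

    /** Returns true if the name contains at least one character from the list. */
    static boolean hasForbiddenChar(String name) {
        for (int i = 0; i < name.length(); i++) {
            if (WINDOWS_FORBIDDEN.indexOf(name.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] names = { "123.txt", "12?3.txt", "12<3.txt", "12[3.txt" };
        for (String n : names) {
            System.out.println(n + " -> forbidden char: " + hasForbiddenChar(n));
        }
    }
}
```

Unlike the SSCCE, this also flags < > " and the separators, which getCanonicalPath() let through in the run above.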
The method described in http://forum.java.sun.com/thread.jspa?threadID=629458&start=0&tstart=0 (thanks Sabine for the link) actually creates the file. Well, that definitely works, but it's really ugly (IMHO).
Best regards Phil
Gordon Beaton - 25 Oct 2007 13:50 GMT
> As far as I know the invalid characters for filenames are:
> On Windows \ / : * ? " < > |
> On UNIX :
No, ':' should be valid on Unix, but read on...
AFAIK the only invalid char on traditional Unixy filesystems is '/' because it's the path component separator, anything else should be ok. That doesn't mean that other characters are *easy* to use or supported by every tool though.
Note that what's valid or not is actually file system dependent, so the answer really depends. For example, you can mount a VFAT, HPFS or NTFS volume on your unix host, and the names are then limited by VFAT or HPFS or NTFS as appropriate. There's more to consider than just valid filename characters: max filename length and file length are there too, among other things.
Personally I wouldn't try to put this logic into my application; leave it in the OS where it belongs (and where it will always be correct). If the user specifies a filename then just use it and be prepared to fail gracefully.
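The "just use it and be prepared to fail gracefully" approach might look roughly like the following. It is a sketch only: SaveAttempt and trySave are hypothetical names, and a real application would show the message in a dialog rather than return it as a string.

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;

final class SaveAttempt {

    /**
     * Tries to write the text to the user-supplied file.
     * Returns null on success, or a message for the user on failure.
     */
    static String trySave(File target, String text) {
        // No validation up front: the OS decides whether the name is acceptable.
        try (FileWriter out = new FileWriter(target)) {
            out.write(text);
            return null;
        } catch (IOException e) {
            return "Could not save '" + target + "': " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "12?3.txt";
        String error = trySave(new File(name), "hello\n");
        System.out.println(error == null ? "Saved " + name : error);
    }
}
```

The same code runs unchanged on any file system, because it never encodes a list of forbidden characters.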
/gordon
--
Lew - 25 Oct 2007 14:17 GMT Philipp wrote:
>> As far as I know the invalid characters for filenames are:
>> On UNIX :
> No, ':' should be valid on Unix, but read on...
On my Fedora 7 box:
$ echo foo > temp\:.txt
$ ls *.txt
temp:.txt
$
 Signature Lew
Gordon Beaton - 25 Oct 2007 14:19 GMT
> On my Fedora 7 box:
>
> $ echo foo > temp\:.txt
> $ ls *.txt
> temp:.txt
> $
And there was probably no need to escape the ':' either (but that depends on your choice of shell).
/gordon
--
Sherman Pendley - 25 Oct 2007 21:22 GMT
> Irritatingly, although Win's path separator is '\', '/'
> will apparently also work (here, on this Win XP pro
> box).
Any programmatic API I can think of that runs on Windows, in any language, will accept both '\'s and '/'s as path delimiters. The command shell still wants you to use backslashes, but that's pretty much it.
That being the case, I'd be irritated if '/' *didn't* work - it's a valid path delimiter on Windows, so there's no reason it should be rejected.
sherm--
 Signature Web Hosting by West Virginians, for West Virginians: http://wv-www.net Cocoa programming in Perl: http://camelbones.sourceforge.net
Philipp - 26 Oct 2007 09:35 GMT
> The command shell still
> wants you to use backslashes, but that's pretty much it.
Not on XP. '/' works fine in the cmd shell.
Sabine Dinis Blochberger - 25 Oct 2007 12:04 GMT > Hello, > [quoted text clipped - 5 lines] > > Thanks Phil There's a possible solution in this thread: http://forum.java.sun.com/thread.jspa?threadID=629458&start=0&tstart=0
 Signature Sabine Dinis Blochberger
Op3racional www.op3racional.eu
Stefan Ram - 25 Oct 2007 13:50 GMT
>On some platforms, file names cannot contain certain characters (eg. on
>windows no ? is allowed in a file name and path).
>Is there a way in the API to sanitize a user-supplied string so that it
>can be used as a valid filename?
I have specified and implemented a conversion for this, which is called »Filode«.
Since one can never know in advance under which FileSystem a JVM will be hosted, Filode only assumes that a filename may contain characters A-Z of a single case.
http://www.purl.org/stefan_ram/pub/filode
http://www.purl.org/stefan_ram/html/ram.jar/de/dclj/ram/notation/filode/Text.html
Gordon Beaton - 25 Oct 2007 13:56 GMT >>Is there a way in the API to sanitize a user-supplied string so that it >>can be used as a valid filename? [quoted text clipped - 5 lines] > JVM will be hosted, Filode only assumes that a filename may > contain characters A-Z of a single case. One really irritating thing an application can do is prevent me from using the full capabilities offered by my system. Why would you want to enforce such a limitation? The OS will tell you whether a filename was valid or not when you try to create a file with it.
/gordon
--
Eric Sosman - 25 Oct 2007 14:26 GMT >>> Is there a way in the API to sanitize a user-supplied string so that it >>> can be used as a valid filename? [quoted text clipped - 9 lines] > to enforce such a limitation? The OS will tell you whether a filename > was valid or not when you try to create a file with it. On at least some versions of Windows, certain filenames are valid but surprising. Try using the file name "con.txt" and see what happens (on my XP box, "type con.anything" echoes what's typed at the keyboard).
 Signature Eric Sosman esosman@ieee-dot-org.invalid
Gordon Beaton - 25 Oct 2007 14:37 GMT > On at least some versions of Windows, certain filenames > are valid but surprising. Try using the file name "con.txt" > and see what happens (on my XP box, "type con.anything" echoes > what's typed at the keyboard). The same is true of "cat /dev/stdin" on Linux, and the technique is actually pretty useful for getting a program that requires a filename to read from stdin or print to stdout.
/gordon
--
Eric Sosman - 25 Oct 2007 16:41 GMT Gordon Beaton wrote On 10/25/07 09:37,:
>> On at least some versions of Windows, certain filenames >>are valid but surprising. Try using the file name "con.txt" [quoted text clipped - 4 lines] > actually pretty useful for getting a program that requires a filename > to read from stdin or print to stdout. Perhaps I wasn't clear enough. The surprising thing isn't that devices have names in the file system, but that Windows "imports" those names to every directory, and also gives them an unlimited number of aliases in every directory. "cat /dev/stdin.dat" will tell you it can't find any such file, while "type con.dat" and "type con.foobar" will both go straight to the CON: device.
I had to deal with this once, at a PPOE. Users could give their documents whatever names they liked, and could even have multiple identically-named documents in the same directory. Behind the scenes, the product constructed file names by "mangling" the user-provided names and attaching various extensions and disambiguating goodies. When we did our first Windows port, somebody created a pair of short documents describing the arguments for and against something, and assigned them the names "pro" and "con". These name stems passed through our mangler largely unchanged, yielding file names like "PRO.DOC" and "PRO.DC@" -- and I was the guy who fielded the bug report that resulted when we tried to store data in "CON.DOC" and "CON.DC@" ...
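A name mangler like the one described could guard against this with a check along the following lines. This is a hedged sketch, not the product's actual code: the thread itself only mentions CON, and the rest of the list (PRN, AUX, NUL, COM1 to COM9, LPT1 to LPT9) is taken from Microsoft's documented reserved device names. ReservedNames and isReservedOnWindows are names invented here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ReservedNames {

    // Assumption: the classic DOS device names reserved by Windows (not listed in the thread).
    private static final Set<String> RESERVED = new HashSet<>(Arrays.asList(
            "CON", "PRN", "AUX", "NUL",
            "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
            "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"));

    /** Returns true if the part of the name before the first dot is a reserved device name. */
    static boolean isReservedOnWindows(String fileName) {
        int dot = fileName.indexOf('.');
        String stem = dot < 0 ? fileName : fileName.substring(0, dot);
        // Device names are matched case-insensitively, and the extension is ignored.
        return RESERVED.contains(stem.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        for (String n : new String[] { "PRO.DOC", "CON.DOC", "con.foobar", "console.txt" }) {
            System.out.println(n + " -> reserved: " + isReservedOnWindows(n));
        }
    }
}
```

Only the stem is compared, which is why "CON.DOC" and "con.foobar" are both caught while "console.txt" is not.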
 Signature Eric.Sosman@sun.com
Daniel Pitts - 25 Oct 2007 17:51 GMT > Gordon Beaton wrote On 10/25/07 09:37,: >> [quoted text clipped - 28 lines] > guy who fielded the bug report that resulted when we tried > to store data in "CON.DOC" and "CON.DC@" ... Wow, talk about namespace clutter. You can't even use relative or absolute paths. Stupid M$.
 Signature Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Daniel Pitts - 25 Oct 2007 17:46 GMT >>> Is there a way in the API to sanitize a user-supplied string so that it >>> can be used as a valid filename? [quoted text clipped - 11 lines] > > /gordon Heh, that was basically my reply. Ohwell.
 Signature Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>
Roedy Green - 25 Oct 2007 16:56 GMT >On some platforms, file names cannot contain certain characters (eg. on >windows no ? is allowed in a file name and path). >Is there a way in the API to sanitize a user-supplied string so that it >can be used as a valid filename? >Is there a way to test if a filename is valid on a certain platform? See StringTools.isLegal.
This will test if the filename contains only some limited safe set.
What do you do with the bad chars? Convert them to something else? Delete them? Complain to the user?
I suppose you could test if the file exists, and if it does not, create it, and see if it exists. If not, you have a problem. Then delete it.
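That exists/create/exists/delete probe could be sketched as below. CreateProbe and canCreate are hypothetical names, and the probe is only as trustworthy as the file system's answers: as Eric Sosman's posts show, a name such as "con.txt" may behave in surprising ways on Windows.

```java
import java.io.File;
import java.io.IOException;

final class CreateProbe {

    /** Returns true if the name already exists in dir, or could be created there. */
    static boolean canCreate(File dir, String name) {
        File f = new File(dir, name);
        if (f.exists()) {
            return true; // the name is evidently acceptable on this file system
        }
        try {
            boolean created = f.createNewFile();
            boolean ok = created && f.exists();
            if (created) {
                f.delete(); // clean up the probe file
            }
            return ok;
        } catch (IOException | SecurityException e) {
            return false; // the OS (or a security manager) refused the name
        }
    }

    public static void main(String[] args) {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        for (String n : new String[] { "123.txt", "12?3.txt" }) {
            System.out.println(n + " -> creatable: " + canCreate(dir, n));
        }
    }
}
```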
If you are going to be moving files from platform to platform, you want to restrict them ALL to same safe set, e.g.
A-Z a-z 0-9 .
Then allow the platform specific FileSeparator. e.g. / \ or only allow / which seems to work universally.
I would avoid spaces, particularly leading or trailing ones.
You might also allow _, but to be safe, I would leave that out too. It has magic meaning to various people.
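A check for that conservative set (A-Z a-z 0-9 .) can be written with a regular expression. The sketch below is not StringTools.isLegal; SafeFileName, isSafe and sanitize are names made up here, and what to substitute for a bad character is left to the caller, since that question is open above.

```java
import java.util.regex.Matcher;

final class SafeFileName {

    /** Returns true if the name is non-empty and uses only A-Z, a-z, 0-9 and '.'. */
    static boolean isSafe(String name) {
        return name.matches("[A-Za-z0-9.]+");
    }

    /** Replaces every character outside the safe set with the given replacement (may be empty). */
    static String sanitize(String name, String replacement) {
        // quoteReplacement keeps '$' and '\' in the replacement from being treated specially.
        return name.replaceAll("[^A-Za-z0-9.]", Matcher.quoteReplacement(replacement));
    }

    public static void main(String[] args) {
        System.out.println(isSafe("report2007.txt"));          // true
        System.out.println(isSafe("12?3.txt"));                // false
        System.out.println(sanitize("12?3.txt", ""));          // 123.txt
        System.out.println(sanitize("my file_v2.txt", "x"));   // myxfilexv2.txt
    }
}
```

Note that deleting or replacing characters can make two different user names collide, so the caller still has to handle duplicates.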
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Roedy Green - 25 Oct 2007 17:04 GMT On Thu, 25 Oct 2007 15:56:52 GMT, Roedy Green <see_website@mindprod.com.invalid> wrote, quoted or indirectly quoted someone who said :
>See StringTools.isLegal. bundled in http://mindprod.com/products1.html#COMMON11
 Signature Roedy Green Canadian Mind Products The Java Glossary http://mindprod.com
Daniel Pitts - 25 Oct 2007 17:45 GMT
> Hello,
> [quoted text clipped - 5 lines]
>
> Thanks Phil
If you don't need to protect the current system from the user (e.g., you're running locally on the user's computer), then let the user enter whatever they want. If it's not a valid filename, let the file operation throw the exception. Of course, you should catch it and display the appropriate dialog.
On the other hand, just because a string has no "forbidden" characters doesn't mean it's a valid file for your purpose. If it happens to be the same name as a directory, then reading and/or writing to it will fail in most cases. If it happens to be a read-only file, then writing to it might fail, depending on user privilege. If the file doesn't exist, but the parent path is read-only, you won't be able to create it.
So, in short: if it's not a security issue, let the OS tell you the validity. If it *is* a security issue, let the security manager tell you the validity.
Hope this helps, Daniel.
 Signature Daniel Pitts' Tech Blog: <http://virtualinfinity.net/wordpress/>