Sony Network Walkman NW-S23 MP3 File Storage

Introduction

The NW-S23 apparently supports real MP3 playback; it just obfuscates the files before writing them to the device. I've reverse-engineered the obfuscation well enough that I can read files from and write files to the device, and the few bits I haven't managed to explain don't seem to matter - they're probably magic bytes (version, etc.) or other non-critical data.

File Mechanism and Layout

When you load an MP3 file onto the device via the MP3FileManager application, the first thing it does is create a folder for the file. If you've just dropped a single MP3 file onto the application, your folder will be called "New Folder" or similar; if you drag an entire folder onto the application, the folder name will be copied.

The folder data is stored in a file called PBLIST1.DAT; whenever this file is modified, it is backed up to PBLIST0.DAT and the new data is written to a fresh PBLIST1.DAT.

Once the folder has been created, the MP3 file's ID3 information is stripped (some of it ends up in PBLIST1.DAT), the MP3 file is obfuscated and written to the device, and the track number and folder/playlist position of the obfuscated file are recorded in PBLIST1.DAT. I'm not clear on the exact order in which this happens, but it seems logical that the application would first try to write the obfuscated file, and only update PBLIST1.DAT if that succeeds.

The obfuscated file is named MPXXXX.DAT, where XXXX is the track number as a zero-padded, four-digit hex number. Note that this has no relation to the ID3 track number; it's simply an internal index the NW-S23 uses to identify the track. It's also used in the obfuscation algorithm, as you'll see below.

To finish with the grosser details of the file handling, the files are located on the device as follows:

    [device root]  (e.g. /media/NW-S23 on Linux, E: on Windows)
      |
      +-control        (various files I don't know/care about live here)
      +-esys
      |   |
      |   +-nw-mp3       ** MPXXXX.DAT files go here
      |   |
      |   +-PBLIST0.DAT  (backup playlist)
      |   +-PBLIST1.DAT  (live playlist)
      |
      +-hifi           (more files I don't know/care about, probably the ATRAC area)

Specifics of File Formats

General

All multi-byte integers are stored in big-endian format, which means that if you're writing code for glibc on Intel (little-endian) chips to interface with this, you'll need to do some byteswapping. The library code I've written does this for you where appropriate, e.g. when extracting track numbers, but folder names and suchlike are left in their on-disk format.

All text strings appear to be UTF-16 (or maybe UCS-2) with null termination. Folder names run to a maximum of 126 characters + NULL, and other metadata runs to 127 characters + NULL.

PBLIST format

* The file starts with an 8-byte signature consisting of the characters "WMPLESYS".
* Six 2-byte words follow. The first two appear to be a timestamp with an epoch of 15:36 on May 26 1978 (honest, I did the math), but that's only speculation - the player doesn't seem to care what data you put in here. The next two words are 0x08 0x9F 0x9E 0xFF, a sequence that also appears in the MP*.DAT files and may indicate a version number. The last two words of this block are 0x00 0x03 0xCE 0xA0.
* Next comes a pair of 4-byte longwords containing the number of folders and the number of tracks on the device.
* The last four bytes of header data are a longword XOR checksum of the header. If you take the entire 32-byte header, including this field, as eight longwords and XOR them together, you should end up with 0.
* Next we get to the actual data. First comes the folder list: for each folder on the device there's a 256-byte block. The first 252 bytes are 126 words containing the folder name in UTF-16 (I think) format.
  The last four bytes make up a longword pointing to the start of this folder's tracklist as an absolute file offset - you should be able to fseek() to this offset and start reading the folder's tracklist.
* After all the folders comes the tracklist that the folders' longword pointers point into. There is only a single tracklist on the device, consisting of one word per track holding that track's number. For multiple folders, the list therefore looks something like Folder 1 Track 1, Folder 1 Track 2 ... Folder 2 Track 1. This lets you trivially move tracks between folders or change their order without having to re-encode the files. The block is zero-padded up to a multiple of eight bytes, and the only way to find out how big it is is to use the number-of-tracks field in the header plus a bit of math. When writing files to the device, the Sony application appears to fill holes in the existing track numbering before allocating new numbers: for example, if you've got tracks 1, 2 and 4 and you add a new track, it will be numbered 3 rather than 5. Your tracklist will still be written out in playback order, i.e. 1, 2, 4, 3.
* After the tracklist comes the track metadata. For each track the device stores the original filename, the track title and the artist, in that order, each in a fixed-size block of 128 words. Unused space (i.e. after short strings) is zero-padded.
* The file ends at the last block of metadata - there's no trailer.

MP*.DAT format

This is the fun one.

* The file starts with a 4-byte signature, "WMMP".
* Next is a 4-byte longword giving the total file size in bytes. This includes the file header, i.e. it's exactly what you'd see in a directory listing of the file.
* Next is the duration of the track in milliseconds, again in a 4-byte longword.
* The third longword after the signature gives the number of frames in the file.
  If you're trying to write a file to the device using your own code, I recommend ripping bits out of XMMS or mp3info to get this number, as I had difficulty finding a library that would calculate it without actually decoding the entire file.
* Then there are 16 bytes of magic: the 0x08 0x9F 0x9E 0xFF sequence that also occurs in the PBLIST file, followed by 0x01, padded out to 16 bytes with 0x00. I've no idea what any of this means, but it seems to be unchanging.
* The rest of the file is the obfuscated MP3 data, with no ID3 tags - strip those out before you encode, or your file will not play on the device.

The Obfuscation Mechanism

The obfuscation mechanism is a trivial substitution cypher based on the track number. Start off with a 256-byte array (one entry for each possible byte value) and fill it with array[index] = 255 - index. Then work your way through the powers of 2 from 1 up to the biggest power of 2 less than or equal to the track number. For each power N, if the track number has that bit set (i.e. trackno & N is non-zero), go through the array in blocks of 2N bytes and swap the first N bytes of each block with the second N bytes.

Here's the C code I've written to do this:

    #include <glib.h>

    /* Build the 256-entry substitution table for track number trackno.
     * Only bits below 256 are used: a larger bit would index past the end
     * of the 256-entry table. How the device handles track numbers of 256
     * and above is not established here. */
    void mple_build_conv_array( guint16 trackno, guint8 *conv )
    {
      guint16 bit;
      guint16 i;

      /* Start from the inverted identity: conv[i] = 255 - i. */
      for ( i = 0; i < 256; i++ ) {
        conv[i] = 255 - i;
      }

      /* For each set bit N, swap the two N-byte halves of every 2N block. */
      for ( bit = 1; bit <= trackno && bit < 256; bit <<= 1 ) {
        if ( trackno & bit ) {
          guint16 j;
          guint16 k;

          for ( j = 0; j < 256; j += bit * 2 ) {
            for ( k = 0; k < bit; k++ ) {
              guint8 temp = conv[j + k];
              conv[j + k] = conv[j + k + bit];
              conv[j + k + bit] = temp;
            }
          }
        }
      }
    }

Note that this array works for conversion in either direction: to obfuscate or deobfuscate, replace each data byte b with conv[b].

===============================================================================
v1.0 / Ronan Waide / April 10, 2005 / Distribute as you see fit
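The PBLIST header checksum and the size arithmetic for the tracklist and metadata can be sketched in C roughly as follows. This is a minimal sketch, not the library code mentioned above: it uses plain stdint.h types instead of GLib, every function and constant name is hypothetical, and it assumes a 32-byte header (8 + 12 + 8 + 4 bytes), sections that follow one another with no gaps, and exactly three 256-byte (128-word) metadata blocks per track.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Layout constants from the PBLIST description (names are hypothetical). */
#define PBLIST_HEADER_SIZE 32   /* "WMPLESYS" (8) + six words (12) + two counts (8) + checksum (4) */
#define PBLIST_FOLDER_SIZE 256  /* 126-word name + 4-byte tracklist offset */
#define PBLIST_STRING_SIZE 256  /* one 128-word metadata string */

/* Read a big-endian longword. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a big-endian longword. */
static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Store the checksum (bytes 28-31) so that all eight header longwords XOR to 0. */
static void pblist_set_checksum(uint8_t *hdr)
{
    uint32_t x = 0;
    for (int i = 0; i < PBLIST_HEADER_SIZE - 4; i += 4)
        x ^= be32(hdr + i);
    put_be32(hdr + PBLIST_HEADER_SIZE - 4, x);
}

/* Return 1 if the signature matches and the header XORs to zero, else 0. */
static int pblist_header_ok(const uint8_t *hdr)
{
    uint32_t x = 0;
    if (memcmp(hdr, "WMPLESYS", 8) != 0)
        return 0;
    for (int i = 0; i < PBLIST_HEADER_SIZE; i += 4)
        x ^= be32(hdr + i);
    return x == 0;
}

/* Folder and track counts live at offsets 20 and 24. */
static uint32_t pblist_nfolders(const uint8_t *hdr) { return be32(hdr + 20); }
static uint32_t pblist_ntracks(const uint8_t *hdr)  { return be32(hdr + 24); }

/* One 16-bit word per track, zero-padded up to a multiple of 8 bytes. */
static size_t pblist_tracklist_size(uint32_t ntracks)
{
    return ((size_t)ntracks * 2 + 7) & ~(size_t)7;
}

/* The tracklist starts right after the folder blocks. */
static size_t pblist_tracklist_offset(uint32_t nfolders)
{
    return PBLIST_HEADER_SIZE + (size_t)nfolders * PBLIST_FOLDER_SIZE;
}

/* Metadata (filename, title, artist per track) follows the padded tracklist. */
static size_t pblist_metadata_offset(uint32_t nfolders, uint32_t ntracks)
{
    return pblist_tracklist_offset(nfolders) + pblist_tracklist_size(ntracks);
}

/* The file ends after the last metadata block. */
static size_t pblist_expected_size(uint32_t nfolders, uint32_t ntracks)
{
    return pblist_metadata_offset(nfolders, ntracks) +
           (size_t)ntracks * 3 * PBLIST_STRING_SIZE;
}
```

Under those assumptions, a device with 2 folders and 3 tracks would have its tracklist at offset 544, taking 8 bytes (three 2-byte track numbers plus 2 bytes of padding), its metadata starting at offset 552, and a total PBLIST size of 2856 bytes. When writing a header, fill in every other field first and call pblist_set_checksum() last.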
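Putting the MP*.DAT header and the substitution table together, reading a track back might look like the sketch below. It assumes the 32-byte header laid out above, that the obfuscated payload starts immediately after it, and that the track number is the XXXX from the MPXXXX.DAT name; every name in it (mpdat_parse_header(), build_conv(), mpdat_decode(), mpdat_filename()) is hypothetical. build_conv() repeats the swap construction of mple_build_conv_array() in plain C and, like it, only uses bits below 256; the uppercase hex digits in the generated file name are also an assumption.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MPDAT_HEADER_SIZE 32  /* "WMMP" + size + duration + frames + 16 magic bytes */

/* Decoded header fields (all stored big-endian on the device). */
struct mpdat_header {
    uint32_t file_size;    /* total size in bytes, header included */
    uint32_t duration_ms;  /* track duration in milliseconds */
    uint32_t frames;       /* number of MP3 frames */
};

static uint32_t rd_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Return 0 on success, -1 if the buffer is too short or the signature is wrong. */
static int mpdat_parse_header(const uint8_t *buf, size_t len, struct mpdat_header *h)
{
    if (len < MPDAT_HEADER_SIZE || memcmp(buf, "WMMP", 4) != 0)
        return -1;
    h->file_size = rd_be32(buf + 4);
    h->duration_ms = rd_be32(buf + 8);
    h->frames = rd_be32(buf + 12);
    return 0;
}

/* Same construction as mple_build_conv_array(): start from 255 - i, then for
 * each set bit N (below 256) swap the two N-byte halves of every 2N block. */
static void build_conv(uint16_t trackno, uint8_t conv[256])
{
    for (int i = 0; i < 256; i++)
        conv[i] = (uint8_t)(255 - i);
    for (unsigned bit = 1; bit < 256 && bit <= trackno; bit <<= 1) {
        if (!(trackno & bit))
            continue;
        for (unsigned j = 0; j < 256; j += 2 * bit) {
            for (unsigned k = 0; k < bit; k++) {
                uint8_t t = conv[j + k];
                conv[j + k] = conv[j + k + bit];
                conv[j + k + bit] = t;
            }
        }
    }
}

/* Substitute every byte through the table; the same call encodes and decodes. */
static void apply_conv(const uint8_t conv[256], uint8_t *data, size_t len)
{
    for (size_t i = 0; i < len; i++)
        data[i] = conv[data[i]];
}

/* Deobfuscate the payload after the header in place; return its length or -1. */
static long mpdat_decode(uint16_t trackno, uint8_t *buf, size_t len)
{
    struct mpdat_header h;
    uint8_t conv[256];
    if (mpdat_parse_header(buf, len, &h) != 0)
        return -1;
    build_conv(trackno, conv);
    apply_conv(conv, buf + MPDAT_HEADER_SIZE, len - MPDAT_HEADER_SIZE);
    return (long)(len - MPDAT_HEADER_SIZE);
}

/* Hypothetical: on-device name for a track, assuming uppercase hex digits. */
static void mpdat_filename(uint16_t trackno, char out[16])
{
    snprintf(out, 16, "MP%04X.DAT", (unsigned)trackno);
}
```

Because each swap step replaces conv[i] with conv[i ^ N], the finished table works out to conv[i] = i ^ 0xFF ^ (trackno & 0xFF): the cypher is effectively a single-byte XOR, which is why one table converts in both directions. To recover a playable MP3 you would read the whole MPXXXX.DAT file into memory, call mpdat_decode() with its track number, and write out everything from offset 32 onward.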