[ARM PATCH] 2046/1: fix nwfpe for double arithmetic on big-endian platforms
Patch from Lennert Buytenhek Hi, I need the patch below (against 2.6.8-rc1-ds1) to make nwfpe properly emulate arithmetic with doubles on a big endian ARM platform. From reading the mailing list archives and from helpful comments I've received from people on this list, I gather that this has come up in the past, but it appears that Russell King was never really convinced as to why this patch is needed. I think I understand what's going on, and will try to explain. On little endian ARM, the double value 1.0 looks like this when stored in memory in FPA word ordering: bytes: 0x00 0x00 0xf0 0x3f 0x00 0x00 0x00 0x00 u32s: 0x3ff00000 0x00000000 u64: 0x000000003ff00000 On big endian, it looks like this: bytes: 0x3f 0xf0 0x00 0x00 0x00 0x00 0x00 0x00 u32s: 0x3ff00000 0x00000000 u64: 0x3ff0000000000000 It appears to be this way because once upon a time, somebody decided that the sub-words of a double will use native endian word ordering within themselves, but the two separate words will always be stored with the most significant one first. God knows why they did it this way, but they did. Anyway. The key observation is that nwfpe internally stores double values in the type 'float64', which is basically just a typedef for unsigned long long. It never accesses 'float64's on the byte level by casting pointers around or anything like that, it just uses direct u64 arithmetic primitives (add, shift, or, and) for float64 manipulations and that's it. So. For little endian platforms, 1.0 looks like: 0x00 0x00 0xf0 0x3f 0x00 0x00 0x00 0x00 But since nwfpe treats it as a u64, it wants it to look like: 0x00 0x00 0x00 0x00 0x00 0x00 0xf0 0x3f So, that's why the current code swaps the words around when getting doubles from userspace and putting them back (see fpa11_cpdt.c, loadDouble and storeDouble.) On big endian, 1.0 looks like: 0x3f 0xf0 0x00 0x00 0x00 0x00 0x00 0x00 Since nwfpe treats it as a u64, it wants it to look like: 0x3f 0xf0 0x00 0x00 0x00 0x00 0x00 0x00 Hey! That's exactly the same. So in this case, it shouldn't be swapping the halves around. However, it currently does that swapping unconditionally, and that's why floating point emulation messes up. This is how I understand things -- hope it makes sense to other people too. cheers, Lennert
Showing
Please register or sign in to comment