
Android NDK: fseek and lseek

There is a whole family of file-seek functions: fseek and fseeko operate on a stream, while lseek and lseek64 operate on a file descriptor. Their prototypes are:

int fseek(FILE *stream, long offset, int whence);
int fseeko(FILE *stream, off_t offset, int whence);
off_t lseek(int fd, off_t offset, int whence);
off64_t lseek64(int fd, off64_t offset, int whence);      

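For files larger than 2 GB, seeking directly to a position past the 2 GB mark needs an offset of 2^31 or more. That does not fit in a signed 32-bit integer, so offset must be a 64-bit type.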
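The arithmetic behind that limit can be checked with a small sketch. The helper name fits_in_32bit_offset is made up for illustration; it only tests whether a value is representable in a signed 32-bit integer, which is what a 32-bit long or a 32-bit off_t can hold.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if `offset` is representable in a signed
 * 32-bit integer, i.e. in a 32-bit long or a 32-bit off_t. */
static bool fits_in_32bit_offset(int64_t offset) {
    return offset >= INT32_MIN && offset <= INT32_MAX;
}
```

INT32_MAX is 2147483647, one less than 2 GiB (2147483648), so fits_in_32bit_offset(2147483647) is true and fits_in_32bit_offset(2147483648LL) is false.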

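The prototypes above use three different offset types: long, off_t, and off64_t.

off_t and off64_t are typedefs. off64_t is 64 bits wide by definition, so lseek64 always supports large files.

As for long and off_t, the fseeko man page has the following passage:

man fseeko: On many architectures both off_t and long are 32-bit types, but compilation with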

#define _FILE_OFFSET_BITS 64

will turn off_t into a 64-bit type.

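So off_t can be made 64 bits wide just by adding one macro definition at compile time.

The width of long depends on the platform and the compiler. On 32-bit platforms long is 32 bits. On 64-bit platforms it may be 32 or 64 bits depending on the compiler's data model: 64 bits under LP64 (used on 64-bit Linux), 32 bits under LLP64 (used on 64-bit Windows).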

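To summarize: fseek cannot handle large files on 32-bit platforms, and may or may not on 64-bit platforms (depending on the compiler); fseeko and lseek can be made to support large files by defining the macro; and lseek64, as its name suggests, always supports large files.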
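A quick way to see which case applies on a given toolchain is to print the type widths. This is a minimal sketch, assuming a POSIX system; the function names are made up for illustration. The macro has to be defined before the first system header is included, otherwise it has no effect.

```c
#define _FILE_OFFSET_BITS 64   /* must come before any system header */
#include <limits.h>
#include <stdio.h>
#include <sys/types.h>

/* Width of long in bits on this platform/compiler. */
static int long_bits(void)  { return (int)(sizeof(long) * CHAR_BIT); }

/* Width of off_t in bits, as seen with the macro above in effect. */
static int off_t_bits(void) { return (int)(sizeof(off_t) * CHAR_BIT); }

/* Prints both widths so the result can be compared across targets. */
static void print_offset_widths(void) {
    printf("long:  %d bits\n", long_bits());
    printf("off_t: %d bits\n", off_t_bits());
}
```

If off_t is still reported as 32 bits with the macro defined, the C library is ignoring _FILE_OFFSET_BITS.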

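Everything above applies to Linux. Now for Android.

On Android, where long is 32 bits on the 32-bit ABIs, fseek cannot handle large files. fseeko and lseek still do not work even with _FILE_OFFSET_BITS defined. A Google search showed that Android simply does not support the macro: https://code.google.com/p/android/issues/detail?id=64613

That page can be hard to reach, so its contents are reproduced here: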

Issue 64613: implement _FILE_OFFSET_BITS. Reported by [email protected], Jan 8, 2014

This is arguably a dupe of Issue #55866 (NDK: Missing large file support), but that bug is still in NotEnoughInformation, so lets provide more information...

The NDK currently declares e.g.

extern off_t lseek(int, off_t, int);

extern off64_t lseek64(int, off64_t, int);

While this provides "large file support", it does not go as far as glibc does. On "proper" Linux, it's more complicated; if _FILE_OFFSET_BITS is set to 64, then the "normal" file I/O functions are 64-bit -- lseek(2) would take a 64-bit off_t, not a 32-bit off_t.

http://users.suse.com/~aj/linux_lfs.html

> In a nutshell for using LFS you can choose either of the following:

> * Compile your programs with "gcc -D_FILE_OFFSET_BITS=64". This forces all

> file access calls to use the 64 bit variants.

(See also e.g. glibc <features.h> and <unistd.h> which has lots of fun/complicated #if-fu.)

Many open-source libraries will use autoconf's AC_SYS_LARGEFILE macro (or variants thereof) in order to check for and enable 64-bit off_t on 32-bit platforms, effectively resulting in:

#define lseek lseek64

http://www.gnu.org/software/autoconf/manual/autoconf-2.69/html_node/System-Services.html

The problem with the NDK is that it doesn't support any of these patterns. Consequently, software built to support _FILE_OFFSET_BITS=64 behavior won't be built with this support enabled (because the NDK doesn't support it), resulting in use of the 32-bit file APIs instead of the 64-bit APIs.

https://code.google.com/p/android/issues/detail?id=55866#c6

> I have the same problem with my ffmpeg build. Can't open video files larger than 2GB.

What would be useful is for the NDK to implement/conform to the current glibc macros/patterns so that software with large file support can easily make use of it on Android.

Jan 9, 2014 Project Member #1 [email protected]

we don't actually have the full set of *64 functions yet either, but we're working on it.

Summary: implement _FILE_OFFSET_BITS (was: NDK: Missing "GNU compatible" large file support.)

Owner: [email protected]

Cc: [email protected] [email protected]

Nov 6, 2014 #2 [email protected]

This issue hit me recently in migrating some code that was safe using autoconf (AC_SYS_LARGEFILE) and I tried (paranoid) adding in `#define _FILE_OFFSET_BITS 64` all over the place.

Finally realized with a very small test program that Android does not respect `#define _FILE_OFFSET_BITS 64` or the autoconf macro as expected.

This led to a maddening bug that was hard to track down as core expectations were not correct.

Is this non-conformance documented anywhere?

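So that leaves only lseek64, which Android fortunately does support.

But how to use it? The existing code was written entirely with the fopen, fread, fseek, ftell family of functions. Fortunately there is fileno: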

int fileno(FILE *stream);      

This function returns the file descriptor underlying a stream.
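As an illustration of mixing the two layers, here is a minimal sketch, assuming a POSIX system. The helper stream_size_via_fd is hypothetical; it gets the descriptor with fileno and asks the kernel for the file size with lseek.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns the size of the file behind `stream`,
 * or -1 on failure. Side effect: the descriptor's offset is left at
 * the end of the file. */
static long long stream_size_via_fd(FILE *stream) {
    if (fflush(stream) != 0) return -1;   /* push pending writes to the kernel */
    int fd = fileno(stream);              /* descriptor behind the stream */
    if (fd == -1) return -1;
    off_t end = lseek(fd, 0, SEEK_END);   /* operates on the descriptor only */
    return (end == (off_t)-1) ? -1 : (long long)end;
}
```

The lseek call bypasses the stream entirely, so stdio does not learn that the descriptor's offset has moved.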

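With it we can wrap our own 64-bit-capable fseek. Mind the return values: fseek returns 0 on success, lseek64 returns the resulting offset, and both return -1 on failure. The wrapper below returns 0 on success and the errno value on failure.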

int fseek_64(FILE *stream, int64_t offset, int origin) {
    int fd = fileno(stream);
    if (lseek64(fd, offset, origin) == -1) return errno;
    return 0;
}      

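And that's it, it works.

But after the program had been running for a while, something looked wrong. Tracing it down, the bug turned out to be in our own fseek_64.

The symptom: read some data with fread, call fseek_64, then fread again, and the data is not what we expect. There is always some stale data in front of the data we want. So the guess was that fread buffers its input and the stale data is whatever was left unread in the buffer. Why does fseek have no such problem while our fseek_64, which calls lseek64, does? How are fseek and lseek related, and how do they differ? A Google search confirmed the guess.

First, the fread/fwrite family does buffer data. lseek is a system call, whereas fseek belongs to the standard C library. Underneath, fseek also calls lseek, but it takes care of the buffer as well. For example, suppose the buffer holds 10 bytes of data and we want to skip forward 4 bytes: fseek does not need to call lseek at all, it can just advance the buffer pointer by 4 bytes. To skip forward 40 bytes instead, fseek calls lseek to jump to the target position and then empties the buffer.

Our fseek_64 only called lseek64 to jump to the target position and never touched the buffer, which caused the bug above.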
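The gap between the stream's logical position and the descriptor's offset can be observed directly. This is a minimal sketch, assuming a POSIX system; measure_positions is a hypothetical helper and is not part of the original code.

```c
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: writes `total` bytes to a temporary file, reads
 * `want` bytes back with fread, then reports the stream's position
 * (ftell) and the descriptor's offset (lseek). Returns 0 on success. */
static int measure_positions(size_t total, size_t want,
                             long *stream_pos, long *fd_pos) {
    char buf[16];
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    for (size_t i = 0; i < total; i++) {
        if (fputc('x', f) == EOF) { fclose(f); return -1; }
    }
    rewind(f);   /* flush the writes and go back to offset 0 */
    if (want > sizeof buf || fread(buf, 1, want, f) != want) {
        fclose(f);
        return -1;
    }
    *stream_pos = ftell(f);                          /* what the program has consumed */
    *fd_pos = (long)lseek(fileno(f), 0, SEEK_CUR);   /* what the kernel has handed out */
    fclose(f);
    return 0;
}
```

After measure_positions(100, 4, &s, &d), s is 4, while d is usually larger because fread filled its buffer with more than was asked for; the exact value depends on the C library's buffer size. A bare lseek64 moves only the second number and leaves the buffered bytes in place.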

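So we also need the setbuf function to deal with the buffer. Note that ISO C only specifies setbuf when it is called before any other operation on the stream; calling it mid-stream relies on the C library discarding the buffered data, as it did here, and leaves the stream unbuffered. The revised fseek_64: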

int fseek_64(FILE *stream, int64_t offset, int origin) {
    setbuf(stream, NULL); // discard the buffer (the stream becomes unbuffered)
    int fd = fileno(stream);
    if (lseek64(fd, offset, origin) == -1) return errno;
    return 0;
}      

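With this change, the bug went away.

Finally, one more issue came up during the project: once a stream has reached EOF, calling lseek does not make it readable again. I was not sure whether fseek handles this case. If it does (which seems very likely), our fseek_64 should be improved further, using rewind or even reopening the file to make the stream readable again. In that case the code would become: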

int fseek_64(FILE *stream, int64_t offset, int origin) {
    if (feof(stream)) {
        rewind(stream);
    }
    else {
        setbuf(stream, NULL); // discard fread's buffer
    }
    int fd = fileno(stream);
    if (lseek64(fd, offset, origin) == -1) {
        return errno;
    }
    return 0;
}      

I will update this post once this has been verified.

update:

Verified: fseek makes a stream that has hit EOF readable again, and so does rewind. So the code above works.
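For comparison, here is a hedged variant that follows fseek's own return convention (0 on success, -1 with errno set) and uses clearerr to reset the EOF indicator. The name fseek_64_checked is made up, and the sketch rests on several assumptions: the stream is used for reading, the C library discards buffered input on a mid-stream setbuf (as described above), and _LARGEFILE64_SOURCE is what exposes lseek64 on glibc. It refuses SEEK_CUR, because the descriptor's offset can be ahead of the stream's position by the amount of unread buffered data, so a relative seek on the descriptor may land in the wrong place.

```c
#define _LARGEFILE64_SOURCE   /* exposes lseek64/off64_t on glibc; must precede all includes */
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical variant of fseek_64: returns 0 on success, -1 on
 * failure with errno set, like fseek. Only absolute origins are
 * accepted. */
static int fseek_64_checked(FILE *stream, int64_t offset, int origin) {
    if (origin != SEEK_SET && origin != SEEK_END) {
        errno = EINVAL;        /* SEEK_CUR is refused, see the note above */
        return -1;
    }
    int fd = fileno(stream);
    if (fd == -1) return -1;
    setbuf(stream, NULL);      /* as in the article: drop buffered input */
    if (lseek64(fd, (off64_t)offset, origin) == (off64_t)-1) return -1;
    clearerr(stream);          /* reset the EOF and error indicators */
    return 0;
}
```

A caller would use it as a drop-in for fseek, for example fseek_64_checked(fp, 3LL * 1024 * 1024 * 1024, SEEK_SET) to jump to the 3 GiB mark, checking for -1 instead of a returned errno value.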
