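Today I had lunch with a developer colleague and we got to talking about FastDFS. Several projects have been using MongoDB for file storage, but that probably wastes too much space, and FastDFS may be a better fit as a replacement.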
http://code.google.com/p/fastdfs/
FastDFS is an open source, high-performance distributed file system (DFS). Its major functions include file storing, file syncing and file accessing (file uploading and file downloading), and it is designed to solve the problems of high capacity and load balancing. FastDFS should meet the requirements of websites whose services are based on files, such as photo-sharing and video-sharing sites.
FastDFS has two roles: tracker and storage. The tracker takes charge of scheduling and load balancing for file access. The storage stores files, and its function is file management, including file storing, file syncing and providing a file access interface. It also manages the metadata, which are attributes of the file represented as key-value pairs. For example, in width=1024 the key is "width" and the value is "1024".
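To make the key-value metadata idea concrete, here is a minimal sketch in Python. It is not the FastDFS client API: `parse_metadata` is a hypothetical helper, and the `height=768` entry is made up for illustration alongside the `width=1024` example from the description.

```python
def parse_metadata(pairs):
    """Turn "key=value" strings into a dict; values are kept as strings."""
    meta = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"not a key=value pair: {pair!r}")
        meta[key.strip()] = value.strip()
    return meta


# "width=1024" comes from the description; "height=768" is an assumed extra.
meta = parse_metadata(["width=1024", "height=768"])
print(meta["width"])  # prints 1024
```

The values stay as strings here, matching the description, where the value of "width" is the string "1024" rather than a number.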
The tracker and storage each contain one or more servers. Servers can be added to or removed from the tracker or storage cluster at any time without affecting online services. The servers in the tracker cluster are peers of one another.
The storage servers are organized by file volume (group) to obtain high capacity. The storage system contains one or more volumes, and the files in different volumes are independent of each other. The capacity of the whole storage system equals the sum of all volumes' capacities. A file volume contains one or more storage servers, all of which hold the same files. The servers in a file volume back each other up and share the load. When a storage server is added to a volume, the files already in that volume are replicated to the new server automatically, and once replication is done the system switches the server online to provide storage services. When the total storage capacity is insufficient, you can add one or more volumes to expand it; to do this, you need to add one or more storage servers.
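The capacity rule can be sketched as a small Python model. Everything here is illustrative: the `Volume` class, the server names and the sizes are made up, and treating a volume's capacity as that of its smallest server is my assumption, based on every server in a volume holding the same files.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    name: str
    # Capacity of each storage server in GB (hypothetical figures).
    server_capacity_gb: dict = field(default_factory=dict)

    def capacity_gb(self):
        # Assumption: servers in a volume mirror each other, so the
        # usable capacity is bounded by the smallest server.
        return min(self.server_capacity_gb.values(), default=0)


def total_capacity_gb(volumes):
    # Files are independent across volumes, so capacities add up.
    return sum(v.capacity_gb() for v in volumes)


volumes = [
    Volume("group1", {"storage-a": 500, "storage-b": 400}),
    Volume("group2", {"storage-c": 1000}),
]
print(total_capacity_gb(volumes))  # 400 + 1000 = 1400
```

In this model, adding a server to group1 improves redundancy and read throughput but not capacity, while adding a new volume (with at least one server) is what grows the total.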
The identification of a file is composed of two parts: the volume name and the file name.
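As a rough sketch of how such a two-part identifier could be handled in Python: the "/" separator, the example ID and the `parse_file_id` helper below are assumptions for illustration, not taken from the description above or from the FastDFS client library.

```python
from typing import NamedTuple


class FileId(NamedTuple):
    volume: str
    filename: str


def parse_file_id(file_id: str) -> FileId:
    """Split an ID into volume name and file name at the first "/"."""
    volume, sep, filename = file_id.partition("/")
    if not sep or not volume or not filename:
        raise ValueError(f"expected '<volume>/<filename>', got {file_id!r}")
    return FileId(volume, filename)


# Hypothetical ID: the volume name, then a file name that may contain slashes.
fid = parse_file_id("group1/M00/00/00/example.jpg")
print(fid.volume)    # prints group1
print(fid.filename)  # prints M00/00/00/example.jpg
```

Splitting only at the first separator keeps any directory-like structure inside the file name intact, so the volume name alone is enough to decide which group of storage servers to ask.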
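The main thing to test is whether it suits a cross-IDC environment. For example, with large-file storage, how can files be replicated efficiently to other IDCs, and is the replication block-level or file-level?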