
Downloading Files with PHP

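There are essentially two ways to download a file from a PHP script. One is to call an external download tool that ships with the system, such as wget, through existing functions like system() or exec(). The other is to have PHP fetch the file itself over a socket. I chose the second approach.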
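For comparison, the first approach might look roughly like the sketch below. It assumes a wget binary is installed and on the PATH, and the helper names build_wget_command and download_with_wget are made up for illustration. Both arguments go through escapeshellarg() so that a URL containing shell metacharacters cannot inject extra commands.

```php
<?php
// Sketch only: assumes a wget binary is available on the PATH.
// build_wget_command() and download_with_wget() are hypothetical helper names.

// Build the shell command, quoting both arguments for the shell.
function build_wget_command($url, $dest)
{
    return 'wget -q -O ' . escapeshellarg($dest) . ' -- ' . escapeshellarg($url);
}

// Run wget and report success based on its exit status (0 means success).
function download_with_wget($url, $dest)
{
    exec(build_wget_command($url, $dest), $output, $status);
    return $status === 0;
}
```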

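To download over a socket you first have to understand the protocol involved: the HTTP request/response cycle for files served over HTTP, or the FTP command sequence for files served over FTP. That makes this approach rather tedious. For example, this is the code for fetching a document over HTTP (taken from the PHP manual):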

<?php
$fp = fsockopen("www.example.com", 80, $errno, $errstr, 30);
if (!$fp) {
    echo "$errstr ($errno)<br />\n";
} else {
    $out = "GET / HTTP/1.1\r\n";
    $out .= "Host: www.example.com\r\n";
    $out .= "Connection: Close\r\n\r\n";
    fwrite($fp, $out);
    while (!feof($fp)) {
        echo fgets($fp, 128);
    }
    fclose($fp);
}
?>
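The loop above echoes everything the server sends, headers included. To save only the file, the response has to be split at the first blank line: the status line and headers come before it, the body after it. A minimal sketch of that step follows, assuming the whole response is already in a string and the body is not chunk-encoded. parse_http_response is a hypothetical helper, not a PHP built-in.

```php
<?php
// Sketch only: parse_http_response() is a hypothetical helper.
// Assumes the full response is in memory and the body is not chunk-encoded.
function parse_http_response($raw)
{
    // Headers and body are separated by the first empty line (CRLF CRLF).
    $parts = explode("\r\n\r\n", $raw, 2);
    $lines = explode("\r\n", $parts[0]);
    $body  = isset($parts[1]) ? $parts[1] : '';

    // Status line, e.g. "HTTP/1.1 200 OK": version, status code, reason phrase.
    $status = explode(' ', array_shift($lines), 3);

    $headers = array();
    foreach ($lines as $line) {
        // Limit to 2 pieces so values that contain ':' (URLs, dates) stay intact.
        $pair = explode(':', $line, 2);
        if (count($pair) === 2) {
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    return array(
        'version' => $status[0],
        'code'    => isset($status[1]) ? (int) $status[1] : 0,
        'reason'  => isset($status[2]) ? $status[2] : '',
        'headers' => $headers,
        'body'    => $body,
    );
}
```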

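To keep things simple, we will instead open the remote file directly with fopen(). It can read both http:// and ftp:// URLs, which suits our purpose well. Following the earlier line of thought, the FTP case could of course also be handled with PHP's FTP function library.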
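Opening a URL with fopen() only works when the allow_url_fopen setting is enabled in php.ini. As a rough sketch of the idea, the helper below (copy_stream_to_file is a hypothetical name) opens any source that fopen() understands and copies it to a local file in fixed-size chunks, so the whole download never has to sit in memory.

```php
<?php
// Sketch only: copy_stream_to_file() is a hypothetical helper.
// $source may be a local path or, with allow_url_fopen enabled,
// an http:// or ftp:// URL.
function copy_stream_to_file($source, $target, $chunkSize = 8192)
{
    $in = @fopen($source, 'rb');
    if ($in === false) {
        return false;
    }
    $out = @fopen($target, 'wb');
    if ($out === false) {
        fclose($in);
        return false;
    }

    $bytes = 0;
    while (!feof($in)) {
        $chunk = fread($in, $chunkSize);
        if ($chunk === false) {
            break;
        }
        $bytes += fwrite($out, $chunk);
    }

    fclose($in);
    fclose($out);
    return $bytes; // number of bytes written
}
```

Because an empty source legitimately yields 0, callers should compare the result against false with === rather than treating it as a boolean.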

Using fopen(), the finished script looks like this:

#! /usr/bin/php
<?php
error_reporting(0);
set_time_limit(0);

// Print usage and exit when no arguments are given
if ($argc < 2) {
    echo "Usage: " . $argv[0] . " URL [Destination]\n\n";
    exit();
}

// Collect the basic variables
$url = $argv[1];
$save_path = isset($argv[2]) ? $argv[2] : "./";
$url_parts = explode("/", $url);
$file_name = array_pop($url_parts);
// Make sure exactly one "/" separates the directory and the file name
$localfile = rtrim($save_path, "/") . "/" . $file_name;

// Validate them
if (!check_url($url)) {
    exit("Error: URL " . $url . " invalid.\n\n");
}
if (file_exists($localfile)) {
    exit("Error: local file " . $localfile . " exists.\n\n");
}

// Open the remote file
$fp = fopen($url, "rb");
if (!$fp) {
    exit("Error: Download " . $url . " failed.\n\n");
}

// Open the local file
$sp = fopen($localfile, "wb");
if (!$sp) {
    exit("Error: Open local file " . $localfile . " failed.\n\n");
}

// Download the remote file, writing each chunk to disk as it arrives
echo "Downloading, please wait...\n\n";
while (!feof($fp)) {
    fwrite($sp, fread($fp, 1024));
}
fclose($fp);
fclose($sp);
echo "Download of " . $file_name . " succeeded!\n\n";

/* Check that the URL is valid */
function check_url($url) {
    return preg_match('#^(http|ftp)://[a-zA-Z0-9_-]+[./]+[\w/-]+.*$#i', $url) === 1;
}
?>
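The script derives the local file name by taking whatever follows the last "/" in the URL, which yields an empty name for a URL ending in "/" and keeps any query string in the name. A slightly more defensive variant is sketched here, under the assumption that the URL path alone should decide the name. filename_from_url and the "index.html" fallback are illustrative choices, not part of the original script.

```php
<?php
// Sketch only: filename_from_url() and its default name are hypothetical.
function filename_from_url($url, $default = 'index.html')
{
    // Look only at the path component, ignoring any query string or fragment.
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path) || $path === '') {
        return $default;
    }
    // A path ending in "/" names a directory, not a file.
    if (substr($path, -1) === '/') {
        return $default;
    }
    $name = basename($path);
    return $name === '' ? $default : $name;
}
```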

Save the code above as download.php. On Linux/Unix, remember to make it executable:

chmod +x download.php

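The script also assumes that the PHP interpreter lives at /usr/bin/php. If it does not, change the first line to the actual path of your PHP binary, for example: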

#! /usr/local/php/bin/php

Use the script like this to download a file:

download.php  REMOTE_FILE  SAVE_PATH

For example, to download the Google Talk installer into our /tmp directory:

download.php  http://dl.google.com/googletalk/googletalk-setup.exe     /tmp/

If nothing goes wrong, after a short wait the file googletalk-setup.exe will appear in /tmp/.

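Possible improvements include support for more protocols, prompting for a user name and password when a download requires authentication, and a download progress bar. Resumable and multi-threaded downloads are not really practical in PHP yet; anyone interested can take this further.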
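As one possible starting point for the progress bar, the function below renders a single text line from the number of bytes received so far and the expected total. It is only a sketch: render_progress is a hypothetical name, and the total is assumed to be known in advance (for HTTP it would typically come from the Content-Length header, which servers do not always send).

```php
<?php
// Sketch only: render_progress() is a hypothetical helper.
// $total is assumed to be known, e.g. taken from a Content-Length header.
function render_progress($done, $total, $width = 20)
{
    // Clamp the ratio to the 0..1 range; treat an unknown total as 0%.
    $ratio  = $total > 0 ? min(1, max(0, $done / $total)) : 0;
    $filled = (int) floor($ratio * $width);

    return sprintf(
        '[%s%s] %3d%%',
        str_repeat('#', $filled),
        str_repeat('.', $width - $filled),
        (int) floor($ratio * 100)
    );
}
```

Inside the download loop, printing "\r" followed by the rendered line after each chunk redraws the bar in place on most terminals.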

PS: I also came across a more capable HTTP download class, written by IT柏拉图, the author of dedeCMS:

<?php
/*=======================================
// DedeCMS (织梦) HTTP download class
// 织梦之旅 www.dedecms.com
=======================================*/
class DedeHttpDown
{
    public $m_url = "";
    public $m_urlpath = "";
    public $m_scheme = "http";
    public $m_host = "";
    public $m_port = "80";
    public $m_user = "";
    public $m_pass = "";
    public $m_path = "/";
    public $m_query = "";
    public $m_fp = null;
    public $m_error = "";
    public $m_httphead = array();
    public $m_html = "";

    // Initialize: split the URL into its components
    function PrivateInit($url)
    {
        $urls = @parse_url($url);
        $this->m_url = $url;
        if (is_array($urls)) {
            if (!empty($urls["host"])) $this->m_host = $urls["host"];
            if (!empty($urls["scheme"])) $this->m_scheme = $urls["scheme"];
            if (!empty($urls["user"])) $this->m_user = $urls["user"];
            if (!empty($urls["pass"])) $this->m_pass = $urls["pass"];
            if (!empty($urls["port"])) $this->m_port = $urls["port"];
            if (!empty($urls["path"])) $this->m_path = $urls["path"];
            $this->m_urlpath = $this->m_path;
            if (!empty($urls["query"])) {
                $this->m_query = $urls["query"];
                $this->m_urlpath .= "?" . $this->m_query;
            }
        }
    }

    // Open the given URL
    function OpenUrl($url)
    {
        // Reset all parameters
        $this->m_url = "";
        $this->m_urlpath = "";
        $this->m_scheme = "http";
        $this->m_host = "";
        $this->m_port = "80";
        $this->m_user = "";
        $this->m_pass = "";
        $this->m_path = "/";
        $this->m_query = "";
        $this->m_error = "";
        $this->m_httphead = array();
        $this->m_html = "";
        $this->Close();
        // Initialize and start the HTTP session
        $this->PrivateInit($url);
        $this->PrivateStartSession();
    }

    // Print the reason the last operation failed
    function printError()
    {
        echo "Error: " . $this->m_error;
        echo "Response headers:<br>";
        if (is_array($this->m_httphead)) {
            foreach ($this->m_httphead as $k => $v) {
                echo "$k => $v <br>\r\n";
            }
        }
    }

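    // Check whether the response to the GET request was successful (2xx)
    function IsGetOK()
    {
        // preg_match() replaces ereg(), which was removed in PHP 7
        if (preg_match("/^2/", $this->GetHead("http-state"))) {
            return true;
        } else {
            $this->m_error .= $this->GetHead("http-state") . " - " . $this->GetHead("http-describe") . "<br>";
            return false;
        }
    }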

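    // Check whether the returned page is of a text type
    function IsText()
    {
        if (preg_match("/^2/", $this->GetHead("http-state"))
            && preg_match("/^text/i", $this->GetHead("content-type"))) {
            return true;
        } else {
            $this->m_error .= "Content is not a text type<br>";
            return false;
        }
    }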

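    // Check whether the returned page has a specific content type
    function IsContentType($ctype)
    {
        if (preg_match("/^2/", $this->GetHead("http-state"))
            && strtolower($this->GetHead("content-type")) == strtolower($ctype)) {
            return true;
        } else {
            $this->m_error .= "Wrong content type " . $this->GetHead("content-type") . "<br>";
            return false;
        }
    }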

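    // Download the response body to a file over HTTP
    function SaveToBin($savefilename)
    {
        if (!$this->IsGetOK()) return false;
        if (!$this->m_fp || @feof($this->m_fp)) {
            $this->m_error = "The connection is already closed!";
            return false;
        }
        $fp = fopen($savefilename, "wb") or die("Failed to write file $savefilename!");
        while (!feof($this->m_fp)) {
            // Binary data has no meaningful lines, so read fixed-size chunks
            @fwrite($fp, fread($this->m_fp, 1024));
        }
        fclose($fp);
        @fclose($this->m_fp);
        return true;
    }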

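    // Save the page content as a text file
    function SaveToText($savefilename)
    {
        // The original called SaveBinFile(), which the class does not define
        if ($this->IsText()) return $this->SaveToBin($savefilename);
        else return false;
    }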

    // Fetch the content of a web page over HTTP
    function GetHtml()
    {
        if (!$this->IsText()) return "";
        if ($this->m_html != "") return $this->m_html;
        if (!$this->m_fp || @feof($this->m_fp)) return "";
        while (!feof($this->m_fp)) {
            $this->m_html .= fgets($this->m_fp, 256);
        }
        @fclose($this->m_fp);
        return $this->m_html;
    }

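    // Start the HTTP session: send the request and read the response headers
    function PrivateStartSession()
    {
        if (!$this->PrivateOpenHost()) {
            $this->m_error .= "Failed to open the remote host!";
            return false;
        }
        if ($this->GetHead("http-edition") == "HTTP/1.1") $httpv = "HTTP/1.1";
        else $httpv = "HTTP/1.0";
        fputs($this->m_fp, "GET " . $this->m_urlpath . " $httpv\r\n");
        fputs($this->m_fp, "Host: " . $this->m_host . "\r\n");
        fputs($this->m_fp, "Accept: */*\r\n");
        fputs($this->m_fp, "User-Agent: Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.2)\r\n");
        // HTTP/1.1 must ask the server to close the connection after the document;
        // otherwise feof() cannot be used to detect the end while reading
        if ($httpv == "HTTP/1.1") fputs($this->m_fp, "Connection: Close\r\n\r\n");
        else fputs($this->m_fp, "\r\n");
        // Read the status line, e.g. "HTTP/1.0 200 OK"
        if (!is_array($this->m_httphead)) $this->m_httphead = array();
        $httpstas = fgets($this->m_fp, 256);
        $httpstas = explode(" ", $httpstas); // split() was removed in PHP 7
        $this->m_httphead["http-edition"] = trim($httpstas[0]);
        $this->m_httphead["http-state"] = isset($httpstas[1]) ? trim($httpstas[1]) : "";
        $this->m_httphead["http-describe"] = "";
        for ($i = 2; $i < count($httpstas); $i++) {
            $this->m_httphead["http-describe"] .= " " . trim($httpstas[$i]);
        }
        // Read the remaining response headers up to the blank line
        while (!feof($this->m_fp)) {
            $line = str_replace("\"", "", trim(fgets($this->m_fp, 256)));
            if ($line == "") break;
            if (strpos($line, ":") !== false) {
                // Limit to 2 parts so header values containing ":" stay intact
                $lines = explode(":", $line, 2);
                $this->m_httphead[strtolower(trim($lines[0]))] = trim($lines[1]);
            }
        }
    }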

    // Get the value of one HTTP response header
    function GetHead($headname)
    {
        $headname = strtolower($headname);
        if (isset($this->m_httphead[$headname]))
            return $this->m_httphead[$headname];
        else
            return "";
    }

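    // Open the connection
    function PrivateOpenHost()
    {
        if ($this->m_host == "") return false;
        // fsockopen() already takes $errno and $errstr by reference;
        // call-time "&" is a fatal error since PHP 5.4
        $this->m_fp = @fsockopen($this->m_host, $this->m_port, $errno, $errstr, 10);
        if (!$this->m_fp) {
            $this->m_error = $errstr;
            return false;
        } else {
            return true;
        }
    }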

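    // Close the connection
    function Close()
    {
        // is_resource() is false for a stream that was never opened or is already closed
        if (is_resource($this->m_fp)) {
            fclose($this->m_fp);
        }
    }
}
?>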

----------------------------------------------------------------------------------

How to use this class:

Downloading a web page:

$httpdown = new DedeHttpDown();

$httpdown->OpenUrl("http://www.dedecms.com");

echo $httpdown->GetHtml();

$httpdown->Close();

To download an image and save it, use:

$httpdown->OpenUrl("http://prato.bokele.com/0/0/399/bGluMi5qcGc=.jpg");

$httpdown->SaveToBin("test.jpg");

echo "<img src='test.jpg'>";
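The class extracts a user name and password from the URL into m_user and m_pass, but the request it sends never uses them. One way to act on them is to add an HTTP Basic Authorization header. This is an assumption about how the class could be extended, not something it already does, and basic_auth_header is a hypothetical helper name.

```php
<?php
// Sketch only: basic_auth_header() is a hypothetical helper, not part of DedeHttpDown.
// HTTP Basic authentication sends "user:password" Base64-encoded in a request header.
function basic_auth_header($user, $pass)
{
    if ($user === '') {
        return ''; // no credentials in the URL, send nothing
    }
    return 'Authorization: Basic ' . base64_encode($user . ':' . $pass) . "\r\n";
}
```

Inside PrivateStartSession() the result could be written with fputs() alongside the other request headers, before the blank line that ends the header block. Basic authentication is only Base64-encoded, not encrypted, so it offers no protection on a plain HTTP connection.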