How can PHP quickly and conveniently read specific content from HTML when scraping data?

zkbhj - Webmaster of 凯冰科技

Here is a component that helps with this. GitHub: https://github.com/bupt1987/html-parser


html-parser is a PHP HTML parsing tool similar to PHP Simple HTML DOM Parser. Because it is built on PHP's dom extension, it parses HTML several times faster than PHP Simple HTML DOM Parser.
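To make the "built on the dom extension" point concrete, here is a minimal sketch that uses PHP's built-in DOMDocument and DOMXPath directly, without html-parser. The function name extractLinks and the sample markup are made up for illustration. It only shows the kind of native parsing the library relies on, not how the library is implemented internally, and it assumes the class name passed in contains no quote characters.

```php
<?php
// Hypothetical helper: collect the href of every <a> found inside
// elements that carry the given CSS class.
function extractLinks(string $html, string $class): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Match the class as a whole word inside the class attribute.
    $query = sprintf(
        '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]//a[@href]',
        $class
    );

    $links = [];
    foreach ($xpath->query($query) as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

// Made-up sample markup, only for demonstration.
$sample = '<div class="r_lbx"><a href="/room/1.html" title="Room 1">Room 1</a></div>'
        . '<div class="other"><a href="/skip.html">Skip</a></div>';

print_r(extractLinks($sample, 'r_lbx'));
```

Writing XPath by hand like this is what a selector such as div.r_lbx saves you from, which is the convenience a wrapper library adds on top of the native extension.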


 
Usage (taking a scrape of the room listings on Danke's mobile site as an example):
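// Target URL. $house_type_desc, $page, $house_from, $city_code and $danKeInfo are
// assumed to be defined by the surrounding code (not shown), as are the helper
// functions getHouseId() and getHouseTags().
$url = 'https://www.danke.com/room/bj/mz0.html?from=' . $house_type_desc . '&page=' . $page;
$html = file_get_contents($url);

// Initialise the result up front so it exists even when the request fails.
$house_list = [];

if ($html) {
    $html_dom = new \HtmlParser\ParserDom($html);

    // Each div.r_lbx holds one listing.
    $list = $html_dom->find('div.r_lbx');
    foreach ($list as $key => $p) {
        // The second <a> in the listing carries the detail link and the title,
        // so look it up once and reuse it.
        $link = $p->find('a', 1);
        $detail_url = $link->getAttr('href');

        $house_list[$key]['house_from'] = $house_from;
        $house_list[$key]['from_id'] = getHouseId($detail_url);
        $house_list[$key]['city_code'] = $city_code;
        $house_list[$key]['house_pic'] = $p->find('img', 0)->getAttr('src');
        $house_list[$key]['detail_url'] = $detail_url;
        $house_list[$key]['house_name'] = $link->getAttr('title');
        $house_list[$key]['house_price'] = $p->find('div.r_lbx_moneya', 0)->find('span', 0)->getPlainText();
        // Display text, roughly: "The community is about <distance> m in a straight line from <line> <station>".
        $house_list[$key]['subway_info'] = '小区距离' . $danKeInfo['line'] . $danKeInfo['station'] . '直线距离约' . $danKeInfo['distance'] . '米';
        $house_list[$key]['subway_line'] = $danKeInfo['line'];
        $house_list[$key]['subway_station'] = $danKeInfo['station'];
        $house_list[$key]['house_tags'] = getHouseTags($p->find('div.r_lbx_cenc', 0)->getPlainText(), 0, 0, 0, 1);
    }
}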
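
The example above calls file_get_contents() with default settings and chains find(...)->getAttr(...) directly, so a slow response or a listing with missing markup can break the loop. A slightly more defensive variant might look like the sketch below. fetchHtml and attrOrNull are hypothetical helper names, the timeout and User-Agent values are arbitrary, and the guard assumes that an indexed find() can return a non-object when nothing matches, which is worth checking against the version of html-parser you install.

```php
<?php
// Hypothetical helper: fetch a page with a timeout and an explicit User-Agent.
// Returns null on failure or an empty body, instead of false.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'timeout'    => $timeoutSeconds,
            'user_agent' => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)', // arbitrary value
        ],
    ]);
    // Suppress the warning on failure; the return value is checked instead.
    $html = @file_get_contents($url, false, $context);
    return ($html === false || $html === '') ? null : $html;
}

// Hypothetical helper: read an attribute only when $node is an object
// that exposes getAttr(); otherwise return null.
function attrOrNull($node, string $name): ?string
{
    if (!is_object($node) || !is_callable([$node, 'getAttr'])) {
        return null;
    }
    $value = $node->getAttr($name);
    return $value === null ? null : (string) $value;
}

// Intended use inside the loop from the example (not executed here):
//   $link = $p->find('a', 1);
//   $house_list[$key]['detail_url'] = attrOrNull($link, 'href');
//   $house_list[$key]['house_name'] = attrOrNull($link, 'title');
```

Returning null in both helpers lets the caller decide whether to skip a listing or store an empty field, rather than stopping on the first malformed entry.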
