I'd rather not open a new thread for an old problem :) I found a script for scraping data from IMDb and started adapting it for my own use. That's where the problems started. Once I fetch the data, I have no trouble turning the values that aren't arrays into variables, but when I need to take values out of an array I can't figure it out. Specifically, the code fetches the actors as an array and lists them one below another. I want to take each actor (first and last name) as a separate variable and insert it into the database. My code for displaying the data looks like this:
Code:
<?php
include("imdb.php");
$imdb = new Imdb();
$movieArray = $imdb->getMovieInfo("The Godfather"); /* this is how it lists everything it fetches */
echo '<table cellpadding="3" cellspacing="2" border="1" width="80%" align="center">';
foreach ($movieArray as $key => $value) {
    // array fields (genres, cast, ...) are joined into one cell, one item per line
    $value = is_array($value) ? implode("<br />", $value) : $value;
    echo '<tr>';
    echo '<th align="left" valign="top">' . strtoupper($key) . '</th><td>' . $value . '</td>';
    echo '</tr>';
}
echo '</table>';
echo $movieArray['title']; /* and this is how I get a single value when it isn't an array */
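Since `$movieArray['cast']` is an ordinary indexed PHP array, each actor can be read by its position or in a loop, exactly like `$movieArray['title']` but with one more `[...]`. A minimal sketch, assuming the array has the shape built by `scrapMovieInfo()`; the names below are hypothetical sample data so it runs without contacting IMDb, and pairing `cast` with `karakteri` by index assumes both lists were scraped in the same order.

```php
<?php
// Hypothetical sample shaped like the array getMovieInfo() returns;
// in the real script this comes from $imdb->getMovieInfo(...).
$movieArray = array(
    'title'     => 'Sample Movie',
    'cast'      => array('Actor One', 'Actor Two', 'Actor Three'),
    'karakteri' => array('Character A', 'Character B', 'Character C'),
);

// A single element is read by its numeric index, starting at 0
$firstActor = $movieArray['cast'][0];

// Loop over every actor; $i is also used as the index into 'karakteri'
$pairs = array();
foreach ($movieArray['cast'] as $i => $actor) {
    $character = $movieArray['karakteri'][$i] ?? '';
    $pairs[] = $actor . ' as ' . $character;
}

echo $firstActor . "\n";                // prints "Actor One"
echo count($movieArray['cast']) . "\n"; // prints "3"
echo implode("\n", $pairs) . "\n";
```

Each `$actor` inside the loop is already a separate string variable, so it can be passed straight to an INSERT.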
My code for collecting the data looks like this:
Code:
<?php
class Imdb
{
    function getMovieInfo($title)
    {
        $arr = array();
        // NOTE: the URL is hard-coded, so $title is currently ignored
        // and the NULL check below can never be true
        $imdbUrl = 'http://www.imdb.com/title/tt0387199/';
        if($imdbUrl === NULL){
            $arr['error'] = "No Title found in Search Results!";
            return $arr;
        }
        $html = $this->geturl($imdbUrl);
        if(stripos($html, "<meta name=\"application-name\" content=\"IMDb\" />") !== false){
            $arr = $this->scrapMovieInfo($html);
            $arr['imdb_url'] = $imdbUrl;
        } else {
            $arr['error'] = "No Title found on IMDb!";
        }
        return $arr;
    }
    // Scan movie meta data from IMDb page
    function scrapMovieInfo($html)
    {
        $arr = array();
        $arr['title_id'] = $this->match('/<link rel="canonical" href="http:\/\/www.imdb.com\/title\/(tt[0-9]+)\/" \/>/ms', $html, 1);
        $arr['title'] = trim($this->match('/<title>(IMDb \- )*(.*?) \(.*?<\/title>/ms', $html, 2));
        $arr['year'] = trim($this->match('/<title>.*?\(.*?([0-9][0-9][0-9][0-9]).*?\).*?<\/title>/ms', $html, 1));
        $arr['rating'] = $this->match('/>([0-9].[0-9])<\/b><span class="mellow">\/10/ms', $html, 1);
        $arr['genres'] = array();
        foreach($this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Genre.?:(.*?)(<\/div>|See more)/ms', $html, 1), 1) as $m)
        {
            array_push($arr['genres'], $m);
        }
        $arr['creator'] = array();
        foreach($this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Creator.?:(.*?)(<\/div>|>.?and )/ms', $html, 1), 1) as $m)
        {
            array_push($arr['creator'], $m);
        }
        $arr['cast'] = array();
        foreach($this->match_all('/<td class="name">(.*?)<\/td>/ms', $html, 1) as $m)
        {
            array_push($arr['cast'], trim(strip_tags($m)));
        }
        // character names ("karakteri")
        $arr['karakteri'] = array();
        foreach($this->match_all('/<td class="character">(.*?)<\/td>/ms', $html, 1) as $m)
        {
            array_push($arr['karakteri'], trim(strip_tags($m)));
        }
        // Get extra information: runtime, awards, nominations and storyline
        $arr['runtime'] = trim($this->match('/Runtime:<\/h4>.*?([0-9]+) min.*?<\/div>/ms', $html, 1));
        if($arr['runtime'] == '') $arr['runtime'] = trim($this->match('/infobar.*?([0-9]+) min.*?<\/div>/ms', $html, 1));
        $arr['awards'] = trim($this->match('/([0-9]+) wins/ms', $html, 1));
        $arr['nominations'] = trim($this->match('/([0-9]+) nominations/ms', $html, 1));
        $arr['storyline'] = trim(strip_tags($this->match('/Storyline<\/h2>(.*?)(<em|<\/p>|<span)/ms', $html, 1)));
        return $arr;
    }
    // ************************[ Extra Functions ]******************************
    function geturl($url)
    {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
        curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 5.1; rv:2.0.1) Gecko/20100101 Firefox/4.0.1");
        $html = curl_exec($ch);
        curl_close($ch);
        return $html;
    }
    function match_all($regex, $str, $i = 0)
    {
        // return an empty array on error so the foreach loops never receive false
        if(preg_match_all($regex, $str, $matches) === false)
            return array();
        else
            return $matches[$i];
    }
    function match($regex, $str, $i = 0)
    {
        if(preg_match($regex, $str, $match) == 1)
            return $match[$i];
        else
            return false;
    }
}
?>
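To see what the genre regexes actually return without hitting IMDb, the same two patterns from `scrapMovieInfo()` can be run on a small HTML string. The fragment below is made up for illustration and is only assumed to resemble IMDb's markup of the time; the point is that `preg_match_all()` already produces one array element per genre, so "Comedy" and "Drama" are separate entries, not one string.

```php
<?php
// Made-up HTML fragment, assumed to resemble the "Genres:" block on a title page
$html = '<div class="txt-block"><h4>Genres:</h4> <a href="/genre/Comedy">Comedy</a> | <a href="/genre/Drama">Drama</a></div>';

// Step 1: same pattern as in scrapMovieInfo(), take everything between "Genre...:" and </div>
$block = '';
if (preg_match('/Genre.?:(.*?)(<\/div>|See more)/ms', $html, $m) === 1) {
    $block = $m[1];
}

// Step 2: same pattern passed to match_all(), capture the text of every link
preg_match_all('/<a.*?>(.*?)<\/a>/ms', $block, $links);
$genres = $links[1]; // capture group 1: one entry per <a> tag, here 'Comedy' and 'Drama'

print_r($genres);
```

`implode("<br />", ...)` in the display loop is what glues the entries back together on screen; the array itself keeps them apart.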
So how do I get the values individually where they are stored as an array? For example, this code:
Code:
$arr['genres'] = array();
foreach($this->match_all('/<a.*?>(.*?)<\/a>/ms', $this->match('/Genre.?:(.*?)(<\/div>|See more)/ms', $html, 1), 1) as $m)
{
    array_push($arr['genres'], $m);
}
With this code it outputs Comedy, Drama, and I want to split those two and insert each of them into a separate field in the database.
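Because `$arr['genres']` holds one element per genre, the two values can be taken out by index or with `list()` destructuring. A minimal sketch, assuming a hypothetical table `movies` with columns `title`, `genre1` and `genre2` (these names are not from the script). The array is padded with `array_pad()` so a film with only one genre gives `null` for the second column instead of a warning; the statement uses PDO named placeholders but is not executed here.

```php
<?php
// Hypothetical result of getMovieInfo(); only the fields used below
$movieArray = array(
    'title'  => 'Sample Movie',
    'genres' => array('Comedy', 'Drama'),
);

// Pad to two entries so a single-genre film doesn't raise an
// "Undefined array key" warning; missing genres become null
$genres = array_pad($movieArray['genres'], 2, null);
list($genre1, $genre2) = $genres; // $genre1 = 'Comedy', $genre2 = 'Drama'

// Hypothetical table and column names; adjust to the real schema
$sql = 'INSERT INTO movies (title, genre1, genre2) VALUES (:title, :genre1, :genre2)';
$params = array(
    ':title'  => $movieArray['title'],
    ':genre1' => $genre1,
    ':genre2' => $genre2,
);

// With an existing PDO connection $pdo this would be run as:
// $pdo->prepare($sql)->execute($params);
echo $genre1 . ', ' . $genre2 . "\n";
```

A prepared statement keeps quotes in titles (e.g. an apostrophe) from breaking the SQL. If a film can have more than two genres, only the first two end up in these columns.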
Can anyone help?
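For the cast, where the number of actors varies from film to film, one row per actor is usually easier to handle than a fixed column per actor. A minimal sketch, assuming a PDO connection and a hypothetical table `movie_cast` with columns `title_id`, `actor` and `character_name` (illustrative names, not from the script). `buildCastRows()` is a hypothetical helper that pairs `cast` and `karakteri` by index; `insertCast()` reuses one prepared statement for every row.

```php
<?php
// Hypothetical helper: pair each actor with the character at the same index.
// Assumes the 'title_id', 'cast' and 'karakteri' keys built by scrapMovieInfo().
function buildCastRows(array $movieArray): array
{
    $rows = array();
    foreach ($movieArray['cast'] as $i => $actor) {
        $rows[] = array(
            ':title_id'       => $movieArray['title_id'],
            ':actor'          => $actor,
            ':character_name' => $movieArray['karakteri'][$i] ?? '',
        );
    }
    return $rows;
}

// Hypothetical helper: inserts one row per actor (table/column names are assumptions)
function insertCast(PDO $pdo, array $rows): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO movie_cast (title_id, actor, character_name) VALUES (:title_id, :actor, :character_name)'
    );
    foreach ($rows as $row) {
        $stmt->execute($row);
    }
}

// Hypothetical sample data standing in for $imdb->getMovieInfo(...)
$movieArray = array(
    'title_id'  => 'tt0000000',
    'cast'      => array('Actor One', 'Actor Two'),
    'karakteri' => array('Character A', 'Character B'),
);
$rows = buildCastRows($movieArray);
// insertCast($pdo, $rows); // run this with a real PDO connection
```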