Many feeds have some kind of 'problem'. This page describes some common issues and their solutions.

Splitting Stuff

Many feeds put a complete tree in the 'category' field instead of splitting their categories nicely for you. For example (from a tradedoubler CSV feed):

merchantcategoryname Donna > Scarpe > Derby

You can't assign this merchantcategoryname directly to Select1..Select3. The solution is using a callback function.

By default the callback function generic_cb is used. Instead of this function, add and use a function that splits the category:

 

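function generic_gt_cb(&$item) {
    generic_cb($item);
    // Split "Donna > Scarpe > Derby" into at most three levels, trim the
    // spaces around '>' and pad with '' when there are fewer than three levels.
    $parts = array_pad(array_map('trim', explode('>', $item['menu_1'], 3)), 3, '');
    list($item['menu_1'], $item['menu_2'], $item['menu_3']) = $parts;
}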

Add the function above to feeds.php (in administrator/components/com_datafeeds/cron).

(The function is already present in newer versions of the component; see feeds-example.php.)

 

Then, in the feed configuration, change generic_cb to generic_gt_cb in the Callback field and assign merchantcategoryname to Select1.
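
The splitting step can be tried on its own, without running an import. The following is a minimal sketch, assuming a hypothetical helper split_category_path (not part of the component) that trims the spaces around '>' and pads missing levels with empty strings:

```php
<?php
// Hypothetical helper, not part of com_datafeeds: splits a category path
// such as "Donna > Scarpe > Derby" into a fixed number of levels.
function split_category_path($path, $levels = 3, $separator = '>') {
    // The limit keeps any deeper levels together in the last element.
    $parts = explode($separator, $path, $levels);
    // Remove the spaces that surround the separator in the feed.
    $parts = array_map('trim', $parts);
    // Pad with empty strings so there is always one value per level.
    return array_pad($parts, $levels, '');
}

$item = array('menu_1' => 'Donna > Scarpe > Derby');
list($item['menu_1'], $item['menu_2'], $item['menu_3']) =
    split_category_path($item['menu_1']);

print_r($item);
```

With the sample value, menu_1 becomes "Donna", menu_2 "Scarpe" and menu_3 "Derby". A path with more than three levels keeps the remainder in the last field, so "A > B > C > D" ends with "C > D".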

Renaming and reorganizing

Feeds from different merchants often name things differently ('T-shirt' versus 'T-shirts') or use a different tree to categorize items:

small-animals -> cats

domestic animals -> cats

A script and documentation to solve this (partially) are under construction.
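
For simple cases, a callback with a small lookup table may already be enough. The following is only a sketch: the function name rename_cb, the mapping tables and the target name 'Pets' are assumptions for illustration and have to be adapted to your own feeds. It assumes, as in the other callbacks on this page, that generic_cb fills menu_1 and menu_2.

```php
<?php
// Stand-in so the sketch runs on its own; in feeds.php the real generic_cb is used.
if (!function_exists('generic_cb')) {
    function generic_cb(&$item) {}
}

// Hypothetical callback: maps merchant-specific names onto your own names.
function rename_cb(&$item) {
    generic_cb($item);

    // Example mapping tables (assumptions); keys are lower-case merchant names.
    static $rename = array(
        'menu_1' => array(
            'small-animals'    => 'Pets',
            'domestic animals' => 'Pets',
        ),
        'menu_2' => array(
            't-shirts' => 'T-shirt',
        ),
    );

    foreach ($rename as $field => $map) {
        if (!isset($item[$field])) {
            continue;
        }
        // Compare case-insensitively and ignore surrounding spaces.
        $key = strtolower(trim($item[$field]));
        if (isset($map[$key])) {
            $item[$field] = $map[$key];
        }
    }
}

$item = array('menu_1' => 'Small-Animals', 'menu_2' => 'cats');
rename_cb($item);
```

After the call, menu_1 holds 'Pets' and menu_2 is left as 'cats', so items from both merchants end up in the same tree.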

Finding categories

Most feeds contain some kind of categories or taxonomy that you can put into the fields Select1 to Select9. Some feeds lack any kind of categorization. Sometimes, however, some information can be derived from the product deeplink or the image URL.

Example

In the case of the worldticketshop feed, the deeplink contains, in URL-encoded form, some information about the type of ticket:

http%3A%2F%2Fwww.worldticketshop.nl%2Fconcerten%2Fchris_de_burgh_kaarten

Using a callback will put this information in Select1 (= field menu_1 in the callback function):

function worldticketshop_cb(&$item) {
  generic_cb($item);
  // %2F is the URL-encoded '/'; the dots are escaped so they match literally.
  if (preg_match('#^http:.*www\.worldticketshop\.nl%2F(.*?)%2F#', $item['href'], $a)) {
    $item['menu_1']=$a[1];
  } else {
    $item['menu_1']='Diversen';
  }
}
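
The pattern can be tried outside the import as well. This is a sketch, assuming a hypothetical helper category_from_deeplink and a made-up tracking URL (tracker.example.com) that carries the encoded shop address as a parameter:

```php
<?php
// Hypothetical helper: returns the first path segment of the URL-encoded
// shop address inside a deeplink, or $default when the pattern does not match.
function category_from_deeplink($href, $default = 'Diversen') {
    // %2F is the URL-encoded '/', so (.*?) captures the text between the
    // first two slashes after the host name.
    if (preg_match('#www\.worldticketshop\.nl%2F(.*?)%2F#', $href, $m)) {
        return $m[1];
    }
    return $default;
}

// Made-up tracking URL, for illustration only.
$href = 'http://tracker.example.com/click?url='
      . 'http%3A%2F%2Fwww.worldticketshop.nl%2Fconcerten%2Fchris_de_burgh_kaarten';

echo category_from_deeplink($href), "\n";                  // concerten
echo category_from_deeplink('http://example.com/'), "\n";  // Diversen
```

Keeping the extraction in a separate function makes it easy to check the pattern against a few sample deeplinks before using it in a callback.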

 

Max execution time

more on performance

In general, web servers are protected against runaway scripts by a time limit. The maximum time a PHP script is allowed to run is set by the max_execution_time configuration directive. With large feeds the import might exceed this limit, and it will then stop after importing only part of the data.

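How the execution time is calculated, and therefore how quickly the limit affects the import, depends on the operating system. On Windows the measured time is real (wall-clock) time; on other systems, time spent outside the script itself, such as in system calls, stream operations and database queries, is not counted.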

You might try to increase the time using:

ini_set('max_execution_time',120);  #(in feeds.php)

However, this will not work when safe mode is on.
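
Because the setting can be refused silently, it may help to read the value back after changing it. The following is a minimal sketch, assuming a hypothetical helper raise_time_limit (not part of the component) that is called at the start of the import:

```php
<?php
// Hypothetical helper: tries to raise the limit and reports whether it worked.
function raise_time_limit($seconds) {
    $current = (int) ini_get('max_execution_time');
    // 0 means "no limit" (the default on the command line).
    if ($current === 0 || $current >= $seconds) {
        return true;
    }
    // May be refused, for example when safe mode is on; '@' hides the warning.
    @ini_set('max_execution_time', (string) $seconds);
    // Read the value back instead of trusting the call.
    return (int) ini_get('max_execution_time') >= $seconds;
}

if (!raise_time_limit(120)) {
    error_log('max_execution_time could not be raised; the import may be cut off');
}
```

The helper leaves the setting alone when there is no limit or when the limit is already high enough, so it never lowers an existing value.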

On Windows systems it might help to download the feeds manually and use the local file for the import.

Otherwise the feed is simply too big for your system. The parser does not support incremental imports, so the feed will often be imported only partially when the max execution time kicks in.

See the PHP documentation for max_execution_time and set_time_limit for more information.

Some affiliate companies, for example webgains and tradedoubler, allow you to merge several feeds into one. Avoid this, to keep the feeds small.