Importing unique items/avoiding duplicates
Depending on the kind of feed, it may contain many near-duplicates, for example the same travel trip with different departure dates, or widgets that are identical except for the color.
Depending on your needs, you may want to skip these near-duplicate items or import them anyway.
The importer determines duplicates on a hash based on the feed name, the title of each item, and the values in the select fields.
Take a simple example feed called 'Widgets feed':
# | name | type | category |
1 | soft widget | soft | blue |
2 | soft widget | soft | red |
3 | big widget | soft | blue |
4 | big widget | hard | red |
Assume the name field from the feed is assigned to the title in the feed configuration and the Select1..9 fields are left blank.
When we import, there are now 2 unique items, since the hash that determines uniqueness is based on feedname+title only. Items 1 and 2 are therefore identical, and so are items 3 and 4.
Assume the name field is assigned to the title and the type field is assigned to Select 1.
Now there will be three unique items. Numbers 1 and 2 are still identical, but 3 and 4 differ since 'soft' and 'hard' are different.
Assume that, in addition, we assign the category to Select 2. Now four items are imported into the database, since the combination of feedname+title+Select0-9 is different for each item.
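The three scenarios above can be reproduced with a small sketch. It assumes the hash is simply an md5 over the concatenated feed name, title, and Select values; the importer's real hash may combine these fields differently. The names example_hash and count_unique are hypothetical helpers used only for illustration.

```php
<?php
// Hypothetical stand-in for the importer's duplicate hash. The real
// implementation may join or order the fields differently.
function example_hash(string $feedName, string $title, array $selects = []): string
{
    return md5($feedName . $title . implode('', $selects));
}

// Count distinct hashes when the given feed fields are mapped to Select fields.
function count_unique(string $feedName, array $rows, array $selectFields): int
{
    $hashes = [];
    foreach ($rows as $row) {
        $selects = [];
        foreach ($selectFields as $field) {
            $selects[] = $row[$field];
        }
        // The 'name' field is mapped to the title, as in the example above.
        $hashes[example_hash($feedName, $row['name'], $selects)] = true;
    }
    return count($hashes);
}

$feedName = 'Widgets feed';
$rows = [
    ['name' => 'soft widget', 'type' => 'soft', 'category' => 'blue'],
    ['name' => 'soft widget', 'type' => 'soft', 'category' => 'red'],
    ['name' => 'big widget',  'type' => 'soft', 'category' => 'blue'],
    ['name' => 'big widget',  'type' => 'hard', 'category' => 'red'],
];

$titleOnly           = count_unique($feedName, $rows, []);                   // 2
$withType            = count_unique($feedName, $rows, ['type']);             // 3
$withTypeAndCategory = count_unique($feedName, $rows, ['type', 'category']); // 4

echo "$titleOnly $withType $withTypeAndCategory\n";
```

Each extra Select mapping adds one more field to the hashed string, which is why the number of unique items grows from 2 to 3 to 4.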
You can change this behaviour using a callback function. Assume you don't want to assign the 'category' field to a Select value but still have all items in the database.
Create a callback function and use code like below to create your own hash:
function u_cb(&$item) {
    $item['md5'] = md5($item['feed'] . $item['name'] . $item['type'] . $item['category']);
    generic_cb($item);
}
Often different items have different deeplinks so you could use:
$item['md5']=md5($item['href']);
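As a rough sketch, the deeplink hash could be wrapped in a complete callback like the one below. The empty generic_cb stub exists only so the example runs outside the importer, and the check for a missing 'href' is an added assumption rather than documented importer behaviour.

```php
<?php
// Stand-in so the sketch runs on its own; inside the importer the real
// generic_cb is already defined and this stub is skipped.
if (!function_exists('generic_cb')) {
    function generic_cb(&$item) {}
}

function u_cb(&$item)
{
    // Only override the hash when a deeplink is present (assumption:
    // items without 'href' keep whatever hash the importer assigns).
    if (!empty($item['href'])) {
        $item['md5'] = md5($item['href']);
    }
    generic_cb($item);
}

// Two illustrative items with made-up deeplinks.
$a = ['href' => 'http://example.com/widget?id=1'];
$b = ['href' => 'http://example.com/widget?id=2'];
u_cb($a);
u_cb($b);
```

With this callback, two items count as duplicates only when their deeplinks are exactly the same string.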
To keep only items with unique titles:
function u_cb(&$item) {
    generic_cb($item);
    $item['md5'] = md5($item['feed'] . $item['title']);
}