Character encodings


Character encodings are a major issue when dealing with datafeeds. Quite often feeds are incorrectly encoded or encoded twice, and older versions of PHP and MySQL do not handle UTF-8 (multi-byte) correctly. In MySQL the collation setting is important; for Joomla it should be set to utf8.
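To illustrate what "encoded twice" means, here is a minimal Python sketch. It is not part of the parser; the function names double_encode and undo_double_encoding are hypothetical, and the sketch assumes the second encoding step misread the UTF-8 bytes as ISO-8859-1.

```python
def double_encode(text: str) -> bytes:
    """Simulate a feed whose text was UTF-8 encoded twice."""
    once = text.encode("utf-8")
    # The mistake: the UTF-8 bytes are misread as ISO-8859-1
    # and then encoded to UTF-8 a second time.
    return once.decode("iso-8859-1").encode("utf-8")


def undo_double_encoding(raw: bytes) -> str:
    """Reverse exactly one round of the mistake above."""
    return raw.decode("utf-8").encode("iso-8859-1").decode("utf-8")


broken = double_encode("café")          # b'caf\xc3\x83\xc2\xa9'
shown_to_user = broken.decode("utf-8")  # 'cafÃ©'
repaired = undo_double_encoding(broken) # 'café'
```

The repair only works if the feed really was double encoded; applied to a correctly encoded feed it would either fail or damage the text.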

The parser works fine with ISO-8859-1 and UTF-8 encoded feeds as input; the database itself should use a utf8 collation (the Joomla default). The parser itself uses ISO-8859-1 as its intermediate encoding. If your language is not supported by ISO-8859-1, characters will fail to display. In this case you must set the parser to use UTF-8.



This setting goes into the feeds.php file (in administrator/components/com_datafeeds/cron; if it is not present, copy feeds-example.php). Make sure the OS, web server, PHP version, PHP functions and editor you use are UTF-8 (multi-byte) safe. (Implemented in version svn:752.)
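A quick way to find out whether the ISO-8859-1 intermediate encoding is sufficient for your language is to test some typical product text. The Python sketch below is only an illustration and not part of the component; fits_latin1 is a hypothetical helper name.

```python
def fits_latin1(text: str) -> bool:
    """Return True if every character in text exists in ISO-8859-1."""
    try:
        text.encode("iso-8859-1")
        return True
    except UnicodeEncodeError:
        return False


german_ok = fits_latin1("Größe")     # True: ö and ß are in ISO-8859-1
polish_ok = fits_latin1("Łódź")      # False: Ł and ź are not
euro_ok = fits_latin1("Price 10 €")  # False: the euro sign is not
```

If such a check fails for your feeds, the ISO-8859-1 mode will lose characters and the UTF-8 mode is the one to use.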

Quotes and other characters appear as ?

Quite a few (UTF-8) feeds contain characters that are not proper UTF-8. Often a copy/paste operation from an MS Office product on the merchant's website is the cause. Some characters, such as the quote ', are not translated into correct UTF-8. This works fine until the character is touched by some kind of decoding/encoding. If a UTF-8 feed with some illegal characters goes through the parser (working in ISO-8859-1 mode), the characters appear as question marks or small black squares. Setting the parser to UTF-8 mode as mentioned above and fetching UTF-8 feeds might solve the problem.
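The effect can be reproduced outside the parser. The Python sketch below assumes the stray byte is 0x92, the Windows-1252 code for the curly apostrophe that MS Office products commonly insert; the sample text and the helper names are made up for illustration.

```python
# Sample feed text with a raw Windows-1252 curly apostrophe (byte 0x92).
RAW = b"Men\x92s shoes"


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def through_latin1(text: str) -> bytes:
    """Force text into ISO-8859-1; unsupported characters become '?'."""
    return text.encode("iso-8859-1", errors="replace")


def repair_cp1252(raw: bytes) -> str:
    """Decode the bytes as Windows-1252 instead of UTF-8."""
    return raw.decode("cp1252")


valid = is_valid_utf8(RAW)                        # False
squares = RAW.decode("utf-8", errors="replace")   # 'Men\ufffds shoes'
question = through_latin1("Men\u2019s shoes")     # b'Men?s shoes'
fixed = repair_cp1252(RAW)                        # 'Men\u2019s shoes'
```

The replacement character U+FFFD is what many browsers draw as a small black square or diamond, and the '?' comes from squeezing a character into an encoding that does not contain it.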

Disable auto encoding

If a UTF-8 or ISO-8859-1 feed does not display (that is, encode/decode) correctly, you might want to try setting 'auto encoding' to 'no' and entering the encoding in the next field. Sometimes feeds provided as ISO-8859-1 are actually UTF-8 encoded.
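To decide which encoding to enter by hand, you can check a downloaded feed yourself. The Python sketch below is a common heuristic, not the parser's own auto-detection: bytes that decode as strict UTF-8 are very likely UTF-8, whatever the feed claims. The name guess_encoding and its declared parameter are hypothetical.

```python
def guess_encoding(raw: bytes, declared: str = "iso-8859-1") -> str:
    """Return 'utf-8' if raw is valid UTF-8, else the declared encoding."""
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return declared


# A feed declared as ISO-8859-1 that is actually UTF-8:
mislabeled = guess_encoding("Größe".encode("utf-8"))
# A feed that really is ISO-8859-1:
genuine = guess_encoding("Größe".encode("iso-8859-1"))
```

Pure ASCII input is also reported as utf-8, which is harmless because ASCII bytes mean the same in both encodings.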

XML feeds with illegal characters might even fail to parse; this issue should be reported to the provider of the datafeed. Normally the CSV version of the file should parse fine.


Several normal characters are gone?

Some characters like '/', '?' and '&' have a special meaning in URLs. The ':' and '-' are special characters for Joomla. Getting these characters into URLs or into the system would make things quite complex. Long before the parser went public, even before it became a Joomla component, the choice was made to simply remove these characters during the import.
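The effect on imported text can be sketched in Python as follows. This is an illustration only, not the parser's actual code; the set of removed characters is assumed to be exactly the five mentioned above, and strip_special is a hypothetical name.

```python
# Assumption: only the five characters named in the text are removed.
REMOVED = "/?&:-"
_TABLE = str.maketrans("", "", REMOVED)


def strip_special(text: str) -> str:
    """Delete the special characters; nothing is inserted in their place."""
    return text.translate(_TABLE)


print(strip_special("Rock & Roll: T-Shirt 50/50?"))
# prints "Rock  Roll TShirt 5050" (two spaces where the '&' was)
```

Because the characters are removed rather than replaced, words joined by a '-' or '/' run together, which is why product names may look different after the import.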