Sunday, April 19, 2015

Design decision: Matching Cyrillic chars in JSON with PHP

I'm developing a plugin for a CMS and have run into an unanticipated problem: because the plugin is multilingual, input can contain characters from any Unicode script. The plugin saves its data as JSON, made up of objects with two properties, value and lookup. The value property causes no trouble, but PHP uses the lookup property to retrieve these entities, in some places through regular expressions (content filters). The problems are:



  1. For non-Latin characters (e.g. Экспорт), \w (the word-character class) in a regex matches nothing. Is there a way to make Cyrillic characters count as word characters? Are there any other hidden catches?

  2. Because the data is stored as JSON, non-Latin characters are converted to JavaScript Unicode escape sequences; for the example above: \u042D\u043A\u0441\u043F\u043E\u0440\u0442. Is it safe to skip this escaping and store the raw characters (given server restrictions, etc.)?
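A minimal PHP sketch of both points, assuming PHP 5.4 or later, a UTF-8 encoded source file, and the mbstring extension. Without the /u modifier, PCRE works on bytes, so \w never sees a Cyrillic letter. With /u, pattern and subject are treated as UTF-8, and in modern PHP versions /u also gives \w Unicode semantics. The explicit \p{L} and \p{Cyrillic} classes state the intent directly. For JSON, the \uXXXX escapes are simply json_encode()'s default output; the JSON_UNESCAPED_UNICODE flag writes raw UTF-8 instead. Both are valid JSON, and json_decode() returns the same string either way.

```php
<?php
// Sketch assuming PHP >= 5.4, a UTF-8 source file and mbstring.

$word = 'Экспорт';

// Without /u, PCRE matches bytes: the two UTF-8 bytes of each
// Cyrillic letter are not ASCII word characters, so this fails.
$bytes = preg_match('/^\w+$/', $word);

// With /u, pattern and subject are read as UTF-8; in modern PHP
// this also makes \w match letters from any script.
$unicodeW = preg_match('/^\w+$/u', $word);

// Explicit Unicode properties do not depend on how \w is set up:
// \p{L} = any letter, \p{N} = any number, \p{Cyrillic} = the script.
$letters  = preg_match('/^[\p{L}\p{N}_]+$/u', $word);
$cyrillic = preg_match('/^\p{Cyrillic}+$/u', $word);

// A hidden catch: strlen() counts bytes, mb_strlen() counts characters.
$byteLength = strlen($word);             // 14 (2 bytes per letter)
$charLength = mb_strlen($word, 'UTF-8'); // 7

// json_encode() escapes non-ASCII as \uXXXX by default;
// JSON_UNESCAPED_UNICODE keeps raw UTF-8. Both are valid JSON.
$escaped = json_encode(['lookup' => $word]);
$raw     = json_encode(['lookup' => $word], JSON_UNESCAPED_UNICODE);

// Decoding either form yields the same PHP string.
$fromEscaped = json_decode($escaped, true);
$fromRaw     = json_decode($raw, true);

echo $escaped, PHP_EOL, $raw, PHP_EOL;
```

Skipping the escapes is therefore a storage and transport question rather than a JSON one: raw output is only safe if every layer that touches the file or database column (storage, connection charset, HTTP headers) is UTF-8 clean.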


The big 'design' question stems from these two problems:


Should I allow users of languages with non-Latin alphabets to use their own characters in the lookup properties, or should I restrict them to traditional 'word' characters, that is, a, b, c, etc. plus underscore (and thus to another language's alphabet)? I'd welcome technical advice to guide this decision (not UX advice).
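Technically, both policies reduce to one validation regex, so the choice can be kept behind a switch. The sketch below is hypothetical: isValidLookup(), canonicalLookup() and the $allowUnicode flag are illustrative names, not part of any CMS API. It assumes PHP 5.4 or later; the normalization step uses the Normalizer class from the intl extension when it is installed. Normalization matters for Unicode lookups because the same visible key can be stored in different ways: "й" is either one code point (U+0439) or "и" followed by a combining breve (U+0438 U+0306).

```php
<?php
// Hypothetical lookup-key helpers; names and the $allowUnicode switch
// are illustrative only.

function isValidLookup($lookup, $allowUnicode)
{
    if ($allowUnicode) {
        // Letters and digits from any script, plus underscore.
        // With /u, preg_match() returns false (not 0) on invalid UTF-8,
        // so malformed input is rejected by the === 1 check below.
        $pattern = '/^[\p{L}\p{N}_]+$/u';
    } else {
        // Restrictive policy: ASCII letters, digits and underscore only.
        $pattern = '/^[A-Za-z0-9_]+$/';
    }
    return preg_match($pattern, $lookup) === 1;
}

// Bring a key to NFC so that equivalent spellings compare equal.
// Falls back to the unchanged string if intl is not available.
function canonicalLookup($lookup)
{
    if (class_exists('Normalizer')) {
        $normalized = Normalizer::normalize($lookup, Normalizer::FORM_C);
        if ($normalized !== false) {
            return $normalized;
        }
    }
    return $lookup;
}

// Normalize first, then validate and store.
$key = canonicalLookup('Экспорт');
var_dump(isValidLookup($key, true));  // bool(true)
var_dump(isValidLookup($key, false)); // bool(false)
```

The order matters: a combining mark such as U+0306 is in \p{M}, not \p{L}, so a decomposed key would fail the Unicode check unless it is normalized first.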

