by Russ Ward (@russcward)
DO YOU RECOGNIZE THIS: H(X) = E[I(X)]?
While working on Facebook (FB) pages for pharmaceutical companies in early 2010, Zemoga ran squarely into the problem of open, uncontrolled user-generated content on a FB Wall. Our initial solution was to turn off public access to the wall and add a new shout tab where users could submit posts, which were then reviewed and approved before finally being posted.
This solution was acceptable, but it still required a human to assess each wall post. It lacked any automation, but at least wall posts were vetted before they reached the world.
At this point the real opportunity seemed to be to stop inappropriate content from getting through the system while categorizing it, so we could show what types of inappropriate content were being posted. But what sort of filter would best sort these random user posts into categories? The filter needed not just to look for particular words but to identify different types of terms: real English words, indecent words, and words associated with diseases or human suffering. These different words work together to create context and meaning for readers.
In a previous career I had the opportunity to come across the work of a guy called Claude Shannon from Michigan who, in the late 1940s (yes, 60 years ago and, no, I’m not quite that old), figured out a way to decipher encrypted communications between enemy operatives using mathematics and the science of entropy. Now, Claude was probably not thinking of social media when he was challenged with this problem, and little did he realize where his formula, that unrecognizable formula I started this post with, could be used.
Yet lo and behold Bob’s your uncle and here we are.
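For the curious, the formula at the top of this post says that the entropy H of a source is the expected value E of the information content I of its symbols. Here is a minimal Python sketch of that calculation over the characters of a piece of text. The function name `shannon_entropy` and the choice to measure per character are my own illustration, not something taken from the patent.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character.

    Illustrative only: treats each character as one symbol and
    estimates probabilities from the text itself.
    """
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # I(x) = log2(1 / p(x)) is the information content of symbol x;
    # the entropy is the average of I(x) weighted by p(x).
    return sum((n / total) * math.log2(total / n) for n in counts.values())


print(shannon_entropy("aaaa"))  # one repeated symbol carries no surprise
print(shannon_entropy("abab"))  # two equally likely symbols
print(shannon_entropy("abcd"))  # four equally likely symbols
```

A string made of one repeated character scores zero, and the score rises as the characters become more varied and harder to predict.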
The novelty of the patent we hold is that we filter for semantic elements that are contained in databases and, upon finding the offending semantic term, we can categorize and report on the term to various designated administrators for appropriate action. The post management and reporting process is essentially automated – enabling communication to all designated parties including the author of the post. This categorization and reporting process has the ability to reject posts in real time so that potentially offensive content never gets to public view, while at the same time considerably reducing the administrative time and reporting burden.
A figure extracted from Zemoga’s Semantic Filter patent.
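To give a feel for the flow described above (look up terms in category databases, categorize what was found, and reject the post before it goes public), here is a rough Python sketch. Everything in it is a stand-in: the `TERM_CATEGORIES` word lists, the `Verdict` type and the `review_post` function are hypothetical names made up for this example, and it does not reproduce the patented method.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Set

# Hypothetical stand-ins for the term databases; real lists would be
# far larger and maintained outside the code.
TERM_CATEGORIES: Dict[str, Set[str]] = {
    "indecent": {"darn", "heck"},
    "disease": {"cancer", "diabetes"},
}


@dataclass
class Verdict:
    approved: bool
    # Maps each category to the flagged words found in the post.
    hits: Dict[str, List[str]] = field(default_factory=dict)


def review_post(text: str, categories: Dict[str, Set[str]] = TERM_CATEGORIES) -> Verdict:
    """Reject a post if any word appears in a category list."""
    words = re.findall(r"[a-z']+", text.lower())
    hits: Dict[str, List[str]] = {}
    for word in words:
        for category, terms in categories.items():
            if word in terms:
                hits.setdefault(category, []).append(word)
    return Verdict(approved=not hits, hits=hits)


verdict = review_post("My cancer came back, darn it")
if not verdict.approved:
    # In a real system this is where administrators and the author
    # would be notified; here we only print the categorized hits.
    for category, found in verdict.hits.items():
        print(category, found)
```

The categorized hits are what make reporting possible: an administrator sees not just that a post was rejected, but which kind of term caused it.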
Yes, Sir. Mr. Shannon was incredibly intelligent, making many significant advances in science. Arguably he went on to become the father of information theory with his “A Mathematical Theory of Communication” in 1948 and his “Prediction and Entropy of Printed English” in 1951.
I wonder if he would have been a Facebook user.
You can see the Zemoga Shannon Entropy Patent here: