Semantic Neural Networks

by Russ Ward (@russcward)

Recently I wrote in this blog about search optimization, site relevance and semantic scent. If I can hold your attention for a few minutes I would like to share the idea that search terms are effectively artificial neural networks (ANN) of lexical fields and links.

Read on at your own risk.

All right, OK. You’re thinking that search optimization for the web is well known, and what the hell do neural networks have to do with anything anyway?

Well, underneath the seemingly random nature of your web search there are immense, humongous and really big implementations of word aggregating and sorting going on.

Most of us know that search engine companies have enormous databases that record everything all of the time. The search engine catalogs all web pages and the number of links to those pages, and stores them in compressed form in a repository where they are indexed [1]. (Fun fact: the original Google web crawlers that collected all this web page information were written in the aptly named programming language Python. Hunh.) This index is first sorted by word occurrence and by how many times the words have been hit or viewed, and then ranked for apparent relevance.

Google’s indexing system, for example, is so efficient that a search for the word “apple” returns 856 million results in 0.24 seconds. WOW. And I thought the number of web sites was big. Imagine keeping a list of how many times words appear in those web sites and where they are! What makes it more incredible is that Google’s word list contains 14 million words. Fourteen. Million. Words. Holy crap.

Well, at least I’m impressed.

I have trouble just managing files on my laptop.

These search engine dudes must not have very much to do, so they keep adding to this catalog, 24 hours a day, while maintaining absolutely incredible split-second response (now there’s a job…).

Anyway, when you search for something using Google, their system looks up your word or words in this astronomically big index and finds the matches that have the most hits.

This is called word relevance.

Sites that have the best match for a given search and have attracted other users due to their “relevance” will rise up in search rank and are more likely to be found by users (in other words, the more a site gets clicked on, the more it is going to get clicked on).

The algorithm is actually much more complicated than this, but now you get the basic way it works.
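For the curious, here is a minimal sketch of that "word list" idea, usually called an inverted index. It is a toy under stated assumptions, not how Google actually does it: the page addresses, the page text and the function names `build_index` and `search` are all invented for illustration, and ranking here is simply "most occurrences first".

```python
from collections import defaultdict

# Hypothetical mini "web": these URLs and texts are made up for illustration.
PAGES = {
    "example.com/orchard": "apple trees grow in the orchard",
    "example.com/pie": "apple pie recipe with fresh apple slices",
    "example.com/forest": "pine trees and oak trees in the forest",
}


def build_index(pages):
    """Map each word to {url: number of times the word appears on that page}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index, word):
    """Return the URLs containing the word, most occurrences first."""
    hits = index.get(word.lower(), {})
    # Sort by descending count, then by URL so ties come out in a stable order.
    return sorted(hits, key=lambda url: (-hits[url], url))


INDEX = build_index(PAGES)
print(search(INDEX, "apple"))
```

The point of the structure is that the expensive work (reading every page) happens once, when the index is built. Answering a search is then just a lookup on one word, which is a big part of why the response can be so fast.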

OK. Having gotten that dramatically exciting information out of the way…

Think of a subject of interest and imagine the word in the context of what you are trying to find or understand. The semantic context of this word has a relation to other words. For example, a search for “trees” in Google generates results in a lexical field that has a contextual relationship to trees (see Figure 1 below).

Depending on your cognitive construct (what you think), one of these lexical relationships may be a match (or not). This can also be said to be a semantic version of pattern matching for you. For Google it is a semantic match combined with other terms – terms that have been chosen more often than others.

The pattern match process usually has one of three simple outcomes – it’s either meaningful to you, not meaningful to you or not directly a match but catches an interest that triggers you to explore more. So your choices are “select”, “quit” or “explore” (see Figure 2).

If you find a new, yet marginally related, piece of info and decide to explore the subject, this can be described as “Subsumption Learning”, which forms part of Cognitive Construct Theory [2].

Figure 2 shows us a model that follows the basic user flow in a keyword search engine. This user flow can be viewed as a basic neural network. The diagram can be considered as an ANN dependency graph that can be expressed in mathematical terms.

A simplified version of Figure 2 can be seen in Figure 3, where your search, when submitted as a keyword or keyword phrase, behaves in the same manner as a neural network. This is commonly known as a “feedforward” network because it propagates forward and does not cycle back on itself. The search propagates through the search engine word index, then finds the URL and delivers the result.

Obviously this semantic web search does not end up with just one search return but millions of results that are prioritized by semantic relevance pattern matching within Lexical fields from that humongous Google index. For the sake of this discussion I have left out the implication of multiple search returns or how other “relevance” mechanisms affect your search result (thank Heaven).

If you look back to Figure 1, you might consider how the semantic keyword search result can be assessed against the other relevant words from that group, called a Lexical Field. Your search triggers an action potential that looks for the keyword and the related words in that Lexical Field (easy, huh?).
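To make the Lexical Field idea a little more concrete, here is a rough sketch that scores a piece of text by how many words from a keyword's field it contains. Everything in it is an assumption for illustration: the field for "trees", the weights, the sample texts and the function name `field_score` are invented, and real search engines use far richer signals.

```python
# Hypothetical lexical field for "trees"; the words and weights are made up.
LEXICAL_FIELD = {
    "trees": 1.0,
    "forest": 0.6,
    "leaves": 0.5,
    "wood": 0.4,
    "branches": 0.4,
}


def field_score(text, field):
    """Sum the weights of the distinct lexical-field words present in the text."""
    words = set(text.lower().split())
    return sum(weight for term, weight in field.items() if term in words)


# Three invented snippets that all contain the keyword "trees".
DOCS = {
    "a": "trees shed leaves in the forest",
    "b": "trees for sale",
    "c": "decision trees in machine learning",
}

# Highest score first; ties are broken alphabetically by snippet name.
RANKED = sorted(DOCS, key=lambda name: (-field_score(DOCS[name], LEXICAL_FIELD), name))
print(RANKED)
```

All three snippets match the keyword, but only snippet "a" also uses other words from the field, so it scores highest. That is the practical reason the related terms matter as much as the keyword itself.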

If you’re not glazed over yet, you can see in Figure 3 how the machine-based semantic matching process can be described in Neural Network terms. This is “propagation” from the submitted keyword to the best-matching relevant result. The triggered propagation follows the “feedforward” flow from query x to the result f in the diagram. We don’t see the propagation between x and f, so this machine-based operation is hidden. Luckily, because it would be really (really) boring.
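If you would like to peek at the boring hidden part anyway, here is a minimal sketch of a feedforward pass from a query x to a result f. It is only an illustration of the general technique: the three query terms, the weight values and the choice of a sigmoid activation are my assumptions, not something taken from Figure 3 or from any real search engine.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0..1."""
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(x, hidden_weights, output_weights):
    """One forward pass: inputs -> hidden layer -> single output f (no loops back)."""
    hidden = [
        sigmoid(sum(w * xi for w, xi in zip(row, x)))
        for row in hidden_weights
    ]
    return sigmoid(sum(w * h for w, h in zip(output_weights, hidden)))


# Query x as 1/0 flags for three hypothetical terms: "trees", "forest", "pie".
x = [1, 1, 0]
# Invented weights: two hidden units, each with one weight per input term.
hidden_weights = [[0.8, 0.5, -0.9], [-0.4, 0.2, 0.7]]
output_weights = [1.2, -0.6]

f = feedforward(x, hidden_weights, output_weights)
print(round(f, 3))
```

The value f lands between 0 and 1 and can be read as a rough "how good a match is this" score. Data only ever moves from x toward f, which is all that "feedforward" means.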

In my fuzzy logic opinion, this is how neural network theory can be applied to web search and Search Engine Optimization from both the human cognitive side and the machine operation side of an equation.

You’re in this deep so why not keep reading…

So what if I think there is a theoretical equation that considers your search a cognitive construct being processed using your brain’s neural network while the machine operation emulates the same type of processing? Hang in there.

Figure 4 (with more nice little circles) looks further at the neural network framework that I’m caught up in. On the left in Figure 4 there is a naïve attempt to describe a neural network construct that stylizes the human thought process.

In this case an initial idea (construct) shows divergent thinking from x1 arriving at f1, which converges into what becomes the user’s cognitive construct (search term).

At this point construct f1 becomes the content x for the search engine submission keyword/s.

So what do the alphanumerics on Figure 4 mean? Well, this is where the math starts to kick in to support Neural Network theory. This approach is called the “probabilistic view scenario”, where we find that there is a certain probability that a matching search falls within a specific Lexical result range.
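One common way to turn raw match scores into probabilities is the softmax function, sketched below. To be clear, this is my assumption about how such a probabilistic view could be computed; Figure 4 does not specify a formula, and the candidate result ranges and their scores here are invented.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())  # subtract the peak for numerical stability
    exps = {name: math.exp(value - peak) for name, value in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


# Hypothetical lexical result ranges for the search term "trees", with made-up scores.
SCORES = {"forestry": 2.0, "family trees": 1.0, "decision trees": 0.5}
PROBS = softmax(SCORES)

for name, p in PROBS.items():
    print(f"{name}: {p:.2f}")
```

The higher a range scores, the larger its share of the probability, but every range keeps some chance of being the one the user meant.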

So you know that Search Optimization is important, but this is really overcomplicating it, right?

What you optimize to is what you get.

If your SEO uses off-the-cuff keywords that you think are important, then think again. Figure 4 indicates that the user’s own construct (f1) is what they type in as their search term, so you had better make sure that you raise the probability of them finding you by using terms that match the user’s natural semantic terms.

Don’t be lulled into thinking that keyword search optimization is a peripheral task that can be done by the office intern.

So here are my takeaways:

  • Keywords are the result of a personal cognitive construct of the user – the construct is what the individual is looking to find in their search. We need to consider this… “Help people find what they are looking for, rather than setting out a shingle that you want them to find and hoping they will find it”.
  • Don’t just develop a list of keywords from your site content – find the keywords real people are already searching with to find products, services and information directly related to your product and the associated Lexical Field.
  • The associated terms in the Lexical Field are just as important as the keyword itself. This means that associated word phrases in the relevant Lexical Field need to appear in your web site search optimization of content.
  • A keyword search propagates through the enormous (humongous) catalog of the same terms on the Internet (yes, the entire Internet), and that impacts the result.
  • The more combinations of keywords and phrases that are optimized the greater the probability that your site content will be found (yep, the more it is found the more it will be found).
  • Neural network theory is not really important to the implementation of SEO on your site, but it may help you think more about search optimization and what you can do about it.
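The takeaway about optimizing more combinations of keywords and phrases can be put into rough numbers. The sketch below assumes, purely for illustration, that each optimized phrase has its own independent chance of matching a given search; real searches are not independent like this, and the probabilities and the function name `chance_found` are made up.

```python
def chance_found(match_probs):
    """Probability that at least one phrase matches, assuming independence."""
    miss = 1.0
    for p in match_probs:
        miss *= 1.0 - p  # chance that every phrase so far has missed
    return 1.0 - miss


# Hypothetical: each optimized phrase has a 10% chance of matching a search.
ONE_PHRASE = chance_found([0.1])
THREE_PHRASES = chance_found([0.1, 0.1, 0.1])

print(f"{ONE_PHRASE:.3f} {THREE_PHRASES:.3f}")
```

Under those assumptions, going from one phrase to three lifts the chance of being found from 10% to about 27%. Each extra phrase helps, but a little less than the one before it.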

In my next blog post I will talk about semiotics and how it can affect the bounce rate on landing pages.

Text References

1. Brin, S. & Page, L. (1998). The Anatomy of a Large-Scale Hypertextual Web Search Engine.

2. Kutluk, E., Sengel, E. & Kilicaya, F. (Undated). Cognitive Construct Theory.