There’s a new search algorithm in town. It’s called SMITH. You might not have heard about it. That’s understandable; this algorithm is not in use yet. But it could be coming to a search engine near you. That’s because a new study by Google shows that it’s superior to the BERT algorithm that you may already know about. In this article, I’ll go over the differences between the SMITH and BERT algorithms and explain why Google says SMITH beats BERT.
New Study Says Google’s SMITH Algorithm Beats BERT
But First, a Warning
Although Google thinks SMITH is better than BERT, the company won’t say whether its current search algorithm uses SMITH at all.
The SMITH algorithm might stay on the shelf. For a while, anyway.
Still, it’s a great idea to understand what’s going on with this new algo because odds are better than even money that one day Google will use it to return search results.
What Is BERT?
BERT stands for Bidirectional Encoder Representations from Transformers. Any questions?
Seriously, Google uses BERT for natural language processing. It helps the search software better understand online documents so it can rank them according to a specific query.
It works great right now. But there’s a problem.
You see, BERT works best with short text. Not long-form content.
BERT is limited to handling a few sentences, or perhaps an entire paragraph, because of the “quadratic computational complexity of self-attention with respect to input length.” In other words, BERT performs best when it processes short documents.
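To see why that quadratic complexity matters, here’s a minimal sketch (not Google’s code) of the core of self-attention: every token attends to every other token, so the score matrix for an input of n tokens has n × n entries. Double the input length and the cost roughly quadruples.

```python
import numpy as np

# Toy illustration of why self-attention cost grows quadratically:
# every token attends to every other token, so the score matrix is n x n.
def attention_scores(n_tokens: int, d_model: int = 8, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n_tokens, d_model))  # query vectors, one per token
    k = rng.standard_normal((n_tokens, d_model))  # key vectors, one per token
    return q @ k.T / np.sqrt(d_model)             # shape: (n_tokens, n_tokens)

for n in (64, 512, 4096):
    print(n, attention_scores(n).size)  # 4096, 262144, 16777216 — grows as n^2
```

A few sentences fit comfortably; a long-form document of thousands of tokens makes that n × n matrix enormous. That’s the bottleneck SMITH’s hierarchical design works around.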
So Google is looking at another algorithm, called the Siamese Multi-depth Transformer-based Hierarchical (SMITH) encoder.
What Is SMITH?
The SMITH algorithm enables Google to understand entire documents as opposed to just brief sentences or paragraphs.
While BERT attempts to understand words within sentences, SMITH tries to understand sentences within documents.
To do that, it uses a predictive algorithm. That’s how it understands whole documents.
And, according to Google, SMITH is better-suited to handle long-form documents.
In other words, SMITH can do what BERT can’t do.
But hold on, it doesn’t mean Google will replace BERT with SMITH. That’s not how this works.
Instead, Google will use SMITH to supplement BERT. They’ll work together to fully understand document content.
There’s another great benefit to the SMITH algorithm, though. It helps with long-tail queries.
According to the research, Google says that semantic matching between long documents “is less explored.”
That’s exactly the problem they’re solving with SMITH.
How It Works
Let’s see how the SMITH algorithm works. If you’re a true hardcore nerd, read on.
First, we’ll go over the important concept of algorithm pre-training. That’s when the algorithm is trained on a specific data set.
For example, take a sentence written as: “To be or not to be, that is the ____.” What comes next?
The algorithm gets trained to predict the missing word.
This process is repeated over and over again with different phrases and sentences, and eventually the algorithm becomes fairly smart.
But the example I shared above only masks out a single word. What would happen if I masked out a sentence in the middle of a paragraph?
Well, that’s what SMITH handles.
BERT uses masked word prediction. SMITH uses masked sentence prediction.
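Here’s a toy sketch of the difference between the two objectives. This is not Google’s actual pre-training code; it just shows what gets hidden in each case: BERT-style pre-training masks a single word inside a sentence, while SMITH-style pre-training also masks a whole sentence inside a document.

```python
# Toy contrast between the two self-supervised objectives
# (illustrative only, not Google's implementation).

def mask_word(sentence: str, index: int) -> tuple[str, str]:
    """BERT-style: hide one word; the model must predict it."""
    words = sentence.split()
    target = words[index]
    words[index] = "[MASK]"
    return " ".join(words), target

def mask_sentence(document: list[str], index: int) -> tuple[list[str], str]:
    """SMITH-style: hide a whole sentence within the document."""
    masked = list(document)
    target = masked[index]
    masked[index] = "[SENT_MASK]"
    return masked, target

masked, target = mask_word("To be or not to be that is the question", 9)
print(masked)  # To be or not to be that is the [MASK]
print(target)  # question
```

The training signal is the same idea at two scales: predicting the hidden word teaches the model how words relate within a sentence, while predicting the hidden sentence teaches it how sentences relate within a document.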
That’s yet another reason why SMITH is better. It can predict blocks of sentences.
Read that again: SMITH can predict entire blocks of sentences.
Welcome to the 21st century.
Where It’s Going
You could see SMITH in Google’s search algorithm one day. But that’s just my opinion.
Google will make its own decision as to whether SMITH becomes part of the ranking algo.
However, given the positive reviews from Google’s own research, the current limitations of BERT, and the fact that strategists love to use long-form content, don’t be surprised if we see SMITH in use in the near future.
It’s also worth noting what Google doesn’t say. It doesn’t say that “more research is needed.”
That’s a pretty big tell right there.