Understand how technology clusters are generated and named, including unrelated and miscellaneous clusters
What is a technology cluster?
A technology cluster refers to a grouping of patent families relating to the same technical area.
How does the Classification platform group patent families into technology clusters?
The platform groups patent families with similar characteristics into technology clusters. It does this by building similarity matrices from patent metadata. The clustering involves no human intervention and no hard-coded categories; it uses the available metadata to create technology clusters that are as accurate as possible. Machine learning plays a role here: the algorithms identify the technology domain of the patents and weight each factor accordingly (e.g. classification codes tend to be poor at clustering software patents). Metadata used to create clusters includes:
- CPC codes
- Citations (forward and backward)
- Title
- Abstract
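To make the idea of a similarity matrix concrete, here is a minimal Python sketch. It is not the platform's algorithm: the record fields, the fixed weights, the Jaccard overlap measure, the threshold and the connected-components grouping are all assumptions chosen for illustration. In particular, word overlap is only a crude stand-in for how title and abstract text might be compared, and the platform is described as adjusting weights by technology domain rather than fixing them.

```python
from itertools import combinations

# Hypothetical toy records; the field names are illustrative, not the platform's schema.
families = {
    "F1": {"cpc": {"H04L9/32"}, "citations": {"P1", "P2"},
           "text": "secure key exchange protocol"},
    "F2": {"cpc": {"H04L9/32", "H04L9/08"}, "citations": {"P2", "P3"},
           "text": "key exchange for secure messaging"},
    "F3": {"cpc": {"A61K31/00"}, "citations": {"P9"},
           "text": "compound for treating inflammation"},
    "F4": {"cpc": {"A61K31/00"}, "citations": {"P9", "P8"},
           "text": "anti inflammation compound dosage"},
}

# Assumed fixed weights per metadata type (a real system may learn these per domain).
WEIGHTS = {"cpc": 0.4, "citations": 0.3, "text": 0.3}


def jaccard(a, b):
    """Share of items two sets have in common (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x, y, weights=WEIGHTS):
    """Weighted sum of per-field Jaccard overlaps between two families."""
    return (
        weights["cpc"] * jaccard(x["cpc"], y["cpc"])
        + weights["citations"] * jaccard(x["citations"], y["citations"])
        + weights["text"] * jaccard(set(x["text"].split()), set(y["text"].split()))
    )


def similarity_matrix(fams):
    """Pairwise similarities, keyed by (family_a, family_b)."""
    return {(a, b): similarity(fams[a], fams[b])
            for a, b in combinations(sorted(fams), 2)}


def cluster(fams, threshold=0.3):
    """Group families linked by any similarity >= threshold (union-find)."""
    parent = {f: f for f in fams}

    def find(f):
        while parent[f] != f:
            parent[f] = parent[parent[f]]  # path compression
            f = parent[f]
        return f

    for (a, b), score in similarity_matrix(fams).items():
        if score >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for f in fams:
        groups.setdefault(find(f), set()).add(f)
    return sorted(sorted(g) for g in groups.values())


clusters = cluster(families)
```

With this toy data the two cryptography families end up in one group and the two pharmaceutical families in another. No category labels were supplied in advance, which is the point of the approach.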
How are Cluster names determined?
Cluster names are machine-generated from the titles and abstracts. Each cluster is given the name that most closely describes all patent families in it, using text summarisation and natural language processing (NLP) techniques.
The clustering and naming algorithms are separate, so the names cannot become a self-fulfilling prophecy in the clustering results. If clustering were based on the occurrence of a certain phrase, the cluster would be biased towards patents that use that phrase and would exclude closely related technologies described in different words, skewing the results.
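The separation between clustering and naming can be pictured as a naming step that only reads a finished cluster and never feeds back into it. The Python sketch below illustrates that idea under stated assumptions: `name_cluster`, the stopword list and the frequency-based ranking are hypothetical, and a simple word count is far cruder than the text summarisation and NLP techniques the platform uses.

```python
from collections import Counter

# Assumed minimal stopword list, for illustration only.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "to", "with"}


def name_cluster(texts, max_words=2):
    """Name a finished cluster from the words shared by most of its families.

    The function takes the cluster's titles as input and returns a label;
    it has no way to change which families belong to the cluster.
    """
    doc_freq = Counter()
    for text in texts:
        # Count each word once per family so long titles do not dominate.
        doc_freq.update(set(text.lower().split()) - STOPWORDS)
    # Most widely shared words first; ties broken alphabetically.
    ranked = sorted(doc_freq, key=lambda word: (-doc_freq[word], word))
    return " ".join(ranked[:max_words])


# Hypothetical titles of families already grouped by a separate clustering step.
titles = [
    "Battery cell cooling plate",
    "Cooling system for battery pack",
    "Battery module with liquid cooling",
]
label = name_cluster(titles)
```

Because the label is derived after the grouping is fixed, a family that never uses the words in the label can still belong to the cluster.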
Miscellaneous cluster
The platform will present a maximum of sixteen technology clusters. If a group of patent families creates more than sixteen clusters, the clustering tool will group the remaining families into ‘miscellaneous’.
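As a rough illustration of the sixteen-cluster cap, the Python sketch below keeps sixteen clusters and folds the rest into ‘miscellaneous’. Two things are assumptions rather than documented behaviour: that the clusters kept are the largest by family count, and that ‘miscellaneous’ is shown in addition to the sixteen. The function and cluster names are hypothetical.

```python
MAX_CLUSTERS = 16  # the limit stated in the text above


def cap_clusters(clusters, max_clusters=MAX_CLUSTERS):
    """Keep at most `max_clusters` clusters; pool the rest as 'miscellaneous'.

    Assumption: clusters are ranked by size (largest first, ties by name).
    """
    ranked = sorted(clusters.items(), key=lambda item: (-len(item[1]), item[0]))
    shown = dict(ranked[:max_clusters])
    leftover = [family for _, members in ranked[max_clusters:] for family in members]
    if leftover:
        shown["miscellaneous"] = leftover
    return shown


# Hypothetical input: 18 clusters, where cluster i contains i families.
clusters = {
    f"cluster_{i:02d}": [f"c{i:02d}-f{j}" for j in range(i)]
    for i in range(1, 19)
}
shown = cap_clusters(clusters)
```

In this example the two smallest clusters (one and two families) fall outside the cap, so ‘miscellaneous’ holds three families.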
Unrelated cluster
An ‘unrelated’ cluster appears when not all of the portfolios are clustered. It shows the number of patent families that do not fit into any of the technology clusters in the report. For example, if the tool clustered company X and you are benchmarking companies Y and Z, ‘unrelated’ shows the number of patent families in Y’s and Z’s portfolios that do not cross over with, or fit into, any of X’s clusters.
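The company X, Y and Z example can be expressed as a simple count. In the Python sketch below, the assignment data, the cluster names and the use of `None` to mark a family that fits none of X's clusters are all hypothetical; how the platform decides whether a family fits is not shown here.

```python
# Hypothetical result of matching Y's and Z's families against X's clusters.
# None means the family did not fit into any of X's clusters.
assignments = {
    "Y": {"Y-1": "Battery cooling", "Y-2": None},
    "Z": {"Z-1": None, "Z-2": None, "Z-3": "Fast charging"},
}


def count_unrelated(assignments):
    """Count, per benchmarked company, the families with no matching cluster."""
    return {
        company: sum(1 for cluster in families.values() if cluster is None)
        for company, families in assignments.items()
    }


unrelated_by_company = count_unrelated(assignments)
unrelated_total = sum(unrelated_by_company.values())
```

Here one of Y's families and two of Z's families match none of X's clusters, so the ‘unrelated’ cluster would show three families in total.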