The Pulse of the World in 10 Minutes

Baidu Updates Access Protocols, Blocks Google and Bing from Indexing Its Encyclopedia Content

Chinese internet giant Baidu has updated the robots.txt file for Baidu Baike, its Wikipedia-like encyclopedia, to block the search engine crawlers of Google and Microsoft Bing from indexing the service's content. The update, which the Wayback Machine's records date to August 8, marks a significant shift in how Baidu Baike manages access to its nearly 30 million entries.

Prior to this change, Googlebot and Bingbot had partial access to Baidu Baike, allowing them to index much of its vast repository. The change underscores Baidu’s intent to protect its data, mirroring a global trend in which exclusive data is becoming a strategic asset, especially for training sophisticated artificial intelligence (AI) models.
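For readers who run their own sites, the mechanism is simple: robots.txt is a plain-text file of per-crawler rules that well-behaved bots consult before fetching pages. The sketch below uses Python's standard urllib.robotparser to check which crawlers a set of rules would admit. The rules, the example.com URL, and the allowed_agents helper are all hypothetical and for illustration only; they are not a copy of Baidu Baike's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT Baidu Baike's real robots.txt.
EXAMPLE_ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, url: str, agents: list[str]) -> dict[str, bool]:
    """Return, for each crawler name, whether the rules permit fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file's contents as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


if __name__ == "__main__":
    # example.com and SomeOtherBot are placeholders, not real targets.
    results = allowed_agents(
        EXAMPLE_ROBOTS_TXT,
        "https://example.com/item/1",
        ["Googlebot", "bingbot", "SomeOtherBot"],
    )
    for agent, ok in results.items():
        print(f"{agent}: {'allowed' if ok else 'blocked'}")
```

Keep in mind that robots.txt is a request, not an enforcement mechanism: it only stops crawlers that choose to honor it, so a small site that wants stronger protection would need additional measures.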

This action by Baidu reflects a growing recognition of the value inherent in proprietary data, particularly as companies worldwide escalate efforts to harness AI for competitive advantage. For small business owners and solopreneurs, this development highlights the critical importance of data management and the potential need to reevaluate how their own content is accessed and utilized by global tech players.

In a parallel development, Reddit in July blocked search engines other than Google from indexing its forums, a move tied to a multimillion-dollar arrangement that supports Google’s AI training initiatives. Similarly, Microsoft threatened last year to restrict access to its search data in order to curb its use in rival AI applications, according to a Bloomberg report.

While Baidu’s move might limit the scope of content available to Western search engines, it also illustrates the strategic maneuvers entities are employing to safeguard their informational assets. For small businesses and individual entrepreneurs, understanding these dynamics is crucial as they navigate the digital landscape where AI and data play pivotal roles.

More broadly, major AI developers continue to strike partnerships to secure access to content. A notable instance is OpenAI’s agreement with Time magazine, which grants access to over a century’s worth of archived material and underscores the ongoing race to feed AI systems with diverse, rich data sources.

Despite the update, a survey conducted by the South China Morning Post found that cached content from Baidu Baike still appears in search results on Google and Bing, indicating that the full impact of the new policy may take time to show.

Representatives from Baidu, Google, and Microsoft have yet to respond to inquiries about the change. The situation presents both challenges and opportunities for smaller enterprises striving to stay relevant in a data-driven business environment.
