Blueprint Technologies - Data information specialists

How to apply smart localization approaches to online searches

By Avelino López García

Most multilingual websites parse the end user's query in a non-English language, normalize the text and then map the term to an English-driven taxonomy, leaving room for errors in direct translations. Blueprint's Localization team examines better ways to maximize search results for non-English markets.

We are all used to going to a website and typing a few words to retrieve information, images or simply to find an item we want to buy. The process may seem straightforward for an end user, but there are hidden complexities that allow this type of website search to work well across languages. Non-English searches need a significant number of adaptations and dedicated localization approaches to handle complexities of morphology, written scripts, and conceptual and cultural differences.

The search space is one of the most interesting and least-known areas of localization. It sits at the crossroads of search, taxonomies and translation, and it requires hybrid expertise in library science and translation to produce results relevant to the end user.

Localizing your search results

These complexities arise because most search systems are designed for English and must be customized for other languages. Most multilingual sites parse the end user’s query in a non-English language, then modify the query in certain ways (for instance, normalizing it to remove accent marks) to facilitate processing. After that, they map the normalized query term to an English-driven taxonomy in which keywords are localized into multiple languages, reaching the corresponding concept, which is tied to a unique identifier of the relevant asset. The asset is then retrieved and presented to the end user.
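The flow described above can be sketched in a few lines of Python. This is only a minimal illustration: the keyword table, concept IDs and asset IDs (`LOCALIZED_KEYWORDS`, `ASSETS_BY_CONCEPT`, `C001`, `img_17` and so on) are hypothetical, and a real system would hold them in a database rather than in dictionaries.

```python
import unicodedata

def normalize(query: str) -> str:
    """Trim, lowercase, and strip accent marks from a query."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", query.strip().lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

# Hypothetical localized keywords (stored already normalized) -> concept ID.
LOCALIZED_KEYWORDS = {
    "es": {"rompecabezas": "C001", "puzle": "C001", "familia": "C002"},
    "de": {"familie": "C002"},
}

# Hypothetical concept ID -> asset IDs.
ASSETS_BY_CONCEPT = {"C001": ["img_17", "img_42"], "C002": ["img_08"]}

def search(query: str, locale: str) -> list[str]:
    """Normalize the query, map it to a concept, and return that concept's assets."""
    keyword = normalize(query)
    concept_id = LOCALIZED_KEYWORDS.get(locale, {}).get(keyword)
    if concept_id is None:
        return []  # no localized keyword reaches a concept
    return ASSETS_BY_CONCEPT.get(concept_id, [])

print(search("Família", "es"))    # accent removed before lookup
print(search("Schultüte", "de"))  # no concept in this English-driven taxonomy
```

Note that stripping accent marks is lossy: in Spanish, for example, it also turns “ñ” into “n,” which can conflate distinct words, so some systems normalize less aggressively for certain languages.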

In this common setup, the route going from the end user’s query, through the localized keywords and ending in the asset does not always work perfectly. This is because there are many conceptual, linguistic and cultural differences among languages. Here are a few examples of problem areas that often produce inaccurate results in an English-centric setup, even when the keywords are accurately translated and the taxonomies are good.

Girl with Schultüte on her first day of School

Schultüte is a uniquely German/Czech concept that has no equivalent in English or in many other cultures and languages.

Conceptual issues: In Europe, an image search for “family” typically shows families with a couple of children; in parts of Africa and the Middle East it could show families with more children. But in China during the one-child policy period, you would expect, for the most part, to get search results showing families with only one child. In other words, even a concept seemingly as simple as family can require adaptations for different countries or markets.

Missing concepts: Some cultures have unique concepts, like “Schultüte” in Germany. That is a large, colorful cone full of school supplies, sweets and little presents given to kids when they are about to start their very first day of school. This word has no equivalent in English or in many other cultures and languages, so an English-centric database will not have an entry for this concept, making it impossible to add localized keywords for it. This often clouds the search with wrong or irrelevant results.

Absence of synonyms or missing tags: English concepts often have multiple equivalents in a target language. For instance, Spanish users looking for a “puzzle” are going to use the Spanish translations “puzle” and “rompecabezas” interchangeably. The search can map both terms to the same concept as long as the taxonomy includes both synonyms. But to fully leverage the translated synonyms in the taxonomy, every single asset must be tagged with the English keyword “puzzle.” Yet taxonomies are very large databases and grow continuously, so their localization is almost never complete. To compensate for any taxonomy shortcomings, it is common to pull more results by matching searches against the text in descriptions, captions, footnotes and other unstructured or free text associated with the assets.

Ambiguity issues: The last example of problematic searches is when a user includes homographs, or words with multiple meanings, like “bridge” in English. If the system lacks an effective disambiguation mechanism, these searches can pose a challenge for any language and return inaccurate results.

Localization and logic operators

From the perspective of the search engine, there is a set of best practices that can mitigate or solve many of those issues. First, ensure that all assets are tagged with English keywords. Second, make sure the keywords in the taxonomy are fully localized. Third, create language-specific concepts where needed. Fourth, have disambiguation prompts for the user to clarify their search. Last, and probably most effective, convert complex inputs from international users into Boolean searches. This is one of the most flexible and clever localization features I have used and experienced. It simply means transforming the end user’s original query at runtime into a compound search sequence that includes logical operators (AND, OR, NOT), as defined by the English mathematician George Boole in his book The Laws of Thought.

Boolean Search Logic

Boolean conversions can be leveraged to improve both recall and precision, which are the cornerstones of search metrics. Precision is the share of returned results that are relevant: higher precision means fewer but more accurate results, reducing the noise of irrelevant output. Recall is the share of relevant assets that are actually returned: higher recall means a wider set of results, which is useful when you don’t have enough assets to show and want to drive users toward a related yet relevant set of results.
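To make the two metrics concrete, here is a small sketch that computes them from a set of returned assets and a set of assets judged relevant. The function name `precision_recall` and the asset IDs are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = relevant results returned / all results returned
    recall    = relevant results returned / all relevant assets
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Four assets returned, three assets actually relevant, two in common.
precision, recall = precision_recall(
    retrieved=["img_a", "img_b", "img_c", "img_d"],
    relevant=["img_a", "img_b", "img_e"],
)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```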

This is how Boolean conversions work:

When users in mainland China search for “family,” they expect images of a small family with one or two children. Instead, they may receive images of large families, perhaps even the Octomom family. To prevent that type of result, a Chinese localizer can convert the query for “family” into (family AND (“1_child” OR “2_children”)). Doing so ensures the results show the kind of small family users in China expect to see.
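One way to picture how such a compound is applied is a tiny evaluator that checks each asset’s tags against a nested expression. This is a sketch under assumptions: the tuple representation, the `matches` and `search` helpers, and the sample assets are all hypothetical, and a production search engine would evaluate the expression with its own query language.

```python
def matches(expr, tags):
    """Evaluate a nested Boolean expression against one asset's tag set.

    expr is a tag string, or a tuple ("AND", a, b, ...),
    ("OR", a, b, ...) or ("NOT", a).
    """
    if isinstance(expr, str):
        return expr in tags
    op, *operands = expr
    if op == "AND":
        return all(matches(e, tags) for e in operands)
    if op == "OR":
        return any(matches(e, tags) for e in operands)
    if op == "NOT":
        return not matches(operands[0], tags)
    raise ValueError(f"unknown operator: {op!r}")

# (family AND ("1_child" OR "2_children")) as a nested tuple.
FAMILY_CHINA = ("AND", "family", ("OR", "1_child", "2_children"))

# Hypothetical assets with their English tags.
ASSETS = {
    "img_01": {"family", "1_child", "park"},
    "img_02": {"family", "8_children"},
    "img_03": {"family", "2_children", "beach"},
    "img_04": {"couple", "beach"},
}

def search(expr, assets):
    """Return the IDs of all assets whose tags satisfy the expression."""
    return sorted(a for a, tags in assets.items() if matches(expr, tags))

print(search("family", ASSETS))      # ['img_01', 'img_02', 'img_03']
print(search(FAMILY_CHINA, ASSETS))  # ['img_01', 'img_03']
```

The plain query returns every family image, including the large family; the converted query keeps only the small families.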

Now consider users in Germany looking for a “Schultüte” image. Since the concept does not exist in English, this search may return no results, or very few based solely on free-text matches, if free-text matching is enabled. One fix is for the database developers to create a German-specific concept in the taxonomy. But a simpler and cleverer alternative is to transform the German query on the backend into a sequence equivalent to ((school AND cone) NOT “traffic_cone”). Note the use of “NOT” to exclude more common cones irrelevant to this search.
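If the backend keeps an inverted index from English keywords to asset IDs, this conversion reduces to set operations: AND is an intersection and NOT is a set difference. The sketch below assumes such an index; `INDEX`, `REWRITES`, `query_key` and the asset IDs are hypothetical names chosen for illustration.

```python
import unicodedata

# Hypothetical inverted index: English keyword -> set of asset IDs.
INDEX = {
    "school": {"img_10", "img_11", "img_12"},
    "cone": {"img_11", "img_12", "img_13"},
    "traffic_cone": {"img_12", "img_13"},
}

def query_key(locale, query):
    """Build a lookup key; NFC keeps 'ü' in a single, comparable form."""
    return (locale, unicodedata.normalize("NFC", query.strip().lower()))

def schultuete(index):
    """((school AND cone) NOT "traffic_cone") expressed as set algebra."""
    school = index.get("school", set())
    cone = index.get("cone", set())
    traffic_cone = index.get("traffic_cone", set())
    return (school & cone) - traffic_cone

# Hypothetical table of locale-specific conversions applied at runtime.
REWRITES = {query_key("de", "Schultüte"): schultuete}

def search(query, locale, index=INDEX):
    """Apply a locale-specific rewrite if one exists, else look up the keyword."""
    key = query_key(locale, query)
    rewrite = REWRITES.get(key)
    if rewrite is not None:
        return rewrite(index)
    return set(index.get(key[1], set()))

print(search("Schultüte", "de"))  # {'img_11'}
```

With this data, `img_12` carries the tags “school” and “cone” but is excluded because it is also tagged as a traffic cone.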

A Spanish user searching for “puzzles” using one of its translations, “rompecabezas” or “puzle,” would find every relevant asset only if both translations were in the localized taxonomy and the search also matched assets with either translation in free text. If not all assets are tagged, only one synonym was entered in the taxonomy and the search does not expand to match free text, the results could miss many assets. In this situation, the localizer can convert the queries for both “puzle” and “rompecabezas” into the Boolean compound (puzle OR rompecabezas). Then the results for both queries would include any assets containing either word, both in free text and in the taxonomy, casting the widest net to capture all relevant results.
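The effect of the OR compound on recall can be sketched as follows. The synonym group, the asset records and the helper names (`expand`, `search`) are assumptions made for this example, and the free-text match is a naive whitespace split rather than real tokenization.

```python
# Hypothetical synonym groups for Spanish.
SYNONYMS_ES = [{"puzle", "rompecabezas"}]

# Hypothetical assets: localized tags plus a free-text caption.
ASSETS = [
    {"id": "img_21", "tags": {"puzle"}, "caption": "Niña con un puzle de madera"},
    {"id": "img_22", "tags": set(), "caption": "Rompecabezas de mil piezas"},
    {"id": "img_23", "tags": {"rompecabezas"}, "caption": ""},
    {"id": "img_24", "tags": {"ajedrez"}, "caption": "Partida de ajedrez"},
]

def expand(term, groups):
    """Turn one term into the set of terms joined by OR."""
    term = term.lower()
    for group in groups:
        if term in group:
            return set(group)
    return {term}

def search(term, assets, groups):
    """Match any expanded term against tags or caption words."""
    terms = expand(term, groups)
    results = []
    for asset in assets:
        caption_words = set(asset["caption"].lower().split())
        if terms & asset["tags"] or terms & caption_words:
            results.append(asset["id"])
    return results

print(search("puzle", ASSETS, groups=[]))           # ['img_21']
print(search("puzle", ASSETS, groups=SYNONYMS_ES))  # ['img_21', 'img_22', 'img_23']
```

Without the expansion, a query for “puzle” misses the untagged asset and the one tagged only with the other synonym; with it, either query returns all three.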

Similarly, when a search uses homographs, as in “game bridge,” the user most likely wants images of the card game. But free text can return images of a deer or another type of game animal next to a road bridge. In that case, Boolean conversions can be used in any language to improve the precision of the search results. The localizer may convert end users’ queries like “game of bridge,” “bridge game” and “bridge card game” into ((bridge NOT (“dental_bridge” OR “road_bridge”)) AND “card_game”). The results would be more accurate than those of a blunt free-text search over descriptions, captions, footnotes, etc.
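Several phrasings of the same intent have to land on one compound. A simple way to do that, shown here purely as an assumption about how a conversion table could be keyed, is to reduce each query to its set of content words. The stopword list, the `CONVERSIONS` table and the `convert` helper are hypothetical, and the expression string is written with an explicit operator between every term.

```python
# Hypothetical stopwords ignored when comparing query variants.
STOPWORDS = {"of", "the", "a"}

# Target compound for the card game, with homograph senses excluded.
BRIDGE_CARD_GAME = '((bridge NOT ("dental_bridge" OR "road_bridge")) AND "card_game")'

# Hypothetical conversion table keyed by the set of content words in a query.
CONVERSIONS = {
    frozenset({"bridge", "game"}): BRIDGE_CARD_GAME,
    frozenset({"bridge", "card", "game"}): BRIDGE_CARD_GAME,
}

def convert(query):
    """Return the Boolean compound for a known variant, else the query unchanged."""
    tokens = frozenset(t for t in query.lower().split() if t not in STOPWORDS)
    return CONVERSIONS.get(tokens, query)

for q in ("game of bridge", "bridge game", "bridge card game", "road bridge"):
    print(f"{q!r} -> {convert(q)}")
```

The first three queries map to the same compound regardless of word order, while “road bridge” passes through untouched because no conversion is defined for it.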

Boolean conversion is a very smart and extremely flexible mechanism for search localization. It helps queries in all languages overcome a wide range of issues and drastically improve both precision and recall, the top two measures in the search world. Knowing how Boolean conversions work to achieve better search results is also useful in other contexts. When our localization team at Blueprint was recently asked to localize tags for software products to enable filtering by keyword and browsing by facet, it was exactly this kind of hybrid expertise that informed our approach to provide high-quality localization.

The Blueprint Localization team is a hidden gem in the U.S. West Coast technology landscape. Its collective knowledge, especially the combination of linguistic, cultural and research expertise, is a game-changing differentiator for the localization industry. Are you interested in leveraging your technical and language skills as part of a talented and quality-driven team? Check out our Careers page to learn more. 

© 2022 Blueprint Technologies, LLC. 2600 116th Avenue Northeast, First Floor
Bellevue, WA 98004

All rights reserved.