Where Your Name Translations Come From

Every name translation on this site has been verified against the calligraphy generation engine before it reaches you. Not a spot check — every one of the 157,000 names in our system has passed round-trip verification. If a pronunciation produces an error or invalid output, it never reaches the catalog.

This article explains where those names come from, how they are verified, and how the inventory continues to grow.

What You Are Buying

When you purchase a Japanese calligraphy design of your name, you are buying a specific pairing: your name and a particular way of writing it in Japanese romanization (romaji). "Michael" paired with maikeru is one product. "Michael" paired with maiku is a different product. Both are valid — they represent different Japanese renderings of the same English name.

The romaji determines which Japanese characters appear in the calligraphy. Different romaji produce different art. This is why one name can have multiple designs — and why getting the romaji right matters.

Each design also carries a pronunciation guide that helps you recognize the Japanese rendering as your own name. Maikeru with the pronunciation "MY-keh-roo" tells you: this is how a Japanese speaker would say Michael using these characters.

For an overview of how these characters work, see How to Write Names in Japanese.

Seven Sources

No single data source contains everything needed. Seven primary sources each contribute something unique:

ParaNames (309,000 names) — The backbone. Extracted from Wikidata, the structured knowledge base behind Wikipedia. Every name comes with its language of origin and a clean romanization.

Tsumugi (956,000 records) — A Japanese media database tracking how foreign names appear in TV, film, and music. Tsumugi answers a question no other source can: "Given a name, what are the most commonly used Japanese romanizations, and which is most popular?" For Michael, Tsumugi knows that maikeru is used 847 times in Japanese media, maiku 312 times, and mikaeru 89 times. This frequency data determines which pronunciation appears first on the product page.

Kaggle Census (33.5 million records) — Government census data from 106 countries. The frequency backbone — it answers "How popular is this name?" and "Which names should we research first?" Maria is the most common name globally with 1.31 million occurrences.

BehindTheName (65,000+ names) — A curated name reference providing etymological meanings and native-language pronunciations. BehindTheName is the bridge between name data and the etymology system that powers kanji designs.

Census IPA (147,000 name-pronunciation pairs) — Government census data from 14 countries cross-referenced with pronunciation dictionaries. Authoritative pronunciation data for high-frequency names.

WorldSys and Alphapolis (52,000 names combined) — Two supplementary sources from Japanese publishing, adding names primarily from Asian and non-Western language contexts.

The full list of reference works is available on our Name Resources page.
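
For technically minded readers, the frequency ranking described for Tsumugi above can be sketched in a few lines of Python. The three counts for Michael are the ones quoted in this article; the dictionary layout and the function name rank_romaji are assumptions made for illustration, not the site's actual code.

```python
# Media usage counts for "Michael" as quoted in the article.
# The dict-of-counts layout is an assumption for this sketch.
media_counts = {"maiku": 312, "mikaeru": 89, "maikeru": 847}


def rank_romaji(counts: dict[str, int]) -> list[str]:
    """Order romaji from most to least used; ties break alphabetically."""
    return sorted(counts, key=lambda romaji: (-counts[romaji], romaji))


ranked = rank_romaji(media_counts)
print(ranked)  # the first entry is the one shown first on the product page
```

Sorting on the negated count keeps the most popular rendering first, and the alphabetical tie-break makes the order stable from one run to the next.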

How Sources Combine

Sources are merged before deduplication. This sounds like a technical detail, but it directly affects what you see.

Consider the name Andrea. In Italian, Andrea is a male name. In English, Andrea is a female name. If the system kept only one Andrea record, it would lose one cultural context. By merging first, both language variants survive — the Italian Andrea and the English Andrea are separate products with separate pronunciations, because they represent genuinely different cultural traditions.

The deduplication key is the triple: name, romaji, and language. The same name with different romanizations or from different languages produces separate records. This preserves the reality that names mean different things in different places.
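
The merge-then-deduplicate step can be illustrated with a minimal Python sketch. Only the key (name, romaji, language) comes from the description above; the NameRecord type, the source labels, and the two romaji strings for Andrea are hypothetical placeholders, not catalog data.

```python
from typing import Iterable, NamedTuple


class NameRecord(NamedTuple):
    name: str
    romaji: str
    language: str
    source: str  # hypothetical field: which dataset the record came from


def merge_and_dedupe(*sources: Iterable[NameRecord]) -> list[NameRecord]:
    """Merge every source first, then keep one record per (name, romaji, language)."""
    merged = [record for source in sources for record in source]
    unique: dict[tuple[str, str, str], NameRecord] = {}
    for record in merged:
        key = (record.name, record.romaji, record.language)
        unique.setdefault(key, record)  # first record seen for a key wins
    return list(unique.values())


# Placeholder romaji values, chosen only to show two distinct keys.
source_a = [NameRecord("Andrea", "andorea", "it", "A")]
source_b = [
    NameRecord("Andrea", "andoria", "en", "B"),
    NameRecord("Andrea", "andorea", "it", "B"),  # duplicate of the Italian record
]

catalog = merge_and_dedupe(source_a, source_b)
print(len(catalog))  # Italian and English Andrea both survive
```

Because language is part of the key, the Italian and English records never collide, while the repeated Italian record from the second source is dropped.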

Filtering

Not everything in the raw data is a name. Particles like "de," "van," and "al" are name components, not standalone names. Titles like "Sir" and "Dr" are honorifics. These are filtered before entering the catalog.
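
A filter of this kind might look like the following Python sketch. The word lists contain only the examples named above; the real lists are presumably longer, and the case-insensitive matching and trailing-period handling are assumptions.

```python
# Only the examples mentioned in the article; the real lists are not published here.
PARTICLES = {"de", "van", "al"}
TITLES = {"sir", "dr"}
NON_NAMES = PARTICLES | TITLES


def is_standalone_name(token: str) -> bool:
    """Return False for particles, titles, and empty strings."""
    normalized = token.strip().rstrip(".").lower()  # "Dr." and "dr" compare equal
    return bool(normalized) and normalized not in NON_NAMES


raw_tokens = ["Maria", "de", "Sir", "Dr.", "Anna", "van"]
names = [token for token in raw_tokens if is_standalone_name(token)]
print(names)
```

One consequence of matching case-insensitively is that a given name spelled like a particle, such as "Al", would also be dropped; whether the production filter makes that trade-off is not stated here.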

Corrections

When a data quality issue is found — an incorrect romanization, a wrong language tag — it is corrected through a permanent override. Once written, a correction is never silently reverted. The system does not change beneath you.
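
One way to make a correction permanent is to store it separately from the source data and reapply it on every rebuild. The Python sketch below assumes that design; the function name, the record layout, and the misspelled pronunciation guide are all hypothetical.

```python
# Records and overrides are both keyed by (name, romaji, language).
Key = tuple[str, str, str]


def apply_overrides(records: dict[Key, dict], overrides: dict[Key, dict]) -> dict[Key, dict]:
    """Return a copy of records with every stored correction applied."""
    corrected = {key: dict(fields) for key, fields in records.items()}
    for key, fix in overrides.items():
        if key in corrected:
            corrected[key].update(fix)
    return corrected


# Hypothetical rebuild that reintroduces an old mistake from a raw source.
rebuilt = {("Michael", "maikeru", "en"): {"guide": "MY-keh-ru"}}
overrides = {("Michael", "maikeru", "en"): {"guide": "MY-keh-roo"}}

catalog = apply_overrides(rebuilt, overrides)
print(catalog[("Michael", "maikeru", "en")]["guide"])
```

Because the override lives outside the raw data, re-importing a source cannot undo it, and applying the same overrides twice gives the same result.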

The Quality Gate

Exhaustive verification

Every pronunciation in the system has been verified by sending it through the calligraphy generation engine and checking the result. The engine has strict rules about which sounds it can process. An invalid pronunciation string — one containing a sound combination the engine does not recognize — would produce an error.

Verifying every pronunciation in advance means you never encounter a "this name cannot be generated" failure. Every name you see on this site will produce valid calligraphy.
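
The gate can be pictured as a loop that sends each candidate through the engine and keeps only those that succeed. In the Python sketch below, toy_engine_render is a deliberately simplified stand-in: the real calligraphy engine and its sound rules are not described in this article, so the pattern used here (an optional consonant followed by a vowel, or a lone "n") is an assumption for illustration only.

```python
import re

# Toy rule: one or more units of (optional consonant + vowel) or a lone "n".
# This is NOT the real engine's rule set; it omits sounds such as "shi" and "tsu".
_TOY_SOUNDS = re.compile(r"(?:[kgsztdnhbpmyrw]?[aiueo]|n)+")


def toy_engine_render(romaji: str) -> str:
    """Stand-in for the engine: raise ValueError on an unrecognized sound sequence."""
    if not _TOY_SOUNDS.fullmatch(romaji):
        raise ValueError(f"unsupported sound sequence: {romaji!r}")
    return f"<calligraphy:{romaji}>"


def verify_all(candidates: list[str]) -> tuple[list[str], list[str]]:
    """Split candidates into those the engine accepts and those it rejects."""
    verified, rejected = [], []
    for romaji in candidates:
        try:
            toy_engine_render(romaji)
        except ValueError:
            rejected.append(romaji)
        else:
            verified.append(romaji)
    return verified, rejected


verified, rejected = verify_all(["maikeru", "maiku", "mikaeru", "mxq"])
print(verified, rejected)
```

Only the verified list would go on to the catalog; anything in the rejected list is held back before a customer can ever see it.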

The ratchet

The verified inventory only grows. Names are added but never silently removed. If a name needs correction, a new record replaces it — the pronunciation is updated, not deleted. Today's 157,000 unique names is a floor, not a ceiling.
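
A ratchet like this can be enforced with a simple set comparison between the previous inventory and the new one. The Python sketch below is an assumption about how such a check could work; the function names are hypothetical.

```python
def removed_names(previous: set[str], current: set[str]) -> set[str]:
    """Names present in the previous inventory but missing from the current one."""
    return set(previous) - set(current)


def check_ratchet(previous: set[str], current: set[str]) -> None:
    """Raise if an update would shrink the inventory by dropping any name."""
    missing = removed_names(previous, current)
    if missing:
        raise ValueError(f"names would be removed: {sorted(missing)}")


before = {"Maria", "Anna"}
after = {"Maria", "Anna", "Kelly"}
check_ratchet(before, after)  # passes: names were only added
```

Running the check before each release turns "never silently removed" into a rule that fails loudly rather than a promise that depends on memory.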

Gender and Demographics

Census data supplies a demographic breakdown for each name: the share of people recorded with that name who are male, the share who are female, and the share of records in which it appears as a surname. Kelly is 75% female, 18% male, 7% surname. Jordan is nearly even between male and female.

This data appears on the product page as a gender indicator, helping you understand how your name is used globally. The data comes from government census records across 106 countries — not from assumptions or cultural stereotypes.
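
The percentages are a straightforward share-of-total calculation. In the Python sketch below, the raw counts for Kelly are hypothetical, chosen only so that they reproduce the 75/18/7 split quoted above; the function name and category labels are also assumptions.

```python
def gender_breakdown(counts: dict[str, int]) -> dict[str, int]:
    """Convert raw census counts per category into whole-number percentages."""
    total = sum(counts.values())
    if total == 0:
        return {category: 0 for category in counts}
    return {category: round(100 * count / total) for category, count in counts.items()}


# Hypothetical counts, not real census figures.
kelly_counts = {"female": 7500, "male": 1800, "surname": 700}
print(gender_breakdown(kelly_counts))
```

Rounding each share independently means the three figures can occasionally sum to 99 or 101; a production system would need to decide how to present that.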

157,000 Names and Growing

Metric	Count
Verified name records	~202,000
Unique names	~157,000
Ready for the website	~106,000
Verified, computation pending	~51,000

The gap between verified unique names (~157,000) and names ready for the website (~106,000) is the ~51,000 names that have passed quality verification but have not yet had their full records computed: etymology, kanji matches, and famous people. This is a batch processing task, not a quality gap.

How the inventory expands

Three paths add new names:

Source expansion — New data sources or expanded collection from existing ones. The BehindTheName collection is currently in progress, with over 100,000 names in the queue. Each batch adds names not found in other sources.

Language expansion — Processing names from additional languages through the pronunciation system. A recent expansion added 55,000 names in a single session — names already in the system with romanizations but waiting for pronunciation support. German, Russian, Dutch, and Swedish names were already there; they just needed pronunciation labels.

Dictionary expansion — Adding pronunciation dictionaries for new languages. The system currently uses five language cascades. Extending to the full 13 languages supported by the calligraphy engine would unlock names that currently have no pronunciation match. For more on this, see How We Pronounce Your Name.

Priority

Names are not processed in random order. Census frequency data drives priority: the most common names globally, such as Maria, Mohammed, John, David, and Anna, are processed and verified first. The long tail is processed progressively.

This means that if your name is common, it is almost certainly in the system. If your name is rare, it may be among the names being added — or you can request it directly.
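
Frequency-driven priority amounts to sorting the unprocessed names by how often they occur. In the Python sketch below, the count for Maria is the 1.31 million figure quoted earlier; every other count, the rare name, and the function name are hypothetical placeholders.

```python
def processing_order(census_counts: dict[str, int], already_verified: set[str]) -> list[str]:
    """Return unverified names, most common first; ties break alphabetically."""
    pending = [name for name in census_counts if name not in already_verified]
    return sorted(pending, key=lambda name: (-census_counts[name], name))


census_counts = {
    "Zebedee": 1_200,      # hypothetical rare name and count
    "John": 800_000,       # hypothetical count
    "Maria": 1_310_000,    # figure quoted in the article
    "Mohammed": 900_000,   # hypothetical count
}

queue = processing_order(census_counts, already_verified={"Maria"})
print(queue)
```

Names already verified drop out of the queue, so each pass works further down the long tail.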