From which Delphi version is string Unicode by default?

Since Delphi 2009, the string type is a UnicodeString (UTF-16) by default. In older Delphi versions, string was ANSI-based and depended on the system code page.

What are the most common mistakes in a Delphi Unicode migration?

Typical mistakes are wrong assumptions about length and memory size (Length vs. bytes), problems with Windows API calls that use PChar, silent encoding conversions during file I/O, and third-party libraries that only support ANSI or use their own string types.

Should external interfaces use UTF-8 or UTF-16?

For REST/JSON, web integrations and many file and protocol formats, UTF-8 is common and interoperable. Internally, UTF-16 via string/UnicodeString is practical in Delphi. What matters is converting explicitly at system boundaries and specifying the encoding.

Does the database also have to be changed during a Unicode migration?

Not necessarily, but the database and driver configuration must transport Unicode correctly. Critical points are column types (e.g. VARCHAR vs. NVARCHAR), charsets/collations and text in BLOB/memo fields without a defined encoding. A cleanup of data access (e.g. via FireDAC) is often worthwhile.

How can the migration be secured so that no characters are lost “silently”?

Through explicit encoding rules at inputs and outputs, roundtrip tests (import→export), database roundtrips with problematic characters, and logging that writes Unicode correctly. In addition, warnings about implicit conversions should not be suppressed wholesale.

Unicode migration of legacy Delphi projects

The Unicode migration of legacy Delphi projects is a necessary step in many companies, because existing applications otherwise increasingly run into limits with international data, modern operating systems, integrations and new interfaces. In practice this is rarely a “recompile and done” case. Starting with Delphi 2009, Delphi fundamentally changed its default string types. That shifts assumptions about character encoding, memory layout and API signatures. Underestimating this leads to gradual data corruption, broken exports, hard-to-diagnose support cases and security risks.

This article provides a technically sound approach: how to analyse the codebase, sensibly limit the scope, reduce risks at hotspots (databases, files, Windows APIs, COM, REST services) and secure the migration so that operation and ongoing development can run in parallel. The focus is on Delphi-typical pitfalls in VCL applications, services and interfaces — with an eye to modernization paths that can later include topics such as BDE replacement with native connectivity, REST servers or multiplatform scenarios.

Why the Unicode switch in Delphi is often “bigger than expected”

In classic Delphi releases string was an ANSI string (depending on the system code page). Since Delphi 2009 string is by default a UnicodeString (UTF-16). At the same time many libraries and VCL classes were migrated to Wide APIs. That is fundamentally positive because it robustly supports international characters. However: legacy code has often been developed for years around the assumptions “1 character = 1 byte”, “PChar is PAnsiChar” or “Length() corresponds to byte count”.

The typical reasons migrations become more complex:

Implicit conversions succeed but alter data (especially for files, interfaces or database BLOB/text fields).
Byte-oriented code (streams, buffers, hashing, encryption) silently becomes incorrect when string contents are interpreted as raw bytes.
Third-party components are sometimes ANSI-only or use their own string types and callbacks.
The external environment (Windows APIs, COM, printing/reporting, EDI, CSV, XML/JSON) expects specific encodings.
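
How a conversion can “succeed” and still alter data can be shown outside Delphi. The following is a minimal Python sketch for illustration only, not Delphi code; the sample value and the choice of Windows-1252 with replacement characters are assumptions that mimic a typical lossy ANSI conversion.

```python
# Illustration: a conversion to an 8-bit code page that "succeeds" but loses data.
original = "Łódź"  # sample value: Ł and ź do not exist in Windows-1252

# errors="replace" mimics a lossy conversion that substitutes '?' instead of failing.
ansi_bytes = original.encode("cp1252", errors="replace")
restored = ansi_bytes.decode("cp1252")

# No exception was raised, yet the text is no longer the same.
print(restored == original)  # False

# A strict conversion makes the same problem visible instead of hiding it.
try:
    original.encode("cp1252")  # default is errors="strict"
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

print(strict_failed)  # True
```

The design point is the second half: a boundary that fails loudly on unrepresentable characters is easier to test than one that silently writes question marks.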

The goal therefore should not be to “change as little as possible”, but to change deliberately where data flows and encodings must be defined. A clean Unicode migration is also an opportunity to finally document and test unclear encoding boundaries.

Technical basics: Delphi string types, encodings and their side effects

string, UnicodeString, AnsiString, WideString — what really matters in the project

For the migration it is crucial which types are used at interfaces and in core functions:

string: Since Delphi 2009 a UnicodeString (UTF-16, reference-counted, with copy-on-write semantics).
AnsiString: Byte string with an associated code page (since Delphi 2009 every AnsiString carries its code page). Appropriate when an external interface explicitly requires a specific 8-bit encoding.
UTF8String: Since Delphi 2009 an AnsiString with the UTF-8 code page (65001); practical for REST/JSON and many protocols.
WideString: BSTR (COM), allocated via the COM allocator (SysAllocString) and not reference-counted; today usually only required for certain COM interop scenarios.
PChar: Since Delphi 2009 an alias for PWideChar. This is one of the most common breaking points in Windows API calls.

Mixing these types produces conversions. Some are correct, some are surprising: a conversion is only “correct” if you know which code page the source uses and which the target expects.
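
Why the source code page has to be known can be illustrated with a single byte. This is a small Python sketch, not Delphi code; the byte value and the two code pages are chosen only as examples.

```python
# Illustration: the meaning of a byte depends entirely on the assumed code page.
raw = b"\xe4"

as_western = raw.decode("cp1252")   # Windows-1252 (Western European)
as_cyrillic = raw.decode("cp1251")  # Windows-1251 (Cyrillic)

# Same byte, two different characters.
print(as_western == "ä", as_cyrillic == "д")  # True True

# On its own, the byte is not even valid UTF-8.
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(valid_utf8)  # False
```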

UTF-16 internally, UTF-8 externally: a pragmatic guideline

In Delphi VCL applications it often makes sense to work consistently internally with string (UTF-16). Externally (REST, files, messaging) reality is dominated by UTF-8. A robust rule of thumb is therefore:

Internal: string/UnicodeString as the standard.
Boundaries: explicitly convert at input/output using TEncoding.UTF8 (or defined ANSI code pages).
Byte-based processing: use TBytes instead of strings.

That reduces implicit conversions and makes responsibilities verifiable: “Where do bytes become text, and with which encoding?”
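
Hashing is a typical case where the “bytes versus text” question decides the result. The sketch below uses Python purely as an illustration; the helper name digest and the sample value are hypothetical. It shows that a hash is defined over bytes, so the encoding must be part of the specification.

```python
import hashlib


def digest(text: str, encoding: str) -> str:
    """Hash the bytes produced by an explicitly chosen encoding."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


value = "Müller"
utf8_hash = digest(value, "utf-8")
ansi_hash = digest(value, "cp1252")
utf16_hash = digest(value, "utf-16-le")

# Three encodings of the same text give three different hashes.
print(len({utf8_hash, ansi_hash, utf16_hash}))  # 3

# Pure ASCII hides the problem: UTF-8 and Windows-1252 agree there.
print(digest("abc", "utf-8") == digest("abc", "cp1252"))  # True
```

The second check explains why such bugs often survive testing with ASCII-only data and only appear with real customer names.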

Inventory: where Unicode typically breaks in old Delphi projects

Before touching code, a structured inventory is worthwhile. In Unicode migrations of legacy Delphi projects the sources of error are usually not evenly distributed but concentrate on certain hotspots.

1) Database access and field types (BDE, ADO, FireDAC)

Many legacy projects still use BDE or older data access layers. Typical problems here are:

Mapping of database charsets to Delphi strings (ANSI vs. Unicode field types).
“Text” stored in BLOBs or memo fields without a defined encoding.
SQL statements as strings that are interpreted differently for umlauts/Unicode characters.

If a modernization is due anyway, a Unicode migration pairs well with a cleanup of data access, e.g. towards a BDE replacement with native connectivity and a clear charset configuration (for example with PostgreSQL or MariaDB). Important: a Unicode migration should not automatically force a database migration, but the interface between the database and Delphi must be unambiguous.

2) File and stream I/O: CSV, INI, proprietary formats, import/export

A classic: files were previously read/written via AssignFile/ReadLn, TFileStream or TStringList.LoadFromFile without setting an encoding. In Unicode Delphi, routines such as TStringList.LoadFromFile then look for a BOM and otherwise fall back to a default encoding, which may not match the encoding the file was actually written in. This leads to:

misinterpreted umlauts (Ã¤, Ã¶) in CSV/log files,
incorrect length values in proprietary formats,
incompatibilities with external partners who expect ISO-8859-1 or Windows-1252.

A clean solution is to define a fixed encoding per file format and anchor that in code and documentation. For CSV/JSON UTF-8 is usually the right standard; for old interfaces sometimes Windows-1252. The decisive factor is explicitness.
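
The misinterpreted umlauts mentioned above are easy to reproduce. The following Python sketch is an illustration under assumptions (sample line, file name export.csv, UTF-8 as the agreed format encoding); it first decodes UTF-8 bytes with the wrong code page and then shows the explicit variant.

```python
import os
import tempfile

line = "März;Köln"  # sample CSV line

# Wrong: UTF-8 bytes interpreted as Windows-1252 produce the classic mojibake.
wrong = line.encode("utf-8").decode("cp1252")
print(wrong == line)  # False, the umlauts have turned into two characters each

# Right: the encoding is stated explicitly on both the write and the read side.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "export.csv")  # hypothetical file name
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(line)
    with open(path, "r", encoding="utf-8", newline="") as f:
        read_back = f.read()

print(read_back == line)  # True
```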

3) Windows API, PChar, buffer sizes and message handling

Many Delphi applications call WinAPI functions or work with buffers. Common breaking points:

Using PChar with functions that have ANSI or Wide variants (…A/…W).
Buffer sizes calculated in bytes, while Char is 2 bytes in UTF-16.
Pointer arithmetic and record layouts designed around 1-byte chars.

Precise refactoring is required here: either consistently use Wide APIs or deliberately call the ANSI variant and work with AnsiString/code pages. “It somehow compiles” is not a quality criterion.
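
The effect on record layouts can be sketched with a fixed-width text field. This is a Python illustration, not Delphi code; the field width NAME_CHARS and both helper functions are hypothetical. A field declared as “20 characters” occupies 20 bytes in the legacy layout but 40 bytes once each character is a UTF-16 code unit.

```python
import struct

NAME_CHARS = 20  # hypothetical fixed-width name field of 20 characters


def pack_name_ansi(name: str) -> bytes:
    """Legacy layout: 20 one-byte characters (Windows-1252), zero-padded."""
    return struct.pack(f"{NAME_CHARS}s", name.encode("cp1252"))


def pack_name_utf16(name: str) -> bytes:
    """Unicode layout: 20 UTF-16 code units, i.e. 40 bytes, zero-padded."""
    return struct.pack(f"{NAME_CHARS * 2}s", name.encode("utf-16-le"))


ansi_field = pack_name_ansi("Grüße")
wide_field = pack_name_utf16("Grüße")

print(len(ansi_field), len(wide_field))  # 20 40
```

Any file format, shared memory block or API structure that hard-codes the first size will misalign every following field once the second layout is used.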

4) COM, ActiveX, Office automation and third-party libraries

COM interfaces often work with BSTR (WideString). Older Delphi versions had different default strings, so code happened to work “by accident”. In Unicode Delphi double conversions or incorrect type assumptions in wrappers frequently occur. Third-party libraries are similarly critical: some provide callbacks as PAnsiChar, others expect null-terminated byte strings.

It is worthwhile to classify dependencies: which library is Unicode-ready, which is not, and which can be replaced or encapsulated? Encapsulation is often the fastest way to confine Unicode legacy issues to a well-defined area.

Strategy: treating the Unicode migration of legacy Delphi projects as a controlled modernization program

The safest migration approach is a multi-stage program that makes risks visible early and keeps the application runnable.

Step 1: define scope and prioritise code hotspots

Not every source file needs immediate adjustment. Prioritise by data flow and risk:

External interfaces (REST API, TCP/IP, files, e-mail, printing/reporting).
Data access (SQL, ORM/data modules, BDE/FireDAC layers).
String-heavy utility functions (parsers, formatters, encoders/decoders).
Integrations (COM, DLL imports, hardware interfaces).

The result should be a list of places where “encoding is a specification”. These spots will be made testable later.
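
A first version of such a list can be produced mechanically. The following Python sketch is one possible approach, not a complete analysis tool; the pattern list RISKY_PATTERNS and the function scan_source are hypothetical and would need to be extended for a real codebase. The embedded Pascal fragment is only sample input text.

```python
import re
from collections import Counter

# Hypothetical starting set of identifiers worth reviewing (Pascal is case-insensitive).
RISKY_PATTERNS = {
    "PChar": re.compile(r"\bPChar\b", re.IGNORECASE),
    "AnsiString": re.compile(r"\bAnsiString\b", re.IGNORECASE),
    "SizeOf(Char)": re.compile(r"\bSizeOf\s*\(\s*Char\s*\)", re.IGNORECASE),
    "AssignFile": re.compile(r"\bAssignFile\b", re.IGNORECASE),
    "LoadFromFile": re.compile(r"\bLoadFromFile\b", re.IGNORECASE),
}


def scan_source(source: str) -> Counter:
    """Count occurrences of each risky pattern in one source text."""
    hits = Counter()
    for label, pattern in RISKY_PATTERNS.items():
        count = len(pattern.findall(source))
        if count:
            hits[label] = count
    return hits


sample = """
procedure ExportData(const FileName: string);
var
  Buf: PChar;
  Lines: TStringList;
begin
  GetMem(Buf, 256 * SizeOf(Char));
  Lines.LoadFromFile(FileName);
end;
"""

report = scan_source(sample)
for label, count in sorted(report.items()):
    print(label, count)
```

Run over all units, the counts per file give a rough heat map; the hits still need manual review, because a match only marks a place to look at, not a defect.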

Step 2: deliberately tighten compiler/project options and warnings

In many projects warnings have been disabled over years. For a Unicode migration this is counterproductive. It makes sense to re-enable warnings and take conversion warnings seriously. It also helps to define project-wide rules, for example: no implicit AnsiString conversions at I/O boundaries, use TEncoding for file operations, no “PChar tricks” without clear context.

Step 3: introduce “encoding boundaries” as a technical layer

A practical architectural measure is to introduce small adapters/helpers that define exactly how external data enters and leaves. Examples:

CSV reader/writer: always use TEncoding.UTF8 (or a defined code page) and explicit separator rules.
REST client/server: JSON always as UTF-8 bytes, set headers correctly, do not stream the body as raw strings.
Windows API wrapper: central functions that cleanly encapsulate Wide/Ansi handling.

This prevents “encoding decisions” from being scattered across the codebase.
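
What such an adapter layer looks like in principle is shown in the following Python sketch. It is an illustration under assumptions: the constants CSV_ENCODING and CSV_DELIMITER and all three function names are hypothetical, and a Delphi implementation would use TEncoding at the same places. The point is that callers only ever see text or bytes, never an undefined mixture.

```python
import csv
import io
import json

CSV_ENCODING = "utf-8"  # assumption: the format specification says UTF-8
CSV_DELIMITER = ";"     # assumption: semicolon-separated fields


def rows_to_csv_bytes(rows):
    """Text rows in, bytes out: the only place where CSV text is encoded."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, delimiter=CSV_DELIMITER, lineterminator="\r\n")
    writer.writerows(rows)
    return buffer.getvalue().encode(CSV_ENCODING)


def csv_bytes_to_rows(data):
    """Bytes in, text rows out: strict decoding, invalid input raises."""
    text = data.decode(CSV_ENCODING)
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=CSV_DELIMITER)
    return list(reader)


def to_json_body(payload):
    """Serialize a JSON request/response body as UTF-8 bytes."""
    return json.dumps(payload, ensure_ascii=False).encode("utf-8")


rows = [["Name", "Ort"], ["Müller; Söhne", "Köln"]]
print(csv_bytes_to_rows(rows_to_csv_bytes(rows)) == rows)  # True
```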

Typical code traps and how to fix them cleanly

Length, SizeOf, ByteLength: when character length and byte size diverge

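In ANSI times Length(s) was often misused as the byte count. In Unicode Delphi this is incorrect: Length(s) returns the number of Char elements (UTF-16 code units), so the size in bytes is Length(s) * SizeOf(Char), and characters outside the Basic Multilingual Plane occupy two Char elements. When you need byte arrays, convert explicitly: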

For UTF-8: TEncoding.UTF8.GetBytes(s)
For a defined ANSI code page: TEncoding.GetEncoding(1252).GetBytes(s) (only when technically correct)

For buffer sizes in API calls check whether the function expects character or byte units. Many Wide APIs expect a character count, not bytes. Documentation and signature decide, not intuition.
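
How far character count and byte size can diverge is shown in the following Python sketch. It is an illustration only; the helper utf16_units is hypothetical and counts UTF-16 code units, which is the unit that Length() reports for a Delphi UnicodeString.

```python
def utf16_units(text: str) -> int:
    """Number of UTF-16 code units (2 bytes each) needed for the text."""
    return len(text.encode("utf-16-le")) // 2


samples = ["Strasse", "Straße", "\U0001F600"]  # last one is an emoji outside the BMP

for sample in samples:
    print(
        f"{sample!a:<14}",
        "utf16_units:", utf16_units(sample),
        "utf16_bytes:", len(sample.encode("utf-16-le")),
        "utf8_bytes:", len(sample.encode("utf-8")),
    )
```

For “Straße” the UTF-16 size is twice the unit count and the UTF-8 size is one byte more than the unit count; for the emoji, one visible character needs two code units and four bytes in both encodings.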

PAnsiChar vs. PWideChar: DLL imports and external protocols

With DLL imports there is a high risk that signatures in the Delphi code no longer match. Decide what the DLL expects:

If the DLL expects UTF-8, passing PAnsiChar(UTF8String) is common, but you must control lifetime and null-termination.
If it expects UTF-16, use PWideChar and wide strings.

In any case the imports should be encapsulated in a separate unit so that the string policy does not spread across the entire project.
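
The “lifetime and null-termination” requirement for UTF-8 parameters can be sketched with Python's ctypes module. This is an illustration, not Delphi code; the helper names to_c_utf8 and from_c_utf8 are hypothetical, and no real DLL is called.

```python
import ctypes


def to_c_utf8(text: str):
    """Build a NUL-terminated UTF-8 buffer; the caller must keep it alive."""
    if "\x00" in text:
        # An embedded NUL would silently truncate the value on the C side.
        raise ValueError("embedded NUL character in text")
    return ctypes.create_string_buffer(text.encode("utf-8"))


def from_c_utf8(buffer) -> str:
    """Read a NUL-terminated UTF-8 buffer back into text."""
    return buffer.value.decode("utf-8")


buf = to_c_utf8("Zürich")

# 6 characters, 7 UTF-8 bytes, plus 1 byte for the terminator.
print(len(buf))                       # 8
print(from_c_utf8(buf) == "Zürich")   # True
```

Keeping both helpers in one module mirrors the recommendation above: the conversion policy for a given DLL lives in exactly one place.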

Formatting, case conversion, comparison: locale and normalization

Unicode also raises semantic issues: case folding is not trivial in all languages, and characters can have different normalization forms. In typical enterprise applications this is less critical than in consumer text processing, but it affects:

sorting and filtering (e.g. in grids or search functions),
case-insensitive comparisons for key values,
generation of filenames or identifiers.

It is important to set a clear rule: which values are “keys” (for example article numbers, customer codes) that should stay close to ASCII, and which are “texts” that must be fully Unicode-capable? This separation reduces follow-up errors.
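
Both effects, normalization forms and case folding, can be demonstrated in a few lines. The following Python sketch is an illustration; the functions text_key and is_ascii_key are hypothetical names for the two rules described above, and NFC is an assumed choice of normalization form.

```python
import unicodedata


def text_key(value: str) -> str:
    """Comparison key for free text: normalize to NFC, then case-fold."""
    return unicodedata.normalize("NFC", value).casefold()


def is_ascii_key(value: str) -> bool:
    """Rule for technical keys: only ASCII characters are allowed."""
    return value.isascii()


composed = "Caf\u00e9"     # 'é' as one code point
decomposed = "Cafe\u0301"  # 'e' followed by a combining acute accent

print(composed == decomposed)                      # False
print(text_key(composed) == text_key(decomposed))  # True

# Case folding is more than lower-casing: 'ß' folds to 'ss'.
print(text_key("STRASSE") == text_key("Straße"))   # True

print(is_ascii_key("ART-10042"), is_ascii_key("ART-Größe"))  # True False
```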

GUI/reporting: fonts, printing, PDF and component behaviour

VCL has been Unicode-capable since the Unicode releases, but behaviour depends on components and output paths. Risks arise with:

older report engines or PDF generators that assume ANSI,
barcode/label printing that requires specific code pages,
hard-coded fonts or character sets.

Plan early tests with realistic sample data (names, places, special characters, non-Latin scripts if relevant). The point is less to prove that a component “supports Unicode” and more to demonstrate: “This output is correct in our context.”

Data and persistence: Unicode does not end at the code

Set database charsets and collations cleanly

A Unicode migration is only stable if databases and drivers are configured correctly. Examples:

With PostgreSQL UTF-8 is usually the default; nevertheless client encoding and driver behaviour must be checked.
With SQL Server the distinction between VARCHAR and NVARCHAR is relevant; choosing the wrong column type can lose characters.
With MariaDB/MySQL the charset/collation (e.g. utf8mb4) is crucial so that 4-byte characters are not truncated or rejected; the older utf8 charset (utf8mb3) stores at most three bytes per character.

In Delphi code parameters and field types should be used so that Unicode is not “converted back” on the way. FireDAC usually offers better control than very old access layers.
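
Whether existing data contains characters that need four bytes in UTF-8 can be checked before a charset decision is made. The following Python sketch is an illustration; the function names are hypothetical. Characters above U+FFFF are exactly those that need four UTF-8 bytes (and a surrogate pair in UTF-16).

```python
def four_byte_characters(text: str) -> list:
    """Return the characters that need four bytes in UTF-8."""
    return [ch for ch in text if ord(ch) > 0xFFFF]


def fits_three_byte_charset(text: str) -> bool:
    """True if every character fits into at most three UTF-8 bytes."""
    return not four_byte_characters(text)


print(fits_three_byte_charset("Müller 100 €"))       # True
print(fits_three_byte_charset("Müller \U0001F600"))  # False
print(len("€".encode("utf-8")), len("\U0001F600".encode("utf-8")))  # 3 4
```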

Legacy file formats: migration rules instead of silent conversion

If your application has produced files over years (export formats, archive files, proprietary structures), you must define:

Which existing files remain “as they are” and are read with the correct interpretation?
Which formats will be elevated to UTF-8?
Are there version fields/headers to distinguish new and old files unambiguously?

Silent conversion without marking is risky because errors often surface late. Better: version, detect clearly, migrate deliberately.
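
The “version, detect, migrate” rule can be sketched as follows. This Python example is an illustration under assumptions: the header line MAGIC, the legacy encoding Windows-1252 and both function names are hypothetical and stand in for whatever marker a real format defines.

```python
MAGIC = b"#ENC=UTF-8\n"     # hypothetical header line that marks new-format files
LEGACY_ENCODING = "cp1252"  # assumption: unmarked files were written as Windows-1252


def write_payload(text: str) -> bytes:
    """New files are always written with the marker and as UTF-8."""
    return MAGIC + text.encode("utf-8")


def read_payload(data: bytes) -> str:
    """Detect the format by its marker instead of guessing from content."""
    if data.startswith(MAGIC):
        return data[len(MAGIC):].decode("utf-8")
    return data.decode(LEGACY_ENCODING)


legacy_file = "Größe: 42".encode("cp1252")  # simulated old file without marker
new_file = write_payload("Größe: 42 \U0001F600")

print(read_payload(legacy_file) == "Größe: 42")             # True
print(read_payload(new_file) == "Größe: 42 \U0001F600")     # True
```

Old files stay byte-for-byte untouched, and the reader never has to apply a heuristic to decide how to interpret them.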

Quality assurance: tests that actually find Unicode issues

Unicode errors are often data-dependent. Therefore “happy path” tests are insufficient. A useful test set covers the problematic areas:

Roundtrip tests: import → process → export, then byte-accurate comparison (for defined formats).
DB roundtrip: write/read texts with umlauts, accents and, if applicable, non-Latin characters; verify equality.
Interface tests: REST requests as UTF-8, headers, JSON escaping, logging.
Regression: reproduce legacy data and typical user cases, especially for search, filter and sort.
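
The first two test types can be sketched in a few lines. The following Python example is an illustration only: SQLite (in memory) is a stand-in for the real database and driver, the sample list and function names are hypothetical, and a real test would run against the production database type through the actual Delphi data access layer.

```python
import sqlite3

# Hypothetical sample set: umlauts, Polish, Greek, Japanese, emoji, quoting characters.
SAMPLES = ["Müller", "Łódź", "Ελλάδα", "日本語", "\U0001F600", "O'Brien; \"quoted\""]


def db_roundtrip(values):
    """Write the values to a table and read them back in insertion order."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, txt TEXT)")
        conn.executemany("INSERT INTO t (txt) VALUES (?)", [(v,) for v in values])
        rows = conn.execute("SELECT txt FROM t ORDER BY id").fetchall()
        return [row[0] for row in rows]
    finally:
        conn.close()


def export_roundtrip(values, encoding="utf-8"):
    """Export to bytes, re-import, export again; return both byte strings."""
    first = "\n".join(values).encode(encoding)
    reimported = first.decode(encoding).split("\n")
    second = "\n".join(reimported).encode(encoding)
    return first, second


print(db_roundtrip(SAMPLES) == SAMPLES)  # True

first_export, second_export = export_roundtrip(SAMPLES)
print(first_export == second_export)     # True, byte-accurate comparison
```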

For B2B systems it is also relevant that errors become observable: logging must not destroy encodings. Writing logs as ANSI loses exactly the information needed in an incident.

Planning and effort: what really drives complexity

The effort of a Unicode migration in legacy Delphi projects depends less on “lines of code” and more on coupling and external dependencies:

Many integrations (DLLs, COM, devices, ERP/DMS/CRM) increase verification effort because encodings are relevant at every boundary.
Historical formats (old exports, customer-specific CSVs) require migration rules and compatibility strategies.
Mixed Delphi versions or multiple products from a common code base increase coordination needs.
Old data access layers (e.g. BDE) can indirectly block Unicode and suggest a modernization.

In practice a proven approach is to stabilise Unicode first in the core and the most critical data flows. Then modules can be migrated step by step. That reduces risk and avoids long “big bang” phases without releases.

Placement in modernization paths: REST, services, multiplatform

Unicode is often a cornerstone when legacy software is to be modernised. Typical follow-up questions include:

Adding REST servers or REST APIs (handling JSON/UTF-8 cleanly).
Operating Windows services or Linux services reliably (logging, config files, protocols).
Modernizing the VCL UI step by step, possibly with multiplatform clients later.

The order matters: if you build new interfaces, encoding rules should be established beforehand. A Unicode migration “on the side” during interface development otherwise leads to hard-to-test failure patterns because cause and effect get mixed.

Conclusion: Unicode migration is a risk topic — with the right method it becomes manageable

The Unicode migration of legacy Delphi projects is not a cosmetic upgrade but a correction of fundamental assumptions about text, bytes and interfaces. Those who proceed in a structured way gain more than “umlauts work again”: data flows become unambiguous, integrations more robust, and later modernization (e.g. REST servers, services, database cleanup) becomes easier because encodings no longer happen implicitly “somewhere”.

If you need a concrete migration plan for your Delphi application, a risk analysis of the hotspots or support with implementation, the fastest next step is an initial technical discussion about your constraints and dependencies.
