Net-Base Magazine

06.06.2026

High-Performance REST Server in Delphi: Request Limits, Thread Pool and Clean Overload Behavior (Source Snippets)

A high-performance REST server in Delphi is not made fast by "fast JSON" alone, but by controlled concurrency, hard timeouts and clean overload behavior. This article demonstrates a practical concurrency gate with a semaphore, 429/503 responses...

Why “High Performance” for REST in Delphi often fails because of concurrency

A high-performance REST server in Delphi is rarely limited in practice by pure CPU time per request. The usual limit is uncontrolled concurrency: too many simultaneous requests, too many concurrent database queries, or blocking I/O (file, network, database). The result does not feel like “a little slower” but like a chain reaction: more threads, growing queues, connection-pool exhaustion, rising latencies, client-side timeouts and, ultimately, a server that still “runs” but no longer responds reliably.

The remedy is not a single trick but deliberate overload behavior: when the server reaches its limits, it must reject requests early and deterministically (typically with HTTP 429 or 503) instead of letting them pile up in an unbounded queue. That is precisely what this source snippet is for: a lightweight concurrency gate (semaphore) plus timeouts that can be integrated into existing REST endpoints, regardless of whether you use Indy, WebBroker, Horse or your own HTTP layer.

Architecture idea: Concurrency-Gate before the “expensive part”

The basic idea is simple: before the expensive part (database access, complex reports, large JSON responses), a token is acquired from a semaphore. If no token is available, a controlled response is returned immediately. Important: the token must be released reliably (try/finally), and the gate must sit in the code path that is actually expensive, not merely at the very start of the request handler while parsing, routing and authentication are still to come.

This does not “optimize load away” but channels it: the server handles fewer requests concurrently, but with more stable latencies. In individual enterprise applications this is usually more valuable than sporadic best times in synthetic benchmarks.

Source snippet: Request limiter with timeout, 429/503 and telemetry hooks

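The following Delphi code implements a concurrency gate as the class TRESTRequestGate. It is based on TSemaphore (from System.SyncObjs; a semaphore is a counter that limits concurrent access). A call to TryAcquire either returns a lease object or reports that the request should be rejected, so the caller can send an immediate overload response. The lease returns its token in Release, which the destructor also calls; because Delphi class instances are not freed automatically, the caller must free the lease in a finally block. Additionally, there is a hook for logging/monitoring so you can see in operation why requests were rejected.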

Delphi
unit RESTRequestGate;

interface

uses
  System.SysUtils,
  System.Classes,
  System.SyncObjs,
  System.Diagnostics;

type
  // Minimal context for logging/tracing; can, for example, be extended with user/route.
  TRESTGateContext = record
    RequestId: string;
    Route: string;
    RemoteIp: string;
  end;

  TRESTOverloadDecision = (odAccepted, odRejectedBusy, odRejectedTimeout);

  // Hook for operational telemetry (e.g., to file, syslog, Prometheus exporter, etc.)
  TRESTGateEvent = reference to procedure(const Ctx: TRESTGateContext;
                                         Decision: TRESTOverloadDecision;
                                         WaitedMs: Integer;
                                         InFlight: Integer);

  // Lease object: Release (also called by the destructor) returns the token.
  // The gate must outlive all of its leases, because the lease points into it.
  TRESTGateLease = class
  private
    FSemaphore: TSemaphore;
    FInFlightCounter: PInteger;
    FReleased: Boolean;
  public
    constructor Create(ASem: TSemaphore; ACounter: PInteger);
    destructor Destroy; override;
    procedure Release;
  end;

  TRESTRequestGate = class
  private
    FSem: TSemaphore;
    FMaxInFlight: Integer;
    FInFlight: Integer;
    FOnEvent: TRESTGateEvent;
  public
    constructor Create(AMaxInFlight: Integer);
    destructor Destroy; override;

    // TimeoutMs = 0: no waiting time, immediate 429/503
    function TryAcquire(const Ctx: TRESTGateContext; TimeoutMs: Cardinal;
                        out Lease: TRESTGateLease;
                        out WaitedMs: Integer;
                        out Decision: TRESTOverloadDecision): Boolean;

    property OnEvent: TRESTGateEvent read FOnEvent write FOnEvent;
    property MaxInFlight: Integer read FMaxInFlight;
    function InFlight: Integer;
  end;

implementation

uses
  System.Math;

{ TRESTGateLease }

constructor TRESTGateLease.Create(ASem: TSemaphore; ACounter: PInteger);
begin
  inherited Create;
  FSemaphore := ASem;
  FInFlightCounter := ACounter;
  FReleased := False;
end;

destructor TRESTGateLease.Destroy;
begin
  Release;
  inherited;
end;

procedure TRESTGateLease.Release;
begin
  if FReleased then
    Exit;
  FReleased := True;

  // First decrement the counter, then release the semaphore.
  TInterlocked.Decrement(FInFlightCounter^);
  FSemaphore.Release;
end;

{ TRESTRequestGate }

constructor TRESTRequestGate.Create(AMaxInFlight: Integer);
begin
  inherited Create;
  if AMaxInFlight <= 0 then
    raise EArgumentException.Create('AMaxInFlight must be > 0');

  FMaxInFlight := AMaxInFlight;
  FInFlight := 0;

  // InitialCount = MaxCount = AMaxInFlight
  FSem := TSemaphore.Create(nil, AMaxInFlight, AMaxInFlight, '');
end;

destructor TRESTRequestGate.Destroy;
begin
  FSem.Free;
  inherited;
end;

function TRESTRequestGate.InFlight: Integer;
begin
  Result := TInterlocked.CompareExchange(FInFlight, 0, 0);
end;

function TRESTRequestGate.TryAcquire(const Ctx: TRESTGateContext; TimeoutMs: Cardinal;
  out Lease: TRESTGateLease; out WaitedMs: Integer; out Decision: TRESTOverloadDecision): Boolean;
var
  Sw: TStopwatch;
  WaitRes: TWaitResult;
  CurrentInFlight: Integer;
begin
  Lease := nil;
  WaitedMs := 0;
  Decision := odRejectedBusy;

  Sw := TStopwatch.StartNew;
  // WaitFor(0) only polls the semaphore; any other value waits at most TimeoutMs.
  WaitRes := FSem.WaitFor(TimeoutMs);

  WaitedMs := Integer(Min(Sw.ElapsedMilliseconds, High(Integer)));

  case WaitRes of
    wrSignaled:
      begin
        CurrentInFlight := TInterlocked.Increment(FInFlight);
        Lease := TRESTGateLease.Create(FSem, @FInFlight);
        Decision := odAccepted;
        Result := True;

        if Assigned(FOnEvent) then
          FOnEvent(Ctx, Decision, WaitedMs, CurrentInFlight);
      end;

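    wrTimeout:
      begin
        // TimeoutMs = 0: no token was free right now (busy).
        // TimeoutMs > 0: the deliberate, bounded wait expired (timeout).
        if TimeoutMs = 0 then
          Decision := odRejectedBusy
        else
          Decision := odRejectedTimeout;
        Result := False;
        if Assigned(FOnEvent) then
          FOnEvent(Ctx, Decision, WaitedMs, InFlight);
      end;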
  else
    begin
      // wrAbandoned/error cases: conservatively reject
      Decision := odRejectedBusy;
      Result := False;
      if Assigned(FOnEvent) then
        FOnEvent(Ctx, Decision, WaitedMs, InFlight);
    end;
  end;
end;

end.

Purpose: stability under load instead of “everything at once”

With MaxInFlight you define how many requests may be in the “expensive part” concurrently. This is deliberately not the number of CPU cores but an operational parameter. For database-heavy endpoints it is often sensible to set MaxInFlight in relation to the DB connection pool (for example Pool = 20, MaxInFlight = 12 to 16), so that the gated requests cannot exhaust the pool and additional threads do not pile up waiting for a connection.

Constraints and pitfalls

  • Try/finally is mandatory: The lease must be freed reliably, even when the endpoint raises an exception. Otherwise the gate “leaks” tokens and the server may remain permanently “busy”.
  • Choose the timeout sensibly: TimeoutMs=0 is a hard limit (reject immediately). A short timeout (typically 50 to 150 ms) smooths peaks without building real queues.
  • Don’t gate too early: Authentication (for example Bearer/JWT) or routing can be cheap; the semaphore should engage only directly before the truly expensive section. Conversely: if auth is expensive (e.g. against an external identity system), it must be limited as well.
  • 429 vs 503: HTTP 429 (“Too Many Requests”, RFC 6585) is meant for a client that has sent too many requests in a given time, so it fits per-client or per-key limits where the client is expected to slow down and retry. 503 (“Service Unavailable”) fits when the service as a whole is temporarily unable to handle requests, for example because a shared gate is full. In both cases a Retry-After header is recommended.

Integration into the REST handler: Indy, WebBroker and Horse in practice

The snippet is intentionally framework-neutral. You only need a place where requests “pass through”. Typical choices are a global singleton or a gate per route group (for example “/reports” with a smaller limit, “/health” without a gate). Example integration pattern:

  • Populate context (RequestId, Route, RemoteIp)
  • TryAcquire with a short timeout
  • On denial immediately write the response (429/503) and return
  • Keep the lease until the expensive section has finished, then free it in a finally block

In Horse (middleware) the gate sits close to a route group. In WebBroker you can implement it in the respective action handler. In Indy it depends on whether you have one thread per request; the gate still works as long as the expensive sections are properly bounded.
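
The integration steps above can be sketched in a framework-neutral way. This is a minimal sketch, not a definitive implementation: the unit name GatedHandler, the callback type TWriteReply and the procedure HandleGated are hypothetical names introduced for illustration, and the 100 ms wait, the 503 status and the 2-second Retry-After are assumed values.

```delphi
unit GatedHandler;

interface

uses
  System.SysUtils,
  RESTRequestGate;

type
  // Hypothetical callback: writes status, Retry-After and body through
  // whatever HTTP layer is in use (Indy, WebBroker, Horse, ...).
  TWriteReply = reference to procedure(StatusCode, RetryAfterSec: Integer;
                                       const Body: string);

procedure HandleGated(Gate: TRESTRequestGate; const Ctx: TRESTGateContext;
  const ExpensiveWork: TProc; const WriteReply: TWriteReply);

implementation

procedure HandleGated(Gate: TRESTRequestGate; const Ctx: TRESTGateContext;
  const ExpensiveWork: TProc; const WriteReply: TWriteReply);
var
  Lease: TRESTGateLease;
  WaitedMs: Integer;
  Decision: TRESTOverloadDecision;
begin
  // Wait at most 100 ms for a token (assumed value).
  if not Gate.TryAcquire(Ctx, 100, Lease, WaitedMs, Decision) then
  begin
    // Rejected: answer immediately and do not touch the expensive part.
    WriteReply(503, 2, '{"error":"server_busy"}');
    Exit;
  end;

  try
    ExpensiveWork();
  finally
    // Freeing the lease calls Release and returns the token,
    // also when ExpensiveWork raises an exception.
    Lease.Free;
  end;
end;

end.
```

The try block starts directly after a successful TryAcquire, so there is no code path between acquiring and protecting the token in which an exception could leak it.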

High-performance REST server in Delphi: overload responses that don’t “poison” clients

Overload responses are more than status codes. If clients resend immediately and aggressively on 429/503, you get a retry storm. In heterogeneous system landscapes (mobile apps, C# services, legacy clients), consistent behavior helps:

  • Retry-After: for example 1 to 3 seconds, depending on the endpoint. This provides a clear pacing signal.
  • Short body: A small JSON like {"error":"server_busy","requestId":"..."} is sufficient. Large error objects consume CPU and bandwidth.
  • Health endpoint ungated: Monitoring should still get an answer under load (optionally with a “degraded” flag).
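
The first two points can be sketched as a small helper that prepares status code, Retry-After value and a short JSON body. This is a sketch under assumptions: the unit OverloadReply, the record TOverloadReply and the function BuildOverloadReply are hypothetical names, and writing the values to the actual response object is left to the HTTP framework in use.

```delphi
unit OverloadReply;

interface

type
  // Hypothetical, framework-neutral description of an overload response.
  TOverloadReply = record
    StatusCode: Integer;     // 429 or 503
    RetryAfterSec: Integer;  // value for the Retry-After header, in seconds
    Body: string;            // short JSON body
  end;

function BuildOverloadReply(StatusCode, RetryAfterSec: Integer;
  const RequestId: string): TOverloadReply;

implementation

uses
  System.JSON;

function BuildOverloadReply(StatusCode, RetryAfterSec: Integer;
  const RequestId: string): TOverloadReply;
var
  Json: TJSONObject;
begin
  Json := TJSONObject.Create;
  try
    // Building the body via TJSONObject escapes the request id,
    // so a client-supplied id cannot break the JSON.
    Json.AddPair('error', 'server_busy');
    Json.AddPair('requestId', RequestId);
    Result.Body := Json.ToJSON;
  finally
    Json.Free;
  end;
  Result.StatusCode := StatusCode;
  Result.RetryAfterSec := RetryAfterSec;
end;

end.
```

A handler would then set the status from StatusCode, add a Retry-After header with IntToStr(RetryAfterSec) and send Body with the content type application/json.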

If you run a reverse proxy such as nginx in front, tune timeouts and buffering there. A proxy can relieve the application (TLS termination, Keep-Alive), but it can also shift load (for example by buffering large request bodies). In operation it matters that limits are consistent: Proxy-Timeout > App-Timeout, otherwise clients see a “Gateway Timeout” even though the app would have rejected cleanly.

Threading, DB pools and Keep-Alive: Where it breaks in practice

The Gate solves the “too many at once” problem, but it does not automatically prevent a single request from binding an excessive amount of resources. Three typical tipping points from Delphi projects occur exactly at the interfaces between threading, database and HTTP connections:

  • A request blocks multiple scarce resources: First a DB connection, then an external HTTP call, then a file access. If all of that happens in the same request thread, the blocking times add up while every resource stays occupied. The gate then limits concurrency, but throughput drops drastically. It is worth decoupling the dependencies here (e.g. make external calls asynchronous, precompute via a job queue).
  • Connection pooling and transactions: A native database connection (for example after replacing the BDE) can pool connections, but a “long” transaction (e.g. because JSON construction or business checks occur between StartTransaction and Commit) holds the connection unnecessarily. A clean practice is to keep the transaction as tight as possible around the actual statements and to serialize or validate outside the transaction where domain logic allows.
  • HTTP Keep-Alive as a hidden memory hog: Keep-Alive reduces handshakes, but with many idle clients it can lead to too many open sockets. Especially in Windows and Linux services you then don’t see “CPU high” but rather “handles/FDs full” or RAM consumed by buffers. Clear idle timeouts on the server and the reverse proxy and a per-client-IP limit (where the environment allows) help here.

The consequence: MaxInFlight is not a value you set once and forget. It depends on your slowest, scarcest resource (DB, external systems, storage) and on how long a single request holds those resources.
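
The point about tight transactions can be illustrated with a short sketch. It assumes FireDAC as the data access layer, which the article does not prescribe; the unit OrderStore, the procedure StoreOrder and the table and column names are hypothetical and only serve as an example.

```delphi
unit OrderStore;

interface

uses
  FireDAC.Comp.Client;

// Hypothetical example: table and column names are invented for illustration.
procedure StoreOrder(Conn: TFDConnection; CustomerId: Integer;
  const Payload: string);

implementation

uses
  System.SysUtils;

procedure StoreOrder(Conn: TFDConnection; CustomerId: Integer;
  const Payload: string);
begin
  // Validation runs before the transaction starts, so the transaction
  // is not held open while business checks are evaluated.
  if Payload.Trim.IsEmpty then
    raise EArgumentException.Create('Payload must not be empty');

  Conn.StartTransaction;
  try
    // Only the actual statement lives inside the transaction.
    Conn.ExecSQL(
      'INSERT INTO orders (customer_id, payload) VALUES (:cid, :payload)',
      [CustomerId, Payload]);
    Conn.Commit;
  except
    Conn.Rollback;
    raise;
  end;
  // Any JSON response would be built by the caller after Commit.
end;

end.
```

The same shape applies to other data access libraries: validate first, keep only the statements between start and commit, and serialize the response afterwards.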

Performance levers alongside the Gate: don’t mix JSON, DB and I/O

The gate stabilizes, but it does not replace economical endpoints. Three bottlenecks appear repeatedly in Delphi REST servers:

  • JSON building with unnecessary intermediate strings: Load often arises from many temporary Unicode strings. Where possible, build in a streaming-oriented way (writer/stream) instead of large intermediate objects, especially for list endpoints.
  • Database access “per item”: N+1 queries and per-row lookups are the classic case. Better: targeted joins, batch queries, server-side aggregation. For very large result sets, add pagination with stable sorting (so pages do not “jump”).
  • Blocking I/O in the request thread: File access or external HTTP calls should either be strictly limited or moved into an asynchronous pipeline. Otherwise expensive threads are tied up doing nothing but waiting.

For established digital enterprise solutions this is often the crux: an endpoint was added “quickly” and works until real load and data volumes arrive. Then it becomes apparent whether architectural boundaries were drawn cleanly (data access layer, caching, bulk strategies, clear timeouts).

Debugging and operations: what you should measure

The hook OnEvent is intentionally simple. In practice you should record at least the following values:

  • InFlight (current concurrency at the Gate)
  • WaitedMs (how much “queueing” you allow)
  • Decision (accepted/busy/timeout)
  • Route/RemoteIp (coarse root-cause analysis, without disregarding data protection)

This tells you whether limits are too strict (too many 429s) or too lax (high WaitedMs, rising latencies), and whether individual routes dominate. For Windows and Linux services this is decisive in everyday operation: without telemetry, a performance problem quickly becomes a guessing game between network, database, proxy and application.
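
A hook for these values could look like the following minimal sketch. The unit GateTelemetry and the class TGateStats are hypothetical names; the in-memory counters merely stand in for whatever logging or metrics backend is actually used.

```delphi
unit GateTelemetry;

interface

uses
  System.SyncObjs,
  RESTRequestGate;

type
  // Hypothetical in-memory counters fed by the gate's OnEvent hook.
  // The stats object must live at least as long as the gate uses the hook.
  TGateStats = class
  private
    FAccepted: Integer;
    FRejected: Integer;
    FMaxWaitedMs: Integer;
  public
    procedure Attach(Gate: TRESTRequestGate);
    function Accepted: Integer;
    function Rejected: Integer;
    function MaxWaitedMs: Integer;
  end;

implementation

procedure TGateStats.Attach(Gate: TRESTRequestGate);
begin
  Gate.OnEvent :=
    procedure(const Ctx: TRESTGateContext; Decision: TRESTOverloadDecision;
      WaitedMs: Integer; InFlight: Integer)
    var
      Seen: Integer;
    begin
      // The hook runs on request threads, so all updates are atomic.
      if Decision = odAccepted then
        TInterlocked.Increment(FAccepted)
      else
        TInterlocked.Increment(FRejected);

      // Lock-free "store maximum": retry until our value is stored
      // or another thread has stored a value that is at least as large.
      repeat
        Seen := FMaxWaitedMs;
        if WaitedMs <= Seen then
          Break;
      until TInterlocked.CompareExchange(FMaxWaitedMs, WaitedMs, Seen) = Seen;
    end;
end;

function TGateStats.Accepted: Integer;
begin
  Result := TInterlocked.CompareExchange(FAccepted, 0, 0);
end;

function TGateStats.Rejected: Integer;
begin
  Result := TInterlocked.CompareExchange(FRejected, 0, 0);
end;

function TGateStats.MaxWaitedMs: Integer;
begin
  Result := TInterlocked.CompareExchange(FMaxWaitedMs, 0, 0);
end;

end.
```

Route and RemoteIp are available in Ctx and could be added as labels or log fields in the same hook, subject to the data protection rules mentioned above.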

Unusual but extremely helpful: “WaitedMs” as an early warning indicator

Many teams look only at response time and CPU. WaitedMs is often the better indicator because it shows that requests are already waiting before the actual work. If WaitedMs rises while CPU remains moderate, the scarce resource is often not the CPU but a pool (DB connections), a lock in the business logic, or an external downstream service. That saves time in root-cause analysis, because you can target your investigation toward “pool/lock/I/O” instead of “compiler optimization”.

Variants: per-route gates, priorities and “fast lane”

A single gate for everything is simple, but not always ideal. Sensible variants:

  • Gate per route group: “/reports” strict, “/api/orders” moderate, “/health” open. This prevents expensive report requests from displacing core processes.
  • Fast lane for admin/monitoring: Separate gate with low concurrency so operational actions remain possible under load.
  • Budget-based limits: When response sizes vary widely, an additional byte budget can help (e.g. a maximum of X MB generated concurrently). This is more complex, but realistic for large downloads.
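
The first variant, one gate per route group, can be sketched as a small registry. The unit RouteGates and the class TRouteGates are hypothetical names, and prefix matching is only one possible way to assign routes to groups.

```delphi
unit RouteGates;

interface

uses
  System.SysUtils,
  System.Generics.Collections,
  RESTRequestGate;

type
  // Hypothetical registry: maps a route prefix to its own gate.
  // Fill it once at startup; afterwards it is only read by request threads.
  TRouteGates = class
  private
    FGates: TObjectDictionary<string, TRESTRequestGate>;
  public
    constructor Create;
    destructor Destroy; override;
    procedure Add(const Prefix: string; MaxInFlight: Integer);
    // Returns nil for routes without a gate (for example /health).
    function Find(const Route: string): TRESTRequestGate;
  end;

implementation

constructor TRouteGates.Create;
begin
  inherited Create;
  // doOwnsValues: the dictionary frees the gates when it is destroyed.
  FGates := TObjectDictionary<string, TRESTRequestGate>.Create([doOwnsValues]);
end;

destructor TRouteGates.Destroy;
begin
  FGates.Free;
  inherited;
end;

procedure TRouteGates.Add(const Prefix: string; MaxInFlight: Integer);
begin
  FGates.Add(Prefix, TRESTRequestGate.Create(MaxInFlight));
end;

function TRouteGates.Find(const Route: string): TRESTRequestGate;
var
  Pair: TPair<string, TRESTRequestGate>;
begin
  Result := nil;
  // Case-sensitive prefix match. Iteration order is unspecified,
  // so registered prefixes should not overlap.
  for Pair in FGates do
    if Route.StartsWith(Pair.Key) then
      Exit(Pair.Value);
end;

end.
```

At startup one might register, for example, Add('/reports', 2) and Add('/api/orders', 16); both limits are assumed values. A handler treats a nil result from Find as "no gate" and runs the request directly.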

Important: Prioritization quickly becomes political (“my endpoint is more important”). It remains technically stable when priorities are tied to processes (e.g. order entry before reporting), not to roles or departments.

Conclusion: is the gate worth it, and where does the approach break down?

A concurrency gate is a pragmatic building block for a high-performance REST server in Delphi, because it makes overload controllable and keeps your systems stable under peak load. It’s especially worthwhile if you have database-bound endpoints, if a reverse proxy stands in front, or if multiple clients (legacy systems, portals, services) generate load in waves.

The limits are clear: If the actual work per request is too expensive (inefficient queries, large JSON objects, blocking external systems), the gate only masks symptoms. Then data access, caching strategies, timeouts and, where appropriate, asynchronous processing (queue/job system) must be addressed. As a safety belt in operation, however, the gate is often the difference between “temporarily sluggish” and “completely unusable”.

If you want to introduce controlled overload behavior into an existing Delphi REST API or REST server, or align its limits cleanly with database and proxy timeouts: discuss the project or modernization effort with Net-Base.
