Built-in string scalar used in output instead of a custom scalar

Issue ID: graphql-data-output-custom-scalar-string-needed

Average severity: Critical

Description

The schema exposes data through the built-in String scalar in an output position instead of a domain-specific, constrained custom scalar.

For more details, see the GraphQL specification.

Possible exploit scenario

The GraphQL String scalar can represent any Unicode text of arbitrary length. It does not define:

  • Maximum length
  • Minimum length
  • Allowed character set
  • Structural format
  • Encoding expectations
  • Domain semantics

An unconstrained String in an output position increases the risk of:

  • Accidental exposure of sensitive information
  • Leakage of internal system details
  • Contract instability (format changes across versions)
  • Downstream injection risks in consuming applications
  • Oversized responses leading to performance or availability issues
  • Inconsistent encoding behavior across services

For example, returning internal error messages as a built-in String may expose stack traces, returning file paths may leak infrastructure layout, and returning unrestricted free text may propagate unvalidated content into downstream systems (such as logs, HTML rendering, or report generators). In federated environments, uncontrolled string outputs may propagate across subgraphs and amplify exposure risk.

Because strings often carry business-critical or security-sensitive information, unconstrained output modeling significantly increases data exposure risk. Domain-specific custom scalars for strings let you explicitly define maximum length, allowed formats, sensitivity classification, controlled value sets, and contract guarantees.
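As an illustration, the constraints a custom scalar adds can be sketched as plain serialization functions. This is a minimal sketch, not a reference implementation: the scalar names DisplayName and OrderCode, the 64-character limit, and the order code pattern are hypothetical values chosen for this example. In a real service, functions like these would typically be registered as the serialization step of a custom scalar in your GraphQL server library.

```python
import re

# Hypothetical domain rules, chosen only for illustration.
MAX_DISPLAY_NAME_LENGTH = 64
ORDER_CODE_PATTERN = re.compile(r"[A-Z]{3}-[0-9]{6}")


def serialize_display_name(value: object) -> str:
    """Serialize a hypothetical DisplayName scalar: bounded, printable text."""
    if not isinstance(value, str):
        raise TypeError("DisplayName must be a string")
    if not 1 <= len(value) <= MAX_DISPLAY_NAME_LENGTH:
        raise ValueError("DisplayName length is out of bounds")
    if not value.isprintable():
        raise ValueError("DisplayName contains non-printable characters")
    return value


def serialize_order_code(value: object) -> str:
    """Serialize a hypothetical OrderCode scalar with a fixed structural format."""
    if not isinstance(value, str) or ORDER_CODE_PATTERN.fullmatch(value) is None:
        # The message deliberately does not echo the rejected value.
        raise ValueError("OrderCode does not match the expected format")
    return value
```

Rejecting a value at serialization time turns an out-of-contract string into an explicit error instead of letting it reach API consumers. The error messages here avoid repeating the rejected value so that the validation step does not itself leak data.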

Remediation

Replace built-in String with constrained custom scalars that explicitly define validation rules. We recommend that you:

  • Define explicit scalars for output strings
  • Enforce maximum length for all output strings
  • Avoid exposing raw internal values
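One way to avoid exposing raw internal values is to map internal failures to a small, controlled set of public codes before they reach an output field. The following is a minimal sketch under stated assumptions: the PublicErrorCode values and the mapping from exception types to codes are hypothetical and would need to match your own error model.

```python
from enum import Enum


class PublicErrorCode(Enum):
    """Hypothetical controlled value set exposed to API consumers."""
    NOT_FOUND = "NOT_FOUND"
    INVALID_INPUT = "INVALID_INPUT"
    INTERNAL_ERROR = "INTERNAL_ERROR"


def to_public_error_code(error: Exception) -> str:
    """Map an internal exception to a public code without copying its message."""
    if isinstance(error, (FileNotFoundError, KeyError)):
        return PublicErrorCode.NOT_FOUND.value
    if isinstance(error, ValueError):
        return PublicErrorCode.INVALID_INPUT.value
    # Anything unrecognized collapses to a generic code, so stack traces,
    # file paths, and other internal details never reach the response.
    return PublicErrorCode.INTERNAL_ERROR.value
```

Because the function returns only members of a fixed set, the output has a known maximum length and a stable format, and the original exception text can still be written to internal logs separately.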

Explicit modeling of output strings strengthens contract clarity, reduces data leakage risk, and improves security governance across API ecosystems.