Data dictionaries

Regular expressions are a key part of API definitions: among other things, they define the format of accepted inputs and outputs. Because hardcoding string values rarely makes sense in an API definition, regular expressions are needed to define what those strings should look like.

However, regular expressions can get complex, and it may be difficult to ensure that they are used consistently. Even within the same development team, different developers could write the same pattern in different ways: some good, some bad, and some possibly very bad. The more ways there are to write a pattern, the larger the potential attack surface, because attackers have more options to try to exploit.
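To make the point concrete, here are two hypothetical ways a developer might write a date pattern. Neither pattern comes from the 42Crunch standard data dictionary. The sketch uses Python's re module as a stand-in for the ECMAScript/PCRE dialects (they agree for these simple constructs) and uses re.search to mimic how an OpenAPI pattern is applied: it is not implicitly anchored, so a match anywhere in the value is enough.

```python
import re

# Two hypothetical ways a developer might write a "date" pattern.
# Neither is taken from the 42Crunch standard data dictionary.
LOOSE_DATE = re.compile(r"\d+-\d+-\d+")  # unanchored, unbounded digit runs
STRICT_DATE = re.compile(r"^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$")


def accepts(pattern: re.Pattern, value: str) -> bool:
    """Mimic OpenAPI/JSON Schema 'pattern' semantics: a match anywhere is enough."""
    return pattern.search(value) is not None


samples = [
    "2024-05-17",                    # a well-formed date
    "99999999-1-1",                  # absurd year, single-digit month and day
    "2024-05-17; DROP TABLE users",  # valid date followed by a trailing payload
]

for value in samples:
    print(f"{value!r}: loose={accepts(LOOSE_DATE, value)} strict={accepts(STRICT_DATE, value)}")
```

The loose pattern accepts all three samples, including the one carrying trailing text, while the anchored, bounded pattern accepts only the well-formed date. Every such variant that slips into an API widens what an attacker can send.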

Data dictionaries let organization administrators define a dictionary of formats that the APIs in their organization should use. This way, you can harmonize the formats your APIs accept and consequently increase their security, because stricter, higher-quality definitions of input and output data narrow down the attack surface. For developers, this means they do not have to reinvent the wheel: they can check the data dictionaries for formats already in use and reuse them.

The screenshot shows the details for the format entry 'date' from the standard data dictionary maintained by 42Crunch.

Standard data dictionary

All organizations get a data dictionary with formats that follow standard patterns, such as date, email, hostname, or IP address. You can add more formats and edit the existing ones, for example, if your company conventions call for a different date format than the default pattern. You cannot delete the standard data dictionary, but you can reset it to its original formats.
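The add, edit, and reset behavior of the standard data dictionary can be modeled as a small data structure. This is a hypothetical Python sketch, not 42Crunch's implementation or API: the class and method names and the two default entries (with deliberately simplified patterns) are invented for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Format:
    name: str
    pattern: str
    example: str


# Hypothetical, simplified default entries for illustration only; the real
# standard dictionary's entries and patterns may differ.
DEFAULT_FORMATS: Dict[str, Format] = {
    "date": Format("date", r"^\d{4}-\d{2}-\d{2}$", "2024-05-17"),
    "hostname": Format("hostname", r"^[A-Za-z0-9.-]{1,253}$", "api.example.com"),
}


@dataclass
class StandardDictionary:
    # Each dictionary starts from its own copy of the original formats.
    formats: Dict[str, Format] = field(default_factory=lambda: copy.deepcopy(DEFAULT_FORMATS))

    def upsert(self, fmt: Format) -> None:
        """Add a new format, or overwrite an existing one with the same name."""
        self.formats[fmt.name] = fmt

    def reset(self) -> None:
        """Discard all additions and edits, restoring the original formats."""
        self.formats = copy.deepcopy(DEFAULT_FORMATS)


std = StandardDictionary()
std.upsert(Format("date", r"^\d{2}\.\d{2}\.\d{4}$", "17.05.2024"))  # company uses DD.MM.YYYY
std.upsert(Format("order-id", r"^ORD-\d{8}$", "ORD-00012345"))     # new, company-specific format
print(sorted(std.formats))  # ['date', 'hostname', 'order-id']
std.reset()
print(sorted(std.formats))  # ['date', 'hostname']
```

Copying the defaults (rather than sharing them) is what makes the reset reliable: edits only ever touch the working copy, so the original formats stay available to restore.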

Data dictionaries are not yet available in the free community organization.

In addition to the standard data dictionary, you can add more data dictionaries, for example, to dedicate them to different kinds of formats used in your APIs. This makes it easy for API developers to find the patterns and use them in their API definitions, ensuring that the same patterns are used consistently across all APIs in your organization.

Types of formats

You can define two types of formats in your data dictionaries: regular expression patterns for strings, and lists of enum values. The available properties depend on the format type:

  • Pattern (String formats): A regular expression that defines the accepted string pattern.
    The OpenAPI Specification specifies that patterns follow the ECMAScript regular expression dialect. 42Crunch API Security Platform uses PCRE1 compiled with the PCRE_JAVASCRIPT_COMPAT option, which applies ECMAScript syntax to constructs where ECMAScript and PCRE differ, such as the handling of \U and \u (with this option, \u followed by four hexadecimal digits matches the corresponding Unicode character).
  • Example (both format types): An example of what an acceptable value could look like.
  • Min length and max length (String formats): The length limits for accepted string values. These must match the length limits defined in the regular expression pattern.
  • Enum (Enum formats): The list of allowed enum values (for example, enum1, enum2, enum3).
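The properties of both format types can be tied together in a short validation sketch. This is a hypothetical Python model, not how 42Crunch evaluates formats: Python's re module stands in for PCRE, and the class names, the check_entry helper, and the sample entries are invented for illustration. Checking an entry's own example is a cheap way to catch a min or max length that disagrees with the pattern, although it is not a full proof of consistency.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Sequence, Union


@dataclass
class StringFormat:
    """A string format entry: pattern, example, and optional length limits."""
    pattern: str
    example: str
    min_length: Optional[int] = None
    max_length: Optional[int] = None

    def accepts(self, value: str) -> bool:
        # OpenAPI/JSON Schema patterns are not implicitly anchored, so use
        # search() rather than fullmatch(); entries anchor with ^...$ themselves.
        if re.search(self.pattern, value) is None:
            return False
        if self.min_length is not None and len(value) < self.min_length:
            return False
        if self.max_length is not None and len(value) > self.max_length:
            return False
        return True


@dataclass
class EnumFormat:
    """An enum format entry: the allowed values plus an example."""
    values: Sequence[str]
    example: str

    def accepts(self, value: str) -> bool:
        return value in self.values


def check_entry(fmt: Union[StringFormat, EnumFormat]) -> List[str]:
    """Return consistency problems found in a format entry (empty list if none)."""
    problems = []
    # A length limit that contradicts the pattern typically makes the entry
    # reject its own example, which this check detects.
    if not fmt.accepts(fmt.example):
        problems.append(f"example {fmt.example!r} is not accepted by the entry")
    return problems


# Hypothetical entries: the pattern fixes the length at exactly 8 characters.
sku = StringFormat(r"^[A-Z]{3}-\d{4}$", "ABC-1234", min_length=8, max_length=8)
bad_sku = StringFormat(r"^[A-Z]{3}-\d{4}$", "ABC-1234", min_length=8, max_length=6)
status = EnumFormat(["enum1", "enum2", "enum3"], "enum2")

print(check_entry(sku))        # []
print(check_entry(bad_sku))    # ["example 'ABC-1234' is not accepted by the entry"]
print(status.accepts("enum4")) # False
```

In bad_sku, the max length of 6 cannot coexist with a pattern that only matches 8-character values, so no value could ever pass; the mismatch surfaces as the example being rejected.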

Both format types also let you define additional details that characterize the data in question:

  • Sensitivity level: How sensitive the data is, ranging from not sensitive at all to critical (defaults to medium).
  • GDPR or PII data: Whether the data can include information that falls under the General Data Protection Regulation (GDPR) or is personally identifiable information (PII).
  • Identifier: Whether the data is an object identifier (for example, a UUID or GUID).
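These additional details amount to a small piece of metadata attached to each format entry. The following is a hypothetical Python sketch: only "not sensitive", "medium" (the default), and "critical" come from the description above, so the LOW and HIGH levels, the default of False for the two flags, and the formats_with_personal_data helper are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, List


class Sensitivity(IntEnum):
    # NOT_SENSITIVE, MEDIUM, and CRITICAL reflect the documented scale;
    # LOW and HIGH are hypothetical intermediate levels.
    NOT_SENSITIVE = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


@dataclass
class DataDetails:
    sensitivity: Sensitivity = Sensitivity.MEDIUM  # documented default
    gdpr_or_pii: bool = False  # assumed default
    identifier: bool = False   # assumed default


def formats_with_personal_data(details: Dict[str, DataDetails]) -> List[str]:
    """Hypothetical helper: list the format names marked as GDPR or PII data."""
    return sorted(name for name, d in details.items() if d.gdpr_or_pii)


details = {
    "email": DataDetails(sensitivity=Sensitivity.HIGH, gdpr_or_pii=True),
    "uuid": DataDetails(identifier=True),
    "date": DataDetails(),
}
print(formats_with_personal_data(details))  # ['email']
print(details["date"].sensitivity.name)     # MEDIUM
```

Using an ordered enumeration for the sensitivity level means entries can be compared and filtered by threshold, for example to find every format at or above a given level.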