Data dictionaries
Regular expressions are a key part of API definitions: among other things, they define the format of accepted inputs and outputs. Because hardcoding string values rarely makes sense in API definitions, regular expressions are needed to define what those strings should look like.
However, regular expressions can get complex, and it can be difficult to ensure that they are used consistently throughout your APIs. Even within the same development team, different developers could write the pattern for the same format in different ways: some good, some bad, some possibly even very bad. The more ways there are to write a pattern, the larger the potential attack surface, as attackers have more variants to try to exploit.
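As a small illustration of how two patterns meant for the same value can differ, the Python sketch below compares a loose date pattern with a stricter, anchored one. Both patterns and the sample inputs are hypothetical and made up for this example, and Python's `re` module only stands in for whichever regular expression engine actually validates the API traffic.

```python
import re

# Two hypothetical patterns that different developers might write for the same date format.
LOOSE_DATE = r"\d{4}-\d{2}-\d{2}"      # no anchors: matches a date anywhere in the input
STRICT_DATE = r"^\d{4}-\d{2}-\d{2}$"   # anchored: the whole input must be a date


def accepts(pattern: str, value: str) -> bool:
    """Return True if the pattern finds a match anywhere in the value."""
    return re.search(pattern, value) is not None


# One well-formed value and one value with unexpected trailing content.
samples = ["2024-01-31", "2024-01-31'; DROP TABLE users;--"]
for value in samples:
    print(repr(value), "loose:", accepts(LOOSE_DATE, value), "strict:", accepts(STRICT_DATE, value))
```

The loose pattern accepts both samples, while the anchored pattern accepts only the first. A single shared definition avoids this kind of drift.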
Data dictionaries allow organization administrators to define a dictionary of formats that should be used in the APIs in their organization. This way, you can harmonize the formats your APIs accept and consequently increase their security, because stricter definitions for input and output data narrow down the attack surface. For developers, this means that they do not have to reinvent the wheel: they can check the data dictionaries for formats already in use and reuse them.
Standard data dictionary
All organizations get a data dictionary with formats that follow standard patterns, such as date, email, hostname, or IP address. You can add more formats and edit the existing ones, for example, if your company conventions specify using a different date format than the default pattern provided. You cannot delete the standard data dictionary, but you can reset it back to the formats it originally came with.
In addition to the standard data dictionary, you can add more data dictionaries, for example, to dedicate them to different kinds of formats used in your APIs. This makes it easy for API developers to find the patterns and use them in their API definitions, ensuring that the same patterns are used consistently throughout all APIs in your organization.
Types of formats
You can define regular expression patterns for strings and lists of enums in your data dictionaries. The available properties depend on the format type:
Parameter | Format type | Description |
---|---|---|
Pattern | String | A regular expression for the accepted string pattern. The OpenAPI Specification defines the pattern format as ECMAScript. 42Crunch API Security Platform uses the pattern format PCRE1 compiled with the |
Example | Both | An example of how an acceptable value could look. |
Min length and max length | String | The length limitations for accepted string values. These must match length limits defined in the regular expression pattern. |
Enum | Enum | List of allowed enum values (for example, enum1, enum2, enum3 ). |
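The properties of a string format need to agree with each other. The Python sketch below shows one way to spot-check an entry before adding it to a dictionary: it tests the example against the pattern and against the length limits. The `StringFormat` class and its field names are hypothetical, not the platform's data model. Python's `re` module is also not PCRE1, so treat the result as an approximation.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class StringFormat:
    """Hypothetical model of a string format entry; the field names are assumptions."""
    name: str
    pattern: str
    example: str
    min_length: int
    max_length: int


def check_string_format(fmt: StringFormat) -> List[str]:
    """Return the problems found; an empty list means the entry looks consistent."""
    problems = []
    if fmt.min_length > fmt.max_length:
        problems.append("min length exceeds max length")
    if not fmt.min_length <= len(fmt.example) <= fmt.max_length:
        problems.append("example length is outside the min/max limits")
    if re.fullmatch(fmt.pattern, fmt.example) is None:
        problems.append("example does not match the pattern")
    return problems


# A consistent entry and one whose example is too short and does not match.
ok = StringFormat("iso-date", r"^\d{4}-\d{2}-\d{2}$", "2024-01-31", 10, 10)
bad = StringFormat("iso-date", r"^\d{4}-\d{2}-\d{2}$", "2024-1-31", 10, 10)
print(check_string_format(ok))
print(check_string_format(bad))
```

This check only uses the example as a probe. It does not prove that the length limits equal the limits implied by the pattern, which still needs a manual review.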
Both format types also let you define additional details that characterize the data in question:
- Sensitivity level: How sensitive the data in question is, from not sensitive at all to critical (defaults to medium).
- GDPR or PII data: Whether the data in question can include information that falls under the General Data Protection Regulation (GDPR) or is Personally Identifiable Information (PII).
- Identifier: Whether the data in question is an object identifier (for example, UUID, GUID).
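To make these details concrete, the Python sketch below models an enum format together with its additional details. Everything in it is hypothetical: the class and field names, the intermediate sensitivity levels, and the `False` defaults for the two flags are assumptions. Only the range from not sensitive to critical and the medium default come from the description above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Sensitivity(Enum):
    """Hypothetical level names; only the end points and the default are documented."""
    NOT_SENSITIVE = 1
    LOW = 2
    MEDIUM = 3
    HIGH = 4
    CRITICAL = 5


@dataclass
class EnumFormat:
    """Hypothetical model of an enum format entry with its additional details."""
    name: str
    values: List[str]
    example: str
    sensitivity: Sensitivity = Sensitivity.MEDIUM  # documented default: medium
    gdpr_or_pii: bool = False                      # assumed default
    identifier: bool = False                       # assumed default

    def accepts(self, value: str) -> bool:
        """Return True only for values listed in the enum."""
        return value in self.values


status = EnumFormat("order-status", ["enum1", "enum2", "enum3"], example="enum1")
print(status.sensitivity.name, status.accepts("enum2"), status.accepts("enum4"))
```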