Deserialization of Untrusted Data in the PyYAML Module
The Python PyYAML module provides a way to parse and generate YAML data.
However, malicious YAML strings can be used to attack applications that load
them with the yaml module. When an unsafe loader is used, YAML tags can make
the parser construct arbitrary Python objects and call functions, which can
lead to arbitrary code execution. A malicious YAML string could also cause the
parser to consume considerable CPU and memory resources, which could lead to a
denial-of-service attack.
Example
import yaml
yaml.load("{}")
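As a minimal sketch of why this matters, the snippet below assumes PyYAML is installed and passes the full-featured yaml.Loader explicitly. The payload is a harmless, illustrative stand-in that calls os.getcwd; it is not taken from a real attack, and an attacker would choose a more damaging function.

```python
import os
import yaml

# Illustrative payload: the python/object/apply tag asks the loader to call
# os.getcwd() while the document is being parsed.
PAYLOAD = "!!python/object/apply:os.getcwd []"

# yaml.Loader is the full-featured, unsafe loader, so parsing runs the call.
result = yaml.load(PAYLOAD, Loader=yaml.Loader)

# The "parsed data" is simply whatever the called function returned.
print(result == os.getcwd())
```

The point of the sketch is that the YAML text, not the application, decided which function ran.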
Remediation
To avoid this vulnerability, only use yaml.load with an unsafe loader on YAML
data from trusted sources. If you are parsing YAML data from an untrusted
source, switch to the safe_load function or pass SafeLoader as the Loader
argument to yaml.load. Both restrict parsing to standard YAML types such as
strings, numbers, lists, and dictionaries, and reject tags that would construct
arbitrary Python objects.
- Using yaml.safe_load
import yaml
yaml.safe_load("{}")
- Using yaml.SafeLoader
import yaml
yaml.load("{}", Loader=yaml.SafeLoader)
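The effect of the safe loader can be checked with a short sketch, assuming PyYAML is installed. The helper name is_rejected_by_safe_load and the payload are hypothetical and used only for illustration: a document carrying a Python-specific tag is refused, while plain data still parses.

```python
import yaml

# Illustrative payload: a tag that would call a Python function if it were
# parsed with an unsafe loader.
PAYLOAD = "!!python/object/apply:os.getcwd []"


def is_rejected_by_safe_load(text: str) -> bool:
    """Return True if yaml.safe_load refuses to construct the document."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        # SafeLoader has no constructor for python/* tags, so it raises
        # a YAMLError subclass instead of calling anything.
        return True
    return False


rejected = is_rejected_by_safe_load(PAYLOAD)

# Ordinary mappings, strings, and numbers are unaffected.
plain = yaml.safe_load("{name: demo, retries: 3}")
print(rejected, plain)
```

Catching yaml.YAMLError at the call site also gives the application one place to log and reject suspicious input.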
False Positives
In the case of a false positive, the rule can be suppressed. Simply add a
trailing or preceding comment line with either the rule ID (PY522) or the
rule category name (deserialization_of_untrusted_data).
- Using rule ID
import yaml
# suppress: PY522
yaml.load("{}")
- Using category name
import yaml
# suppress: deserialization_of_untrusted_data
yaml.load("{}")