
Deserialization of Untrusted Data in pandas Module

PY511
deserialization_of_untrusted_data
CWE-502
⚠️ Warning
🔒 Professional Plan

The Python pandas module is a data analysis and manipulation tool. It contains a function, read_pickle, to read serialized data in the pickle format. Pickle is not secure against untrusted input because unpickling can run arbitrary code chosen by whoever created the data. For example, an attacker could craft a pickle file with a malicious payload and trick a user into loading it. The payload would execute as soon as the file is deserialized.
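To make the risk concrete, here is a minimal and deliberately harmless sketch of the mechanism. The class name `Harmless` is hypothetical. Its `__reduce__` method tells the unpickler to call a function of the author's choosing, here `eval` on a simple arithmetic expression. An attacker would substitute something dangerous.

```python
import io
import pickle

import pandas as pd


class Harmless:
    """Hypothetical stand-in for an attacker-controlled object."""

    def __reduce__(self):
        # The unpickler will call eval("1 + 1") when loading this object.
        return (eval, ("1 + 1",))


payload = pickle.dumps(Harmless())

# No DataFrame is returned: the callable named in the payload runs instead.
result = pd.read_pickle(io.BytesIO(payload))
```

Nothing in the call to `read_pickle` asked for `eval` to run. The serialized data alone decided that, which is why loading pickles from untrusted sources is unsafe.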

Example

import io
import pickle

import pandas as pd


df = pd.DataFrame(
    {
        "col_A": [1, 2]
    }
)
pick = pickle.dumps(df)

# read_pickle expects a path or file-like object, so wrap the bytes
pd.read_pickle(io.BytesIO(pick))

Remediation

Consider signing data with hmac if you need to ensure that pickle data has not been tampered with.
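One way this could look is sketched below, assuming the writer and reader share a secret key. The names `SECRET_KEY`, `sign`, and `load_verified` are hypothetical helpers for illustration, not part of pandas. The payload is unpickled only after its HMAC-SHA256 tag has been verified.

```python
import hashlib
import hmac
import io
import pickle

import pandas as pd

# Hypothetical shared secret; in practice, load it from a secure store.
SECRET_KEY = b"replace-with-a-real-secret"


def sign(payload: bytes) -> bytes:
    """Return an HMAC-SHA256 tag for the pickled payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()


def load_verified(payload: bytes, tag: bytes) -> pd.DataFrame:
    """Unpickle the payload only if its tag matches."""
    # compare_digest avoids leaking information through timing differences.
    if not hmac.compare_digest(sign(payload), tag):
        raise ValueError("pickle payload failed HMAC verification")
    return pd.read_pickle(io.BytesIO(payload))


df = pd.DataFrame({"col_A": [1, 2]})
payload = pickle.dumps(df)
tag = sign(payload)

restored = load_verified(payload, tag)
```

Signing only shows that the data was produced by someone holding the key and has not been altered since. It does not make the pickle format itself safe, so the key must stay out of an attacker's reach.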

Alternatively, if you need to serialize sensitive data, you could use a data-only serialization format, such as JSON or XML. Parsing these formats reconstructs plain data rather than arbitrary objects, so loading them does not execute code embedded in the input.
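As a rough sketch of the JSON route, a DataFrame can be round-tripped with `to_json` and `read_json`. The choice of `orient="split"` here is an assumption made for illustration, and some dtypes may not round-trip exactly through JSON.

```python
import io

import pandas as pd

df = pd.DataFrame({"col_A": [1, 2]})

# Serialize to a JSON string instead of a pickle byte stream.
json_text = df.to_json(orient="split")

# Parsing JSON only rebuilds data; no callable from the input is invoked.
restored = pd.read_json(io.StringIO(json_text), orient="split")
```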

False Positives

In the case of a false positive, the rule can be suppressed by adding a trailing or preceding comment line with either the rule ID (PY511) or rule category name (deserialization_of_untrusted_data).

import io
import pickle

import pandas as pd


df = pd.DataFrame(
    {
        "col_A": [1, 2]
    }
)
pick = pickle.dumps(df)
# suppress: PY511
pd.read_pickle(io.BytesIO(pick))
