Deserialization of Untrusted Data in pandas Module
The Python pandas module is a data analysis and manipulation tool. It
contains a function, read_pickle, that reads serialized data in the pickle
format. Pickle is not secure against untrusted input because unpickling can
invoke callables named in the data stream. For example, an attacker could
craft a pickle file that contains a malicious payload and then trick a user
into loading the file. When the file is loaded, the payload runs with the
privileges of the user's process.
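To make the risk concrete, here is a minimal sketch of how a pickle stream can trigger a call while it is being loaded. The class name Payload is hypothetical, and the built-in eval on a harmless arithmetic string stands in for whatever an attacker would actually run. The sketch uses the standard pickle module directly; pd.read_pickle relies on the same unpickling machinery, so it would be expected to behave the same way on such a stream.

```python
import pickle


class Payload:
    """Hypothetical object whose pickled form names a callable to invoke."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") and uses the result in place
        # of a Payload instance. A real attacker would name something far
        # more harmful than an arithmetic expression.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# Loading runs the callable; no Payload object is ever reconstructed.
result = pickle.loads(malicious_bytes)
print(result)  # 42
```

The value returned by the load is the result of the call, not an instance of Payload, which shows that the stream controls what gets executed.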
Example
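import io
import pickle

import pandas as pd

# Build a small DataFrame and serialize it to pickle bytes.
df = pd.DataFrame(
    {
        "col_A": [1, 2]
    }
)
pick = pickle.dumps(df)
# read_pickle expects a path or file-like object, so wrap the bytes.
# Deserialization happens in this call.
pd.read_pickle(io.BytesIO(pick))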
Remediation
If you must use pickle, consider signing the data with hmac so that you can detect tampering. Verify the signature first and unpickle the data only if it matches.
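The signing approach could look roughly like the sketch below. The helper names sign_pickle and load_signed_pickle, the SECRET_KEY value, and the tag-then-payload layout are all assumptions made for illustration; only the hmac, hashlib, and pickle calls come from the standard library.

```python
import hashlib
import hmac
import pickle

# Assumption: in practice the key comes from a secret store, not source code.
SECRET_KEY = b"replace-with-a-real-secret"
DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes


def sign_pickle(obj, key=SECRET_KEY):
    """Pickle obj and prepend an HMAC-SHA256 tag over the pickled bytes."""
    payload = pickle.dumps(obj)
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return tag + payload


def load_signed_pickle(blob, key=SECRET_KEY):
    """Verify the tag first and unpickle only if it matches."""
    tag, payload = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(payload)


signed = sign_pickle({"col_A": [1, 2]})
restored = load_signed_pickle(signed)

# Flipping a single bit in the payload makes verification fail.
tampered = signed[:-1] + bytes([signed[-1] ^ 1])
try:
    load_signed_pickle(tampered)
except ValueError as exc:
    print(exc)
```

Once the tag has been verified, the payload bytes can be wrapped in io.BytesIO and passed to pd.read_pickle. Note that a valid signature only shows the data was produced by someone holding the key; it does not make pickle safe for data from parties who should not be trusted with that key.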
Alternatively, if the data may come from an untrusted source, use a data-only serialization format such as JSON or XML. Parsers for these formats produce plain values and do not invoke callables named in the input, so loading the data does not execute embedded code.
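As a rough illustration of the JSON route, the sketch below round-trips the same small DataFrame through DataFrame.to_json and pd.read_json. The choice of orient="split" is an assumption made for the example; other orientations work as well.

```python
import io

import pandas as pd

df = pd.DataFrame({"col_A": [1, 2]})

# Serialize to a JSON string; "split" keeps columns, index, and data separate.
text = df.to_json(orient="split")

# read_json parses the text as data. Wrapping it in StringIO avoids passing
# a literal JSON string directly, which newer pandas versions deprecate.
restored = pd.read_json(io.StringIO(text), orient="split")
print(restored["col_A"].tolist())
```

JSON does not carry every pandas dtype, so some columns (categoricals, for example) may need to be converted again after loading.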
False Positives
In the case of a false positive the rule can be suppressed. Simply add a
trailing or preceding comment line with either the rule ID (PY511) or
rule category name (deserialization_of_untrusted_data).
- Using rule ID

import io
import pickle

import pandas as pd

df = pd.DataFrame(
    {
        "col_A": [1, 2]
    }
)
pick = pickle.dumps(df)
# suppress: PY511
pd.read_pickle(io.BytesIO(pick))

- Using category name

import io
import pickle

import pandas as pd

df = pd.DataFrame(
    {
        "col_A": [1, 2]
    }
)
pick = pickle.dumps(df)
# suppress: deserialization_of_untrusted_data
pd.read_pickle(io.BytesIO(pick))