docs(blog): data poisoning article #2566

vsauter · 2025-01-07T23:28:57Z

Addition of article on data poisoning.

gru-agent · 2025-01-07T23:29:13Z

TestGru Assignment

Summary

Link	CommitId	Status	Reason
Detail	`752dfbd`	🚫 Skipped	No files need to be tested {"site/blog/data-poisoning.md":"target file(site/blog/data-poisoning.md) not in work scope \n include: */.ts,*/.tsx,*/.js,*/.jsx \n exclude: node_modules,*/.test.ts,*/.test.tsx,*/.spec.ts,*/.spec.tsx,*/.d.ts,*/.test.js,*/.spec.js","site/static/img/blog/data-poisoning/backdoor-panda.png":"target file(site/static/img/blog/data-poisoning/backdoor-panda.png) not in work scope \n include: */.ts,*/.tsx,*/.js,*/.jsx \n exclude: node_modules,*/.test.ts,*/.test.tsx,*/.spec.ts,*/.spec.tsx,*/.d.ts,*/.test.js,*/.spec.js"}

Tip

You can @gru-agent and leave your feedback. TestGru will make adjustments based on your input

github-actions · 2025-01-07T23:49:58Z

Images automagically compressed by Calibre's image-actions ✨

Compression reduced images by 11.2%, saving 47.11 KB.

Filename	Before	After	Improvement	Visual comparison
`site/static/img/blog/data-poisoning/poisoning-panda.jpeg`	420.51 KB	373.40 KB	-11.2%	View diff

168 images did not require optimisation.

mldangelo

Great work on this article. You’ve done a wonderful job distilling your thoughts and presenting them clearly. I’ve left far too many comments. They’re mostly suggestions to consider at your discretion (feel free to ignore). One thing to think about is our target audience. Make sure terms and concepts are familiar to whoever you believe will read this. I really appreciate all your hard work, and I’m excited to see how this shapes up. Keep it up!

site/blog/data-poisoning.md

…' into docs/data-poisoning

mldangelo · 2025-01-08T23:24:49Z

site/blog/data-poisoning.md

+
+# Defending Against Data Poisoning Attacks on LLMs: A Comprehensive Guide
+
+Data poisoning remains a top concern on the [OWASP Top 10 for 2025](https://owasp.org/www-project-top-10-for-large-language-model-applications/). However, the scope of data poisoning has expanded since the 2023 version. Data poisoning is no longer strictly a risk during the training of Large Language Models (LLMs); it now encompasses all three stages of the LLM lifecycle: pre-training, fine-tuning, and retrieval from external sources. OWASP also highlights the risk of model poisoning from shared repositories or open-source platforms, where models may contain backdoors or embedded malware.


Suggested change

Data poisoning remains a top concern on the [OWASP Top 10 for 2025](https://owasp.org/www-project-top-10-for-large-language-model-applications/). However, the scope of data poisoning has expanded since the 2023 version. Data poisoning is no longer strictly a risk during the training of Large Language Models (LLMs); it now encompasses all three stages of the LLM lifecycle: pre-training, fine-tuning, and retrieval from external sources. OWASP also highlights the risk of model poisoning from shared repositories or open-source platforms, where models may contain backdoors or embedded malware.

Data poisoning remains a top concern on the [OWASP Top 10 for 2025](https://owasp.org/www-project-top-10-for-large-language-model-applications/). However, the scope of data poisoning has expanded since the 2023 version. Data poisoning is no longer strictly a risk during the training of Large Language Models (LLMs); it now encompasses all stages of the LLM lifecycle, including: pre-training, fine-tuning, and retrieval from external sources. OWASP also highlights the risk of model poisoning from shared repositories or open-source platforms, where models may contain backdoors or embedded malware.

mldangelo · 2025-01-08T23:25:37Z

site/blog/data-poisoning.md

+
+When exploited, data poisoning can degrade model performance, produce biased or toxic content, exploit downstream systems, or tamper with the model’s generation capabilities.
+
+Understanding how these attacks work and implementing preventative measures is crucial for developers, security engineers, and technical leaders responsible for maintaining the security and reliability of these systems. This comprehensive guide delves into the nature of data poisoning attacks and offers strategies to safeguard against these threats.


can you please take another pass at this section?

mldangelo · 2025-01-08T23:26:45Z

site/blog/data-poisoning.md

+
+Data poisoning attacks are malicious attempts to corrupt the training data of an LLM, thereby influencing the model's behavior in undesirable ways. These attacks typically manifest in three primary forms:
+
+1. **Poisoning the Training Dataset**: Attackers insert malicious data into the training set during pre-training or fine-tuning, causing the model to learn incorrect associations or behaviors. This can lead to the model making erroneous predictions or becoming susceptible to specific triggers. They may also create backdoors, where they poison the training dataset to cause the model to behave normally under typical conditions but produce attacker-chosen outputs when presented with certain triggers.


i know this is a different list from the one in your intro but it is similar enough that it's confusing to me (there's overlap). maybe not an actionable nit but wanted to share.

mldangelo · 2025-01-08T23:28:28Z

site/blog/data-poisoning.md

+
+## Detection and Prevention Strategies
+
+To protect your LLM applications from [LLM vulnerabilities](https://www.promptfoo.dev/docs/red-team/llm-vulnerability-types/), including data poisoning attacks, it's essential to implement a comprehensive set of detection and prevention measures:


mldangelo · 2025-01-08T23:28:51Z

site/blog/data-poisoning.md

+
+### Implement Data Validation and Tracking to Mitigate Risk of Data Poisoning
+
+- **Enforce Sandboxing**: Implement sandboxing to restrict model exposure to untrusted data sources.


how does this make sense in an LLM context?

mldangelo · 2025-01-08T23:30:31Z

site/blog/data-poisoning.md

+
+- **Enforce Sandboxing**: Implement sandboxing to restrict model exposure to untrusted data sources.
+- **Track Data Origins**: Use tools like OWASP CycloneDX or ML-BOM to track data origins and transformations.
+- **Use Data Versioning**: Use a version control system to track changes in datasets and detect manipulation.


consider rephrasing. I am not sure if this is useful detail in this article. It may be table stakes.

mldangelo · 2025-01-08T23:30:56Z

site/blog/data-poisoning.md

+Regularly monitor the outputs of your LLM for signs of unusual or undesirable behavior.
+
+- **Implement Tracing**: LLM tracing provides a detailed snapshot of the decision-making and thought processes within LLMs as they generate responses. Tracing can help you monitor, debug, and understand the execution of an LLM application
+- **Use Golden Datasets**: Golden datasets in LLMs are high-quality, carefully curated collections of data used to evaluate and benchmark the performance of large language models. Use these datasets as a "ground truth" to evaluate the performance of your models.


pre-deployment evals with promptfoo! consider stating this first.

mldangelo · 2025-01-08T23:31:47Z

site/blog/data-poisoning.md

+
+- **Lock Down Access**: Restrict access to LLM repositories and implement robust monitoring to mitigate the risk of insider threats.
+  - Access to training data should be restricted based on least privilege and need-to-know. Access should be recertified on a regular cadence (such as quarterly) to account for employee turnover or job changes.
+  - All access should be logged and audited. Developer access should be limited to the minimum necessary to perform their job and access should be revoked when they leave the organization.


Is this the right level of detail for your audience?

mldangelo · 2025-01-08T23:32:12Z

site/blog/data-poisoning.md

+
+- **Vet Your Sources**: Conduct thorough due diligence on model providers and training data sources.
+  - Review model cards and documentation to understand the model's training processes and performance. You can learn more about this in our [foundation model security](https://www.promptfoo.dev/blog/foundation-model-security/) blog post.
+  - Verify that models downloaded from Hugging Face [pass their malware scans](https://huggingface.co/docs/hub/en/security-malware) and [pickling scans](https://huggingface.co/docs/hub/en/security-pickle).


this is good

mldangelo · 2025-01-08T23:33:04Z

site/blog/data-poisoning.md

+### Red Team LLM Applications to Detect Data Poisoning
+
+- **Model Red Teaming**: Run an initial [red team](https://www.promptfoo.dev/docs/red-team/) assessment against any models pulled from shared or public repositories like Hugging Face.
+- **Assess Bias**: In Promptfoo's eval framework, use Promptfoo's [classifier assert type](https://www.promptfoo.dev/docs/configuration/expected-outputs/classifier/#bias-detection-example) to assess grounding, factuality, and bias in models pulled from Hugging Face.


not just related to hugging face this can be run on any output. model is hosted on hugging face

data poisoning blog post

752dfbd

vsauter requested a review from typpo January 7, 2025 23:28

Vanessa Sauter added 2 commits January 7, 2025 15:38

formatting

7f2652c

changes to red panda image

4080954

Optimised images with calibre/image-actions

be3fa13

mldangelo reviewed Jan 8, 2025

View reviewed changes

Vanessa Sauter added 3 commits January 8, 2025 15:09

incorporating michael's feedback

f554e4d

merge remote-tracking branch 'refs/remotes/origin/docs/data-poisoning…

563a8c9

…' into docs/data-poisoning

change to image location

af8ad94

mldangelo reviewed Jan 8, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docs(blog): data poisoning article #2566

docs(blog): data poisoning article #2566

vsauter commented Jan 7, 2025 •

edited by typpo

Loading

gru-agent bot commented Jan 7, 2025 •

edited

Loading

github-actions bot commented Jan 7, 2025

mldangelo left a comment

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025

mldangelo Jan 8, 2025


		# Defending Against Data Poisoning Attacks on LLMs: A Comprehensive Guide

		Data poisoning remains a top concern on the [OWASP Top 10 for 2025](https://owasp.org/www-project-top-10-for-large-language-model-applications/). However, the scope of data poisoning has expanded since the 2023 version. Data poisoning is no longer strictly a risk during the training of Large Language Models (LLMs); it now encompasses all three stages of the LLM lifecycle: pre-training, fine-tuning, and retrieval from external sources. OWASP also highlights the risk of model poisoning from shared repositories or open-source platforms, where models may contain backdoors or embedded malware.


		When exploited, data poisoning can degrade model performance, produce biased or toxic content, exploit downstream systems, or tamper with the model’s generation capabilities.

		Understanding how these attacks work and implementing preventative measures is crucial for developers, security engineers, and technical leaders responsible for maintaining the security and reliability of these systems. This comprehensive guide delves into the nature of data poisoning attacks and offers strategies to safeguard against these threats.


		Data poisoning attacks are malicious attempts to corrupt the training data of an LLM, thereby influencing the model's behavior in undesirable ways. These attacks typically manifest in three primary forms:

		1. Poisoning the Training Dataset: Attackers insert malicious data into the training set during pre-training or fine-tuning, causing the model to learn incorrect associations or behaviors. This can lead to the model making erroneous predictions or becoming susceptible to specific triggers. They may also create backdoors, where they poison the training dataset to cause the model to behave normally under typical conditions but produce attacker-chosen outputs when presented with certain triggers.


		## Detection and Prevention Strategies

		To protect your LLM applications from [LLM vulnerabilities](https://www.promptfoo.dev/docs/red-team/llm-vulnerability-types/), including data poisoning attacks, it's essential to implement a comprehensive set of detection and prevention measures:


		### Implement Data Validation and Tracking to Mitigate Risk of Data Poisoning

		- Enforce Sandboxing: Implement sandboxing to restrict model exposure to untrusted data sources.

docs(blog): data poisoning article #2566

Are you sure you want to change the base?

docs(blog): data poisoning article #2566

Conversation

vsauter commented Jan 7, 2025 • edited by typpo Loading

gru-agent bot commented Jan 7, 2025 • edited Loading

TestGru Assignment

Summary

github-actions bot commented Jan 7, 2025

mldangelo left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

vsauter commented Jan 7, 2025 •

edited by typpo

Loading

gru-agent bot commented Jan 7, 2025 •

edited

Loading