Meta SAM 3 Prompts: How to Segment Anything with Text, Clicks & Examples

Meta SAM 3 Prompts let you point, type, or click and instantly have the model find, segment, and track every object you care about in any image or video.


Meta SAM 3 Prompt – How to Talk to “Segment Anything with Concepts”

Meta SAM 3 is built around one simple idea: you talk to the model with a prompt, and it finds, segments, and tracks everything that matches your description in images or videos. Those prompts can be text, visual examples, or clicks, and SAM 3 is designed to understand all of them in one unified system.

This guide explains what a Meta SAM 3 prompt is, the different prompt types, and how to use them effectively.


1. What is a “prompt” in Meta SAM 3?

In SAM 3, a prompt is any hint you give the model about what you want:

  • A short text phrase – e.g. “yellow school bus”, “striped cat”, “players in red jerseys”.

  • One or more exemplar regions (boxes or masks) that contain an example of the object you care about.

  • Classic visual prompts – points, boxes, and masks over the image, like older SAM models.

From that prompt, SAM 3 runs one of two task modes (sketched in code after this list):

  • Promptable Concept Segmentation (PCS) – detect, segment, and track all instances of a concept from text/exemplars.

  • Promptable Visual Segmentation (PVS) – segment one specific object instance from points/boxes/masks (SAM 2 style).
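
To make the distinction concrete, here is a minimal sketch of the two prompt shapes. These dataclasses are illustrative only – the class and field names are assumptions for this article, not the official SAM 3 API:

```python
from dataclasses import dataclass, field

# Illustrative shapes only: these names are NOT the official SAM 3 API,
# they just make the PCS vs. PVS distinction concrete. (Python 3.10+)

@dataclass
class ConceptPrompt:   # PCS: "find every instance of this concept"
    text: str | None = None                                     # e.g. "yellow school bus"
    exemplar_boxes: list[tuple] = field(default_factory=list)   # (x0, y0, x1, y1) examples

@dataclass
class VisualPrompt:    # PVS: "segment this one specific object"
    points: list[tuple] = field(default_factory=list)  # (x, y, label), 1 = object, 0 = background
    box: tuple | None = None                            # rough rectangle around the object

pcs = ConceptPrompt(text="striped cat")      # asks for every striped cat in the scene
pvs = VisualPrompt(points=[(412, 230, 1)])   # asks for the one object under this click
```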


2. Types of prompts in SAM 3

2.1 Text prompts (concept prompts)

  • Format: short noun phrases, like

    • “red cars”

    • “blue recycling bins”

    • “striped cat” 

  • Used in PCS mode.

  • Output: all instances matching that description in an image or ≤30-second video, each with its own mask and ID.

Text prompts are open-vocabulary, so you’re not limited to a fixed label list like “car / person / dog”.
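
In code, a text prompt is typically a single call that returns one mask and ID per matching instance. The "segment_concepts" helper below is a hypothetical name, not a verified facebook/sam3 import; it only illustrates the call shape:

```python
# Hypothetical call shape -- `segment_concepts` is an assumed name, not a
# verified facebook/sam3 import; check the official repo for the real API.
from sam3 import segment_concepts

instances = segment_concepts(image="street.jpg", text="red cars")
for inst in instances:
    # PCS output: every match gets its own mask and a stable instance ID.
    print(inst.id, inst.score, inst.mask.shape)   # mask: (H, W) boolean array
```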


2.2 Exemplar prompts (example boxes or masks)

  • You draw a box or provide a mask around one example object.

  • SAM 3 treats this as an “exemplar” and finds all visually similar objects in the scene or video. 

  • You can add:

    • Positive exemplars – “things like this”

    • Negative exemplars – “not like this”, to exclude confusable look-alikes (sketched in code below)

This is powerful when:

  • The concept is rare or hard to describe in words.

  • You have mixed categories and you want “only these boxes, not those”.
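
Here is the same hypothetical call with exemplars, again using assumed parameter names rather than the verified SAM 3 interface:

```python
# Assumed parameter names, not the verified SAM 3 interface.
from sam3 import segment_concepts  # hypothetical import, as above

instances = segment_concepts(
    image="shelf.jpg",
    positive_boxes=[(120, 80, 260, 210), (610, 75, 750, 205)],  # "things like these"
    negative_boxes=[(400, 90, 540, 215)],                       # "not like this"
)
```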


2.3 Visual prompts (points, boxes, masks)

These are the prompts inherited from SAM 1 / SAM 2, used in PVS mode:

  • Points – click on the object (positive) or on background (negative).

  • Boxes – rough rectangle around the object.

  • Masks – an existing segmentation you want to refine. 

PVS is best when you care about one object or region and want precise geometry, not “all objects of this type”.
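
Because PVS keeps the SAM 2 interaction style, the published sam2 package works as a stand-in to show what these calls look like; the SAM 3 PVS entry point should follow the same shape, but check the official repo:

```python
import numpy as np
from PIL import Image
from sam2.sam2_image_predictor import SAM2ImagePredictor

# SAM 2 predictor used as a stand-in: SAM 3's PVS mode inherits this style.
predictor = SAM2ImagePredictor.from_pretrained("facebook/sam2-hiera-large")
predictor.set_image(np.array(Image.open("photo.jpg").convert("RGB")))

masks, scores, _ = predictor.predict(
    point_coords=np.array([[520, 340], [600, 120]]),  # two clicks on the image
    point_labels=np.array([1, 0]),                    # 1 = object, 0 = background
    multimask_output=True,                            # return candidate masks + scores
)
best_mask = masks[np.argmax(scores)]                  # keep the highest-scoring candidate
```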


2.4 Hybrid prompting (mixing text, exemplars, and clicks)

SAM 3 lets you combine prompt types in one request:

  • Start with a text prompt (“players”) to get all players.

  • Add negative clicks on referees or crowd if they were mistakenly included.

  • Drop in a positive exemplar box around the exact jersey style you care about. 

This interactive loop is how you refine tricky concepts without retraining a model.
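
One way to picture a hybrid request is as a single payload carrying all three prompt types at once. The field names below are assumptions, not the official SAM 3 schema:

```python
# Illustrative payload only -- field names are assumptions, not SAM 3's schema.
request = {
    "text": "players",                                  # broad concept prompt (PCS)
    "negative_points": [(1180, 240), (1420, 255)],      # clicks excluding referees/crowd
    "positive_exemplar_boxes": [(300, 410, 420, 640)],  # the exact jersey style wanted
}
```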


3. PCS vs PVS: When to use each prompt mode

Promptable Concept Segmentation (PCS) – best when you want all things of a type:

  • “All shipping containers in this photo of a port”

  • “Every dog in the park”

  • “All yellow school buses across this 20-second video”

Input: text + exemplars
Output: masks + unique IDs per instance across frames. 

Promptable Visual Segmentation (PVS) – best when you want to focus on a single instance:

  • Isolate one product for a thumbnail.

  • Cut out one person from a group.

Input: points/boxes/masks
Output: detailed mask for that instance. 

In practice, many workflows start in PCS (“find everything”) and then use PVS + visual clicks to polish specific instances.


4. How SAM 3 uses your prompts internally

Under the hood, SAM 3 has three main pieces (sketched in code after this list):

  • A vision encoder for images and video frames.

  • A text encoder for your prompt phrase.

  • A joint transformer that fuses them and predicts:

    • Which concepts are present (via a “presence head”).

    • Where they are (masks + bounding boxes). 
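
To make that flow concrete, here is a schematic forward pass with toy dimensions and stand-in modules. This is not Meta's actual architecture code, only the shape of the computation the bullets describe:

```python
import torch
import torch.nn as nn

class Sam3Schematic(nn.Module):
    """Toy stand-in showing the data flow, NOT Meta's real SAM 3 code."""
    def __init__(self, dim=256, num_queries=100, vocab=30000):
        super().__init__()
        self.vision_encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # stand-in backbone
        self.text_encoder = nn.Embedding(vocab, dim)                        # stand-in text encoder
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.fusion = nn.TransformerDecoder(layer, num_layers=2)            # joint transformer
        self.queries = nn.Parameter(torch.randn(num_queries, dim))          # instance queries
        self.presence_head = nn.Linear(dim, 1)  # "is the concept in the image at all?"
        self.box_head = nn.Linear(dim, 4)       # per-query bounding box
        # A real model would also decode per-query masks from the image features.

    def forward(self, image, text_ids):
        feats = self.vision_encoder(image).flatten(2).transpose(1, 2)  # (B, HW, dim)
        text = self.text_encoder(text_ids)                             # (B, T, dim)
        memory = torch.cat([feats, text], dim=1)                       # fuse both modalities
        q = self.queries.unsqueeze(0).expand(image.size(0), -1, -1)
        out = self.fusion(q, memory)                                   # queries attend to fused features
        return self.presence_head(out.mean(dim=1)), self.box_head(out)

model = Sam3Schematic()
presence, boxes = model(torch.randn(1, 3, 224, 224), torch.randint(0, 30000, (1, 6)))
```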

It’s trained on the SA-Co dataset, which includes millions of images and videos with 4M+ unique concept labels tied to masks and hard negatives. 

Because of this training and architecture, SAM 3 achieves roughly 2× the accuracy of previous systems on PCS benchmarks while keeping SAM 2’s interactive strengths.


5. Prompting best practices for Meta SAM 3

Here are practical tips developers and creators share in tutorials and blog posts:

5.1 Writing good text prompts

  • Use short noun phrases, not full sentences:

    • ✅ “red basketball jerseys”

    • ❌ “please find all the players wearing red basketball jerseys”

  • Add attributes when needed: color, material, role, etc.

    • “goalkeepers in green”

    • “wooden tables”

SAM 3 is open-vocabulary but still benefits from specific, focused descriptions. 

5.2 Using exemplars wisely

  • Draw boxes around clean examples of your concept.

  • Use multiple positives if the category has variety (e.g., different angles of the same product).

  • Add negatives to push it away from distracting look-alikes. 

5.3 Visual prompts for precision

  • Start with a few clicks or one box → inspect the result.

  • Add corrective clicks only where it fails; don’t overspecify.

  • For thin or small objects, zoom in and add a couple of positive clicks right along the boundary.

5.4 Iterative refinement

SAM 3 is meant to be used in a loop (a code sketch follows these steps):

  1. Give a text or exemplar prompt (PCS).

  2. Check masks and tracking.

  3. Add more exemplars or visual clicks to refine.

  4. Re-run until the result matches your needs. 
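
In code, that loop might look like the function below, where "run_pcs" and "review" are assumed stand-ins for your SAM 3 call and your inspection step, not real APIs:

```python
def refine(image, text, run_pcs, review):
    """Iterative PCS refinement: prompt -> inspect -> add negatives -> re-run.

    `run_pcs(image, prompt)` wraps whatever SAM 3 call you use (hypothetical
    here); `review(instances)` returns the instances a human flagged as wrong.
    """
    prompt = {"text": text, "pos_boxes": [], "neg_boxes": []}
    while True:
        instances = run_pcs(image, prompt)             # steps 1-2: segment and inspect
        wrong = review(instances)
        if not wrong:
            return instances                           # step 4: result matches your needs
        prompt["neg_boxes"] += [i.box for i in wrong]  # step 3: turn mistakes into negatives
```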


6. Example prompt recipes

A few concrete scenarios:

6.1 Video editing – isolate all players

  • Prompt: "football players" (text, PCS)

  • Add a negative exemplar on referees if they’re included.

  • Use PVS clicks on one player if you want a hero shot for a spotlight effect.

6.2 Robotics – pick up certain objects

  • Prompt: "blue plastic bins" in warehouse frames.

  • Add an exemplar box around one correct bin.

  • The robot then uses the masks + IDs for grasping and tracking targets (see the centroid sketch below).
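
Turning a mask into a grasp target in image space is plain NumPy; the pixel centroid below is a crude but real example (a production system would project it through the camera model):

```python
import numpy as np

def mask_centroid(mask: np.ndarray) -> tuple[float, float]:
    """Pixel centroid of a boolean (H, W) mask -- a crude grasp target in image space."""
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())
```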

6.3 Dataset labeling – rare category

  • Prompt: "solar panels" on aerial imagery.

  • Add positive boxes on correctly segmented roofs, negative boxes on skylights.

  • Export the masks as annotations for training lightweight detection models (see the COCO export sketch below).
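
The export step is standard tooling. Assuming SAM 3 hands back boolean masks (an assumption about its output format), pycocotools can turn each one into a COCO-style RLE annotation:

```python
import numpy as np
from pycocotools import mask as mask_utils

def mask_to_coco_annotation(binary_mask, image_id, category_id, ann_id):
    """Convert one (H, W) boolean mask into a COCO-style RLE annotation."""
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    area = float(mask_utils.area(rle))
    bbox = [float(v) for v in mask_utils.toBbox(rle)]  # x, y, w, h
    rle["counts"] = rle["counts"].decode("ascii")      # make the RLE JSON-serializable
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": rle,
        "area": area,
        "bbox": bbox,
        "iscrowd": 0,
    }

# e.g. one annotation per "solar panel" mask SAM 3 produced for an image:
# annotations = [mask_to_coco_annotation(m, image_id=1, category_id=1, ann_id=i)
#                for i, m in enumerate(panel_masks)]
```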


7. Limitations and gotchas with SAM 3 prompts

Even with strong prompting, SAM 3 isn’t magic:

  • Very ambiguous prompts (“cool objects”, “interesting shapes”) can confuse it.

  • Super fine-grained differences (e.g., two nearly identical logos) may still require manual cleanup.

  • For specialized fields (like medical), teams commonly fine-tune SAM 3 into versions like MedSAM3 for domain-specific PCS. 


8. Where to actually use Meta SAM 3 prompts

You can experiment with SAM 3 prompts in:

  • Segment Anything Playground – the official browser UI from Meta: upload media, type prompts, and click to refine. 

  • facebook/sam3 on Hugging Face or GitHub – run Python code that accepts your prompts and returns masks. 

  • Third-party tools (Roboflow, Datature, etc.) that wrap SAM 3 and give you both a UI and an API for promptable concept segmentation.