Datasets Tutorial

This quick-start tutorial covers the basics of using datasets in HackAgent. It focuses on presets: pre-configured benchmark datasets that are the fastest way to run standardized evaluations.

To select goals by risk taxonomy (OmniSafeBench) rather than loading a full dataset, use intents with categories and subcategories. See Selecting intent categories for details.

Basic CLI Example

hackagent eval baseline \
  --agent-name "target_agent" \
  --agent-type "google-adk" \
  --endpoint "http://localhost:8000" \
  --config-file "configs/baseline-agentharm.json" \
  --no-tui

Basic SDK Example

from hackagent import HackAgent, AgentTypeEnum

agent = HackAgent(
    name="target_agent",
    endpoint="http://localhost:8000",
    agent_type=AgentTypeEnum.GOOGLE_ADK,
)

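attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "preset": "agentharm",  # pre-configured benchmark dataset
        "limit": 50,            # load at most 50 goals
        "shuffle": True,        # randomize goal order
        "seed": 42,             # make the randomized selection reproducible
    },
}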

results = agent.hack(attack_config=attack_config)

Popular Presets

Preset	Description
`agentharm`	Harmful agentic tasks
`jailbreakbench`	Curated jailbreak behaviors
`strongreject`	Forbidden jailbreak prompts
`beavertails`	Multi-category safety evaluation
`simplesafetytests`	Fast safety sanity checks

For the complete list, see Presets.

Dataset Options

These are the core options supported across dataset sources.

Option	Type	Default	Purpose
`limit`	int	None	Maximum number of goals to load
`offset`	int	0	Skip the first N goals
`shuffle`	bool	False	Randomize goal order
`seed`	int	None	Make randomized selection reproducible

Minimal Example

attack_config = {
    "attack_type": "baseline",
    "dataset": {
        "preset": "strongreject",
        "limit": 100,
        "offset": 0,
        "shuffle": True,
        "seed": 42,
    },
}

Basic Guidance

Use `limit` to keep tests small while iterating.
Use `offset` to evaluate different slices of large datasets.
Use `shuffle` for broader sample diversity.
Use `seed` when you need reproducible runs.

Tip: If `shuffle` and `offset` are both set, shuffling happens first and the offset is applied afterward.
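
The ordering described in the tip can be sketched in plain Python. This is an illustrative model, not HackAgent's actual implementation: `select_goals` is a hypothetical function, and applying `limit` last (after the offset) is an assumption, since this page only states that shuffling precedes the offset.

```python
import random


def select_goals(goals, limit=None, offset=0, shuffle=False, seed=None):
    """Model the option order: shuffle first, then offset, then limit."""
    items = list(goals)  # copy so the caller's list is not modified
    if shuffle:
        # A dedicated Random instance keeps the shuffle reproducible for a
        # given seed without touching the global random state.
        random.Random(seed).shuffle(items)
    items = items[offset:]  # skip the first `offset` goals
    if limit is not None:
        items = items[:limit]  # assumption: limit is applied last
    return items


goals = [f"goal-{i}" for i in range(10)]

# Without shuffling, offset and limit select a contiguous slice.
plain = select_goals(goals, limit=3, offset=2)

# With shuffling, the same seed yields the same selection on every call.
first = select_goals(goals, limit=4, offset=2, shuffle=True, seed=42)
second = select_goals(goals, limit=4, offset=2, shuffle=True, seed=42)
print(plain)
print(first == second)
```

Under this model, an `offset` combined with `shuffle` skips goals from the shuffled order rather than from the start of the original dataset, so the same `seed` is needed across runs if the slices are meant to be disjoint.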

Learn More

Dataset Providers for the overview.
Presets for the full benchmark catalog.
HuggingFace Provider for external datasets.
File Provider for local JSON/CSV/TXT inputs.
Custom Providers for custom data sources.
Troubleshooting for common dataset issues.
