HubSpot’s Breeze Agents, Marketplace, and Studio: A 14-Day Pilot Playbook for Go-To-Market Teams

What HubSpot just launched, and why it matters

HubSpot’s new Breeze family turns AI from a feature into a front-office teammate. Breeze Prospecting Agent focuses on outbound engagement and qualification. Breeze Customer Agent handles web chat, email, and knowledge queries with clean handoffs to humans when complexity spikes. Breeze Studio centralizes configuration and governance, and the Breeze Marketplace offers prebuilt agents and assistants you can install, then tailor to your stack.

At INBOUND 2025, HubSpot put Breeze Marketplace and Breeze Studio in the spotlight with a public beta and more than 20 agents and assistants. For teams deciding whether to adopt, the fastest path is a controlled pilot that proves lift and cost efficiency before you scale. HubSpot’s product spotlight on the Breeze Marketplace and Studio launches has details on the beta and the agent lineup.

The 14-day pilot goal and the two numbers that decide the rollout

This pilot answers one question: does Breeze improve pipeline and support outcomes at a lower cost than your current funnel? Deploy two agents in parallel and compare against control groups on two decisive metrics.

Lead-to-meeting rate for net-new outreach. Target a relative lift of 15 to 30 percent versus your current sequence baseline.
First response time for inbound support conversations. Target a 50 to 75 percent reduction without sacrificing customer satisfaction.

Alongside these two metrics, calculate cost per acquisition for sales-qualified meetings and cost per resolved conversation for support. Include credit consumption, software seats, and human time in both.

What you need in place on Day 0

Before the pilot clock starts, make these decisions and prepare the data.

Target segment and volumes. Choose 1 or 2 Ideal Customer Profiles with clear firmographics and buying triggers. Aim for 500 to 1,000 prospects for outreach and at least 300 inbound conversations for support during the 14 days.
Consent and channel readiness. Confirm opt-in status for email and SMS, configure verified domains, and set quiet hours by time zone.
Knowledge sources for Customer Agent. Collect a clean set of FAQs, pricing rules, refund policies, and key product documents. Remove contradictory articles and stale promotions.
Brand voice and compliance guardrails. Provide tone-of-voice examples, escalation policies, and forbidden claims. Map regulated phrases and approval workflows.
Measurement plan. Define your control groups, attribution rules, and the exact fields in your CRM that will store agent interactions, outcomes, and approvals.

Days 1 to 3: Configure Breeze Customer Agent for support speed and accuracy

Use Breeze Studio to set up the Customer Agent with guardrails that reflect how your team works.

Channels and routing

Enable web chat and email first to keep the surface area manageable. Connect your help desk inboxes and set business hours. Create queues by language and product line.

Knowledge and intent coverage

Import your top 50 support articles and the 25 most frequent customer intents. Tag each intent with the appropriate actions: answer directly, collect context, create a ticket, or escalate to a human.

Safety and brand voice

Write 5 to 8 canonical replies that demonstrate tone and phrasing. Lock pricing statements behind explicit templates to reduce errors. Add restricted topics and a safe refusal pattern.

Handoffs and approvals

Define thresholds where the agent must escalate, such as ambiguous order status, billing disputes, and any security-related requests. Require draft approvals for sensitive replies during the pilot.

SLAs and metrics

Set the service level agreement for initial response at 30 seconds for chat and 5 minutes for email. Track resolution rate, average handle time, escalation rate, and customer satisfaction.

Days 1 to 5: Configure Breeze Prospecting Agent to run intelligent outreach

The Prospecting Agent should not carpet-bomb. It should monitor for signals, time the first touch, and personalize messages.

ICP and account lists

Import target accounts with firmographics and known technology stacks. Add roles by buyer persona. Exclude current opportunities and recent closed-lost to avoid noise.

Buying signals

Configure on-site intent events like pricing page visits and calculator use, plus third-party signals where you have permission, such as job postings or relevant press. Add product-specific triggers like integration marketplace views.

Messaging frameworks

Provide three email templates and one LinkedIn InMail framework per persona. Build personalization slots for role, industry, and product pain points. Keep subject lines under 45 characters.

Cadence and rate limits

Limit to 1 email every 48 hours per contact during the pilot. Cap daily sends at a level your domain reputation can handle, for example 300 per day per subdomain.
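
The two limits above can be expressed as a single pre-send check. This is a minimal sketch, not a Breeze feature: `can_send`, `MIN_GAP`, and `DAILY_CAP` are hypothetical names, and it assumes you can look up the last send time per contact and the running send count per subdomain from your own logs.

```python
from datetime import datetime, timedelta

MIN_GAP = timedelta(hours=48)  # pilot rule: at most 1 email every 48 hours per contact
DAILY_CAP = 300                # example cap per subdomain; tune to your domain reputation

def can_send(now, last_sent_to_contact, sent_today_on_subdomain):
    """Return True only if a send respects both pilot limits."""
    if sent_today_on_subdomain >= DAILY_CAP:
        return False  # subdomain has used up today's allowance
    if last_sent_to_contact is not None and now - last_sent_to_contact < MIN_GAP:
        return False  # contact was emailed less than 48 hours ago
    return True
```

Checking the daily cap first keeps the check cheap once a subdomain has hit its limit for the day.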

Approvals and human in the loop

Require human approval on first-touch emails to each segment for the first 2 days, then switch to auto-send with sampling QA.

Days 4 to 7: Instrumentation and test design you can trust

To know whether Breeze is better, you need a fair test.

Randomized control groups. Split your eligible prospects 50-50 into Breeze and control sequences. For support, route 50 percent of new chats to Customer Agent and 50 percent to human-only handling. Keep routing sticky for the life of the conversation.
Definitions and formulas. Document them in your analytics wiki and keep them stable for the 14 days.
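
One way to get a 50-50 split that stays sticky is to derive the arm from a hash of the prospect or conversation ID rather than drawing a fresh random number each time. The sketch below is an illustration under that assumption; `assign_arm` and the `salt` value are hypothetical and not part of HubSpot’s tooling.

```python
import hashlib

def assign_arm(unit_id: str, salt: str = "breeze-pilot") -> str:
    """Deterministically map a prospect or conversation ID to 'breeze' or 'control'."""
    digest = hashlib.sha256(f"{salt}:{unit_id}".encode("utf-8")).digest()
    # The first byte is close to uniform over 0..255, so even/odd gives a roughly 50-50 split.
    return "breeze" if digest[0] % 2 == 0 else "control"
```

Because the result depends only on the ID and the salt, the same conversation always routes to the same arm, and changing the salt reshuffles assignments for a later pilot.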

Key formulas

Lead-to-meeting rate = booked meetings ÷ unique leads touched. Example: 72 meetings from 800 leads equals 9.0 percent.
First response time, median = median elapsed seconds from inbound message timestamp to first agent or human reply. Example: median falls from 120 seconds to 35 seconds.
Cost per acquisition for meetings = total cost of outreach ÷ meetings booked. Include human time at a standard loaded rate, email and enrichment tools, and Breeze credit usage. Example: 4,500 dollars total cost ÷ 72 meetings equals 62.50 dollars CPA.
Cost per resolved conversation = total support handling cost ÷ resolved conversations. Include human time, ticketing, and Breeze credit usage. Example: 3,000 dollars ÷ 560 resolutions equals 5.36 dollars.
Data capture. Add fields to log agent name, model action, approval status, and reason for escalation. Write these to the timeline so analysts can reconstruct journeys.
Outcome quality. For Customer Agent, require a one-click internal QA label on 10 percent of conversations, using these categories: correct, off brand, partial answer, unsafe claim, and escalate sooner. For Prospecting, require a similar QA label on sampled emails.
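
The key formulas above are simple ratios and a median, so they can be pinned down in a few lines and reused for both agents. This is a minimal sketch with hypothetical function names; the inputs are the worked numbers from the examples.

```python
from statistics import median

def lead_to_meeting_rate(meetings, leads_touched):
    """Booked meetings divided by unique leads touched."""
    if leads_touched == 0:
        raise ValueError("no leads touched")
    return meetings / leads_touched

def cost_per_outcome(total_cost, outcomes):
    """Works for both CPA per meeting and cost per resolved conversation."""
    if outcomes == 0:
        raise ValueError("no outcomes to divide by")
    return total_cost / outcomes

def median_first_response(seconds):
    """Median elapsed seconds from inbound message to first reply."""
    return median(seconds)

print(f"{lead_to_meeting_rate(72, 800):.1%}")  # 9.0%
print(f"{cost_per_outcome(4500, 72):.2f}")     # 62.50
print(f"{cost_per_outcome(3000, 560):.2f}")    # 5.36
```

Raising on a zero denominator, rather than returning zero, keeps an empty day from showing up as a free acquisition on the dashboard.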

Days 8 to 11: Launch, monitor, and tune

Run a daily 20-minute standup with marketing operations, sales operations, support, and the program owner. Use a lightweight dashboard with four cards per agent, and consider the structure in our AEO playbook for daily briefings.

Breeze Prospecting Agent cards

Deliverability and reputation. Keep bounce rate under 2 percent and spam complaints under 0.1 percent. If you breach either, cut send volume by 50 percent and switch to another subdomain if needed.
Response and meeting rates. Look for early signs of lift, even if small. If response rates spike but meetings lag, adjust the call to action and meeting link placement.
Personalization quality. Review 20 random emails per day. Reject any that lack persona-specific value. Update templates accordingly.
Do-not-contact adherence. Verify that suppression lists are respected. Any violation pauses the pilot until resolved.
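
The deliverability rule on the first card can be automated as a daily check. The sketch below assumes you already have daily counts of sends, bounces, and spam complaints; `next_daily_cap` is a hypothetical helper, and treating a rate exactly at the threshold as a breach is an interpretation of "under".

```python
MAX_BOUNCE_RATE = 0.02      # keep bounces under 2 percent
MAX_COMPLAINT_RATE = 0.001  # keep spam complaints under 0.1 percent

def next_daily_cap(current_cap, sent, bounces, complaints):
    """Halve the daily send cap if either threshold is breached; otherwise keep it."""
    if sent == 0:
        return current_cap  # nothing sent, nothing to evaluate
    breached = (bounces / sent >= MAX_BOUNCE_RATE
                or complaints / sent >= MAX_COMPLAINT_RATE)
    return current_cap // 2 if breached else current_cap
```

For example, 25 bounces on 1,000 sends is a 2.5 percent bounce rate, so a cap of 300 would drop to 150 for the next day.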

Breeze Customer Agent cards

First response time trend. Expect a stable sub-60-second median for chat within the first 48 hours.
Resolution and escalation mix. Aim for a 40 to 60 percent self-resolution rate in the pilot, depending on complexity, with clean escalation notes.
Customer satisfaction. Use a 3-point quick reaction on chat end. Inspect any detractors within 24 hours.
Error taxonomy. Tally errors into four buckets: wrong answer, ambiguous prompt, off brand, and policy breach. Fix by updating knowledge, adding disambiguation prompts, or tightening guardrails.

Days 12 to 14: Analyze results and make the go or no-go decision

Pull a clean extract and compute baseline-versus-pilot metrics for both the test and control groups.

Calculate lift and confidence

Example: control lead-to-meeting rate is 7.5 percent, Breeze group is 9.2 percent. Relative lift is (9.2 - 7.5) ÷ 7.5, or 22.7 percent. Use a two-proportion z-test for significance. If p is less than 0.05 and list quality is balanced, you have a reliable improvement.
For support, compare medians and 80th percentiles for first response time. Pair with resolution rate and satisfaction to ensure you did not simply answer faster but worse.
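
The lift and significance check for lead-to-meeting rate can be computed with the standard library alone. This is a minimal sketch of a pooled, two-sided two-proportion z-test; the function names are hypothetical, and the group sizes in the usage note are an assumption chosen only to illustrate the calculation.

```python
import math

def relative_lift(control_rate, test_rate):
    """Lift of the test rate over the control rate, as a fraction of control."""
    return (test_rate - control_rate) / control_rate

def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided pooled z-test for a difference between two proportions."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With the example rates, `relative_lift(0.075, 0.092)` is about 0.227. If each arm hypothetically had 1,000 prospects (75 versus 92 meetings), `two_proportion_z_test(75, 1000, 92, 1000)` gives z of roughly 1.37 and p of roughly 0.17, which would not clear the 0.05 bar. The same relative lift needs larger groups or a longer window to reach significance, so check your arm sizes before reading too much into the p-value.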

Compute all-in cost

Sum human time at standard loaded rates, software seats, enrichment, and Breeze credit usage. If you are using credits for Customer Agent, review HubSpot’s guidance on credits-based activation and its published performance ranges for resolution rates.

Decide the rollout motion

Green light. If you see at least a 15 percent lift in lead-to-meeting rate or a 50 percent reduction in first response time with equal or better quality, roll out to the next ICP or channel. Lock guardrails and raise limits gradually.
Yellow light. If results are flat, review QA tags and templates. Often the fix is better knowledge scoping for support or better value props for outreach.
Red light. If you see reputational risk, policy breaches, or a lift that only comes from overly aggressive claims, pause and remediate before another pilot.
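
Writing the three lights down as a rule before the pilot ends helps keep the decision from drifting with the results. The sketch below is one possible encoding: `rollout_decision` and its parameters are hypothetical, and treating every outcome that is neither green nor red as yellow is an assumption layered on the text.

```python
def rollout_decision(meeting_lift, frt_reduction, quality_ok, risk_flag):
    """Map pilot results to 'green', 'yellow', or 'red'.

    meeting_lift:  relative lift in lead-to-meeting rate (0.15 means 15 percent)
    frt_reduction: relative reduction in first response time (0.50 means 50 percent)
    quality_ok:    True if quality is equal to or better than control
    risk_flag:     True on reputational risk, policy breach, or lift from aggressive claims
    """
    if risk_flag:
        return "red"  # pause and remediate regardless of the numbers
    hit_target = meeting_lift >= 0.15 or frt_reduction >= 0.50
    if hit_target and quality_ok:
        return "green"
    return "yellow"  # assumption: anything else goes back for QA and template review
```

The risk check runs first so that a strong lift can never override a policy breach.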

How to structure controls, audit trails, and approvals

Approval policies. Keep draft approvals on in week one for net-new templates and sensitive topics such as pricing exceptions, security, and refunds. Move to sampling in week two.
Audit trails. Store the agent draft, the final sent message, and the delta after approval. This helps your legal or compliance team review quickly.
Human review flows. For Customer Agent, route anything tied to billing disputes, cancellations, or security to a trained support specialist. For Prospecting, require manager approval for any outreach that includes discounting or nonstandard terms.

Practical benchmarks to calibrate expectations

Every business is different, but these are reasonable pilot targets if your data and content are in decent shape.

Prospecting Agent
- Response rate: plus 20 to 40 percent versus your best-performing current sequence.
- Lead-to-meeting rate: 9 to 12 percent for mid-market, 4 to 7 percent for enterprise. Lift matters more than absolute numbers.
- Time saved per rep: 30 to 60 minutes per day on research and drafting.
Customer Agent
- First response time: under 60 seconds for chat, under 5 minutes for email.
- Self-resolution rate: 40 to 60 percent in pilot conditions with clean knowledge.
- Satisfaction parity: within 0.1 to 0.2 points of your human-only baseline on a 5-point scale.

If your results are below these ranges, review knowledge conflicts, tone examples, and intent coverage.

Common pitfalls and how to avoid them

Noisy knowledge bases. Conflicting articles lead to inconsistent answers. Archive or deprecate outdated content before ingestion.
Overly broad targeting. A wide ICP dilutes personalization. Start narrow, then expand with what you learn.
Aggressive send volumes. Domain reputation is hard to rebuild. Ramp gradually and warm new subdomains as needed.
Weak disambiguation. Provide short, specific clarification prompts when a customer asks vague questions. This prevents wrong but confident answers.
Shadow metrics. If reps book meetings in personal calendars or handle side-channel tickets, you lose attribution. Centralize scheduling and ticket creation.

What teams and tooling make the pilot succeed

Roles. Assign a pilot owner, a marketing or sales operations lead, a support operations lead, and a QA reviewer. Keep decisions fast and documented.
Playbooks. Write one-page playbooks for the top three buyer personas and the top five customer intents. Include examples of good messages and good answers.
Quality layer. Some teams use Upcite.ai during template creation to double-check that outbound messages and help content stay on brand and cite the correct source material, which reduces the approval burden. To systematize this, adapt the approach in our machine-executable SEO operating plans and tighten outreach via the assistant inbox optimization guide.

Example scorecard template you can reuse

Create a one-page scorecard and update it daily.

Inputs
- Prospecting: contacts enrolled, emails sent, signals triggered, approvals pending
- Customer: conversations started, intents detected, knowledge articles used
Outcomes
- Prospecting: replies, positive replies, meetings booked, lead-to-meeting rate, CPA for meetings
- Customer: first response time median and 80th percentile, resolution rate, escalations, cost per resolved conversation
Quality
- Prospecting: personalization quality score, spam complaint rate
- Customer: QA error rate, satisfaction score, restricted-topic breaches
Notes and changes
- Template tweaks, new intents added, guardrail updates, any incidents

If you green-light the rollout, scale deliberately

Expand to a second ICP or region first. Keep a control group live for at least another month to watch for regression.
Automate approvals only after quality stays steady for a week and QA errors are under 3 percent.
Add channels one by one. For support, add social or SMS after chat and email prove stable. For sales, add calling recaps and follow-ups after email shows lift.
Keep governance living in Breeze Studio. Treat it like a product with versioned changes, release notes, and an owner.

The bottom line

HubSpot’s Breeze Agents, Marketplace, and Studio let go-to-market teams put AI to work where it matters: faster first responses and better timed outreach, governed in one place. This 14-day pilot gives you a defensible answer on lift and cost. Set clear targets for lead-to-meeting and first response time, enforce strong guardrails, and measure cost per outcome with the same rigor you apply to paid media. If the numbers hold, scale with confidence. If not, you will know exactly what to fix and where your next test should focus.