Scaled Content Policy—Reference & Audit Gate

Purpose. Authoritative synthesis of Google’s scaled-content-abuse policy + March 2024 / March 2026 enforcement learnings, scoped to McClatchy National Content & CSA. Used for two purposes:

  1. Audit gate. Every content decision authored or reviewed in any repo must be checkable against §6 (McClatchy / CSA Application) and §7 (Audit Checklist). When evidence of a content decision shows up in a session, audit it against this doc before approving, shipping, or recommending it.
  2. Impromptu evals. Pierce can invoke this doc by name (“audit against scaled content policy”) for any decision—proposed format, persona, syndication call, automation, recipe, canonicalization choice—and get a structured pass/warn/fail answer.

Scope. This doc provides a structured framework in place of the ad-hoc “we’re not Google’s target because of HITL (human-in-the-loop)” reasoning that has shown up across several past sessions. It is policy reference, not editorial guidance. It does NOT supersede csa-content-standards (which owns positive editorial direction). The two work together: csa-content-standards says what to write; this doc says what NOT to risk.

Replication. Identical copy lives in all 6 repos: ops-hub, csa-dashboard, data-headlines, gary-tools, csa-content-standards, data-keywords. ops-hub is the canonical edit surface; updates propagate. Source last refreshed: 2026-04-27.


1. Google’s operative definition

From Google’s spam policies (last updated 2026-04-13 UTC):

Scaled content abuse is when many pages are generated for the primary purpose of manipulating search rankings and not helping users. This abusive practice is typically focused on creating large amounts of unoriginal content that provides little to no value to users, no matter how it’s created.

Examples Google calls out specifically:

The operative phrase is “for the primary purpose of manipulating search rankings” (shortened elsewhere in this doc to “primarily designed to manipulate”). The policy targets intent + outcome (manipulating rankings, providing no value), not production method. AI-generated content is not inherently in violation; thin content at scale is, regardless of how it was produced. Hand-written content farms are equally targeted.

2. The threshold distinction—volume × value, not volume alone

Pattern | Status
Many pages, each with proportional value | Not in violation. Sites with thousands of high-quality pages are not penalized.
Many pages, value-per-page negligible | In violation. Pages exist to capture keyword rankings, not serve user intent.
Few pages, thin | Quality issue; not specifically scaled-content abuse.
Few pages, dense | Best case.

The correct mental test: “If this page did not exist, would any user be worse off?” If the answer is no—because the information is readily available elsewhere at higher quality—the page is a removal/consolidation candidate, not an improvement candidate.

Sustainable AI-assist multiplier (per March 2026 enforcement evidence): roughly 2–4× human-baseline output, not 40–100×. A team that goes from 5 articles/week to 200 articles/week (40×) after adopting AI tools is likely to be flagged; a team that goes from 5 to 15 (3×) generally is not.
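
The multiplier arithmetic above can be sketched as a small helper. This is a minimal illustration, not an official formula: the function names and the default ceiling of 4.0 are assumptions taken from the "roughly 2–4×" range stated in this section.

```python
def output_multiplier(baseline_per_week: float, current_per_week: float) -> float:
    """Ratio of current output to the pre-AI human baseline."""
    if baseline_per_week <= 0:
        raise ValueError("baseline must be positive")
    return current_per_week / baseline_per_week


def within_sustainable_range(multiplier: float, ceiling: float = 4.0) -> bool:
    """True when the multiplier is at or below the assumed ceiling (hypothetical default)."""
    return multiplier <= ceiling


# The two examples from the paragraph above.
print(output_multiplier(5, 200))   # 40.0
print(output_multiplier(5, 15))    # 3.0
print(within_sustainable_range(output_multiplier(5, 15)))  # True
```

The baseline should be the team's output before AI assistance, measured over a representative period, so that a single unusually quiet or busy week does not distort the ratio.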

3. Detection signals Google appears to use

Multi-signal system. The March 2026 update materially improved detection across all four categories simultaneously.

3.1 Content signals

3.2 Behavioral signals

3.3 Site-level patterns

3.4 E-E-A-T deficiency signals

4. Site patterns observed in March 2026 enforcement

These are the structural fingerprints that drew penalties of 50–80% organic traffic loss in the March 2026 update. Sites exhibiting several of these patterns compound their risk.

Pattern | Concern
Content published faster than human production speed | 10+ articles/day sustained for months without proportional staff.
No variation in content depth across pages | Eerily uniform word counts, structure, and depth across hundreds of pages.
Pages targeting keyword variants with substitution | “Best [PRODUCT] in [CITY]” templated across 100K near-identical pages.
No original media, data, or research | Stock images / AI images only; no proprietary data viz, no first-party reporting.
Author identities that cannot be verified externally | Bios with no LinkedIn / Twitter / external presence; likely AI personas.
Affiliate review with no first-hand product experience | Content identical to manufacturer specs; no testing-pattern signals.
Location-based service pages from templates | Hundreds of near-identical pages differing only in city name.
News aggregation with AI-rewritten articles | No original reporting, no journalists on staff; content adds no value beyond the sources it rewrites.
Educational content farms with AI-generated explanations | Generic explanations of well-covered topics, no expert authorship.
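
Two of the fingerprints in the table (uniform depth, templated substitution) lend themselves to a quick internal self-check. The sketch below is an illustration only: it is not Google's detection method, the function names are hypothetical, and any threshold applied to the results would be an editorial assumption.

```python
from statistics import mean, pstdev


def word_count_cv(pages: list[str]) -> float:
    """Coefficient of variation (stdev / mean) of word counts.

    Values near 0 mean the pages are nearly identical in length,
    which matches the "eerily uniform" pattern in the table.
    """
    counts = [len(p.split()) for p in pages]
    if not counts:
        return 0.0
    avg = mean(counts)
    return pstdev(counts) / avg if avg else 0.0


def is_templated_pair(a: str, b: str, slot_a: str, slot_b: str) -> bool:
    """True when two pages are identical once the slot value (e.g. city name) is masked."""
    return a.replace(slot_a, "{SLOT}") == b.replace(slot_b, "{SLOT}")


print(is_templated_pair("Best pizza in Austin", "Best pizza in Dallas", "Austin", "Dallas"))  # True
```

An exact-match comparison after masking only catches pure substitution; variants with light rewording would need a similarity measure, which is outside this sketch.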

5. What survived (the positive pattern)

The March 2026 update did not penalize all AI-assisted content. Sites that survived shared common characteristics:

6. McClatchy / CSA application

How Google’s policy intersects with CSA-generated, McClatchy-syndicated content.

6.1 Defensible position

McClatchy’s CSA pipeline is not the policy target. Google’s enforcement targets sites whose content has no editorial intent and no human accountability. McClatchy’s CSA outputs:

The combination—HITL author layer + genuine structural/lexical differentiation + consistent bylines—is the defensible posture. Each piece of the combination is necessary; none is sufficient alone.

6.2 Triggers that would put McClatchy at risk

These are the patterns to never ship, regardless of efficiency pressure:

6.3 Patterns explicitly not in violation (per Google’s policy)

Google’s site reputation abuse documentation explicitly lists exclusions that cover much of McClatchy’s operating model:

6.4 Operating principles from this doc

7. Audit checklist—apply before approving any content decision

When auditing a proposed format / persona / recipe / automation / canonicalization decision, walk this list. Any fail halts the decision until resolved. Any warn is documented in the session log with the mitigation chosen.

# | Check | Pass / Warn / Fail trigger
1 | Does this multiply output beyond ~2–4× human baseline? | Pass: ≤4×. Warn: >4× up to 10×. Fail: >10× without proportional editorial expansion.
2 | Is there a named, credentialed author on every output? | Pass: yes, with verifiable external presence. Warn: yes, but bio thin. Fail: anonymous / generic-team attribution / AI-generated author.
3 | Is the variant editorially distinct in <10 seconds of inspection? | Pass: structural + lexical + surface diff is obvious. Warn: subtle but present. Fail: only city/name changed; templated substitution.
4 | Does this content carry first-hand experience, original data, or primary-source citations? | Pass: yes. Warn: synthesis of authoritative secondary sources. Fail: regenerated from training data without sourcing.
5 | Is HITL review applied? | Pass: every output reviewed by named editor. Warn: sample-based review. Fail: no human review.
6 | Does the page pass the “would any user be worse off if this didn’t exist?” test? | Pass: yes. Warn: marginal value-add. Fail: information available higher-quality elsewhere.
7 | Could the publication cadence withstand traffic-monitoring scrutiny? | Pass: cadence consistent with team size. Warn: AI-assisted increase but proportional. Fail: 50–500× legacy baseline.
8 | Is the local angle / persona / format change adding genuine reader value? | Pass: yes. Warn: editorial judgment call. Fail: added because “we think we should” with no measurable user benefit.
9 | For canonicalization decisions: does the canonical choice consolidate authority on a strongest-performer outlet (not just first publisher)? | Pass: per p25-canonical-authority rule. Warn: defaulting to first publisher without performance evidence. Fail: leaving variants uncanonicalized when authority should pool.

If a decision fails check 1, 2, 3, 5, or 7, it stops there. Those are the policy-violation triggers. If a decision fails check 4, 6, 8, or 9, it stops on quality grounds, not policy grounds.
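
The halt logic above can be sketched as a small gate function. This is a minimal sketch under stated assumptions: the names (Result, score_multiplier, gate) and the verdict strings are hypothetical, and score_multiplier treats anything above 4× up to 10× as a warn and ignores the "without proportional editorial expansion" qualifier on check 1, which still needs human judgment.

```python
from enum import Enum


class Result(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


POLICY_CHECKS = {1, 2, 3, 5, 7}   # failing any of these is a policy-violation trigger
QUALITY_CHECKS = {4, 6, 8, 9}     # failing any of these halts on quality grounds


def score_multiplier(multiplier: float) -> Result:
    """Check 1 thresholds (assumed boundaries: <=4 pass, <=10 warn, else fail)."""
    if multiplier <= 4:
        return Result.PASS
    if multiplier <= 10:
        return Result.WARN
    return Result.FAIL


def gate(results: dict[int, Result]) -> dict:
    """Combine the nine check results into one verdict."""
    missing = (POLICY_CHECKS | QUALITY_CHECKS) - set(results)
    if missing:
        raise ValueError(f"unscored checks: {sorted(missing)}")
    fails = sorted(n for n, r in results.items() if r is Result.FAIL)
    warns = sorted(n for n, r in results.items() if r is Result.WARN)
    if any(n in POLICY_CHECKS for n in fails):
        verdict = "halt: policy"
    elif fails:
        verdict = "halt: quality"
    elif warns:
        verdict = "warn"       # proceed only with mitigation logged
    else:
        verdict = "pass"
    return {"verdict": verdict, "fails": fails, "warns": warns}


results = {n: Result.PASS for n in range(1, 10)}
results[1] = score_multiplier(3.0)
results[6] = Result.FAIL
print(gate(results)["verdict"])  # halt: quality
```

Requiring all nine checks to be scored before a verdict is returned is a design choice in this sketch; it prevents a skipped check from being read as a pass.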

8. Recovery framework (in the event of a manual action or core-update penalty)

Used only if a McClatchy property is hit. Documented for completeness; not the primary mode.

Phase 1—Stop the bleeding (Weeks 1–2)

Phase 2—Remediation (Weeks 3–12)

Phase 3—Rebuild authority (Months 3–6)

Realistic timeline

Cosmetic improvements (publish-date refresh, sentence reordering) consistently fail to produce a recovery. Recovery requires genuine, verifiable quality improvement that a human quality rater would evaluate as meaningfully better than the prior version.

9. Anticipatory policy—forward-looking guards

The March 2026 update will not be the last. Google has consistently escalated enforcement over consecutive updates since 2024, and the pattern strongly suggests continued tightening. Build deliberately around what will continue to be load-bearing:

  1. Pace AI-assisted output to a 2–4× human-baseline multiplier, not the technical limit of the tool. This is the single most reliably protective rule.
  2. Treat author signal as load-bearing infrastructure. Every CSA byline must trace to a verifiable author with external presence. Author pages with credentials, external links, and consistent topic expertise compound over months.
  3. Invest in original assets that cannot be generated from training data. First-hand testing, proprietary data, original interviews, primary-source documents—these are the durable moats.
  4. Audit on a quarterly cadence. Run §7 against shipped output every quarter; document the hit rate. This is the dead-man’s-switch on slow drift.
  5. Read every new Google quality update against this doc. When Google announces a policy change, walk this doc, identify which sections need updating, and version the doc accordingly. Update propagates across all 6 repos.
  6. The single distinguishing question over time: if this content is later evaluated by a human quality rater scoring against the latest E-E-A-T standard, would they evaluate it as the work of a credentialed expert who would stand behind it? If no, do not ship.
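
The quarterly audit in item 4 can be tracked with a simple tally. The sketch below assumes "hit rate" means the share of sampled items with at least one failed §7 check; that reading, the function name, and the input shape are assumptions, not something this doc defines.

```python
from collections import Counter


def quarterly_summary(audits: list[dict[int, str]]) -> dict:
    """Summarize one quarter of §7 audits.

    audits: one dict per sampled item, mapping check number to
    'pass', 'warn', or 'fail' (hypothetical input shape).
    """
    if not audits:
        raise ValueError("no audited items")
    fails_by_check = Counter(
        check for audit in audits for check, result in audit.items() if result == "fail"
    )
    items_with_fail = sum(1 for audit in audits if "fail" in audit.values())
    return {
        "items_audited": len(audits),
        "fail_rate": items_with_fail / len(audits),
        "fails_by_check": dict(sorted(fails_by_check.items())),
    }


sample = [
    {1: "pass", 2: "fail"},
    {1: "pass", 2: "pass"},
    {1: "fail", 2: "fail"},
    {1: "pass", 2: "warn"},
]
print(quarterly_summary(sample)["fail_rate"])  # 0.5
```

Comparing fails_by_check across quarters shows which check is drifting, which is more actionable than the aggregate rate alone.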

10. References

Source | Topic
Google Search Central, Spam Policies (last updated 2026-04-13 UTC) | Canonical policy reference: scaled content abuse, site reputation abuse, expired domain abuse, all sibling categories.
Google “The Keyword” blog, March 5 2024 | Original announcement of scaled content abuse + site reputation abuse + expired domain abuse policies. April 26 2024 update reported 45% reduction in low-quality unoriginal content (vs 40% expected).
March 2026 core update (DigitalApplied analysis) | Update started March 5; rolled out over 18 days. Targeted thin AI content; 50–80% traffic drops on affected sites. Recovery 3–6 months minimum.
March 2026 site-pattern analysis (DigitalApplied) | Detection signals, structural fingerprints, recovery framework.
ops-hub CONTEXT.md Strategic Frameworks | “CSA public-exposure posture (2026-04-21)”: internal framing of this same policy posture.
ops-hub data/projects.js p23-differentiation | The “50% different” claim retraction (csa-content-standards v1.8.2, 2026-04-24) and 4-dim similarity framework that operationalizes Google’s “primarily designed to manipulate” against measurable axes.
ops-hub data/projects.js p25-canonical-authority | Canonicalization rules referenced in §7 check 9.
csa-content-standards | Positive editorial direction (what to write). This doc is the negative space (what not to risk).

Reference doc. Updated 2026-04-27. Replicated identically across all 6 repos. Edit canonical copy in ops-hub; propagate.