Scaled Content Policy—Reference & Audit Gate
Purpose. Authoritative synthesis of Google’s scaled-content-abuse policy + March 2024 / March 2026 enforcement learnings, scoped to McClatchy National Content & CSA. Used for two purposes:
- Audit gate. Every content decision authored or reviewed in any repo must be checkable against §6 (McClatchy / CSA Application) and §7 (Audit Checklist). When evidence of a content decision shows up in a session, audit it against this doc before approving, shipping, or recommending it.
- Impromptu evals. Pierce can invoke this doc by name (“audit against scaled content policy”) for any decision—proposed format, persona, syndication call, automation, recipe, canonicalization choice—and get a structured pass/warn/fail answer.
Scope. This doc replaces the ad-hoc “we’re not Google’s target because of HITL” reasoning that has shown up across several past sessions with a structured framework. It is policy reference, not editorial guidance. It does NOT supersede csa-content-standards (which owns positive editorial direction). The two work together: csa-content-standards says what to write; this doc says what NOT to risk.
Replication. Identical copy lives in all 6 repos: ops-hub, csa-dashboard, data-headlines, gary-tools, csa-content-standards, data-keywords. ops-hub is the canonical edit surface; updates propagate. Source last refreshed: 2026-04-27.
1. Google’s operative definition
From Google’s spam policies (last updated 2026-04-13 UTC):
Scaled content abuse is when many pages are generated for the primary purpose of manipulating search rankings and not helping users. This abusive practice is typically focused on creating large amounts of unoriginal content that provides little to no value to users, no matter how it’s created.
Examples Google calls out specifically:
- Using generative AI tools or other similar tools to generate many pages without adding value for users
- Scraping feeds, search results, or other content to generate many pages (including through automated transformations like synonymizing, translating, or other obfuscation techniques), where little value is provided to users
- Stitching or combining content from different web pages without adding value
- Creating multiple sites with the intent of hiding the scaled nature of the content
- Creating many pages where the content makes little or no sense to a reader but contains search keywords
The operative phrase is “for the primary purpose of manipulating search rankings” (shorthand in this doc: “primarily designed to manipulate”). The policy targets intent + outcome (manipulating rankings, providing no value), not production method. AI-generated content is not inherently in violation; thin content at scale is, regardless of how it was produced. Hand-written content farms are equally targeted.
2. The threshold distinction—volume × value, not volume alone
| Pattern | Status |
|---|---|
| Many pages, each with proportional value | Not in violation. Sites with thousands of high-quality pages are not penalized. |
| Many pages, value-per-page negligible | In violation. Pages exist to capture keyword rankings, not serve user intent. |
| Few pages, thin | Quality issue; not specifically scaled-content abuse. |
| Few pages, dense | Best case. |
The correct mental test: “If this page did not exist, would any user be worse off?” If the answer is no—because the information is readily available elsewhere at higher quality—the page is a removal/consolidation candidate, not an improvement candidate.
Sustainable AI-assist multiplier (per March 2026 enforcement evidence): roughly 2–4× human-baseline output, not 40–100×. A team that goes from 5 articles/week to 200 articles/week (40×) after adopting AI tools is likely to be flagged; a team that goes from 5 to 15 (3×) generally is not.
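The multiplier is plain arithmetic: current output divided by the pre-AI human baseline. A minimal sketch follows; the function names are hypothetical, and the 4× ceiling is simply the top of the 2–4× range above, not a threshold Google has published.

```python
def output_multiplier(baseline_per_week: float, current_per_week: float) -> float:
    """Return current output as a multiple of the pre-AI human baseline."""
    if baseline_per_week <= 0:
        raise ValueError("baseline_per_week must be positive")
    return current_per_week / baseline_per_week


def within_sustainable_range(multiplier: float, ceiling: float = 4.0) -> bool:
    """True when the multiplier is at or below the assumed 4x ceiling."""
    return multiplier <= ceiling


# The two cases from the paragraph above.
print(output_multiplier(5, 15))    # 3.0
print(output_multiplier(5, 200))   # 40.0
```

The baseline should be the team's output before AI tooling, measured over the same period as the current figure, or the ratio is not comparable.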
3. Detection signals Google appears to use
Multi-signal system. The March 2026 update materially improved detection across all four categories simultaneously.
3.1 Content signals
- High semantic similarity across multiple pages on the same domain.
- Content that describes experiences without specificity or unique detail.
- Lack of citations to primary sources or original data.
- Statistical writing patterns associated with LLM output.
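Google does not publish how it measures similarity, so the first signal can only be approximated internally. The sketch below is one such self-check under stated assumptions: it uses bag-of-words cosine similarity (a lexical proxy, not true semantic similarity), and the function names and the 0.9 default threshold are illustrative choices, not values from Google or from McClatchy tooling.

```python
import math
import re
from collections import Counter
from itertools import combinations


def term_counts(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 to 1.0)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def similar_pairs(pages: dict[str, str], threshold: float = 0.9) -> list[tuple[str, str, float]]:
    """Return (url_a, url_b, score) for every page pair at or above the threshold."""
    vectors = {url: term_counts(body) for url, body in pages.items()}
    pairs = []
    for u, v in combinations(sorted(vectors), 2):
        score = cosine_similarity(vectors[u], vectors[v])
        if score >= threshold:
            pairs.append((u, v, round(score, 3)))
    return pairs
```

Pairs returned by `similar_pairs` are review candidates only; a high score says two pages share vocabulary, not that either page lacks value.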
3.2 Behavioral signals
- High bounce rate indicating content did not satisfy user intent.
- Low time-on-page for long-form content relative to reading time.
- Users returning to search results immediately after visiting.
- Low or declining CTR despite top-10 rankings.
3.3 Site-level patterns
- Anomalously rapid content publication velocity relative to site age + staffing.
- No author pages, credentials, or verifiable identities.
- About pages and author bios that are themselves AI-generated.
- External link profiles pointing almost entirely to AI content farms.
3.4 E-E-A-T deficiency signals
- No evidence of first-hand experience with topics covered.
- Authors with no verifiable expertise in the subject domain.
- No mentions, citations, or links from authoritative sources.
- Trust signals (contact info, editorial policy) missing or generic.
4. Site patterns observed in March 2026 enforcement
These are the structural fingerprints that drew penalties of 50–80% organic traffic loss in the March 2026 update. Sites exhibiting multiple patterns compound their risk.
| Pattern | Concern |
|---|---|
| Content published faster than human production speed | 10+ articles/day sustained for months without proportional staff. |
| No variation in content depth across pages | Eerily uniform word counts, structure, and depth across hundreds of pages. |
| Pages targeting keyword variants with substitution | “Best [PRODUCT] in [CITY]” templated across 100K near-identical pages. |
| No original media, data, or research | Stock images / AI images only; no proprietary data viz, no first-party reporting. |
| Author identities that cannot be verified externally | Bios with no LinkedIn / Twitter / external presence; likely AI personas. |
| Affiliate review with no first-hand product experience | Content identical to manufacturer specs; no testing-pattern signals. |
| Location-based service pages from templates | Hundreds of near-identical pages differing only in city name. |
| News aggregation with AI-rewritten articles | No original reporting, no journalists on staff, content adds no value beyond the sources it rewrites. |
| Educational content farms with AI-generated explanations | Generic explanations of well-covered topics, no expert authorship. |
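The “no variation in content depth” fingerprint can be roughly quantified on our own output. A minimal sketch, assuming word count is an acceptable proxy for depth: it computes the coefficient of variation (population standard deviation divided by mean) of page word counts. The function name is hypothetical, and no cutoff is implied; what counts as “eerily uniform” is a judgment call.

```python
import statistics


def word_count_variation(word_counts: list[int]) -> float:
    """Coefficient of variation (population stdev / mean) of page word counts."""
    if len(word_counts) < 2:
        raise ValueError("need at least two pages to measure variation")
    mean = statistics.fmean(word_counts)
    if mean == 0:
        raise ValueError("mean word count is zero")
    return statistics.pstdev(word_counts) / mean


# Hypothetical samples: a templated batch versus a mixed editorial set.
uniform_batch = [800, 805, 798, 802]
mixed_set = [350, 1200, 2400, 640]
print(word_count_variation(uniform_batch) < word_count_variation(mixed_set))  # True
```

A value near zero across hundreds of pages is the pattern described in the table; a mixed editorial set will normally show much wider spread.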
5. What survived (the positive pattern)
The March 2026 update did not penalize all AI-assisted content. Sites that survived shared common characteristics:
- Expert-led AI drafting. Subject-matter experts provide outlines, key facts, and original insights. AI drafts the structure. Experts review, rewrite, and add specific experiential details. Published under expert bylines with verifiable credentials.
- Original research with AI analysis. Teams conduct original surveys, compile proprietary data, or perform first-hand testing. AI helps analyze data and structure findings. The underlying research cannot be replicated from training data.
- AI-assisted content refresh. Using AI to update existing high-quality content with new information, statistics, examples. Maintains original human expertise while improving currency. Quality check by the original author before publishing.
- Selective AI use for non-E-E-A-T-sensitive content. AI used freely for content types where experience and expertise signals matter less: glossary definitions, procedural documentation, technical specifications, FAQ answers based on verified facts.
6. McClatchy / CSA application
How Google’s policy intersects with CSA-generated, McClatchy-syndicated content.
6.1 Defensible position
McClatchy’s CSA pipeline is not the policy target. Google’s enforcement targets sites whose content has no editorial intent and no human accountability. McClatchy’s CSA outputs:
- Pass through human-in-the-loop editorial review (HITL is mandatory).
- Land under named, credentialed bylines on T1 properties with established author pages.
- Cite first-party reporting + primary sources where applicable.
- Link to authoritative external sources when synthesizing.
- Produce structurally + lexically differentiated variants per persona / format / market (semantic similarity is high by design and not Google’s trigger).
- Reside on domains with first-party history, original journalism, and verifiable institutional backing.
The combination—HITL author layer + genuine structural/lexical differentiation + consistent bylines—is the defensible posture. Each piece of the combination is necessary; none is sufficient alone.
6.2 Triggers that would put McClatchy at risk
These are the patterns to never ship, regardless of efficiency pressure:
- Same article, same structure, only city name changed (templated regional substitution).
- No byline / no author signal on a CSA variant.
- Thin regional differentiation detectable in <10 seconds (the editorial fingerprint test).
- Mass publication of CSA output without HITL review.
- Bylines on AI-generated articles for authors who cannot verify they wrote / approved them.
- Author bios that are themselves AI-generated.
- Sustained publication velocity of 50–500× human baseline (the AI-batch-publishing fingerprint).
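The first trigger (same article, only the city name changed) lends itself to a mechanical pre-check before the 10-second editorial test. A minimal sketch, assuming a known list of market names: mask every market name with a placeholder, then compare what is left with `difflib.SequenceMatcher`. The function names and the 0.95 threshold are illustrative assumptions, not part of any existing CSA tooling.

```python
import re
from difflib import SequenceMatcher


def mask_market(text: str, market_names: list[str]) -> str:
    """Replace every market name with a placeholder so only other differences remain."""
    if not market_names:
        return text
    # Longest names first so "Kansas City" is masked before a shorter overlapping name.
    ordered = sorted(market_names, key=len, reverse=True)
    pattern = "|".join(re.escape(name) for name in ordered)
    return re.sub(pattern, "{MARKET}", text, flags=re.IGNORECASE)


def is_templated_substitution(a: str, b: str, market_names: list[str],
                              threshold: float = 0.95) -> bool:
    """True when two variants are near-identical once market names are masked."""
    masked_a = mask_market(a, market_names)
    masked_b = mask_market(b, market_names)
    return SequenceMatcher(None, masked_a, masked_b).ratio() >= threshold
```

A `True` result means the variants differ in little beyond the market name, which is the pattern this section says never to ship. A `False` result does not clear the variant; it still needs the human fingerprint test.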
6.3 Patterns explicitly not in violation (per Google’s policy)
Google’s site reputation abuse documentation explicitly lists exclusions that cover much of McClatchy’s operating model:
- Wire service and press release service sites. McClatchy republication of AP / Reuters wire content is not site reputation abuse.
- News publications that have syndicated news content from other news publications. Sister-paper syndication (Miami Herald → Kansas City Star) is not in violation.
- Columns, opinion pieces, articles, and other work of an editorial nature. Editorial CSA output is editorial work, not third-party hosted content.
- Sites designed to allow user-generated content (forums, comments)—not McClatchy-relevant but worth noting for completeness.
6.4 Operating principles from this doc
- Quality gates, not volume goals. Replace “articles published per month” KPIs with “content that achieves engagement thresholds, earns citations, or generates qualified traffic.” (Volume is a follow-on metric of quality, not a primary target.)
- Depth over breadth. One authoritative, well-researched guide on a topic is worth more than 20 thin pages covering adjacent sub-topics. Build for depth on core expertise areas.
- Original assets as competitive moats. Original research, proprietary data, unique case studies, first-hand testing results cannot be generated by AI—they are the most durable content investment.
- Expert content + AI execution. AI provides efficiency, not substance. Subject-matter experts remain essential; AI tools make experts more productive.
- Local angle gate (per p23-differentiation v0.2): “Does a local angle add genuine reader value here—or are we adding it because we think we should?” Latter → drop. Over-localization without value trips the templated-substitution detector.
7. Audit checklist—apply before approving any content decision
When auditing a proposed format / persona / recipe / automation / canonicalization decision, walk this list. Any fail halts the decision until resolved. Any warn is documented in session log with the mitigation chosen.
| # | Check | Pass / Warn / Fail trigger |
|---|---|---|
| 1 | Does this multiply output beyond ~2–4× human baseline? | Pass: ≤4×. Warn: >4× and ≤10×. Fail: >10× without proportional editorial expansion. |
| 2 | Is there a named, credentialed author on every output? | Pass: yes, with verifiable external presence. Warn: yes, but bio thin. Fail: anonymous / generic-team attribution / AI-generated author. |
| 3 | Is the variant editorially distinct in <10 seconds of inspection? | Pass: structural + lexical + surface diff is obvious. Warn: subtle but present. Fail: only city/name changed; templated substitution. |
| 4 | Does this content carry first-hand experience, original data, or primary-source citations? | Pass: yes. Warn: synthesis of authoritative secondary sources. Fail: regenerated from training data without sourcing. |
| 5 | Is HITL review applied? | Pass: every output reviewed by named editor. Warn: sample-based review. Fail: no human review. |
| 6 | Does the page pass the “would any user be worse off if this didn’t exist?” test? | Pass: yes. Warn: marginal value-add. Fail: information available higher-quality elsewhere. |
| 7 | Could the publication cadence withstand traffic-monitoring scrutiny? | Pass: cadence consistent with team size. Warn: AI-assisted increase but proportional. Fail: 50–500× legacy baseline. |
| 8 | Is the local angle / persona / format change adding genuine reader value? | Pass: yes. Warn: editorial judgment call. Fail: added because “we think we should” with no measurable user benefit. |
| 9 | For canonicalization decisions: does the canonical choice consolidate authority on a strongest-performer outlet (not just first publisher)? | Pass: per p25-canonical-authority rule. Warn: defaulting to first publisher without performance evidence. Fail: leaving variants uncanonicalized when authority should pool. |
If a decision fails check 1, 2, 3, 5, or 7, it stops there. Those are the policy-violation triggers.
If a decision fails check 4, 6, 8, or 9, it stops on quality grounds, not policy grounds.
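The halt rules above can be expressed as a small roll-up over the nine check results. A minimal sketch, not an existing tool: the names `Result` and `evaluate_audit` are hypothetical, and it assumes check 9 may be omitted when the decision is not a canonicalization decision.

```python
from enum import Enum


class Result(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


POLICY_CHECKS = {1, 2, 3, 5, 7}   # a fail here is a policy-violation trigger
QUALITY_CHECKS = {4, 6, 8, 9}     # a fail here stops the decision on quality grounds
REQUIRED = (POLICY_CHECKS | QUALITY_CHECKS) - {9}  # assumption: 9 is optional


def evaluate_audit(results: dict[int, Result]) -> dict:
    """Roll up per-check results into a verdict plus the checks behind it."""
    missing = REQUIRED - set(results)
    if missing:
        raise ValueError(f"unanswered checks: {sorted(missing)}")
    policy_fails = sorted(c for c in POLICY_CHECKS if results.get(c) is Result.FAIL)
    quality_fails = sorted(c for c in QUALITY_CHECKS if results.get(c) is Result.FAIL)
    warns = sorted(c for c, r in results.items() if r is Result.WARN)
    if policy_fails or quality_fails:
        verdict = Result.FAIL   # any fail halts the decision until resolved
    elif warns:
        verdict = Result.WARN   # each warn is logged with its chosen mitigation
    else:
        verdict = Result.PASS
    return {
        "verdict": verdict,
        "policy_fails": policy_fails,
        "quality_fails": quality_fails,
        "warns_to_log": warns,
    }
```

Keeping policy fails and quality fails in separate lists preserves the distinction the two sentences above draw, so the session log can record why a decision stopped and not only that it stopped.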
8. Recovery framework (in the event of a manual action or core-update penalty)
Used only if a McClatchy property is hit. Documented for completeness; not the primary mode.
Phase 1—Stop the bleeding (Weeks 1–2)
- Pause all AI content batch publishing immediately on the affected property.
- Check Google Search Console for manual action notifications.
- Identify the scope of penalized content via traffic segmentation.
- Set a content quality standard that all future content must meet before publishing.
- Remove or noindex the worst thin content (below quality threshold).
- Consolidate related thin pages into comprehensive guides; 301-redirect the merged URLs to the surviving page.
- Improve top-traffic penalized pages with expert review, original additions, and verifiable sourcing.
- Add verified author profiles + credentials to surviving content.
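When consolidation happens in several rounds, a merged URL can end up redirecting to a page that is itself later merged. The sketch below is one way to keep every 301 pointing straight at the surviving page. It is illustrative only: the function name is hypothetical, and avoiding redirect chains is a general practice assumed here, not a step stated in this framework.

```python
def flatten_redirects(redirects: dict[str, str]) -> dict[str, str]:
    """Resolve each merged URL to its final surviving page so no chains remain."""
    flat = {}
    for source in redirects:
        target = redirects[source]
        seen = {source}
        # Follow the chain until the target is no longer itself redirected.
        while target in redirects:
            if target in seen:
                raise ValueError(f"redirect loop involving {target}")
            seen.add(target)
            target = redirects[target]
        flat[source] = target
    return flat


# Hypothetical paths: /a was merged into /b, and /b was later merged into /guide.
print(flatten_redirects({"/a": "/b", "/b": "/guide", "/c": "/guide"}))
# {'/a': '/guide', '/b': '/guide', '/c': '/guide'}
```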
Phase 3—Rebuild authority (Months 3–6)
- Publish at a sustainable cadence with consistent quality standards.
- Build original research and data assets that attract authoritative citations.
- Develop genuine E-E-A-T signals through expert authorship and external mentions.
- Submit reconsideration request only if a manual action was issued (core-update penalties are algorithmic and have no reconsideration process—recovery comes from fixing signals, not contesting outcome).
Realistic timeline
- Months 1–2: stabilization. Decline stops as noindex + consolidation take effect.
- Months 3–4: partial recovery on revised priority pages—individual pages may regain 30–50% of lost traffic.
- Months 5–6: meaningful recovery if quality improvements are substantive. Next core update typically re-evaluates.
Cosmetic improvements (publish-date refresh, sentence reordering) consistently fail to produce recovery. Recovery requires genuine, verifiable quality improvement that a human quality rater would evaluate as meaningfully better than the prior version.
9. Anticipatory policy—forward-looking guards
The March 2026 update will not be the last. Google has consistently escalated enforcement over consecutive updates since 2024, and the pattern strongly suggests continued tightening. Build deliberately around what will continue to be load-bearing:
- Pace AI-assisted output to a 2–4× human-baseline multiplier, not the technical limit of the tool. This is the single most reliably protective rule.
- Treat author signal as load-bearing infrastructure. Every CSA byline must trace to a verifiable author with external presence. Author pages with credentials, external links, and consistent topic expertise compound over months.
- Invest in original assets that cannot be generated from training data. First-hand testing, proprietary data, original interviews, primary-source documents—these are the durable moats.
- Audit cadence quarterly. Run §7 against shipped output every quarter; document the hit rate. This is the dead-man’s-switch on slow drift.
- Read every new Google quality update against this doc. When Google announces a policy change, walk this doc, identify which sections need updating, and version the doc accordingly. Update propagates across all 6 repos.
- The single distinguishing question over time: if this content is later evaluated by a human quality rater scoring against the latest E-E-A-T standard, would they evaluate it as the work of a credentialed expert who would stand behind it? If no, do not ship.
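For the quarterly audit, “hit rate” is not defined above. One reading is the share of audited items landing in each §7 verdict. A minimal sketch under that assumption; the function name and the verdict labels are hypothetical.

```python
from collections import Counter


def quarterly_hit_rate(verdicts: list[str]) -> dict[str, float]:
    """Share of audited items per verdict ("pass", "warn", "fail")."""
    if not verdicts:
        raise ValueError("no audited items this quarter")
    counts = Counter(verdicts)
    return {v: counts[v] / len(verdicts) for v in ("pass", "warn", "fail")}


# Hypothetical quarter: ten shipped items re-audited against the checklist.
print(quarterly_hit_rate(["pass"] * 8 + ["warn", "fail"]))
# {'pass': 0.8, 'warn': 0.1, 'fail': 0.1}
```

Tracking the same three shares every quarter is what makes slow drift visible: a warn share that creeps upward is the early signal, before any fail appears.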
10. References
| Source | Topic |
|---|---|
| Google Search Central, Spam Policies (last updated 2026-04-13 UTC) | Canonical policy reference: scaled content abuse, site reputation abuse, expired domain abuse, all sibling categories. |
| Google “The Keyword” blog, March 5 2024 | Original announcement of scaled content abuse + site reputation abuse + expired domain abuse policies. April 26 2024 update reported 45% reduction in low-quality unoriginal content (vs 40% expected). |
| March 2026 core update (DigitalApplied analysis) | Update started March 5; rolled out 18 days. Targeted thin AI content; 50–80% traffic drops on affected sites. Recovery 3–6 months minimum. |
| March 2026 site-pattern analysis (DigitalApplied) | Detection signals, structural fingerprints, recovery framework. |
| ops-hub CONTEXT.md Strategic Frameworks | “CSA public-exposure posture (2026-04-21)”: internal framing of this same policy posture. |
| ops-hub data/projects.js p23-differentiation | The “50% different” claim retraction (csa-content-standards v1.8.2, 2026-04-24) and 4-dim similarity framework that operationalizes Google’s “primarily designed to manipulate” against measurable axes. |
| ops-hub data/projects.js p25-canonical-authority | Canonicalization rules referenced in §7 check 9. |
| csa-content-standards | Positive editorial direction (what to write). This doc is the negative space (what not to risk). |
Reference doc. Updated 2026-04-27. Replicated identically across all 6 repos. Edit canonical copy in ops-hub; propagate.