<!-- AGENT-AUDIENCE: general-style -->

# Data Universe Labeling


> **Standing rule (exec/leadership, C&P Weekly 2026-04-20):** "If we can't tell the universe of the data, we don't validate it. The data is not valid to us."

Every data artifact—whether a report, a chart, a dashboard, a Slack message, a briefing document, a SQL query result, a spreadsheet column, or an AI-generated finding—must declare its **data universe** before it is used, shared, or acted upon. Data without a declared universe is not valid and must not be incorporated into decisioning, commissioning, or editorial workflow.

## Why this rule exists

The data intelligence team at McClatchy has historically sent performance data without universe labels, forcing consumers to chase down the source, scope, and caveats before the data is usable. This creates persistent ambiguity about what any given number actually represents. exec/leadership explicitly named this as the anti-pattern his team must not replicate.

Labeling the universe up front: (a) forces the producer to know what they built, (b) lets the consumer trust-but-verify in seconds rather than hours, (c) makes caveats legible so nobody is surprised, and (d) prevents the comparison of incompatible datasets.

## What counts as a data universe label

A minimally acceptable universe label answers five questions:

| Field | What it answers | Example |
|---|---|---|
| **Source system** | Where did the data come from? | Snowflake `MCC_PRESENTATION.CONTENT_SCALING_AGENT.TRACKER_ENRICHED` |
| **Scope / filter** | What is included vs. excluded? | 13 National-team brands per `national-portfolio.js`; excludes Life & Style, Mod Moms Club |
| **Window** | What time range does this cover? | Publication 2026-01-01 through 2026-04-19; traffic as of 2026-04-19 model run |
| **Caveats** | What should the consumer know before using this? | Pre-~August 2025 L&E Amplitude data is likely wrong; MSN not in Snowflake (comes from Tarrow XLSX) |
| **Run stamp** | When was this artifact produced? | 2026-04-20 09:17 CDT (Monday scheduled sync) |

If any of the five cannot be answered, the data is not ready for use.

## Required formats

### Machine-readable (frontmatter / JSON / YAML)

Any data file or artifact that has structured metadata (Jekyll frontmatter, a JSON envelope, a config block) must include a `data_universe` key:

```yaml
---
data_universe:
  source: "Snowflake: MCC_PRESENTATION.CONTENT_SCALING_AGENT.TRACKER_ENRICHED"
  scope: "13 publications per national-portfolio.js; excludes Life & Style, Mod Moms Club"
  window: "publication 2026-01-01 through 2026-04-19; traffic as of 2026-04-19"
  caveats:
    - "Pre-Aug-2025 L&E Amplitude data may be wrong (L&E not properly integrated before then)"
    - "MSN data not in Snowflake; comes from Tarrow XLSX only"
    - "Cluster aggregates (cluster_* columns) appear on parent rows only; child rows get empty strings"
  run_stamp: "2026-04-20T09:17:00-05:00"
---
```
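
A check along the following lines could gate artifacts on the five required fields. It is a minimal sketch: the function name `missing_universe_fields` is hypothetical, and treating `None`, empty strings, and empty lists as "cannot be answered" is an assumption rather than part of this standard.

```python
from typing import List

REQUIRED_FIELDS = ("source", "scope", "window", "caveats", "run_stamp")


def missing_universe_fields(metadata: dict) -> List[str]:
    """Return the required data_universe fields that are absent or empty."""
    universe = metadata.get("data_universe")
    if not isinstance(universe, dict):
        # No data_universe key at all: every field is missing.
        return list(REQUIRED_FIELDS)
    missing = []
    for field in REQUIRED_FIELDS:
        value = universe.get(field)
        # Assumption: None, "" and [] all count as unanswered.
        if value is None or value == "" or value == []:
            missing.append(field)
    return missing


example = {
    "data_universe": {
        "source": "Snowflake: MCC_PRESENTATION.CONTENT_SCALING_AGENT.TRACKER_ENRICHED",
        "scope": "13 publications per national-portfolio.js",
        "window": "publication 2026-01-01 through 2026-04-19",
        "caveats": ["MSN data not in Snowflake; comes from Tarrow XLSX only"],
        "run_stamp": "2026-04-20T09:17:00-05:00",
    }
}
print(missing_universe_fields(example))
```

An empty result means the label is complete; any non-empty result means the data is not ready for use.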

### Human-readable (prose header on reports / messages)

Any report, Slack message, email, or document that surfaces data must open with a prose universe paragraph, flagged visually so it can't be skipped:

> **Data universe:** Snowflake TRACKER_ENRICHED (twice daily Mon-Fri, 10:13 + 18:13 CDT rebuild) · 13 National brands per `national-portfolio.js` · publication 2026-01-01 onward · traffic as of 2026-04-19. Caveats: pre-Aug-2025 L&E Amplitude may be wrong; MSN not in Snowflake; cluster aggregates on parent rows only.

One sentence is fine when the universe is simple; use multiple lines when it isn't. The key discipline: the consumer should never have to ask "where did this come from?"
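
The prose header can be generated from the same fields as the machine-readable block, so the two never drift apart. The sketch below is illustrative only: `render_universe_header` is a hypothetical helper, and appending the run stamp to the header is an assumption (the example header above does not show one).

```python
def render_universe_header(universe: dict) -> str:
    """Build a one-line blockquote header from a data_universe mapping."""
    parts = [universe["source"], universe["scope"], universe["window"]]
    header = "> **Data universe:** " + " · ".join(parts) + "."
    caveats = universe.get("caveats") or []
    if caveats:
        header += " Caveats: " + "; ".join(caveats) + "."
    # Assumption: the run stamp is appended so all five fields are visible.
    header += f" Run stamp: {universe['run_stamp']}."
    return header


sample_universe = {
    "source": "Snowflake TRACKER_ENRICHED",
    "scope": "13 National brands per national-portfolio.js",
    "window": "publication 2026-01-01 onward",
    "caveats": ["MSN not in Snowflake", "cluster aggregates on parent rows only"],
    "run_stamp": "2026-04-20T09:17:00-05:00",
}
print(render_universe_header(sample_universe))
```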

### Agent prompts (CSA and any Pierce-built agents)

Any agent that consumes or produces data must:

1. **Refuse to incorporate data without a declared universe.** If data arrives without a universe label, the agent returns a request for the universe before proceeding.
2. **Emit the universe in its output** when producing data-driven findings. The agent's response must state which universe informed the finding.
3. **Distinguish universes when comparing.** If the agent is comparing two numbers, the agent must state both universes and flag if they are incomparable.
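
The three agent rules above can be sketched as an intake gate plus a comparison helper. Everything named here (`LabeledValue`, `MissingUniverseError`, `require_universe`, `compare`) is hypothetical, and the comparability test (matching source, scope, and window) is an assumption; real comparability may need a looser or stricter definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LabeledValue:
    value: float
    universe: Optional[dict]  # None means no universe was declared


class MissingUniverseError(ValueError):
    """Raised when data arrives without a declared universe."""


def require_universe(item: LabeledValue) -> LabeledValue:
    """Rule 1: refuse data that has no declared universe."""
    if not item.universe:
        raise MissingUniverseError("Declare the data universe before this value is used.")
    return item


def compare(a: LabeledValue, b: LabeledValue) -> dict:
    """Rules 2 and 3: emit both universes and flag incomparable pairs."""
    require_universe(a)
    require_universe(b)
    # Assumption: comparable only when source, scope and window all match.
    keys = ("source", "scope", "window")
    comparable = all(a.universe.get(k) == b.universe.get(k) for k in keys)
    return {
        "difference": a.value - b.value if comparable else None,
        "comparable": comparable,
        "universes": [a.universe, b.universe],
    }
```

Returning `None` for the difference when the universes differ keeps an incomparable number from leaking into a finding by accident.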

## Usage rules tied to data universes (new pattern, 2026-04-21)

Some data universes come with operating rules that must be applied by every consumer—not just source / scope / window / caveats, but actual decisioning rules authored by subject-matter experts (exec/leadership on strategy, data team on ad yield, etc.). These rules cannot just live in a meeting transcript—they need to travel with the data.

**The pattern:** encode operating rules in four places, each tuned to a different consumer:

1. **In the column's provenance row** in `SNOWFLAKE.md` §7—the `preprocessing` and `validation` cells spell out the rules. Anyone reading column docs sees them.
2. **In this canonical standard** (see per-universe "Usage rules" blocks below)—agents and skills reading governance see them at the universe level.
3. **In SQL comments inside the CTE** of the model that computes the column—anyone touching the code sees them inline.
4. **As a sibling caveats column shipped with every row** (e.g., `OUTLET_ECPM_CAVEATS` text column next to `OUTLET_ECPM_PROGRAMMATIC`)—downstream consumers (reports, agents, dashboards) physically cannot use the number without the rules attached.

The fourth tier is what guarantees the rules survive any path through the data. If someone copies the number into a slide, runs a query that strips metadata, or feeds it to an agent that doesn't read governance docs—the rules come along for the ride because they're literally in the row.
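
As a rough illustration of the fourth tier, the sketch below copies a caveats string onto every row next to the value it governs. The column names come from the example above; the helper `attach_caveats`, the caveat wording, and the row values are placeholders for illustration, not real data or the pipeline's actual code.

```python
from typing import Dict, List

# Placeholder wording; the real text would be authored by the rule owners.
ECPM_CAVEATS = "Programmatic baseline only; page views primary, eCPM tiebreaker."


def attach_caveats(rows: List[Dict], value_column: str,
                   caveats_column: str, caveats: str) -> List[Dict]:
    """Return copies of rows with the caveats text beside the value column."""
    out = []
    for row in rows:
        if value_column not in row:
            raise KeyError(f"{value_column} missing from row")
        out.append({**row, caveats_column: caveats})
    return out


# Illustrative rows only; the numbers are made up.
rows = [
    {"STORY_ID": "example-1", "OUTLET_ECPM_PROGRAMMATIC": 1.0},
    {"STORY_ID": "example-2", "OUTLET_ECPM_PROGRAMMATIC": 2.0},
]
labeled = attach_caveats(rows, "OUTLET_ECPM_PROGRAMMATIC", "OUTLET_ECPM_CAVEATS", ECPM_CAVEATS)
```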

**When to use which tier:**

| If the rule is | Use tier(s) |
|---|---|
| A simple caveat ("this column is NULL for L&E articles") | Tier 1 (provenance row) |
| A universe-wide caveat ("pre-Aug-2025 Amplitude data is wrong") | Tier 1 + 2 |
| A hard analytical rule ("page views primary, eCPM tiebreaker") | All 4 tiers |
| A multi-part operating framework (exec/leadership's decisioning hierarchy) | All 4 tiers + a link to the canonical governance doc |

---

## Known universes that carry mandatory caveats

These are the most common data universes in Pierce's / the team's work and the caveats that must accompany each:

### Snowflake `TRACKER_ENRICHED` (primary performance universe)

- **Source:** `MCC_PRESENTATION.CONTENT_SCALING_AGENT.TRACKER_ENRICHED`
- **Refreshed:** twice daily Mon-Fri, 10:13 + 18:13 CDT (`snowflake-tracker-sync.yml`); manual re-runs possible
- **Scope:** 13 National brands + L&E (Us Weekly, Woman's World only)
- **Caveats:**
  - Cluster aggregate columns (`cluster_*`) appear only on parent rows; children get empty strings
  - `DYN_CONTENT_API_LATEST` was dropped 2026-04-20 then recreated late April (table-ids changed; filter `ACCOUNT_USAGE.TABLES WHERE deleted IS NULL` to avoid the dropped predecessors). The current live instance is actively maintained (251K rows, last_altered daily) and was re-integrated into `TRACKER_ENRICHED` at v2.7 (2026-05-16), supplying `tags_iab` (the canonical IAB-tier signal) plus `tag_need`, `tag_sensitive`, `tags_other`, `tags_seo`.
  - MSN traffic is NOT represented—comes from Tarrow XLSX only
  - **Performance-signal columns are origin-PVs basis (post-cross-syndication-screen, ship date TBD).** `is_hit`, `article_vs_co_median`, `cluster_vs_co_median`, `author_hit_*` measure the article's home-publication performance only—cross-syndication fan-out is excluded. Reach + revenue columns (`total_pvs`, `article_programmatic_revenue_live`, `author_avg_pvs`) remain on total-PVs basis. Driven by exec/leadership 2026-05-06 ask: distribution picking already-strong stories for syndication "juices" apparent topic perf and must not feed the outcomes loop. Implementation: [`ops-hub/docs/cross-syndication-screen.md`](https://github.com/piercewilliams/ops-hub/blob/main/docs/cross-syndication-screen.md).

### Amplitude (via Snowflake or direct)

- **Sources** (verified 2026-05-27):
  - `MCC_PRESENTATION.AMPLITUDE.AMPLITUDE_EVENTS_PROD` — **canonical presentation-layer events table** (10.1B rows / 5.9 TB, continuous refresh). Use this for vendor + feed-source attribution work (the `individual_feed_source` property lives here).
  - `MCC_AMPLITUDE.AMPLITUDE.EVENTS_412949` — raw per-project export (54.3B rows / 22.2 TB; same upstream Amplitude org, different consumption surface). Source for `compute_article_engagement_signals.py`.
  - `MCC_AMPLITUDE.AMPLITUDE.EVENTS_412950` — paywall funnel events (3.8M rows).
  - `MCC_AMPLITUDE.AMPLITUDE.EVENTS_669032` — O&O events (1.36B rows, not used in pipeline).
- **STALE views — do NOT use:** `MCC_PRESENTATION.AMPLITUDE.AMPLITUDE_EVENTS_PROD_LAST_30DAYS` (not refreshed since 2025-06-04); `MCC_PRESENTATION.AMPLITUDE.AMPLITUDE_LIFESTYLE_AND_ENTERTAINMENT_EVENTS_PROD` (not refreshed since 2026-02-24). Both return old data silently. Use `AMPLITUDE_EVENTS_PROD` with explicit filters instead.
- **Caveats:**
  - **L&E brands were not properly integrated into Amplitude before ~August 2025.** Any L&E data from before that window is likely wrong. Historical analyses must label this caveat explicitly or exclude pre-Aug-2025 L&E.
  - **Placement-test freeze — no L&E content-placement tests until the L&E data pipe is repaired.** With L&E Amplitude integration broken (above) and the dedicated L&E view stale since 2026-02-24, a placement test on L&E cannot be measured, so its result is uninterpretable; exclude L&E from the placement-test matrix until the pipe is fixed and the read is reliable.
  - p-tagging bug (CUE vs. WordPress format mismatch) may still affect cross-platform event data. Reliability gate: PTECH-7730.
  - **`event_time` has future-dated garbage rows** (max seen: 2201-01-01). Always pair date filters with both lower AND upper bounds.
  - **`individual_feed_source` is the canonical property for vendor + feed-source attribution.** `content_credit` undercounts NYT / Tribune / Minute Media by 50-90% vs `individual_feed_source`; do not use `content_credit` as a vendor cross-check (audited 2026-05-27).
  - **Instrumentation blindspots:** these event types fire in Amplitude but carry ZERO `individual_feed_source` — `amp_article_view`, `app_eedition_article_view`, `app_eedition_replica_view`, `newsbreakapp_article_view`, `smartnewsapp_article_view`. Combined ~15M events / 10 days are invisible to feed-source-based analysis. Engineering-side gap.
  - Canonical filter for feed-source PV analysis: `event_type IN ('article_view', 'eedition_article_view', 'eedition_replica_view', 'app_article_view')`. Full reference in ops-hub `SNOWFLAKE.md` §19.
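
The `event_time` bounds rule and the canonical event-type filter can be combined in one small helper. This is a sketch under stated assumptions: `feed_source_where` is a hypothetical name, and the half-open interval (start inclusive, end exclusive) is a choice made here, not something the caveats prescribe.

```python
from datetime import date

FEED_SOURCE_EVENT_TYPES = (
    "article_view",
    "eedition_article_view",
    "eedition_replica_view",
    "app_article_view",
)


def feed_source_where(start: date, end: date) -> str:
    """Build a WHERE clause with both event_time bounds and the canonical event types."""
    if end <= start:
        raise ValueError("end must be after start")
    types = ", ".join(f"'{t}'" for t in FEED_SOURCE_EVENT_TYPES)
    # The upper bound keeps future-dated garbage rows out of the result.
    return (
        f"event_time >= '{start.isoformat()}' "
        f"AND event_time < '{end.isoformat()}' "
        f"AND event_type IN ({types})"
    )


print(feed_source_where(date(2026, 5, 17), date(2026, 5, 27)))
```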

### L&E brand page views appearing in content team lead's tracker

- **Source:** UNCONFIRMED as of 2026-04-21—likely Amplitude but needs verification
- **Status:** do not incorporate into downstream work until source is identified and labeled
- **Action:** see ops-hub P3 nextActions—identify source + universe-label column before using

### SEMrush (via API)

- **Source:** SEMrush API, Pierce's L&E allocation (250K credits/month; data team's total pool 2M/month)
- **Scope:** keyword metrics at seed level; each brief declares its seed set explicitly
- **Caveats:** verdicts (Go Hard / Test Small / Skip) are editorial judgments overlaid on SEMrush metrics—not SEMrush's own verdicts

### Tarrow XLSX (syndication platform-side)

- **Source:** Tarrow vendor export (XLSX); downloaded weekly via `data-headlines/download_tarrow.py`
- **Scope:** Apple News native, MSN, Yahoo, SmartNews platform-side views
- **Caveats:**
  - **Data is platform-side, NOT O&O click-throughs.** Do not commingle with Snowflake O&O traffic.
  - **Syndication platforms are LTV=0** per exec/leadership framing—pure PV increment, no subscriber conversion. Do not treat as equivalent to O&O PVs for decisioning.

### GA → Snowflake (legacy fallback)

- **Source:** Google Analytics, piped into Snowflake (availability varies by brand)
- **Caveats:**
  - Pre-current-integration era may contain infinite-scroll artifacts and other recording anomalies
  - Last ~18 months of GA data is "pretty good" per exec/leadership (2026-04-20) but any analysis must label that cutoff

### Story facts + IAB + extended PV channels (`DYN_STORY_FACTS_DETAIL_WITH_KPIS`)

- **Source:** `MCC_PRESENTATION.TABLEAU_REPORTING.DYN_STORY_FACTS_DETAIL_WITH_KPIS`—177K rows, keyed by `STORY_ID`
- **Contains:** IAB taxonomy (up to 5 levels), custom + MCC-defined keywords, story topic, section names, plus extended PV channels not in `STORY_TRAFFIC_MAIN`: paywall hits, app views, cross-site internal recirculation, external backlink views, eEdition (print-replica) views, direct subscription conversions
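- **Status:** Primary replacement for `DYN_CONTENT_API_LATEST` (on the IAB + keyword side) after that table was dropped 2026-04-20; the table has since been recreated (see the `TRACKER_ENRICHED` caveats above). Integrated into `TRACKER_ENRICHED` 2026-04-21.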
- **Caveats:** Classification quality depends on upstream editorial tagging; IAB array may be sparse for some articles.

### Cross-site syndication (`STORY_TRAFFIC_METRICS`)

- **Source:** `MCC_PRESENTATION.TABLEAU_REPORTING.STORY_TRAFFIC_METRICS`—205M rows, (URL × BIZ_UNIT × DATE) grain
- **Contains:** Every McClatchy newspaper site × every article × every date it was served there. Enables cross-site syndication aggregates per article.
- **Window:** 2023-03 to present (3 years of history—deeper than `STORY_TRAFFIC_MAIN`)
- **Caveats:**
  - Different universe from `STORY_TRAFFIC_MAIN` / `_LE`. Appears to be the GA-pumped-into-Snowflake source exec/leadership mentioned (2026-04-20). Do not mix with Amplitude-derived metrics without labeling.
  - Closer is absent from this table (no evidence of meaningful Closer syndication on McClatchy sites at scale).
  - Covers internal McClatchy newspaper syndication only (Sac Bee ↔ Miami Herald ↔ Kansas City Star, etc.). External / platform syndication (Field Level Media, MSN, Yahoo News) lives in the Marfeel-per-medium universe below; do not conflate.

### Cross-syndication distortion screen—Marfeel per-medium (`MARFEEL_ARTICLE_BY_MEDIUM`)

- **Source:** `MCC_PRESENTATION.CONTENT_SCALING_AGENT.MARFEEL_ARTICLE_BY_MEDIUM`—fed by Marfeel API ingest path #3 (data-engineer-built; commitment 2026-05-08, ship date TBD)
- **Contains:** One row per (article × medium × date). The `medium` field identifies the syndication target—origin domain, Field Level Media, MSN, Yahoo News, Apple News partner feeds, etc.
- **Window:** Trailing rolling window per Marfeel API limits (initial scope: lifetime per article)
- **Why it exists (full context):**
  - exec/leadership flagged on 2026-05-06 that distribution hand-picking already-strong stories for cross-syndication "juices" apparent topic performance. His canonical example was a Field-Level-Media-syndicated article spread across ~24 syndication targets whose `TOTAL_PVS` distorted any rollup that consumed it.
  - The distortion is **selection-on-success**: syndication is downstream of strong early performance on the origin publication, not a topic-strength signal. Operators reading inflated cluster medians + topic averages over-commission on juiced topics rather than topics with native strength, so the "outcomes loop" gets contaminated.
  - The screen separates `ORIGIN_PVS` (home-publication only) from `SYNDICATED_PVS` (everything else) and tags each article with `SYNDICATION_JUICE ∈ {none, light, heavy}`.
  - Performance-signal columns (`is_hit`, `article_vs_co_median`, `cluster_vs_co_median`, `author_hit_*`) switch to origin-PVs basis. Reach + revenue columns stay total-PVs basis (every PV is real revenue regardless of medium).
  - Operationalizes the cross-syndication data-bias caveat (Strategic Framework #16, 2026-05-05 TH Team Meeting). Closes the loop with exec/leadership via a spot-check showing his 2026-05-06 example article correctly flagged `heavy` post-feed-land.
- **Caveats:**
  - External / platform syndication only. McClatchy-internal syndication is in `STORY_TRAFFIC_METRICS` (above).
  - Articles outside Marfeel's universe (or feed lag) get NULL rows—the model_tracker join is fail-open: `ORIGIN_PVS` falls back to `TOTAL_PVS`, `SYNDICATION_JUICE` defaults to `none`. Pre-feed-land state is the same as no-screen state.
  - Juice tier thresholds (heavy ≥10 sites & ≥60% syndicated; light ≥3 & ≥30%) are v0; calibrate against actual distribution after 2-3 weeks of production data.
  - The screen is a methodology change, not a hide-juiced policy. Heavily-syndicated articles still appear in every dashboard + still earn revenue + still count toward authors' total reach. The change is what counts as *performance signal*, not what counts as reach.
- **Implementation:** [`ops-hub/docs/cross-syndication-screen.md`](https://github.com/piercewilliams/ops-hub/blob/main/docs/cross-syndication-screen.md) (master spec—full design, SQL diffs, ship sequence). [`data-headlines/dev-docs/cross-syndication-screen-ui.md`](https://github.com/piercewilliams/data-headlines/blob/main/dev-docs/cross-syndication-screen-ui.md) (operator-facing UI—chip + filter). [`data-keywords/dev-docs/cross-syndication-screen-impact.md`](https://github.com/piercewilliams/data-keywords/blob/main/dev-docs/cross-syndication-screen-impact.md) (downstream consumer pre-flight).
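
The v0 juice tiers and the fail-open fallback described in the caveats can be sketched as follows. This is not the model's SQL: the function names are hypothetical, and reading "% syndicated" as the syndicated share of total PVs (a fraction between 0 and 1) is an assumption.

```python
from typing import Optional


def syndication_juice(site_count: Optional[int],
                      syndicated_share: Optional[float]) -> str:
    """Classify an article as none / light / heavy using the v0 thresholds."""
    # Fail-open: a missing Marfeel row defaults to "none".
    if site_count is None or syndicated_share is None:
        return "none"
    if site_count >= 10 and syndicated_share >= 0.60:
        return "heavy"
    if site_count >= 3 and syndicated_share >= 0.30:
        return "light"
    return "none"


def origin_pvs(total_pvs: int, marfeel_origin_pvs: Optional[int]) -> int:
    """Fall back to total PVs when Marfeel has no row for the article."""
    return total_pvs if marfeel_origin_pvs is None else marfeel_origin_pvs
```

Note that an article on 12 sites with only 40% syndicated share lands in `light`, not `heavy`, because both conditions of a tier must hold.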

### Newsletter attribution (`NEWSLETTER_LINK_HEADLINES`)

- **Source:** `MCC_PRESENTATION.TABLEAU_REPORTING.NEWSLETTER_LINK_HEADLINES`—79K rows, per-campaign article attribution
- **Contains:** URL × newsletter campaign × click counts
- **Caveats:** Coverage concentrated in newsroom brands (Sac Bee, Miami Herald); sparse for Us Weekly / Woman's World.

### Revenue and ad yield (multiple universes)

- **PRIMARY SOURCE (pending access, flagged 2026-04-21):** `MCC_RAW.TEMP.BURT_INTELLIGENCE`—data team confirmed this is the canonical dataset for per-market programmatic eCPM. Daily refresh. Built by data team. Access grant pending from data team; `GROWTH_AND_STRATEGY_ROLE` does not currently have usage on `MCC_RAW.TEMP`.
- **Woman's World article-level revenue (accessible today):** `MCC_PRESENTATION.TABLEAU_REPORTING.WOMANSWORLD_PAGEPERFORMANCE`—daily refresh, per-URL EARNINGS + PAGE_RPM + CPM + VIEWABILITY. Woman's World only; ~51K rows.
- **Direct-sold metadata (Naviga):** `MCC_RAW.SIGMA.VIEW_NAVIGA_FLASH_DAILY_*`—sales campaign metadata, not article-level. Useful context, not direct enrichment.
- **Market-level forecasts (contextual):** `MCC_PRESENTATION.TABLEAU_REPORTING.KPI_DIGITAL_REVENUE_*` series—aggregated market-level revenue forecasts; not per-article.

**Usage rules (exec/leadership + data team, C&P Weekly 2026-04-20)—MUST be applied by every consumer of eCPM or revenue data:**

1. **Programmatic baseline only; exclude direct-sold "gravy".** The stable-state programmatic number is the safe decisioning baseline. Direct-sold is variable and shouldn't drive content decisions. When BURT access lands, use the programmatic-only slice, not the combined.
2. **Monthly volatility is ~35%.** Holidays run strong (Nov-Dec), January runs soft. Use stable-state baselines (roughly last-summer averages), NOT spot monthly values. For Kansas City, data team's canonical stable number is ~$130.
3. **Page views are primary; eCPM is tiebreaker.** Per data team: "$1 CPM × 5× page views beats $2 CPM × 1× page views." Never flip this hierarchy.
4. **Market authority trumps eCPM for search-driven content.** Tier 1 markets (Miami, KC, DFW) have stronger domain authority and outperform smaller-eCPM markets (Myrtle Beach, Bradenton) in search regardless of their eCPM. Prefer larger markets for search-dependent content.
5. **Category/section eCPM variance matters.** Certain sections (real estate, specific verticals) have direct-sold advertiser interest that makes them more valuable per PV than blended brand-level averages suggest. Surface high-eCPM-section signal when available.
6. **Sigma dashboards underestimate.** The live STAR-Automation Sigma workbook includes GAM programmatic display only. It excludes video and Taboola. The "complete programmatic" picture (what's in `BURT_INTELLIGENCE`) includes those.
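
Rule 3 is easiest to see with the arithmetic written out. The sketch below treats CPM as dollars per 1,000 page views, as the quoted rule does; the function names and the candidate dictionaries are illustrative assumptions, not pipeline code.

```python
from typing import Dict, List


def programmatic_revenue(page_views: int, cpm_dollars: float) -> float:
    """Revenue in dollars, with CPM taken as the price per 1,000 page views."""
    return page_views / 1000 * cpm_dollars


def rank_candidates(candidates: List[Dict]) -> List[Dict]:
    """Sort by page views first; eCPM only breaks ties."""
    return sorted(candidates, key=lambda c: (c["page_views"], c["ecpm"]), reverse=True)


# $1 CPM at 5x the page views versus $2 CPM at 1x (illustrative volumes).
low_cpm_high_pv = programmatic_revenue(5000, 1.0)   # 5.0
high_cpm_low_pv = programmatic_revenue(1000, 2.0)   # 2.0
```

With a 1,000-PV base, the $1 CPM option earns $5.00 against $2.00 for the $2 CPM option, which is why the hierarchy puts page views first.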

These rules are tier-4-encoded (per the usage-rules-tied-to-universes pattern above) as a literal `OUTLET_ECPM_CAVEATS` text column on every row of `TRACKER_ENRICHED`. Downstream consumers cannot use the eCPM number without receiving the rules.

### Content team lead's tracker (Google Sheet)

- **Source:** the content team lead's Google Sheet → `MCC_RAW.GROWTH_AND_STRATEGY.NATIONAL_CONTENT_TRACKER` (via `ingest_tracker.py`, twice daily Mon-Fri, 10:13 + 18:13 CDT)
- **Scope:** National team commissioned content only
- **Caveats:**
  - **exec/leadership reframe (2026-04-20):** this is a production operations doc, not purely an analytical tracker
  - The content team lead's team may enter data without close attention to detail, and the content team lead cleans it up manually. Integrate the data with the awareness that raw inputs are not always clean.
  - Going forward (per 2026-04-20): only the human-created cluster ID (hCID) is strictly needed from this sheet—every other column is derivable from Snowflake

## Snowflake is the validated boundary

A corollary to the labeling rule, made explicit 2026-04-21 after exec/leadership's original directive was extended to column-level provenance:

1. **`MCC_PRESENTATION.CONTENT_SCALING_AGENT.*` is the only sanctioned schema for production consumers.** Reports, agents, skills, dashboards, and talking points read from this schema (or artifacts derived from it)—not from raw upstream sources.

2. **Every column in every CSA-schema table must have documented provenance.** The canonical place for this is `ops-hub/SNOWFLAKE.md` §7 (TRACKER_ENRICHED) and §8 (TRACKER_WEEKLY). Each column must declare:
   - **Source**—which upstream table / feed / sheet the data ultimately originates from
   - **Collects / aggregates**—what the upstream source actually records, at what grain
   - **Preprocessing**—what the pipeline does to the raw value (dedup, canonicalize, filter, compute, fill)
   - **Validation**—how we know the value is valid (which safety gate guards it, which constraint it must satisfy)

   If any of the four cannot be stated, the column must be removed or quarantined until the gap is closed.

3. **External data cannot reach consumers directly.** Tarrow XLSX, Amplitude API pulls, Google Sheets, SEMrush endpoints, GA, and any other external source must first land in Snowflake via a vetted ingest pipeline. The path is always: raw source → intake/staging → vetted model routine → CSA-schema output table → consumer. Consumers do not bypass this.

4. **Intake vetting for new external sources.** Before a new source enters the pipeline, the following must be answered in writing (in `SNOWFLAKE.md`, `PIPELINE.md`, or the relevant ingest script's header comments):
   - What does the source collect? At what grain? With what latency?
   - What are its known quirks, gaps, or integration caveats?
   - What preprocessing protects downstream consumers from deceptive outputs?
   - How do we detect when the source silently changes or goes bad?

5. **Unknown-source data must be blocked, not used.** Data columns whose source is unverified (example: the L&E page view column in content team lead's tracker as of 2026-04-21) cannot be incorporated into reports or dashboards until the source is identified and labeled.
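
The column-level rule in item 2 (all four provenance fields stated, or the column is quarantined) lends itself to an automated check. The sketch below assumes provenance rows have already been parsed into dictionaries; the field keys, the helper name `columns_to_quarantine`, and the sample rows are hypothetical.

```python
from typing import Dict, List

PROVENANCE_FIELDS = ("source", "collects", "preprocessing", "validation")


def columns_to_quarantine(provenance: Dict[str, Dict]) -> List[str]:
    """Return column names whose provenance row lacks any of the four fields."""
    flagged = []
    for column, row in provenance.items():
        for field in PROVENANCE_FIELDS:
            value = row.get(field)
            # A field counts as stated only if it is a non-blank string.
            if not isinstance(value, str) or not value.strip():
                flagged.append(column)
                break
    return sorted(flagged)


# Placeholder rows; real provenance text lives in SNOWFLAKE.md.
docs = {
    "total_pvs": {"source": "a", "collects": "b", "preprocessing": "c", "validation": "d"},
    "mystery_col": {"source": "", "collects": "b", "preprocessing": "c", "validation": "d"},
}
print(columns_to_quarantine(docs))
```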

## Enforcement

1. **Agents and skills:** any agent or skill that produces a data-driven artifact must emit the data universe in its output. Agents should refuse data without declared universes at the intake boundary.
2. **Reports (docx, markdown, HTML):** must carry a universe header.
3. **Dashboards:** must surface the universe in a persistent location (site footer, header banner, or tooltip).
4. **Slack / email to stakeholders:** data-bearing messages open with the universe.
5. **Commits:** pipeline code that emits data artifacts must include the universe block in generation templates.
6. **Column-provenance documentation:** every new column added to a CSA-schema table must land with its full provenance row in `SNOWFLAKE.md` in the same commit as the code that creates it. No column ships without documentation.

## Anti-patterns

Do not:
- Send a chart or number without stating its source.
- Compare two numbers without confirming both universes are comparable.
- Use pre-Aug-2025 L&E Amplitude data without the integration-gap caveat.
- Treat Tarrow platform-side data as equivalent to O&O click-through data.
- Incorporate the content team lead's L&E PV column into analysis before its source is verified and labeled.
- Assume "Amplitude" is one universe—it's a set of event tables each with different coverage by brand and time.

## Related standards

- Editorial fact-checking lives in the [Claims Validation](/docs/claims-validation) standard (§9). Data universe labeling is about data *provenance*; claims validation is about claim *accuracy* in CSA output. Both are required, and they cover different scopes.
- Canonical Snowflake reference: [`SNOWFLAKE.md`](https://github.com/piercewilliams/ops-hub/blob/main/SNOWFLAKE.md) in ops-hub.
- Canonical pipeline reference: [`PIPELINE.md`](https://github.com/piercewilliams/ops-hub/blob/main/PIPELINE.md) in ops-hub.
- National team portfolio scope: [`national-portfolio.js`](https://github.com/piercewilliams/ops-hub/blob/main/data/national-portfolio.js) in ops-hub.