Methodology

Task-level AI exposure analysis, grounded in evidence

Runway tells you what AI is doing to the specific tasks in your job, with evidence on every claim. AI hits work task by task and dimension by dimension, not job title by job title. We map your role across 71 universal capability dimensions and keep three signals separate: capability, adoption, and personal exposure. They are never collapsed into one number.

Problem

Why existing tools fail

Wrong unit of analysis

Most AI impact assessments operate at the job-category level — "Marketing" or "Finance". But AI doesn't automate jobs. It automates tasks. A marketing strategist and a marketing coordinator share a title family but have fundamentally different task profiles and exposure. Category-level analysis is noise.

Capability and adoption conflated

The fact that AI can do something is not the same as companies actually deploying it. A task being technically automatable, that automation being adopted at scale, and workers being displaced are three separate signals at three different confidence levels. Most tools collapse them into a single score. That produces confident wrong answers.

Static snapshots

One-time reports are stale within weeks. AI capabilities shift constantly — what matters is not just where you stand, but how fast things are moving and in which direction. Without velocity tracking, assessments are already outdated by the time you read them.

Reassuring by default

Career tools are incentivised to tell you things are fine — it reduces churn. Runway is built on the opposite principle: you deserve the truth, even when it is uncomfortable. Every claim is sourced. Every uncertainty is surfaced.

Framework

Five layers of analysis

Every Runway assessment passes through five distinct analytical layers. Each addresses a failure mode that simpler tools ignore.

1. Task decomposition

Your role is broken into atomic tasks — the specific, discrete units of work that constitute what you actually do. Not "content creation" but the specific subtasks within it. The decomposition is anchored to empirical occupational data and calibrated by your input, then translated into per-dimension evidence against the 71-dimension registry. Two people with the same job title will get different capability profiles based on how they actually spend their time and the dimensions their evidence covers.

2. Capability evidence

For each task, we track what AI systems can demonstrably do today — at what quality level, with what evidence. Every capability claim carries a confidence level: Confirmed (multiple independent sources), Plausible (single credible source), or Early Signal (preliminary). Vendor marketing claims are weighted differently from peer-reviewed benchmarks. We track the evidence, not the hype.

3. Adoption tracking

Technical capability is not the same as real-world deployment. We separately track what percentage of companies are actually using AI for each task in practice — by industry and company size — and how fast that adoption rate is changing quarter over quarter. This is the signal most tools miss entirely.

4. Personalised exposure

Your Exposure Profile reflects your specific situation — not a generic average. Task importance, work environment, industry, regulatory context, existing tool adoption, and organisational factors all shape your individual profile. Two people in the same role at different companies can have meaningfully different Exposure Profiles.

5. Trajectory and velocity

A snapshot tells you where you are. Runway tells you where you are heading. We project how your exposure will change over 12 to 36 months based on capability advancement rates and adoption velocity for the specific tasks in your profile. The rate of change is often a more critical signal than the current score.
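
As a rough illustration of the scenario-based extrapolation described in layer 5, the sketch below projects an adoption rate forward from its current quarterly velocity. The linear model, the `SCENARIOS` multipliers, and both function names are assumptions made for this example; the text does not specify how Runway computes its projections.

```python
# Hypothetical scenario multipliers applied to the observed velocity.
SCENARIOS = {"slower": 0.5, "current": 1.0, "faster": 2.0}


def project(current_pct: float, velocity_pct_per_quarter: float, months: int) -> float:
    """Linearly extrapolate an adoption rate (in percent), clamped to 0-100."""
    quarters = months / 3
    projected = current_pct + velocity_pct_per_quarter * quarters
    return max(0.0, min(100.0, projected))


def scenario_band(current_pct: float, velocity_pct_per_quarter: float, months: int) -> dict:
    """Return one projected value per scenario instead of a single prediction."""
    return {
        name: project(current_pct, velocity_pct_per_quarter * multiplier, months)
        for name, multiplier in SCENARIOS.items()
    }


# Made-up inputs: 20% adoption today, rising 2 percentage points per quarter.
band_12m = scenario_band(20.0, 2.0, months=12)
band_36m = scenario_band(20.0, 2.0, months=36)
```

Returning a band per horizon, rather than one figure, keeps the output in line with the idea that projections show where current trajectories lead, not what will definitely happen.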

Signal Model

Three signals, never collapsed

Most tools give you a single “AI risk” number. That number conflates three fundamentally different questions. Runway keeps them separate — because they have different confidence levels and different implications for what you should do.

Capability

Can AI do this task?

What AI systems have demonstrated against this specific task, at what quality threshold relative to human performance, verified by independent evidence. This is the technical ceiling — not a prediction about your job.

Adoption

Are companies actually deploying it?

What percentage of organisations are running AI against this task in production — not what is technically possible, but what is actually happening. Tracked by industry and company size, with quarterly velocity.

Exposure

What does that mean for you?

Your personal exposure, shaped by how central each task is to your role, your specific work environment, and the defensibility factors unique to your situation. This is where generic analysis becomes personal intelligence.
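
One way to picture "never collapsed" is a record type that holds the three signals side by side, each with its own confidence level, and offers no combined score. The sketch below is illustrative only: the type names, field names, 0-to-1 scale, and example values are assumptions, not Runway's actual schema or data.

```python
from dataclasses import dataclass
from enum import Enum


class Confidence(Enum):
    CONFIRMED = "Confirmed"
    PLAUSIBLE = "Plausible"
    EARLY_SIGNAL = "Early Signal"


@dataclass(frozen=True)
class Signal:
    value: float            # assumed 0.0-1.0 scale
    confidence: Confidence  # each signal carries its own confidence level


@dataclass(frozen=True)
class TaskSignals:
    task: str
    capability: Signal  # can AI do this task?
    adoption: Signal    # are companies actually deploying it?
    exposure: Signal    # what does that mean for this person?
    # Deliberately no combined "risk score" field or method.


# Made-up values for a hypothetical task.
example = TaskSignals(
    task="Draft quarterly variance commentary",
    capability=Signal(0.8, Confidence.CONFIRMED),
    adoption=Signal(0.04, Confidence.EARLY_SIGNAL),
    exposure=Signal(0.3, Confidence.PLAUSIBLE),
)
```

In this example a well-evidenced capability sits next to a weakly evidenced adoption figure, which is exactly the distinction a single number would hide.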

The Brief

The artifact you actually come back for

The Brief is the primary user-facing output. Not a dashboard reading. Not a score change notification. A personalised digest of what changed in your capability profile, what evidence drove the change, what is stable and why, and what early signals we are watching.

Changed

Up to five tasks where signal has moved materially since your last Brief. Each item names the source, the confidence level, and — only when the evidence is Confirmed and the task is central to your role — a specific implication. No platitudes.

Stable

Up to three tasks that remain stable for a specific, evidenced reason. We say why. "Relationships are hard to automate" is not a reason. "Adoption rate in regulated finance has held at 4% for three quarters with no capability shift in the underlying task" is.

Watching

Up to three early signals we are tracking but have not yet confirmed. Explicitly marked as unverified. No implications attached. We tell you what we are watching so that when it lands, you already understand the context.

Exposure Profile

Risk range (not a point estimate), confidence grade, velocity direction, and projection caveats. The grade tells you how much to trust the read. The range tells you the honest band.

The Brief is generated when signals are material — not on a fixed weekly cadence. If nothing has changed enough to warrant a new read, we do not manufacture one. That is the bar for shipping a Brief: something real to say.
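
The rules above (item caps, gated implications, no Brief without material change) can be sketched as a small assembly function. Everything named here is hypothetical, including `BriefItem` and `build_brief`, and "material" is simplified to "at least one changed item", which is an assumption rather than the documented trigger.

```python
from dataclasses import dataclass, replace
from typing import Optional

MAX_CHANGED, MAX_STABLE, MAX_WATCHING = 5, 3, 3


@dataclass(frozen=True)
class BriefItem:
    task: str
    source: str
    confidence: str                    # "Confirmed", "Plausible" or "Early Signal"
    central_to_role: bool = False
    implication: Optional[str] = None


def _gate_implication(item: BriefItem) -> BriefItem:
    """Keep an implication only for Confirmed evidence on a central task."""
    allowed = item.confidence == "Confirmed" and item.central_to_role
    return item if allowed else replace(item, implication=None)


def build_brief(changed: list, stable: list, watching: list) -> Optional[dict]:
    """Assemble a Brief, or return None when there is nothing material to say."""
    if not changed:  # simplified materiality check (assumption)
        return None
    return {
        "changed": [_gate_implication(i) for i in changed[:MAX_CHANGED]],
        "stable": list(stable[:MAX_STABLE]),
        # Watching items are unverified, so implications are always dropped.
        "watching": [replace(i, implication=None) for i in watching[:MAX_WATCHING]],
    }
```

Returning `None` instead of an empty Brief mirrors the stated bar for shipping one: something real to say.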

Evidence

Evidence standards

Not all sources are equal. Every claim in the system carries a confidence level based on evidence quality — and that confidence level is visible to you.

Confirmed

Multiple independent sources corroborate the claim. Includes peer-reviewed research, independent benchmarks with published methodology, and large-sample labour market data.

Plausible

Single credible source, or multiple sources that are not fully independent. Includes analyst reports, vendor benchmarks with published methodology, and smaller-sample market signals.

Early Signal

Preliminary or unverified. Includes preprints, vendor announcements without independent validation, and early-stage market trends. Explicitly marked as unconfirmed in your Brief.
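
The three tiers above lend themselves to a simple rule. The following is a minimal sketch, assuming that independence can be approximated by counting distinct publishers and that each source has already been judged credible or preliminary; the `Source` type and `confidence_level` function are hypothetical names, not part of Runway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    name: str
    publisher: str             # crude proxy for independence (assumption)
    credible: bool             # e.g. peer-reviewed, or published methodology
    preliminary: bool = False  # e.g. preprint or unvalidated vendor announcement


def confidence_level(sources: list) -> str:
    """Map the sources behind a claim to Confirmed, Plausible or Early Signal."""
    if not sources:
        raise ValueError("a claim needs at least one source")
    solid = [s for s in sources if s.credible and not s.preliminary]
    independent_publishers = {s.publisher for s in solid}
    if len(independent_publishers) >= 2:
        return "Confirmed"      # multiple independent sources corroborate
    if solid:
        return "Plausible"      # one credible source, or several non-independent ones
    return "Early Signal"       # only preliminary or unverified material
```

Note that two reports from the same publisher stay at Plausible in this sketch, matching the rule that sources which are not fully independent do not earn Confirmed.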

Sources

Data sources

We draw on multiple categories of data to ground every assessment in evidence rather than opinion.

Occupational task data

Empirical task distributions from occupational research databases and workforce analytics. This is the baseline for what people in your role actually spend time on.

AI capability research

Model evaluations, independent benchmarks, system cards, and peer-reviewed research on AI performance across specific task domains. Tracked per-task, not per-job.

Enterprise adoption data

Tool deployment rates, enterprise AI adoption surveys, and statutory sector data tracking what companies are actually running in production, not what vendors claim. Regional figures from sources such as the ABS Business Characteristics Survey (BCS), Eurostat's ICT usage in enterprises survey, and the US Census Bureau's Business Trends and Outlook Survey (BTOS) are treated as coarse business-use context. We do not present them as task-level precision.

Labour market signals

Job posting analytics, skill demand trends, and employment data from government statistical agencies and workforce research institutions.

Economic research

Labour economics papers, AI economic impact studies, and task-based automation research from leading research institutions.

Defensibility

Defensibility analysis

Not all tasks are equally automatable — even when AI can technically do them. We assess the defensibility of each task in your profile based on structural factors that affect real-world automation resistance.

Organisational context

Tasks that require deep knowledge of a specific organisation's processes, politics, or proprietary systems are harder to hand to AI — even capable AI.

Relationship value

When the value of a task comes from trust, rapport, or human connection — not just the output — automation faces a different kind of barrier.

Consequence stakes

High-consequence tasks (where errors are costly or irreversible) resist automation longer, because organisations require human accountability.

Output verifiability

Tasks where quality can only be judged by someone who already knows the answer are harder to delegate to AI confidently — there's no easy way to check the output.

Scope

What this is — and what it is not

We are direct about what Runway does well and where its limits are. Overpromising is the failure mode we are most determined to avoid.

What it is

A structured intelligence system that decomposes your role into atomic tasks, maps each task against verified capability and adoption evidence, and computes your personal Exposure Profile with a confidence interval that tells you exactly how sure we are.

A continuously updated intelligence layer — not a one-time report. New evidence is ingested, validated, and reflected in your Brief as AI capabilities and enterprise adoption evolve.

A decision-support tool that tells you which parts of your work are changing, how fast, and what the evidence quality is — so you can make informed career decisions.

What it is not

Not a crystal ball. Projections are scenario-based extrapolations from current evidence, not predictions. They show where current trajectories lead — not what will definitely happen. Regulatory shifts, market disruptions, or capability breakthroughs could change the picture.

Not an LLM wrapper. The intelligence comes from a structured data layer that tracks tasks, capabilities, and adoption independently. AI is used for synthesis and explanation — the analytical engine is deterministic.

Not career astrology. Every claim is tied to a specific evidence source with a stated confidence level. There are no personality-style insights, no vague "you should upskill" platitudes, and no reassurance that is not evidence-grounded.

Not based on self-report alone. Your inputs calibrate an empirical baseline anchored to occupational research — they do not replace it. When your self-report diverges significantly from the baseline, we surface that explicitly.

Calibration

How we stay calibrated

Continuous signal updates

Capability evidence and adoption data are updated as new research, benchmarks, and market signals are validated — not on a fixed quarterly cycle.

Version-tracked scoring

Every assessment records which scoring version was used. Your results are reproducible and comparable across time.

Outcome tracking

We follow up at 30, 90, and 180 days to measure whether our assessments predicted real-world outcomes accurately.

Confidence scoring

Every assessment includes a confidence grade (A–D). When data quality is low or evidence is sparse, the score range widens and we state the limitation explicitly. We do not manufacture precision we do not have.

The Architecture

The Capability Dimension Architecture

Every Runway read sits on top of the same coordinate system: 71 universal capability dimensions. Problem framing, strategic thinking, data reasoning, stakeholder management, debugging, narrative writing — each dimension is a discrete unit of professional skill that knowledge work draws on. The registry is fixed; the way each role weights those dimensions is what makes the role distinct.

Role intelligence on demand

When you enter your role, Runway's LLM Intelligence Layer generates a structured read against the dimension registry. The output names the dimension weights for your specific role (what your role demands), the AI exposure for each dimension (absorbing / accelerant / neutral / emerging), the trajectory under current market conditions, and the market context. The same engine reads canonical roles and roles we have never seen — there is no generic fallback. Confidence caps at Plausible until per-role signals accumulate.

Translating tasks to dimensions

Your task allocation from the assessment is converted into per-dimension evidence. Each task maps to one to three dimensions it provides evidence on; heavy time on tasks that exercise a dimension is stronger evidence on that dimension. The bridge between the task layer and the dimension layer is a frozen map generated once against the registry — deterministic, auditable, and zero LLM at read time.
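
A minimal sketch of this translation step, assuming evidence on a dimension is simply the sum of the time shares of the tasks mapped to it. The task names and the contents of `TASK_TO_DIMENSIONS` are invented for illustration (the dimension names are taken from the registry examples in this document), and the additive weighting is an assumption.

```python
from types import MappingProxyType

# Hypothetical frozen task-to-dimension map: read-only, one to three dimensions per task.
TASK_TO_DIMENSIONS = MappingProxyType({
    "write release notes": ("narrative writing",),
    "triage bug reports": ("debugging", "problem framing"),
    "present roadmap": ("stakeholder management", "strategic thinking", "narrative writing"),
})


def dimension_evidence(time_share: dict) -> dict:
    """Turn task time shares into per-dimension evidence by a plain lookup and sum."""
    evidence = {}
    for task, share in time_share.items():
        # A task missing from the frozen map raises KeyError rather than being guessed.
        for dimension in TASK_TO_DIMENSIONS[task]:
            evidence[dimension] = evidence.get(dimension, 0.0) + share
    return evidence


profile = dimension_evidence({
    "write release notes": 0.25,
    "triage bug reports": 0.5,
    "present roadmap": 0.25,
})
```

Because the function is a lookup and a sum, the same task allocation always yields the same evidence profile, which is what makes the step deterministic and auditable.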

Capability gap analysis

Your evidence strength on each dimension is compared to your role's weight on that dimension. Where role weight materially exceeds evidence strength, we surface a gap and classify it as critical, significant, or minor; where evidence meets or exceeds the role weight, the dimension is marked as a strength. Gaps in dimensions under accelerant or absorbing AI exposure are the prescription engine's primary targets, because those are the dimensions where the 30-day Defense Plan ranks the highest-value moves.
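
The comparison can be sketched as follows. The numeric thresholds, the 0-to-1 scales, and the function names are assumptions chosen for the example; only the four labels and the priority given to accelerant and absorbing dimensions come from the description above.

```python
PRIORITY_EXPOSURES = {"accelerant", "absorbing"}


def classify_gap(role_weight: float, evidence_strength: float) -> str:
    """Label the shortfall between what the role demands and what the evidence shows."""
    gap = role_weight - evidence_strength
    if gap >= 0.4:   # assumed threshold
        return "critical"
    if gap >= 0.2:   # assumed threshold
        return "significant"
    if gap > 0:
        return "minor"
    return "strength"


def priority_targets(profile: dict) -> list:
    """Dimensions with a gap under accelerant or absorbing exposure, largest gap first.

    `profile` maps dimension name -> (role_weight, evidence_strength, exposure).
    """
    candidates = [
        (weight - evidence, name)
        for name, (weight, evidence, exposure) in profile.items()
        if exposure in PRIORITY_EXPOSURES and classify_gap(weight, evidence) != "strength"
    ]
    return [name for _, name in sorted(candidates, reverse=True)]


# Made-up profile for illustration.
targets = priority_targets({
    "data reasoning": (0.9, 0.3, "accelerant"),
    "debugging": (0.5, 0.25, "absorbing"),
    "stakeholder management": (0.9, 0.1, "neutral"),
    "narrative writing": (0.5, 0.5, "absorbing"),
})
```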

Dimension-fit scoring

The AI Market Fit Score incorporates a dimension-fit component: the weighted match between your evidence profile and the dimensions your role currently rewards. Strong evidence in a dimension your role weights highly contributes more than strong evidence in a dimension your role does not need.
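
One plausible reading of "weighted match" is a role-weighted average of evidence strength. The sketch below makes that assumption explicit; the `dimension_fit` name, the 0-to-1 scales, and the formula itself are illustrative, not the documented scoring rule.

```python
def dimension_fit(role_weights: dict, evidence: dict) -> float:
    """Role-weighted average of evidence strength, in the range 0.0-1.0."""
    total_weight = sum(role_weights.values())
    if total_weight == 0:
        raise ValueError("role must weight at least one dimension")
    matched = sum(
        weight * min(evidence.get(dimension, 0.0), 1.0)  # cap evidence at 1.0
        for dimension, weight in role_weights.items()
    )
    return matched / total_weight


# Made-up role that leans heavily on debugging.
role = {"debugging": 0.75, "narrative writing": 0.25}
fit_strong_where_needed = dimension_fit(role, {"debugging": 1.0})
fit_strong_elsewhere = dimension_fit(role, {"narrative writing": 1.0})
```

The two example profiles carry the same amount of evidence, but the one concentrated on the heavily weighted dimension scores three times higher.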

The registry, the role intelligence cache, the evidence translator, and the gap analysis are the structural reason a Runway Brief can speak about your work the way it does — specific to your role, not a generic ladder. The Task Intelligence Graph (below) is the evidence substrate that feeds this architecture; the dimension layer is the coordinate system the user-facing reads run against.

The Graph

The Task Intelligence Graph

Underneath every Brief, every score, and every confidence grade sits one structured asset: the Task Intelligence Graph. It is what makes the analysis specific rather than generic — and it is what compounds over time. The graph is the evidence substrate the Capability Dimension Architecture (above) reads from.

Task taxonomy (legacy fallback)

Roughly 300 atomic tasks with measurable properties — organisational specificity, output verifiability, consequence stakes, relationship value, embodiment, creative synthesis — that determine how automatable each is in practice. Grounded in occupational research, not generated from job titles. The taxonomy now feeds the dimension architecture above via the task-to-dimension map; the legacy archetype-matching path is retained as a fallback for compatibility but is no longer the primary surface.

Capability map

For every task, the evidence chain: what AI systems have demonstrated against it, at what quality threshold relative to a human baseline, with what source, with what confidence level, last updated when. The capability map is what AI can do. It is not what is happening.

Adoption + velocity

Per task × industry × company size: what percentage of organisations are running AI for that task in production, and how fast that rate is changing quarter over quarter. Capability ≠ deployment. We track them as separate signals because conflating them produces confident wrong answers.
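
A minimal sketch of how adoption and velocity might be stored and read, assuming one series of quarterly adoption rates (in percent, oldest first) per task × industry × company-size segment. The key layout, the sample figures, and the `MIN_QUARTERS` threshold are assumptions for illustration.

```python
from typing import Optional

MIN_QUARTERS = 3  # assumed minimum number of data points before reporting velocity

# (task, industry, company_size) -> adoption rate per quarter, oldest first (made-up data)
adoption = {
    ("invoice matching", "finance", "enterprise"): [4.0, 4.0, 6.0],
    ("invoice matching", "finance", "small"): [1.0, 2.0],
}


def latest_velocity(series: list, min_quarters: int = MIN_QUARTERS) -> Optional[float]:
    """Quarter-over-quarter change in percentage points, or None if data is too thin."""
    if len(series) < min_quarters:
        return None
    return series[-1] - series[-2]


enterprise_velocity = latest_velocity(adoption[("invoice matching", "finance", "enterprise")])
small_velocity = latest_velocity(adoption[("invoice matching", "finance", "small")])
```

Returning `None` for a short series keeps the current rate and its rate of change as separate facts: a segment can have a known adoption level and an unknown velocity.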

The graph is the product. Every score the system shows can be traced back to specific evidence at a specific tier with a specific confidence level. If the graph is wrong, nothing else matters. If the graph is empty, no synthesis call can produce genuine intelligence. Most career tools have nothing equivalent — their analysis lives entirely inside an LLM prompt. We use AI for narration of facts the graph already holds.

How the intelligence compounds

Most career tools are static — the analysis you get today is the same analysis you would have got six months ago. Runway is built differently. The underlying intelligence layer gets more accurate over time.

Every signal processed improves coverage

Each new capability benchmark, adoption survey, or labour market report is validated, mapped to specific tasks, and integrated into the evidence base. Coverage gaps narrow continuously — especially for roles and industries where early data was sparse.

Every assessment refines the baselines

Aggregate assessment data (fully anonymised) reveals where empirical task baselines are accurate and where they diverge from how people actually work. The signal also feeds back into the dimension architecture — which dimensions each role actually weights highly, where the per-role intelligence drifts from cohort reality, and where the LLM-generated read needs anchoring against observed evidence.

Outcome tracking closes the loop

Follow-up surveys at 30, 90, and 180 days measure whether our exposure assessments corresponded to real-world outcomes. Where they did not, we identify why — and the model adjusts. This is the feedback loop that separates a living system from a static report.

The result is an intelligence layer that is measurably better at assessing exposure today than it was three months ago — and will be better again in another three. That compounding accuracy is the core of what we are building.

Grade Honesty

Why most grades today are C or D — and why that is the point

Confidence grades range from A (narrow range, high data quality) to D (wide range, sparse data). If you see a C or D on your read, that is not a bug. It is the system refusing to inflate a number it cannot defend.

The substrate is still accumulating

The adoption-signal layer of the graph is dominated by Early Signal evidence today. That is honest reporting of the state of public AI-deployment data, not a flaw in our reading of it. Until multiple independent non-vendor sources corroborate a deployment-rate claim, we cap it at the lower confidence band — even when the underlying capability is well established.

Some role × industry pairs are sparse

A compliance officer in regulated banking has different adoption realities from a marketer in SaaS. Where we have less direct evidence for a specific role × industry × company-size segment, the confidence interval widens and the grade drops. We tell you which gap you are in rather than papering over it with generic platitudes.

Velocity is harder to measure than state

A point estimate of deployment rate is one number. Quarter-over-quarter velocity is a derivative — it needs multiple time-anchored data points before we will assign anything above Early Signal. That makes trajectory projections honest about their uncertainty rather than precise about a guess.

A is earned by corroboration, not by confidence

Grade A requires multiple independent sources, fresh evidence, narrow range, and no contradictions. It is structurally rare. Most assessments will land between B and C today; some segments are firmly D. As capability evidence and adoption tracking deepen, more reads will move into the A/B band — but only when the evidence warrants it, never because the UI looks better.
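
The four Grade A requirements can be read as a checklist. The sketch below assumes one grade is lost per unmet requirement, and the thresholds for "fresh" and "narrow" are invented; only the Grade A criteria themselves come from the text, so treat the B to D mapping as an illustration rather than the actual grading rule.

```python
def confidence_grade(
    independent_sources: int,
    evidence_age_days: int,
    range_width: float,
    has_contradictions: bool,
) -> str:
    """Grade A only when every requirement is met; drop one grade per miss."""
    checks = [
        independent_sources >= 2,   # multiple independent sources
        evidence_age_days <= 90,    # "fresh" threshold is an assumption
        range_width <= 0.15,        # "narrow" threshold is an assumption
        not has_contradictions,     # no contradictions
    ]
    misses = checks.count(False)
    return "ABCD"[min(misses, 3)]
```

Under this sketch a read cannot reach A through any single strong input; every requirement has to hold at once, which is why A is structurally rare.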

A confident wrong answer is worse than an uncertain correct one. The grade is how we tell you which one you are looking at. If the substrate gets richer, grades rise on their own. They do not rise because we want them to.

Caveats

Limitations

Scores reflect task-level structural exposure — not individual capability, work quality, or adaptability.

Adoption data varies in quality by industry and region. Some segments have stronger evidence than others — your confidence grade tells you exactly how much to trust each read.

AI capability is advancing faster than any monitoring system can fully track. We mitigate this with continuous signal processing, but gaps exist.

Projections assume current trajectory. Regulatory changes, market shifts, or breakthrough capabilities could change the picture.

This tool does not constitute professional career, legal, or financial advice.

Lineage

Intellectual foundation

Runway's methodology draws on established research in labour economics and task-based automation analysis — the same frameworks used by leading economists studying technological displacement.

Task-based automation analysis

The principle that automation risk is best understood at the task level, not the occupation level. Pioneered by labour economists studying how technology reshapes work.

Routine vs. non-routine task frameworks

The distinction between routine cognitive, non-routine cognitive, routine manual, and non-routine manual tasks — and how each category responds differently to AI advancement.

Capability-adoption gap research

Economic research showing that technical capability consistently outpaces enterprise adoption, and that the gap between them varies by industry, regulation, and task type.

This is a model, not a prophecy. The value is in what it reveals about the structure of your work — and what that structure means as AI capabilities evolve.
