
The Multidimensional Content Safety Framework

Working toward multidimensional content safety
We are currently developing our Multidimensional Content Safety Framework (MCSF) as a dedicated module of the broader Responsible AI (RAI) Framework. Content safety is not an afterthought, but an essential layer: it ties directly into our four-tier scaffold, making moderation decisions principled, transparent, and accountable.

Two stages by design

  1. Universal Veto (non-derogable floor)
    Certain harms are never acceptable (e.g., incitement to violence, hate, sexual exploitation, self-harm facilitation). These are blocked everywhere, by design, and cannot be turned off by local settings.

  2. Cultural Window (context-sensitive layer)
    Above that universal floor, communities differ. The Cultural Window adapts sensitivity to local norms and domains (e.g., schools vs. open forums) while keeping the rights floor intact. This preserves cultural respect without sliding into relativism. A minimal code sketch of how the two stages compose follows this list.
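
To make the two stages concrete, here is a minimal Python sketch of how they could compose. Everything in it (UNIVERSAL_VETOES, CulturalWindow, evaluate, the 0.5 and 0.7 cut-offs) is a hypothetical illustration, not our production logic or published values.

from dataclasses import dataclass

# Hypothetical harm categories on the non-derogable floor (illustrative, not exhaustive).
UNIVERSAL_VETOES = {
    "incitement_to_violence",
    "hate",
    "sexual_exploitation",
    "self_harm_facilitation",
}

@dataclass
class CulturalWindow:
    """Context-sensitive layer: per-locale, per-domain sensitivity settings."""
    locale: str
    domain: str          # e.g. "school" vs. "open_forum"
    thresholds: dict     # dimension name -> review threshold (0..1)

def evaluate(detected: dict, window: CulturalWindow) -> str:
    """Return a proportionate outcome: 'block', 'review', 'warn', or 'allow'."""
    # Stage 1: Universal Veto. Any non-derogable harm blocks, regardless of local settings.
    if any(cat in UNIVERSAL_VETOES and score >= 0.5 for cat, score in detected.items()):
        return "block"
    # Stage 2: Cultural Window. Local thresholds decide proportionate handling above the floor.
    outcome = "allow"
    for dimension, threshold in window.thresholds.items():
        score = detected.get(dimension, 0.0)
        if score >= threshold:
            return "review"
        if score >= 0.7 * threshold:
            outcome = "warn"
    return outcome

# Example: a universal-veto category always blocks, whatever the window says.
print(evaluate({"hate": 0.9}, CulturalWindow("ES", "open_forum", {"political_critique": 0.9})))  # -> "block"

The point of the ordering is that Stage 1 runs first and takes no locale parameters, so no Cultural Window setting can relax it.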

 

Multidimensional—not binary

The system evaluates content across multiple dimensions (e.g., wording, context, tone, potential for social harm, user history, and cultural signals). This vector view clarifies why something was flagged and supports proportionate outcomes (allow, warn, review, block) with a clear explanation for users, reviewers, and—when needed—regulators.
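
As an illustration of the vector view, the short sketch below scores one hypothetical message per dimension and derives both the outcome and the explanation from the same numbers. The dimension names come from the list above; the scores and the REVIEW_THRESHOLD value are invented.

# Hypothetical per-dimension scores for one message (0 = no concern, 1 = strong concern).
scores = {
    "wording": 0.2,
    "context": 0.6,
    "tone": 0.4,
    "social_harm_potential": 0.7,
    "user_history": 0.1,
    "cultural_signals": 0.8,
}

REVIEW_THRESHOLD = 0.65  # illustrative cut-off, not a published value

# The same vector that drives the decision also drives the explanation,
# so users, reviewers, and regulators see the same reasons.
flagged = {dim: s for dim, s in scores.items() if s >= REVIEW_THRESHOLD}
outcome = "review" if flagged else "allow"
explanation = [f"{dim} scored {s:.2f}, above the review threshold of {REVIEW_THRESHOLD}"
               for dim, s in flagged.items()]
print(outcome, explanation)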

Cultural relativity, without abdication

We recognize that political speech, satire, and taboos vary by place. Our approach separates universal prohibitions from local sensitivities, and it requires that local settings remain justifiable under open, reasoned scrutiny.

Illustrative vignette — Thailand vs. Spain
A multilingual civic assistant runs on the same safety pipeline in both locales. In Spain, robust political critique typically leans toward allow or review. In Thailand, where speech about the monarchy is highly sensitive, similar messages are routed to review more often. 
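
One way to picture the vignette in configuration terms: the same check, two Cultural Window settings. The locale thresholds and message scores below are invented to mirror the example, not actual settings for Spain or Thailand.

# Same universal floor everywhere; only the context-sensitive thresholds differ.
# All locale settings and numbers are illustrative.
locale_thresholds = {
    "ES": {"political_critique": 0.9, "monarchy_reference": 0.9},
    "TH": {"political_critique": 0.9, "monarchy_reference": 0.5},
}

message_scores = {"political_critique": 0.6, "monarchy_reference": 0.6}  # hypothetical civic post

for locale, thresholds in locale_thresholds.items():
    needs_review = any(message_scores.get(dim, 0.0) >= t for dim, t in thresholds.items())
    print(locale, "review" if needs_review else "allow")
# With these invented numbers: ES -> allow, TH -> review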

Compliance teams and administrators can also converse with the RAI Coach in a simulation: testing how the same message would be treated across jurisdictions, surfacing the tiered justifications, and exploring whether the outcomes are proportionate and explainable.
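
A rough sketch of what such a cross-jurisdiction simulation could return, using a hypothetical simulate_across_jurisdictions helper that pairs each outcome with the justification a reviewer would see (all names and values are illustrative):

def simulate_across_jurisdictions(message_scores, jurisdictions):
    """Hypothetical helper: run one message through each jurisdiction's settings
    and collect the outcome plus the justification shown to reviewers."""
    report = {}
    for locale, thresholds in jurisdictions.items():
        triggered = {dim: (score, thresholds[dim])
                     for dim, score in message_scores.items()
                     if dim in thresholds and score >= thresholds[dim]}
        report[locale] = {
            "outcome": "review" if triggered else "allow",
            "justification": [f"{dim}: score {s:.2f} meets local threshold {t:.2f}"
                              for dim, (s, t) in triggered.items()]
                             or ["no local threshold exceeded; universal floor not engaged"],
        }
    return report

print(simulate_across_jurisdictions(
    {"political_critique": 0.6, "monarchy_reference": 0.6},
    {"ES": {"monarchy_reference": 0.9}, "TH": {"monarchy_reference": 0.5}},
))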

 

What’s live and what comes next

  • Now: We are building out the content safety framework itself—the two-stage design, multidimensional evaluations, governance flows, and explanation patterns—so that decisions are traceable to principles across our four tiers (truth/meta-ethics, legitimacy, legality, practical context). A sketch of such a traceable decision record follows this list.

  • Next: We will fine-tune a dedicated guard model directly on our Responsible AI Framework, so that it can automatically calibrate content safety filters in line with our four-tier principles. In a compound setup, the RAI Coach will simulate edge cases, stress-test cultural settings, and propose refinements—ensuring that every adjustment remains grounded in the universal rights floor and backed by documented justifications.
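
As a sketch of what a decision that is "traceable to principles" could look like when logged, the record below pairs the outcome with one justification per tier. Every field name and value is hypothetical, shown only to illustrate the explanation pattern.

# Hypothetical decision record: the outcome carries a justification per tier so it can be
# audited against the four-tier scaffold. All field names and content are illustrative.
decision_record = {
    "content_id": "example-123",
    "outcome": "review",
    "dimension_scores": {"tone": 0.4, "social_harm_potential": 0.7, "cultural_signals": 0.8},
    "tier_justifications": {
        "truth_meta_ethics": "No universal veto triggered; the non-derogable floor is not engaged.",
        "legitimacy": "The local sensitivity setting was adopted through a documented governance flow.",
        "legality": "The content is lawful in the target jurisdiction; review is precautionary.",
        "practical_context": "Posted in an open forum; stricter school-domain thresholds do not apply.",
    },
}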

 

The MCSF makes content safety both firm and fair: firm on non-derogable harms; fair to cultural nuance and legitimate dissent. It replaces mystery rules with a structured, transparent module of our RAI Framework—and it grows stronger through evidence, feedback, and principled calibration in the guard-model phase.
