OpalForce hires domain experts, evaluators, and operators across India and South America. Send a note with the role and a short summary of your work.
Apply via email
Review LLM outputs against rubrics for factuality, clarity, safety, hallucinations, and usefulness.
Lead advanced LLM evaluation projects, reviewer calibration, adjudication, and quality scoring.
Rank AI responses and provide structured human feedback for model improvement workflows.
Review healthcare AI outputs for clinical accuracy, workflow realism, hallucination risk, and patient-safety concerns.
Evaluate legal AI outputs for citation reliability, reasoning quality, contract interpretation, and compliance risk.
Evaluate code-generation models and AI agents for correctness, maintainability, security, and real-world usefulness.
Test AI systems for cybersecurity reasoning, unsafe guidance, vulnerability analysis quality, and security workflow usefulness.
Review accounting, financial, underwriting, fraud, and compliance AI outputs for accuracy and risk.
Evaluate Spanish and Portuguese AI outputs for fluency, cultural fit, factuality, tone, and regional appropriateness.
Design and run adversarial AI tests for safety, hallucination, bias, misuse, policy, and operational failure modes.
Own QA sampling, reviewer calibration, quality dashboards, and client-ready reliability reporting.
Manage distributed expert teams, project delivery, client SLAs, staffing, and operational performance.
Source, screen, verify, and onboard doctors, lawyers, engineers, finance experts, cybersecurity experts, and multilingual evaluators.
Help AI companies and enterprises design evaluation pilots, expert workflows, and managed reliability operations.
Support AI risk documentation, review traceability, governance evidence, and compliance-ready reporting.
Audit annotation and evaluation outputs for consistency, guideline adherence, edge-case handling, and reviewer accuracy.
Evaluate prompts and AI responses for task design, instruction following, quality, safety, and expected user outcomes.
Review scientific and technical AI outputs for research quality, evidence use, reasoning, and hallucination risk.
Define the internal platform for task routing, evaluator scoring, QA, dashboards, audit logs, and client reporting.
Build internal tools for AI evaluation workflows, task routing, QA dashboards, reviewer analytics, and client reporting.
Versatile evaluators with strong writing, reasoning, and research skills. Tackle a rotating mix of LLM evaluation tasks across general knowledge, instruction following, and open-ended reasoning. Ideal for graduate students, researchers, and former consultants.
$60 – $200 / hour
Former or current consultants from MBB (McKinsey, BCG, Bain) or Big 5 firms evaluating AI on strategic frameworks, market sizing, case structuring, executive communication, and slideware reasoning.
$100 / hour
Board-certified internists evaluating AI on differential diagnosis, workup planning, medication reasoning, and inpatient/outpatient management against current clinical guidelines.
$130 – $180 / hour
Hematologists and oncologists evaluating AI outputs on staging, treatment selection, regimen dosing, biomarker interpretation, and survivorship guidance.
$130 – $180 / hour
Corporate attorneys reviewing AI on M&A diligence, contract drafting, deal structuring, redlines, and corporate governance. JD with transactional practice experience required.
$85 – $120 / hour
Board-certified radiologists evaluating AI on imaging interpretation reasoning, report drafting, finding adjudication, and clinical correlation across modalities.
$130 – $180 / hour
Cardiologists evaluating AI on ECG interpretation reasoning, heart failure management, interventional decision support, and guideline-aligned cardiovascular care.
$130 – $180 / hour
Litigators evaluating AI on case theory, motion drafting, citation accuracy, discovery review, deposition prep, and procedural reasoning across state and federal practice.
$85 – $120 / hour
ED physicians evaluating AI on triage, time-critical decision making, trauma and resuscitation protocols, disposition reasoning, and high-acuity differential diagnosis.
$130 – $180 / hour
Psychiatrists evaluating AI on DSM-5-aligned assessment, psychopharmacology, risk stratification, therapy framing, and safety-sensitive mental health workflows.
$130 – $180 / hour
Attorneys with deep specialty practice in real estate, tax, bankruptcy, or trusts and estates evaluating AI outputs on doctrine, filings, and jurisdiction-specific reasoning.
$85 – $120 / hour
Patent and trademark attorneys or agents evaluating AI on prior-art reasoning, claim drafting, office-action responses, and trademark clearance analysis. USPTO registration a plus.
$85 – $120 / hour
Current or former IB analysts/associates from bulge bracket or elite boutique firms evaluating AI on valuation, modeling, comps, pitch material, and transaction reasoning.
$100 – $130 / hour
Compliance and regulatory attorneys evaluating AI on financial services, healthcare, privacy, and sector-specific regulatory reasoning, policy drafting, and risk frameworks.
$85 – $120 / hour
JD-qualified attorneys across practice areas reviewing AI outputs on legal reasoning, citation accuracy, statutory interpretation, and client-facing communication. Flexible volume.
$55 – $135 / hour
Top-percentile evaluators across domains who consistently lead on inter-rater reliability and adjudicate disputed labels. Reserved for proven reviewers across multiple programs.
$70 – $150 / hour
Employment and labor attorneys evaluating AI on Title VII, FLSA, NLRA, workplace investigations, employment agreements, and HR-adjacent regulatory reasoning.
$85 – $120 / hour
PE professionals evaluating AI on LBO modeling, diligence frameworks, portfolio operations, fund economics, and investment committee reasoning.
$130 / hour
U.S.-licensed attorneys with 7+ years of practice leading complex AI evaluation programs, rubric design, and adjudication for high-stakes legal AI products.
$100 – $150 / hour
Software engineers evaluating coding AI on enterprise codebases — Java, C#, TypeScript, large monorepos, legacy refactors, and production-grade testing patterns.
$50 – $70 / hour