Wednesday, November 19, 2025

Feature Flag orchestration guide

If I am the maker of a website or app and want certain features killed in a specific build that will be deployed to several million devices, can you explain at the code level how specific features can be enabled or disabled on a per-device basis, and how all of this can be orchestrated?

Nice — this is exactly the sort of thing you want nailed before a multi-million device rollout. Below I’ll give a practical, code-level guide you can implement: concepts, architecture, safety knobs, and concrete code snippets (server and client) that show how to enable/disable features on a per-device basis and how to orchestrate rollouts.

1) Core idea — feature flags / remote config

  • Keep feature control data (flags, rules, percentages, targeting) separate from app binary.

  • Apps ask a centralized service (or a CDN cache) for the flag state for a given principal (device id / user id / session).

  • Evaluate flags server-side when possible (most secure) or client-side for responsiveness (with signed config + local cache).

  • Provide kill switches (global off), percentage rollouts (gradual), targeted rollouts (device model, country), and experiment buckets (A/B).

2) Basic components

  1. Flag store / control plane — where product/ops define flags and targeting rules (DB + UI).

  2. Evaluation service / SDK — evaluates which variant a given device should get. Can be:

    • Server-side evaluation API (recommended for sensitive control).

    • Lightweight client SDK with local cache + occasional sync.

  3. Delivery — CDN + TLS + signing for cached configs; use shorter TTLs for rapid kill.

  4. Telemetry / metrics — track flag exposures and success/failure metrics.

  5. Safety — global kill switch, audit logs, incremental rollout, circuit-breakers.

3) Flag model (simple)

Each flag entry:

{
  "key": "new_payment_flow",
  "default": false,
  "rules": [
    { "type": "user_id_allow", "ids": ["123","456"] },          // explicit allow
    { "type": "device_model", "values": ["Pixel5","iPhone12"] },
    { "type": "country", "values": ["IN","PK"] },
    { "type": "percentage", "percent": 20, "salt": "newpay-v1" } // 20% rollout
  ],
  "created_by": "pm@company.com",
  "created_at": "2025-11-01T10:00:00Z",
  "kill_switch": false
}

4) Deterministic bucketing (important)

To do percentage rollouts that are sticky per device, compute a deterministic hash of (salt + device_id) and map to 0–99. Devices with value < percent are in the cohort. Example function (JS):

// simple stable bucket: returns 0..99
function stableBucket(deviceId, salt = "") {
  // djb2-like hash computed character-by-character (deterministic)
  let h = 5381;
  const s = salt + "|" + deviceId;
  for (let i = 0; i < s.length; i++) {
    h = ((h << 5) + h) + s.charCodeAt(i); // h * 33 + c
    h &= 0xffffffff; // keep 32-bit
  }
  // convert to unsigned 32-bit, then map to 0..99
  return (h >>> 0) % 100;
}

// usage:
if (stableBucket("device-abcdef", "newpay-v1") < 20) {
  // in 20% rollout
}

5) Example: server-side evaluation service (Node + Redis)

A small HTTP endpoint that evaluates flags for a device; caches rules in Redis for speed.

// server/evaluator.js (Node/Express, node-redis v4)
const express = require('express');
const { createClient } = require('redis');
const app = express();
app.use(express.json()); // body-parser is built into Express 4.16+

const client = createClient();
client.connect().catch(console.error); // node-redis v4 requires an explicit connect

function stableBucket(deviceId, salt = "") {
  let h = 5381;
  const s = salt + "|" + deviceId;
  for (let i = 0; i < s.length; i++) {
    h = ((h << 5) + h) + s.charCodeAt(i);
    h &= 0xffffffff;
  }
  return (h >>> 0) % 100;
}

// load flags from Redis (or DB) -- assume cached JSON at "flags:all"
async function getFlags() {
  const data = await client.get('flags:all');
  return data ? JSON.parse(data) : {};
}

app.post('/evaluate', async (req, res) => {
  try {
    const { device_id, user_id, country, device_model } = req.body;
    const flags = await getFlags();
    const result = {};
    for (const f of Object.values(flags)) {
      if (f.kill_switch) { result[f.key] = false; continue; }
      let on = f.default;
      // allow rules short-circuit to true; explicit deny rules could be added the same way
      for (const rule of f.rules || []) {
        if (rule.type === 'user_id_allow' && rule.ids?.includes(user_id)) { on = true; break; }
        if (rule.type === 'device_model' && rule.values?.includes(device_model)) { on = true; break; }
        if (rule.type === 'country' && rule.values?.includes(country)) { on = true; break; }
        if (rule.type === 'percentage' &&
            stableBucket(device_id, rule.salt || f.key) < (rule.percent || 0)) { on = true; break; }
      }
      result[f.key] = on;
    }
    res.json({ flags: result, server_time: new Date().toISOString() });
  } catch (err) {
    res.status(500).json({ error: 'flag evaluation failed' });
  }
});

app.listen(3000);

6) Example: client SDK (JS) with local cache & fallback

The client requests /evaluate on startup and caches the result. If offline, it falls back to the last cached flags plus safe defaults. Use a short TTL for critical flags.

// client/flagClient.js
async function fetchFlags(deviceInfo) {
  try {
    const r = await fetch('https://flags.example.com/evaluate', {
      method: 'POST',
      body: JSON.stringify(deviceInfo),
      headers: {'content-type':'application/json'}
    });
    if (!r.ok) throw new Error(`evaluate failed: ${r.status}`);
    const json = await r.json();
    localStorage.setItem('flags:cached', JSON.stringify({ts: Date.now(), payload: json}));
    return json.flags;
  } catch (e) {
    // offline or network failure: fall back to the last cached result
    const cached = JSON.parse(localStorage.getItem('flags:cached') || 'null');
    if (cached) return cached.payload.flags;
    // final fallback: no flags -> code paths use safe defaults
    return {};
  }
}

// usage in app
(async () => {
  const deviceInfo = { device_id: DEVICE_ID, user_id: USER_ID, country: 'IN', device_model: 'Pixel5' };
  const flags = await fetchFlags(deviceInfo);
  if (flags['new_payment_flow']) {
    startNewPaymentFlow();
  } else {
    startLegacyPayment();
  }
})();

Security note: if a flag unlocks a sensitive server path, the server must authorize (server-side) — don't rely purely on client flags.

7) Orchestration & rollout strategies

  • Gradual % rollout: start at 0 → 1% → 5% → 25% → 100%. Use deterministic bucketing so devices stay in same bucket across updates.

  • Canary by cohort: route a percentage of traffic or specific devices (internal QA devices) to the new feature.

  • Geo / device targeting: limit to certain countries or device models.

  • User segment: power users, paid users, etc.

  • Time-based rules: enable within a date/time window (see the sketch after this list).

  • Kill switch: global boolean that can be toggled to instantly disable feature everywhere. Put kill_switch evaluation before rules.
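
Two of these as minimal sketches (the rule shape and ramp plan are illustrative, not from any specific product):

// time-window rule: on only between start and end
function inTimeWindow(rule, now = Date.now()) {
  const start = Date.parse(rule.start); // e.g. "2025-11-20T00:00:00Z"
  const end = Date.parse(rule.end);
  return now >= start && now <= end;
}

// ramp plan: an operator (or automation) raises the flag's percentage
// rule to the next step once the hold period passes health checks
const rampPlan = [
  { percent: 1, holdMinutes: 60 },
  { percent: 5, holdMinutes: 120 },
  { percent: 25, holdMinutes: 240 },
  { percent: 100 }
];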

8) Telemetry and safety

  • Emit an exposure event whenever the client or server evaluates a flag: {timestamp, flag_key, device_id_hash, variant, context}. Use a hashed device id to preserve privacy (see the sketch after this list).

  • Track errors and KPIs (error rate, latency, crash rate) by flag exposure. Ramp back if errors rise.

  • Automated alerting based on metric thresholds.

  • Keep audit trail of who changed flags + when.
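
A minimal browser-side sketch of that exposure event (the telemetry endpoint is an assumption):

// emit an exposure event with a SHA-256-hashed device id
async function emitExposure(flagKey, variant, deviceId, context = {}) {
  const digest = await crypto.subtle.digest('SHA-256', new TextEncoder().encode(deviceId));
  const hash = [...new Uint8Array(digest)].map(b => b.toString(16).padStart(2, '0')).join('');
  const event = {
    timestamp: new Date().toISOString(),
    flag_key: flagKey,
    device_id_hash: hash,
    variant,
    context
  };
  // fire-and-forget: never block the feature path on telemetry
  fetch('https://telemetry.example.com/exposures', {
    method: 'POST',
    headers: { 'content-type': 'application/json' },
    body: JSON.stringify(event)
  }).catch(() => {});
}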

9) Performance & caching

  • Keep flag config small and cache on CDN/edge. TTL tradeoffs:

    • Short TTL (e.g., 30s–1min): quick kill but more load.

    • Long TTL (10m–1h): less load but slower response to kills.

  • Use push (WebSocket/FCM/APNs) to notify clients of critical flag changes (e.g., kill switch) so they fetch immediately.
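
A minimal browser-side sketch, assuming a hypothetical notification socket at wss://flags.example.com/notify and a {type: "flags_changed"} message shape:

// listen for a change ping and refetch flags immediately
const ws = new WebSocket('wss://flags.example.com/notify');
ws.onmessage = async (msg) => {
  const { type } = JSON.parse(msg.data);
  if (type === 'flags_changed') {
    await fetchFlags(deviceInfo); // fetchFlags/deviceInfo from section 6
  }
};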

10) Data schema changes & migrations

  • Use flags to gate new database schema changes (e.g., feature off uses old schema). Migrate progressively:

    1. Add optional columns.

    2. Run the backend writing both old and new data under a gate (see the dual-write sketch after this list).

    3. Flip flag for backends and clients to read new schema.

    4. Remove old code after full rollout.
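
A minimal sketch of step 2 in code, assuming hypothetical savePaymentOld/savePaymentNew data-access helpers and the gate key new_payment_schema_write:

// dual-write under a gate; the old schema remains the source of truth
async function savePayment(record, flags) {
  await savePaymentOld(record); // always write the old schema
  if (flags['new_payment_schema_write']) {
    await savePaymentNew(record); // additionally write the new schema while gated
  }
}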

11) Rollback & incident playbook

  • Have a single global kill switch that bypasses rules (immediately off).

  • Maintain a team runbook: how to flip kill switch (UI + API + direct DB update), who authorizes, communication plan.
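
A minimal flip-endpoint sketch, reusing app, getFlags, and the Redis client from section 5 (auth and audit logging omitted; production needs RBAC and an audit trail):

// control plane: flip a flag's kill switch via the API
app.post('/flags/:key/kill', async (req, res) => {
  const flags = await getFlags();
  const flag = flags[req.params.key];
  if (!flag) return res.status(404).json({ error: 'unknown flag' });
  flag.kill_switch = true;
  await client.set('flags:all', JSON.stringify(flags)); // persist back to Redis
  res.json({ key: flag.key, kill_switch: true });
});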

12) Example: signed client config (prevent tampering)

If you do client evaluation, sign the flag bundle on the server and verify it in the client. Use an asymmetric scheme (e.g., Ed25519): the server signs with its private key and the client verifies with an embedded public key. (A plain HMAC won't work here: verifying it would require shipping the shared secret in the client, which defeats the purpose.)

// server: produce signed config
// pseudo:
const payload = JSON.stringify({flags: {...}, exp: Date.now()+60000});
const signature = SIGN(serverPrivateKey, payload); // e.g. Ed25519
return { payload, signature };

// client:
if (VERIFY(serverPublicKey, resp.payload, resp.signature)) {
  useFlags(JSON.parse(resp.payload).flags);
} else {
  // invalid or tampered: ignore, use fallback
}

(Use proper crypto libs; never embed private keys in the client.)
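
A concrete sketch using Node's built-in crypto module (the flag payload is illustrative; key storage, distribution, and rotation are out of scope; ship only the public key with the client):

// sign-and-verify sketch with Ed25519 (Node >= 12)
const { generateKeyPairSync, sign, verify } = require('crypto');

// one-time key generation; keep privateKey server-side only
const { publicKey, privateKey } = generateKeyPairSync('ed25519');

// server: sign the bundle (Ed25519 one-shot sign takes null as the algorithm)
const payload = Buffer.from(JSON.stringify({ flags: { new_payment_flow: true }, exp: Date.now() + 60000 }));
const signature = sign(null, payload, privateKey);

// client (or a test): verify before trusting the bundle
if (verify(null, payload, publicKey, signature)) {
  const { flags } = JSON.parse(payload.toString());
  // safe to use flags
}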

13) Multi-platform notes

  • Mobile (iOS/Android): use SDK that caches flags, receives push notifications for immediate refresh, uses deterministic device id (or hashed advertising id) for bucketing. Keep TTL small for kill switches.

  • Web: fetch flags on boot, store in cookie/localStorage; consider server-side rendering to ensure search engines and first render obey server evaluation.

  • Backend: do sensitive checks server-side (e.g., enable payment route only if server says so).
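
A minimal sketch, assuming the rule loop from section 5 is factored into an evalFlag(flag, ctx) helper (like the Python eval_flag) and that req.user comes from your auth middleware:

// server-side enforcement: gate a sensitive route on the flag itself
function requireFlag(flagKey) {
  return async (req, res, next) => {
    const flags = await getFlags();
    const flag = flags[flagKey];
    const ctx = { device_id: req.get('x-device-id'), user_id: req.user?.id };
    if (flag && !flag.kill_switch && evalFlag(flag, ctx)) return next();
    res.status(403).json({ error: 'feature not enabled' }); // deny even if the client claims it's on
  };
}

app.post('/payments/new-flow', requireFlag('new_payment_flow'), (req, res) => {
  res.json({ ok: true }); // new payment flow handler
});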

14) Danger / anti-patterns to avoid

  • Shipping the logic that enables sensitive features purely in the client (easy to tamper with). Always enforce critical checks on the server.

  • Using non-deterministic random for rollouts (users get different experiences every request). Use deterministic hashing.

  • Long TTLs for critical flags (makes kill slow).

  • No telemetry or metrics — you must measure.

15) Example: quick Python evaluator (for integrations)

Note: this bucketer hashes with SHA-1 while the JS examples use djb2, so the same device would land in different buckets across the two. Pick one hash function and use it in every implementation so cohorts stay consistent.

# evaluator.py
import hashlib

FLAGS = {}  # load from DB/redis

def stable_bucket(device_id: str, salt: str = "") -> int:
    # first 8 hex chars of SHA-1 -> 32-bit int -> bucket 0..99
    s = (salt + "|" + device_id).encode('utf-8')
    h = int(hashlib.sha1(s).hexdigest()[:8], 16)
    return h % 100

def eval_flag(flag, ctx):
    if flag.get('kill_switch'): return False
    on = flag.get('default', False)
    for rule in flag.get('rules', []):
        t = rule['type']
        if t == 'user_id_allow' and ctx.get('user_id') in rule.get('ids', []):
            return True
        if t == 'device_model' and ctx.get('device_model') in rule.get('values', []):
            return True
        if t == 'country' and ctx.get('country') in rule.get('values', []):
            return True
        if t == 'percentage':
            # default the salt to the flag key, matching the JS evaluator
            if stable_bucket(ctx.get('device_id', ''), rule.get('salt', flag.get('key', ''))) < rule.get('percent', 0):
                return True
    return on

16) Orchestration & workflow (who does what)

  • Product configures flag intent and rollout plan.

  • SRE/Release sets kill switch and rollout gates in the control plane.

  • Engineers instrument exposure events and gate code paths.

  • Data/Analytics monitor KPI changes and recommend ramp up/down.

  • Legal/Privacy ensure device ids are hashed and PII handled correctly.

17) Checklist before rollout to millions

  • Flag control UI + API (audit logs + RBAC).

  • Deterministic bucketing implemented.

  • Server enforcement for sensitive paths.

  • Client caching + fallback behavior defined.

  • Telemetry for exposures + metrics instrumented.

  • Kill switch tested (flip and observe).

  • Test groups (QA, internal) pinned.

  • Rollout plan (percent steps + criteria to proceed).

  • Alerts for spike in errors/crashes.


If you want, I can:

  • Produce a small, ready-to-deploy reference repo layout (server + client) in your preferred language/framework.

  • Draft the rollout playbook and monitoring queries you should add to Prometheus/Grafana (or Datadog).

  • Give a short, copyable policy text for RACI/approval for flips and kills.

Which of those would be most useful next?
