MinhVo

Minh Vo

rss feed

Slaying code & making it lit fr fr 🔥 tagline

Hey there 👋 I'm an AI Engineer with 7 years of experience building scalable web and mobile applications. Currently at Neurond AI (May 2025 — present), architecting an Enterprise AI Assistant Platform with multi-tenant RAG on pgvector, multi-provider LLM orchestration, and Azure-native infrastructure. Previously spent 5+ years at SNAPTEC (Sep 2019 — Apr 2025), leading SaaS themes, admin dashboards, and e-commerce platforms — earned the Hero of the Year award in 2021. I specialize in TypeScript, React, Next.js, and AI-Native engineering with Claude Code and Cursor.bio

Back to blogs

Database Time-Series: InfluxDB, TimescaleDB, and QuestDB

Choose a time-series database: write performance, query capabilities, and retention policies.

DatabaseTime-SeriesInfluxDBTimescaleDB

By MinhVo

Introduction

Time-series data is everywhere: server metrics, IoT sensor readings, financial tick data, application logs, user analytics. A 2024 InfluxData report estimates that 90% of all data generated by enterprises has a temporal component. Yet most teams still store this data in general-purpose databases like PostgreSQL or MongoDB, paying a massive performance and storage penalty for data they could handle 100x more efficiently.

Dedicated time-series databases (TSDBs) optimize for exactly this workload: high-speed ingestion of timestamped data, fast aggregation over time windows, and automatic data lifecycle management. The three leading options—InfluxDB, TimescaleDB, and QuestDB—take fundamentally different architectural approaches, each with distinct trade-offs for write throughput, query flexibility, and operational complexity.

This guide compares all three through real benchmarks, production deployment patterns, and the architectural decisions that determine which is right for your workload.

Time-series data visualization

Understanding Time-Series Database Architecture

What Makes TSDBs Different

General-purpose databases store data in B-tree indexes optimized for random access. Time-series data is fundamentally sequential—writes are append-only, queries are time-range scans, and old data becomes less valuable over time. TSDBs exploit these properties:

  • Write-optimized storage: Append-only writes avoid B-tree page splits and random I/O
  • Columnar compression: Values of the same type compress 10-20x better than row-oriented storage
  • Time-based partitioning: Data is automatically partitioned by time, enabling fast range scans
  • Retention policies: Old data is automatically downsampled or deleted without manual intervention

The Three Architectures

InfluxDB (v3/v4) uses Apache Arrow and Parquet for columnar storage with a Rust-based engine. It's purpose-built for time-series with its own query language (InfluxQL) and SQL support via FlightSQL.

TimescaleDB is a PostgreSQL extension that adds hypertables (automatic time partitioning), continuous aggregates, and compression on top of standard PostgreSQL. You get full SQL compatibility.

QuestDB is a high-performance TSDB written in Java with a column-oriented storage engine optimized for fast ingestion and SQL queries with time-series extensions.

-- InfluxDB v3: InfluxQL and SQL
-- Create a database
CREATE DATABASE metrics;
 
-- Write data (line protocol)
-- cpu,host=server-1,region=us-east cpu_usage=75.2,load_avg=2.1 1699234800000000000
 
-- Query with SQL
SELECT
  date_trunc('minute', time) as minute,
  host,
  AVG(cpu_usage) as avg_cpu,
  MAX(cpu_usage) as max_cpu
FROM cpu
WHERE time >= now() - INTERVAL '1 hour'
GROUP BY minute, host
ORDER BY minute DESC;
 
-- TimescaleDB: PostgreSQL with hypertables
CREATE TABLE metrics (
  time TIMESTAMPTZ NOT NULL,
  host TEXT NOT NULL,
  cpu_usage DOUBLE PRECISION,
  load_avg DOUBLE PRECISION
);
 
-- Convert to hypertable (automatic time partitioning)
SELECT create_hypertable('metrics', 'time');
 
-- Query with standard PostgreSQL + time-series functions
SELECT
  time_bucket('1 minute', time) as bucket,
  host,
  AVG(cpu_usage) as avg_cpu,
  MAX(cpu_usage) as max_cpu
FROM metrics
WHERE time >= now() - INTERVAL '1 hour'
GROUP BY bucket, host
ORDER BY bucket DESC;
 
-- QuestDB: SQL with time-series extensions
CREATE TABLE metrics (
  time TIMESTAMP,
  host SYMBOL,
  cpu_usage DOUBLE,
  load_avg DOUBLE
) TIMESTAMP(time) PARTITION BY DAY;
 
-- Query with SAMPLE BY (QuestDB's time aggregation)
SELECT
  host,
  AVG(cpu_usage) as avg_cpu,
  MAX(cpu_usage) as max_cpu,
  AVG(load_avg) as avg_load
FROM metrics
WHERE time >= dateadd('h', -1, now())
SAMPLE BY 1m;

Time-series database architecture comparison

Step-by-Step Implementation

Setting Up TimescaleDB with Continuous Aggregates

TimescaleDB's continuous aggregates are materialized views that automatically refresh as new data arrives:

-- Create hypertable with compression
CREATE TABLE sensor_data (
  time TIMESTAMPTZ NOT NULL,
  sensor_id INTEGER NOT NULL,
  temperature DOUBLE PRECISION,
  humidity DOUBLE PRECISION,
  pressure DOUBLE PRECISION
);
 
SELECT create_hypertable('sensor_data', 'time',
  chunk_time_interval => INTERVAL '1 day'
);
 
-- Enable compression (10-20x storage reduction)
ALTER TABLE sensor_data SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'sensor_id',
  timescaledb.compress_orderby = 'time DESC'
);
 
-- Automatically compress chunks older than 7 days
SELECT add_compression_policy('sensor_data', INTERVAL '7 days');
 
-- Continuous aggregate: pre-computed hourly stats
CREATE MATERIALIZED VIEW sensor_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', time) as bucket,
  sensor_id,
  AVG(temperature) as avg_temp,
  MIN(temperature) as min_temp,
  MAX(temperature) as max_temp,
  AVG(humidity) as avg_humidity,
  COUNT(*) as sample_count
FROM sensor_data
GROUP BY bucket, sensor_id;
 
-- Refresh policy: update every hour
SELECT add_continuous_aggregate_policy('sensor_hourly',
  start_offset => INTERVAL '3 days',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);
 
-- Retention policy: drop raw data older than 90 days
SELECT add_retention_policy('sensor_data', INTERVAL '90 days');
 
-- Query the continuous aggregate (instant response)
SELECT * FROM sensor_hourly
WHERE bucket >= now() - INTERVAL '24 hours'
AND sensor_id = 42
ORDER BY bucket DESC;

Setting Up InfluxDB v3 with Python Client

from influxdb_client_3 import InfluxDBClient3
import numpy as np
from datetime import datetime, timedelta
 
client = InfluxDBClient3(
    host="https://us-east-1-1.aws.cloud2.influxdata.com",
    token="my-token",
    org="my-org",
    database="metrics"
)
 
# High-throughput batch writes
import pyarrow as pa
 
def write_metrics_batch(metrics: list[dict]):
    """Write a batch of metrics using Arrow for maximum throughput."""
    df = pd.DataFrame(metrics)
    table = pa.Table.from_pandas(df)
 
    client.write(
        table,
        data_frame_measurement_name="cpu",
        data_frame_tag_columns=["host", "region"],
        data_frame_timestamp_column="time",
    )
 
# Generate test data
metrics = []
for i in range(100000):
    metrics.append({
        "time": datetime.now() - timedelta(seconds=i),
        "host": f"server-{i % 100}",
        "region": ["us-east", "eu-west", "ap-south"][i % 3],
        "cpu_usage": np.random.uniform(10, 95),
        "memory_usage": np.random.uniform(30, 80),
    })
 
write_metrics_batch(metrics)

Setting Up QuestDB with High-Volume Ingestion

// QuestDB ILP (InfluxDB Line Protocol) client for maximum throughput
import io.questdb.client.Sender;
 
public class MetricsWriter {
    private final Sender sender;
 
    public MetricsWriter() {
        this.sender = Sender.builder()
            .address("localhost:9009")
            .build();
    }
 
    public void writeMetric(String host, double cpuUsage, double loadAvg) {
        sender.table("metrics")
            .symbol("host", host)
            .doubleColumn("cpu_usage", cpuUsage)
            .doubleColumn("load_avg", loadAvg)
            .atNow(); // Uses current timestamp
    }
 
    public void writeBatch(List<Metric> metrics) {
        for (Metric m : metrics) {
            sender.table("metrics")
                .symbol("host", m.getHost())
                .doubleColumn("cpu_usage", m.getCpuUsage())
                .doubleColumn("load_avg", m.getLoadAvg())
                .at(m.getTimestamp(), ChronoUnit.NANOS);
        }
        sender.flush();
    }
}

Database performance benchmarking

Real-World Use Cases

Use Case 1: IoT Sensor Monitoring with TimescaleDB

A manufacturing company monitors 10,000 sensors across 50 factories, generating 1 million data points per minute.

-- Schema with device metadata
CREATE TABLE sensor_readings (
  time TIMESTAMPTZ NOT NULL,
  device_id INTEGER NOT NULL,
  factory_id INTEGER NOT NULL,
  metric_type TEXT NOT NULL,
  value DOUBLE PRECISION NOT NULL,
  quality SMALLINT DEFAULT 100  -- 0-100 quality score
);
 
SELECT create_hypertable('sensor_readings', 'time');
 
-- Continuous aggregate for dashboard
CREATE MATERIALIZED VIEW factory_metrics_5m
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('5 minutes', time) as bucket,
  factory_id,
  metric_type,
  AVG(value) as avg_value,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) as p95_value,
  COUNT(*) as sample_count,
  COUNT(*) FILTER (WHERE quality < 80) as low_quality_count
FROM sensor_readings
GROUP BY bucket, factory_id, metric_type;
 
-- Alerting query: detect anomalies
SELECT
  factory_id,
  metric_type,
  bucket,
  avg_value,
  LAG(avg_value) OVER (PARTITION BY factory_id, metric_type ORDER BY bucket) as prev_avg,
  ABS(avg_value - LAG(avg_value) OVER (PARTITION BY factory_id, metric_type ORDER BY bucket))
    / NULLIF(LAG(avg_value) OVER (PARTITION BY factory_id, metric_type ORDER BY bucket), 0)
    as change_pct
FROM factory_metrics_5m
WHERE bucket >= now() - INTERVAL '1 hour'
AND ABS(avg_value - LAG(avg_value) OVER (PARTITION BY factory_id, metric_type ORDER BY bucket))
    / NULLIF(LAG(avg_value) OVER (PARTITION BY factory_id, metric_type ORDER BY bucket), 0) > 0.5;

Use Case 2: Application Performance Monitoring with InfluxDB

// Node.js APM metrics collector using InfluxDB
import { InfluxDB, Point, WriteApi } from '@influxdata/influxdb-client';
 
class MetricsCollector {
  private writeApi: WriteApi;
  private buffer: Point[] = [];
  private flushInterval: NodeJS.Timeout;
 
  constructor(url: string, token: string, org: string, bucket: string) {
    const influxDB = new InfluxDB({ url, token });
    this.writeApi = influxDB.getWriteApi(org, bucket, 'ns');
    this.flushInterval = setInterval(() => this.flush(), 5000);
  }
 
  recordHttpRequest(method: string, path: string, status: number, durationMs: number) {
    const point = new Point('http_request')
      .tag('method', method)
      .tag('path', path)
      .tag('status', String(status))
      .floatField('duration_ms', durationMs)
      .timestamp(new Date());
 
    this.buffer.push(point);
    if (this.buffer.length >= 1000) this.flush();
  }
 
  recordDatabaseQuery(operation: string, table: string, durationMs: number, rowsAffected: number) {
    const point = new Point('db_query')
      .tag('operation', operation)
      .tag('table', table)
      .floatField('duration_ms', durationMs)
      .intField('rows_affected', rowsAffected)
      .timestamp(new Date());
 
    this.buffer.push(point);
  }
 
  private flush() {
    if (this.buffer.length === 0) return;
    const points = [...this.buffer];
    this.buffer = [];
    this.writeApi.writePoints(points);
  }
}
 
// Usage
const collector = new MetricsCollector(
  process.env.INFLUX_URL!,
  process.env.INFLUX_TOKEN!,
  'my-org',
  'apm'
);
 
// Middleware
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    collector.recordHttpRequest(req.method, req.path, res.statusCode, Date.now() - start);
  });
  next();
});

Use Case 3: Financial Tick Data with QuestDB

-- QuestDB: Optimized for high-frequency financial data
CREATE TABLE trades (
  time TIMESTAMP,
  symbol SYMBOL,
  price DOUBLE,
  volume DOUBLE,
  side SYMBOL  -- 'BUY' or 'SELL'
) TIMESTAMP(time) PARTITION BY DAY WAL
  DEDUP UPSERT KEYS(time, symbol);
 
-- Latest price for each symbol (very fast with LATEST ON)
SELECT symbol, price, volume, side
FROM trades
LATEST ON time PARTITION BY symbol;
 
-- OHLCV candlestick aggregation
SELECT
  symbol,
  time_bucket('5m', time) as candle_time,
  first(price) as open,
  max(price) as high,
  min(price) as low,
  last(price) as close,
  sum(volume) as volume
FROM trades
WHERE time >= '2024-01-01' AND symbol = 'AAPL'
SAMPLE BY 5m ALIGN TO CALENDAR;

Best Practices for Production

  1. Choose the right tool for your ecosystem: TimescaleDB if you're already on PostgreSQL; QuestDB for maximum write throughput; InfluxDB for cloud-managed time-series.
  2. Batch your writes: All three databases are designed for batch ingestion. Writing one point at a time is 100x slower than batching 1000 points.
  3. Use appropriate retention policies: Raw high-resolution data should be retained for days/weeks, downsampled data for months/years.
  4. Partition by time appropriately: Daily partitions for high-volume data, monthly for lower volume. Too many small partitions hurt query performance.
  5. Compress aggressively: Time-series data compresses extremely well (10-20x). Enable compression for data older than your active query window.
  6. Index tag/label columns: In all three systems, columns used in WHERE clauses (host, region, sensor_id) should be indexed or use symbol types.
  7. Monitor ingestion lag: Track write throughput vs query load. If queries slow down ingestion, route reads to replicas.
  8. Use continuous aggregates for dashboards: Pre-computed aggregates eliminate expensive real-time aggregation for common dashboard queries.

Downsampling and Continuous Aggregates

Raw time-series data is expensive to query at scale. A sensor reporting every second generates 86,400 data points per day—querying a month of raw data scans 2.5M rows just to render a dashboard. Downsampling reduces this by pre-computing aggregates at coarser time intervals.

-- TimescaleDB: Continuous aggregate that auto-refreshes
CREATE MATERIALIZED VIEW metrics_hourly
WITH (timescaledb.continuous) AS
SELECT
  time_bucket('1 hour', time) AS bucket,
  sensor_id,
  AVG(value) AS avg_value,
  MIN(value) AS min_value,
  MAX(value) AS max_value,
  PERCENTILE_CONT(0.95) WITHIN GROUP (ORDER BY value) AS p95_value,
  COUNT(*) AS sample_count
FROM metrics
GROUP BY bucket, sensor_id;
 
-- Add refresh policy: update hourly aggregates every hour
SELECT add_continuous_aggregate_policy('metrics_hourly',
  start_offset => INTERVAL '3 days',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour'
);
 
-- Query the aggregate instead of raw data
SELECT * FROM metrics_hourly
WHERE bucket >= now() - INTERVAL '30 days'
AND sensor_id = 42;
-- InfluxDB v3: Task-based downsampling
-- Create a task that runs every hour to downsample data
CREATE TABLE metrics_1h AS
SELECT
  date_trunc('hour', time) AS bucket,
  host,
  AVG(cpu_usage) AS avg_cpu,
  MAX(cpu_usage) AS max_cpu,
  COUNT(*) AS samples
FROM cpu
GROUP BY bucket, host;
 
-- QuestDB: Sample by for in-query downsampling
SELECT
  ts,
  host,
  AVG(cpu_usage),
  MAX(cpu_usage),
  MIN(cpu_usage)
FROM cpu
WHERE ts >= dateadd('d', -30, now())
SAMPLE BY 1h;

Data Retention Policies

Time-series data loses value over time. A retention policy automatically drops old data, preventing unbounded disk growth:

-- TimescaleDB: Drop data older than 90 days
SELECT add_retention_policy('metrics', INTERVAL '90 days');
 
-- Keep hourly aggregates for 2 years
SELECT add_retention_policy('metrics_hourly', INTERVAL '2 years');
 
-- InfluxDB v3: Retention is configured at the database level
-- CREATE DATABASE metrics WITH DURATION 90d REPLICATION 1
 
-- QuestDB: Partition-based retention
-- Data is partitioned by day/month; drop old partitions
ALTER TABLE metrics DROP PARTITION WHERE ts < dateadd('d', -90, now());

A common strategy is tiered retention: keep raw data for 7 days, hourly aggregates for 90 days, and daily aggregates for 2 years. This balances query granularity with storage cost—dashboards query aggregates, debugging queries raw data.

Real-World Use Case: IoT Fleet Monitoring

Consider a fleet of 10,000 IoT devices reporting 50 metrics every 10 seconds. That's 50,000 data points per second, or 4.3 billion points per day. Here's how each database handles this:

QuestDB excels here with its ingestion pipeline:

-- QuestDB: Create table with designated timestamp
CREATE TABLE device_metrics (
  ts TIMESTAMP,
  device_id SYMBOL,
  metric_name SYMBOL,
  value DOUBLE
) TIMESTAMP(ts) PARTITION BY DAY WAL
DEDUP UPSERT KEYS(ts, device_id, metric_name);
 
-- Ingest via InfluxDB line protocol (QuestDB supports it natively)
-- POST /api/v2/write?db=device_metrics
-- device_metrics,device_id=dev-001,metric_name=temperature value=23.5 1699234800000000000

TimescaleDB handles it with hypertables and compression:

-- Enable compression on chunks older than 3 days
ALTER TABLE device_metrics SET (
  timescaledb.compress,
  timescaledb.compress_segmentby = 'device_id, metric_name',
  timescaledb.compress_orderby = 'ts DESC'
);
 
SELECT add_compression_policy('device_metrics', INTERVAL '3 days');

For this scale, QuestDB typically achieves 3-5x faster ingestion while TimescaleDB provides richer query capabilities (JOINs with device metadata, window functions, lateral joins).

Common Pitfalls and Solutions

PitfallImpactSolution
High cardinality tagsMemory explosion in InfluxDBLimit tag values; use fields for high-cardinality data
Missing compression10x storage wasteEnable compression for data older than query window
No retention policyDisk fills upSet automatic data lifecycle policies
Querying raw data for dashboardsSlow dashboard loadsUse continuous aggregates or pre-computed views
Wrong partition intervalToo many small partitionsMatch partition interval to data volume (daily for high-volume)

Performance Optimization

-- TimescaleDB: Parallel query execution
SET max_parallel_workers_per_gather = 4;
SET timescaledb.max_parallel_chunk_size = 4;
 
-- QuestDB: Optimize symbol columns
ALTER TABLE metrics ALTER COLUMN host ADD INDEX;
ALTER TABLE metrics ALTER COLUMN region ADD INDEX;
 
-- InfluxDB: Configure compaction for write-heavy workloads
-- (Configuration in influxdb3 server settings)

Comparison with Alternatives

FeatureTimescaleDBInfluxDB v3QuestDBPrometheus
Write throughput500K pts/s1M+ pts/s4M+ pts/s100K pts/s
Query languageSQLSQL + InfluxQLSQL + extensionsPromQL
Full SQLYes (PostgreSQL)Yes (FlightSQL)MostlyNo
JoinsYesLimitedLimitedNo
Compression10-20x10-20x10-20xLimited
Cloud managedTimescale CloudInfluxDB CloudQuestDB CloudVarious
Best forPG ecosystemIoT/metricsHigh-speed ingestionK8s monitoring

Testing Strategies

describe('TimescaleDB Performance', () => {
  test('insert 100k points within 5 seconds', async () => {
    const start = Date.now();
    const points = generateTestPoints(100000);
 
    await pool.query(
      `INSERT INTO metrics (time, sensor_id, value)
       SELECT unnest($1::timestamptz[]), unnest($2::int[]), unnest($3::float[])`,
      [points.map(p => p.time), points.map(p => p.sensorId), points.map(p => p.value)]
    );
 
    expect(Date.now() - start).toBeLessThan(5000);
  });
 
  test('hourly aggregate query under 100ms', async () => {
    const start = Date.now();
    await pool.query(
      `SELECT time_bucket('1 hour', time) as bucket, AVG(value)
       FROM metrics WHERE time >= now() - INTERVAL '7 days'
       GROUP BY bucket ORDER BY bucket DESC`
    );
    expect(Date.now() - start).toBeLessThan(100);
  });
});

Future Outlook

The TSDB landscape is converging toward SQL compatibility. InfluxDB v3 adopted Apache Arrow and SQL, TimescaleDB continues adding time-series features to PostgreSQL, and QuestDB is expanding its SQL coverage. The rise of edge computing is driving demand for embedded TSDBs that can run on resource-constrained devices and sync with cloud backends.

Conclusion

Time-series databases are essential for any workload involving timestamped data at scale. Key takeaways:

  1. TimescaleDB is the safest choice for PostgreSQL teams—full SQL, familiar tooling, and excellent compression
  2. QuestDB wins on raw write throughput—ideal for high-frequency data ingestion
  3. InfluxDB v3 offers the best cloud-managed experience with strong IoT ecosystem support
  4. All three benefit massively from batch writes, compression, and continuous aggregates
  5. Choose based on your existing ecosystem, not just benchmarks

Start by migrating your highest-volume time-series data from your general-purpose database to a dedicated TSDB. The storage savings alone will justify the migration within weeks.