Reading Metrics

This guide explains how to read and interpret the metrics UptimeIO provides for your monitors.

Dashboard Overview

When you log in, your dashboard shows key metrics at a glance:

┌─────────────────────────────────────┐
│ Organization Overview               │
├─────────────────────────────────────┤
│ Total Monitors: 15                  │
│ Active: 12 | Paused: 2 | Down: 1    │
│                                     │
│ Overall Uptime (24h): 99.8%         │
│ Average Response Time: 145ms        │
│ Active Incidents: 1                 │
└─────────────────────────────────────┘

The dashboard provides a quick health check of all your services.

Uptime Percentage

Uptime percentage is the headline metric: it tells you what share of a given period your service was available.

How It’s Calculated

Uptime % = (Total Time - Downtime) / Total Time × 100

Example:

Period: 24 hours (1,440 minutes)
Downtime: 5 minutes
Uptime: (1,440 - 5) / 1,440 × 100 = 99.65%
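The formula above can be sketched in a few lines of Python. This is a minimal illustration, not UptimeIO's implementation; the function name `uptime_percent` is hypothetical.

```python
def uptime_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Return uptime as a percentage of the observation period."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    if not 0 <= downtime_minutes <= total_minutes:
        raise ValueError("downtime_minutes must be between 0 and total_minutes")
    return (total_minutes - downtime_minutes) / total_minutes * 100


# The 24-hour example: 5 minutes of downtime out of 1,440 minutes.
print(f"{uptime_percent(1440, 5):.2f}%")  # 99.65%
```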

Time Periods

UptimeIO shows uptime for multiple periods:

Period	Use Case
Last 24 hours	Recent performance
Last 7 days	Weekly trends
Last 30 days	Monthly SLA tracking
Last 90 days	Quarterly reporting

Uptime is calculated from actual check results within your plan’s data retention period.

Reading Uptime Values

Uptime %	Status	Meaning
100%	🟢 Perfect	No downtime at all
99.9%+	🟢 Excellent	< 43 minutes downtime per month
99.5-99.9%	🟡 Good	43 minutes - 3.6 hours per month
99.0-99.5%	🟡 Fair	3.6 - 7.2 hours per month
< 99.0%	🔴 Poor	> 7.2 hours downtime per month

Downtime Equivalents

Understanding what uptime percentages mean in real time:

Uptime %	Downtime per Day	Downtime per Week	Downtime per Month	Downtime per Year
99.9% (three nines)	1.4 minutes	10 minutes	43 minutes	8.8 hours
99.95%	43 seconds	5 minutes	22 minutes	4.4 hours
99.99% (four nines)	8.6 seconds	1 minute	4.3 minutes	53 minutes
99.999% (five nines)	0.9 seconds	6 seconds	26 seconds	5.3 minutes

Many SLAs target 99.9% (three nines) uptime, which allows for about 43 minutes of downtime in a 30-day month.
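To work out the downtime budget for any target and period, the same arithmetic can be expressed as a small helper. This is a sketch; `allowed_downtime` is a hypothetical name, and a month is assumed to be 30 days.

```python
from datetime import timedelta


def allowed_downtime(target_percent: float, period: timedelta) -> timedelta:
    """Return the downtime a given uptime target permits within a period."""
    if not 0 <= target_percent <= 100:
        raise ValueError("target_percent must be between 0 and 100")
    return period * (1 - target_percent / 100)


# Three nines over an assumed 30-day month: roughly 43 minutes.
budget = allowed_downtime(99.9, timedelta(days=30))
print(f"{budget.total_seconds() / 60:.1f} minutes")
```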

Response Time

Response time measures how long it takes for your service to respond to requests.

What’s Measured

Total Response Time = DNS + TCP + TLS + Server Response

Breakdown:

DNS Resolution: Time to resolve domain to IP (10-50ms typical)
TCP Connection: Time to establish connection (20-100ms typical)
TLS Handshake: SSL/TLS negotiation (50-200ms typical)
Server Response: Time for server to process and respond (varies)

Example Response Time

Monitor: API Server
Total Response Time: 245ms

Breakdown:
├─ DNS Resolution: 12ms
├─ TCP Connection: 45ms
├─ TLS Handshake: 78ms
└─ Server Response: 110ms
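To see which phase dominates a check, you can compute each phase's share of the total. The sketch below uses the numbers from the example above; the dictionary keys and the function name are illustrative only.

```python
def total_and_shares(breakdown_ms: dict[str, float]) -> tuple[float, dict[str, float]]:
    """Return total response time and each phase's share of it in percent."""
    total = sum(breakdown_ms.values())
    if total <= 0:
        raise ValueError("breakdown must contain a positive total time")
    shares = {phase: ms / total * 100 for phase, ms in breakdown_ms.items()}
    return total, shares


# Values from the API Server example.
breakdown = {
    "DNS Resolution": 12,
    "TCP Connection": 45,
    "TLS Handshake": 78,
    "Server Response": 110,
}
total, shares = total_and_shares(breakdown)
slowest = max(shares, key=shares.get)
print(f"Total: {total}ms, largest phase: {slowest} ({shares[slowest]:.1f}%)")
```

Here the server response is the largest single phase, so server-side work is the first place to look for optimization.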

Response Time Metrics

UptimeIO shows multiple response time statistics:

Average Response Time

Mean response time across all checks in the period.

Average: 145ms

Good for: General performance trends
Limitation: Can be skewed by outliers

Minimum Response Time

Fastest response time recorded.

Minimum: 85ms

Good for: Best-case performance
Use: Baseline for optimization

Maximum Response Time

Slowest response time recorded.

Maximum: 450ms

Good for: Identifying performance spikes
Use: Troubleshooting slow requests

P95 Response Time

95th percentile - 95% of checks completed at or below this response time.

P95: 245ms

Good for: Real-world user experience
Use: SLA targets (a better basis than the average, which outliers can skew)
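The four statistics can be reproduced from raw response times as sketched below. The nearest-rank percentile method is an assumption; this guide does not state which percentile method UptimeIO uses, and the function names are hypothetical.

```python
import math
from statistics import mean


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(samples: list[float]) -> dict[str, float]:
    """Return average, minimum, maximum and P95 for a list of response times in ms."""
    return {
        "average": mean(samples),
        "minimum": min(samples),
        "maximum": max(samples),
        "p95": percentile(samples, 95),
    }


# Illustrative data: one slow outlier pulls the average well above the typical value.
stats = summarize([85, 120, 138, 141, 145, 152, 160, 180, 245, 450])
print(stats)
```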

Response Time Ranges

Response Time	Status	User Experience
< 100ms	🟢 Excellent	Instant, imperceptible
100-300ms	🟢 Good	Fast, slight delay
300-1000ms	🟡 Acceptable	Noticeable delay
1000-3000ms	🟠 Slow	Frustrating
> 3000ms	🔴 Very Slow	Unacceptable

Response time expectations vary by service type. APIs should typically respond in under 500ms, while complex web pages can take 1-2 seconds.
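If you post-process exported data, the ranges in the table can be applied with a simple classifier. The table does not say which band a boundary value such as exactly 300ms belongs to, so the boundary handling below is an assumption, as is the function name.

```python
def classify_response_time(ms: float) -> str:
    """Map a response time in milliseconds to the status labels from the table."""
    if ms < 0:
        raise ValueError("response time cannot be negative")
    if ms < 100:
        return "Excellent"
    if ms < 300:      # assumption: 100ms itself counts as Good
        return "Good"
    if ms < 1000:     # assumption: 300ms itself counts as Acceptable
        return "Acceptable"
    if ms <= 3000:    # the table marks only values above 3000ms as Very Slow
        return "Slow"
    return "Very Slow"


for sample in (85, 145, 450, 3500):
    print(sample, classify_response_time(sample))
```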

Response Time Graph

The response time graph shows performance over time:

Response Time (ms)
┤                                    ╭─╮
┤                          ╭─╮      │ │
┤              ╭─╮    ╭───╯ ╰──╮   │ ╰─╮
┤      ╭───────╯ ╰────╯         ╰───╯   ╰─╮
┤──────╯                                  ╰──
    └────────────────────────────────────────────
00  14:00  16:00  18:00  20:00  22:00

Reading the Graph

Steady Line

200ms ──────────────────────────

Meaning: Consistent performance
Status: ✅ Healthy

Gradual Increase

300ms              ╭───────
200ms      ╭───────╯
100ms ─────╯

Meaning: Performance degrading
Action: Investigate resource usage

Spikes

500ms      ╭╮
300ms      ││    ╭╮
100ms ─────╯╰────╯╰─────

Meaning: Intermittent slowdowns
Action: Check for:

Traffic spikes
Background jobs
Database queries
External API calls

Gaps

200ms ────    ────    ────

Meaning: Failed checks (no response)
Status: ⚠️ Downtime periods

Check Success Rate

The percentage of all checks run that succeeded.

Success Rate = Successful Checks / Total Checks × 100

Example:

Total checks: 1,000
Successful: 998
Failed: 2
Success Rate: 998 / 1,000 × 100 = 99.8%

Success rate is similar to uptime, but it counts individual checks rather than elapsed time.

Recent Checks

The Recent Checks section shows the last 10-20 checks:

Recent Checks
─────────────────────────────────────────────
✅ 10:45:00 | 145ms | 200 OK | US East
✅ 10:40:00 | 152ms | 200 OK | Europe
✅ 10:35:00 | 138ms | 200 OK | Asia
❌ 10:30:00 | N/A   | Timeout | US East
✅ 10:25:00 | 141ms | 200 OK | Europe

Check Details

Each check shows:

Status: ✅ Success or ❌ Failure
Time: When the check ran
Response Time: How long it took
Status Code: HTTP status code (for HTTP monitors)
Region: Where the check ran from

Recent checks help you spot patterns like specific regions having issues or time-of-day performance variations.
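One way to spot a regional pattern is to group check results by region and compare failure rates. The sketch below uses the five checks from the listing above as sample data; the tuple layout and function name are illustrative, not an UptimeIO data format.

```python
from collections import Counter

# (time, succeeded, region), mirroring the Recent Checks listing.
recent_checks = [
    ("10:45:00", True, "US East"),
    ("10:40:00", True, "Europe"),
    ("10:35:00", True, "Asia"),
    ("10:30:00", False, "US East"),
    ("10:25:00", True, "Europe"),
]


def failure_rate_by_region(checks: list[tuple[str, bool, str]]) -> dict[str, float]:
    """Return the percentage of failed checks for each region."""
    totals: Counter[str] = Counter()
    failures: Counter[str] = Counter()
    for _time, succeeded, region in checks:
        totals[region] += 1
        if not succeeded:
            failures[region] += 1
    return {region: failures[region] / totals[region] * 100 for region in totals}


print(failure_rate_by_region(recent_checks))
```

With only a handful of checks the percentages are noisy, so treat a result like this as a prompt to look at a longer window rather than as a conclusion.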

Incident Metrics

Mean Time Between Failures (MTBF)

Average time between incidents.

MTBF = Total Uptime / Number of Incidents

Example:

Period: 30 days
Total uptime: about 30 days (downtime is negligible here)
Incidents: 3
MTBF: 30 days / 3 = 10 days

Higher is better - longer time between incidents means more reliability.

Mean Time To Recovery (MTTR)

Average time to resolve incidents.

MTTR = Total Downtime / Number of Incidents

Example:

Total downtime: 15 minutes
Incidents: 3
MTTR: 15 minutes / 3 = 5 minutes

Lower is better - faster recovery means less impact.
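Both incident metrics follow directly from the formulas above. The sketch below is illustrative; the function names are hypothetical, and total uptime is taken as the period minus total downtime.

```python
from datetime import timedelta


def mtbf(total_uptime: timedelta, incidents: int) -> timedelta:
    """Mean Time Between Failures: total uptime divided by incident count."""
    if incidents <= 0:
        raise ValueError("MTBF is undefined without at least one incident")
    return total_uptime / incidents


def mttr(total_downtime: timedelta, incidents: int) -> timedelta:
    """Mean Time To Recovery: total downtime divided by incident count."""
    if incidents <= 0:
        raise ValueError("MTTR is undefined without at least one incident")
    return total_downtime / incidents


period = timedelta(days=30)
downtime = timedelta(minutes=15)
print("MTBF:", mtbf(period - downtime, 3))
print("MTTR:", mttr(downtime, 3))
```

With 15 minutes of downtime subtracted, the MTBF comes out 5 minutes short of 10 days, which is why the rounded figure of 10 days is a fair summary when downtime is small relative to the period.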

Status Page Metrics

If you have a status page, additional metrics are available:

Uptime Badge

Shows current uptime percentage:

╔═══════════════╗
║  99.9% Uptime ║
║  Last 30 Days ║
╚═══════════════╝

Service Status

Real-time status of each monitored service:

🟢 Operational
🟡 Degraded Performance
🔴 Major Outage
🔵 Under Maintenance

Incident History

Public timeline of past incidents with:

Start and end times
Duration
Impact description
Resolution notes

Exporting Metrics

Export metrics for reporting or analysis:

1. Go to Monitor Details

Click on any monitor from your monitors list.

2. Select Time Period

Choose the date range you want to export.

3. Export Data

Click “Export” and choose format:

CSV: For spreadsheets
JSON: For programmatic access
PDF: For reports

CSV Export Example

timestamp,status,response_time_ms,status_code,region
2024-01-15T10:00:00Z,success,145,200,us-east
2024-01-15T10:05:00Z,success,152,200,europe
2024-01-15T10:10:00Z,failure,0,0,asia
2024-01-15T10:15:00Z,success,138,200,us-east
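A CSV export in this shape can be summarized with Python's standard library. The sketch below assumes the column names shown in the example and excludes failed checks from the average, because failed rows report a response time of 0; the function name is hypothetical.

```python
import csv
import io

EXPORT = """timestamp,status,response_time_ms,status_code,region
2024-01-15T10:00:00Z,success,145,200,us-east
2024-01-15T10:05:00Z,success,152,200,europe
2024-01-15T10:10:00Z,failure,0,0,asia
2024-01-15T10:15:00Z,success,138,200,us-east
"""


def summarize_export(csv_text: str) -> dict:
    """Compute check count, success rate and average response time from an export."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Only successful checks carry a meaningful response time.
    times = [int(r["response_time_ms"]) for r in rows if r["status"] == "success"]
    return {
        "total_checks": len(rows),
        "success_rate": len(times) / len(rows) * 100 if rows else None,
        "average_response_ms": sum(times) / len(times) if times else None,
    }


print(summarize_export(EXPORT))
```

To run this against a real export, read the file contents into a string (for example with `pathlib.Path(...).read_text()`) and pass it to the function.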

API Access to Metrics

Retrieve metrics programmatically via API:

# Get monitor status
curl https://api.uptimeio.com/v1/monitors/mon_abc123/status \
  -H "Authorization: Bearer YOUR_API_KEY"

# Response
{
  "success": true,
  "data": {
    "current_status": "up",
    "uptime": {
      "last_24h": 99.95,
      "last_7d": 99.87,
      "last_30d": 99.92
    },
    "response_time": {
      "last_24h": 145,
      "last_7d": 152,
      "last_30d": 148
    }
  }
}

See the API Reference for complete documentation.
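The same request can be made from Python using only the standard library. This is a sketch that assumes the endpoint and response shape shown in the example above; the function names are hypothetical, and error handling is kept minimal. Check the API Reference for the authoritative contract.

```python
import json
import urllib.request

API_BASE = "https://api.uptimeio.com/v1"  # base URL taken from the curl example


def build_status_request(monitor_id: str, api_key: str) -> urllib.request.Request:
    """Build the authenticated GET request for a monitor's status."""
    return urllib.request.Request(
        f"{API_BASE}/monitors/{monitor_id}/status",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def parse_status(payload: dict) -> dict:
    """Return the 'data' object, assuming the response shape shown above."""
    if not payload.get("success"):
        raise RuntimeError("API response did not report success")
    return payload["data"]


def fetch_status(monitor_id: str, api_key: str, timeout: float = 10.0) -> dict:
    """Fetch and parse a monitor's status (performs a network call)."""
    request = build_status_request(monitor_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_status(json.load(response))


# Example usage (requires a valid key and monitor ID):
# status = fetch_status("mon_abc123", "YOUR_API_KEY")
# print(status["uptime"]["last_30d"])
```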

Understanding Trends

Improving Trends 📈

Uptime: 99.5% → 99.8% → 99.9%
Response Time: 200ms → 180ms → 150ms

Indicators:

✅ Uptime increasing
✅ Response time decreasing
✅ Fewer incidents

Action: Keep monitoring, document what’s working

Degrading Trends 📉

Uptime: 99.9% → 99.7% → 99.5%
Response Time: 150ms → 200ms → 250ms

Indicators:

⚠️ Uptime decreasing
⚠️ Response time increasing
⚠️ More frequent incidents

Action: Investigate immediately:

Check server resources (CPU, memory, disk)
Review recent deployments
Analyze error logs
Check database performance
Review third-party dependencies

Best Practices

Set realistic targets

Don’t aim for 100% uptime - it’s unrealistic:

Critical services: 99.9% (three nines)
Important services: 99.5%
Non-critical: 99.0%

Account for planned maintenance and deployments.

Monitor trends, not just current values

A single data point doesn’t tell the story. Look at:

Week-over-week changes
Month-over-month trends
Time-of-day patterns
Day-of-week patterns

Investigate anomalies

When you see unusual metrics:

Check incident timeline
Review recent changes
Compare with other monitors
Check external dependencies
Review server logs

Use metrics for capacity planning

Track response time trends to predict when you’ll need to scale:

Gradual increases over weeks = growing load
Spikes at specific times = recurring traffic patterns
Steady increases without matching traffic growth = resource exhaustion

Next Steps

Creating Monitors

Set up monitoring for your services

Understanding Incidents

Learn about incident detection

Setting Up Alerts

Configure notifications

API Reference

Access metrics programmatically
