This guide explains how to read and interpret the metrics UptimeIO provides for your monitors.
Dashboard Overview
When you log in, your dashboard shows key metrics at a glance.
Uptime Percentage
Uptime percentage is the most important metric: it tells you what percentage of the time your service was available.
How It's Calculated
- Period: 24 hours (1,440 minutes)
- Downtime: 5 minutes
- Uptime: (1,440 - 5) / 1,440 × 100 = 99.65%
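The calculation above can be sketched in a few lines of Python. The function name `uptime_percentage` is illustrative and not part of any UptimeIO API. The exact result is 99.6527...%, which is shown as 99.65% when formatted to two decimal places.

```python
def uptime_percentage(period_minutes: float, downtime_minutes: float) -> float:
    """Return uptime as a percentage of the period."""
    if period_minutes <= 0:
        raise ValueError("period_minutes must be positive")
    if not 0 <= downtime_minutes <= period_minutes:
        raise ValueError("downtime_minutes must be between 0 and period_minutes")
    return (period_minutes - downtime_minutes) / period_minutes * 100


# 24-hour period (1,440 minutes) with 5 minutes of downtime
print(f"{uptime_percentage(1440, 5):.2f}%")  # 99.65%
```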
Time Periods
UptimeIO shows uptime for multiple periods:
| Period | Use Case |
|---|---|
| Last 24 hours | Recent performance |
| Last 7 days | Weekly trends |
| Last 30 days | Monthly SLA tracking |
| Last 90 days | Quarterly reporting |
Uptime is calculated from actual check results within your plan's data retention period.
Reading Uptime Values
| Uptime % | Status | Meaning |
|---|---|---|
| 100% | 🟢 Perfect | No downtime at all |
| 99.9%+ | 🟢 Excellent | < 43 minutes downtime per month |
| 99.5-99.9% | 🟡 Good | 43 minutes - 3.6 hours per month |
| 99.0-99.5% | 🟡 Fair | 3.6 - 7.2 hours per month |
| < 99.0% | 🔴 Poor | > 7.2 hours downtime per month |
Downtime Equivalents
What uptime percentages mean in actual downtime:
| Uptime % | Downtime per Day | Downtime per Week | Downtime per Month | Downtime per Year |
|---|---|---|---|---|
| 99.9% (three nines) | 1.4 minutes | 10 minutes | 43 minutes | 8.7 hours |
| 99.95% | 43 seconds | 5 minutes | 22 minutes | 4.4 hours |
| 99.99% (four nines) | 8.6 seconds | 1 minute | 4.3 minutes | 52 minutes |
| 99.999% (five nines) | 0.9 seconds | 6 seconds | 26 seconds | 5.3 minutes |
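The figures in this table can be reproduced with a short calculation. The sketch below assumes a 30-day month and a 365-day year, which match the table's rounded values. The name `allowed_downtime` is illustrative only.

```python
from datetime import timedelta

# Period lengths assumed here: a 30-day month and a 365-day year.
PERIODS = {
    "day": timedelta(days=1),
    "week": timedelta(days=7),
    "month": timedelta(days=30),
    "year": timedelta(days=365),
}


def allowed_downtime(uptime_percent: float, period: str) -> timedelta:
    """Return the maximum downtime that still meets the uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return PERIODS[period] * (1 - uptime_percent / 100)


# Print the downtime budget for a 99.9% target over each period.
for name in PERIODS:
    print(name, allowed_downtime(99.9, name))
```

For a 99.9% target, the monthly budget comes out at about 43.2 minutes, in line with the "43 minutes" shown in the table.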
Response Time
Response time measures how long it takes for your service to respond to requests.
What's Measured
- DNS Resolution: Time to resolve domain to IP (10-50ms typical)
- TCP Connection: Time to establish connection (20-100ms typical)
- TLS Handshake: SSL/TLS negotiation (50-200ms typical)
- Server Response: Time for server to process and respond (varies)
Example Response Time
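As an illustration of how these phases might add up for one HTTPS check, here is a small sketch. All timings are hypothetical values chosen from within the typical ranges listed above, and treating the total as the plain sum of the four phases is an assumption.

```python
# Hypothetical timings for a single HTTPS check, in milliseconds.
# These values are illustrative only, not real UptimeIO output.
phases_ms = {
    "DNS Resolution": 25,
    "TCP Connection": 40,
    "TLS Handshake": 110,
    "Server Response": 180,
}

# Assumption: total response time is the sum of the individual phases.
total_ms = sum(phases_ms.values())

for phase, ms in phases_ms.items():
    print(f"{phase:<16} {ms:>4} ms  ({ms / total_ms:.0%})")
print(f"{'Total':<16} {total_ms:>4} ms")
```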
Response Time Metrics
UptimeIO shows multiple response time statistics:
Average Response Time
Mean response time across all checks in the period.
Good for: General performance trends
Limitation: Can be skewed by outliers
Minimum Response Time
Fastest response time recorded.
Good for: Best-case performance
Use: Baseline for optimization
Maximum Response Time
Slowest response time recorded.
Good for: Identifying performance spikes
Use: Troubleshooting slow requests
P95 Response Time
95th percentile: 95% of checks completed at least this fast.
Good for: Real-world user experience
Use: SLA targets (more representative than the average)
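To see how these four statistics differ on the same data, here is a minimal sketch. It uses the nearest-rank method for P95, which is one common definition; how UptimeIO computes its percentile is not stated here, so treat the method and the function name `response_time_stats` as assumptions.

```python
import math
import statistics


def response_time_stats(samples_ms: list[float]) -> dict[str, float]:
    """Summarize response times (ms) as average, min, max, and P95."""
    if not samples_ms:
        raise ValueError("samples_ms must not be empty")
    ordered = sorted(samples_ms)
    # Nearest-rank P95: smallest value with at least 95% of samples at or below it.
    rank = math.ceil(95 * len(ordered) / 100)
    return {
        "average": statistics.fmean(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "p95": ordered[rank - 1],
    }


# Hypothetical samples: 19 fast checks and one slow outlier.
samples = [120.0] * 19 + [2000.0]
print(response_time_stats(samples))
```

With these samples the single outlier pulls the average up to 214 ms, while P95 stays at 120 ms. This is why the average is described as sensitive to outliers.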
Response Time Ranges
| Response Time | Status | User Experience |
|---|---|---|
| < 100ms | 🟢 Excellent | Instant, imperceptible |
| 100-300ms | 🟢 Good | Fast, slight delay |
| 300-1000ms | 🟡 Acceptable | Noticeable delay |
| 1000-3000ms | 🟠 Slow | Frustrating |
| > 3000ms | 🔴 Very Slow | Unacceptable |
Response time expectations vary by service type. APIs should generally respond in under 500ms, while complex web pages can take 1-2 seconds.
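If you want to apply the ranges from the table above in your own scripts, a simple classifier could look like the sketch below. The table's ranges share their boundary values (for example 300ms), so this sketch assumes a boundary value belongs to the better band. The function name is illustrative.

```python
def classify_response_time(ms: float) -> str:
    """Map a response time in milliseconds to a status label from the table."""
    if ms < 0:
        raise ValueError("ms must not be negative")
    if ms < 100:
        return "Excellent"
    if ms <= 300:  # assumption: shared boundaries fall into the better band
        return "Good"
    if ms <= 1000:
        return "Acceptable"
    if ms <= 3000:
        return "Slow"
    return "Very Slow"


for sample in (50, 250, 800, 2500, 3500):
    print(sample, "ms ->", classify_response_time(sample))
```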
Response Time Graph
The response time graph shows performance over time.
Reading the Graph
Steady Line
Gradual Increase
Spikes
- Traffic spikes
- Background jobs
- Database queries
- External API calls
Gaps
Check Success Rate
Percentage of checks that succeeded vs failed.
- Total checks: 1,000
- Successful: 998
- Failed: 2
- Success Rate: 998 / 1,000 × 100 = 99.8%
Success rate is similar to uptime but counts individual checks rather than time periods.
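Because success rate counts individual checks, it can be computed directly from a list of check outcomes. The sketch below assumes each check is represented as a boolean (True for success), which is a simplification for illustration rather than UptimeIO's data format.

```python
def success_rate(results: list[bool]) -> float:
    """Return the percentage of checks that succeeded (True = success)."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(results) / len(results) * 100


# 1,000 checks: 998 successful, 2 failed
checks = [True] * 998 + [False] * 2
print(f"{success_rate(checks):.1f}%")  # 99.8%
```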
Recent Checks
The Recent Checks section shows the last 10-20 checks.
Check Details
Each check shows:
- Status: ✓ Success or ✗ Failure
- Time: When the check ran
- Response Time: How long it took
- Status Code: HTTP status code (for HTTP monitors)
- Region: Where the check ran from
Incident Metrics
Mean Time Between Failures (MTBF)
Average time between incidents.
- Period: 30 days
- Incidents: 3
- MTBF: 30 days / 3 = 10 days
Mean Time To Recovery (MTTR)
Average time to resolve incidents.
- Total downtime: 15 minutes
- Incidents: 3
- MTTR: 15 / 3 = 5 minutes
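Both incident metrics follow directly from the definitions above. The sketch below uses the same simplified formulas (period divided by incident count, and total downtime divided by incident count). The individual incident durations are hypothetical and were chosen only so that they add up to 15 minutes.

```python
from datetime import timedelta


def mtbf(period: timedelta, incident_count: int) -> timedelta:
    """Mean time between failures: observation period divided by incident count."""
    if incident_count <= 0:
        raise ValueError("incident_count must be positive")
    return period / incident_count


def mttr(downtimes: list[timedelta]) -> timedelta:
    """Mean time to recovery: average duration of the given incidents."""
    if not downtimes:
        raise ValueError("downtimes must not be empty")
    return sum(downtimes, timedelta()) / len(downtimes)


# Hypothetical incident durations that add up to 15 minutes of downtime.
incidents = [timedelta(minutes=4), timedelta(minutes=9), timedelta(minutes=2)]

print(mtbf(timedelta(days=30), len(incidents)))  # 10 days, 0:00:00
print(mttr(incidents))                           # 0:05:00
```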
Status Page Metrics
If you have a status page, additional metrics are available:
Uptime Badge
Shows the current uptime percentage.
Service Status
Real-time status of each monitored service:
- 🟢 Operational
- 🟡 Degraded Performance
- 🔴 Major Outage
- 🔵 Under Maintenance
Incident History
Public timeline of past incidents with:
- Start and end times
- Duration
- Impact description
- Resolution notes
Exporting Metrics
Export metrics for reporting or analysis:
1. Go to Monitor Details: click on any monitor from your monitors list.
2. Select Time Period: choose the date range you want to export.
3. Export Data: click "Export" and choose a format:
- CSV: For spreadsheets
- JSON: For programmatic access
- PDF: For reports
CSV Export Example
API Access to Metrics
Retrieve metrics programmatically via the API. See the API Reference for complete documentation.
Understanding Trends
Improving Trends 📈
- ✅ Uptime increasing
- ✅ Response time decreasing
- ✅ Fewer incidents
Degrading Trends 📉
- ⚠️ Uptime decreasing
- ⚠️ Response time increasing
- ⚠️ More frequent incidents
If you see degrading trends:
- Check server resources (CPU, memory, disk)
- Review recent deployments
- Analyze error logs
- Check database performance
- Review third-party dependencies
Best Practices
Set realistic targets
Don't aim for 100% uptime - it's unrealistic:
- Critical services: 99.9% (three nines)
- Important services: 99.5%
- Non-critical: 99.0%
Monitor trends, not just current values
A single data point doesn't tell the whole story. Look at:
- Week-over-week changes
- Month-over-month trends
- Time-of-day patterns
- Day-of-week patterns
Investigate anomalies
When you see unusual metrics:
- Check incident timeline
- Review recent changes
- Compare with other monitors
- Check external dependencies
- Review server logs
Use metrics for capacity planning
Track response time trends to predict when you'll need to scale:
- Gradual increases = growing load
- Spikes at specific times = traffic patterns
- Steady increases = resource exhaustion
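One simple way to quantify a response time trend is to fit a straight line through periodic averages and look at its slope. The sketch below computes an ordinary least-squares slope by hand. The daily averages are hypothetical, and this is only one possible approach, not a description of how UptimeIO analyzes trends.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against their index (units per sample)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    variance = sum((x - mean_x) ** 2 for x in range(n))
    return covariance / variance


# Hypothetical daily average response times in milliseconds.
daily_avg_ms = [210, 214, 219, 225, 228, 236, 240]
slope = trend_slope(daily_avg_ms)
print(f"Response time is changing by about {slope:+.1f} ms per day")
```

A consistently positive slope over several weeks suggests growing load, while a slope near zero indicates stable performance.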