Monitoring Cloudflare via... Cloudflare?
At Trendyol, we use various technologies to handle traffic and Cloudflare is one of them.
To keep a low latency between our customers and our backends, we have to monitor if Cloudflare's Istanbul PoP has been rerouted or not.
And that brings us to cloudflarestatus.com.
Cloudflare Status Page
Cloudflare uses Atlassian's Statuspage for their own status site and they maintain a great source of service statuses and incidents.
The same can't be said for how Atlassian provides the Statuspage Client API.
The pain that is the Statuspage Client API
If you look at https://www.cloudflarestatus.com/api, you'll realize it is a carbon copy of the Statuspage Client API documentation, the only difference being s/cloudflare( )?(status)?/statuspage/gi.
Because Statuspage is a generic solution for creating status pages, each entry is counted as a “component”. So they conveniently provide the list of every component at /api/v2/components.json as an array of objects.
However,
- Components can have sub-components
- Components might reference other components
- Sub-components can have duplicates of parent components
And this is a problem.
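For reference, pulling that list down could look roughly like the sketch below. It assumes the response body is an object with a components array next to some page metadata, and the helper names (extractComponents, fetchComponents) are made up for this illustration rather than taken from any Statuspage SDK.

```javascript
// Sketch only: helper names are hypothetical, not part of any Statuspage SDK.
const COMPONENTS_URL = "https://www.cloudflarestatus.com/api/v2/components.json";

/**
 * Pulls the component array out of a parsed response body.
 * Assumes the body looks like { page: {...}, components: [...] }.
 */
const extractComponents = (payload) => {
  if (!payload || !Array.isArray(payload.components)) {
    throw new TypeError("Unexpected payload: no components array");
  }
  return payload.components;
};

/**
 * Fetches the list. fetchImpl defaults to the global fetch
 * (available in Workers and in Node 18+), but can be swapped for a stub.
 */
const fetchComponents = async (fetchImpl = fetch) => {
  const res = await fetchImpl(COMPONENTS_URL, {
    headers: { Accept: "application/json" },
  });
  if (!res.ok) throw new Error(`Status API returned HTTP ${res.status}`);
  return extractComponents(await res.json());
};
```

Keeping the parsing step separate from the network call makes it easy to check the shape handling without hitting the real endpoint.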
An example entry in the components array is as follows:
{
  "id": "1km35smx8p41",
  "name": "Cloudflare Sites and Services",
  "status": "operational",
  "created_at": "2014-10-27T21:59:30.264Z",
  "updated_at": "2020-11-11T20:12:30.167Z",
  "position": 1,
  "description": "Sites and services that Cloudflare customers use to interact with the Cloudflare Network and its provided services",
  "showcase": false,
  "start_date": null,
  "group_id": null,
  "page_id": "yh6f0r4529hb",
  "group": true,
  "only_show_if_degraded": false,
  "components": ["3sq3s4d20ywk"]
}
You might notice that other components are referenced by ID inside the object (see the components array). And if we want to get a single component by its ID, we'd have to walk through every component first.
So we create a function to convert the array into an object keyed by component ID:
/**
 * Converts the component array into an object keyed by component ID.
 * Timestamps are parsed into epoch milliseconds (null when missing or invalid).
 * @param {Array.<Object>} arr
 * @returns {Object.<string, Object>}
 */
const processComponents = (arr) => {
  const comps = {};
  for (const component of arr) {
    comps[component.id] = {
      "name": component.name || null,
      "status": component.status || null,
      "created_at": Date.parse(component.created_at) || null,
      "updated_at": Date.parse(component.updated_at) || null,
      "position": component.position,
      "description": component.description || null,
      "showcase": component.showcase,
      "start_date": Date.parse(component.start_date) || null,
      "group_id": component.group_id || null,
      "page_id": component.page_id || null,
      "group": component.group,
      "only_show_if_degraded": component.only_show_if_degraded,
      "components": component.components || []
    };
  }
  return comps;
};
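Once the list is keyed by ID, following the references in a group's components array becomes a plain property lookup. The sketch below shows one way to do that; resolveChildren is a hypothetical helper, the sample data is trimmed to a few fields, and the child entry's name is invented because the real one is not shown here.

```javascript
// Sketch only: sample data is trimmed, and the child entry's name is made up.
const keyed = {
  "1km35smx8p41": {
    name: "Cloudflare Sites and Services",
    status: "operational",
    group: true,
    components: ["3sq3s4d20ywk"],
  },
  "3sq3s4d20ywk": {
    name: "Example child component", // hypothetical name
    status: "operational",
    group: false,
    components: [],
  },
};

// Own-property check, so keys like "toString" never match by accident.
const has = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

/**
 * Returns the child entries of a group, skipping IDs that are missing
 * from the map and IDs that are listed more than once.
 */
const resolveChildren = (comps, groupId) => {
  if (!has(comps, groupId)) return [];
  const uniqueIds = [...new Set(comps[groupId].components)];
  return uniqueIds
    .filter((id) => has(comps, id))
    .map((id) => ({ id, ...comps[id] }));
};

const children = resolveChildren(keyed, "1km35smx8p41");
console.log(children.map((c) => `${c.id}: ${c.status}`));
```

De-duplicating through a Set is a guess at how to cope with sub-components repeating their parents; the original script may handle that differently.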
Sprinkle it with some fetch() magic and you have a Cloudflare Workers script. I know, ironic, right?
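A rough idea of what that "fetch() magic" could look like is sketched below. This is not the actual script: keyById is a trimmed stand-in for processComponents, handleRequest and buildBody are hypothetical names, and the 502 fallback is just one reasonable way to react when the status API is unreachable.

```javascript
// Sketch only: not the original Worker. Names and error handling are assumptions.
const STATUS_URL = "https://www.cloudflarestatus.com/api/v2/components.json";

// Trimmed stand-in for processComponents: keeps only a few fields.
const keyById = (arr) =>
  Object.fromEntries(
    arr.map((c) => [
      c.id,
      {
        name: c.name ?? null,
        status: c.status ?? null,
        updated_at: Date.parse(c.updated_at) || null,
      },
    ])
  );

// Turns the upstream payload into the JSON body the Worker serves.
const buildBody = (payload) => JSON.stringify(keyById(payload.components ?? []));

// Request handler; fetchImpl is injectable so it can be exercised outside Workers.
const handleRequest = async (fetchImpl = fetch) => {
  const upstream = await fetchImpl(STATUS_URL);
  if (!upstream.ok) {
    return new Response("Upstream status API unavailable", { status: 502 });
  }
  return new Response(buildBody(await upstream.json()), {
    headers: { "Content-Type": "application/json" },
  });
};

// In a module Worker, this would be wired up roughly as:
//   export default { fetch: () => handleRequest() };
```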
This is only one half of the equation though; we now need another script for checking the PoP we want.
The summary of this new script is:
- Script is triggered via Cron Triggers inside CF Workers.
- Script checks its linked Worker Key-Value Store for:
  - A Slack Webhook Address,
  - The component ID being tracked.
- Script fetches the KV version of the component list using the endpoint served by the script we mentioned above.
- Script picks the tracked ID from the list and:
  - Loads the status and updated_at entries.
  - Checks if the Worker KV store has the previous check saved. Whether it does or doesn't:
    - It checks if the current values are different from the previous ones and saves the new values to the Worker KV store.
    - Sends a webhook about this change.
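The steps above can be sketched roughly as follows. Treat it as an illustration under assumptions, not the real script: the KV key names (slack_webhook, component_id, last_state), the STATUS_KV binding, and LIST_URL are all invented here, and a missing previous record is treated as a change, which is only one reading of the list.

```javascript
// Sketch only: KV key names, the binding name, and LIST_URL are assumptions.
const LIST_URL = "https://components.example.workers.dev/"; // hypothetical endpoint of the first script

// True when there is no previous record or either tracked field moved.
const hasChanged = (previous, current) =>
  !previous ||
  previous.status !== current.status ||
  previous.updated_at !== current.updated_at;

// Slack incoming webhooks accept a JSON body with a "text" field.
const buildSlackPayload = (name, current) => ({
  text: `${name}: status is now ${current.status}`,
});

// kv needs async get(key) and put(key, value), like a Workers KV binding.
const runCheck = async (kv, fetchImpl = fetch) => {
  const [webhookUrl, componentId] = await Promise.all([
    kv.get("slack_webhook"),
    kv.get("component_id"),
  ]);

  // The first script serves the list keyed by component ID.
  const list = await (await fetchImpl(LIST_URL)).json();
  const component = list[componentId];
  if (!component) throw new Error(`Component ${componentId} not found`);

  const current = { status: component.status, updated_at: component.updated_at };
  const previous = JSON.parse((await kv.get("last_state")) ?? "null");
  if (!hasChanged(previous, current)) return false;

  // Save the new values first, then notify Slack about the change.
  await kv.put("last_state", JSON.stringify(current));
  await fetchImpl(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildSlackPayload(component.name, current)),
  });
  return true;
};

// In a module Worker with a Cron Trigger, roughly:
//   export default {
//     scheduled: (event, env, ctx) => ctx.waitUntil(runCheck(env.STATUS_KV)),
//   };
```

Comparing both status and updated_at means a notification also goes out when the entry is touched without the status itself changing; dropping the second comparison would make the tracker quieter.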
And now we have a somewhat over-engineered status tracker for Cloudflare, hosted at Cloudflare.
That's pretty much it! I hope you enjoyed this wild ride.
Copyright 2018-2024, linuxgemini (İlteriş Yağıztegin Eroğlu). Any and all opinions listed here are my own and not representative of my employers; future, past and present.