Cloudflare R2 for Video Storage

@andrei-gae

Hi Tom,

I just read your fantastic article on Adocasts, "Testing Cloudflare R2 for Video Storage," and wanted to say, great job. The solution you've outlined is really clever and I found the implementation very interesting.

I'm writing because I had a few follow-up questions, and I'd love it if you had a moment to elaborate on a couple of technical details:

  • Costs: Would it be possible to share any estimates on the costs you're seeing with this new setup? I'm especially interested in the comparison with what you were paying for Bunny Stream.

  • Worker Configuration: I'm fascinated by how you handled private access and caching. Could you share any more hints about the Cloudflare Worker's configuration? Any details on the logic you're applying would be a great help.

  • Cache Performance & Flow: How is the cache performing in practice? And more importantly, how are you applying it? Does the cache primarily operate between the worker and the end-user, or are you also using it to optimize requests between the worker and the R2 bucket?

And this brings me to my main question, which is about video protection vs. caching:

  • If each user needs to be authenticated to get a unique, authorized URL (similar to a presigned URL) to access a video, how do you manage to keep the Cloudflare cache effective? Traditionally, unique URLs aren't "cacheable." Did you have to make the R2 bucket public to get the cache to work, or did you find an ingenious way for the Worker to handle the user's authentication and still serve a cacheable response from Cloudflare's edge?

I totally understand if you're busy, but any insight you could share on that last point, in particular, would be incredibly helpful.

Again, congratulations on the article and thanks so much for sharing your knowledge.

Best,

  1. @tomgobich

    Hi Andrei! Absolutely, happy to answer any questions you have!

    Costs

    To preface, I haven't yet transitioned our older Bunny Stream videos to Cloudflare R2 (still going to, just need to find the time). Additionally, Cloudflare offers 10GB/mo of free R2 storage, and I've only been running R2 for about 10 months now, so I was under the free tier for a few months there. Also worth noting, Cloudflare won't bill you until your accrued charges are high enough to be worth a bill.

    So, I think I went over the free tier for R2 storage around March. Since then, I've only been billed twice, for a total of $0.45. I'm still under their Class A & B operations free tier as well as the Worker free tier, and I'm not sure I'll ever go over those monthly amounts, so that 45 cents is purely from R2 storage.

    Bunny Stream, on the other hand, was showing a bit of a compounding effect on our charges: the more videos we released, the more views we got. They also, at least at the time, had a $1/mo minimum charge. I exclusively used Bunny Stream for about 14 months, starting under that $1/mo minimum, and by the end of the 14 months it had crept up to around $5/mo… which is still super affordable compared to most other options. As I believe I mentioned in that blog post though, my fear was the compounding effect I was seeing due to egress.

    Worker Configuration, Video Authorization, & Caching

    First, the R2 bucket is private and can only be accessed via token. The worker serves as an access point to request files from the R2 bucket.

    The authorization check on whether the user has permission to watch the video is done directly on the Adocasts site. When the user is authorized, it bundles up an HMAC-hashed message and passes it along via a header with the request to the worker. The worker then bundles up its own HMAC message for the requested video. If the messages match, the user can watch. It essentially just ensures the request hasn't been tampered with and originated from the Adocasts site. I chose this approach because I didn't want to have to relay back to the Adocasts site from the worker to check for authorization.

    Here's how I'm building that message on the Adocasts side:

    // server-side (AdonisJS) imports; the env import path assumes AdonisJS 6 conventions
    import { createHmac } from 'node:crypto'
    import { DateTime } from 'luxon'
    import env from '#start/env'

    const version = 'v1'
    const userId = user?.id ?? 'NA'
    const videoId = post.videoR2Id
    const expiration = DateTime.now().plus({ hours: 48 }).toISO()
    const payload = [version, userId, videoId, expiration].join('|')
    // hex-encoded HMAC of the payload, signed with a key shared with the worker
    const signature = createHmac('sha256', env.get('R2_SIGNING_KEY'))
      .update(payload)
      .digest('hex')
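
    And on the worker side, the matching check looks roughly like this. This is a simplified sketch rather than our exact code: the header name, the payload layout, and the helper's shape are illustrative, and it uses the Web Crypto API that's built into Workers.

    // Illustrative worker-side verification; the X-Video-Signature header
    // name and the payload layout are assumptions, not the production code
    async function isAuthorized(request, videoId, signingKey) {
      const header = request.headers.get('X-Video-Signature') ?? ''
      const [version, userId, expiration, signature] = header.split('|')

      // reject expired links up front
      if (!expiration || new Date(expiration) < new Date()) return false

      // rebuild the payload exactly the way the Adocasts site builds it
      const payload = [version, userId, videoId, expiration].join('|')
      const encoder = new TextEncoder()

      const key = await crypto.subtle.importKey(
        'raw',
        encoder.encode(signingKey),
        { name: 'HMAC', hash: 'SHA-256' },
        false,
        ['verify']
      )

      // hex-decode the signature, then verify in constant time
      const sigBytes = new Uint8Array(
        (signature?.match(/.{2}/g) ?? []).map((byte) => parseInt(byte, 16))
      )
      return crypto.subtle.verify('HMAC', key, sigBytes, encoder.encode(payload))
    }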

    Due to the authorization being needed in the worker, requests between Adocasts and the worker aren't cached. However, requests between our worker and the R2 bucket are. I don't track cache hits, so I'm not sure exactly how much it's helping. That's one downside to using R2 instead of Bunny Stream: Bunny Stream comes with a bunch of out-of-the-box statistics.

    There's probably room for improvement, but here's specifically what I'm doing to cache… nothing special:

    // use the request URL as the cache key
    const cacheKey = new Request(url.toString(), request)
    const cache = caches.default

    let response = await cache.match(cacheKey)

    // if cached, return directly
    if (response) {
      return response
    }

    // ... [fetching file]

    // add to cache without blocking the response
    ctx.waitUntil(cache.put(cacheKey, response.clone()))

    return response
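
    For the elided fetch step, the standard R2 binding flow would look something like this (the VIDEO_BUCKET binding name, objectKey, and Cache-Control value are illustrative, not our exact setup):

    // Illustrative sketch of the [fetching file] step using an R2 binding
    const object = await env.VIDEO_BUCKET.get(objectKey)

    if (object === null) {
      return new Response('Video not found', { status: 404 })
    }

    const headers = new Headers()
    object.writeHttpMetadata(headers) // copies content-type, etc. from R2
    headers.set('etag', object.httpEtag)
    // a Cache-Control header is what lets cache.put store the response
    headers.set('Cache-Control', 'public, max-age=14400')

    response = new Response(object.body, { headers })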

    Hope this helps!!

    1. Responding to tomgobich

      Hi Tom,

      Thanks for the detailed response! Ah, that makes perfect sense. The key is that the cache sits between the Worker and R2, not between the end-user (Adocasts) and the Worker. That completely clears things up.

      This actually sparks a follow-up question because my initial approach was different, and your answer has raised a new, exciting possibility if my understanding of how Workers operate is correct.

      To give you some context, I was experimenting with making an R2 bucket public via a custom domain and then setting an aggressive "Cache Everything" rule on Cloudflare. My main goal was to leverage Cloudflare's global CDN to its fullest. I ran a quick test and saw that when I accessed a file from Spain, I got a cache HIT served directly from the Madrid data center, which was fantastic for performance.

      My assumption was that if a user from the US requested the same file, they would get a cache HIT from a local US data center, effectively creating a globally distributed cache with minimal R2 operational costs. The obvious and critical downside, as you know, is the complete lack of security.

      This brings me to my follow-up, just to make sure I'm understanding the power of Workers correctly.

      Based on my research, it seems that Workers are true "edge" functions, meaning they execute on the Cloudflare data center closest to the end-user.

      If that's the case, does this also mean that the caches.default you're using is local to that specific data center?

      If so, I'm beginning to realize that your solution might be the best of both worlds.

      For example:

      - A user from Madrid makes a request. It hits the Madrid data center, the Worker runs there, validates the HMAC, and on a cache miss, pulls from R2 and caches the file in Madrid.

      - Later, a user from the US makes a request. It hits a US data center, the Worker runs there, validates the HMAC, and on a cache miss, it would also pull from R2 and cache the file in that US data center.

      Am I interpreting this correctly? If so, it would mean that your setup provides the exact same global CDN performance and caching benefits as the "public bucket" method, but with the crucial authentication layer on top.

      I just wanted to briefly confirm if this understanding is right, or if I'm missing a nuance in how Workers are deployed.

      Thanks again for your time and for sharing your expertise. This has been incredibly helpful!

      Best,

      1. Responding to andrei-gae
        @tomgobich

        Anytime, Andrei!! Yep, exactly: so long as you aren't using tiered caching, your understanding is spot on! If you hit a Worker in the Madrid data center and that Worker performs a put into the cache, that item will be cached in Madrid. Tiered caching, however, funnels requests through a select few upper-tier data centers closer to the origin, limiting the number of data centers that hold a copy. The Cache API doesn't support tiered caching though, so if you use that you'll be all set.

        The Cache API is local to a data center. That means cache.match does a lookup, cache.put stores a response, and cache.delete removes a stored response only in the cache of the data center where the Worker handling the request is running.

        I can't guarantee the Worker approach will be as performant as the global CDN, but so long as your Worker is lightweight, it should be fairly close! Also, unless you're an Enterprise user, the cacheable file size limit is the same for Workers and the global CDN: 512MB.
