When Hugging Face's XET team (https://xethub.com/blog/rearchitectin...) analyzed 8.2 million upload requests transferring 130.8 TB in a single day, they discovered that basic S3 uploads couldn't cut it anymore. This article walks through the architectural evolution from simple blob storage to sophisticated content-addressed systems, showing why companies like Hugging Face, Dropbox (https://www.hellointerview.com/learn/...), and YouTube (https://www.hellointerview.com/learn/...) all converged on similar patterns: CDNs for global distribution, chunking for reliability, and smart deduplication for efficiency. You'll learn why the "obvious" solution is never enough when you're moving terabytes across continents. Check out the full article on Medium (/why-uploading-to-s3-isnt-enough-the-evolut...).

Introduction

Here's the thing nobody tells you about cloud storage: uploading a file to S3 is easy. Uploading 131GB of model weights from Singapore to Virginia at 3 AM when your internet decides to hiccup? That's a completely different problem.

Hugging Face learned this the hard way. They're running one of the largest collections of ML models and datasets in the world, with uploads streaming in from 88 countries. Meta's Llama 3 70B model alone weighs 131GB, split across 30 files because nobody wants to babysit a single file upload for two hours. And here's the kicker: their infrastructure was starting to crack under the pressure.

The XET team (Hugging Face's infrastructure wizards) sat down with 24 hours of upload data. 8.2 million requests. 130.8 TB transferred. Traffic from everywhere: California at breakfast, Frankfurt at lunch, Seoul at midnight. And their current setup, S3 with CloudFront CDN, was hitting a wall. CloudFront has a 50GB file size limit. S3 Transfer Acceleration helps, but it doesn't solve the fundamental problem: you're still treating files like opaque blobs.

This is the same wall that Dropbox hit when syncing became a bottleneck. The same wall YouTube crashed into when 50GB raw video uploads kept timing out. The pattern repeats because the physics don't change: large files + unreliable networks + global users = you need a better architecture.

Let me show you how they solved it.

The Naive Approach: Just Use S3 (And Watch It Burn)

When you're starting out, the S3 solution looks perfect. A user wants to upload a file, you generate a presigned URL, they POST directly to S3, boom, done. Dropbox started this way. YouTube did too. Everyone does. Here's what that looks like in practice: the client asks your backend for a presigned URL, then ships the whole file to S3 in a single request (a minimal sketch of this flow follows the failure modes below).

This works great until it doesn't. And it stops working the moment any of these things happen:

The timeout problem. Let's do the math. You've got a 50GB file and a 100 Mbps connection (which is actually pretty good). That's 50GB × 8 bits/byte ÷ 100 Mbps = 4,000 seconds. Divide by 3,600 and you get 1.11 hours. Your API Gateway times out at 30 seconds. Your web server gives up after 2 minutes. The user's browser shows a spinning wheel for over an hour with zero feedback. One hiccup in the connection and the entire upload fails.

The size ceiling. CloudFront, which you're probably using for downloads, caps out at 30GB for single file delivery [1]. API Gateway? 10MB payload limit, non-negotiable [2]. Even if you bypass the gateway and go straight to S3, you're asking users to upload massive files over a single HTTP connection. That's fragile as hell.

The geography problem. Hugging Face's S3 bucket sits in us-east-1 (Virginia). When someone in Singapore uploads a 10GB dataset, that data is traveling 9,000 miles. Every packet, every retry, every byte. There's no caching on uploads. No edge acceleration that actually helps. It's just your file crawling across the Pacific.
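For concreteness, here is a minimal sketch of that naive presigned-upload flow, assuming boto3 and requests; the bucket name, object key, and expiry are hypothetical placeholders rather than anything from Hugging Face's actual setup.

```python
import boto3
import requests

s3 = boto3.client("s3")

# Server side: mint a short-lived presigned POST for one object.
# Bucket and key here are illustrative placeholders.
post = s3.generate_presigned_post(
    Bucket="model-uploads",
    Key="models/weights-part-001.bin",
    ExpiresIn=3600,  # the URL is valid for one hour
)

# Client side: POST the whole file straight to S3 over a single HTTP
# connection. One dropped connection and the upload starts over.
# (S3 also caps a single PUT/POST at 5 GB, another reason this doesn't scale.)
with open("weights-part-001.bin", "rb") as f:
    resp = requests.post(post["url"], data=post["fields"], files={"file": f})
    resp.raise_for_status()
```

Nothing here is wrong, exactly; it is just one long-lived connection carrying the entire object, which is precisely what the timeout, size, and geography problems above punish.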
Dropbox hit this exact issue early on. They had users uploading multi-gigabyte folders, watching progress bars freeze, then having to restart from scratch. YouTube's story was even worse because video files are huge by nature. A 4K raw video shoot can easily be 100GB+, and filmmakers don't have patience for "please try again" error messages.

The fundamental problem: you're treating the network like it's reliable and the file like it's atomic. It's neither (the chunked-upload sketch at the end of this piece shows what giving up on atomicity looks like). So what's the first fix? Bring the data closer to the user.

CDNs: Moving the Goalpost (But Only Halfway)

Content Delivery Networks sound like magic. You put your files in S3, flip on CloudFront, and suddenly users worldwide get fast downloads because the CDN caches files at 400+ edge locations. Someone in Tokyo requests a file? It's served from Tokyo, not Virginia. Latency drops from 200ms to 20ms. Problem solved, right?

For downloads, absolutely. This is why YouTube doesn't melt when a viral video gets 10 million views in an hour. The video chunks get cached at edge locations. The origin se...
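As a preview of the "chunking for reliability" idea from the top of the article, here is a minimal sketch of an S3 multipart upload in which each part is sent and retried independently, so a network hiccup costs one part rather than the whole file. It uses boto3's low-level multipart calls; the bucket, key, part size, and retry policy are illustrative choices, not Hugging Face's actual pipeline.

```python
import boto3

s3 = boto3.client("s3")

BUCKET = "model-uploads"            # hypothetical bucket
KEY = "models/weights-part-001.bin"
PART_SIZE = 64 * 1024 * 1024        # 64 MiB parts (S3 minimum is 5 MiB, except the last part)
MAX_RETRIES = 3

def upload_in_parts(path: str) -> None:
    upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=KEY)["UploadId"]
    parts = []
    try:
        with open(path, "rb") as f:
            part_number = 1
            while chunk := f.read(PART_SIZE):
                # Each part is its own short HTTP request, so a failure only
                # forces a retry of this chunk, not the whole file.
                for attempt in range(MAX_RETRIES):
                    try:
                        resp = s3.upload_part(
                            Bucket=BUCKET, Key=KEY, UploadId=upload_id,
                            PartNumber=part_number, Body=chunk,
                        )
                        parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
                        break
                    except Exception:
                        if attempt == MAX_RETRIES - 1:
                            raise
                part_number += 1
        # S3 stitches the parts back together into a single object.
        s3.complete_multipart_upload(
            Bucket=BUCKET, Key=KEY, UploadId=upload_id,
            MultipartUpload={"Parts": parts},
        )
    except Exception:
        # Abandon the upload so half-finished parts don't keep accruing storage.
        s3.abort_multipart_upload(Bucket=BUCKET, Key=KEY, UploadId=upload_id)
        raise
```

This is roughly what the AWS CLI and boto3's own transfer manager do automatically for large objects. It addresses atomicity, but not geography or duplication, which is where the rest of the article goes.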