NETFLIX System Design: How does Netflix onboard new content?



Sometimes it feels like I am a god.

Hi everyone, today we'll be talking about how Netflix onboards new content onto their platform. If you have a TV series or a movie and you want to get it uploaded on Netflix, then apart from the legal challenges there are also engineering challenges that Netflix solves. I'll be keeping this video as simple as possible so that the maximum number of viewers can understand what's going on, but there will be some technical details when it comes to video encoding and other technical processes.

Firstly, what kind of challenges will we face when we are uploading new content? Well, we need to store it in different formats. You might know about MP4, AVI and other formats. The reason Netflix keeps several is that different people have different internet connection speeds. If you have a really good connection, you can deal with a heavy format, a detailed one where the data loss is minimal, and see maximum video quality. Then you'll have something like medium quality and low quality too. Behind all of these are codecs. A codec is a way in which you compress video. (Strictly speaking, MP4 and AVI are container formats; the codec is what compresses the video stored inside them.) This video right now, as originally recorded, captures a lot of detail, but when I edit it I'll make sure that the size of the file is not huge; I'll try to keep it within 1 GB. So that is one type of codec. If I reduce the quality more, the size of the file reduces further because it is lossy compression: I'm losing some data to keep the file size smaller.

The second thing that Netflix does is play with different resolutions. If you are watching on a cell phone, the resolution you need is much lower than the resolution you need on your TV or even on your laptop.
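
To make the idea of matching quality and resolution to the viewer concrete, here is a minimal Python sketch. It is not Netflix's actual selection logic: the `Rendition` type, the `LADDER` list, the bandwidth thresholds and the `pick_rendition` function are all made up for illustration, and for simplicity the sketch ties each quality level to one resolution.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    quality: str   # "high", "medium" or "low" (how lossy the encode is)
    height: int    # vertical resolution in pixels
    min_kbps: int  # assumed bandwidth needed to stream it smoothly


# Hypothetical list of stored versions, ordered best-first.
# The numbers are illustrative only.
LADDER = [
    Rendition("high", 1080, 5000),
    Rendition("medium", 720, 2500),
    Rendition("low", 480, 1000),
]


def pick_rendition(bandwidth_kbps: int, screen_height: int) -> Rendition:
    """Return the best version that both the connection and the screen can use."""
    for rendition in LADDER:
        fits_connection = rendition.min_kbps <= bandwidth_kbps
        fits_screen = rendition.height <= screen_height
        if fits_connection and fits_screen:
            return rendition
    # Nothing fits: fall back to the smallest version rather than fail.
    return LADDER[-1]


print(pick_rendition(bandwidth_kbps=8000, screen_height=1080).quality)  # high
print(pick_rendition(bandwidth_kbps=8000, screen_height=480).quality)   # low
```

A fast connection on a small phone screen still gets the low-resolution version, which is the point made above: the device matters as much as the connection.
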
In this way you're seeing that a single video has multiple formats and multiple resolutions, and each format and resolution combines into a pair, for example (high quality, 720p). Call the number of formats F and the number of resolutions R; F multiplied by R is the number of videos you'll end up processing.

There is also re-encoding. Suppose the engineers at Netflix come up with a much better technique of storing data: high quality used to require 6 GB and now requires just 1 GB. Then you take the older movies that you had encoded at 6 GB, run them through the new process, and they become 1 GB. But this process is going to take some time, so you don't want to give all this responsibility to a single computer: it will be slow, and it has a chance of failing (what if the computer shuts down?).

What Netflix does is really interesting and very smart. It takes the original video and breaks it into chunks. Each of these chunks can then be run through the different resolutions and different formats. At the end you will have, let's say, chunk A as .mp4 (that's the format) in resolution 1080, then chunk A as .avi in maybe 480, and so on and so forth. Effectively you have taken a really big video and broken it into small parts so that each processor can deal with one part effectively. One resolution, one format, one chunk: that's one task.

The story of processing these chunks is pretty interesting. Initially, you would take the video file and break it into chunks of 3 minutes each. The chunks are of equal size, which looks good because every processor does an equal amount of work and you can actually quantify it. But imagine an action movie where, at the 3rd minute, the villain's car is just about to overtake the hero's, and right there a new chunk begins.
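
The "one resolution, one format, one chunk is one task" breakdown can be sketched in a few lines of Python. This is only an illustration of the counting, not Netflix's pipeline: the format and resolution lists and the function names `split_into_chunks` and `encoding_tasks` are assumptions, and the chunks here are the fixed 3-minute ones from the early approach.

```python
from itertools import product

FORMATS = ["mp4", "avi"]        # F formats (illustrative)
RESOLUTIONS = [1080, 720, 480]  # R resolutions (illustrative)


def split_into_chunks(duration_s, chunk_s):
    """Return (start, end) ranges in seconds that cover the whole video."""
    return [
        (start, min(start + chunk_s, duration_s))
        for start in range(0, duration_s, chunk_s)
    ]


def encoding_tasks(duration_s, chunk_s=180):
    """One task per (chunk, format, resolution) combination."""
    chunks = split_into_chunks(duration_s, chunk_s)
    return [
        {"chunk": chunk, "format": fmt, "resolution": res}
        for chunk, fmt, res in product(chunks, FORMATS, RESOLUTIONS)
    ]


tasks = encoding_tasks(duration_s=600)  # a 10-minute video
print(len(tasks))  # 4 chunks x 2 formats x 3 resolutions = 24
```

Each dictionary in `tasks` is small and independent, so it can be handed to a different machine, and a failed task can be retried without redoing the whole movie.
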
If a chunk boundary falls in the middle of a scene like that and the player has to make an API call for the next chunk, it's going to take time. You are watching the video, you come to this point, an API call goes out, and there is a lag. The user experience is bad because you wanted to see the scene seamlessly.

What they ended up doing is breaking the chunks based not on timestamps but on scenes. Instead of a 3-minute chunk, you make it much more fine-grained: 4 seconds each. Such a piece is called a shot. One shot is 4 seconds, and you can collate shots, putting them together to create a scene; that's the car scene you can think about. Instead of stopping arbitrarily at 3 minutes, you collate shots into scenes, and each scene has a lot of 4-second-long chunks. Now if a person is watching a video and clicks on some point, the algorithm takes that point as belonging to one scene, and the user experience is much better because the entire block is fetched together.

In fact this algorithm is much more complicated. Netflix sees the entire movie and treats it like a set of chunks. If you jump to arbitrary points, Netflix assumes that this is a sparse movie, in the sense that you go to one point and watch a scene, then head to the next point and watch a scene, and so on and so forth. So its prediction algorithm says that this is a sparsely seen movie, and that it should not try to be too smart or send a lot of data; it should just send the data the user has asked for, because they are probably going to keep clicking on different points rather than use the buffer. On the other hand, take a very engaging movie (I don't know what's an engaging movie, but something that is dense), meaning that people watch it for a continuous period of time and you can easily predict, linearly, which part is going to be picked up next.
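
Here is a rough Python sketch of the two ideas together: fetch the whole scene around the point the user clicked, and decide how far ahead to send based on whether the viewing looks sparse or dense. The real algorithm is, as said, much more complicated. The `SCENE_STARTS` index, the "more seeks than minutes watched" rule and the `lookahead` parameter are all assumptions made up for this example.

```python
from bisect import bisect_right

# Hypothetical scene index: start time in seconds of each scene, sorted.
# Scenes are built from 4-second shots, so every boundary is a multiple of 4.
SCENE_STARTS = [0, 40, 92, 180, 248]


def scene_at(timestamp_s):
    """Index of the scene that contains the given playback position."""
    return bisect_right(SCENE_STARTS, timestamp_s) - 1


def scenes_to_fetch(timestamp_s, seeks, minutes_watched, lookahead=2):
    """Decide which scenes to send after the user clicks at timestamp_s."""
    current = scene_at(timestamp_s)
    # Assumed heuristic: more than one seek per minute watched means "sparse".
    sparse = seeks > minutes_watched
    if sparse:
        # Sparse viewing: send only what was asked for.
        return [current]
    # Dense viewing: also send the next few scenes ahead of time.
    last = min(current + lookahead, len(SCENE_STARTS) - 1)
    return list(range(current, last + 1))


print(scenes_to_fetch(100, seeks=10, minutes_watched=3))  # [2]
print(scenes_to_fetch(100, seeks=0, minutes_watched=30))  # [2, 3, 4]
```

In both cases the unit that gets sent is a whole scene, so playback never stalls in the middle of the car chase.
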
That is called a dense movie. Instead of sending just the part that you have asked for, Netflix predictively, proactively fetches the future parts, gets them onto your computer and shows them to you.

If you are wondering where Netflix stores all this data: it is something like Google Drive, called Amazon S3, which nearly all engineers know. This is where people store their static data, meaning data that you don't change once you store it. It's extremely cheap compared to a database, because a database handles updates and gives you other guarantees as well. So Amazon S3 is what Netflix uses to store that video content.

The most interesting thing about Netflix is that they were able to bring an innovative solution to something that had been around in the internet space for ages. You know about internet service providers. If you go to your browser right now and type facebook.com, you will talk to your internet service provider. They have a list of names which they map to IP addresses, so facebook.com is mapped to an IP address through a table. You can think of this IP address as a physical place. It's actually a computer somewhere on the internet which is giving you Facebook, so you are literally talking to Facebook when you say facebook.com. Very similarly, when you say Netflix, it is an IP address that takes you to a computer which gives you Netflix, or basically is Netflix. So you can actually end up chatting with it, maybe. Netflix exists somewhere, and every time you ask your internet service provider to talk to Netflix, it goes and talks to that computer and then returns the response to you. These servers are usually in the U.S., which means they are geographically concentrated.
From a place like India, which is really far away, it takes a lot of time to send a signal and receive the response, especially for video, because there is a lot of data coming in and it's going to be slow. So to improve the user experience, one of the principal things you do as an engineer is to cache information, which means you pre-compute it and store it in some place close by. Let's say Sacred Games comes out in India and you want to watch it: you put it in a cache.

Netflix extended this concept and applied it to ISPs. Whenever the ISP gets a request from India, let's say, for a movie from Bollywood, it won't go and hit the Netflix U.S. server just like that. It first asks a cache which has been placed there by Netflix. This is called an Open Connect box. In this box you have a ton of movies; you can think of it as something like a hard drive. If the movie is found here, well and good, it is returned quickly. That saves a lot of bandwidth that would have been spent hitting the Netflix server, it saves a lot of time, and it gives a much better user experience. It is also localized: for India you can keep one set of movies, for Britain a different set, and for the U.S. yet another.

This is a brilliant concept because you have reduced the load not just on yourself but also on the ISPs, so they really want to have these boxes. Every time you hit Netflix and get a really quick response, you end up assuming that your ISP guy is a really nice guy. It has gone to such an extent that around 90% of Netflix traffic is taken care of by these boxes that Netflix provides to ISPs. They are called Open Connect. Is this technology revolutionary? Maybe not so much, who knows, because YouTube is also doing this. I think YouTube's boxes also sit with the ISP, again saving a lot of bandwidth for them and really improving user experience in a lot of places.
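
The "ask the local box first, go to the far-away server only if the movie is not there" flow can be sketched as below. This is a toy model, not the real Open Connect software: the `OriginServer` and `OpenConnectBox` classes and their methods are hypothetical, and the sketch assumes a miss is simply passed through to the origin without being stored in the box.

```python
class OriginServer:
    """Stand-in for the far-away Netflix servers (hypothetical)."""

    def __init__(self, catalog):
        self.catalog = catalog
        self.requests = 0  # how many times the slow path was used

    def fetch(self, title):
        self.requests += 1
        return self.catalog[title]


class OpenConnectBox:
    """Stand-in for the cache placed at the ISP (hypothetical API)."""

    def __init__(self, origin, preloaded):
        self.origin = origin
        self.store = dict(preloaded)  # titles chosen for this region
        self.hits = 0
        self.misses = 0

    def get(self, title):
        if title in self.store:
            self.hits += 1
            return self.store[title]  # fast, local answer
        self.misses += 1
        return self.origin.fetch(title)  # slow trip to the origin


origin = OriginServer({
    "Sacred Games": b"sacred-games-chunks",
    "Some Other Show": b"other-show-chunks",
})
box = OpenConnectBox(origin, {"Sacred Games": b"sacred-games-chunks"})

box.get("Sacred Games")
box.get("Sacred Games")
box.get("Some Other Show")
print(box.hits, box.misses, origin.requests)  # 2 1 1
```

The more of a region's popular titles sit in `store`, the fewer requests ever reach the origin, which is where the bandwidth and latency savings come from.
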
And of course you can keep all your locally popular movies in this box. That way the users here are going to be hitting this box far more often than they hit the Netflix server.

Sometimes you do need the content to change because something new has come up, a new series or a new movie. In that case, around 4 a.m. is a good time, because the load on the boxes is at its minimum. You can have a lot of write operations sent in from the U.S. server, which tells the box what to copy:

1) You register your movie on Netflix.
2) Netflix processes it the same way that we talked about.
3) The movie is brought down to chunks.
4) Netflix sends the chunks to your ISP, or maybe directly to the box, and populates the box with the new movie chunks.

That way the box has the latest content and the users are happy. So it's the innovative methods on the video processing and the video serving side which keep Netflix running at scale. Think about it: 90% of your requests are being taken care of by this box. That is a superb gain, and it's a really innovative solution.

We will be having a lot more videos like this on system design in the real world. This is the interesting bit, and of course if you have any doubts or suggestions, you can leave them in the comments below. If you like the video, make sure to hit the like button, and if you want notifications for further videos like this, hit the subscribe button. I'll see you next time 🙂

Danny Hutson

100 thoughts on “NETFLIX System Design: How does Netflix onboard new content?”

  1. Wow, excellent video. Love your unique style of breaking down the topics in chunks and explaining them neatly. Keep them coming.

  2. Great video. Is the local storage kept at the ISPs or in Netflix's replicated storage data centers? Because if Netflix stores data at the ISPs, what guarantees its security?

  3. Nice video!! Video request about Design Online food ordering service like Uber eats and explain how to integrate it with existing Uber ride-sharing service.

  4. Have a look at http://highscalability.squarespace.com/blog/2017/12/11/netflix-what-happens-when-you-press-play.html
    and https://medium.com/netflix-techblog for more detailed, jargon..ed and time consuming version. 😉
    Thanks Gaurav. Keep going (y) <3

  5. Gourav, thank you.
    Can you please make a video swiggy (example) payments and it handles different response like success, failed, or pending.

  6. I had a doubt: if someone gets access to an Open Connect box, since it is a local cache, there is a big potential problem right there. And why should we require Open Connect at all? It means you have your data in Netflix as well as in Open Connect, which is wasted storage: the same data is present in 2 places.

  7. One question.
    There are several ISPs in India, so does Netflix place this kind of cache in each and every ISP? And is it safe? Since it's a cache at the ISP, the contents could easily be confiscated. Or does Netflix take care of the security by itself?

  8. Hi Gaurav!
    Neatly presented ideation of Netflix! I have a question about open connect as talked in your video.
    1. How movie will be inserted in open connect if it’s out of the box of interest of user? E.g. I want to see Korean series.
    A. What algo they use to prevent load for this kind of interest and how they figure out?
    B. Do they use ML for next chunk prediction or explicit programming?
    2. Which recommendation algo they use for predictions of favourites?

    Thanks 🙏🏻 and keep it up!

  9. Have been working in this arena for a while now, you got everything correct man, other stuff you post usually goes over my head because I haven't dabbled in a lot of those things but for once, felt nice to already know what you were gonna say. Haha.

  10. Hey Gaurav, nice video. Just one doubt. Don't the client and the Open Connect/Netflix servers talk directly once the ISP has found the location of the site? I see you are speaking about the ISP taking load. Can you explain more on this?

  11. Awesome explanation gaurav , even though i m not learning system design i learnt a lot about the way u taught the concepts and correlated to some of the concepts currently i have been studying , 🙂 thnkx bro , loved ur 'follow ur passion' blog on wordpress 🙂

  12. what is happening to you bro…you don't look good. You seems tired and working a lot..please do something about it.

  13. Netflix subscribers : We are the coolest people living on this planet ! We watch netflix and chill
    Netflix Engineers : Hold my(our) Beer !

  14. Can u suggest the best book for understanding and also learning System Design questions, which might also help in interviews

    Pls am in a great need for it….pls

  15. Thanks for the video Gaurav. What I knew was that Netflix has servers for different regions and not just in the US. Also will not the Open connect or the cache will ever get full? What about when many users are accessing at the same time, like on weekends?

  16. Let's talk about tik tok. Why It's so fuckingg fast even in low connection!! What engineering they use and how they manage all.

  17. What about the security of such cache boxes? Given that netflix thrives on content… Keeping these movies safe is paramount and the more you distribute the files the more risk you're at. I'm sure they have a way… Just wondering how different would it be to the usual security?

  18. Another informative video from you. Nice explanation. Thanks. Just wanted to add something on the caching part. Many websites take help of CDN providers like Akamai to do the caching on behalf of them. The CDN providers have the required infrastructure across the globe wherein they have placed their caching servers in most of the countries.

  19. Aren't those boxes which serve 90% of content are CDNs'? And how will Netflix decide which videos' should be stored in the boxes near to the region?

  20. How does that make Open Connect different from a CDN? What you end up achieving is more or less the same. (Maybe CDN won't lead to 90% localization of traffic due to some limitation!?)
    https://openconnect.netflix.com/en/

  21. I recommend reading this article:

    “How Netflix works: the (hugely simplified) complex stuff that happens every time you hit Play” by Mayukh Nair https://link.medium.com/hOGSFphJFZ

  22. It's amazing to see how Guarav improves the quality of his content, and I can tell you, guys, as a newbie tech Youtuber, it's a big deal. Keep it up, bro! 😉

  23. How ISP know what I am requesting to any website? Isn't it a encrypted request? ISP do know which website you want to visit but not what you want from a site?

  24. Where is this Open Connect placed and maintained like for say Indian users?…Like Uber does for updating it's db about uber rides…

  25. Hey Gaurav, loved your video. I had one suggestion: there is this technology called pixel shuffling, which is used to reconstruct the video playback. Can you delve deeper into how that is done?
    Thanks a lot

  26. What is so revolutionary about Open Connect boxes? It's nothing but placing your localised servers close to the clients, isn't it? I will be shocked to know it had not been prevalent before Netflix did it.

  27. Can you explain some of the advanced concepts of video streaming too? For eg., Bandersnatch? I wonder how Netflix is providing conditional streaming of video chunks based on user selection.

  28. Another question, what if some users from India only watch hollywood and not bollywood? Does that cache is based on watched content(data driven) or they assume based on language/other params?
