Uses of ware house and its scale in era of today | Importance of ware house
The most important property of data warehouse so if you guys had any problem the previous slide then I'm pretty sure this is something you have really understood after this slide because this is probably the epitome of for data warehouse right these three properties if you understand these four properties then you are pretty much ready to understand the next part the next or you are ready to go to the next step and data warehousing right so finally you should have understood how important all this is why I know how important business intelligence is and what can improve the data warehouse place you can just think of how among us our deal a data warehouse is correct yeah so a couple of people are satisfied with that so in order to spoken about the properties let's go to the next slide and okay now we have to talk about key terminology okay so right now we've understood data warehousing from a higher level okay now let's dig deep let's go to mode basics okay let's understand the key terminology that are all related and that are involved in a data warehousing so first of all we have LTP and OLAP okay now there are four things I will talk about the differences between OLTP and Ola okay then I'll talk about UDL I'll talk about data Mart and then finally about metadata okay so let me go to the first topic that is Ltd versus OLAP okay so in this part which is l DB over DT stands for online transaction processing okay now this is something that is the representation of data fir database if you're running any kind of queries on your database then that's called online transaction processing okay and then o is last stands for online analytical processing and this is the property of a data warehouse so any kind of query or any kind of analysis that you run on your data warehouse that's called as an OLAP activity correct so let's go to the differences between the two so first of all any data that is stored in a relational database right in an which involves OLTP that contains the current data as well as past data okay current data as well as path data but with respect to data warehouse and while performing an OLAP you will be dealing with only historical data here okay it contains only historical data and the data that will be stored in your database okay when you use the OLTP then those queries will be useful and running your business okay when you have to run your business like if you want to store
the data of the number of sales that has happened today like every time a field happens then your records in your database as we updated right so that's what we say so when you update your record with the latest details of your customer then that is what is the meaning of useful and running the business okay but in order for something that is usually analyzing the business so here the kind of activities that you would do is that of finding out details like at what time how many customers bought the products or at what time which customer but which all products all these kind of questions will be answered with as both hola okay and then the whole OLTP model of accessing data of accessing or querying data on a database is based on the entity relationship model okay but whereas with the data warehouse it's based on the star schema or the snowflake schema and the fact constellation schema okay so it's called also called as the Gatsby schema so
all these three things will come into picture okay and then your relational database it provides a primitive and a highly detailed data so since you'll have one database if you run one kind of a query like a select star from this particular table then it would give you all the details that are stored right so you cannot filter too many details with respect to the data that is stored in a database okay of course you can but the level of you know the filtering and analysis that you can do is not that much so the reader that you will get back from your query write the result that you will get back from a query will be highly detailed okay and it will not be exactly what you want so it will not be that accurate but whereas with pull up when you do an OLAP on a data warehouse it provides summaries and the consolidated data Never land point to you exactly what you want to look for correct so it will it's a very processed data and it points to one particular aspect which matters the most so that's what this is and then the oil TB you use this for writing data into the database okay so ever like I said the same sales example whenever a new sales happens your database has to be updated right with the new records of the product sold of the customer who bought the product and all these things so you basically use it for writing data into the database
but your data barrows is primarily used for reading data from the data warehouse so right into the data warehouse or something that is done so that you can do the reading from the data warehouse ok the primary concept here is to read the data from the data warehouse and to do the analysis and all the visualization actually but with the database it's more of writing the data into the database alright and the size so speaking of the size a databases size would raise anywhere between hundred me to 1gb ok and this is also a very big number 100 MB is a very less number I would say and one tip is also very less so it would typically be much more than this but come back to this range if you look at a data warehouse a data warehouse is ranges from 100 GB to 1 TB correct so your data wells will have all the historic Durant and it will have all the relationships between the different data right such that so you can do your analysis straight away so since it makes all the data more efficient and stuff the data here the size ranges from 100 GB to almost 1 P so that is what our data warehouses and that's the part of data warehouse all right and I can actually show you the difference between the two in today's demo session ok later during this session I'll show you that the size of the source file that I have added with the different sizes of the two files that are they using as the source and then after the UH processed the data are all sorted in my data warehouse I'll show you what is side of that data so there will be a big difference between the two okay and the red arrows will be modes in the database
I will show you that aspect later all right so and that's about this point and then yeah of course this will be fast ok database is fast and it provides high performance alright so your data where owes of course it's not as fast as your database but it's how you sexy it's highly flexible because it gives you different views so you have something called as the OLAP cube right so using the OLAP cube you can get the you can look at insights from different angles different perspectives and different views of data we get so that is the big advantage sure okay and the number of cycles that is accessed it is intense but whereas with the data warehouse the number of records access is in millions all right an example of this can be all the bank transactions made by a particular customer so if there is one customer and whatever transactions he's made you will get all those or you know all those videos right
so supposing take the example of any email statement that you request for any email statement or a bank account statement all these things so they are an example of OLTP so whatever radials are present in the database those will be given to you but whereas the bank transactions made by your customer at a particular time okay this is a most filtered query and the answer also will be a very accurate and point Q to exactly that particular question so that's what an OLAP is so it's normal to be one overview here is one exactly point you to what you want it's not very much in detail but it's more accurate correct so that's what an app is and that's
the difference between LTP and OLAP okay so there are two strategies that are used for you know accessing data well DB is used for accessing data in a database and we'll appreduce for accessing data in a data warehouse alright so I hope I mean this is cured everyone all right guys okay if you have any doubts then that will be cleared by this slide okay because the examples are there for both these strategies so if you want more examples of an OLTP then one would be that of a supermarket server which records every single product purchase at that market okay so every single perk in their history or probably in the last
one month all these things can be accessed using your OLTP okay from your database so you don't have options to do much of filtering here and then another example that of a bank server which records every time a transaction is made for a particular account every time or transaction is made in a particular account the data will be updated in that table and you can create that kind of data and you'll get that result okay
another example is our railway reservation server which records the transaction of a so whatever equities that passenger does so all these things will be recorded by the reservation server right so this is one example but when you look at the OLAP there will be much more detail the query juror will be much more detail and the answers will be much more accurate and very crisp okay so
an example is a data from bank manager wants to know how many customers are utilizing the ATM of his branch okay because maybe based on that he may take a call whether to continue that idiom or relocate that idiom to a different place right so this guy this bank manager would want to one of us for understand if there's any use and having the ATM in that place are people using it correct if people are using it then how many people are using it at what time are they using it or would it be better to have the ATM in a different location where it's much more easier for people to access it so all these kind of questions will be answered only if you have the oil up in place okay only when you are a data warehouse in this not with the database
so another example is that of an insurance company that wants to know the number of policies each agent has sold okay this will happen a better performance management of agents so you'll have multiple agents in your company and you'd want to know their performance right you know you would want to know who is the best performing agent you would want to know how they're performing why is this person performing better you can you know optimize each of their performances so all these things can be done with the help of data barrows and they can be done with help of Allah right the OLAP strategy the OLAP activity that is done on a data warehouse so that is a difference between the two right so I hope you're getting the difference as your guys right OAP and OLAP okay since it's all clear let me go to the next slide now
the second important terminology that we have with data warehouse is that of ETL extract transform and load so by going by the definition usual is the process of extracting the data from various sources transforming this data to meet your requirements and then loading the data into a target data warehouse okay you are exciting the data from here right you're extracting it and then you're transforming it into the way you want in a more readable fashion in a more relatable fashion then once that is done you load the data into a data warehouse and the whole process of for getting rid of from your data source to your data warehouse this done by the EDL the activities of extraction extraction transformation and loading so we have got popular tools for for this very process so you have tools like talent informatica you have open all these things and informatics and talent are probably the most popular tools for this process for extraction transformation and loading data into a data warehouse right so this is something that usually understood by now and it out guys because I don't want to waste much time
I want to go to the next slide and teach the next concept ok great so the next one is data Mart okay now if you've understood so far to LA ETL then half your job is done because data Mart is something that's close to a data warehouse and you don't have that much of a difference when it compared it to our data warehouse but the basic difference between the two is the data Mart is just the same data warehouse itself but a smaller version ok so let's look at the line the definition here the data Mart is a smaller version of the data warehouse which deals with a single subject ok the data Mart's are focused on one area hence they draw the data from limited number of sources and the time taken to build the data match is very less compared to the time taken to build a data warehouse now to give you
this probably brings more advantages right so that's the data Mart and that's why I told you that you know if you understood so far till then half your job is done but data Mart or something like that extends the functionality of your data warehouse right so that is the thing and stinking off the differences that is there in this table the data warehouse will store the enterprise-wide data right the enterprise of the enterprise wide data whereas data much will store it departments wise data there'll be multiple departments in the whole company in the enterprise and they will sort department-wide data and then the data warehouse
it will have multiple subject areas okay but a data Mart will have a single subject area there'll be multiple data sources in case of a data warehouse okay but in case of data Mart's they'll be limited data source in fact they can be just one data where all right this will act as a source pure data match but however we have gossiped limited your because not always do you need a data warehouse there are also instances where your data source itself as an input data model okay now that is something that you will answer next slide okay so just don't get transfused when we see that limited resources it does not mean this is the only source they can also come from a data a property a source like a flat file or from a data base and all these things all right and then
a data warehouse it occupies largely this of course because there is a lot of data on to price where data will be stored your multiple subject areas will be dead because of that there's larger data here at state and that's why there is greater memory that's occupied okay but in case of your data mod it occupies limited memory because it's very crisp and limited to only a particular department okay and then the other thing is data where else is long it takes longer time to implement
but a data Mart is a takes very short time to implement it is once you have all your data warehouse or stuff in place you can easily divide them by creating different data mass okay so be a tough part is your data well so once you've got a data warehouse ordered you can easily from your data mass in fact there is even the other way there are also practices where you first bring your data mass and once you've done that after that you create one single repository and that's when you create a data warehouse so there's also two approaches here one is the top/bottom approach and the other one is bottom-up approach so those are the two things and I'll go to details about
these two approaches in my next session ok of course I can't do it today because we have very limited time all right so moving on to the next slide ok now speaking of the different types of data mass ok this is what I was talking about in the previous slide so you have three different types one is a dependent data Mart the other one is independent data Mart the third one is hybrid data mod you are dependent data matters are the data's first extracted from the OLTP systems and then populated in the central data barrows and then from this data warehouse the Ritter travel to the data
Mart so look at this example ok this is the standard practice or the year the default approach where you have an OLTP source then you get the data into a data warehouse and then from the data warehouse you form a data model ok we'll have multiple data mass where each different Mart will have particular data from the entire data warehouse ok this is the first regular approach and then you have the independent data Mart which is a slight variation compared to dependent so here the rears directly received from the system okay you don't have a data barrels in place over here that is what this line means and this is suitable for small organizations or smaller groups within an organization so basically an organization which is very small it might not need to go to the trouble of creating a data warehousing stuff so you can just have an OLTP source and from there they can just get the data in onto a data Mart and they can use for various purposes okay that's what we have an independent data Mart for so that's the difference it just does not involve a major data warehouse so directly the data goes to a data Mart in fact you'll have multiple data Mart over your you'll have great or not one data amount to one stuff okay which they coming directly from the OLTP source and then you have a third type which is the hybrid
data Mart so by definition you might know what this is right by your the name of by the name itself is pretty obvious it's combination of these two the data here is fed from both the OLTP systems as well as the data warehouse okay so look at this example for that tour instance you three are here it's coming from the ODB source as well as from a data warehouse so this is what the hybrid data models and depending on your company depending on the size of your company the requirements of your company or your organization you can choose one of these you can model your entire database and data barrows in any one of these models either the dependent or the independent data mass or hybrid Radames alright so that's about the different types of data model so moving on to the next slide we have something called as metadata now people here from programming background or from the technology background you might all be aware of what America does metadata basically is defined as data about data okay it contains data about where your actual data stored supposing you have your raw data right where is that data stored
what is the size of that data so these are the answers to these kind of questions is what will be Plus on your metadata your metadata will contain the location of your actual data it is not n the size of your actual data it will contain details like which was the source it came from and when was it created all these details will be stored in your metadata right so that's what made our IRA's so that's how different our metadata is from regular data and a metadata is specifically in a data warehouse it defines the Soraa there is a flat file a tional database and other objects so the reason that we give so much importance to metadata in a data warehouse is because take
the example of any company that's having a 24/7 business okay they have a rolling system that works throughout the clock the 24/7 they are sales coming in data will be going into your database okay now in this case you cannot always you know keep adding data into your data warehouse because you know that data warehouse is not real-time correct so you have to manually update your data into your data warehouse probably at every day at a particular time maybe at 6 o'clock every day in the morning or maybe once in the night at 10 o'clock or at basically at regular intervals you've got to so data into your data warehouse and this whole strategy becomes difficult because you have to do this process every day so every day when a guy comes in in the morning he has to look at the new data that has come into your data database correct and then from that data base which is data source he has to add that adagio data warehouse now this process becomes difficult that is where a metadata comes in handy with metadata you define the address where your data sources okay you define the address of your raw of that file from my data's coming in or your relational data from measure data is coming in and then you can also impact us or the metadata of your raw data warehouse where you want the data to go so it basically Ellen you know see a lot of your time okay
and this is the most common and the most you know unspoken fact about data warehouse so any professional that's dealing with data barrows you would always be using metadata because it saves a lot of time because every time you cannot be importing data from a database you always have to get it from your metadata by defining rules and defining your source and your targets correct so mirror it has gives lot of your time and this is something I will show you a demonstration on ok and when I show demonstration you will understand how easy it is so we just read the rule says that you have your source over here and then you have your destination over your target over here and once you've written this thing you don't have to go back here so every day it might pick up the data from here and it might move your data warehouse and all these things it will update all the new details which are present in your database and it will add it into your data warehouse so that's what a metadata does
it's a very big advantage and it's one of the best Sergey's are being okay and then the final point here is metadata is used to define which table is source and target and which concept is used to build a business logic called transformation to the acts alone but yeah this is what I told you right so it's used to define which is your source and target and how to build your business logic called transmission so transmissions basically your rob act of converting your source data into the form that you want to and what is the logic that you use correct so all the different the filter criteria all the transformational criteria's all these things can be also done using the metadata correct so even the process of forgetting data from your source your destination it can involve extra additional steps which can save your time and default process basically so every time you have data in a particular format you might want to store
it in a different format and for all that purpose you can use a metadata right so since things will already be defined you're the work of metadata is just to get the data from the source define and do the set of activities that is required and which is already defined it will perform all those activities and it will sort in the case where you want to do so that is the role of a metadata correct so many data is very important very highly used and it's actually the most important or let's say the epitome of data barrows okay this is probably the best thing that can happen to data warehouse right so that is the thing about the major a degrees now going to the next slide we have the architecture so now that you know everything okay
these four raw terminologies are in our removing enough for you to understand the architecture okay so let's understand what the architecture is so this is the entire architecture diagram okay so this is what you know data comes in from various sources it can come from either a database or in the from flat file and then that data an action of ETL will be performed on that data and it will go to the staging area okay this called the sitting area and this is the staging database and the rails are stored over here it is temporary data before data completely moves to the data warehouse it will be present in this area okay and that is done by using the act of ETL and also between moving to the data warehouse the process of idle continues so ETL process starts over here and it ends over here okay and between the conversion it is a temporarily stored in a staging area and this is most often present inside the ETL Coolidge okay
like your talent order semantics and all these things and then this data will be stored in your data warehouse so whatever is the extracted transformed unloaded it will be loaded into your data warehouse and in your data barrows you'll have metadata okay and of course you'll have your raw data and then you will have your aggregate data okay and this is the reason why a data warehouse is generally you know it's a larger in size because it has not just raw data or data base which from where the data is coming in it will only have your raw data okay but your data warehouse will have additional stuff here it will of course your metadata and your aggregate data and together all these three things together is what helping you being you you doing your analysis sooner okay this is what powers you're ready to do OLAP online analytical processing okay so that's what a data warehouse is and that's what it stores and there's this data warehouse you can either your entire company can use the data warehouse or if you want more for security based access then you can divide it into a different data mass where your sales team and your different teams like here we have a purchase table and then you have a stock table so
these are like three different maths data mass for three tables like a sales purchasing stock so you'll have different tables here and different teams can access different set of tables okay your purchase can be something that's used by your operations people right and then this may be something that's used by your raw sales people and your stock may be again used by your operations or your sales group so that's what each of these your you know defines you have your user group one user group two and user group three and each group will be getting access to different parts of your raw data warehouse because your data valves will be divided into different data Mart's and your data your different groups we could access only that data which they want to or which they can get access to right so this way no group can get access to every data that is personal data warehouse and there's advanced there's a little more data security in this is okay same thing over here so this group get action into this data and this data and this table or this mod right and then this user group gets accessed only the Riddler is Belgian they smart and then this Mart so that's what the entire architecture looks like and resolve the data flow is right so if you guys have understood
this much then you're ready to get a demonstration correct guys and this is also my last topic in this presentation so right now I can go to my demonstration and what I'll be doing in my demonstration is I'll be using talent okay so I'll be x importing data from my database I'll be using be getting it from my Oracle database and I'll be showing that into a data warehouse which can be ready for any kind of analysis or visualization on any other visualization tool okay so this act over here which I'm going to show you this is what powers your business intelligence right so are you guys ready for the demonstration any doubts your guys okay great Ronnie says he's already he's all pumped up already Ronnie that's very good so Jacobs also ready and so is Rajesh are you Rajesh can I get a yes from you yes very good olive oil Rajesh
so let's go to the next part of my session and there's going to be demonstration where I will populate or detail barrows okay I'll be using talent for the data warehouse activities and let's see how we can import data from waste data sources and create or data barrows so for our hands-on session I'll be importing data into my talent bi okay like I said earlier we love using my talent bi for night data warehousing and ETL processes and I will create a data warehouse out of using talent bi okay now the data set that I'll be using is that of per 10,000 row table and 3,000 row table okay so the there be one table having 10,000 customer details okay and then there'll be another table having 50,000 rows of transactions which each of those customers make okay now based on this data we have to find those customers who have the lowest number of purchases okay so right now my dataset is present in my Oracle database okay and so I have two tables and both my tables customers table and my transactions table is present in my Oracle database
I also have them in my here excel so first let me show you how they are present in my excel file okay okay so this is my freshmen table as you can see we have all these days right we have all these fields here so we have customer ID we have customer name we have the address of that customer city the country is from the contact number and his email address right and if I do this then you can see that there are almost 10,000 cutoff fields here so 10,000 rows are there here so this means that you know I have it's a pretty big data set and I can use this in competition with my other table okay so here we have customer ID as the primary key okay and if I go back to my other data set that's my transactions table you can see that I have furthermore freeze your Abdul for invoice number of a stock code I have description app quantity I have invoice date unit price customer ID and Product ID okay so there's a customer ID over here also so the customer ID is the primary key for there and this is the foreign key over here so basically the customer ID is the same and I will have to do a lookup to that table with the help of my customer ID okay
since these are two different tables I can create my data warehouse with the help of this particular primary and foreign key concept I can create a link and I can create I can probably just link these two tables with the primary key for a key concept okay so any doubt guys I'm sure that you all know what a primary key and a foreign key is and I fed all anybody has a problem please let me know okay so Rodney says I don't know what's a foreign key okay so see Rodney the thing is we have something called as the primary key and a foreign key okay and we use these two columns when we want or when you want to use a combination of two different tables right so if you would consider the example of this one then you can see that there are a number of columns here and not everything is matching with the radius in that put in the other table okay so only customer ID is the only common field okay so there's a customer ID here and even in this table we have a customer ID now the thing is the customer ID is a primary key because it's it's constant and it's unique for each and every single record over here so each and every customer here will have a separate customer ID okay so no two customers will have the same customer ID okay and over here the transactions here which has the inverse numbers the description of the product data spot is the stock code the quantity all these things are sorted by the customer writing okay so if you find a particular customer ID more than once then well here you can find the person already more than once that's because a customer would have made more than one transactions.