Google Associate Cloud Engineer – Choosing Database in Google Cloud Platform Part 2

  • By
  • August 17, 2023
0 Comment

8. Step 06a – Understanding Database Fundamentals – Choosing Databases

In a step, let’s talk about the different categories of databases. There are several categories of databases relational databases, document databases, key value databases, graph in memory databases. Even in relational databases, there are two kinds of databases OLTP and OLAP. Choosing the type of database for your use case is not easy. It is one of the most important decisions that you would need to make when it comes to choosing a database for your application. And also it is one of the most important decisions you need to make as far as the certification exam goes. So a few factors that you need to consider do you want a fixed schema? Do you want flexibility in defining and changing your schema? Do you want to go schema less or do you want a strict schema.

What level of transaction properties do you need? Do you need atomicity and do you need high consistency? What kind of latency do you want? Is it fine if the data comes back in seconds, milliseconds or microseconds? How many transactions do you expect? Are you expecting hundreds or thousands or millions of transactions per second? How much data will be stored in your database? Are we talking terabytes, petabytes or exabytes and a lot of other things. What we’ll do in the next few steps is to take a journey through some of the important categories of databases. I’ll see you in the next step. Welcome back. Let’s get started with relational databases. This was the only option until a decade back. This is the most popular or you can say the most unpopular type of databases.

When you are using a relational database, you have a predefined schema with tables and relationships. Over here you can see a high level structure of an example relational database. You have a table called Course in which you have a field called ID which is int. The fact that it’s in bold indicates that this is a primary key and a primary key is unique in a table. And you can efficiently query on the primary key. Other than that, you have a number of columns which are defined department Name, Instructor ID, Duration, Name and you are also defining what is the type of data that can be in that specific column. Department Name is a set of characters up to 255 characters. Instructor ID should be an int. Duration is an int. Name is a set of characters 255.

It’s a set of 55 characters. You are also defining a relationship. You are defining the relationship between Course and department. You are defining the relationship between Course and instructor and you are also defining the relationship between Course and student through a table. You are creating a table called Course Strength and defining the relationship between Course and student. So you have a predefined schema with tables and relationships when it comes to a relational database and it provides you with very strong transactional capabilities. What does that mean? It means you can make a change in multiple of these tables together and you can commit the change only when the update is successful to all the tables. So in a single transaction you can make updates to multiple tables.

You can rely on relational databases to ensure that it is atomic either all the changes succeed or none of the changes succeed so relational databases have very strong transactional capabilities. When do you go for relational databases? You would use them for online transaction processing use cases for example banking or you are developing a stock trading application over here transactions and the structure of data is very very important. The other kind of use cases are all app use cases online analytics processing use cases you would want to run a lot of standard queries on huge amount of data these are typically Kelly called OLAP online analytics processing use cases you can also use relational databases for these use cases. In this quick step we got introduced to relational databases let’s talk a little bit more about them in the next steps.

9. Step 07 – OLTP Relational Databases in Google Cloud – Cloud SQL and Cloud Spanne

Welcome back. In this step, let’s look at the relational databases in Google Cloud. Let’s start with the online transaction processing. Relational Databases in Google Cloud transaction processing applications are applications where large number of users make large number of small transactions. Let’s take a banking application. There will be a lot of small small transactions being made. I mean, I’m not talking in terms of the amount, the amounts might be large, but it would be transactions which should be impacting small amounts of data. So small data reads, updates and deletes typical use cases. Most traditional applications ERP, CRM, Ecommerce, banking applications, all these are OLTP use cases.

The popular databases in this space are MySQL, Oracle, SQL Server, et cetera. And the recommended Google managed service for this use case are Cloud SQL. Cloud, SQL Supports PostgreSQL MySQL and SQL Server. For regional relational databases, cloud SQL supports up to a few terabytes of data. So if you want a regional database up to a few terabytes of data, and you’d want to use one of the popular database options like Post the SQL MySQL and SQL Server, then you can go for Cloud SQL. The other option is Cloud Scanner. Cloud Spanner provides you with unlimited scale. You can scale to multiple petabytes of data. This is huge amount of data and this is really really huge.

When it comes to relational databases. Cloud Spanner also provides you with 99 point triple 9% availability, so it provides you very very high availability. And Cloud Spanner is recommended for global applications with horizontal scaling. So if you have a global application with users across the globe and you’d want unlimited scale for your relational database, you can go for Cloud Spanner. If you’d want one of the traditional databases like PostgreSQL MySQL and SQL Server and you’d want to host a regional database with data up to a few terabytes, then you can go for Cloud SQL. In this quick step, we looked at the relational database options for online transaction processing in Google Cloud. I’ll see you in the next step.

10. Step 08 – OLAP Relational Database in Google Cloud – BigQuery

Welcome back. In this step, let’s look at relational databases in Google Cloud for online analytics processing or OLAP. What is online analytics processing? Applications allowing users to analyze petabytes of data. You have a huge amount of data which might be coming in from multiple applications and you want to analyze the data. You want to query the data. Examples are reporting applications data warehouses, business intelligence applications, analytics systems all of these are good examples of online analytics processing use cases. Consider a sample application. You want to decide insurance premiums analyzing the data from, let’s say, the last hundred years.

The data might actually be consolidated from multiple transactional databases. And the recommended GCP managed service for these kind of scenarios is Big Query. Big Query is a peta byte scale distributed data warehouse. An important thing you might be wondering about is how are these databases different? Both OLAP and OLTP use cases. We are using relational databases. How are the relational databases different? OLAP and OLTP use similar data structures. We are using tables and relationships, but they are very different in how data is stored. OLAP databases use row storage. Each table row is stored together.

So if this is the table and this is the data which is present in the table, then this is how data is stored in OLTP databases. You are storing row by row because you would be changing one row at a time. And that’s the reason why OLTP databases use row storage. This is really efficient for processing small transactions. However, OLAP databases, on the other hand, use column or storage. Each table column is stored together. You can see how column storage is so the entire column is stored together. Next the next column. Next, the next column what is the advantage of storing columns together? You get really high compression because each column contains similar data.

If I have a city column, there are possible 100 values or 200 values, not more than that. And you can achieve high compression. So you can store petabytes of data efficiently if you are using columnar storage. The other advantage of OLAP databases is that you can distribute data much more easily. What you can do is you can create a cluster with multiple nodes and each of these columns can be stored in different nodes. And this allows you to execute a single query across multiple nodes. Whenever we talk about Online Analytics processing databases, the most important thing is to be able to run queries very very quickly.

And by distributing the data related to the same table to multiple nodes, you can distribute the processing across multiple nodes so complex queries can be executed efficiently. In this step, we started with talking about the recommended OLAP relational service in Google Cloud. The recommended service is BigQuery and we also looked at what is the difference between OLAP and OLTP databases? OLTP databases used storage. Olab databases use columnar storage. Olab databases allow you to execute very complex queries and distribute the execution across multiple nodes in a cluster. I’m sure you’re having a wonderful time and I’ll see you in the next.

11. Step 09 – NoSQL Databases in Google Cloud – Firestore, Datastore and BigTable

Welcome back now that we talked about relational databases, let’s switch to NoSQL databases it’s one of the new approaches actually not so new approach there have been no SQL databases for a few decades now to building your databases NoSQL means not only SQL most NoSQL databases provide you with flexible Schema. It’s not as rigid as a relational database structured data the way your application needs it. Let the Schema evolve with type. You can horizontally scale most NoSQL databases easily to petabytes of Data with millions of transactions per second. Whatever we are talking about right now is not 100% accurate.

Generalization, but typically a great starting point. Typical node SQL databases trade off strong consistency and SQL features to achieve scalability and high performance. Relational databases offer strong consistency and they provide SQL features. However, no SQL databases typically do not have as strong consistency, most of them do not support SQL and they trade this off to achieve scalability and high performance. And the Google managed services in NoSQL space are Cloud Firestore. It was earlier called Datastore and now it’s called Cloud Firestore and the other option is Cloud BigTable. Now how do you choose between them? When you go for cloud. Firestore when you go for cloud.

BigTable Cloud datastore is managed serverless NoSQL document database. It provides you with asset transactions, so it provides you with transactions, and it also provides you with SQL like queries and it also provides you with indexes. So if you have a transactional, mobile or web application, then you can go for cloud datastore. Firestore, which is the next version of Datastore, adds in a few more capabilities like strong consistency, and it also provides you with mobile and web client liabilities.So if you want to access a database from your mobile application, Firestore makes it easy. Cloud datastore and Firestore are recommended for small to medium databases. So zero to few terabytes.

On the other hand, Cloud BigTable is a managed scalable NoSQL wide column database. Important thing to remember is Cloud BigTable is not serverless. You need to first create an instance and then you can create your tables whenever you have a lot of data. Let’s say you have ten terabytes or more data to several petabytes of data. In those kind of situations, it is recommended to go with Cloud Big table if you have a large analytical or operational workload. Even in those kind of situations, you can go for cloud Big Table. Cloud Big Table is not recommended for transactional workloads. The reason why it is not recommended is because it does.

Not support multi row transactions. It only supports single row transactions if you have multiple rows across which you’d want atomicity that is not supported by BigTable. So if you have transactional mobile and web applications, you’d go for cloud. Datastore. If you have a medium or small database and you’d want a NoSQL database, you’d go for cloud datastore. If you’d want a huge database running into petabytes and you’d want NoSQL, you’d go for BigTable. If you have a large stream of analytical or operational data which is coming in, you can send it to cloud BigTable. In this quick step, we looked at some of the important options when it comes to no SQL databases in Google cloud. I’ll see you in the next step.

12. Step 10 – In memory Database in Google Cloud – Memorystore

Welcome back. This step let’s talk about inmemory databases. What are in memory databases, why do we need them, and what are the options that are provided in Google Cloud? Retrieving data from memory is much faster than retrieving data from disk. In memory databases like Redis deliver microsecond latency by storing persistent data in memory, and recommended GCP managed service for this is Memory Store. Memory Store provides you with two different options redis and memcache.

We’ll talk about it a little later. Fair enough. If you want to implement Caching or Session Management, or if you want to implement Gaming Leaderboards, or if you want to implement geospatial applications in these kind of scenarios, you’d want to go with Memory Store. When you’re using Memory Store, persistent data is stored in memory and therefore you can retrieve that much, much faster than retrieving it from disk. In this quick step we looked at in memory databases I’ll see you in the next step.

13. Step 11 – Databases in Google Cloud Platform – A Quick Review

Welcome back. In this step, let’s look at a summary of all the databases that we have discussed until now. Let’s get started with the Relational OLTP databases. We talked about two GCP services cloud SQL and Cloud spanner transactional use cases needing predefined schema and very strong transactional capabilities. These Relational OLTP databases use row storage you’d go to Cloud SQL if you’d want to use MySQL or PostgreSQL or SQL Server DBS. You can also use Cloud SQL if you have small amount of data like Terabytes. However, if you’d want unlimited scale and if you’d want 99. 39% availability and if you have users across the globe and you’d want horizontal scaling even for rights, then you’d go for Cloud spanner. BigQuery is the Relational OLAP database. Typical relational OAP databases use columnar storage.

These have predefined schema as well. Good examples are data warehousing and big data workloads. The next option is no SQL databases the options which are present in Google Cloud are Cloud Firestore. Earlier, this was called datastore and Cloud BigTable. You’d go for no SQL databases when your apps need quickly evolving structure. Basically you’re talking about schema less databases. You’d go for Cloud Firestore if you’d want a serverless transactional database which can support both mobile and web apps. This is recommended for small to medium databases up to a few terabytes. Cloud BigTable is recommended for large databases like Ten terabytes running into Petabytes it is recommended for streaming analytical and operational workloads.

It is not recommended for transactional workloads and it is important to remember that it is not serverless. You need to provision a server and then you’d be able to create your databases when you’re using Cloud BigTable the last option that we talked about are inmemory databases or caches. A good example is cloud memory. Memory store. You’d go for Cloud Memory Store when your applications need microsecond responses. You can put Cloud Memory Store in front of any of the database and improve the performance of the database. I’m sure you had an interesting time looking at all the databases in this specific section.

14. Step 12 – Databases in Google Cloud Platform – Scenarios

Welcome back. Instead, let’s review some of the important scenarios related to databases. A startup with quickly evolving schema that’s basically a quickly evolving table structure. What is the solution that is recommended? You want to go for cloud datastore or Firestore? If your data is growing very very fast, then you might even think about Cloud. BigTable but to start off, Cloud Datastore and Firestore are more than sufficient nonrelation database with less storage. You can go with Cloud Datastore or Firestore transactional global database with predefined schema needing to process millions of transactions per second. Important things in here are global database. So you need a global database, it is transactional and you need a predefined schema. You need to be able to process up to millions of transactions per second.

If you want to process millions of transactions per second, or if you do need a global database, Cloud SQL will not be able to support that. In those kind of scenarios you need to go for Cloud spanner. So the recommended database is Cloud spanner transactional local database. So in a specific region processing thousands of transactions per second. In this kind of scenario you can go for Cloud SQL cache data for a web application. So you have data coming in from the database and you want to cache it. Memory store database for analytics processing of petabytes of data. You have a lot of data in a data warehouse and you’d want to do analytics processing on top of it. You’d want to go for BigQuery database was storing huge volumes of stream data from IoT devices.

If there is a lot of analytical data or stream data that is coming in, the best service is Cloud. BigTable the last thing that we want to talk about is a database for storing huge streams of time series data. Time series data is nothing, but let’s say you have logs or if you are tracking user actions on a website, each of these log entries are associated with the timestamp. Any data which is associated with the timestamp is typically called a time series data. And whenever we talk about time series data, typically there is also huge volumes and in those kind of scenarios you can go for again big table. In this step we looked at some of the important scenarios with respect to databases. As far as Google Cloud is concerned. I’m sure you’re having an interesting time and I’ll see you in the next step.

Comments
* The most recent comment are at the top

Interesting posts

The Growing Demand for IT Certifications in the Fintech Industry

The fintech industry is experiencing an unprecedented boom, driven by the relentless pace of technological innovation and the increasing integration of financial services with digital platforms. As the lines between finance and technology blur, the need for highly skilled professionals who can navigate both worlds is greater than ever. One of the most effective ways… Read More »

CompTIA Security+ vs. CEH: Entry-Level Cybersecurity Certifications Compared

In today’s digital world, cybersecurity is no longer just a technical concern; it’s a critical business priority. With cyber threats evolving rapidly, organizations of all sizes are seeking skilled professionals to protect their digital assets. For those looking to break into the cybersecurity field, earning a certification is a great way to validate your skills… Read More »

The Evolving Role of ITIL: What’s New in ITIL 4 Managing Professional Transition Exam?

If you’ve been in the IT service management (ITSM) world for a while, you’ve probably heard of ITIL – the framework that’s been guiding IT professionals in delivering high-quality services for decades. The Information Technology Infrastructure Library (ITIL) has evolved significantly over the years, and its latest iteration, ITIL 4, marks a substantial shift in… Read More »

SASE and Zero Trust: How New Security Architectures are Shaping Cisco’s CyberOps Certification

As cybersecurity threats become increasingly sophisticated and pervasive, traditional security models are proving inadequate for today’s complex digital environments. To address these challenges, modern security frameworks such as SASE (Secure Access Service Edge) and Zero Trust are revolutionizing how organizations protect their networks and data. Recognizing the shift towards these advanced security architectures, Cisco has… Read More »

CompTIA’s CASP+ (CAS-004) Gets Tougher: What’s New in Advanced Security Practitioner Certification?

The cybersecurity landscape is constantly evolving, and with it, the certifications that validate the expertise of security professionals must adapt to address new challenges and technologies. CompTIA’s CASP+ (CompTIA Advanced Security Practitioner) certification has long been a hallmark of advanced knowledge in cybersecurity, distinguishing those who are capable of designing, implementing, and managing enterprise-level security… Read More »

Azure DevOps Engineer Expert Certification: What’s Changed in the New AZ-400 Exam Blueprint?

The cloud landscape is evolving at a breakneck pace, and with it, the certifications that validate an IT professional’s skills. One such certification is the Microsoft Certified: DevOps Engineer Expert, which is validated through the AZ-400 exam. This exam has undergone significant changes to reflect the latest trends, tools, and methodologies in the DevOps world.… Read More »

img