Google Associate Cloud Engineer – Exploring Database in Google Cloud Platform Part 3
10. Step 05a – Demo – Playing with Firestore
Let’s talk about Cloud BigTable. Cloud BigTable is a petabyte-scale, wide-column NoSQL database. An important thing to remember is that it is HBase API compatible. HBase is an open source database. One of the things Google Cloud does is provide its customers with open source options. So if you are using Cloud BigTable and you don’t really want to be in the cloud and you want to move to an on-premises solution at a later point in time, you can install HBase and move from Cloud BigTable to HBase. And that’s the reason why you would see that most Google Cloud services are compatible with one open source option or another. Cloud BigTable is designed for huge volumes of analytical and operational data.
If you have a lot of streams of data coming in, BigTable is the right solution: IoT streams, analytics, time series data and things like that. You can handle millions of read or write transactions per second at very low latency. An important thing to remember is that Cloud BigTable supports only single-row transactions. Multi-row transactions are not supported, and that is the reason why Cloud BigTable is not a good candidate for transactional applications. Cloud BigTable is not serverless, in the sense that you need to first create a server instance and then you need to actually go ahead and create your tables. When you are creating a server instance, you can choose either solid state drives or hard disk drives.
You can go for a solid state drive if you need high performance. Cloud BigTable can scale horizontally with multiple nodes. So you can add multiple nodes into a Cloud BigTable cluster, and Cloud BigTable can automatically do cluster resizing without any downtime. With Cloud BigTable, you cannot export data using the Cloud Console or gcloud. The options that are present are either to use a Java application, so there is a JAR which is provided, and you can use that to do an export. So you can say java -jar, the path to the JAR, and the commands which are present are export or import. Or you can use the HBase commands. We talked about the fact that Cloud BigTable is HBase API compatible, so you can use HBase commands to export data from Cloud BigTable as well.
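As a rough, hedged sketch of what such a java -jar export invocation can look like, here is an illustrative command. The JAR file name, flag names and bucket path below are placeholders rather than values from this lecture, so check the current Cloud BigTable export documentation for the exact tool and options.

# Illustrative only: export a Cloud BigTable table to Cloud Storage using the provided JAR
# (the JAR name and flag names are placeholders; the import direction uses the import command instead of export)
java -jar bigtable-beam-import-shaded.jar export \
    --bigtableInstanceId=my-instance \
    --bigtableTableId=my-table \
    --destinationPath=gs://my-bucket/my-table-export/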
Another important thing to remember about BigTable is the fact that you cannot use gcloud to interact with BigTable. The command line tool to interact with BigTable is cbt, and that’s what is present in here. So if you want to create a table, it’s cbt createtable my-table. Let’s quickly discuss the structure in which data is stored in BigTable. BigTable is a wide-column database. You can see that in each of the tables you can have column families, so you have column family one, column family two, column family three. These column families might have different columns in each of them, and for each row you can store values for each of these columns. What we are looking at is a simplified diagram of a Cloud BigTable table.
The actual wide-column database can get really, really complex, and it can even be looked at like a three-dimensional database. At the most basic level, each table is a sorted key/value map. So each value in the row is indexed using a key, which is the row key. Related columns can be grouped into column families. Each column is identified by column family, colon, column qualifier (the column name). So if you want this column, column one, then you need to say column family one colon column one. A lot of research went into creating BigTable at Google, and this structure is a result of that research.
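To make the column-family:column-qualifier addressing concrete, here is a minimal cbt sketch. The table, family and row key names are illustrative, not from the lecture, and cbt needs a project and instance configured first (covered later in this section).

# Create a table and two column families
cbt createtable my-table
cbt createfamily my-table cf1
cbt createfamily my-table cf2
# Each cell is addressed as column-family:column-qualifier
cbt set my-table row-key-1 cf1:col1=value1 cf2:col1=another-value
# Read the rows back, keyed by row key and cf:qualifier
cbt read my-table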
The great advantage is that it is scalable to petabytes of data, with millisecond responses, up to millions of transactions per second. It is used for a variety of streaming use cases: IoT streams, graph data, real time analytics, time series data, financial data, transaction histories, stock prices. Any place you have huge volumes of time series data flowing in, Cloud BigTable is a great option. You can use Cloud Dataflow to export data from BigTable to Cloud Storage. If you want to move data from BigTable to somewhere else, you can first export to Cloud Storage using Cloud Dataflow. In this step we talked about Cloud BigTable. Cloud BigTable is recommended for huge volumes of streaming data. I’ll see you in the next step.
11. Step 06 – Getting started with Cloud BigTable
Welcome back. Next up, let’s talk about Memorystore. Memorystore is an in-memory datastore service. Why do you store data in memory? To reduce access times. Memorystore is fully managed: provisioning, replication, failover and patching are all taken care of by the managed Memorystore service. Memorystore provides you with high availability, with a 99.9% availability SLA, and you can monitor it easily using Cloud Monitoring. Memorystore supports two options: one is Redis, the other one is Memcached. If you want pure caching, only caching, then you can go for Memcached. If you have reference data, or if you are caching database queries, or if you need a session store, in those use cases Memcached is recommended.
However, if you want persistence and very high availability with low latency access, then Redis is recommended. If you want to build gaming leaderboards, or if you want to build player profiles, or if you want to perform in-memory stream processing, Redis is a good candidate for all those use cases. In this quick step, we looked at Memorystore. The important thing to remember about Memorystore is the fact that it supports both Redis and Memcached. Go for Memcached if you’re looking for pure caching; go for Redis if you’re looking for persistence as well. I’m sure you’re having a wonderful time and I’ll see you in the next step.
12. Step 07 – Getting started with Memorystore
Welcome back. In this step, let’s look at Memorystore. I’ve just typed in Memory and picked Memorystore from the list of items that came up. And this is Memorystore for you. You can see two options which are present in here: you can either create a Redis or a Memcached cluster. If you want to use either Redis or Memcached, the first thing that you need to do is enable the APIs. Over here, I’m enabling the Memcached APIs. If I go to Redis, I’m able to go in directly because I had already enabled the Redis APIs earlier. Now over here I can create the instance. Cloud Memorystore for Redis is a fully managed Redis service for Google Cloud Platform.
We are not really going to create a Redis instance right now. All that we’ll do is take a look at the things that you need to configure. So you need to configure an instance ID, and you can configure the tier: whether you want Basic or Standard. Basic does not provide you with high availability. Standard, however, provides you with high availability: it provides you with a failover replica in a separate zone. The next thing you get to choose, as usual, is the region and the zone. You can also configure a capacity: how many nodes do you want in the cluster? So I can say two nodes or three nodes.
How many nodes would I want in the specific cluster? You can choose the Redis version, and if you want to customize the Redis configuration, you can do that in the additional configuration. And once you are ready, you can go ahead and create a cluster. As you can see in here, creating a Redis cluster is very, very easy. Similar to that, creating a Memcached cluster is also very easy. So you can go into Memcached; let’s go and enable the APIs. The API is now enabled and I can create an instance. Similar to Redis, you’d need to give an instance name, you need to choose the region and the zone, and you can configure the number of nodes in the cluster. If you have just one node and it fails, you lose all the data.
If you have ten nodes, then even if one of the nodes fails, you’d lose only some of the data which is present. In here, you can configure how many nodes you’d want. Let’s say I would want ten nodes, and you can configure how much memory you want per node. So I can say I would want, let’s say, ten GB per node. And you can also specify how many vCPUs you would want per node. Let’s say I would want six vCPUs. As you can see in here, whatever I am asking for is a very expensive thing. I’m not really going to create this cluster. But the important thing that you need to observe is the fact that creating Memcached or Redis clusters, creating your Memorystore clusters, is very, very easy. I’m sure you’re having a wonderful time, and I’ll see you in the next step.
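If you prefer the command line over the console, the same kind of instances can be created with gcloud. This is a minimal sketch; the instance names, sizes, node counts and region below are illustrative placeholders, not values used in this demo.

# Create a Memorystore for Redis instance (Standard tier adds a failover replica)
gcloud redis instances create my-redis-instance \
    --size=5 --region=us-central1 --tier=standard
# Create a Memorystore for Memcached instance with several nodes
gcloud memcache instances create my-memcached-instance \
    --node-count=3 --node-cpu=1 --node-memory=4GB --region=us-central1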
13. Step 07a – Demo – Playing with Memorystore
Welcome back. In this step, let’s talk about BigQuery, which is a data warehouse in Google Cloud: an exabyte-scale, modern data warehousing solution from GCP. It’s a relational database, so it supports SQL. You can create your predefined schema, and it ensures that the data is consistent. You can use SQL-like commands to query massive data sets. Whenever we talk about a data warehousing solution, it’s very, very important that you are able to query it efficiently. There are huge volumes of data in the data warehousing solution, and in BigQuery the data is organized into data sets. Inside data sets, there might be multiple tables, and you can use SQL-like commands to query these data sets.
One of the most important factors around BigQuery is the fact that it offers traditional and modern approaches. The traditional approach is to have a lot of storage and a lot of compute and use that to run BigQuery. BigQuery also offers modern features: it’s real time and serverless as well. You can bring real time data into BigQuery, and BigQuery uses a serverless approach. So BigQuery provides you a blend of the traditional and the modern approaches to data warehousing. Whenever we talk about a data warehouse, a very important factor is importing and exporting data. Another important factor is which formats are supported for importing and exporting data.
For BigQuery, you can load data from a variety of sources, including streaming data. That’s the real time aspect that we were talking about in here. It supports a variety of formats: CSV, JSON, Avro, Parquet, ORC. Or you can also load from Datastore backups. You can also export data from BigQuery to Cloud Storage, and you can visualize the data which is present in BigQuery using Data Studio. If you want long term storage of data, like archiving for example, then you can send it to Cloud Storage. If you want to visualize the data, then you can send it to Data Studio and use Data Studio for visualization. The typical export formats which are supported are CSV, JSON and Avro. CSV and JSON are supported with gzip compression.
Compressing data is also very, very important, because when we are talking about huge volumes of data, you don’t want to send it as is; you want to compress it and then store it to Cloud Storage. So you can use formats like CSV or JSON with gzip compression, or you can use a format like Avro, which supports two kinds of compression: deflate and snappy. You can also automatically expire data in BigQuery, so you can configure that some data in a table should expire after a few days or after a year. This feature is called configurable table expiration. So if you have streams of data coming in and you don’t really want to use it after a year, then you can configure the expiration on that table to one year. All the data in that table would be automatically expired at the end of one year.
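As a small sketch of how table expiration can be set from the command line (the dataset and table names are illustrative, the expiration is given in seconds, and the dataset is assumed to already exist):

# Create a table that expires 30 days (2592000 seconds) after creation
bq mk --table --expiration 2592000 my_dataset.my_table name:STRING,value:INTEGER
# Or update the expiration of an existing table to one year
bq update --expiration 31536000 my_dataset.my_table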
In addition to querying data which is stored inside BigQuery, BigQuery also allows you to query external data sources without having to store the data in BigQuery. So you can store data in Cloud Storage, Cloud SQL, BigTable and Google Drive, and you can directly query it from BigQuery. To be able to do that, you can use permanent or temporary external tables. What we’d be creating is external tables connecting to these storage sources, and that would allow us to query them from BigQuery. Now, the important thing to remember when it comes to BigQuery is that it is all about querying. You’d want to run big queries; by that I mean queries which process huge volumes of data. BigQuery mixes both the traditional and the modern approaches.
So you can get huge volumes of data into BigQuery using multiple approaches: you can load from other databases, you can even stream data in. And you can run huge queries efficiently in BigQuery. You can query the data which is present in BigQuery, as well as data which is present in Cloud Storage, Cloud SQL, BigTable, and Google Drive, using permanent or temporary external tables. Now let’s look at how you can actually access and query data in BigQuery. You can access databases using the Cloud Console. You can use the bq command line tool. Important to remember: this is not gcloud, this is the bq command line tool. You can also use the BigQuery REST API, or you can use client libraries built on top of that API.
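As a rough, hedged sketch of querying Cloud Storage data through a temporary external table with the bq tool: the bucket path, schema and table name below are illustrative placeholders, and the exact flag format is worth confirming against the BigQuery documentation.

# Query a CSV file in Cloud Storage via a temporary external table
bq query --use_legacy_sql=false \
    --external_table_definition=sales::region:STRING,total:INTEGER@CSV=gs://my-bucket/sales.csv \
    'SELECT region, SUM(total) AS total FROM sales GROUP BY region'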
Remember that BigQuery queries can be really, really expensive. You’d be running them on large data sets, and therefore they can get really, really expensive. So the best practice is to estimate BigQuery queries before running them. There are multiple options to estimate BigQuery queries. Number one: you can use the UI, that’s the console. Or you can use the dry run option which is present in bq commands. So the bq command line tool provides you a --dry_run option where you can get the amount of data that will be scanned by a query. This is just an estimate; it is not the exact value. It is an estimate of how much data will be scanned when you run the query.
An important thing to remember is that you will not pay for the amount of data which is returned by the query; you will actually pay for the amount of data in the database which is scanned by the query. So if you are scanning a lot of data, you’ll pay a lot. The --dry_run option returns how much data will be scanned by a specific query, and you can use the pricing calculator to find out the price of scanning one MB of data. The pricing calculator is an online tool. You can go there and check the price for scanning one MB of data, and you can use that price to calculate the cost of scanning the volume of data that you are planning to scan with the specific query. In this step, we talked about BigQuery. Whenever we run queries in BigQuery, it is very important to estimate them before you run them. I’m sure you’re having a wonderful time and I’ll see you in the next step.
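Here is a minimal sketch of estimating a query with a dry run against a public sample data set; the query itself is just an illustration.

# Estimate how many bytes this query would scan, without actually running it
bq query --use_legacy_sql=false --dry_run \
    'SELECT word, word_count FROM `bigquery-public-data.samples.shakespeare` LIMIT 10'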
14. Step 08 – Getting started with BigQuery
Welcome back. In this quick step, let’s look at the different options around playing with databases from the command line. Let’s take a few example databases and see what is involved in working with them from the command line. Let’s start with Cloud SQL. Cloud SQL uses gcloud, so the command is gcloud sql. Any command that is related to Cloud SQL will start with gcloud sql. So if you want to create an instance: whenever you’re creating a Cloud SQL database, first you need to create an instance, then you need to create a database. The way you can create an instance is this way: instances create, delete, or describe. If you want to make a clone, a copy of the database, then you can go for clone. If you want to update the software on a specific instance, you can go for patch.
So: gcloud sql instances create a specific instance, or you can say I would want to patch, and you can set the start time. Similar to instances, you can play with databases. So inside an instance you would want to create multiple databases. So you can say create, delete, describe, list and patch: create a specific database in a specific instance. Once we create a database, you’d want to be able to connect to it, and the way you can connect to a database is by saying gcloud sql connect, specifying the instance, and you can also specify the database to connect to. So you can say this is the database I would want to connect to, and this is the user I would want to make use of. You can also create backups. So you can say create backups, describe a specific backup, or list backups. A short sketch of these commands follows below.
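The following is a minimal sketch of these Cloud SQL commands; the instance, database and user names, the database version and the region are illustrative placeholders, not values from the lecture.

# Create a Cloud SQL instance
gcloud sql instances create my-instance --database-version=MYSQL_8_0 --region=us-central1
# Create a database inside that instance
gcloud sql databases create my-database --instance=my-instance
# Connect to the instance as a specific user
gcloud sql connect my-instance --user=root
# Create a one-time backup without waiting for it to complete
gcloud sql backups create --async --instance=my-instance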
So: gcloud sql backups create. I want to run it in async mode; I don’t want to wait for the backup to be created, just return without waiting for it to complete. And I can say this is the instance I would want to back up. This would create a one-time backup. So Cloud SQL is all about gcloud: gcloud sql followed by the specific thing you want to do, whether that’s instances, databases, connecting to a database, or creating backups. BigQuery, however, does not make use of gcloud. In BigQuery, you’d make use of bq. Let’s pick up one of these commands and see it in action. So I’ll pick this command and execute it. Let’s go to Cloud Shell, reconnect, and execute the command; it’s authorized. What we are looking at in here is the structure of a specific data set. In BigQuery we will be playing with a lot of data sets.
In addition to private data sets which we can create in BigQuery, there are also some things called public data sets, and what we are looking at is one of those public data sets. The data set is samples.shakespeare, and you can see details about that specific data set in here: you can see the different fields which are present, how many rows are in the data set, and what the size of the data set is. So this command, bq show, is used to show the details of a specific data set. You can also run a query. Typically, with BigQuery, what do you do? You run queries. So you can say bq query and specify the query you want to execute. One important thing to remember is that these BigQuery queries can take a long time to run, and that’s the reason why you’d want to estimate a query before you run it.
You’d want to know the cost before you run it, and that’s why you can go for the --dry_run option. If you want to export data from BigQuery, we already looked at the command: it’s bq extract. If you want to load data, it’s bq load. If you want to run a BigQuery command within a specific project, we would use gcloud to set the project. That’s one important thing to remember: even though for BigQuery we use bq, if you want to set the project, we would still use gcloud config set project. The next tool is cbt. Cloud BigTable uses cbt: BigQuery is bq, Cloud BigTable is cbt. cbt is the CLI for Cloud BigTable, not gcloud. If you want to install cbt, you can say gcloud components install cbt. If I type in cbt in here, you can see that cbt is already present: in Cloud Shell, cbt is preinstalled.
If I just type in gcloud over here, or gcloud --version, you can see all the things that are installed. You can see that bq and cbt are already installed in here. So we don’t really need to do anything related to gcloud components install cbt if we are using bq or cbt on Cloud Shell. However, if you are using them on a local machine, then you’d need to first do gcloud components install cbt or gcloud components install bq, and only then would you be able to execute the cbt or bq commands. You can use cbt listinstances to list the Cloud BigTable instances which are active at that particular point in time. You can see that it’s asking me: hey, you are asking me to list the instances, but you have not specified the project. So as you can see in here, it’s saying missing project.
How can you configure the project for cbt? The recommended approach is to create a .cbtrc file. So you need to configure the .cbtrc file with an entry: project = the project ID. So let’s try that: echo project = the project ID, and we are sending it to the file ~/.cbtrc. Before that, I would need to copy the project ID. Let’s copy the project ID and replace it in here. So we are sending the entry into a configuration file, the .cbtrc file. And now if I do a cat ~/.cbtrc, I can see project = this specific project. And now let’s do cbt listinstances. What does it do? It’s now listing the instances which are present in here, and right now you can see that there are zero instances.
We do not have any Cloud BigTable instances which are present in here. But the important thing that you need to remember is how we are configuring the project: in the home folder, we are configuring a .cbtrc file with the project information, project = the project ID which we want to run the cbt commands against. If you want to configure a specific instance to run against, you can also configure instance = the specific instance. You can create an instance with cbt createinstance, and you can create a cluster. Whenever we talk about BigTable, first we would create an instance, then inside that we would create a cluster, and after that we create tables.
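Here is a small sketch of that configuration and a couple of the cbt commands mentioned above; the project and instance IDs are illustrative placeholders.

# Point cbt at a project (and optionally a default instance) via ~/.cbtrc
echo "project = my-project-id" > ~/.cbtrc
echo "instance = my-bigtable-instance" >> ~/.cbtrc
# List instances in the configured project, then tables in the configured instance
cbt listinstances
cbt ls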
And there are corresponding delete commands and list commands as well. If you want to list everything, including tables and column families, you can do a cbt ls. So the important takeaway from this step is that whenever we want to play with Google Cloud SQL, we would be using gcloud sql commands. Whenever we want to play with BigQuery, we would use bq. Whenever we want to play with Cloud BigTable, we would be using cbt. Another important thing to remember is how we configure projects: when we are playing with Cloud SQL or with BigQuery, we can set the active project in the traditional gcloud way. However, when we are playing with BigTable, the way we set the active project is by configuring a .cbtrc file. I’m sure you’re having a wonderful time and I’ll see you in the next step.