In early days, I've wrote a blog about Oracle Reference Architecture and concept of Schema on Read and Schema on Write. Schema on Read is well suitable for Data Lake, which may ingest any data as it is, without any transformation and preserve it for a long period of time.
At the same time you have two types of data – Streaming Data and Batch. Batch could be log files, RDBMS archives. Streaming data could be IoT, Sensors, Golden Gate replication logs.
Apache Kafka is very popular engine for acquiring streaming data. It has multiple advantages, like scalability, fault tolerance and high throughput. Unfortunately, Kafka is hard to manage. Fortunately, Cloud simplifies many routine operations. Oracle Has three options for deploy Kafka in the Cloud:
1) Use Big Data Cloud Service, where you get full Cloudera cluster and there you could deploy Apache Kafka as part of CDH.
2) Event Hub Cloud Service Dedicated. Here you have to specify server shapes and some other parameters, but rest done by Cloud automagically.
3) Event Hub Cloud Service. This service is fully managed by Oracle, you even don't need to specify any compute shapes or so. Only one thing to do is tell for how long you need to store data in this topic and tell how many partitions do you need (partitions = performance).
Today, I'm going to tell you about last option, which is fully managed cloud service.
It's really easy to provision it, just need to login into your Cloud account and choose "Event Hub" Cloud service.
after this go and choose open service console:
Next, click on "Create service":
Put some parameters – two key is Retention period and Number of partitions. First defines for how long will you store messages, second defines performance for read and write operations.
Click next after:
Confirm and wait a while (usually not more than few minutes):
after a short while, you will be able to see provisioned service:
Hello world flow.
Today I want to show "Hello world" flow. How to produce (write) and consume (read) message from Event Hub Cloud Service.
The flow is (step by step):
1) Obtain OAuth token
2) Produce message to a topic
3) Create consumer group
4) Subscribe to topic
5) Consume message
Now I'm going to show it in some details.
OAuth and Authentication token (Step 1)
For dealing with Event Hub Cloud Service you have to be familiar with concept of OAuth and OpenID. If you are not familiar, you could watch the short video or go through this step by step tutorial.
In couple words OAuth token authorization (tells what I could access) method to restrict access to some resources.
One of the main idea is decouple Uses (real human – Resource Owner) and Application (Client). Real man knows login and password, but Client (Application) will not use it every time when need to reach Resource Server (which has some info or content). Instead of this, Application will get once a Authorization token and will use it for working with Resource Server. This is brief, here you may find more detailed explanation what is OAuth.
Obtain Token for Event Hub Cloud Service client.
As you could understand for get acsess to Resource Server (read as Event Hub messages) you need to obtain authorization token from Authorization Server (read as IDCS). Here, I'd like to show step by step flow how to obtain this token. I will start from the end and will show the command (REST call), which you have to run to get token:
curl -k -X POST -u "$CLIENT_ID:$CLIENT_SECRET"
-d "grant_type=password&username=$THEUSERNAME&password=$THEPASSWORD&scope=$THESCOPE"
"$IDCS_URL/oauth2/v1/token"
-o access_token.json
as you can see there are many parameters required for obtain OAuth token.
Let's take a looks there you may get it. Go to the service and click on topic which you want to work with, there you will find IDCS Application, click on it:
After clicking on it, you will go be redirected to IDCS Application page. Most of the credentials you could find here. Click on Configuration:
On this page right away you will find ClientID and Client Secret (think of it like login and password):
look down and find point, called Resources:
One more required parameter – IDCS_URL, you may find in your browser:
you have almost everything you need, except login and password. Here implies oracle cloud login and password (it what you are using when login into http://myservices.us.oraclecloud.com):
Now you have all required credential and you are ready to write some script, which will automate all this stuff:
#!/bin/bash
export CLIENT_ID=7EA06D3A99D944A5ADCE6C64CCF5C2AC_APPID
export CLIENT_SECRET=0380f967-98d4-45e9-8f9a-45100f4638b2
export THEUSERNAME=john.dunbar
export THEPASSWORD=MyPassword
export SCOPE=/idcs-1d6cc7dae45b40a1b9ef42c7608b9afe-oehtest
export PRIMARY_AUDIENCE=https://7EA06D3A99D944A5ADCE6C64CCF5C2AC.uscom-central-1.oraclecloud.com:443
export THESCOPE=$PRIMARY_AUDIENCE$SCOPE
export IDCS_URL=https://idcs-1d6cc7dae45b40a1b9ef42c7608b9afe.identity.oraclecloud.com
curl -k -X POST -u "$CLIENT_ID:$CLIENT_SECRET"
-d "grant_type=password&username=$THEUSERNAME&password=$THEPASSWORD&scope=$THESCOPE"
"$IDCS_URL/oauth2/v1/token"
-o access_token.json
after running this script, you will have new file called access_token.json. Field access_token it's what you need:
$ cat access_token.json
{"access_token":"eyJ4NXQjUzI1NiI6InVUMy1YczRNZVZUZFhGbXFQX19GMFJsYmtoQjdCbXJBc3FtV2V4U2NQM3MiLCJ4NXQiOiJhQ25HQUpFSFdZdU9tQWhUMWR1dmFBVmpmd0UiLCJraWQiOiJTSUdOSU5HX0tFWSIsImFsZyI6IlJTMjU2In0.eyJ1c2VyX3R6IjoiQW1lcmljYVwvQ2hpY2FnbyIsInN1YiI6ImpvaG4uZHVuYmFyIiwidXNlcl9sb2NhbGUiOiJlbiIsInVzZXJfZGlzcGxheW5hbWUiOiJKb2huIER1bmJhciIsInVzZXIudGVuYW50Lm5hbWUiOiJpZGNzLTFkNmNjN2RhZTQ1YjQwYTFiOWVmNDJjNzYwOGI5YWZlIiwic3ViX21hcHBpbmdhdHRyIjoidXNlck5hbWUiLCJpc3MiOiJodHRwczpcL1wvaWRlbnRpdHkub3JhY2xlY2xvdWQuY29tXC8iLCJ0b2tfdHlwZSI6IkFUIiwidXNlcl90ZW5hbnRuYW1lIjoiaWRjcy0xZDZjYzdkYWU0NWI0MGExYjllZjQyYzc2MDhiOWFmZSIsImNsaWVudF9pZCI6IjdFQTA2RDNBOTlEOTQ0QTVBRENFNkM2NENDRjVDMkFDX0FQUElEIiwiYXVkIjpbInVybjpvcGM6bGJhYXM6bG9naWNhbGd1aWQ9N0VBMDZEM0E5OUQ5NDRBNUFEQ0U2QzY0Q0NGNUMyQUMiLCJodHRwczpcL1wvN0VBMDZEM0E5OUQ5NDRBNUFEQ0U2QzY0Q0NGNUMyQUMudXNjb20tY2VudHJhbC0xLm9yYWNsZWNsb3VkLmNvbTo0NDMiXSwidXNlcl9pZCI6IjM1Yzk2YWUyNTZjOTRhNTQ5ZWU0NWUyMDJjZThlY2IxIiwic3ViX3R5cGUiOiJ1c2VyIiwic2NvcGUiOiJcL2lkY3MtMWQ2Y2M3ZGFlNDViNDBhMWI5ZWY0MmM3NjA4YjlhZmUtb2VodGVzdCIsImNsaWVudF90ZW5hbnRuYW1lIjoiaWRjcy0xZDZjYzdkYWU0NWI0MGExYjllZjQyYzc2MDhiOWFmZSIsInVzZXJfbGFuZyI6ImVuIiwiZXhwIjoxNTI3Mjk5NjUyLCJpYXQiOjE1MjY2OTQ4NTIsImNsaWVudF9ndWlkIjoiZGVjN2E4ZGRhM2I4NDA1MDgzMjE4NWQ1MzZkNDdjYTAiLCJjbGllbnRfbmFtZSI6Ik9FSENTX29laHRlc3QiLCJ0ZW5hbnQiOiJpZGNzLTFkNmNjN2RhZTQ1YjQwYTFiOWVmNDJjNzYwOGI5YWZlIiwianRpIjoiMDkwYWI4ZGYtNjA0NC00OWRlLWFjMTEtOGE5ODIzYTEyNjI5In0.aNDRIM5Gv_fx8EZ54u4AXVNG9B_F8MuyXjQR-vdyHDyRFxTefwlR3gRsnpf0GwHPSJfZb56wEwOVLraRXz1vPHc7Gzk97tdYZ-Mrv7NjoLoxqQj-uGxwAvU3m8_T3ilHthvQ4t9tXPB5o7xPII-BoWa-CF4QC8480ThrBwbl1emTDtEpR9-4z4mm1Ps-rJ9L3BItGXWzNZ6PiNdVbuxCQaboWMQXJM9bSgTmWbAYURwqoyeD9gMw2JkwgNMSmljRnJ_yGRv5KAsaRguqyV-x-lyE9PyW9SiG4rM47t-lY-okMxzchDm8nco84J5XlpKp98kMcg65Ql5Y3TVYGNhTEg","token_type":"Bearer","expires_in":604800}
Create Linux variable for it:
#!/bin/bash
export TOKEN=`cat access_token.json |jq .access_token|sed 's/"//g'`
Well, now we have Authorization token and may work with our Resource Server (Event Hub Cloud Service).
Note: you also may check documentation about how to obtain OAuth token.
Produce Messages (Write data) to Kafka (Step 2)
The first thing that we may want to do is produce messages (write data to a Kafka cluster). To make scripting easier, it's also better to use some environment variables for common resources. For this example, I'd recommend to parametrize topic's end point, topic name, type of content to be accepted and content type. Content type is completely up to developer, but you have to consume (read) the same format as you produce(write). The key parameter to define is REST endpoint. Go to PSM, click on topic name and copy everything till "restproxy":
Also, you will need topic name, which you could take from the same window:
now we could write a simple script for produce one message to Kafka:
#!/bin/bash
export OEHCS_ENDPOINT=https://oehtest-gse00014957.uscom-central-1.oraclecloud.com:443/restproxy
export TOPIC_NAME=idcs-1d6cc7dae45b40a1b9ef42c7608b9afe-oehtest
export CONTENT_TYPE=application/vnd.kafka.json.v2+json
curl -X POST
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: $CONTENT_TYPE"
–data '{"records":[{"value":{"foo":"bar"}}]}'
$OEHCS_ENDPOINT/topics/$TOPIC_NAME
if everything will be fine, Linux console will return something like:
{"offsets":[{"partition":1,"offset":8,"error_code":null,"error":null}],"key_schema_id":null,"value_schema_id":null}
Create Consumer Group (Step 3)
The first step to read data from OEHCS is create consumer group. We will reuse environment variables from previous step, but just in case I'll include it in this script:
#!/bin/bash
export OEHCS_ENDPOINT=https://oehtest-gse00014957.uscom-central-1.oraclecloud.com:443/restproxy
export CONTENT_TYPE=application/vnd.kafka.json.v2+json
export TOPIC_NAME=idcs-1d6cc7dae45b40a1b9ef42c7608b9afe-oehtest
curl -X POST
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: $CONTENT_TYPE"
–data '{"format": "json", "auto.offset.reset": "earliest"}'
$OEHCS_ENDPOINT/consumers/oehcs-consumer-group
-o consumer_group.json
this script will generate output file, which will contain variables, that we will need to consume messages.
Subscribe to a topic (Step 4)
Now you are ready to subscribe for this topic (export environment variable if you didn't do this before):
#!/bin/bash
export BASE_URI=`cat consumer_group.json |jq .base_uri|sed 's/"//g'`
export TOPIC_NAME=idcs-1d6cc7dae45b40a1b9ef42c7608b9afe-oehtest
curl -X POST
-H "Authorization: Bearer $TOKEN"
-H "Content-Type: $CONTENT_TYPE"
-d "{"topics": ["$TOPIC_NAME"]}"
$BASE_URI/subscription
If everything fine, this request will not return something.
Consume (Read) messages (Step 5)
Finally, we approach last step – consuming messages.
and again, it's quite simple curl request:
#!/bin/bash
export BASE_URI=`cat consumer_group.json |jq .base_uri|sed 's/"//g'`
export H_ACCEPT=application/vnd.kafka.json.v2+json
curl -X GET
-H "Authorization: Bearer $TOKEN"
-H "Accept: $H_ACCEPT"
$BASE_URI/records
if everything works, like it supposed to work, you will have output like:
[{"topic":"idcs-1d6cc7dae45b40a1b9ef42c7608b9afe-oehtest","key":null,"value":{"foo":"bar"},"partition":1,"offset":17}]
Conclusion
Today we saw how easy to create fully managed Kafka Topic in Event Hub Cloud Service and also we made a first steps into it – write and read message. Kafka is really popular message bus engine, but it's hard to manage. Cloud simplifies this and allow customers concentrate on the development of their applications.
here I also want to give some useful links:
1) If you are not familiar with REST API, I'd recommend you to go through this blog
2) There is online tool, which helps to validate your curl requests
3) Here you could find some useful examples of producing and consuming messages
4) If you are not familiar with OAuth, here is nice tutorial, which show end to end example