Monday, 23 May 2016

NFS common errors and troubleshooting - Linux/Unix

I have seen some NFS errors/issues that crop up now and then for most Linux/Unix system admins, so I decided to put them all in one place. Hope this helps.

Environment: Linux/Unix

Error: "Server Not Responding"

Check that both the NFS server and the client are online and responding to RPC messages.

Use ping and traceroute to check whether they can reach each other; if not, check the NIC link status with ethtool and verify the IP configuration.

Sometimes heavy server or network load causes the RPC response to time out, producing this error message. Try increasing the timeout mount options.
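For example (a sketch; the server name and export path are placeholders): the timeo option is in tenths of a second, so timeo=600 waits 60 seconds before retransmitting, and retrans sets how many retries are made before the error is reported.

#mount -t nfs -o timeo=600,retrans=5 nfsserver:/export /mnt/nfs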

Error: "rpc mount export: RPC: Timed out " 

The NFS server or client was unable to resolve DNS. Check that forward and reverse DNS name resolution works, and verify your DNS servers or /etc/hosts entries.

 Error: "Access Denied" or "Permission Denied"

Check the export permissions for the NFS file systems.
#showmount -e nfsserver  ==> on the client
#exportfs -v  ==> on the server, to list what is exported and with which options

Also check that there are no syntax issues in /etc/exports (e.g. stray spaces, wrong permissions, typos, etc.).
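For reference, a valid entry looks like the line below (the path and network are placeholders). Watch out for a space between the host and the option list: "/export/data 192.168.10.0/24 (rw)" would export with default options to that network and read-write to the whole world.

/export/data   192.168.10.0/24(rw,sync,no_root_squash)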

Error: "RPC: Port mapper failure - RPC: Unable to receive"

NFS requires both the NFS service and the portmapper service to be running on both the client and the server. Verify with:

#rpcinfo -p
       or
#/etc/init.d/portmap status

If not, start the portmap service.
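The service name varies with the release (portmap on older systems, rpcbind on RHEL/CentOS 6 and later):

#/etc/init.d/portmap start   ==> older releases
#service rpcbind start       ==> RHEL/CentOS 6
#systemctl start rpcbind     ==> RHEL/CentOS 7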

Error: "NFS Stale File Handle"

An application opens an NFS file with the open() system call, the same way it opens a local file; the call returns a file descriptor (handle) that the program then uses in its I/O calls to identify the file.

When an NFS share is unshared, or the NFS server changes the file handle, any NFS client attempting further I/O on that share will receive the 'NFS Stale File Handle' error.

On the client:

# umount -f /nfsmount
If it cannot be unmounted and remounted, kill the processes that are using /nfsmount (see the sketch below).

or 

If the above options didn't work, you can reboot the client to clear the stale NFS handle.
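To find and kill the processes holding the mount point, fuser is handy (-m operates on the mount point, -k kills the processes found):

#fuser -vm /nfsmount   ==> list processes using the mount
#fuser -km /nfsmount   ==> kill them
#umount -f /nfsmount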

Error: "No route to host"

This can be reported when a client attempts to mount the NFS file system, even when the client can ping the server successfully.

This can be due to RPC messages being filtered by the host firewall, the client firewall, or a network switch. Verify the firewall rules, and check that port 2049 is reachable; stopping iptables temporarily is a quick way to rule the firewall out.
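A few quick checks (the server name is a placeholder):

#rpcinfo -p nfsserver          ==> list RPC services registered on the server
#telnet nfsserver 2049         ==> test whether the NFS port is reachable
#iptables -L -n | grep 2049    ==> look for rules filtering the NFS port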

Hope this helps everyone who uses NFS regularly; these are the errors I have come across most often in my experience.

Thanks for reading and sharing!

Sunday, 15 May 2016

CentOS/RHEL 7 kernel dump & debug

Applies : CentOS / RHEL / OEL 7 

Arch : x86_64

When kdump is enabled, a small amount of memory is reserved for a second kernel. If the system crashes, it boots into this second kernel, whose only purpose is to capture the core dump image. Being able to analyse the core dump helps significantly in determining the exact cause of the system failure.

Configuring kdump :

The kdump service comes with the kexec-tools package, which needs to be installed:

#yum install kexec-tools

Set the amount of memory to reserve for the crash kernel via the crashkernel=<size> kernel parameter:


# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=128M  vconsole.keymap=us rhgb quiet"
GRUB_DISABLE_RECOVERY="true"
#

Regenerate the GRUB config and reboot for the kernel parameter to take effect:

# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-3.10.0-123.el7.x86_64
Found initrd image: /boot/initramfs-3.10.0-123.el7.x86_64.img
Warning: Please don't use old title `CentOS Linux, with Linux 3.10.0-123.el7.x86_64' for GRUB_DEFAULT, use `Advanced options for CentOS Linux>CentOS Linux, with Linux 3.10.0-123.el7.x86_64' (for versions before 2.00) or `gnulinux-advanced-1a06e03f-ad9b-44bf-a972-3a821fca1254>gnulinux-3.10.0-123.el7.x86_64-advanced-1a06e03f-ad9b-44bf-a972-3a821fca1254' (for 2.00 or later)
Found linux image: /boot/vmlinuz-0-rescue-ae1ddf63f5e04857b5e89cd8fcf1f9e1
Found initrd image: /boot/initramfs-0-rescue-ae1ddf63f5e04857b5e89cd8fcf1f9e1.img
done
#

Configure kdump in /etc/kdump.conf

By default the vmcore is stored in the /var/crash directory; if you want it dumped to a different partition, disk, or NFS share, it must be defined here:

ext3 /dev/sdd1
or
net nfs.yourdomain.com:/export/dump

Compress the vmcore file to reduce its size:
core_collector makedumpfile -c

Once the crash dump is captured, the root fs is mounted and /sbin/init is run; change the behaviour so that the system reboots instead:
default reboot
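Putting those directives together, a minimal /etc/kdump.conf could look like this (the dump path shown is just the default; adjust for your environment):

path /var/crash
core_collector makedumpfile -c
default reboot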

Start your kdump: 

# cat /proc/cmdline
BOOT_IMAGE=/vmlinuz-3.10.0-123.el7.x86_64 root=UUID=1a06e03f-ad9b-44bf-a972-3a821fca1254 ro rd.lvm.lv=centos/swap vconsole.font=latarcyrheb-sun16 rd.lvm.lv=centos/root crashkernel=128M vconsole.keymap=us rhgb quiet

# grep -v  '#' /etc/sysconfig/kdump | sed '/^$/d'
KDUMP_KERNELVER=""
KDUMP_COMMANDLINE=""
KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 rootflags=nofail acpi_no_memhotplug"
KEXEC_ARGS=""
KDUMP_BOOTDIR="/boot"
KDUMP_IMG="vmlinuz"
KDUMP_IMG_EXT=""
#

# systemctl enable kdump.service
# systemctl start kdump.service
# systemctl is-active kdump
active
#

Test your configuration (warning: the second command below deliberately crashes the kernel, so only do this on a test system):

# echo 1 > /proc/sys/kernel/sysrq
# echo c > /proc/sysrq-trigger



You should see that a crash dump was generated. Now install the crash utility and the debug kernel packages to analyse it.

#yum install crash

For Oracle Linux I was able to download the debuginfo packages from https://oss.oracle.com/ol7/debuginfo/; check your kernel version and download the matching debug kernel packages.

#rpm -ivh kernel-debuginfo-common-x86_64-3.10.0-123.el7.x86_64.rpm \
               kernel-debuginfo-3.10.0-123.el7.x86_64.rpm \
               kernel-debug-debuginfo-3.10.0-123.el7.x86_64.rpm

# ls -lh /var/crash/127.0.0.1-2016.05.15-04\:50\:40/vmcore
-rw-------. 1 root root 168M May 15 04:51 /var/crash/127.0.0.1-2016.05.15-04:50:40/vmcore
#

# crash /var/crash/127.0.0.1-2016.05.15-04\:50\:40/vmcore /usr/lib/debug/lib/modules/`uname -r`/vmlinux

WARNING: kernel version inconsistency between vmlinux and dumpfile

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2016.05.15-04:50:40/vmcore
        CPUS: 1
        DATE: Sun May 15 04:50:38 2016
      UPTIME: 00:10:24
LOAD AVERAGE: 0.02, 0.07, 0.05
       TASKS: 104
    NODENAME: slnxcen01
     RELEASE: 3.10.0-123.el7.x86_64
     VERSION: #1 SMP Mon Jun 30 12:09:22 UTC 2014
     MACHINE: x86_64  (2294 Mhz)
      MEMORY: 1.4 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 2266
     COMMAND: "bash"
        TASK: ffff880055650b60  [THREAD_INFO: ffff880053fb2000]
         CPU: 0
       STATE: TASK_RUNNING (PANIC)

crash>


crash> bt
PID: 2266   TASK: ffff880055650b60  CPU: 0   COMMAND: "bash"
 #0 [ffff880053fb3a98] machine_kexec at ffffffff81041181
 #1 [ffff880053fb3af0] crash_kexec at ffffffff810cf0e2
 #2 [ffff880053fb3bc0] oops_end at ffffffff815ea548
.
.
.
crash> files
PID: 2266   TASK: ffff880055650b60  CPU: 0   COMMAND: "bash"
ROOT: /    CWD: /root
 FD       FILE            DENTRY           INODE       TYPE PATH
  0 ffff880053c47a00 ffff8800563383c0 ffff880055bad2f0 CHR  /dev/tty1
  1 ffff8800542a9100 ffff88004dd4ff00 ffff88004dc0b750 REG  /proc/sysrq-trigger
.
.
.
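A few other standard crash commands worth trying on the same vmcore:

crash> log    ==> dump the kernel message buffer leading up to the panic
crash> ps     ==> list the tasks that existed at crash time
crash> sys    ==> redisplay the system summary shown at startup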
That concludes the article.

Sunday, 17 April 2016

Disaster Recovery using Relax-and-Recover (REAR) - Redhat Linux

Relax-and-Recover is very simple to use. I wanted to write it up anyway, because before I apply OS/application/security patches and the like, I want a complete backup of the server for the worst case, this being production critical.

We first need to install the rear package, which can be downloaded from the EPEL repository. Before I proceed, here is the environment:

Environment  : Oracle Linux 6 with Red Hat kernel (2.6.32-573.el6.x86_64)
rear version : rear-1.18-3.el6.x86_64
DR copy      : NFS storage (nfsserver.testlabs.com)

The rear package lives in the EPEL repository and can be downloaded from its mirrors; alternatively, just copy and paste the repo definition below:
hostname#cat > /etc/yum.repos.d/epel.repo

[epel]
name=Extra Packages for Enterprise Linux 6 - $basearch
mirrorlist=https://mirrors.fedoraproject.org/metalink?repo=epel-6&arch=$basearch
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-6

Ctrl-D

hostname# yum install rear 

Note: make sure 'genisoimage' and 'syslinux' are already installed, without which rear will not install.
hostname#yum install genisoimage syslinux

Let rear know which location it should back up to. It is defined as below:

hostname#cat >/etc/rear/local.conf 
OUTPUT=ISO
BACKUP=NETFS
BACKUP_URL="nfs://nfsserver.testlabs.com/dr/"

Ctrl-D
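Before running the backup it is worth confirming that the NFS export is visible from the client (a quick sanity check against the BACKUP_URL above):

hostname# showmount -e nfsserver.testlabs.com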


The resulting ISO image is what you boot from for DR recovery; it then pulls the backup archive from the NFS share in order to restore files/dirs.

hostname# rear -v mkbackup
Relax-and-Recover 1.18 / Git
Using log file: /var/log/rear/rear-hostname.log
Creating disk layout
Creating root filesystem layout
TIP: To login as root via ssh you need to set up /root/.ssh/authorized_keys or SSH_ROOT_PASSWORD in your configuration file
Copying files and directories
Copying binaries and libraries
Copying kernel modules
Creating initramfs
Making ISO image
Wrote ISO image: /var/lib/rear/output/rear-hostname.iso (74M)
Copying resulting files to nfs location
Encrypting disabled
Creating tar archive '/tmp/rear.NZP1vXar0Vmq5nr/outputfs/hostname/backup.tar.gz'
Archived 14 MiB [avg 3584 KiB/sec]
.
.
.
.

Archived 5644 MiB [avg 8268 KiB/sec]OK
Archived 5644 MiB in 700 seconds [avg 8256 KiB/sec]

All your system files are now backed up to the NFS server; you can confirm by logging in to the storage box.

nfsserver:/dr/hostname# pwd
/dr/hostname
nfsserver:/dr/hostname# ls
./                  README              backup.log          rear-hostname.iso
../                 VERSION             backup.tar.gz       rear.log
nfsserver:/dr/hostname#

When your server is unable to boot, libraries are corrupted, or anything else goes badly wrong, you can copy the ISO image from the NFS path, boot from it as a CD-ROM, and recover.

I tested this myself, and here is how it went.

I will deliberately corrupt the server (remove binaries, remove the boot files, and so on) and then restore it using the ISO image from the NFS location.

hostname# rm -rf /boot/*
hostname# ls -l /boot
total 0
hostname# 

hostname# rm -rf /bin/*
hostname# ls
-bash: /bin/ls: No such file or directory
hostname#

When you boot from the ISO image, choose 'Recover hostname' from the boot menu options.


After booting, you land in the RESCUE shell; check that you can reach your NFS server in order to restore the files.

RESCUE: rear recover



This starts copying all your data from the NFS share back to the client server. It might take minutes or hours depending on the amount of data. Once it has completed, just reboot your client machine and it will be operational.

hostname: # ls -l /boot/ | wc -l
15
hostname:

Now you have restored your system from the backup.

There is no excuse for not using this tool: it is very easy and simple. I would urge readers to keep a copy of the ISO image; it will save a lot of time and effort in the worst cases. Plan for the best, but prepare for the worst.

Thanks to all who read this post.

Sunday, 20 March 2016

Xen disk hot addition/removing from guests

I recently had to add a new disk to a guest for extra swap space; Xen allows you to hot add (and remove) disks to a guest domU while the system is running.

Let's take a look at how to add disks to the guests:

I will attach an image-based disk from the Xen dom0 to the guest, where it appears as an ordinary block device that can be partitioned, formatted, and mounted. xm block-attach is used to attach it while the guest is online.

First, create a 4 GB image file to hold the guest's new swap partition:
xendom0#dd if=/dev/zero of=testvm-swapdisk.img bs=1M count=4k

xm block-attach <Domain> <BackDev> <FrontDev> <Mode> [BackDomain]
    Domain   - guest domain to attach the disk to
    BackDev  - location of the backing device in dom0, prefixed with its type (file: for an image file, phy: for a physical device)
    FrontDev - the device name to assign to the new device in the domU
    Mode     - read-only (r) or read/write (w)

xendom0# xm block-attach testvm file:/path/to/testvm-swapdisk.img /dev/xvdd w   (use the full path to the image file)

On Guest :

testvm ~]# lsblk -i | tail -1
xvdd                        202:48   0    4G  0 disk
testvm ~]# fdisk /dev/xvdd      (create a single partition spanning the disk)
testvm ~]# mkswap /dev/xvdd1
testvm ~]# swapon /dev/xvdd1
testvm ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/dm-1                               partition       819196  0       -1
/dev/xvdd1                              partition       4192928 0       -2

Removing a disk from a guest:

If a disk is no longer needed by the guest, unmount it (or swapoff, for a swap device) and delete the partitions inside the guest; then, from the Xen dom0, detach the disk:

xendom0#xm block-detach testvm /dev/xvdd

Make sure you also edit the guest's Xen config file so the change is permanent and the disk is still available after the next reboot.
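For example, a sketch of the disk line in the guest config file (the image paths are placeholders for your repository layout):

disk = [ 'file:/path/to/testvm-root.img,xvda,w',
         'file:/path/to/testvm-swapdisk.img,xvdd,w' ]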

Sunday, 28 February 2016

Centralized Log Management using rsyslog CentOS 6/7

I am creating a centralized log server that stores all the logs from the clients. In order to do that, make sure you have enough disk space to hold the logs from all the clients. I will also configure log rotation to save space on the disk.

Environment - CentOS/Redhat 6.6
rsyslog version - 5.8.10

rsyslog is installed by default; in case it is not, use yum to install it.
#yum install rsyslog

It is helpful to read the man rsyslog.conf documentation. The config has mainly 3 parts:

1. Modules - rsyslog follows a modular design
2. Global directives - set global properties for rsyslog
3. Rules - what is to be logged and where

The destination log server receives all the logs (audit, sudo, su, history, kernel, etc.) sent by all the clients.

Let's configure server first,  

Edit, 
#vim /etc/rsyslog.conf 

# Make sure syslog reception is enabled for TCP and UDP communication.
$ModLoad imudp
$UDPServerRun 514

$ModLoad imtcp
$InputTCPServerRun 514

# Create a template so that each client's logs are written under the client's hostname, in a file named after the program being logged, at the destination path below. The facility.priority selectors decide which messages get written through this template.

$template TmplAuth,"/scratch/remote-sys-logs/%fromhost%/%PROGRAMNAME%.log"
authpriv.*   ?TmplAuth
*.info;mail.none;authpriv.none;cron.none;local6.*  ?TmplAuth

# Since I also need audit.log, I create a separate rule so that audit messages land in the same per-client destination folder.

$template TmplAudit,"/scratch/remote-sys-logs/%fromhost%/audit.log"
local6.*        ?TmplAudit

# Log all bash terminal commands and store them in the centralized location.

$template TmplCmds,"/scratch/remote-sys-logs/%fromhost%/hist.log"
local0.debug    ?TmplCmds

save and quit the file.

# mkdir /scratch/remote-sys-logs
# service rsyslog restart

Since the logs will grow large, I rotate them keeping two rotated files: the older one compressed, the most recent one left uncompressed (hence delaycompress below). Logs are kept for a maximum of 60 days, and rotated files get the date as a filename extension. Rotation runs from logrotate's daily cron job, and the postrotate hook signals syslogd to reopen its log files.

Edit, 
#vim /etc/logrotate.d/remote-sys-logs
/scratch/remote-sys-logs/*/*.log {
    daily
    dateext
    rotate 2
    compress
    delaycompress
    create 644 root root
    notifempty
    missingok
    maxage 60
    sharedscripts
    postrotate
     /bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
    endscript
}
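You can dry-run the rotation without touching any files to confirm the config parses cleanly (-d is debug mode and implies no changes):

#logrotate -d /etc/logrotate.d/remote-sys-logs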

Client: 

Edit,
#vim /etc/rsyslog.conf

# The imfile module is not loaded by default; add this entry so rsyslog can convert any standard text file into syslog messages.
$ModLoad imfile

# Add the input definition below to forward the audit log to the centralized log server; without it, audit events never reach the central log.


$InputFileName /var/log/audit/audit.log
$InputFileTag audit:
$InputFileStateFile audit.log
$InputFileSeverity info
$InputFileFacility local6
$InputRunFileMonitor

# forward all the logs to the centralized server. 

*.*                     @centrallogserver:514
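Note that a single @ forwards over UDP; if you want reliable delivery over the TCP listener configured on the server, use @@ instead:

*.*                     @@centrallogserver:514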

save and quit
#service rsyslog restart

In order to log all bash commands to the logger, make an entry in the global /etc/bashrc config file.


# Export PROMPT_COMMAND so that every new bash command is logged to the file via logger.
export PROMPT_COMMAND='RETRN_VAL=$?;logger -p local0.debug "$(whoami):[$$] $(history 1 | sed "s/^[ ]*[0-9]\+[ ]*//" ) [$RETRN_VAL]#"'

Exit and reopen the bash shell (or start a new one) so that all subsequent commands are logged to the centralized log server.

Hope this helps anyone who wants to build a central log server. Below are the references you can check to adapt this to your requirements.

Thanks for reading and re-sharing!

References:
https://en.wikipedia.org/wiki/Syslog - syslog facilities and priorities explained
man logrotate.conf
man rsyslog.conf

Monday, 18 January 2016

Rescue environment for a paravirtualized VM, Xen virtualization - Redhat 7 on OVM 3.3.3

Objective: how to enter a rescue environment from the dom0, mounting an ISO image while bypassing pygrub, and renaming the root volume group.

Environment: 
            : Oracle Virtual server 3.3.3 X86_64 (HVM)
            : Redhat 7.0 x86_64 (Redhat OS)

Recently I had to rename volume groups on one of my guest machines, a paravirtualized Redhat guest. Since the root partition (/) lived on an LV, I had to boot the guest into the rescue environment. To boot the guest directly from the kernel and initrd used for installation, I copied them to /OVS/Repositories/redhat7/vmlinuz and /OVS/Repositories/redhat7/initrd.img on my OVM hypervisor, then told the guest config to boot from them:

kernel='/OVS/Repositories/redhat7/vmlinuz'
ramdisk='/OVS/Repositories/redhat7/initrd.img'
extra="rescue method=/mnt" or extra="install=hd:xvdc rescue=1 xencons=tty"

There are two ways to reach the rescue environment:

1. If your ISO image is mounted on a temporary mount point in dom0 (mount -o loop redhat7.iso /mnt), point the rescue method at that location:

extra="rescue method=/mnt"

2. If the ISO image is attached to the guest OS as a block device and you know the device name, you can pass that device for rescue.

In my case the block device was xvdc.

extra="install=hd:xvdc rescue=1 xencons=tty"

These are passed as extra arguments to the kernel and tell the anaconda installer where to find the install files.


I had previously written a post on renaming VG/LV for CentOS 6 (http://goo.gl/M71G5a); there is not much difference here, except that this is GRUB2.

I took the LVs offline, renamed the VG, updated the entries in /etc/fstab and the GRUB configuration, and regenerated the GRUB config file.

- Scan the LVs with lvscan; if they are not offline, take them offline:


sh-4.2# lvchange -an /dev/<vgname>/swap
sh-4.2# lvchange -an /dev/<vgname>/root

- Change the VG name 
sh-4.2# lvm vgrename <old_vgname> <new_vgname>
Volume group "<old_vgname>" successfully renamed to "<new_vgname>"

- Make sure all your /etc/fstab entries point to the new volume group.
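A quick way to do that is a global substitution (a sketch; the VG names are placeholders, and -i.bak keeps a backup copy of the file first):

sh-4.2# sed -i.bak 's/old_vgname/new_vgname/g' /etc/fstab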

- Update the references to the old volume group in your GRUB configuration (on RHEL 7, the rd.lvm.lv= arguments in /etc/default/grub), then regenerate the GRUB config file:
sh-4.2# grub2-mkconfig -o /boot/grub2/grub.cfg

Once all that is done, remove the rescue entries from the guest config file, and then boot:

OVM# xm create -c redhat7.cfg


Thank you for reading and re-sharing.

Sunday, 10 January 2016

Security updates and installation using YUM - RHEL 5/6/7

Hello All, 

I came across a situation where I wanted to check, verify, and apply security updates on the different releases of RHEL, and I could not find everything in one place, so I thought of putting it all together. Keeping it all in one place helps, so I am sharing it publicly!
How security updates are handled across RHEL releases:

Install the security plugin (lets yum install security updates):
    RHEL 5: yum install yum-security
    RHEL 6: yum install yum-plugin-security
    RHEL 7: no plugin required, it is already part of yum

List all available errata without installing:
    RHEL 5: yum list-sec
    RHEL 6/7: yum updateinfo list available

List all available security updates without installing:
    RHEL 5: yum list-security --security
    RHEL 6/7: yum updateinfo list security all
              yum updateinfo list sec

List currently installed security updates:
    RHEL 5: yum list-sec
    RHEL 6/7: yum updateinfo list security installed

List all security updates with verbose descriptions:
    RHEL 5: yum list-sec

Apply all security updates from RHN:
    yum -y update --security

Update based on a CVE reference:
    yum update --cve <CVE>

View available advisories by severity:
    yum updateinfo list

Show detailed information about an advisory before applying it:
    yum updateinfo RHSA-2015:XXXX

Apply only one specific advisory:
    yum update --advisory=RHSA-2015:XXXX

More information can be found in man yum-security.
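As a quick worked example on RHEL 7, using only the commands above, first list the pending security errata and then apply just those:

#yum updateinfo list security all
#yum -y update --security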

First post of 2016 - wishing all of you a HAPPY NEW YEAR :)

Thanks