By / 10 February 2016 / development / < 1 min read

Parsing output with multiple whitespace

This post demonstrates how to parse command output whose columns are separated by multiple spaces in the bash/shell.

I have to implement some Elasticsearch Curator functions myself, since Python is not an option on my machine :-( . To start, I query Elasticsearch for the catalog of indices.

vinh@cinhtau:~> curl -s 'http://localhost:9200/_cat/indices?v'
health status index                  pri rep docs.count docs.deleted store.size pri.store.size
green  open   logstash-2016.02.06      5   1    1899524      1077536      4.4gb          2.2gb
green  open   logstash-2016.02.05      5   1    3051521      1078468      6.1gb            3gb
       close  logstash-2016.02.04
       close  logstash-2016.02.03
green  open   logstash-2016.02.09      5   1    3571320      1077284      6.1gb            3gb
green  open   logstash-2016.02.08      5   1    3854980      1076828      8.3gb          4.1gb
green  open   logstash-2016.02.07      5   1    1384753      1077256      3.5gb          1.7gb
green  open   .marvel-es-2016.02.10    1   1     415332         2970    393.9mb        196.9mb
green  open   .kibana                  1   1         53            4    245.3kb        122.1kb
green  open   .marvel-es-2016.02.08    1   1     113514          850     97.4mb         48.7mb
green  open   .marvel-es-2016.02.09    1   1     348231         2682    332.2mb          166mb
green  open   logstash-2016.02.12      5   1    1623111            0      5.9gb          2.8gb
green  open   logstash-2016.02.11      5   1    2748311        42212      5.9gb          2.9gb
green  open   logstash-2016.02.10      5   1    4494718      1021304      8.3gb          4.1gb
..

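If you try cut with the delimiter ' ', it won't work: cut treats every single space as a separator, and the columns are padded with multiple spaces, for example between the status and the index name. In this case you can use awk with a regex for one or more spaces, ' +', as the field separator (the query below drops ?v, so there is no header row):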

vinh@cinhtau:~> curl -s http://localhost:9200/_cat/indices | awk -F ' +' '{print $3}'
logstash-2016.02.06
logstash-2016.01.15
logstash-2016.01.16
logstash-2016.02.05
logstash-2016.02.04
logstash-2016.01.13
logstash-2016.02.03
logstash-2016.02.09
logstash-2016.02.08
logstash-2016.01.17
logstash-2016.01.18
logstash-2016.02.07
.marvel-es-2016.02.10
.kibana
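
To make the difference between cut and awk reproducible without a running cluster, here is a minimal sketch that feeds three lines copied from the listing above through both tools. The function names sample_indices and index_names are made up for this illustration and are not part of any tool.

```shell
#!/usr/bin/env bash

# Hypothetical helper: prints three sample lines taken from the
# _cat/indices listing above (no header row), padding preserved.
sample_indices() {
  printf '%s\n' \
    'green  open   logstash-2016.02.06      5   1    1899524      1077536      4.4gb          2.2gb' \
    '       close  logstash-2016.02.04' \
    'green  open   .kibana                  1   1         53            4    245.3kb        122.1kb'
}

# Hypothetical helper: extracts the index name (third column).
# ' +' is a regex, so each run of spaces counts as one separator.
index_names() {
  awk -F ' +' '{print $3}'
}

# cut splits on every single space, so a run of spaces produces
# empty fields and field 3 is not the index name.
echo "cut:"
sample_indices | cut -d ' ' -f 3

# awk with the ' +' separator returns the index name for every line.
echo "awk:"
sample_indices | index_names
```

The closed index also ends up in $3 because the leading spaces on that line produce an empty first field. With awk's default field splitting, leading blanks are ignored, so the name of a closed index would be in $2 instead.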

Author

Tan-Vinh Nguyen