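Recently I've been computing aggregate statistics over log data with tens of millions of records. The records are JSON, and processing them in a shell script meant reading several fields from each record at once, which I couldn't find a good way to do. A quick Google search turned up [jq](https://stedolan.github.io/jq/manual/). Installation is simple, so I'll skip it here.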
### Log format
```
{"h_lang":"zh","h_plat":"1","h_city":"潍坊市","h_akc":"49","pt":"100030","h_prov":"山东省","h_tz":"GMT 08:00","h_did":"8620010133189528","h_loc":"118.792279,36.875343","h_ak":"xbad3a","h_akv":"1.1.0","now":"2019-05-09 00:02:29","pi_func":"100030","h_act":"wifi","h_osv":"8.0.0","h_uid":"gsjk_6103379","pi_ext_title":"首页","ip":"17.192.174.32","h_chan":"p_on_common","h_ac":"wifi","h_appId":"1","h_reso":"1080*1808","h_dm":"MHA-TL00","pi_ot":"2019-05-08 23:54:30","h_ds":"2","h_cpu":"hi3660","h_cty":"CN","h_mc":"D8:C7:71:4B:6F:BB"}
```
### Usage example
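```
zcat business/*.gz |jq -r '[.h_uid,(.pi_func|tostring),(.pi_ext_type|tostring),(.pi_ext_content|tostring),(.now)[0:10]]|join(",")'|sed 's/gsjk_//g' | awk -F "," '$1>6000000&&$1<7000000' > all.log
```
The pipeline decompresses the logs, uses `jq` to pull five fields out of each record and join them with commas, strips the `gsjk_` prefix from the user ID with `sed`, and finally uses `awk` to keep only user IDs between 6000000 and 7000000. Because the fields are comma-separated, `awk` has to split on `,` so that `$1` is the numeric user ID.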
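To try the pipeline without the real `business/*.gz` files, the sketch below writes two made-up records (hypothetical values in the shape of the log format above, not real data) into a temporary gzip file and runs the same chain of `jq`, `sed` and `awk -F ","`. It assumes `bash`, GNU `gzip`/`zcat` and `jq` are available; the scratch directory and both records are illustrative only.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical scratch directory standing in for the real log location.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/business"

# Two made-up records shaped like the production log (not real data).
# The second one has no pi_ext_type / pi_ext_content fields.
cat > "$workdir/business/sample.log" <<'EOF'
{"h_uid":"gsjk_6000460","pi_func":"100001","pi_ext_type":2,"now":"2018-11-28 10:15:00"}
{"h_uid":"gsjk_5000001","pi_func":"100003","now":"2018-11-28 11:20:00"}
EOF
gzip "$workdir/business/sample.log"   # leaves business/sample.log.gz

# Same stages as the usage example; awk splits on "," so $1 is the numeric user ID.
zcat "$workdir"/business/*.gz \
  | jq -r '[.h_uid,(.pi_func|tostring),(.pi_ext_type|tostring),(.pi_ext_content|tostring),(.now)[0:10]]|join(",")' \
  | sed 's/gsjk_//g' \
  | awk -F "," '$1>6000000&&$1<7000000' > "$workdir/all.log"

result=$(cat "$workdir/all.log")
echo "$result"   # 6000460,100001,2,null,2018-11-28
```

Only the first record survives the `awk` range check; the second user ID, 5000001, falls outside the 6000000 to 7000000 range.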
### Reading a field
Use `.` followed by the field name, for example `.h_uid`.
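A minimal sketch of field access, assuming `jq` is on the `PATH`. The record is a trimmed copy of the sample log line above; the second lookup shows that a field missing from the record yields `null` rather than an error.

```shell
#!/usr/bin/env bash
set -eu

# A trimmed record taken from the sample log line.
record='{"h_uid":"gsjk_6103379","pi_func":"100030","now":"2019-05-09 00:02:29"}'

# .h_uid selects one field; -r prints the string without JSON quotes.
uid=$(printf '%s\n' "$record" | jq -r '.h_uid')
echo "$uid"       # gsjk_6103379

# A field the record does not contain evaluates to null.
missing=$(printf '%s\n' "$record" | jq -r '.pi_ext_type')
echo "$missing"   # null
```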
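### tostring
Converts a value to a string. Numbers, booleans and `null` become their JSON text, while strings are returned unchanged. Wrapping fields in `tostring` guarantees that every element handed to `join` is a string; a missing field (`null`) shows up as the text `null`, as in the output below.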
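The sketch below, assuming `jq` 1.5 or later, applies `tostring` to a number, a string and `null`. Output is printed with `-c` (compact JSON) rather than `-r`, so the surrounding quotes make it visible that each result is now a string.

```shell
#!/usr/bin/env bash
set -eu

# -n: no input file; the comma yields three values, each piped into tostring.
# '|' binds more loosely than ',', so tostring applies to all three.
converted=$(jq -nc '100030, "100030", null | tostring')
printf '%s\n' "$converted"
# "100030"
# "100030"
# "null"
```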
### Reading multiple fields
Separate the field expressions with commas, for example `.h_uid, .now`.
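Assuming `jq` is installed, this sketch reads two fields from a trimmed sample record. With only a comma between them, `jq` emits each value as a separate output, one per line, which is why the joining step collects them into an array first.

```shell
#!/usr/bin/env bash
set -eu

record='{"h_uid":"gsjk_6103379","pi_func":"100030","now":"2019-05-09 00:02:29"}'

# The comma operator produces one output per expression: two lines here.
fields=$(printf '%s\n' "$record" | jq -r '.h_uid, .pi_func')
printf '%s\n' "$fields"
# gsjk_6103379
# 100030
```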
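### Joining multiple fields into a string
Wrap the comma-separated expressions in `[]` to collect them into an array, then pipe the array into `join`, which concatenates its elements using the given separator: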
```
|join(",")
```
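A sketch of the array-plus-`join` step on a trimmed sample record, assuming `jq` 1.5 or later. As in the usage example, non-`h_uid` fields go through `tostring` first so `join` only sees strings; `pi_ext_type` is absent from this record and therefore comes out as `null`.

```shell
#!/usr/bin/env bash
set -eu

record='{"h_uid":"gsjk_6103379","pi_func":"100030","now":"2019-05-09 00:02:29"}'

# [...] gathers the comma-separated outputs into one array; join(",") concatenates them.
line=$(printf '%s\n' "$record" \
  | jq -r '[.h_uid, (.pi_func|tostring), (.pi_ext_type|tostring)] | join(",")')
echo "$line"   # gsjk_6103379,100030,null
```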
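### Substring extraction
```
[start:end]
For example: (.now)[0:10]
```
`start` is inclusive and `end` is exclusive, so `(.now)[0:10]` keeps the first 10 characters of `now`, i.e. the `YYYY-MM-DD` date.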
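A minimal sketch, assuming `jq` is installed, that slices the `now` value from the sample log line. `.now[0:10]` is equivalent to `(.now)[0:10]`; `[11:]`, with the end index omitted so the slice runs to the end of the string, is an extra illustration that picks out the time (index 10 is the space).

```shell
#!/usr/bin/env bash
set -eu

record='{"now":"2019-05-09 00:02:29"}'

# [0:10]: characters 0 through 9, the date part.
date_part=$(printf '%s\n' "$record" | jq -r '.now[0:10]')
# [11:]: from index 11 to the end, the time part.
time_part=$(printf '%s\n' "$record" | jq -r '.now[11:]')
echo "$date_part $time_part"   # 2019-05-09 00:02:29
```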
### Output
```
6000460,100001,2,null,2018-11-28
6000460,100003,null,null,2018-11-28
6000460,100068,null,null,2018-11-28
```