by nchmy 14 hours ago

by rennokki 12 hours ago

> Uses jq for TB json files

> Hadoop: bro

> Spark: bro

> hive: bro

> data team: bro

by eevmanu 7 hours ago

made me remember this article

<https://adamdrake.com/command-line-tools-can-be-235x-faster-...>

  Command-line Tools can be 235x Faster than your Hadoop Cluster (2014)

  Conclusion: Hopefully this has illustrated some points about using and abusing tools like Hadoop for data processing tasks that can better be accomplished on a single machine with simple shell commands and tools.

by f311a 9 hours ago

jq is very convenient, even when your files are larger than 100 GB. I often need to extract one field from huge JSON Lines files, so I just pipe them through jq to get the results. It's slower, but implementing proper data processing would take more time.
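
For comparison, here is a rough Python sketch of the same "pull one field out of a JSON Lines file" job, reading one record at a time so memory use stays flat regardless of file size. The function name `extract_field` and the sample records are made up for illustration, and skipping non-object records is an assumption of this sketch rather than what jq does.

```python
import io
import json
from typing import Any, Iterable, Iterator


def extract_field(lines: Iterable[str], field: str) -> Iterator[Any]:
    """Yield `field` from each JSON Lines record, one line at a time."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        record = json.loads(line)
        if isinstance(record, dict):
            # A missing key yields None, similar to jq printing null.
            yield record.get(field)
        # Assumption: records that are not JSON objects are silently skipped.


# Hypothetical in-memory sample standing in for a huge file.
sample = io.StringIO('{"id": 1, "name": "a"}\n{"id": 2}\n')
print(list(extract_field(sample, "name")))  # ['a', None]
```

Passing an open file handle (or `sys.stdin`) instead of the `StringIO` sample streams the input the same way, since iterating over a file object yields one line at a time.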

by anonymoushn 9 hours ago

are those tools known for their fast json parsers?
