Christian Hollinger

Software Engineering, GNU/Linux, Data, ML, and other things

12 Jun 2018

Analyzing Reddit’s Top Posts & Images With Google Cloud (Part 1)

3087 words, ~12 min read

In this article (and its successors), we will use a fully serverless Cloud solution, based on Google Cloud, to analyze the top Reddit posts of the 100 most popular subreddits. We will be looking at images, text, questions, and metadata...
18 Mar 2018

Analyzing Twitter Location Data with Heron, Machine Learning, Google's NLP, and BigQuery

3487 words, ~13 min read

In this article, we will use Heron, the distributed stream processing and analytics engine from Twitter, together with Google’s NLP toolkit, Nominatim, and some Machine Learning, as well as Google’s BigTable, BigQuery, and Data Studio, to plot Twitter users’ assumed locations across the US.
04 Nov 2017

Data Lakes: Some thoughts on Hadoop, Hive, HBase, and Spark

4958 words, ~19 min read

This article will talk about how organizations can make use of the wonderful thing that is commonly referred to as a “Data Lake” - what constitutes a Data Lake, how you probably should (and shouldn’t) use it to gather insights, and why evaluating technologies is just as important as understanding your data...
06 Mar 2017

(Tiny) Telematics with Spark and Zeppelin

2881 words, ~11 min read

How I made an old Crown Victoria "smart" by using Telematics...
02 Dec 2016

Storm vs. Heron – Part 2 – Why Heron? A developer’s view

1841 words, ~7 min read

This article is part 2 of an upcoming article series, Storm vs. Heron.
15 Oct 2016

Storm vs. Heron, Part 1: Reusing a Storm topology for Heron

787 words, ~3 min read

This article is part 1 of an upcoming article series, Storm vs. Heron.
04 Sep 2016

Update an HBase table with Hive... or sed

1952 words, ~7 min read

This article explains how to edit a structured number of records in HBase by combining Hive on M/R2 and sed - and how to do it properly with Hive.