Mastering Spark for Data Science

Summary

In this chapter, we walked through the full setup of an Apache NiFi GDELT ingest pipeline, complete with metadata forks and a brief introduction to visualizing the resulting data. This material is particularly important because GDELT is used extensively throughout the book, and NiFi provides a scalable, modular way to source data.

In the next chapter, we will get to grips with what to do with the data once it's landed, by looking at schemas and formats.