
Preface

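Recently I've been taking part in a Clubhouse reading group on Designing Data-Intensive Applications (hereafter DDIA). My clearest takeaway is that input combined with output is the best way to learn. Before, relying on reading alone, I did learn what was in the books, but with no chance to practice and refresh that knowledge, it turned hazy after a while. So I'm taking this opportunity to start updating the blog again (it seems to have been almost a year since the last post): partly to deepen my own understanding by writing things down, and partly in the hope of refining it further through discussion.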

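A few days ago we discussed the batch processing chapter of DDIA. I skipped this chapter the first time through (I felt I already understood the batch processing derived from the MapReduce model reasonably well), but reading it in full this time proved very helpful, especially the later part, which brings up the dataflow model. In my earlier reading of Streaming Systems (hereafter SS), I had treated dataflow merely as an abstract model for stream processing. This time I went back through the Google Dataflow paper and SS, and came away with some new understanding.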

Read more »

Chapter 3 Paradigm Overview

This chapter introduces three programming paradigms (structured programming, object-oriented programming, and functional programming) and summarizes them:

  • Structured programming imposes discipline on direct transfer of control.
  • Object-oriented programming imposes discipline on indirect transfer of control.
  • Functional programming imposes discipline upon assignment.
Read more »

Both Spark Core and Spark SQL support the fundamental types of joins. The general difference between the two is that the optimizer in core Spark isn’t able to reorder joins or push down filters, so we need to think about the ordering of operations ourselves when applying core Spark joins.

RDD Partition

To join two RDDs without a shuffle, both RDDs should share the same partitioner; otherwise, the data will need to be shuffled.
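To make the partitioner point concrete, here is a minimal sketch in plain Python rather than Spark. The helpers hash_partition, local_join, and co_partitioned_join are hypothetical names for this illustration, not Spark APIs. When both sides are split by the same function, matching keys land in the same partition index, so each pair of partitions can be joined locally.

```python
def hash_partition(pairs, num_partitions):
    """Place each (key, value) pair in the partition chosen by hash(key)."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        partitions[hash(key) % num_partitions].append((key, value))
    return partitions


def local_join(left_part, right_part):
    """Inner-join two partitions by key, using an in-memory index."""
    index = {}
    for key, value in left_part:
        index.setdefault(key, []).append(value)
    return [(key, (lv, rv)) for key, rv in right_part for lv in index.get(key, [])]


def co_partitioned_join(left_parts, right_parts):
    """Join partition i of the left side with partition i of the right side only."""
    joined = []
    for left_part, right_part in zip(left_parts, right_parts):
        joined.extend(local_join(left_part, right_part))
    return joined


users = [(1, "ann"), (2, "bob"), (3, "cy")]
orders = [(1, "book"), (3, "pen"), (3, "ink")]

# Same partitioner on both sides: no data has to move before the join.
left = hash_partition(users, 2)
result = sorted(co_partitioned_join(left, hash_partition(orders, 2)))

# Different partitioning on the right side: a partition-wise join misses
# matches, which is why the data would first have to be shuffled.
mismatched = sorted(co_partitioned_join(left, hash_partition(orders, 3)))
```

In this toy model the mismatched case silently drops the rows for key 3. A real engine avoids that by repartitioning (shuffling) at least one side first.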

Read more »

Main Concepts: RDD, transformation/action functions, dependencies, SparkContext, jobs, stages, tasks

Transformations and Actions

Spark abstracts large datasets as immutable objects called RDDs. Two types of functions are defined on RDDs: transformations and actions. Transformations take in RDDs and return new RDDs for further processing; they only set up the computational graph. The real computation is performed only when an action is applied to an RDD. This mechanism is called lazy evaluation.
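Lazy evaluation can be illustrated with a toy model in plain Python. This is a sketch of the idea only, not Spark's implementation: LazyDataset and its methods are hypothetical names. The map and filter methods only record steps, and nothing runs until collect is called.

```python
class LazyDataset:
    """Toy stand-in for an RDD: immutable, and evaluated only on an action."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    def map(self, fn):
        # Transformation: return a new dataset with one more recorded step.
        return LazyDataset(self._source, self._steps + (("map", fn),))

    def filter(self, predicate):
        # Transformation: likewise, nothing is computed here.
        return LazyDataset(self._source, self._steps + (("filter", predicate),))

    def collect(self):
        # Action: replay the recorded steps over the source data.
        items = iter(self._source)
        for kind, fn in self._steps:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)


calls = []


def double(x):
    calls.append(x)  # record every real invocation
    return x * 2


numbers = LazyDataset([1, 2, 3, 4])
pipeline = numbers.map(double).filter(lambda x: x > 4)
calls_before_action = list(calls)  # still empty: no work has been done yet
result = pipeline.collect()        # the action triggers the computation
```

The calls list stays empty until collect runs, which mirrors how transformations only describe work while an action triggers it.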

Read more »

Problem Overview

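Recently, while writing a course project on microservices, I tried using JavaScript's "async/await" syntax to implement some asynchronous functions. Compared with using promises directly, this style is easier to write and more readable (although promise-based code is already quite readable).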

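While comparing the two styles, though, I noticed something. Without the async keyword, the function directly returns a promise, and the caller can use await to get the result. With the async keyword, the function uses await internally to get results from a series of asynchronous operations and finally returns that result. A concrete example follows: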
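The post's own JavaScript example is not shown in this excerpt. As a rough analogue only, the same contrast can be sketched with Python's asyncio, keeping in mind that Python coroutines are not JavaScript promises. The names fetch_user, load_user_plain, and load_user_async are hypothetical.

```python
import asyncio


async def fetch_user(user_id):
    """Hypothetical asynchronous operation standing in for a real I/O call."""
    await asyncio.sleep(0)
    return {"id": user_id, "name": "user-%d" % user_id}


def load_user_plain(user_id):
    # Plain function: hands back the awaitable itself without awaiting it.
    return fetch_user(user_id)


async def load_user_async(user_id):
    # async function: awaits inside, then returns the resolved value.
    user = await fetch_user(user_id)
    return user


async def main():
    # From the caller's side, both styles are awaited in the same way.
    plain = await load_user_plain(1)
    awaited = await load_user_async(1)
    return plain, awaited


plain_result, async_result = asyncio.run(main())
```

Both calls yield the same value to the caller; the only difference is where the waiting is written.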

Read more »

Problem Overview

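To teach myself the Django framework, I started building a blog after working through Django for Beginners. Following the book's conventions, I subclass the view classes in django.views and use as_view in urls.py to generate the corresponding pages. But I ran into a problem when I wanted to do filtering inside a view class.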

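For example, to show only the articles in a given category, I need to pass in that category's key, use the key to select the matching articles, pass the data out, and then render it with a template. In a function-based implementation, passing the category id in as a parameter is very intuitive. In a class-based implementation, however, where to get the id is not so obvious.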
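In Django's class-based views, values captured from the URL are stored on the view instance as self.kwargs, so a method such as get_queryset can read the id from there. The sketch below imitates that mechanism with a tiny stand-in View class so that it runs without Django. The View stand-in, POSTS, CategoryPostListView, and the pk keyword are all assumptions made for illustration.

```python
class View:
    """Tiny stand-in for Django's class-based View (not the real class)."""

    @classmethod
    def as_view(cls):
        def view(request, *args, **kwargs):
            self = cls()
            # As in Django, the URL arguments are kept on the instance.
            self.request = request
            self.args = args
            self.kwargs = kwargs
            return self.get(request, *args, **kwargs)

        return view


# Hypothetical in-memory data in place of a model and database.
POSTS = [
    {"title": "Spark joins", "category_id": 1},
    {"title": "Django views", "category_id": 2},
    {"title": "Spark basics", "category_id": 1},
]


class CategoryPostListView(View):
    def get_queryset(self):
        # The category id arrives through self.kwargs, not a function parameter.
        category_id = self.kwargs["pk"]
        return [post for post in POSTS if post["category_id"] == category_id]

    def get(self, request, *args, **kwargs):
        return [post["title"] for post in self.get_queryset()]


view = CategoryPostListView.as_view()
titles = view(request=None, pk=1)  # pk plays the role of a captured URL value
```

With real Django, the equivalent would likely be a ListView subclass whose get_queryset filters the model by self.kwargs["pk"], with the URL pattern capturing pk.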

Read more »