Merge Pandas DataFrame with Nested Dictionary

I’m not an avid Pandas or NumPy user, but I recently had to spend some time optimising what is probably a fairly common process: looking up values in a nested dictionary (two or more levels) element-wise for a column in a Pandas DataFrame. In other words, the problem I’m trying to solve is to replace a look-up function that is applied with .apply() with something more performant.

You might ask: why not use .map()? Because the look-up is not y = f(x); it is more like y = f(x, a), or even y = f(x, a, b), depending on the depth of nesting.

As mentioned earlier, this can be implemented with .apply() by supplying a Python function that does the look-up. However, .apply() with axis=1 is very slow because it calls the Python function once per row rather than using vectorised operations. The solution is actually straightforward (I’m very new to Pandas and it took me some time to get here, so I decided to write it down). We first flatten the nested dictionary so that each value is keyed by a tuple of the keys from every level, which allows us to create a DataFrame with a MultiIndex. With a MultiIndex, we can use .merge() to join the two DataFrame objects.

Hopefully the code snippet makes this clearer.

import pandas as pd

nested_dict = {
    "A": {
        "Apple": "Red",
        "Banana": "Green",
    },
    "B": {"Apple": "Green", "Banana": "Yellow"},
}
df = pd.DataFrame.from_dict(
    {
        "Fruit": {0: "Apple", 1: "Banana", 2: "Banana"},
        "Price": {0: 0.911, 1: 1.734, 2: 1.844},
        "Bucket": {0: "A", 1: "B", 2: "A"},
    }
)

# Method 1: .apply()
# Calls a Python function once per row, about as slow as a regular for loop
df1 = df.copy()
df1["Color"] = df1.apply(
    lambda row: nested_dict.get(row["Bucket"], {}).get(row["Fruit"]), axis=1
)

# Method 2: .merge()
# Vectorized, much faster (even though the big O is the same)
df2 = df.copy()
# The only overhead is to construct a dataframe with 'MultiIndex'
# Keys are (Fruit, Bucket) tuples; orient="index" puts them in the index,
# leaving the values in a single column named 0
nested_df = pd.DataFrame.from_dict(
    {
        (inner_key, outer_key): value
        for outer_key, outer_value in nested_dict.items()
        for inner_key, value in outer_value.items()
    },
    orient="index",
)
nested_df.index = pd.MultiIndex.from_tuples(nested_df.index)
nested_df.rename(columns={0: "Color"}, inplace=True)
# left_on is given as a list, in the same order as the MultiIndex levels
df2 = df2.merge(nested_df, how="left", left_on=["Fruit", "Bucket"], right_index=True)
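
For deeper nesting, i.e. y = f(x, a, b), the same idea should carry over. Below is a minimal sketch under a few assumptions: the third key level (Size), the sample data, and the helper flatten() are all hypothetical and only there for illustration. It flattens a dictionary of any depth into tuple keys, builds the MultiIndex from them, and merges as before.

```python
import pandas as pd


def flatten(nested, prefix=()):
    """Yield (key_path, value) pairs; key_path is a tuple of keys, outermost first.

    Hypothetical helper for illustration, not part of pandas.
    """
    for key, value in nested.items():
        path = prefix + (key,)
        if isinstance(value, dict):
            # Go one level deeper, carrying the keys seen so far
            yield from flatten(value, path)
        else:
            yield path, value


# Hypothetical three-level look-up: Bucket -> Fruit -> Size -> Color
nested_dict3 = {
    "A": {"Apple": {"S": "Red", "L": "Dark red"}},
    "B": {"Banana": {"S": "Green", "L": "Yellow"}},
}

df3 = pd.DataFrame(
    {
        "Fruit": ["Apple", "Banana", "Banana"],
        "Bucket": ["A", "B", "A"],
        "Size": ["L", "S", "S"],
    }
)

flat = dict(flatten(nested_dict3))
lookup = pd.DataFrame(
    {"Color": list(flat.values())},
    index=pd.MultiIndex.from_tuples(list(flat.keys())),
)

# The order of left_on must match the order of the keys in each tuple
result = df3.merge(
    lookup, how="left", left_on=["Bucket", "Fruit", "Size"], right_index=True
)
```

Rows whose key combination is missing from the dictionary (the last row here) end up with NaN in Color, whereas the chained .get() calls in Method 1 return None.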

Visual Studio Code Server on Android

Microsoft has been steadily enhancing Visual Studio Code Remote Development. One of the components is VS Code Server, which is also open-source and hosted on GitHub. With VS Code Server, you can use Visual Studio Code in a browser. The usual setup involves a server (a VPS perhaps) that hosts the code server. It turns out that it’s also possible to run it locally on your Android device! Here is how.

Continue reading “Visual Studio Code Server on Android”

Announcing YapStocks 2.0

Over the last few weeks, I’ve been working on a new plasmoid (KDE Plasma Applet) that provides a simple interface to monitor stocks. The first version was rather basic, being able to show the current market price only. Now it’s time to announce the availability of the second iteration of YapStocks (Yet Another Plasma Stocks Applet). I’ve recorded a short video clip showcasing all the features it has, ranging from the information summary to the historical price chart.

Continue reading “Announcing YapStocks 2.0”

Use npm packages in QML

I’ve been trying to code up a nice GUI for a hobby project that was written in JavaScript (Node.js). I’ve looked at a few options:

  1. Use the not-yet-stable NodeGUI
  2. Go Web and use Electron
  3. Rewrite the core in Python or C++ to use Qt
  4. Use QML which has limited support for JavaScript

I explored option 1, but soon ran into problems with the Model/View/Delegate architecture, which meant I would have to implement native plugins/add-ons in order to use ListView. Not to mention that the framework itself is still under heavy development.

As for option 2, I’m not a web frontend engineer and personally I much prefer something that is native (or at least looks and feels native). The third option feels like overkill, but it is a possible way out.

Luckily I don’t have to do the re-implementation, because I’ve managed to get the core functionality bundled into a single JS file which works flawlessly in the QML environment. Before I start diving into the details on how you can make your npm packages work in QML, I have to emphasise that there are many limitations in the QML environment and it’s very likely that only a small subset of the npm package that you’re interested in is going to work.

Continue reading “Use npm packages in QML”


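The downside of a one-year master’s programme is that graduation projects seem to have become an annual event... This time the project is about parallel computing (my degree is an “MSc in High Performance Computing”, after all). Multithreading is not enough, because the number of CPU cores in a single machine is limited, so it takes multi-process parallelism to unleash the computing power of an HPC cluster. The de facto standard here is MPI, and in C++ projects the Boost MPI wrapper lets you call it in a more convenient, “more C++” way. In my view, Boost is second only to the STL in quality and importance; just look at how much of Boost was absorbed into the standard library in C++11.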

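What does this have to do with Serialization? If you use C++, you will inevitably define your own classes. To send an instance of a class to another MPI node, you first have to serialise it, then send the serialised memory, and the receiver unpacks it to reconstruct the instance. I won’t go into detail here; in short, this follows from how MPI communication works.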

Continue reading “On the Uses and Usage of Boost::Serialization”




Continue reading “libQtShadowsocks: Project Introduction and Notes”



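My graduation project is to develop software for analysing images of material microstructures. Put plainly, its main job is to count grains and measure the grain size number; it can also measure the porosity of porous materials, the percentage of a second phase, and so on. The software is called Computer-Aid Interactive Grain Analyser, or CAIGA for short. I didn’t think about the name for long; I just picked one...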

Continue reading “Open-Sourcing My Graduation Project, a Material Microstructure Image Analysis Tool”


Lambda expressions (also known as lambda functions) are among the most practical new features of C++11.



Continue reading “A Brief Look at Lambda Expressions in C++11”


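Naming posts like this one is always a pain; before you know it, the title is long and unwieldy! >.< ...... Now on to the main topic (or not).

Like me, you probably started out using QLabel as the widget for displaying an image. You will soon find, however, that once QLabel shows an image, resizing the parent widget does not scale the image; it stays at its original size. Worse still, the parent widget can no longer shrink, because the image in the QLabel does not scale down automatically, which constrains the parent’s minimumSize!

It is a rather embarrassing problem. A quick search shows that people were asking years ago why there is no dedicated QImageLabel. Perhaps the priority is too low... Anyway, below I show how to implement a widget that scales its image automatically by defining a custom class (inheriting from QWidget).

Continue reading “A Custom QWidget Class That Lets a Qt Image Widget Scale Automatically”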

Making Widgets Designed in Qt Designer Grow with the MainWindow



When you design the look of a Qt application with the Designer in Qt Creator (or the standalone Qt Designer), the widgets you add do not grow when the main window (MainWindow) grows.


Continue reading “Making Widgets Designed in Qt Designer Grow with the MainWindow”