Merge Pandas DataFrame with Nested Dictionary

Not an avid Pandas or Numpy user myself, but I had to spend some time lately to optimise probably a fairly common process: looking up a nested dictionary (2 or more levels) to find the right values element-wise for a column in a Pandas DataFrame. If it’s not clear, the problem I’m trying to solve here is to optimise a look-up function that can be done with .apply() to something more performant.

You might say, why not using .map()? Because the look-up function is not y = f(x), no, it is more like y = f(x, a) or even y = f(x, a, b), depending on the level of nestedness.

As mentioned earlier, this can be implemented with .apply() by supplying a Python function that does the look-up. However, .apply() is very slow (it’s not vectorised). The solution here is actually straightforward (I’m very new to Pandas and it took me some time to get here so I decided to make a note here for this). We first flatten the nested dictionary to have different levels of keys as tuples, which allows us to create a DataFrame with MultiIndex. With MultiIndex, we can easily apply .merge to join the DataFrame objects.

Hopefully the code snippet is more understandable.

import pandas as pd

nested_dict = {
    "A": {
        "Apple": "Red",
        "Banana": "Green",
    },
    "B": {"Apple": "Green", "Banana": "Yellow"},
}
df = pd.DataFrame.from_dict(
    {
        "Fruit": {0: "Apple", 1: "Banana", 2: "Banana"},
        "Price": {0: 0.911, 1: 1.734, 2: 1.844},
        "Bucket": {0: "A", 1: "B", 2: "A"},
    }
)

# Method 1: .apply()
# Apply Python function element-wise, as slow as a regular for loop
df1 = df.copy()
df1["Color"] = df1.apply(
    lambda row: nested_dict.get(row["Bucket"], {}).get(row["Fruit"]), axis=1
)
print(df1)

# Method 2: .merge()
# Vectorized, much faster (even though the big O is the same)
df2 = df.copy()
# The only overhead is to construct a dataframe with 'MultiIndex'
nested_df = pd.DataFrame.from_dict(
    {
        (inner_key, outer_key): value
        for outer_key, outer_value in nested_dict.items()
        for inner_key, value in outer_value.items()
    },
    orient="index",
)
nested_df.index = pd.MultiIndex.from_tuples(nested_df.index)
nested_df.rename(columns={0: "Color"}, inplace=True)
df2 = df2.merge(nested_df, how="left", left_on=("Fruit", "Bucket"), right_index=True)
print(df2)

運用Termux將Android平板電腦變身程式設計利器

去年就把手上的Chromebook在eBay賣掉,加了點錢在Amazon買了一台華碩的7英吋平板電腦。上禮拜買了羅技的藍牙鍵盤,這下和平板電腦配合在一起就真的能當台小電腦💻用了。所以有琢磨著怎樣可以在不方便攜帶筆電的情況下用來旅途中練習一些算法題。

熟悉程式設計的(我也希望都是在UNIX環境下的)或多或少可能有過想在平板電腦上偶爾練練手的想法,Android畢竟是基於Linux設計的,加上系統多年的更新換代發展,灌上一個Linux的工作環境應該不會太難,而且將會是一個非常大的效率提升。不過如果真的灌一個完整的Linux環境,一來是佔用很大的磁盤空間,二來電力損耗估計也是會讓人頭疼。好在我在Google馬上找到了Termux,安裝步驟簡單到不行,直接在Google Play點選安裝即可。

Continue reading “運用Termux將Android平板電腦變身程式設計利器”

用Calibre导入Kindle电子书并去除DRM保护

本文的目的绝非为了盗版之用,而是很多美国亚马逊买的Kindle设备(如Kindle 4、Kindle 5)无法在中国亚马逊里注册,不能通过正常的推送渠道购买正版电子书,因此这是一个万不得已的做法,在中国亚马逊买书然后破解再传到美版(或日版)Kindle里。

需要下载的软件有:CalibreDeDRM插件Kindle for PC(或者for Mac

Continue reading “用Calibre导入Kindle电子书并去除DRM保护”

少年,在Android平板上编程吧(Python篇)

  1. 在Play商店搜索安装DroidEdit,当然你要是下载盗版的DroidEdit Pro我也无所谓。
  2. 然后下载并安装Python3ForAndroid (是时候从Python2转到Python3了)
  3. 确认平板联网,在平板上打开Python3forAndroid,等它获取最新的版本后,点击Install按钮然后会自动下载一些压缩包并解压安装好Python3 for Android(我装了之后Python的版本是3.2.2)
  4. Play商店安装Terminal(Linux下用终端习惯了,不装这个还真不爽)
  5. 下载bash (XDA帖子附件) ,解压,将里面的bash用R.E.管理器复制到/system/bin/下,并修改权限使其可执行。
  6. 打开Terminal,设置终端shell使用bash而不是系统内置的sh
  7. 如果有空,可以安装一个完整版的busybox,替换/system/xbin下的阉割版,并用alias命令让chmod、cp、mv、ls等常用的Linux命令改成完整的busybox版而不是Android阉割版。
  8. 最后一步比较重要,关系到能否在终端下简单的运行Python命令,下载我给的这个文件(盛大网盘已死)
  9. adb push python3 /system/bin
  10. 然后用R.E.管理器修改python3权限使其可执行

在Terminal里测试python3命令吧,如果出现下图所示界面就成功了。图已破,推他已死。