Crawling Notion database: mouse hover action
I got an idea that uses data crawled from an open Notion database. The logic of the crawler is quite simple:
- open database page link
- click each page and crawl its inner contents
- save
I’m using Python and Selenium for this crawling task, and the link to the GitHub repo is shared below this article. The hardest part is the second step: to click each page, I first have to perform a mouse hover action. A hover can be implemented in various ways, but I chose Selenium’s action chain.
Implement mouse hover action using Action Chain
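from selenium.webdriver.common.action_chains import ActionChains
# Move the mouse pointer over the title block to trigger its hover state
ACC = ActionChains(driver)
ACC.move_to_element(title_block).perform()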
(‘title_block’ is the Notion database block that contains the title information.)
Table_re = driver.find_element_by_css_selector('#notion-app > div > div.notion-cursor-listener > div > div.notion-frame > div.notion-scroller.vertical.horizontal > div:nth-child(3) > div > div > div.notion-selectable.notion-collection_view_page-block > div:nth-child(3)')
→ When the mouse hover action is performed, the HTML of the page changes, so the Selenium driver has to re-parse the database table.
a_block = Table_re.find_element_by_css_selector('a')
→ After the mouse hover action is performed, an a-tag block holding the link to the inner Notion page should be revealed. Parse the link from it.
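Put together, the hover, the re-query, and the link lookup could look like the minimal sketch below. It is not the code from the repo: `reveal_page_link`, `hover`, and the `hover_fn` parameter are names made up for illustration. It assumes the generic `find_element(by, value)` call, because the `find_element_by_*` helpers used in this article were removed in Selenium 4.3. The locator strategy is passed as the plain string "css selector", which is the value behind Selenium's `By.CSS_SELECTOR`.

```python
from urllib.parse import urljoin

CSS = "css selector"  # same string as selenium's By.CSS_SELECTOR


def hover(driver, element):
    """Move the mouse pointer over `element` with a Selenium action chain."""
    # Imported here so the helper below can also be tried with stand-in objects.
    from selenium.webdriver.common.action_chains import ActionChains

    ActionChains(driver).move_to_element(element).perform()


def reveal_page_link(driver, title_block, table_selector, hover_fn=hover):
    """Hover over a row's title block and return the link of the revealed <a> tag.

    `hover_fn` is a hypothetical hook; by default it is the action-chain hover.
    """
    hover_fn(driver, title_block)
    # The hover changes the DOM, so look the table up again instead of
    # reusing an element that was found before the hover.
    table = driver.find_element(CSS, table_selector)
    a_block = table.find_element(CSS, "a")
    # Resolve the href against the current page in case it is relative.
    return urljoin(driver.current_url, a_block.get_attribute("href"))
```

With a live driver, `reveal_page_link(driver, title_block, selector)` would be called once per row, passing the long table selector shown above as `selector`.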
Crawling inner notion page
When crawling the contents of a Notion page, the hardest part is clicking the toggle blocks open. This time I don’t need a perfect crawler, so working around the toggles is the best choice I have.
For this part, I use a single line of code.
Content = driver.find_element_by_class_name('notion-page-content').text
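Since a Notion page renders its content after the page opens, the line above may run before the content block exists. The sketch below is one hedged way to handle that, assuming the inner page is opened with `driver.get(url)` (the article does not say how it is opened). `read_page_text`, `timeout`, and `poll` are hypothetical names. Selenium's own `WebDriverWait` is the usual tool for waiting; a plain polling loop is used here only to keep the example free of extra imports.

```python
import time

CLASS_NAME = "class name"  # same string as selenium's By.CLASS_NAME


def read_page_text(driver, url, timeout=10.0, poll=0.5):
    """Open a Notion page and return the visible text of its content block."""
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        try:
            # `.text` returns only the rendered text, so whatever sits inside
            # collapsed toggles is skipped, as in the article's workaround.
            return driver.find_element(CLASS_NAME, "notion-page-content").text
        except Exception:
            # Broad on purpose for a sketch: the block may simply not be
            # rendered yet. Give up and re-raise once the deadline passes.
            if time.monotonic() >= deadline:
                raise
            time.sleep(poll)
```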
By assembling the code lines above with error-handling code and a for loop, you can replicate this easily. Let me know through a pull request if you build a better Notion database crawler.
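As a rough idea of that assembly, the sketch below loops over page links, skips the pages that fail, and saves the rest. It is an illustration under assumptions, not the repo's code: `crawl_pages`, `read_page`, and `out_path` are hypothetical names, `read_page` stands for any callable that returns a page's text (for example a wrapper around the `.text` line above), and JSON is just one possible output format for the "save" step.

```python
import json
import logging


def crawl_pages(links, read_page, out_path):
    """Read every page in `links`, skip the ones that fail, and save the rest as JSON."""
    results, failed = [], []
    for link in links:
        try:
            results.append({"url": link, "content": read_page(link)})
        except Exception as exc:
            # One broken page should not stop the whole crawl; log it and move on.
            logging.warning("skipping %s: %s", link, exc)
            failed.append(link)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(results, f, ensure_ascii=False, indent=2)
    return results, failed
```

Returning the failed links separately makes it easy to retry just those pages on a second pass.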