Tackling a web crawler architecture design problem

Tomoharu Tsutsumi

2 min read · Jan 27, 2024

Hi, architects. I worked through a new architecture problem and summarized what I got right and what I got wrong.

The problem

system-design-primer/solutions/system_design/web_crawler/README.md at master ·…

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards. …

github.com

My first answer

The ideal answer

system-design-primer/solutions/system_design/web_crawler/README.md at master ·…

Learn how to design large-scale systems. Prep for the system design interview. Includes Anki flashcards. …

github.com

Reverse Index Service: This service manages the inverted index, which maps keywords to the documents that contain them. It is used to quickly look up which documents are relevant to a given search query.

Document Service: This service manages the retrieval of documents. Once the Reverse Index Service identifies which documents are needed, the Document Service fetches the actual content of those documents.
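As a minimal sketch of how these two services could cooperate, the following uses plain in-memory dictionaries. The class names, method names, and sample data are hypothetical and are not taken from the system-design-primer solution. Requiring every query word to match is also an assumption made for illustration.

```python
class ReverseIndexService:
    """Hypothetical in-memory stand-in: maps each keyword to document IDs."""

    def __init__(self, index):
        self._index = index  # {"keyword": {"doc_id", ...}}

    def lookup(self, query):
        """Return the IDs of documents that contain every word in the query."""
        words = query.lower().split()
        if not words:
            return set()
        matches = [self._index.get(word, set()) for word in words]
        return set.intersection(*matches)


class DocumentService:
    """Hypothetical in-memory stand-in: maps a document ID to its content."""

    def __init__(self, documents):
        self._documents = documents  # {"doc_id": "content"}

    def fetch(self, doc_ids):
        """Return the content of each known document ID."""
        return {
            doc_id: self._documents[doc_id]
            for doc_id in sorted(doc_ids)
            if doc_id in self._documents
        }


def search(query, index_service, document_service):
    """Step 1: find relevant document IDs. Step 2: fetch their content."""
    doc_ids = index_service.lookup(query)
    return document_service.fetch(doc_ids)


# Sample data (made up for this sketch).
index_service = ReverseIndexService({
    "word1": {"Document1"},
    "word2": {"Document1", "Document2"},
})
document_service = DocumentService({
    "Document1": "word1 word2 word3",
    "Document2": "word2 word4",
})
results = search("word2", index_service, document_service)
```

Keeping the lookup and the fetch in separate classes mirrors the split described above: the index only answers "which documents?", and the document store only answers "what do they contain?".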

My first answer had several mistakes.

・A database was not necessary, because storing the indexes and snippets was enough to satisfy the specification.

・I didn't know what a reverse index is (see the example below).

Standard Index:
  ・Document1: [word1, word2, word3]
  ・Document2: [word2, word4]
Reverse Index:
  ・word1: [Document1]
  ・word2: [Document1, Document2]
  ・word3: [Document1]
  ・word4: [Document2]

・I thought the index service would be included in the web crawler.

・I didn't include the document service or the queues.

・A CDN is not necessary because there is no static content in this specification.
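To make the standard and reverse index example above concrete, here is a small sketch that inverts one into the other. The function name `build_reverse_index` is made up for illustration, and a real index would live in a distributed store rather than a Python dictionary.

```python
from collections import defaultdict


def build_reverse_index(standard_index):
    """Invert {document: [words]} into {word: [documents]}."""
    reverse_index = defaultdict(list)
    for document, words in standard_index.items():
        # dict.fromkeys drops repeated words while preserving their order,
        # so a document is listed at most once per word.
        for word in dict.fromkeys(words):
            reverse_index[word].append(document)
    return dict(reverse_index)


standard_index = {
    "Document1": ["word1", "word2", "word3"],
    "Document2": ["word2", "word4"],
}
reverse_index = build_reverse_index(standard_index)
```

With this input, `reverse_index["word2"]` is `["Document1", "Document2"]`, which matches the example above.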

Feel free to reach out to me on LinkedIn, which you can find below. Looking forward to connecting!

https://www.linkedin.com/in/tomoharu-tsutsumi-56051a126/

Written by Tomoharu Tsutsumi