Discussion

Very informative article!

I’m wondering how the HTML/HTTP response gets transformed into plain text. Presumably, there’s a preprocessing step that extracts the content from the page. I’m curious to understand the limitations of that preprocessing.

Tristan · Nov 14, 16:19

1 reply

Yes. It is very similar to the preprocessing used by Google and many other RAG solutions out there.

Dan Petrovic · Nov 14, 21:06