What is Handinger?
Handinger is an API to fetch information from the Internet. It is built for product builders who need reliable, affordable scraping.
Some months ago, I was building the bookmark manager for fika.bar, so I needed a way to fetch metadata, screenshots, and the Markdown content of websites. "Easy", I thought. But weeks later, I was building data pipelines, proxy rotation schemes, and TLS fingerprinting. Not fun, and really not what I wanted to spend my time on: my product was a bookmark manager, not a data extraction service, and yet most of my effort ended up there!
It turns out that the Internet has become a difficult place to extract information from. Many websites now deploy anti-scraping measures, which makes building products like fika.bar more difficult and expensive than it should be.
Handinger is my attempt to fix this. I extracted the work I've done for fika.bar and built it into a general-purpose data extraction API.
My hope is that people building products, as I was, have an easier time without breaking the bank.
What features does it have?
HTML
This is the most basic extraction: you give Handinger a URL, and it returns the HTML of the page. Every other feature builds on it. Under the hood, it tries several ways of fetching the page, starting with the cheapest and moving to more expensive ones.
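The "cheapest first" idea can be sketched as a fallback chain. This is a minimal Python sketch, not Handinger's actual implementation: the `Fetcher` type, `fetch_html`, and both strategies (`plain_http`, `headless_browser`) are hypothetical stand-ins, and the stubs simply simulate a site that blocks plain requests so the chain has to escalate.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple


@dataclass
class Fetcher:
    """One (hypothetical) way of fetching a page."""
    name: str
    cost: float  # relative cost per request; lower means cheaper
    fetch: Callable[[str], Optional[str]]  # returns HTML, or None on failure


class AllFetchersFailed(Exception):
    """Raised when every strategy in the chain failed."""


def fetch_html(url: str, fetchers: Sequence[Fetcher]) -> Tuple[str, str]:
    """Try fetchers from cheapest to most expensive; return (fetcher name, HTML)."""
    errors: List[str] = []
    for fetcher in sorted(fetchers, key=lambda f: f.cost):
        try:
            body = fetcher.fetch(url)
        except Exception as exc:  # blocked or timed-out attempts just escalate
            errors.append(f"{fetcher.name}: {exc}")
            continue
        if body:  # an empty or missing body also counts as a failure
            return fetcher.name, body
        errors.append(f"{fetcher.name}: empty response")
    raise AllFetchersFailed("; ".join(errors))


# Hypothetical stubs standing in for real fetching strategies.
def plain_http(url: str) -> Optional[str]:
    raise ConnectionError("403 Forbidden")  # simulates a site blocking simple requests


def headless_browser(url: str) -> Optional[str]:
    return "<html><body>Hello</body></html>"


strategies = [
    Fetcher("headless_browser", cost=10.0, fetch=headless_browser),
    Fetcher("plain_http", cost=1.0, fetch=plain_http),
]
name, page = fetch_html("https://example.com", strategies)
print(name)  # headless_browser
```

Sorting by cost means the expensive strategies only run for pages where every cheaper one has already failed.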
Markdown
This is a variant of the HTML extraction: instead of returning the raw HTML, it returns the page as Markdown. This is useful when you want the content of a page without its HTML structure. A lot of people need this to train LLMs or to bring pages into their tools for thought (Obsidian et al.).
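To illustrate what HTML-to-Markdown conversion involves, here is a deliberately tiny converter built on Python's standard `html.parser`. It is a sketch under loud assumptions, not Handinger's converter: `MarkdownConverter` and `html_to_markdown` are hypothetical names, it only handles headings, paragraphs, links, bold/italic, and list items (ordered lists are rendered as bullets), and it drops everything inside `<head>`, `<script>`, and `<style>`.

```python
import re
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Converts a small subset of HTML to Markdown (a sketch, not a full converter)."""

    SKIPPED = {"head", "script", "style"}  # content we never emit
    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.hrefs = []  # stack of open <a> targets
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        if self.skip_depth:
            return
        if tag in self.HEADINGS:
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")  # ordered lists are also rendered as bullets
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a":
            self.hrefs.append(dict(attrs).get("href") or "")
            self.parts.append("[")
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth:
            return
        if tag in self.HEADINGS or tag == "p":
            self.parts.append("\n\n")
        elif tag in ("ul", "ol"):
            self.parts.append("\n")
        elif tag in ("strong", "b"):
            self.parts.append("**")
        elif tag in ("em", "i"):
            self.parts.append("*")
        elif tag == "a" and self.hrefs:
            self.parts.append(f"]({self.hrefs.pop()})")

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(re.sub(r"\s+", " ", data))  # collapse whitespace runs

    def markdown(self) -> str:
        # Trim every line, then allow at most one blank line between blocks.
        text = "\n".join(line.strip() for line in "".join(self.parts).split("\n"))
        return re.sub(r"\n{3,}", "\n\n", text).strip()


def html_to_markdown(source: str) -> str:
    converter = MarkdownConverter()
    converter.feed(source)
    converter.close()
    return converter.markdown()


sample = (
    "<html><head><title>Ignored</title><style>p {}</style></head><body>"
    "<h1>Hello</h1><p>Read <a href='https://example.com'>this</a> and "
    "<strong>that</strong>.</p><ul><li>one</li><li>two</li></ul>"
    "<script>track()</script></body></html>"
)
print(html_to_markdown(sample))
```

Real pages need far more care (tables, code blocks, nested lists, boilerplate removal), which is much of why this is harder than it first looks.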
Metadata Extraction
I've built a service that extracts the most common metadata from websites (title, description, RSS feeds, image, ...). It is based on the amazing metascraper library, with some custom additions.
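metascraper itself is a Node.js library organized around per-field rules. As a rough illustration of that idea only, the Python sketch below (a hypothetical `extract_metadata`, not metascraper's API and not Handinger's code) prefers Open Graph tags, falls back to the standard `<title>` and `<meta name="description">`, and collects RSS/Atom feeds from `<link rel="alternate">` tags, resolving relative URLs against the page URL.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class MetadataParser(HTMLParser):
    """Collects <meta> tags, the first <title>, and feed <link> tags."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.meta: Dict[str, str] = {}  # lowercased name/property -> content
        self.feeds: List[str] = []
        self.title_parts: List[str] = []
        self._in_title = False
        self._title_seen = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}  # html.parser already lowercases names
        if tag == "meta":
            key = (a.get("property") or a.get("name") or "").lower()
            if key and "content" in a and key not in self.meta:  # first one wins
                self.meta[key] = a["content"].strip()
        elif tag == "link":
            rels = a.get("rel", "").lower().split()
            is_feed = a.get("type", "").lower() in FEED_TYPES
            if "alternate" in rels and is_feed and a.get("href"):
                self.feeds.append(a["href"])
        elif tag == "title" and not self._title_seen:  # ignore later <title>s, e.g. in SVG
            self._in_title = self._title_seen = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_metadata(source: str, page_url: str) -> Dict[str, object]:
    parser = MetadataParser()
    parser.feed(source)
    parser.close()
    meta = parser.meta
    html_title = " ".join("".join(parser.title_parts).split())
    image = meta.get("og:image") or meta.get("twitter:image")
    return {
        # Open Graph first, then the standard HTML equivalents.
        "title": meta.get("og:title") or html_title or None,
        "description": meta.get("og:description") or meta.get("description") or None,
        "image": urljoin(page_url, image) if image else None,
        "feeds": [urljoin(page_url, href) for href in parser.feeds],
    }


# A made-up page used only to exercise the rules above.
sample = """<html><head>
<title> Example | Home </title>
<meta name="description" content="A made-up page for testing.">
<meta property="og:image" content="/cover.png">
<link rel="alternate" type="application/rss+xml" href="/feed.xml">
</head><body><svg><title>icon</title></svg></body></html>"""
print(extract_metadata(sample, "https://example.com/blog/"))
```

A production extractor needs many more fallbacks than the handful shown here, which is where a rule-based library like metascraper earns its keep.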
Image
Finally, you can take screenshots of a page. These can be used to generate thumbnails or to build any product that needs a visual representation of a page.