Web scraping in Google Sheets

Web scraping in Google Sheets

I want to show you an example of how you can use Handinger to scrape websites and extract structured data directly in Google Sheets.

With the launch of Handinger 2.0, we’ve added a new /llm AI endpoint and the ability to use the service in no-code environments. Let’s do it!

Example Problem

Imagine you are the CEO of a SaaS company and you want to know more about your customers. You are tech savy, but you don’t really know how to code. Your team is focused trying to get an important release out of the door, so the last thing you want is to ask them for help. Here is what you are going to do:

  1. Get the company website address from a list of 1.000 customers.
  2. Use Handinger to take a screenshot of their company website.
  3. Use Handinger to get a small summary of what that company does, and also their industry.

Step 1: Get the company website addresses

We are going to open Google sheets and create a first column (A) for emails. Easy.

Then we are going to use something called “regex” to extract the company website address from the customer email address. They surely look scary, but fear not: AI is pretty good at them, and even the more seasoned developers will prompt their way out.

Create a second column (B) with the following code:

= "https://" & REGEXEXTRACT(A2, "@(.+)$")

This literally means: “http://” + “whatever comes after the @”. Easy!

step 1 getting the website address

Step 2: Take a screenshot

Now we are going to use Handinger to take a screenshot of their company websites. It will take less than 30 seconds, trust me.

Log in to your Handinger account and copy your Api Key. You can find it on any of the tabs. This is the key that identifies your account.

Create a new column (C) and use this code, replacing YOUR_API_KEY_HERE with your actual API key:

= IMAGE(
    IMPORTDATA(
        "https://api.handinger.com/image?response_type=link&api_key=YOUR_API_KEY_HERE&url=" & A2
    )
)

This looks scary, but it’s actually not. It’s just a formula that says: “fetch information from the URL, and use the answer to render an image”. Since we are using response_type=link handinger returns the link to the screenshot, which Google Sheets can use to render an image.

Feels like magic, isn’t it?

step 2 take a screenshot

Google Sheets has a limit of 300 requests per minute. So you might have to wait a bit before extracting the whole 1.000 customer websites.

Step 3: Use Handinger to get a small summary of what that company does, and also their industry.

Now we are going to use Handinger AI endpoint to ask a questions to our customers’ websites. This is the most fun part, and you can get very creative with it.

Create one last column (D) and use this code, replacing again YOUR_API_KEY_HERE with your actual API key:

= REGEXEXTRACT(
    TEXTJOIN(
        " ",
        TRUE,
        IMPORTDATA(
            "https://api.handinger.com/llm?api_key=YOUR_API_KEY_HERE&" &
            "prompt=Summarize+the+content+of+this+website&" & 
            "api_key=01JXHRCKAPH6ME3FZVHHATTP7N&url=" &
            B2
        )
    ),
    """result"":""([^""]+)"""
)

Ok, this one is a bit more complex. Let’s break it down:

  • We already know IMPORTDATA: it’s a function that fetches data from a URL. We are using it to fetch the result from the /llm endpoint.
  • We can see there is a prompt=Summarize+the+content+of+this+website. This is the prompt we are going to use to ask the LLM what to do. You can ask anything! but you’ve probably noticed that spaces are ”+”, this is because you can’t use spaces in URLs. If you are not sure, just login to Handinger and play with the prompt there, or just use this tool to encode the prompt.
  • We are using TEXTJOIN to concatenate the response from Handinger. This is because the response may contain multiple lines, and we want “flatten” it into a single line.
  • Lastly, we use our good old friend REGEXEXTRACT to extract the result from the response. This is because the response is a JSON object, and we want to extract the result property.

And that’s it! You can now see the results in your Google Sheets and easily tweak the prompt to get other kinds of information.

step 3 prompt a summary

Conclusion

This is just a simple example of how you can use Handinger to scrape websites and extract structured data directly in Google Sheets. You can get very creative with it, and you can use it to automate a lot of tasks.

Maybe I lied a little bit that you didn’t have to code, since things got a bit nasty at the end. But we got pretty far with very little effort!

About Privacy Terms of Service Changelog Blog