Filtering Out Library Staff Web Use in Google Analytics

As more and more libraries strive to achieve a data-driven culture, it's important to ensure that the data we're relying on is reporting what we think it's reporting about our users. While library staff are users, too, they're superusers, and their behavior in our online environments is not typical of the majority of our customers. So, how can we be sure that we're getting accurate data from web analytics software reports to improve the end user experience? The answer is filtering staff use out of our web site analytics data. In this post, I'll walk through this, step-by-step, in Google Analytics (GA), since it's widely adopted in libraries. If you use another web analytics product, never fear: the general method of determining IP ranges and then adding profiles and filters may still work, but obviously the step-by-step part will be a bit different.

The general process, in three steps, is to:

  1. Determine an appropriate range of staff IP addresses to exclude;
  2. Establish a master profile in Google Analytics and create a new profile to filter;
  3. Set up a filter on your new profile that will exclude the staff IP range.

Assumptions:

  • You have Google Analytics tracking code installed on your site(s), OR you know how to and can set it up before you start the instructions below;
  • You have administrative access to the library's Google Analytics account, OR you can bribe someone who has admin access with baked goods to make changes for you;
  • If you need that bribe, you're a good baker OR know the location of a good bakery.

Finding a Useable IP Range to Filter

First and foremost, you'll need to determine if you can effectively filter out library staff use. This can be tricky because analytics software divines the "where?" information about user locations from computer IP addresses. Depending upon your library's network architecture (and also how it fits into your parent institution, if you have one) and how IP ranges are defined for all of your employee and public workstations, it may not be possible to filter out staff use in a way that provides clear, end-user-only data. Let's consider two examples, a "good" one and a "better" one.

Good: Internal Library Use, External Library Use

At my previous workplace, a medium-sized academic library, we used DHCP to dynamically assign IP addresses to library staff computers as well as the computers in the library's lab and classroom areas. (Aside: DHCP makes managing the network much, much easier for network admins because it's much more efficient than assigning static IP addresses to each computer. It can really throw a wrench into the analytics process, though.) We could have set up a filter on this range of staff + lab + classroom IPs; however, this would have only separated in-library use from out-of-library use, as opposed to clearly drawing the line between library staff and library users. Looking back, this likely would have been good enough to provide some insight, but at the time, I waffled and didn't bother setting up a filter because it seemed so imprecise.

However, no analytics reporting is 100% accurate, because of limitations in how the data is generated, collected, and processed. That doesn't make our quest for user-only web data hopeless; we just need to understand how we're defining what we're asking the analytics software to give us. Had I gone ahead and set up an internal versus external filter, I probably could have safely assumed that data collected in the library was a pretty close approximation of how staff use the library's site, since staff activity also includes public service points and classrooms, and that data collected on traffic external to the library was a pretty close approximation of student and faculty use.

Better: Staff Use, Public Use

At my current job, a medium-sized public library district with nine branches, our network engineer has internal versus external traffic for the whole district separated on two completely different networks, and therefore I have two distinct IP ranges. Hooray! Because of this, we were able to quickly and easily filter out staff use so that we would have a more accurate picture of customer use of the library's web site. All we had to do was add a filter to a new profile in Google Analytics that excluded the range of our internal network, which removes staff traffic from our web statistics and gives us a more accurate report of end-user-only traffic. Even this, however, is not perfect; if I'm working from home, the IP that GA collects from my visit is going to be from Comcast (our home internet provider), so the data from my visits will show up with the other users, not staff.

So, for starters, grab your network folks (if you need to) and ask, "Hey, we'd like to filter out staff use of our library web site(s) in our web statistics software. Could you help us determine an IP range that separates staff versus public web traffic?" If this isn't possible, remember, internal versus external use (or even on-campus versus off-campus in an academic setting) is a close second.
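Once your network folks hand over a range, it can help to sanity-check a few known addresses against it before touching GA. Here is a minimal sketch in Python, assuming a hypothetical staff network of 198.51.100.0/24. That network and the sample addresses are made up for illustration (they come from ranges reserved for documentation), so substitute your own:

```python
import ipaddress

# Hypothetical staff network; replace with the range your network folks give you.
STAFF_NETWORK = ipaddress.ip_network("198.51.100.0/24")


def is_staff(ip: str) -> bool:
    """Return True if the address falls inside the staff network."""
    return ipaddress.ip_address(ip) in STAFF_NETWORK


# Made-up sample addresses: a staff desk, a public workstation, a home connection.
samples = ["198.51.100.17", "203.0.113.5", "192.0.2.9"]
for ip in samples:
    print(ip, "staff" if is_staff(ip) else "public")
```

If an address you know belongs to a public workstation comes back as "staff," you're probably in the internal-versus-external situation from the "good" example rather than the staff-versus-public one.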

Setting up a Master Profile and Creating a Profile to Filter

Once you've arrived at a range of staff IP addresses to exclude in a filter, we're ready to make changes in Google Analytics. (Remember, you must have admin access, or a friend with admin access and a sweet tooth, to create profiles and filters.) One very important thing to know about filters in GA, however, is that they're "destructive": once a filter is added, the information that you're asking to be omitted via the filter will no longer be collected for that web site in any profile the filter is applied to. Well, you might be thinking, that's less than ideal; while I'd really like to see where my users are going versus my colleagues, I still really DO want all of the data about our site's use! Enter profiles, which are "defined view[s] of visitor data from a property." In GA, you can have up to 50 profiles per site that you track, and your account already has one by default. We'll start small and just add one, or optionally two, for our project at hand. The first step is to establish a master profile and then create a filtered profile. The master profile will collect all of your data, from everyone everywhere, all of the time, and once we add a filter to the fancy new filtered profile, it will be the source of our delicious, delicious, user-only analytics data.

The Master Profile

So, head on over to the GA site and log in. Maybe this goes without saying, but if there are multiple profiles already established, it would be a good idea to chat with your colleagues first before renaming them and/or creating new ones. If you've done that due diligence, or if you don't need to because you're the boss, go ahead and click the "Admin" button in the upper right corner. (If you track more than one site and have multiple web properties, you may need to click on the one you'd like to filter here.) Let's rename the default profile to something descriptive, like "Master Profile," to establish it as the master. (This assumes that you only have the one profile that automatically came with the account, and that it is in no way filtered.) Click "Profile Settings" in the second tab row, and in the "Profile name" field, type in "Master Profile" (or the name of your choosing). Leave the rest of the form as-is, click the "Apply" button, and behold, you have a master profile!

[Image: Renaming the default profile to be the master profile]

If the default profile is already in use by someone else, or if you're not sure if it is, follow the instructions below to set up a new profile to use as the master, and then create a second one to use as your filtered profile.

The New, Soon-to-be-Filtered Profile

Now that we have a master profile set up to catch all of our data, all of the time, it's time to create a new profile to filter. If you're still on the "Property Settings" screen, click on the "Profiles" tab. See the "+New Profile" button sorta to the center/right? Click it.

[Image: Create a new profile in Google Analytics]

Then, enter a descriptive name for the profile in the "Profile Name" field, such as "Public Use, No Staff." Click the "Create Profile" button. Easy, yes?

[Image: Name your new profile in Google Analytics]

Adding the Filter

Now, we're ready to add a filter to that new profile! You should still be on the page for your newly created profile; make sure that it's the one selected in the "Profile" dropdown, and then click on "Filters," and after that, the "New Filter" button:

[Image: Add the filter to your new profile in Google Analytics]

Fill out the form (which is dynamic and will change as you enter data): type whatever you like for the filter name ("Public Use (No Staff)" here), change the "Filter Type" to "Custom filter," select "Exclude" from the list of options, and set the "Filter Field" to "Visitor IP Address" from the dropdown.

[Image: Define the filter for your new profile in Google Analytics]

"What about the 'Filter Pattern'?" you ask. GA has a handy dandy "IP Address Range Tool." Open it in another tab or window from GA, type in the first and last IP addresses in the range you established earlier, and click the "Generate RegEx" button. Copy and paste the results you get into the "Filter Pattern" field back on your GA screen. (If you have more than one range of IP addresses, read "More tips on IP address filtering" below the IP Address Range tool.)
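If you're curious what that generated pattern is doing, here's a rough Python sketch of the idea. It is not GA's tool, and GA's output will likely look more compact; the `range_to_regex` helper and the addresses are hypothetical, and this simplified version only handles ranges where the first three octets are the same:

```python
import ipaddress
import re


def range_to_regex(first: str, last: str) -> str:
    """Build an anchored regex matching every IPv4 address from first to last.

    Simplified sketch: only handles ranges whose first three octets match.
    """
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    if start > end:
        raise ValueError("first address must not be greater than last")
    if start.packed[:3] != end.packed[:3]:
        raise ValueError("this sketch only handles ranges within one /24")

    # Escape the dots so they match literal dots, not "any character".
    prefix = re.escape(".".join(str(start).split(".")[:3]) + ".")
    low, high = start.packed[3], end.packed[3]
    last_octets = "|".join(str(n) for n in range(low, high + 1))
    return f"^{prefix}({last_octets})$"


# Made-up staff range for illustration.
pattern = range_to_regex("198.51.100.1", "198.51.100.25")
print(pattern)
```

The `^` and `$` anchors matter: without them, a pattern meant for 198.51.100.25 would also match 198.51.100.250.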

Important: double-check that you added the filter to the new profile, not your master profile!

If you have more than one site, lather, rinse, and repeat until you have profiles and filters set as you wish on all of them. You could also create a "Staff Only" filtered profile if you'd like to track staff-only use of the web site; just follow the above steps to create another profile and filter, but select "Include" instead of "Exclude" when creating the custom filter on the new "Staff Only" profile. If you've set up custom reports, or would like to, you can set them on your filtered profile(s) if you'd like to focus on user-only or staff-only use.
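To picture how the three profiles relate, here's a small Python sketch that mimics the effect of "exclude" and "include" filters on a handful of made-up hits. The pattern, IP addresses, and page paths are all hypothetical, and this only illustrates the logic; it is not how GA processes data internally:

```python
import re

# Hypothetical filter pattern for a staff range of 198.51.100.1 to 198.51.100.3.
STAFF_PATTERN = re.compile(r"^198\.51\.100\.(1|2|3)$")

# Made-up hits: (visitor IP, page viewed).
hits = [
    ("198.51.100.2", "/staff-intranet"),
    ("203.0.113.40", "/catalog"),
    ("198.51.100.3", "/catalog"),
    ("203.0.113.77", "/events"),
]

# Master profile: no filter, so every hit is kept.
master = list(hits)

# "Public Use, No Staff" profile: an exclude filter drops matching IPs.
public_only = [hit for hit in hits if not STAFF_PATTERN.search(hit[0])]

# "Staff Only" profile: an include filter keeps only matching IPs.
staff_only = [hit for hit in hits if STAFF_PATTERN.search(hit[0])]

print(len(master), len(public_only), len(staff_only))
```

Because the two filtered profiles use the same pattern in opposite directions, their counts should add up to the master profile's total for any period after the filters were set.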

Success!

You'll be tempted to jump right over and check out what your site data looks like with library staff data omitted, but, unfortunately, you'll have to wait, though just for a day. Filters collect data going forward from the time they were set, which is another reason that the master profile is important! The master profile will retain all of your historical data, and going forward, the filtered profile will slice out the user-only data for you.

If other colleagues have access to Google Analytics, be sure to communicate the changes you've made to them so that they understand the purpose of the profiles, and which would best suit their needs. For example, even though our web team will be using (as well as setting up customized reports on) the user-only filtered profile for most of our needs, administration will still likely want to report the statistics to the library's Board from the master profile, as it's the sum total of web use.

So, in summary:

  • Figure out if you have an effective range of IP addresses that you can include or exclude from GA's data collection.
  • Set up a master profile in Google Analytics that will track ALL of your user data, both library staff and users.
  • Create a filtered profile for public users.
  • Optionally, create a second filtered profile for staff use.
  • Wait for the separated user/staff data to roll in!

Have you used profiles and filters in your library to get at juicy user-centric data? Share in the comments! Questions, corrections, suggestions, and general comments welcome, too!

Comments

Great post, thanks so much for sharing this info!

One question: does your internal IP range include publicly available computers in your building? Would you be excluding public use inside your library with this filter?

Thanks for your comment, Karen! It all depends on how your network is set up and how IP ranges are assigned; if your network folks put them in the same IP range as staff, it's unfortunately going to be a mix of internal public patron and staff use. In the "Good: Internal Library Use, External Library Use" example above, that was the case. I don't know for sure, but I think that will tend to be the case with many academic libraries and how they fit into the campus network architecture, whereas publics (especially those with their own IT departments) may be more likely to separate internal and external traffic. However, I think it's still worth seeing how the site is used on-campus versus off, even if you can't clearly delineate staff from public use. Does that answer help?