A fascinating story unfolded this week, as Yale University went to war against a website that scraped Yale's own course website. But first, a quick history.
Years ago, some students at Yale built a website that presented information about Yale courses. It did this by scraping the University's official course website, but it presented the information in a way that was easier to use. They called the site "Yale Bluebook". No, this is not the site Yale went to war with this week. Yale rewarded that one instead: in 2012 it purchased a license to the site and took it in-house.
Two other students had been building what they considered a better version of the site, which they called Yale Bluebook Plus (YBB+). It let users sort courses by average student rating and workload. That information was already available on the official website, but there was no easy way to compare courses side by side. The university did not like giving students the ability to compare courses directly by student ratings, so it blocked YBB+ on the university network.
Instead of giving the real reason for the block, the university told the site's creators that it objected to YBB+ using Yale's trademarks. This was a bit silly, since the original YBB was started by other students who did exactly the same thing. But the students dutifully removed the Yale name from the site and rebranded it as CourseTable.
Yale then blocked the new site, accused it of "malicious activity," and threatened the students with disciplinary action if they did not take it down. So the students took it down. But by then the story had hit the innertubes and everybody was talking about it. Yale responded, stating:
[Yale’s course] evaluations… became available to students only in recent years and with the understanding that the information they made available to students would appear only as it currently appears on Yale’s sites — in its entirety.
The question of whether Yale has the right to control information it has already released is interesting, but that's not why I'm posting this. The ironic part is what happened next. Another student built a Chrome browser extension that performed the same function as YBB+. His blog post about it is definitely worth a read, but here's an excerpt:
I built a Chrome Extension called Banned Bluebook. It modifies the Chrome browser to add CourseTable’s functionality to Yale’s official course selection website, showing the course’s average rating and workload next to each search result. It also allows students to sort these courses by rating and workload. This is the original site, and this is the site with Banned Bluebook enabled (this demo uses randomly generated rating values).
Banned Bluebook never stores data on any servers. It never talks to any non-Yale servers. Moreover, since my software is smarter at caching data locally than the official Yale course website, I expect that students using this extension will consume less bandwidth over time than students without it. Don’t believe me? You can read the source code. No data ever leaves Yale’s control. Trademarks, copyright infringement, and data security are non-issues. It’s 100% kosher.
Copying data from one website and making it available on another site in a different form is one thing, never mind that it is incredibly common on the web (heck, that's basically what Political Irony does). Implementing the same functionality as a browser extension, so that all the work is done in the user's own browser, is something else entirely.
I’m really curious how the university can respond to this. Technology changes everything.