The Bookie
Posted November 17, 2015

Travis Yost penned a blistering attack on NHL/SAP and their new enhanced stats page that kicked off this past February. It's a pretty long and in-depth read - I've bolded the summary at the end if you want to skip to that.

Quote

SAP, NHL, and The Long Con

SAP and the NHL entered into a massive joint partnership in Q1 2015. The plan was for SAP to ‘revolutionize the hockey stats world’. Let’s check in with how they have done so far:

1. NHL’s stats database currently suggests that seventeen regular goaltenders have not given up a goal on the penalty kill. (This is because they have inverted special teams save percentage.)
2. NHL’s stats database currently features completely inaccurate/randomly generated numbers for team-level shot statistics. They have been compared to multiple independent resources (all of which are agreeable on what the numbers should be). Not in the same universe.
3. NHL’s stats database has either inverted the faceoff count or is calculating zone starts through, again, randomly generated numbers. (Also true for certain players at the individual level.)
4. SAP has convinced the NHL that they can deliver results in less than zero seconds:
5. It is believed that SAP is going to (inexplicably) need three years’ time to digitize the NHL’s old stat sheets.
6. One of their biggest ‘rollouts’ to date was automated data viz, which includes user-friendly graphs like the following:
7a. The capturing and incorporation of “clutch factor” measurements, paragraph below, which almost certainly do not exist (but certainly sound nice!)
7b. As an extension of that, the belief that faceoffs are so crucial in the determination of outcomes despite all evidence to the contrary.
8. The usage of ‘close’ statistics (as a way to mitigate score effects) despite them being phased out years ago in favor of score-adjusted metrics.
9. The creation of a ‘milestone tracker’ to capture huge accomplishments for active players such as … Chris Pronger.
10. “Deep statistical comparisons” which include little more than basic counting statistics that have existed for decades.
11. The searching of a database coordinator with proficiency in … Powerpoint.
12. A relationship grounded in the belief that SAP would deliver ‘never before seen ideas’, which includes re-hashing of years old data that lives (accurately!) on no less than a dozen sites across the internet. As an aside, it’s somewhat amazing the NHL had so much tunnel vision over the last decade that they genuinely believe some of the furnished metrics are property of SAP.
13. Speaking of those never before seen ideas, check out these unprecedented (and stolen) shift charts!
14. Randomly generated CorsiClose% numbers. CorsiClose% is a defunct metric and has been for quite some time. Even if CorsiClose% wasn’t defunct, the numbers are still wrong:
15. Their belief that what the market wants is ‘black box power rankings’, which was a key component of their ‘Phase Three’ rollout:
16. Totally inaccurate or randomly generated goaltending numbers with no source data from an old presentation:
17a. A comical, comical belief that they had built a predictive model that could accurately select 85% of post-season winners. A reliable source indicated that this is one area the NHL pushed back on in disbelief. SAP eventually purged any mention of it from their publications without clarification as to why.
17b. (We checked the model anyway. I stopped caring once they couldn’t clear 50%.)
18. Has their snake oil sales act and falsified data had a negative impact? Well, hockey media still cite their numbers, despite the fact that (a) their database is inundated with inaccuracies; and (b) their business model seems to be little more than doing a dance and pulling wool over the eyes of Big NHL. The dangers of being perceived as credible despite no supporting evidence, I suspect.
19a. What do NHL employees think about the product? Here’s Andrew Thomas, consultant to the Minnesota Wild: “...What burns me the most isn’t at all ‘competition’: it’s extremely lazy work masquerading as professional innovation.”
19b. A league employee: “Up there with the first Lindros trade and the 1988 game with linesmen in yellow practice jerseys, only worse because the league would have us believe it is now competent.”
19c. Another league employee: “Outrageously bad.”
20. At the time of the first (and second) rollouts, NHL.com failed to separate game states from one another. I say this without hesitation: the separation of hockey statistics by game state is the FIRST STEP in hockey analysis.

In summary: there is no reason to go to NHL.com for anything related to hockey statistics. Their numbers are inaccurate. If not inaccurate, they are misleading. If not inaccurate or misleading, they aren’t capturing what they believe they are capturing. If none of the above, the site is a total hassle to maneuver through and the filtering/visualization looks like a child slapped the cursor on MS Paint a hundred times.

The NHL, at an early point in their business relationship with SAP, considered a liaison or oversight position to ensure that SAP’s deliverables were accurate. That never materialized. SAP has run roughshod since then. Many of the above numbers are data points pulled from months ago. These [inaccurate] numbers remain on the official web site of the NHL. There’s no quality control, no reliability, no incentive for SAP to satisfy their half of the partnership.

Why? Because at some point in time, the NHL realized that pushing the stats responsibilities out to a third-party meant a lot of man hours and headaches saved. They took no interest in vetting the information.
And they ate up a ridiculous SAP song-and-dance about a litany of never-seen-before statistics that have existed in the foremost corners of the internet for close to a decade.

The NHL should terminate their partnership and find someone who actually cares about the work they are doing. And the SAP should find more big business to sink their teeth into before the well dries up. Monorail sales have never been better.

Lastly: if you ever want to conduct stats research, the following resources are worth your time.

Behindthenet.ca
War-on-ice.com
Hockeyanalysis.com
Nicetimeonice.com
HockeyStats.Ca
TSN.Ca (Naturally.)

http://yosttravis.tumblr.com/post/133148427661/sap-nhl-and-the-long-con

Puck Daddy article today with some follow-up responses from the League:

Quote

NHL slammed for ‘catastrophically’ bad advanced stats, listens to critics

NEW YORK, NY - OCTOBER 08: National Hockey League Commissioner Gary Bettman visits the floor of the New York Stock Exchange after ringing the opening bell on October 8, 2015 in New York City. (Photo by Andrew Burton/Getty Images)

When it came to the National Hockey League’s new advanced stats site, a few things did not compute for Travis Yost. OK, more than a few things.

“There is no reason to go to NHL.com for anything related to hockey statistics. Their numbers are inaccurate. If not inaccurate, they are misleading. If not inaccurate or misleading, they aren’t capturing what they believe they are capturing,” wrote Yost, an analytics writer for TSN and co-host of the PDOcast, an analytics heavy podcast.

Yost’s screed against the NHL.com advanced stats site went viral, echoing and summarizing the concerns from many hockey fans about the League’s approach to analytics and its partnership with SAP, which was announced with much fanfare in February. That's when they added an “enhanced stats” section to NHL.com that featured metrics on puck possession and zone starts.
But the partnership was widely criticized by the advanced stats community, from the League’s decision to do away with traditional stat names like “Corsi” to some specious boasts from SAP about its number crunching – as Yost notes, there was a “comical” claim that “they had built a predictive model that could accurately select 85% of post-season winners.” Quietly, references to that model have been scuttled.

“The NHL should terminate their partnership and find someone who actually cares about the work they are doing. And the SAP should find more big business to sink their teeth into before the well dries up. Monorail sales have never been better,” Yost wrote.

Ouch.

In an interview with Puck Daddy on Monday, Yost didn’t mince words on NHL.com’s advanced stats site.

“As for right now, no, they are not a good source. In fact they are the worst hockey stats source I can ever recall, and it's not particularly close. There are data integrity issues everywhere. They don't seem to know what they are scraping. They don't seem to know what is and isn't relevant. The visualization is atrocious. They add no value at present time. And sadly they are getting precisely zero help from their business partner in SAP in all of this, who should know better,” he said.

Chris Foster felt the burn of Yost’s critique all the way up in the NHL’s front office.

“The first couple of points, those were great. Fantastic feedback. But the rest of the criticisms … I felt he was trying to pile on and paint the entire site as problematic, when he found a couple of issues that were corrected quickly,” said Foster, the NHL’s director of digital media. “I would call that a very large generalization not based on truth. It’s not an accurate assessment of the site.”

Between Yost’s screed and Foster’s defense we find some common ground: There were some basic issues with the NHL’s fancy new stats site, and it was that criticism that prompted their correction.

***

Problem No.
1: Goalies On The Kill

Yost noted that, according to NHL.com’s stats, 17 goalies had not given up a goal on the penalty kill. Which is frankly impossible, given the stats for the League’s power play efficiency.

This was a valid concern and an easy fix. “We basically had the drop down menu backwards,” said John Dellapina of the NHL. “We had the menu labeled as ‘shorthanded’ when facing shorthanded shots.”

The NHL now properly labels a save a goalie makes while his team is shorthanded.

Problem No. 2: NHL’s stats database currently features completely inaccurate/randomly generated numbers for team-level shot statistics.

Yost ran a chart that showed the wild disparity between the NHL’s numbers detailing Corsi-For (shots on goal, missed, or blocked) per 60 minutes of even strength time and those of independent stats sites. (Chart: Yost)

Yikes, right?

Turns out the disparity is a difference in philosophy. According to the NHL, some stats sites use “the average number of even strength minutes in a game” while the NHL numbers are based on 60 minutes of even strength hockey.

The NHL argues that while its numbers weren’t in sync with those of the stats sites, they all led to the same conclusions about teams. “It’s like Celsius and Fahrenheit,” said Dellapina. “If you look at the chart, you see that worst teams are still the worst teams and the best teams are still the best teams.”

That said, the NHL is making sure its methodology is in sync with other stats sites after Yost’s call out. “All of the data is correct, but we just used different standards for time on ice. It was good feedback, so we’re making that adjustment,” said Foster.

Problem No. 3: Zone Starts

The NHL’s numbers on where plays begin or end were wildly inaccurate when compared to advanced stats sites. In one example, the Carolina Hurricanes started 39 percent of their plays in the attacking zone according to independent stats sites, but the NHL had them at just over 30 percent.
“NHL’s stats database has either inverted the faceoff count or is calculating zone starts through, again, randomly generated numbers,” wrote Yost.

Turns out it’s the former, and it’s human error.

“In two or three arenas, we put the officials who record this stuff on the opposite side of the ice. It’s an X and Y coordinates input, not offensive or defensive. For all games in those arenas, it was flipped. Derek Stepan was supposed to have 15 offensive zone starts, and instead he had 15 defensive zone starts,” said Dellapina.

So the NHL has rectified this as well. But the question remains: How can mistakes like this happen on the League’s official site when so many other stats sites, scraping the same data, are accurate?

“I'm guessing the more likely answer is they didn't spend five minutes of their time to realize that the sheets list zones relative to the home team. Other independent databases haven't had a single issue calculating zone starts, and they are scraping from the same exact resource. Again, you can call it what you want: laziness, ignorance, an accident. These things shouldn't happen to a billion dollar enterprise if hobbyists can get it right,” said Yost in an interview on Monday.

***

One of the primary attacks on the NHL’s “enhanced stats” site was that it was a poor attempt to replicate what other stats sites had already perfected. In some cases, the metrics they chose to use have been already tossed aside by cutting-edge analytics analysts, and have been criticized as being behind the curve.

Take “close” stats for example, defined by War On Ice as “situations when the game is within 1 goal (1st and 2nd periods) or tied (3rd period or overtime).” It’s a stat cited by many when writing about puck possession, but it’s been vetted and diminished by many in the analytics community.

“Years ago, smart people recognized that simply throwing out data for the sake of correcting for score effects was inefficient.
We started using score adjusted stats at the team-level as far back as 2012. It was reaffirmed as a superior approach in terms of repeatability and predictability in 2014. Anyone who has spent 10 minutes on the internet looking up hockey stats is by and large familiar with this work. I don't know anyone who has cited FenwickClose% or CorsiClose% in years for these very reasons,” said Yost.

Foster argues the jury is still out on “close” stats, which is why the NHL uses them.

“It’s one variation. It’s one context. You can use it or you can choose to ignore it. It’s fair to say that some sites are phasing it out, but we’re putting it out there. I think it’s up for debate. I don’t think there’s been anything that definitive,” he said.

Yost sees the “close” debate as part of a larger problem with the NHL and SAP project.

“It's not just about 'Close' stats. It's about paying attention and knowing what's already been done. They were years behind the curve and once someone finally realized that they needed to catch up (and fast), they rushed the entire thing without thinking or vetting anything that lived on the web site,” he said. “The whole thing has been a catastrophe.”

***

From the start of the project, there’s been an adversarial relationship between the NHL and the advanced stats community.

It started with a slight change in language within the NHL.com terms of service that seemed to target the way advanced stats sites gather their data. It continued when the NHL, for whatever reason, didn’t involve the established sites and the smartest analytics minds in helping to craft their “enhanced stats” site or rolling it out.

Besides the obvious fact that, in essence, hockey fandom’s garage band had started playing stadiums.

“I get the initial antagonism. Once the league starts doing it, it’s not as exclusive. It becomes a little more mainstream,” said Foster. “We don’t feel that we’re in competition with any other sites. The hockey analytics sites are the trailblazers.
They’ve been doing it for years before us. We don’t want to take anything away from them. We just have in some cases a larger reach, and just want to make these stats as accessible as possible.”

Both Yost and Foster are hopeful that this project can eventually get to where it was promised to go, bringing advanced stats to the mainstream and increasing the quality of the data.

“I cannot emphasize this enough: the NHL needs someone to vet everything that SAP's doing. It's really that simple,” said Yost. “When you don't have someone who is familiar with the work that's out there, you get things like 'Big Data Can Predict 85% of All Games' produced on league web sites. Get that person in place, get player tracking data going in a year or so, and I become genuinely hopeful that they can become an excellent source for hockey stats -- not dissimilar to what has occurred with the NBA and NBA.com. There are smart people who work (or worked) on this project. They just didn't listen to them.”

Foster says that everything SAP does is signed off on by the NHL.

Although it stings, Foster accepts the criticisms from Yost and others in the analytics community. Because as much as these oversights and wrong turns have earned NHL.com its scorn, he hopes they eventually make the SAP-driven advanced stats pages better.

“The fan feedback has been vital to making improvements to the site. Our goal is a spirit of collaboration,” he said. “We’re not in competition. We’re not trying to take traffic away from other sites or shut down other sites. We want to be part of the conversation as well. And we have a big voice.”