Supreme Court Affirmative Action Ruling Could Give Rise to Data Mining

What will replace affirmative action if the Supreme Court kills it?

Ever since conservative courts and voters began trying to eliminate affirmative action in the 1990s, universities have sought creative ways to boost their enrollment of minority students without explicitly relying on race. When California voters banned racial preferences in public universities in 1996, for example, the University of California responded by adopting admissions preferences based on socioeconomic status instead. And after a federal appellate court struck down the University of Texas’s race-based affirmative action program, the school adopted a plan that guaranteed admission to those students graduating in the top 10 percent of their high school class.

When the Texas effort—known as the Top Ten Percent Plan—failed to generate the racial diversity school officials sought, the university returned to using explicit racial preferences. Those preferences are now being challenged in the Supreme Court case of Fisher v. Texas, and many expect the conservative justices to deal what could be a fatal blow to race-based affirmative action at American public universities. Once again, however, the universities have a secret weapon they hope will allow them to circumvent such a ruling: data mining.

Whether it’s used in airport security or online advertising or education, data mining works by finding patterns and correlations. Based on census data, the spending patterns of my neighbors, and my Washington, D.C., ZIP code 20016, the Nielsen Company classifies me as someone who lives among the “Young Digerati”—that is, high-income consumers who are “tech-savvy and live in fashionable neighborhoods on the urban fringe.” My fellow Washingtonians a few miles to the southeast in Anacostia are categorized using very different terms. They are the “Big City Blues,” a community of “low-income Asian and African-American households occupying older inner-city apartments.” Based on where we live and what we spend, Nielsen creates aggregate predictions about our likely buying habits so that advertisers can send us ads that reflect our interests. That’s a little creepy—but then again, we’re talking about advertising. To some education experts, however, data mining also represents the future of public education.

After Michiganders voted in 2006 to ban the use of racial preferences in college admissions, the University of Michigan wasn’t willing to give up on the goal of enrolling more minority students. So it turned to a data-mining program called Descriptor Plus, which was originally developed by the College Board to help admissions officers more efficiently target likely students. The program employs the same kinds of algorithms that Nielsen uses to provide consumer data to advertisers based on demographic patterns and spending habits, but in this case, it sorts those data into categories that are useful for higher-education institutions. Descriptor Plus works by dividing the country into 180,000 geographic neighborhoods, and then regrouping those neighborhoods into 30 more manageable “clusters” whose residents share similar socioeconomic, educational, and racial characteristics.

Take two distinct clusters identified by Descriptor Plus. High School Cluster 29 is most likely to include high-achieving students who have aced standardized tests, stand out in their elite private high schools, and demonstrate superior math ability. “There is very little diversity in this cluster,” notes Descriptor Plus. By contrast, the students in High School Cluster 30 are much more likely to be ethnically diverse. While also college bound, they have far fewer resources than the junior achievers in Cluster 29. “These students,” concludes Descriptor Plus, “will typically end up at a local community college.”

Armed with the Descriptor Plus categories, the University of Michigan could give preference to applicants from low-income clusters like 30, in which African-American students were disproportionately represented, without explicitly relying on race. The method worked. Two years after Michigan voters banned the use of racial preferences, Michigan’s freshman class saw a 12 percent increase in African-American enrollment, even as the overall class size shrank and other minority groups lost ground.

If the Supreme Court’s decision in Fisher puts new restrictions on racial preferences, it is likely that universities will expand their use of data mining to get around the ruling. But data mining has proved to be an even less effective way of promoting racial diversity in the classroom than the explicit preferences it’s designed to replace. In a new book, “Mismatch: How Affirmative Action Hurts Students It’s Intended to Help, and Why Universities Won’t Admit It,” Richard H. Sander and Stuart Taylor, Jr. note that as seniors in high school, African Americans are more likely than whites to express interest in majoring in science, technology, engineering, or math, the fields known as STEM. Once admitted to elite schools, however, African Americans pursuing STEM majors were less than half as likely as whites to finish with a STEM degree: students who feel less prepared than their classmates tend to leave science for less challenging humanities courses after their freshman year. Sander told me that the minority students admitted under Descriptor Plus are, by definition, less academically qualified than those admitted under Texas’s Top Ten Percent Plan, because if they had graduated in the top 10 percent of their class, they would have gained automatic admission without the Descriptor Plus boost. By admitting minority students with lower levels of academic preparation than those admitted under the Top Ten Percent Plan, Sander said, programs like Descriptor Plus might exacerbate the problem of racial mismatch and self-segregation.

WHILE LEGAL PRESSURES on affirmative action prompted the initial expansion of data mining as an admissions strategy, schools are also beginning to use it for other purposes—and in ways that may result in ever more segmentation and segregation of students based on their racial backgrounds, tastes, and preferences.

Tristan Denley, the provost of Austin Peay State University in Tennessee, has developed data mining programs designed to steer students toward the courses and majors in which they are most likely to succeed. One such program, Degree Compass, uses predictive analytics to estimate the grade a student is most likely to receive if he or she takes a particular class. It then recommends courses in which the student is likely to earn the highest grades. “It uses the students’ transcript data, all of their previous grades, and standardized test scores, and it combines that with the data we have with thousands of similar students who have taken the class before,” Denley told me. He said the predictions are accurate—within a half letter grade, on average. And he noted that students from lower socioeconomic backgrounds who used the program to select their classes experienced a more pronounced grade swing—from lower to higher grades—than students from higher socioeconomic groups, perhaps because they were being steered into easier courses. Although the program also records students’ race and ethnicity, Denley said he found a disproportionate grade swing in students from lower socioeconomic groups, but not from minority groups in particular.

Another program his university uses, My Future, employs similar predictive analytics to recommend majors in which students are most likely to get good grades and graduate on time. “Students are less likely to choose sociology as an incoming major,” says Denley, “because people don’t do sociology in high school; instead, lots of students choose business, pre-law or pre-med.” He hopes that by exposing students to a broader range of majors they may not have considered, My Future will help to match them with fields and careers in which they’re likely to thrive.

As colleges and even public high schools and elementary schools record the race of students as part of their data-mining programs, there’s likely to be increased pressure to steer students with similar backgrounds into similar classes, reducing diversity in the classroom as a whole. Public high schools and even some elementary schools are beginning to input information about students’ race and ethnicity in giant databases that track their academic performance in order to construct models about what kinds of students are most likely to succeed in particular classes.

Highland Park Elementary School in Pueblo, Colorado, recently adopted a data-mining program called Infinite Campus that is operated by Pearson, the textbook publishing giant. Ronda Gettel, who coordinates math and English programs at Highland Park, tells me she was shocked when her supervisors asked her to input information about the ethnicity of individual students while grading a math and reading program. “I was putting in how they self-reported their ethnic background, whether they’re black or Hispanic, and whether they’re getting free or reduced lunches, and their socioeconomic patterns,” says Gettel. “I thought maybe we shouldn’t be doing this. I’m a person that’s against tracking.”

Of course, guidance counselors have always had the power to steer students toward classes that coincide with their interests and ability levels. But Gettel and others are concerned that by slicing and dicing students into profiles and clusters, data mining threatens to segregate classrooms in more permanent ways, creating profiles from which students can’t easily escape, and placing minority students into less rigorous classes because of the predictions of computer programs.

Diversity in the classroom is valuable because it encourages students to interact with peers from very different backgrounds and to explore classes and careers that might not have occurred to them before they enrolled. But not all human choices can be predicted by algorithm. If the Supreme Court eliminates the use of race-based affirmative action and drives schools to pursue an ersatz diversity through profiles and computer models, it may inadvertently encourage the proliferation of technologies that allow even less consideration of students as individuals than the racial preferences they’re designed to avoid.