Maintaining the Integrity of Big Data

April 14, 2017 | By Priyanka Dinakar

In 2014, The White House released a report on Big Data, outlining the goals of future data usage. One of these goals detailed the use of big data for social good, stating that “big data [should be] used ethically to reduce discrimination and advance opportunity, fairness, and inclusion, [and] it should inform the development of both private sector standards and public policy making in this space.”

When presenting data, it is important to maintain the integrity of that data in order to reduce discrimination and promote inclusion. Failing to do so exposes a dangerous side of big data, one that enables manipulated presentation and disparate representation, and this is particularly relevant to the Asian-American demographic in the United States. Data labeled “Asian-American” receives very little integrity maintenance because it is often presented in a heavily aggregated and, therefore, misleading way.

One of the challenges listed in the White House report is selection bias: a set of data fails to represent an entire population because of (intentional or unintentional) selection manipulation that favors certain groups over others. For example, since the label “Asian-American” covers many different ethnic subgroups, aggregate data often misrepresents the circumstances of many of these people. Take the Indian-American subgroup versus the Bangladeshi-American subgroup: the median household income for Indian-American families is about $40,000 higher than that of Bangladeshi-American families. Yet does big data take this into account when profiling a given Asian-American? Further, the poverty rates of Hmong, Cambodian, Laotian, and Vietnamese Americans are all higher than the national poverty rate. This shows a stark socioeconomic divide not only among Asian-American subgroups but also between some of those subgroups and Americans as a whole.
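To make the aggregation problem concrete, here is a minimal Python sketch. The income figures and the labels "Subgroup A" and "Subgroup B" are invented for illustration and are not real census data; they are chosen only so that the gap between the two subgroup medians matches the roughly $40,000 difference described above.

```python
from statistics import median

# Hypothetical household incomes in dollars, invented purely for illustration.
# "Subgroup A" and "Subgroup B" are placeholder labels, not real survey data.
households = [
    ("Subgroup A", 90_000), ("Subgroup A", 100_000), ("Subgroup A", 110_000),
    ("Subgroup A", 120_000), ("Subgroup A", 130_000),
    ("Subgroup B", 60_000), ("Subgroup B", 70_000), ("Subgroup B", 80_000),
]

def median_by_group(records):
    """Return the median income for each subgroup label."""
    incomes = {}
    for group, income in records:
        incomes.setdefault(group, []).append(income)
    return {group: median(values) for group, values in incomes.items()}

# One number for the whole aggregate category.
aggregate_median = median(income for _, income in households)

# One number per subgroup.
subgroup_medians = median_by_group(households)

print("Aggregate median:", aggregate_median)
for group, value in subgroup_medians.items():
    print(group, "median:", value)
```

In this toy data the single aggregate median sits between the two subgroup medians and describes neither subgroup well, which is the kind of distortion that disaggregated reporting is meant to expose.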

Misleading aggregate data feeds the argument that Asian-Americans are performing better than other groups, thereby causing discrimination and pitting ethnic groups against one another. Many policy decisions affecting Asian-Americans are made by people who are not Asian-American, further causing polarization and discrimination between Asian-Americans and other groups. The “positive stereotype,” which often portrays Asian-Americans as more successful and hardworking, can carry a cost for young people by artificially inflating expectations or, on the other hand, narrowing life decisions.

Aside from socioeconomic class and income, Asian-Americans as an aggregate group are often thought to perform better academically than other groups, and a few explanations are offered for this. One is that, on a macro level, Asian-Americans are able to attend better schools. To test this, the Brookings Institution calculated math proficiency rates, stripping Asian-American test scores from schools’ aggregate scores on state-administered tests to ensure that Asian-American scores were not driving up the schools’ averages. Then, they calculated the average passing rate for schools by Public-Use Microdata Area (PUMA), a rough proxy for public school attendance zones, and ranked all PUMAs by state according to the average math passing rate. Through this study, Asian-Americans, especially those earning higher incomes, were found to live in areas with better-performing public schools, granting Asian-American children access to a better education.
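The steps described above (exclude one group's scores, compute each school's passing rate, average by PUMA, rank within each state) can be sketched in Python. This is a rough illustration under stated assumptions, not the Brookings Institution's actual code or data: the record layout, the function names, the state code "XX", and every number below are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (state, puma, school, group, passed, tested).
# All values are invented for illustration.
records = [
    ("XX", "001", "S1", "Asian-American", 45, 50),
    ("XX", "001", "S1", "Other", 60, 100),
    ("XX", "001", "S2", "Other", 80, 100),
    ("XX", "002", "S3", "Asian-American", 10, 10),
    ("XX", "002", "S3", "Other", 40, 100),
]

def school_pass_rates(rows, excluded_group):
    """Passing rate per school, leaving out the excluded group's scores."""
    passed = defaultdict(int)
    tested = defaultdict(int)
    for state, puma, school, group, n_passed, n_tested in rows:
        if group == excluded_group:
            continue
        key = (state, puma, school)
        passed[key] += n_passed
        tested[key] += n_tested
    return {key: passed[key] / tested[key] for key in tested if tested[key] > 0}

def average_by_puma(rates):
    """Unweighted mean of school passing rates within each (state, PUMA)."""
    grouped = defaultdict(list)
    for (state, puma, _school), rate in rates.items():
        grouped[(state, puma)].append(rate)
    return {key: sum(values) / len(values) for key, values in grouped.items()}

def rank_pumas_by_state(averages):
    """For each state, list PUMAs from highest to lowest average rate."""
    by_state = defaultdict(list)
    for (state, puma), avg in averages.items():
        by_state[state].append((puma, avg))
    return {
        state: [puma for puma, _ in sorted(items, key=lambda item: item[1], reverse=True)]
        for state, items in by_state.items()
    }

rates = school_pass_rates(records, excluded_group="Asian-American")
puma_averages = average_by_puma(rates)
ranking = rank_pumas_by_state(puma_averages)
print(ranking)
```

Excluding the group before computing school rates is what keeps that group's own scores from inflating the measure of school quality. Whether to weight schools by enrollment when averaging within a PUMA is a design choice; the sketch uses a simple unweighted mean.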

Even in this study, however, Asian-Americans are treated as a single group, a choice that strips the data of its integrity because it is blind to the various subgroups. Many students of East Asian and Indian descent are doing well economically, but Cambodian and Hmong Americans, as previously mentioned, have higher poverty rates than the national average. This means that they are often unable to attend schools of the same caliber and, thus, often perform worse than East Asian Americans and Indian Americans. This leads to a wide gap in academic achievement and opportunity within the larger, aggregate demographic of “Asian-American.”

According to the White House report on big data, better algorithmic systems must be designed to include transparency and accountability mechanisms, so that subjects can appeal misinformed, algorithm-based decisions. It is also important for Asian-Americans to be involved in policymaking and in the decisions that inform the algorithms used for analytics.

To solve these larger systemic issues, we must begin with an overall shift in mindset regarding Asian-Americans. While it is true that some benefit disproportionately from well-educated parents and a belief in hard work, it is important not to emphasize these two factors at the expense of more straightforward ones, which affect public policy more concretely.

For more insight, check out this video: “The Myth of Asian-American Success and How Invisibility Becomes Institutionalized”

Originally published at beeckcenter.georgetown.edu on April 14, 2017.