UCSB Data Science Presents An APPsolutely Amazing Showcase

Image Courtesy of Data Science UCSB

Xander Apicella
Science and Tech Editor

Nearly everyone in attendance was from the data science club; those who were not presenting waited to see how far their fellow club members had come and what they had managed to make in a year. The field itself offers a wide breadth of possibilities: data science broadly covers any experiment or application that takes data from the past (usually a lot of it) and turns it into a model that can be used to predict future events. Growing computing power keeps making the field more practical, but people from any discipline can learn its skill sets.

After a few minutes of socializing, the excited chatter thinned and someone announced that the presentation was about to start. Everyone filed into a small lecture hall, where the chairs faced a semicircular stage framed by red curtains, with a clear plastic lectern and a black microphone on it. Tim climbed the stairs to the lectern, and the hushed chatter faded to a faint murmur.

“We have a group of folks who have been dedicating their evenings after class, their free time to do these projects over the past two months,” Tim said. “I’m just going to let their work speak for itself.”

APPsolutely Successful

The first project took data from app stores as inputs: metrics like downloads, number of ratings, rating scores, category, and more. The team pitched its software to app developers, who could use it ahead of release to see which features to keep or tweak to maximize their app's success on the market. The team found that the strongest indicator of success was the number of ratings an app had.

Predicting Drafted Quarterbacks

This project predicted which college quarterbacks the NFL would draft. Its intended customers were sports agents. The team claimed the application saves resources by letting agents know which players to focus on, so they do not have to spend time investigating every individual player. In the end, the team recommended players that agents should check out in the upcoming season.

Reddit Toxicounter

This software measured the toxicity of the AskReddit subreddit and could be adapted to analyze other sites as well. The project generated various figures and metrics tracking toxicity levels over the years. It was designed as a moderation tool, helping websites take toxic comments down before they can hurt people. One unexpected finding was that AskReddit's toxicity correlated with real-world events: the site's biggest toxicity spikes coincided with shootings and white nationalist rallies. The team emphasized that the technology was by no means predictive in its current state, though it could be used to view past events through a different lens.

Playing Card Classifier

The next project was built and presented by an undeclared freshman, a testament to the club's willingness to accept and teach anyone who wants to learn. He had built software that could tell several playing cards apart (six, so far) with almost perfect accuracy. When he tries to distinguish among more card types, the accuracy drops, but he plans to fix that with time.

Dexter 2.0

The fifth project, Dexter 2.0, tackled an important medical problem and won the evening's Most Impactful award. It takes cell-level images of a blood sample and classifies white blood cells into one of four types. It does so far more cheaply than the commercial standard, which can cost a patient $100 or even as much as $3,000. The program has demonstrated 84 percent accuracy, on par with currently available public software built for the same purpose. According to its team, it takes 30 minutes to build on a laptop.

Dining Hall Student Counter

After a brief intermission, the sixth project was presented. It was made to keep students from wasting time in line: with the application, they could check how many people were in each dining commons at a given time. It combined image recognition software with the campus dining commons' cameras, which anyone can access, to figure out which commons would have the longest wait.

Newsfeed Article Clustering

Project seven was a practical application made by Parker, a linguistics major at UCSB, and it won the Most Innovative award. It was designed for everyday readers and prevented newsfeed clustering. This is the tendency of one major story to fill a newsfeed with multiple articles even when nothing new has happened. The Mueller report was a frequent example throughout the presentation. The project was also meant to filter out multiple articles with the same tone or political affiliation, letting readers draw from a diversity of perspectives and see where they differ.

Algorithmic Trading with Reinforcement Learning

Project eight is perhaps the first get-rich-quick scheme that comes to mind for a data science project, or for any intelligent software. Built by Calvin Wang, the club's founder and vice president, the application uses historical data to figure out when to buy and sell on the stock market. He ran the program in simulation, where it made a whopping $350,000. He hopes to use it to make some more tangible cash in the future.

Eyewire

The next project was a collaboration between Brian Lim and the citizen science program Eyewire. He had been an avid user of the site for a few years before working with the program, and he admires the goals of citizen science in general. The concept is that anyone, regardless of scientific background, can help further brain research just by logging onto the Eyewire website. There, players map the brain one neuron at a time by solving puzzles built from 3D renderings of the brain's connections. Each neuron they map contributes to research in a field that badly needs that kind of help. The project has already unveiled six new types of neurons, with more to come.

AntiGAN

Ananya Haravu, the club's other vice president, took the stage. "Our last project, AntiGAN, will make you rethink whether every photo you've ever seen is real or fake," she said. "I'd like to call up Amil Khan."

Amil's research focused on Generative Adversarial Networks, or GANs. In a GAN, a generator creates fake images designed to look like the real examples the system was trained on. A discriminator tries to tell whether each image it sees is real or generated.

The two improve with every iteration through their cat-and-mouse relationship. The generator's images grow more and more realistic, and the discriminator gets better and better at telling real images from generated ones, until neither can improve any further. The danger, then, lies in using the generator to make images that people can no longer tell apart from real ones, fooling them into thinking its creations are genuine.
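For readers curious what that back-and-forth looks like in practice, here is a deliberately tiny sketch in Python. It is not Amil's code and it does not produce images. As a simplifying assumption, the "real data" are single numbers drawn from a bell curve centered on 4. The generator (the hypothetical function `train_toy_gan` and its parameters `a` and `b`) just stretches and shifts random noise, and the discriminator is a one-parameter logistic score `w`. The two sides take turns nudging their parameters: the discriminator tries to score real samples high and fakes low, and the generator tries to raise the discriminator's score on its fakes.

```python
import math
import random


def sigmoid(s: float) -> float:
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 4000, batch: int = 64, lr: float = 0.05, seed: int = 0):
    """Hypothetical 1-D toy GAN (illustration only, not an image GAN)."""
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator: fake = a * z + b, with noise z ~ N(0, 1)
    w = 0.0           # discriminator: D(x) = sigmoid(w * x), its belief that x is real

    for _ in range(steps):
        real = [rng.gauss(4.0, 1.0) for _ in range(batch)]  # assumed "real" data ~ N(4, 1)
        zs = [rng.gauss(0.0, 1.0) for _ in range(batch)]
        fake = [a * z + b for z in zs]

        # Discriminator turn: gradient ascent on mean log D(real) + mean log(1 - D(fake)).
        grad_w = (sum((1.0 - sigmoid(w * x)) * x for x in real)
                  - sum(sigmoid(w * x) * x for x in fake)) / batch
        w += lr * grad_w

        # Generator turn: gradient ascent on mean log D(fake), i.e. try to fool D.
        grad_a = grad_b = 0.0
        for z in zs:
            x = a * z + b
            dlogd_dx = (1.0 - sigmoid(w * x)) * w  # derivative of log sigmoid(w * x) w.r.t. x
            grad_a += dlogd_dx * z                 # chain rule: dx/da = z
            grad_b += dlogd_dx                     # chain rule: dx/db = 1
        a += lr * grad_a / batch
        b += lr * grad_b / batch

    return a, b, w
```

Running `a, b, w = train_toy_gan()` should leave the generator's shift `b` close to 4. It should also leave the discriminator's weight `w` close to zero, which is the point where the discriminator can no longer tell the two sets of samples apart. A discriminator this simple can only react to where samples sit on average, so the toy matches the center of the real data but not its spread. Real image GANs put deep neural networks on both sides so they can capture far richer structure.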

Amil showed images of a bedroom, a restaurant, and a celebrity and asked which were real. The audience couldn't tell, and neither could Amil. There is no subtle trick for telling a GAN image from a real one, at least not one the human eye can pull off. He tried a few different discriminator types to determine which images were GAN-generated and which were not, and found one that usually chose correctly.

One judge asked what would happen if Amil trained a generator against the discriminator he had found successful. The judge wondered whether the generator's output would once again prove too elusive to detect. Amil said he would work on that next.

Tim announced that the presentation segment was over and that there would be a poster session outside for anyone interested in talking to the presenters. A brisk wind swept through Corwin, but the sun shone bright. People milled about and gathered at the various posters, where each team talked about its year of data science work and what its members got out of it.

Parker, the linguistics student who made the article clustering application, believed that there were subtleties to data science that people just have not been exposed to yet.

“There are some classes, but nowhere do they teach this depth of data science,” he said in an interview with The Bottom Line. “I think the whole art of data visualization — telling a story — isn’t emphasized enough.”

Tim thought more could be done to emphasize data science at UCSB. The school could set up cloud servers, as UC Berkeley has. Students like those in Data Science @ UCSB could then remotely access the software tools they need for a project, instead of downloading mountains of applications for each new one. He believes more resources should go toward this issue, along with creating positions focused on data science.

He also thinks UCSB is making some good decisions about its data science future. "Professor Singh," he said, "is leading an interdisciplinary effort for data science at the school with professors like one of the showcase's judges, Dr. Kharitanova, recently hired to assist in this effort."

"Their main goal is to take an interdisciplinary, very beginner-friendly approach to data science," he said. The first courses offered will be INT (interdisciplinary) courses taught by statistics and computer science professors, so anyone can learn the basics of the field.

Xander Apicella
Xander Apicella is a third-year physics major interested in communicating science of all sorts to anyone and everyone. He flew in from Clarendon, New York, and misses the snow sometimes. He enjoys rock-climbing, reading, writing, and learning something new.