Thursday, October 22, 2015

Plagiarism in the digital era

I teach a class of 120 Master's students. There are two sections, with 52 students registered in one and 68 in the other. The material pertains to distributed computing. We talk about architectures, communication protocols, file systems, synchronization, concurrency and replication, fault tolerance and so on. The course has several in-class quizzes, a midterm, a final exam and a semester-long project.

So here is how the exams go: I walk into the class, with or without proctors (I get proctors for the midterm and final, but not for the quizzes). In a loud and clear voice I announce, 'Please keep all bags and electronic devices including laptops, cell phones, PDAs, iPads, etc. up on the dais. You are not allowed any additional pieces of paper as a cheat sheet or for rough work. You can only have a pencil or pen to do your exam.' I wait for a few minutes and watch as the students lazily put their things away, or so I think.

After the first quiz, some students reported to me that a few colleagues had apparently not left their cell phones near the dais, the phone being 'an essential tool' that they wanted to hang on to even during the exam. These students managed to take pictures of the exam and email them to their friends in the next section, who start their exam 10 minutes after the first class is done. Well. Smart thinking.

I cannot combine the classes for the exam; it is a scheduling hassle. Only the final is scheduled ahead of time, and both sections take it together.

I now have separate exams for the two sections, knowing I cannot trust anyone. I learnt the adversarial setting I was working in (and against).

Looking into your neighbor's work is the old style of 'copying'. Quite effective, however. When you do not know something, glance at your neighbor's work and see what s/he has. So each student gets a different exam from the person in the next seat (I usually prepare two sets of questions for each section!). Good luck trying to see what your neighbor has. You will first need to match up the questions, then the answers. For the midterms, my Teaching Assistant figured it would be wiser to put together a 'randomized' seating arrangement. Friends are no longer next to friends.
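For the curious, the randomized seating plus alternating question sets can be sketched in a few lines of Python. This is only a minimal illustration, not what my Teaching Assistant actually used: the function name `assign_seats`, the seat labels and the roster are all made up, and it assumes the seats are listed in physical order so that consecutive entries are neighbors.

```python
import random

def assign_seats(students, seats, versions=("A", "B"), seed=None):
    """Shuffle students into seats and alternate exam versions along the seat list.

    Assumption: `seats` is given in physical order, so consecutive entries
    sit next to each other and therefore receive different versions.
    """
    if len(students) > len(seats):
        raise ValueError("not enough seats for the roster")
    rng = random.Random(seed)          # a seed makes the plan reproducible
    shuffled = list(students)
    rng.shuffle(shuffled)
    plan = {}
    for i, (seat, student) in enumerate(zip(seats, shuffled)):
        plan[seat] = (student, versions[i % len(versions)])
    return plan

if __name__ == "__main__":
    # Hypothetical roster and room layout: 2 rows of 4 seats.
    roster = [f"student{n:02d}" for n in range(1, 9)]
    seats = [f"R{r}S{s}" for r in range(1, 3) for s in range(1, 5)]
    for seat, (student, version) in assign_seats(roster, seats, seed=42).items():
        print(seat, student, version)
```

With two versions and an even number of seats per row, side-by-side neighbors always hold different question sets; students sitting directly behind one another may still share a version, which is where the proctor comes in.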

The random seating arrangement was a huge success in curbing cheating. But then, the 'restroom break' requests emerged. The first time around, I allowed those. I was too nice. My proctor ended up discovering that the ladies' room was becoming more like a student dorm, with books and electronic devices openly being used and discussed. There was even a social circle (a miniature version of a Facebook friends list) set up to discuss and answer questions in the exam! Question papers were taken away and perpetrators brought to task. Since then, 'No restroom breaks' is 'class policy'. Actually, I feel like an autocrat of sorts with that rule in place.

Now, it almost became a power struggle. The naysayers go their way. I go mine.

In the recent midterm, I had a student come in with a knitted cap with a stiff peak projecting in front. Think baseball cap. My proctor complained he was not able to see her eyes and hence could not decide whether she was doing her own work or not. I watched her from a different direction and noticed she was not. The idea was to use the cap as a cover for looking into her neighbor's work. No digital enhancements - simple old-fashioned style. And it gives her quite the look. I am technically not permitted to ask her to remove a headdress or cap, since one may have his/her own need for it. We just moved the students around and made them sit further from each other. But hey - buy a new cap for the exam versus study hard for it?

Moving students around did not work too well either. The proctor was flabbergasted to hear one student ask, 'There are common questions in the two groups. Why can I not look and learn?' Excuse me? This is supposed to be an exam!

So that was for the in-class exam(s).

For the project, they are required to submit a proposal, an intermediate report and a final report. About 60% of the class wanted to do 'sentiment analysis on Twitter data'. One of my colleagues was so amused that he asked them what motivated their choice and if 'sentiment' meant anything more than being positive or not. I sighed. Where does it all begin (or end)?

The Student Handbook - code of conduct. Amen.

Tuesday, October 20, 2015

Workshop on Data Science, Learning and Applications to Biomedical and Health Sciences

The NSF-sponsored workshop on Data Science, Learning and Applications to Biomedical and Health Sciences (DSLA-BHS2016) will be held on Jan 7-8, 2016 at the New York Academy of Sciences. Workshop website: https://sites.google.com/site/dslabhs2016/
Please contribute if relevant to your research.

Sunday, October 18, 2015

Review of 'Description of a struggle', Franz Kafka

My first reaction after reading the 'Description of a struggle' was one of complete resentment. I neither understood the plot, nor the hidden meaning the reader had to take home.

Haunted by this disturbing feeling, I decided to search for reviews of this novella. Quoting Wikipedia,"[the work] .. is often dismissed by critics turned off by its fragmentary nature and lack of polish. John Updike, in his foreword to an English language collection of Kafka's stories calls it (along with Wedding Preparations in the Country another early story) repellent containing something of adolescent posturing and advises new readers of Kafka to skip them. Updike encourages readers to return to these early stories once initiated with his other works."

While this was encouraging, it did not relieve me of the sense of discomfort. I re-read the story only to realize that the narrator never identifying himself is part of Kafka's style of writing. In other words, one never gets to know who the protagonist is, what he does, where he lives and so on, which builds a surrealism that is sometimes hard to digest.

How can you ride (like a horse) an acquaintance from a party? 
He says "I leapt onto the shoulders of my acquaintance and by digging my fists into his back I urged him to a trot." One suddenly realizes that he is not in the real world any more. 

The description of the fat man being carried on the litter was revolting. For example, "His folds of fat were so carefully spread out that although they covered the whole litter and even hung down his side like the hem of a yellowish carpet, they did not hamper him." The conversation with the supplicant is a scathing criticism of religion. The supplicant, a young man with an emaciated figure who throws himself on the ground and beats his skull in the palms of his hands on the stone floor, is a hypocrite who does what he does because he gets attention. It almost feels like Kafka pities people with blind faith. Why are they believers? Not because they strongly feel so from within, but because there are people watching their actions.

In short, the narrator has an acquaintance, and the acquaintance meets a girl he falls in love with. One never learns who the narrator, the acquaintance or the girl really is. One only works with emotions: questions that seem apt to ask at the moment, but that perhaps arise out of incidents that occurred in the past.

This being my first read, I take the advice quoted on Wikipedia to heart and will bravely read some of Kafka's other masterpieces before revisiting this novella.

Saturday, October 10, 2015

Measuring influence in prosopographical research

Suppose you wanted to know the life story of the Australian operatic soprano Dame Nellie Melba, who was famous in the late Victorian era and the early 20th century. Where would you begin your search? You could start reading about her life on the internet (perhaps on Wikipedia) and follow up with the references therein. Very soon, however, you could find yourself digging through stacks of old newspapers such as "The Musical Times" or "The Sun", published in New York, to learn more about her debut at the Metropolitan Opera. Information is sparse and often incomplete.

Relying on secondary biographical information such as family archives and photographs, publicly available archives including newspaper articles, financial accounts from cities, economic and fiscal sources such as sales of deeds and tax lists, and other surviving documents of the era helps weave a story of the subject's life. It also helps verify facts across multiple sources.

A team of researchers from the School of Management, SUNY Buffalo (Dr. Haimonti Dutta), the Department of Computer Science at IIIT-Delhi (Aayushee Gupta), IBM Research India (Dr. Srikanta Bedathur) and TCS Research India (Dr. Lipika Dey) has been involved in this prosopographical research. The project was funded in its initial phases by the National Endowment for the Humanities.

Article-level data was obtained from the historical newspaper archive of the New York Public Library after curation by the New York Public Library Labs. Using techniques from natural language processing and large-scale machine learning, the team was able to build a system that could identify influential people. The old historic newspapers were subjected to Optical Character Recognition (OCR), and the resulting noisy text from the articles was spell corrected. This was then used to form a people gazetteer from which people with influence were identified. It was particularly interesting to find local people who held sway in the government offices, arts and sciences, and the armed forces at that time.
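To make the pipeline concrete, here is a toy sketch of the three steps (spell correction, gazetteer, ranking) using only the Python standard library. It is not the team's system. The lexicon, the honorific-based name pattern, the two example sentences and the function names are all invented for illustration, and counting the articles that mention a person is just a stand-in, since how influence was actually scored is not described here.

```python
import re
from collections import defaultdict
from difflib import get_close_matches

# Hypothetical lexicon; a real system would use a far larger dictionary.
LEXICON = ["dame", "nellie", "melba", "sang", "opera", "mayor", "strong", "praised"]

# Naive heuristic (an assumption): an honorific followed by one or two capitalized words.
NAME_RE = re.compile(r"\b(?:Mr|Mrs|Miss|Dr|Dame|Mayor)\.? ((?:[A-Z][a-z]+)(?: [A-Z][a-z]+)?)")

def spell_correct(text, lexicon=LEXICON, cutoff=0.75):
    """Replace each alphabetic token by its closest lexicon entry, if one is close enough."""
    def fix(match):
        word = match.group(0)
        if word.lower() in lexicon:
            return word
        close = get_close_matches(word.lower(), lexicon, n=1, cutoff=cutoff)
        if not close:
            return word                      # leave unknown words untouched
        return close[0].capitalize() if word[0].isupper() else close[0]
    return re.sub(r"[A-Za-z]+", fix, text)

def build_gazetteer(articles):
    """Map each detected person name to the set of article indices mentioning it."""
    gazetteer = defaultdict(set)
    for idx, text in enumerate(articles):
        for match in NAME_RE.finditer(spell_correct(text)):
            gazetteer[match.group(1)].add(idx)
    return gazetteer

def rank_by_mentions(gazetteer):
    """Rank names by number of mentioning articles (a crude proxy for influence)."""
    pairs = ((name, len(ids)) for name, ids in gazetteer.items())
    return sorted(pairs, key=lambda p: (-p[1], p[0]))

if __name__ == "__main__":
    # Made-up sentences; "Melha" imitates an OCR misreading of "Melba".
    articles = [
        "Dame Nellie Melha sang at the opera.",
        "Mayor Strong praised Dame Nellie Melba.",
    ]
    print(rank_by_mentions(build_gazetteer(articles)))
```

Without the spell-correction step, "Nellie Melha" and "Nellie Melba" would end up as two separate gazetteer entries, which is why cleaning the OCR output comes first.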


A small set of influential people detected using their algorithm from two months of newspaper data published in "The Sun" newspaper.
The team is now in the process of building a historical timeline very similar to the jazz timeline hosted by pbskids.org to help kids learn about influential people local to a geographic region.

Related Publications
1. Aayushee Gupta, Finding influential people from a historical news repository, 2014. [Master's thesis] 
https://repository.iiitd.edu.in/jspui/handle/123456789/166
2. Aayushee Gupta and Haimonti Dutta, Evaluation of Spell Correction on Noisy OCR Data. INFORMS Workshop on Data Mining and Analytics at INFORMS Annual Meeting, Philadelphia, October 2015.
3. Aayushee Gupta, Haimonti Dutta, Srikanta Bedathur and Lipika Dey. A Machine Learning Framework for Prosopographical Research. In preparation.