This week was productive, to say the least. On Thursday, in addition to my usual weekly meeting, I met with Ms. Green, a language arts teacher at BASIS. Ms. Green has experience with NLP and gave me some good pointers on my project. She recommended that I use semantic features, which might improve my dataset, and introduced me to Google nGrams as well as part-of-speech taggers that I could use for my first method.
My second method, which analyzes word concreteness, is more or less finished and runs on nearly any adjective-noun pair. It is by no means 100% accurate, however, and I have yet to run it on a large dataset in order to measure how accurate it actually is. I plan on doing that this week.
When coding my first method, as previously mentioned, I ran into a couple of problems. Notably, I needed a good database of word co-occurrence. Within an adjective-noun (AN) pair, the noun and its hypernyms/hyponyms may or may not occur frequently among the nouns that are often used with the adjective, and this determines whether or not the phrase is used metaphorically. I intend to use COCA: pull the data from its website, find suitable AN pairs with a part-of-speech tagger, and analyze the different nouns that occur with any given adjective. This will definitely be the most difficult part of the project moving forward.
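As a rough illustration of the counting step described above, here is a minimal Python sketch. It assumes the text has already been run through a part-of-speech tagger that emits Penn Treebank-style tags (`JJ` for adjectives, `NN` for nouns), and it only catches an adjective immediately followed by a noun. The function names and the tiny hand-tagged sample are hypothetical and are not COCA data or my actual code.

```python
from collections import Counter, defaultdict


def extract_an_pairs(tagged_tokens):
    """Return (adjective, noun) pairs where an adjective is directly followed by a noun.

    Assumes Penn Treebank-style tags: JJ, JJR, JJS for adjectives; NN, NNS, ... for nouns.
    """
    pairs = []
    for (word, tag), (next_word, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if tag.startswith("JJ") and next_tag.startswith("NN"):
            pairs.append((word.lower(), next_word.lower()))
    return pairs


def nouns_by_adjective(pairs):
    """Count how often each noun appears with each adjective."""
    counts = defaultdict(Counter)
    for adjective, noun in pairs:
        counts[adjective][noun] += 1
    return counts


# Hypothetical hand-tagged sample; real input would come from a tagger run over a corpus.
sample = [
    ("The", "DT"), ("sweet", "JJ"), ("candy", "NN"), ("and", "CC"),
    ("a", "DT"), ("sweet", "JJ"), ("victory", "NN"), ("followed", "VBD"),
    ("the", "DT"), ("sweet", "JJ"), ("candy", "NN"),
]

pairs = extract_an_pairs(sample)
counts = nouns_by_adjective(pairs)
print(counts["sweet"].most_common())  # [('candy', 2), ('victory', 1)]
```

With counts like these, a noun that rarely or never appears with an adjective (and whose hypernyms/hyponyms do not either) could be flagged as a candidate metaphorical use. Where to set that threshold is an open question that this sketch does not address.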