An individual's concepts|
Below are the 20 most recent journal entries recorded in
eternally stressed semanticist's LiveJournal:
|Tuesday, May 8th, 2012|
|The intensionality of "alleged"
"Alleged" is well-known in semantics as a word that introduces reference to possible worlds: just as a "former senator" isn't a senator (who is former), but rather someone who at a prior time was a senator, an "alleged criminal" is someone who isn't (necessarily) an actual criminal, but merely someone who, in the worlds compatible with what the alleger belives, is a criminal.
Which makes the following sub-header from the front page of boston.com particularly odd:
A witness saw Christopher Piantedosi allegedly stab his ex-girlfriend in their daughter's room via a videochat on an iPad.
It's sensible to say that Piantedosi allegedly stabbed his ex-girlfriend; it means that, according to police, Piantedosi stabbed his ex-girlfriend. But I'm not at all sure what it would mean for someone to see Piantedosi allegedly stab someone. (She saw the police allege that he stabbed her?) Perhaps this witness can see into possible worlds, in which case we really need to get her into a lab to do some experimental semantics.
|Friday, March 2nd, 2012|
|How to Normalize: Karma edition
Word frequency is something I deal with a lot in my work. It's the basis of some fairly fundamental information: how often does Word X show up in the text you're analyzing? Is that more or less often than you'd expect Word X to show up? It's by no means the way we measure everything, but it's at the very least a good benchmark.
The problem I was considering Wednesday morning was the following: if you take a word's frequency in the text you're analyzing, compare it to its frequency in a corpus (say, the Google Books from a certain point in time), multiply it by the log of this other thing and divide it by the fifth root of something else after adding in an offset in order to...the point is, once you're done with your computations, you end up with a pretty arbitrary looking number that falls somewhere on a scale from zero to who the heck knows.
So I thought, wouldn't it be nice to normalize those numbers? Some sort of normalization that would bring them in line as something meaningful, as measured against some kind of standard. My coworker suggested that the right standard might be something like "a word that occurs an average number of times in the corpus", which turned out to be 223,037 ("the", for comparison, occurs a little less than 8.7 billion times). As a number, 223,037 wasn't bad, but it's nice to get a handle on what that means, so we looked at words that appeared as close to that number of times as possible.
The first three we found, in order of closeness, were "Wien", "bombed", and "parol", none of which struck us as something you'd want to name your metric after. Then we hit the fourth word, which turned out to be exactly the kind of word we wanted to express our normalization method.
That word, with 223,058 occurrences, is "normalization".
Sometimes, things just work.
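For the curious, the benchmark hunt described above can be sketched in a few lines of Python. This is only an illustration, not the computation we actually ran: the function names and the toy counts are made up for the example, and the only real numbers are the two quoted in this post (223,037 and 223,058).

```python
from collections import Counter


def mean_count(counts):
    """Average number of occurrences per distinct word in the corpus."""
    return sum(counts.values()) / len(counts)


def closest_to_mean(counts, n=3):
    """Words whose counts are nearest the mean, in order of closeness.

    Ties are broken alphabetically so the result is deterministic.
    """
    target = mean_count(counts)
    return sorted(counts, key=lambda w: (abs(counts[w] - target), w))[:n]


def relative_to_benchmark(value, benchmark):
    """Express a count (or score) as a multiple of the benchmark."""
    return value / benchmark


# Hypothetical toy corpus counts, invented purely for illustration.
toy_counts = Counter({"the": 1000, "of": 600, "cat": 40, "sat": 30, "mat": 30})

print(mean_count(toy_counts))          # 340.0
print(closest_to_mean(toy_counts, 2))  # ['of', 'cat']

# The two figures from this post: "normalization" against the corpus mean.
print(round(relative_to_benchmark(223_058, 223_037), 4))  # 1.0001
```

Dividing by the benchmark is the simplest possible normalization; whether it is the right one for any particular scoring formula is a separate question.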
|Tuesday, February 21st, 2012|
|How not to remove suffixes
Sadly, I don't have much interesting to report about my work, because most of the interesting things I've discovered have been on particular not-yet-discussable projects; and because mostly I've just been frustrated with the state of natural language processing, which I'm sure has made advances since the last time I looked in on it when I was in college, but I'm having a little trouble seeing them. (I'm sure tf*idf is great and all, but I'd like it better if it were working.)
So the best I can do, in the tradition of this video courtesy of Heidi Harley, is to observe that a coworker has found an example of bad suffix removal that probably beats anything I've found so far. I was merely amused to learn that a particularly common term in one set of data was the singular city of "Lo Angele", the parser having helpfully stripped away the plural suffixes; but he discovered that the parser did the same for the less-than-superlative city of Budap.
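I don't know what the parser in question does internally, but the failure is easy to reproduce. Here is a minimal sketch, assuming a stripper that blindly removes "est" and "s" from the end of every token; the function names and the protected list are hypothetical, made up for this example.

```python
def naive_strip(word, suffixes=("est", "s")):
    """Strip the first matching suffix, with no dictionary check at all."""
    for suffix in suffixes:
        # Leave very short words alone, but otherwise strip blindly.
        if word.endswith(suffix) and len(word) > len(suffix) + 1:
            return word[: -len(suffix)]
    return word


def naive_strip_phrase(phrase):
    """Apply naive_strip to every whitespace-separated token."""
    return " ".join(naive_strip(token) for token in phrase.split())


# Hypothetical list of names that should never be touched.
PROTECTED = {"Los Angeles", "Budapest"}


def safe_strip_phrase(phrase, protected=PROTECTED):
    """Same as naive_strip_phrase, but leaves protected names alone."""
    if phrase in protected:
        return phrase
    return naive_strip_phrase(phrase)


print(naive_strip_phrase("Los Angeles"))    # Lo Angele
print(naive_strip_phrase("Budapest"))       # Budap
print(safe_strip_phrase("Budapest"))        # Budapest
print(safe_strip_phrase("tallest towers"))  # tall tower
```

A protected list only patches the cases you already know about, which is rather the point: the stripper has no idea what a word is.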
|Tuesday, February 14th, 2012|
|Spell checking and the internet
I've spent the last week noodling around with spelling correction in Python, with no particularly good results. (I might need to do more than noodle, if I want good results.) Part of the problem is deciding what to do with unfamiliar words—and if your client wants you to be searching on Pepsi-Cola (not an actual example), you kind of want references to "Pespi" to be corrected to "Pepsi"...but without any reference to "Cole Porter" being corrected to "Cola Porter".
Today's lesson, though, while looking through capitalized phrases in a corpus, and finding "beret syndrome": no amount of spell-checking will help you when someone refers to Gion Beret Syndrome.
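For what it's worth, the Pepsi/Cole problem above can be sketched with nothing but the standard library. This is a toy, not what I've actually been running: the term list, the known-word list, and the 0.75 cutoff are all assumptions chosen for the example. The idea is to correct a token toward a client term only if the token isn't already a word we recognize.

```python
from difflib import get_close_matches

# Hypothetical client terms we want misspellings corrected toward.
CLIENT_TERMS = ["Pepsi", "Cola"]

# Hypothetical stand-in for a real dictionary of known words and names.
KNOWN_WORDS = {"and", "cole", "porter"}


def correct_token(token, terms=CLIENT_TERMS, known=KNOWN_WORDS, cutoff=0.75):
    """Return the closest client term, unless the token is already known."""
    lowered = token.lower()
    lookup = {term.lower(): term for term in terms}
    if lowered in known or lowered in lookup:
        return token
    matches = get_close_matches(lowered, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else token


def correct_text(text, **kwargs):
    """Correct each whitespace-separated token independently."""
    return " ".join(correct_token(token, **kwargs) for token in text.split())


print(correct_text("Pespi and Cole Porter"))  # Pepsi and Cole Porter

# Without the known-word guard, "Cole" is close enough to "Cola" to be "fixed".
print(correct_token("Cole", known=set()))     # Cola
```

The guard is only as good as the known-word list behind it, and none of this helps with a phrase that is wrong as a whole rather than word by word.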
|Tuesday, February 7th, 2012|
|Thanks, Python Style Guide!
So, this journal having been lying fallow while I transitioned from academia to real, live, paying jobs, I'm now thinking of reviving it for the occasional work-related post.
As a potential starting post, then: I'm reading the Python code style guide, which I've never read before, but now that I'm writing code other people will have to read, it seems like a good idea to get accustomed to it. I'm finding it a mix of things that are good ideas, things that don't strike me as particularly useful or necessary, and snottiness about using complete sentences and writing in English. On this last point, the sentence that really struck me:
When writing English, Strunk and White apply.
I'll set aside the subject-verb agreement. (I know that US and British English differ on things like "the crowd is..." vs. "the crowd are...", but even in British English, wouldn't "Strunk and White" be taken as a single unit, insofar as it's a single book, and therefore use the singular verb "applies"?) Instead, what I find really striking is the dangling modifier: is it supposed to be Strunk and White who are writing English? A proper, Strunk-and-White-sanctioned sentence would say "When writing English, you should follow Strunk and White" or "When you are writing English, Strunk and White apply" or "In English, Strunk and White apply". I'd even be willing to grant them "When writing English, Strunk and White apply to what you write", which I believe S&W and its adherents would object to, because at least there's something in the sentence ("you") for "when writing English" to modify. But the sentence as it stands? Unacceptable by any standard.
Meanwhile, I'm going to go back to writing comments however I darned well care to. Er, to however I darned well care.
|Wednesday, June 23rd, 2010|
|I love children! Sauteed in a little...
I don't have a huge problem with dangling participles; I'll often say things like "Speaking as a linguist, that sentence isn't grammatical", where it's obviously not the sentence that's speaking as a linguist. (See various Language Log posts from Geoff Pullum arguing that avoiding dangling participles is typically a matter of politeness as opposed to grammar.) But then there's the following sentence from the government's anti-childhood-obesity website:
Cauliflower? No problem. Roasted with garlic and olive oil, the kids happily munched as if they were fries.
It doesn't help that the only plural antecedent for "they" is "the kids"; but, man, they really ought to have a copyeditor over there.
|Thursday, March 4th, 2010|
Dear Mango Languages,
Insofar as I'm not fluent in at least three languages, nor am I certain I want to work 50-60 hour weeks as a contract worker, I doubt I'm going to be applying for your job in any case. But in the meantime, in case I do become fluent in another language or two and find myself wanting to work long hours, perhaps you should explain your core values a little.
I mean, it's a good thing that on your blog, you have a post that says: "Mango Languages has six core values that we all believe very strongly in: Quality, Entrepreneurial Spirit, Positive Attitude, Innovation, Integrity, and Fundipline. Let’s chat today about Entrepreneurial Spirit." Perhaps you should consider that it isn't Entrepreneurial Spirit that people need an explanation of.
|Friday, February 5th, 2010|
|Wednesday, January 6th, 2010|
|A sentence too poorly constructed to succeed
Lake Superior State University has, as is its wont, released its annual "please pay attention to us" Banished Words List. As Arnold Zwicky puts it, "it's a steaming pile of intemperate peeving". Picking apart the gripes is like shooting fish in a barrel—they hate "czar" because it's "long used by the media" and "tweet" because it's new, and so forth. (Heck, the complaint about the latter that "I don't know a single non-celebrity who actually uses ['tweet']" just makes the people behind this look old and grumpy.) They hate "app" because it's "yt another abrv"; presumably they say "mobile vulgus" and "taximeter cabriolet" instead of "mob" and "taxicab", but more to the point, Merriam-Webster dates the word to 1987, so they're coming to this fight a little late....
Right, sorry, enough barrelfish shooting. The point I'd actually wanted to make was about their quote from Claire Shefchik in favor of banning "too big to fail": "Just for the record, nothing's too big to fail unless the government lets it." That is to say, nothing's too big to fail, so no matter how big something is, it can still fail, unless the government lets it fail, in which case it's...wait, if the government lets it fail, then it's too big to fail, i.e., it can't fail? Or does she mean that unless the government lets it be too big to fail, then—except that to let something be too big to fail, it has to already be too big to fail, so...
I'm pretty sure that Ms. Shefchik's statement is in fact gibberish. But that didn't stop these defenders of the Queen's English from citing it approvingly. Dolts.
|Tuesday, December 8th, 2009|
To those reading this who feel qualified to answer: in the sentence "Mary took physics in high school", is "Mary went to high school" an entailment or a presupposition?
|Wednesday, December 2nd, 2009|
|Tuesday, November 17th, 2009|
I would like to take a moment to thank the creators of PowerPoint who took the time to understand that, if I changed the font in one line of Slide 22 into Braille, it meant that I wanted all of my Verdana on slides 31 to 44—not every instance, mind you, just the ones appearing in text boxes—to also be changed into Braille. Your understanding of the way I use your product is truly astonishing.
|Tuesday, November 10th, 2009|
|Airticle / Collogue / Gaun on the nou
As a linguist, I more or less have to believe that this Wikipaedia is real and not a parody, and yet, it so entirely looks like someone made it up: Neeps is weel-likit in Europe, parteecular in caulder airts, sith they growe weel in cauld climates an can be keepit for mony months efter the hairst.
|Saturday, November 7th, 2009|
|Sorta language related
When I give my name at delis, pizza places, and so forth, I give it as "Henry", which is not at all my name. But it's a name I've used as a pseudonym; and I use it because it pretty much always gets spelled right, as opposed to "Lance", which has been known to come out as "Vance" or "Len" or, on a particularly bad day, "Laura". (I think that was a fact about the guy's handwriting, as opposed to my pronunciation or his hearing.) "Henry" is recognizable, and I kind of had to snicker when I was at lunch with Youri and Uri, whose names were predictably mangled while "Henry" came out fine. (It was nearly a problem once when I gave my name as "Henry" and then handed a credit card to the cashier, who looked at it and looked at me....)
The other day at Boloco, the guy at the end of the line called out...well, something. It sounded like it started with a "k". But the burrito he described sounded like the one I'd ordered, so I nodded, and took it, and sat down, and looked at the receipt, where the cashier had typed "kaka". "Kaka?" I thought. "That's pretty far off for Henry..." I considered, I hesitated, and finally I went back to the counter, where sure enough my burrito was done, and the woman before me was looking confused and hungry. So I handed hers to her (I hadn't unwrapped it), took mine, and all was well.
But man—poor Keiko.
|Tuesday, October 13th, 2009|
|Reflection on Northeastern
Some of my students are just so lost.
Not academically; they're doing fine. It's just really hard to find your way to my office, and I suspect that some people who want to come to office hours are staring at elevators that don't even go to the basement and thinking, "What?" (Indeed, this post was interrupted by a student saying, "I just got lost for ten minutes trying to find this place.") Heaven knows that the other day, to get here from the fourth-floor-next-building-over office where my mailbox is, I eventually had to give up, go outside, and come in a different door.
Say what you will about Frank Gehry; whoever designed the J-shaped Nightingale/Holmes/Meserve/Whatever complex has a lot to answer for.
|Monday, October 5th, 2009|
|Number nine...number nine...
Obviously, "nine" denotes a constant; the number "nine" doesn't change. (Even when the number of planets changed from nine to eight, the number nine itself didn't change at all.) Which is why I was particularly pleased to see the following quote in the New York Times, from Dr. Elizabeth Blackburn, one of this year's winners of the Nobel Prize in Medicine:
Only eight women had won the Nobel Prize in Physiology or Medicine. Asked how she felt about becoming No. 9, Dr. Blackburn replied, "Very excited, and hoping that nine will quickly become a larger number."
|Monday, September 14th, 2009|
|Oh, Human Resources...
Teaching has been going at least passably well. The students seem attentive, I think they're understanding the material—I mean, we'll find out for certain when they turn in Homework 1.
At this point, the biggest frustration is not having an email account or access to the course website. This morning, I got mail from Heather telling me that I am (at last!) in the system, so I went to the Blackboard site to log in. No luck. It seemed to be taking my registration, but I couldn't log in, using either my ID number or the standard first-initial-last-name. Hm. So off I went to the help desk in the library (which I had to semi-sneak into, since of course I don't have an ID card yet, either). The guy at the help desk checked to make sure I was in the system, which I was, and took me to a computer to log in...which I couldn't. After some more poking around, he discovered why: HR had set up the account for F.Last, but somewhere along the way swapped my names and put me in the system as L.First.
Apparently it's already being looked into and should be resolved by tomorrow morning. In the meantime...man, it's always something, isn't it?
|Saturday, August 29th, 2009|
I thought that Monday's email debacle [please, please, please don't even ask] was going to be the weirdest thing to happen to me all month. I didn't anticipate Tuesday. (Wednesday's possibly-food-poisoning-related severe abdominal pain was just sort of icing.)
Long story short, I seem to be teaching Intro to Linguistics at Northeastern this semester. Go figure. More information as it becomes available.
|Wednesday, August 19th, 2009|
|Call for judgments: redux!
Three months ago, I asked loyal readers of this blog for judgments on the truth/falsity/appropriateness/what-have-you of the sentence "Jordan mostly knows who was in the Beatles". I'd intended to post a summary of the results, but things got away from me, as I was preparing for a trip to Germany at the time, and so forth. Things are now back with me, and the topic has become even more relevant to what I'm working on, and so I finally sat down to collate the responses.
Unfortunately, the responses are a case study in pragmatics, because there was a serious complicating factor in my question: while I knew, in posing the scenarios,* that Jordan being right or wrong about the number of Beatles would affect people's judgments, I didn't take into account just how fundamental people would find it that the Fab Four had four members. Among the comments: "I think you might get different judgements from people who say otherwise if you substitute corresponding statements about Herman's Hermits"; "I find a big clash (in vibe) between the opening sentence, about Jordan thinking they're the greatest anything, and 'Jordan mostly knows who was in the Beatles.'"; and, most extensively:
I think I know most of the members of the Travelling Wilburys. (Note that I don't know how to spell the name of the band, however.) If I try to list them, I might miss one or two, and I might also include one or two people who are not in the band, but I think my list would be mostly right.
However, I don't know how many people are in the band. I would guess four, but it could easily be five, and I wouldn't be shocked to learn it's more or less than that. The difference between the Beatles and the Travelling Wilburys is that the number of members of the Travelling Wilburys doesn't feel to me like a critical fact about the band. It's a supergroup and I would expect to recognize the names of all of the members from other groups they have played in, but how many of them there actually are doesn't seem as important.
So I'd like to try again. New scenario, new person, new question.
Here are a few facts of the matter: I happen to have a terrible sense of geography. And, as it happens, Minnesota is bordered by North Dakota and South Dakota to the west, Wisconsin to the east, and Iowa to the south, which I know because I just looked it up.*
So, let's suppose you see me a few days from now, by which point I'll have forgotten that I just looked this up (because also: terrible memory), and you say to me, "Your sense of geography can't be that bad, can it? Tell me which states you think border Minnesota." Consider sentences (1) and (2); for each scenario, would you judge them true or false (or somehow otherwise)?
(1) Lance mostly remembered which states border Minnesota.
(2) Lance was mostly certain which states border Minnesota.
- Scenario A:
- I perk up. "Minnesota? That's easy! The five states that border it are North Dakota, South Dakota, Wisconsin, Iowa, and Nebraska."
- Scenario B:
- I frown a little. "Minnesota? Well, it's bordered by North Dakota, South Dakota, Wisconsin, Iowa, I'm sure of those, and...er...I forget the fifth. Nebraska or Missouri or Illinois or Kansas or Wyoming, something like that."
- Scenario C:
- I perk up. "Minnesota? That's easy! The four states that border it are North Dakota, South Dakota, Wisconsin, and Nebraska."
- Scenario D:
- I perk up. "Minnesota? That's easy! The three states that border it are North Dakota, South Dakota, and Wisconsin."
- Scenario E:
- I frown a little. "Minnesota? Well, it's bordered by...there are four of them, um...maybe North Dakota, South Dakota, Wisconsin, Nebraska, and Iowa, no wait that's five—ok, I'm sure it's four of those five."
Thanks for bearing with me on this; I know that in this case it's a lot of judgments (five scenarios, two sentences in each, though for all I know the judgments come out the same for both of them in each one, I've lost all ability to tell). Comments again screened to let people judge the sentences uninfluenced by others, but I'll try to get a summary posted sooner rather than later this time.
* Cnoocy pointed out in the comments on the post that "scenario" is from the Italian, and its Italian plural—used even in English when referring to a synopsis of a play, according to Merriam Webster's Third New Unabridged, perhaps because it's used in particular for commedia dell'arte—is "scenari". From the Latin scaenarium, "place for erecting stages". And now you know!
* Also Ontario and Manitoba to the north, but that won't be important for this scenario. For the purposes of these sentences, we'll assume that as bad as my sense of geography might be, my knowledge of what's a state and what isn't a state is complete and correct.
|Friday, August 14th, 2009|
|Very, very confused
Hey there. I was going to post something linguistics-related but not really related to my work; and then tonight I started staring at something in a published paper—in two papers, actually, since Paper B picks it up from Paper A—and suddenly realized it doesn't work. Huge empirical flaw. Like, drive-a-truck-through, wrong-truth-conditions flaw. And I have read and reread and rereread these papers for nearly five years without noticing before, which makes me wonder how I missed it, and how the authors missed it, and the reviewers, and the editors of the respective journals (Journal of Semantics and Linguistics & Philosophy—not journals known for their lack of attention to detail). And therefore makes me think I'm misanalyzing this now.
So. I'm likely to send email to a few people, but in the meantime and just in case: are there any (semi-)professional semanticists reading this, who'd be willing to check my math here?