a bunch of experiments with amazon annual reports

unix text processing commands in python

tr 'value' for 'surveillance'

python string methods docs for future reference

experiment 1: grep, then print line up to 100 words

$ grep 'We' amz_1997_shareholder_letter.txt | python strip_line.py >amz_we_1997.txt

$ grep 'We' amz_2015_shareholder_letter.txt | python strip_line.py >amz_we_2015.txt
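
strip_line.py isn't shown in these notes, so here is a minimal sketch of what a script like it might look like. it assumes "up to 100 words" means: strip the whitespace around each line coming in from grep, then keep at most the first 100 whitespace-separated words. the function name truncate_words and the max_words parameter are made up for illustration.

```python
import sys


def truncate_words(line, max_words=100):
    """Strip surrounding whitespace and keep at most max_words words.

    Assumption: a "word" is anything separated by whitespace.
    """
    words = line.strip().split()
    return " ".join(words[:max_words])


def main():
    # grep writes matching lines to stdout; the pipe hands them to us on stdin
    for line in sys.stdin:
        out = truncate_words(line)
        if out:  # skip lines that are empty after stripping
            print(out)


if __name__ == "__main__":
    main()
```

saved as strip_line.py, this would slot into the pipelines above unchanged.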

experiment 2: grep, then print line between 50 and 100 chars

$ grep 'We' amz_1997_shareholder_letter.txt | python strip_line_50_100.py >amz_50-100_1997.txt

$ grep 'We' amz_2015_shareholder_letter.txt | python strip_line_50_100.py >amz_50-100_2015.txt


$ grep 'user base' fb_2015_annual_report.txt | python strip_line_50_100.py >fb_user_base_chunk_2015.txt
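
strip_line_50_100.py isn't shown either, and "between 50 and 100 chars" can be read two ways, so this sketch covers both as a guess: keep only the lines whose stripped length falls between 50 and 100 characters, or cut characters 50 through 100 out of every line. the function names are hypothetical, and which reading the real script uses is an assumption.

```python
import sys


def keep_if_length_between(line, lo=50, hi=100):
    """Reading 1: return the stripped line if lo <= length <= hi, else None."""
    text = line.strip()
    return text if lo <= len(text) <= hi else None


def slice_chars(line, start=50, end=100):
    """Reading 2: return characters start..end of the stripped line.

    Lines shorter than `start` come back as an empty string.
    """
    return line.strip()[start:end]


def main():
    # this sketch uses reading 1; swap in slice_chars to try reading 2
    for line in sys.stdin:
        kept = keep_if_length_between(line)
        if kept is not None:
            print(kept)


if __name__ == "__main__":
    main()
```

reading 1 throws away whole lines, so the output is shorter than the grep output; reading 2 keeps one fragment per line, which is closer to a "chunk".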

i think some questions these experiments raise for me are:

  • okay, i have some blocks of text. i’m not used to thinking about chunks of text in a structural way. what do i do with the chunks?
  • how do i break up the chunks in a programmatic way?
  • do the chunks have anything to do with the content?
  • highlighting the corporate jargon-y-ness of annual reports is not very interesting. what’s a more interesting thing to do with corporate jargon?
  • workflow! should i create a file for each new experiment? only good experiments? what’s the best way to name these slight variations? how do i document the command? i love the idea of these commands with slight variations as a score <3
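
on the "how do i break up the chunks in a programmatic way?" question, one possible starting point (a sketch, not the only way) is to split a chunk into groups of a fixed number of words. chunk_words and its size parameter are made-up names for illustration.

```python
def chunk_words(text, size=5):
    """Split text into pieces of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


if __name__ == "__main__":
    # each piece lands on its own line, a bit like a line break in a poem
    for piece in chunk_words("okay, i have some blocks of text and now what", size=3):
        print(piece)
```

changing size is one of those slight variations: same text, different breaks.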
