Thursday, August 17, 2017

grep for two words in the same file

Let's go for some more csv (or any text file) fixing, grepping and slicing and dicing.

The problem is simple:
Find the files (among a vast number of them) that contain two or more words, not necessarily on the same line.

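That makes piping grep into another grep useless, since the second grep would only see the lines that already matched the first word. The solution is quite easy, but it might not be obvious: list the files containing the first word with grep -l, then hand those file names to a second grep -l via xargs: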

grep -l word1 **/*csv(.) | xargs grep -l word2 
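The xargs one-liner handles exactly two words, and xargs splits its input on blanks, so file names with spaces get mangled. A minimal sketch that checks any number of words per file, assuming sh, bash or zsh; has_all_words is a hypothetical helper name, not a standard tool:

```shell
# has_all_words FILE WORD...
# Succeeds (exit 0) if FILE contains every WORD somewhere, not necessarily
# on the same line; fails (exit 1) as soon as one WORD is missing.
has_all_words() {
  file=$1
  shift
  for word in "$@"; do
    # -q: no output, only the exit status; -- protects words starting with "-"
    grep -q -- "$word" "$file" || return 1
  done
  return 0
}

# Usage (zsh, reusing the glob from the one-liner above):
#   for f in **/*csv(.); do
#     has_all_words "$f" word1 word2 word3 && printf '%s\n' "$f"
#   done
```

It runs one grep per file and word instead of narrowing the list with grep -l, so it's slower on huge trees, but quoting "$f" keeps odd file names intact and adding a third or fourth word is free.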

Thanks for watching.

Wednesday, August 16, 2017

guerrilla csv and xlsx

I like to have a huge toolbox so that I can always find the right tool for any task. But I'm also a big fan of composability and orthogonality. It's the old tension: vim vs emacs, small languages vs big languages, Scheme vs CL, Python vs Perl.

On the command line, I also like to find tools that compose. Pipes and xargs are the way to compose commands, but the interfaces have to be compatible: either stdin/stdout or file names (process substitution, <(), comes to the rescue with the plumbing when a tool wants a file name and all you have is a stream).

So today I had to count the appearances of a given word in several xlsx files. Each xlsx had many sheets, and we only wanted to count the appearances in column 9. It was a kind of checksum to make sure that all appearances of $KEYWORD were still there. So, the task is:

Aggregate the counts of 'keyword' in the ninth column across all sheets of each of those Excel files. Get the sum per file.

After 5 minutes of typing in a trance, this apparently did the trick:

for i in **/*xlsx ; do echo $i ; csvfix write_dsv -f 9  <(xlsx2csv.py --all $i ) G 'keyword' WC ; done
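The one-liner leans on G and WC, which look like personal zsh global aliases (presumably for | grep and | wc -l; that's an assumption, they aren't standard commands). For anyone without csvfix or those aliases, here's a rough stand-in for the column-9 count using awk instead of csvfix, assuming simple CSV with no quoted fields containing commas; count_col9 is a hypothetical helper name:

```shell
# count_col9 KEYWORD
# Reads CSV on stdin and prints how many rows have KEYWORD inside the
# 9th comma-separated field (a plain substring match, not a regex).
# Rough stand-in for "csvfix write_dsv -f 9 | grep KEYWORD | wc -l":
# it does NOT understand quoted fields with embedded commas.
count_col9() {
  awk -F, -v kw="$1" 'index($9, kw) > 0 { n++ } END { print n + 0 }'
}

# Usage, one "file count" line per workbook (zsh globbing; xlsx2csv.py
# --all dumps every sheet, as in the one-liner above):
#   for f in **/*xlsx; do
#     printf '%s %s\n' "$f" "$(xlsx2csv.py --all "$f" | count_col9 keyword)"
#   done
```

Printing the file name and the count on the same line makes the output easy to sort or diff later, instead of alternating name and number lines.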


There isn't much more we can do to debug this. The pity with this kind of approach is that it either solves your problem on the first shot, or it gets exponentially harder to handle special cases or add debugging info.

I did manage, at least, to compare the results themselves using vimdiff:

vimdiff <(csvfix write_dsv -f 9  <(xlsx2csv.py --all file1.xlsx ) G 'keyword') \
        <(csvfix write_dsv -f 9  <(xlsx2csv.py --all file2.xlsx ) G 'keyword')
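vimdiff is great for eyeballing, but for a yes/no checksum a script can rely on diff's exit status instead. A minimal sketch, again swapping csvfix for awk and assuming simple CSV (no quoted commas); col9_hits and same_col9_hits are hypothetical names, and the .csv inputs would come from something like xlsx2csv.py --all file1.xlsx > file1.csv:

```shell
# col9_hits KEYWORD
# Prints the 9th comma-separated field of every stdin row whose 9th field
# contains KEYWORD (rough awk stand-in for "csvfix write_dsv -f 9 | grep").
col9_hits() {
  awk -F, -v kw="$1" 'index($9, kw) > 0 { print $9 }'
}

# same_col9_hits KEYWORD CSV1 CSV2
# Exit 0 (and no output) when both files yield the same hits, in order;
# otherwise print a unified diff and exit 1 (2 on trouble, like diff).
same_col9_hits() {
  tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
  col9_hits "$1" < "$2" > "$tmp1"
  col9_hits "$1" < "$3" > "$tmp2"
  diff -u "$tmp1" "$tmp2"
  rc=$?            # not "status": that name is read-only in zsh
  rm -f "$tmp1" "$tmp2"
  return "$rc"
}
```

Temporary files keep it working in plain sh, where <() isn't available; in bash or zsh you could feed diff two process substitutions directly.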


This is totally not rocket science, but I love the feeling of power and accomplishment you get when these magic incantations work. You run it, you get the result, you use the result, and you throw the whole thing away.

And you keep doing what you were doing. Or you go write a post about it.


Sunday, August 13, 2017

The information


I've just finished one of the books I've enjoyed the most. In my life. "The Information", by James Gleick. I'm totally fascinated by it. Maybe it's because of the many references I already knew, and because it added content here and there to a field I'm already into. I don't know; it's a different beast from books with purely original content, since this one is mostly a recap of stories and history, but it's put together in a VERY enjoyable way. It burnt my pan twice.

It was a recommendation from Santiago Ortiz, and it was a blast, filled with insights and anecdotes about how mankind has dealt with information (or the lack of it): storing, understanding, using, creating, inventing, discovering...

I'd say it's a more practical (and up-to-date) companion to Thomas Kuhn's "The Structure of Scientific Revolutions". It's nowhere near as revolutionary, but I found it a great source of epistemology.

If you've read CODE by Charles Petzold, I'd say this book takes you up a similar steep ramp, but on the abstract side of the same concepts: from simple to complex, from individuality to community, from concrete to abstract, from singularity to commonality. Layers of abstraction get placed one after the other, using bricks filled with dates, names and anecdotes.

It quotes the masters, and it's filled with folklore, which helps you understand what happened when. I truly enjoyed it, and I have tens of bookmarks inside it, which means I had tens of AHA! moments. Which, unfortunately, doesn't happen very often.

Tuesday, August 8, 2017

Dynamic Languages Wizards Series - Panel on Language Design

This video is a stream of knowledge and insight capsules from VERY smart people in the dynamic languages world. Give it a try. Zero bullshit.