Welcome to Emre's Digital Garden

This site is a digital garden. At least, I intend to tend it. It’s mostly written for my future self. If you like any of the notes, please drop an email to contact@emresahin.net to tell me what you liked. You can use the same address to report errors and dislikes.

These are now classified into roles I play in this limited lifetime. I may change the classification some time in the future. After 40 years, I still don’t know how to classify myself.

Some roles I have could be:

Continuous Developer

Everyone knows software development never finishes. You have to keep up, learn new technologies and fix (or replace) the bugs. I’ve been developing software since 1996 or thereabouts. These notes are mostly tidbits and development journals.

Perpetual Learner

Masochistic Minimalist

  • Using cat for writing: I was using a simple script based on cat to write notes and posts. It was an experiment to force myself to think more about content and less about minor errors. I think I’ve lost that kind of minimalist enthusiasm, and switching from Emacs to Neovim helped a bit too.
  • Using todo.txt for recurring tasks: I gave up on todo.txt after the list grew into something unmanageable. These days I use Trello to forget the tasks.

Manager

  • Burnouts: I’ve read a lot about burnout, but I don’t think I’ve ever experienced it. This is mostly because I haven’t had to do jobs I don’t like for the long run.
  • Energy for taking risks: How do you get people to work on long-term goals?

Tool-Junkie

Lazy Scripter

Software Architect

Researcher

These are mostly about my Ph.D. work on building OCR for Ottoman Turkish. They are dated. I may return to this work some time in the future, as the problem still seems open.

Shameful Procrastinator (To Be Classified)

Burnouts

I read about burnout from time to time. Most of the comments say that it isn’t about working long hours. My take is similar: burnout isn’t related to hours, it’s about control. If you’re working on a project that you can show as your own, you can probably work longer with little burnout. Otherwise, even if the work is easy or simple, working less doesn’t prevent burnout when you don’t have control over the outcome or the results won’t carry your name.

Rust `ends_with` and `strip_prefix` behavior differences in `Path` and `str`

As I’m writing an ignore library, I’m running into subtle bugs caused by the behavior of Rust’s Path. ends_with on Path is different from ends_with on str: Path::ends_with compares whole path components, not characters. If you try to check whether a Path is a directory by its final character, you’ll see that path.ends_with("/") returns false. Path::strip_prefix also eats the final slash. This means that if you have a directory marker at the end, as in /Users/emre/mydir/, and call strip_prefix("/Users/emre"), you’ll get mydir, not mydir/.

Creating a file system watcher with ignore rules

In the previous post, I described the development of a file system walker. It returns files from a directory recursively, taking ignore patterns into account. In this post, I’ll update it. Some of the requirements have changed and I need new features: walk_serial takes a Sender and uses channels to send the results. This is confusing, as it doesn’t use any parallelism to traverse the file system. There are also long-running processes that modify the file system, which can affect the walker’s behavior.
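The walker in these posts is written in Rust. Purely as an illustration of the idea, here is a minimal Python sketch: walk a directory recursively, read glob patterns from per-directory ignore files, and let each directory inherit its parent's rules. The ignore file name .ignore, the function names and the simplified rule syntax (shell globs matched against bare names, no negation or anchoring) are assumptions, not the semantics of the author's library.

```python
import fnmatch
import os
from pathlib import Path
from typing import Iterator, List


def _read_patterns(rules_file: Path) -> List[str]:
    """Read non-empty, non-comment lines from an ignore file (simplified syntax)."""
    patterns = []
    for line in rules_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def _is_ignored(name: str, patterns: List[str]) -> bool:
    # Assumption: patterns are matched against the bare file or directory name only.
    return any(fnmatch.fnmatch(name, p) for p in patterns)


def walk_with_ignores(root, ignore_file: str = ".ignore") -> Iterator[Path]:
    """Yield non-ignored files under root, applying inherited per-directory rules."""
    root = Path(root)
    inherited = {root: []}  # directory -> patterns collected from its ancestors
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        patterns = list(inherited.pop(current, []))
        rules_file = current / ignore_file
        if rules_file.is_file():
            patterns.extend(_read_patterns(rules_file))
        # Prune ignored directories in place so os.walk never descends into them.
        dirnames[:] = sorted(d for d in dirnames if not _is_ignored(d, patterns))
        for d in dirnames:
            inherited[current / d] = patterns
        for f in sorted(filenames):
            if f != ignore_file and not _is_ignored(f, patterns):
                yield current / f
```

Pruning dirnames in place is what keeps an ignored directory such as build from being traversed at all, which matters on large trees.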

Creating OS-dependent temporary directories in Rust

There are a few crates in Rust for getting system-dependent directories, like the user’s home or the system configuration directory. The one I prefer is directories-next. Recently, I needed a standard function to get the temporary directory. I checked the crate documentation but couldn’t find a proper function for this. Then I noticed that std::env::temp_dir() returns a PathBuf that points to the system temporary directory. This is a note for myself that we don’t always need extra packages for basic functionality.

Unit tests vs Integration Tests in Rust

One thing I’ve noticed after starting to work in Rust is that Test Driven Development became feasible. As a developer who is absentminded by default, I value tests. However, if the test cycle is slow, testing everything becomes very expensive. I think this is one reason unit tests are favored over integration tests: they are more precise and take less time to complete. Round-trip tests are also more feasible with unit tests.

Developing a gitignore crate

I needed a file system ignore library for a utility I’m writing. It’s similar to Git’s .gitignore, but the files containing the rules can have different names, and ignore rules may be defined programmatically. I was using BurntSushi’s ignore crate, which in turn uses globset by the same author. In a directory hierarchy, I was first collecting all .ignore files, then trying to decide which files are ignored. Rg Gitignore’s from parameter led me to believe this.

Software architecture as a tree

In Rust, there are several ways to manage memory. Memory has two regions; one of them is called the stack. Each new scope (between { and }) adds variables to the stack. Variables in the stack are reserved at once when a new scope is introduced, and they are freed when the scope ends. This brings speed. Variables in the stack are also reserved contiguously, so they are friendlier to the principle of locality.

Representation

“The programmer at wit’s end for lack of space can often do best by disentangling himself from his code, rearing back, and contemplating his data. Representation is the essence of programming.” – The Mythical Man-Month. As years pass and projects pile up, the thing I notice more and more is that data is more important than algorithms. You can replace bad algorithms with better ones during the course of a project, but fixing a bad data design usually requires a full rewrite.

Death of Agile?

I watched a relatively old talk by Allen Holub. The talk is nice. He says Scrum is not agile, and that it imposes a very strict set of rules that have nothing to do with agility. Although I’m convinced that Scrum isn’t agile, I’m not convinced that we don’t need processes or habits. He has another talk as well. Basically, what agile boils down to is self-managing teams composed of self-managing people, working in close collaboration with the customer.

Quote from the Mythical Man-Month

Although we don’t rent IBM mainframes anymore, it looks like using rented resources for software is not new. The following still applies today to cloud services, maybe at a different scale:

About estimation from the 'Mythical Man-Month'

I’ve read these in The Mythical Man-Month. First, our techniques of estimating are poorly developed. More seriously, they reflect an unvoiced assumption which is quite untrue, i.e., that all will go well. Second, our estimating techniques fallaciously confuse effort with progress, hiding the assumption that men and months are interchangeable. Third, because we are uncertain of our estimates, software managers often lack the courteous stubbornness of Antoine’s chef. Fourth, schedule progress is poorly monitored.

Object-Oriented Brain Damage

So, this is the post I’ve been thinking about for some time. I’m still surprised that it feels like swearing in church; it shouldn’t be this hard (and this rare) to criticize object-oriented programming. What do I mean by OOP? Mainly the classical style, where you define classes with members and methods and use them to model the solution to the problem you’re dealing with. This style was supposed to be superior to the procedural style, where you write procedures that modify some global state.

Activate left window when the current one is closed in tmux

When I close a window in tmux, it activates the window to its right. I want the last window to be activated instead. I couldn’t find a setting for this, but tmux has hooks for many events, so I set the window-unlinked hook to run last-window: set-hook -g window-unlinked 'last-window'

risk taking energy

There is a limited amount of energy in an organization. This energy determines whether people take risks. If people don’t feel that the risks they take will somehow improve their position, they don’t take them. Instead, there is an infinite amount of useless and harmless stuff that fills their time and spends their energy. There is this delusion that if people are working, they are contributing. Contribution is mostly about taking risks.

Premature Caching is the Root of All Evil

I’m writing a Rust command line app in my spare time to learn the language. It involves some file system checks for which I use fs::metadata. As everyone knows, accessing the disk is an expensive operation and must be kept to a minimum. I was thinking of using a HashMap<PathBuf, Metadata> to cache the results for paths. Then I came across the cached crate. It caches the results of functions for memoization. This is exactly what I need, I thought.

Converting MNIST and Fashion-MNIST IDX format to NumPy

MNIST and the newer Fashion-MNIST are among the best-known datasets for testing machine learning models. Although the original MNIST dataset is considered solved as an ML problem, it seems it will be with us for a long time. These datasets are distributed in a binary format: there are 4 gzipped IDX files, 2 for the training set and 2 for the test set (idx3 files for the images, idx1 files for the labels). Although it’s a simple format, it’s non-standard and requires custom code to read.
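A minimal parser for the IDX layout, using only the standard library, as a sketch: the header is two zero bytes, a type code (0x08 means unsigned bytes, which is what MNIST uses), the number of dimensions, then one big-endian 32-bit size per dimension, followed by the raw data. The function names are hypothetical, and only the unsigned-byte type is handled.

```python
import gzip
import math
import struct
from typing import Tuple


def parse_idx(buf: bytes) -> Tuple[Tuple[int, ...], bytes]:
    """Return (shape, raw_bytes) for an IDX buffer holding unsigned bytes."""
    zero, type_code, ndim = struct.unpack(">HBB", buf[:4])
    if zero != 0 or type_code != 0x08:  # only the ubyte type is supported here
        raise ValueError(f"unsupported IDX header: {buf[:4]!r}")
    header_end = 4 + 4 * ndim
    shape = struct.unpack(">" + "I" * ndim, buf[4:header_end])  # big-endian sizes
    data = buf[header_end:]
    if len(data) != math.prod(shape):
        raise ValueError("data length does not match the declared shape")
    return shape, data


def read_idx_gz(path) -> Tuple[Tuple[int, ...], bytes]:
    """Decompress a gzipped IDX file (e.g. train-images-idx3-ubyte.gz) and parse it."""
    with gzip.open(path, "rb") as f:
        return parse_idx(f.read())
```

With NumPy installed, numpy.frombuffer(data, dtype=numpy.uint8).reshape(shape) turns the result into an array, e.g. of shape (60000, 28, 28) for the MNIST training images and (60000,) for the training labels.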

Deleting duplicate files in Google Drive using rclone

I’m a Google One user and my Google Drive has about 1 TB of content from various sources. A few years ago I used a third-party utility to sync my Linux boxes to Drive, and it created even more duplicate files. I had around 3-4 different versions of some directories, each with a different set of files. As a true procrastinator, I postponed the problem until Google One alerted me about my quota.

TIL June 11

Creating AWS S3 buckets from the command line

If you have the necessary AWS credentials (export AWS_ACCESS_KEY_ID="XXXX" and export AWS_SECRET_ACCESS_KEY="YYYY"), you can install the AWS CLI with pip3 install --user awscli and create an S3 bucket from the command line without opening a web browser: aws s3api create-bucket --acl public-read --bucket my-unique-bucket-name --region eu-central-1 --create-bucket-configuration LocationConstraint=eu-central-1. This creates a publicly readable bucket at http://my-unique-bucket-name.s3.amazonaws.com. Then you can copy your files with aws s3 cp my-file.

Translating Ottoman Turkish Spelling to Latin Alphabet using Surface Forms

dervaze is a project I started during my Ph.D. work in 2015 to translate Ottoman Turkish to modern Turkish spelling and to provide an OCR/ICR/handwriting recognition engine for the Ottoman language. The reason I had to stop was the lack of data: without a considerable amount of data, statistical methods for both Natural Language Processing and Computer Vision fail. Producing and maintaining data seemed a much heavier burden than finding technical solutions, so I mostly gave up the idea that a working solution is obtainable with classical OCR techniques.

telegram-send

There is a little Python command line program called telegram-send to send messages to your Telegram account. First, you need to register a new bot with BotFather and get a token. Then you run pip3 install --user telegram-send and prepare a config file in ~/.config/telegram-send.conf:
[telegram]
token = <TOKEN_YOU_GET_FROM_BOT_FATHER>
chat_id = <CHAT OR USER ID>
You need to start a conversation with the bot and learn your user id (that’s identical to the id of the chat you start with the bot).

aerc and goneovim

aerc

I started to use aerc as a command line email client. At first I used its archived GitHub repository, but the software was buggy. Then the real repository gave me the fastest IMAP client I’ve ever had. Asynchronous operations make the workflow very smooth, and you don’t wait for the server on each deleted/archived mail.

goneovim

I also began using a Neovim GUI called goneovim. Formerly I was using Neovim-GTK for this, but somehow (either because of Fira Code’s ligatures or some kind of incompatibility) the visuals were ugly and there was some lag.

Using SSH Private Keys in Dockerfile aimed for Google Cloud Run

I had a single-user application that I’d wanted to deploy to Google Cloud Run for some time. It ran on Python 3.5, and when I updated the system it lived on, the virtual environment stopped working. It also depended on lxml 3.7, and that particular version didn’t compile on my new stable Debian installation. This motivated me to learn Docker and gcloud rather quickly. I was able to create a new Docker container in a short time.

TIL May 1

Nota seems to be a nice command line calculator. It renders what you type as ASCII-art formulas:
In[1]: 10 + 10
Out[1]: 20.0
         _____
In[2]: ╲╱ 100
Out[2]: 10.0
           ┌                  ┐
In[3]: Max │ 10 , 1 , 21 , -3 │
           └                  ┘
Out[3]: 21.0
In[4]: ⟨Emre's Number⟩ ≔ 79
Out[4]: 79.0
         _______________
In[5]: ╲╱ Emre's Number
Out[5]: 8.888194417315589
                    2
In[6]: Emre's Number
Out[6]: 6241.0
                    Emre's Number
In[7]: Emre's Number
Out[7]: 8.

TIL April 29

When I try to use sed for find-and-replace in multiple files, I always remember that perl -pe is better suited for this task. Today this happened again. I wanted to replace lines starting with # Bla bla with title: Bla bla, and it was easier to use perl -pe 's|^#+ (.*)|title: $1|g' than to figure out which kind of regular expressions sed uses. For Hugo front matter at the beginning of files, it’s possible to determine the type but not possible to set the section.
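For comparison, here is the same substitution with Python's re module, as a sketch: with re.MULTILINE, ^ anchors at the start of every line, much as perl -pe applies the expression line by line, and \1 plays the role of Perl's $1. The function name is hypothetical.

```python
import re

# ^ matches at the start of each line because of re.MULTILINE.
HEADING = re.compile(r"^#+ (.*)", re.MULTILINE)


def headings_to_title(text: str) -> str:
    """Rewrite Markdown-style '# Heading' lines as 'title: Heading'."""
    return HEADING.sub(r"title: \1", text)
```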

TIL April 28

In yesterday’s post, I presented a Python script to convert Pelican preamble files to YAML for Hugo. Some UTF-8 files have a BOM marker at the beginning. The script (as a true quick and dirty solution) doesn’t check for such a marker, so it cannot detect the Title element when the marker is there. I added an fm = fm.strip('\ufeff') line to clear the BOM marker from a line if it exists.
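A small sketch of two ways to handle the BOM in Python. Decoding with the utf-8-sig codec drops a leading BOM if there is one; when the text has already been decoded as plain utf-8, lstrip('\ufeff') removes it from the start of a line only, whereas strip would also remove it from the end. The helper names are hypothetical, and find_title only looks for a Pelican-style Title: line.

```python
def read_lines(raw: bytes):
    """Decode file contents, dropping a leading UTF-8 BOM if present."""
    return raw.decode("utf-8-sig").splitlines()


def find_title(lines):
    """Return the value of the first 'Title:' line, or None."""
    for line in lines:
        line = line.lstrip("\ufeff")  # tolerate a BOM left by a plain utf-8 decode
        if line.lower().startswith("title:"):
            return line.split(":", 1)[1].strip()
    return None
```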

TIL April 27

This blog has now moved to AWS Amplify. It’s connected to a Bitbucket git repository, and AWS pulls it as soon as it’s pushed. I was polling the repository manually on a VPS, but this is much quicker. Setting up your domain name for Amplify requires (a) adding a CNAME record to prove ownership. Then (b) you point the ALIAS and CNAME records of @ and www to a CloudFront URL given to you, and your site automatically becomes HTTPS.

TIL April 26

Hugo has a Casper theme, but it isn’t listed in the official themes directory. Hosting static websites on AWS takes 5 minutes of configuration. For some of my books, I think I can use some ornamental public domain images. This guy talks about a third way to stop the pandemic: testing everyone. The one that is most proven and ready to scale is based on a technology called LSPR. The team building it originally developed the device to monitor the status of the immune system, but it is easily adapted to detect the proteins on the surface of the virus instead of the proteins used for immune signaling.

Anonymous functions in dart

Sometimes we need anonymous functions to use only once. Dart allows two similar syntaxes for writing these. The first one is for when there is a single expression: (a, b) => a + b. The other is for when you need to write multiple statements in an anonymous function: (a, b) { return a + b; }

Should we expect a software crisis?

I read a blog post titled The Quiet Crisis unfolding in Software Development, which mainly says that current software-building practices lead to accumulating technical debt, and that legacy software becomes unmanageable over time. It warns about highly skilled developers: “These kinds of high performers are actually low performers when TCO is factored in. Unless you’re a startup where time to market is the highest priority, keep these kinds of developers under close scrutiny with extensive design and code reviews.”

Zenity

When you need a simple dialog to get input from the user, or just to show some information in a GUI dialog, zenity helps. It allows scripts to interact with the user through dialogs. zenity --info --text="Merge complete. Updated 3 of 10 files."

Python Data Science Handbook

The book has chapters on IPython, NumPy, Pandas, Matplotlib and Machine Learning. It doesn’t seem to delve into technical/theoretical aspects; it focuses on the Python libraries used in data science. GitHub page for the book.

SSH keys for Multiple Accounts in Github

I have multiple GitHub accounts, and some of them are collaborators on the others. I don’t like to type passwords every time I push, so I set up SSH keys for my accounts. But GitHub (understandably) doesn’t accept the same key in more than one account. (Otherwise, how would it know which account you are?) But there is a way to use the .ssh/config file to use different keys for different target URLs. # Personal account, - the default config Host github.

Fixing Pip Timeout Problems

For a large package like tensorflow, I was getting the following error from pip: pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. v = self._sslobj.read(len, buffer) socket.timeout: The read operation timed out. I noticed that pip has a timeout parameter that can be set: pip --default-timeout=1000 install package-name

How to convert Numpy image to QImage?

When writing GUI code in Qt for a deep learning system, a common problem is converting an image (read from disk or a camera using OpenCV) in the form of a NumPy array to a QImage that can be shown in a form or widget. There are basically two problems: the NumPy array’s data type usually has more than 8 bits, and OpenCV reads the image in BGR format rather than the more common RGB.
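A minimal NumPy sketch of the two fixes, assuming an (H, W, 3) BGR array as OpenCV returns it. Scaling non-uint8 input with min-max normalization is an assumption; a fixed scale (for example dividing 16-bit data by 257) may suit a given pipeline better. The function name is hypothetical. The result is made C-contiguous because reversing the channel axis produces a view with a negative stride, and QImage expects a plain row-by-row buffer.

```python
import numpy as np


def bgr_to_rgb888(image: np.ndarray) -> np.ndarray:
    """Return a C-contiguous uint8 RGB copy of an (H, W, 3) BGR image."""
    if image.ndim != 3 or image.shape[2] != 3:
        raise ValueError("expected an (H, W, 3) BGR image")
    if image.dtype != np.uint8:
        img = image.astype(np.float64)
        lo, hi = img.min(), img.max()
        # Assumption: min-max scale into 0..255; a constant image maps to all zeros.
        img = (img - lo) / (hi - lo) * 255.0 if hi > lo else np.zeros_like(img)
        image = np.rint(img).astype(np.uint8)
    # Reverse the channel axis (BGR -> RGB) and copy into a contiguous buffer.
    return np.ascontiguousarray(image[..., ::-1])
```

With PyQt5, the result can then be wrapped with something like QImage(rgb.data, w, h, 3 * w, QImage.Format_RGB888). QImage does not copy that buffer, so a reference to rgb has to stay alive while the image is in use.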

Query Logging in Databases when using Parameters

We don’t construct database queries with string formatting because of security problems. SQL injection attacks stem from missing escapes and from building queries out of user-supplied strings. Instead, we pass parameters to the database engine, e.g. SELECT * FROM people WHERE name = ?, and send this query and the parameters separately to the database. All major databases support this kind of query. In Sqlite 3 under Python, we use
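A minimal sketch with Python's built-in sqlite3 module, using a hypothetical logged_execute helper: the query text with its ? placeholders and the parameter tuple are logged side by side, then passed separately to execute, so the logged values are never spliced into the SQL.

```python
import logging
import sqlite3

log = logging.getLogger("sql")


def logged_execute(conn: sqlite3.Connection, sql: str, params=()):
    """Log a parameterized query and its parameters, then run it safely."""
    log.debug("SQL: %s | params: %r", sql, params)
    return conn.execute(sql, params)  # the driver binds params; no string formatting
```

Because the name is bound as a value, an input such as Robert'); DROP TABLE people;-- is stored as an ordinary string instead of being executed.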

Coursera Deep Learning Specialization Notes

These do not contain answers to quizzes or assignments, per the Honor Code. If you are looking for those, look elsewhere. Binary Classification: Given a picture, classify it as cat or non-cat. The result is $\hat{y} = P(y=1 | x)$. In other words, given $x$, we calculate the probability that this data represents a cat. Feature Vector from Image: We convert a picture, e.g. a (64, 64, 3) picture, into a (64 * 64 * 3, 1) feature vector.
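A NumPy sketch of the two steps, assuming a logistic-regression model for the cat classifier: flatten the (64, 64, 3) image into a (12288, 1) column vector, then compute $\hat{y} = \sigma(w^T x + b)$. The function names are hypothetical and the weights in the usage are placeholders, not trained values.

```python
import numpy as np


def image_to_feature_vector(image: np.ndarray) -> np.ndarray:
    """Flatten an (H, W, C) image into an (H * W * C, 1) column vector."""
    return image.reshape(-1, 1)


def predict_proba(w: np.ndarray, b: float, x: np.ndarray) -> np.ndarray:
    """Logistic regression: y_hat = sigmoid(w^T x + b), shape (1, 1) for one image."""
    z = w.T @ x + b
    return 1.0 / (1.0 + np.exp(-z))
```

With all-zero weights and bias, every image gets $\hat{y} = 0.5$, which is a handy sanity check before training.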

Numpy ValueError while using dlib's face detector

For two days, I was trying to find a bug in my code, because an assertion that uses numpy.max was giving an error like ValueError: zero-size array to reduction operation maximum which has no identity, which didn’t seem reasonable. I’m building a face recognizer with dlib’s frontal face detector, and today I noticed that some of the face detection results have negative coordinates. This means the detected face is partial, although it’s a bit of a stretch to use negative coordinates for this.
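The error follows from how NumPy slicing treats negative numbers: a negative start counts from the end of the axis, so a box with top = -2 on a 10-row image slices rows 8:bottom, which is empty whenever bottom ≤ 8, and max() of an empty array raises exactly this ValueError. A minimal sketch of a fix is to clamp the box to the image before cropping. The coordinates would come from dlib's rectangle methods left(), top(), right() and bottom(); the crop_face helper is hypothetical.

```python
import numpy as np


def crop_face(image: np.ndarray, left: int, top: int, right: int, bottom: int) -> np.ndarray:
    """Crop a detection box after clamping it to the image bounds."""
    height, width = image.shape[:2]
    # Negative coordinates would wrap around in NumPy slicing, so clamp them to 0.
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    if right <= left or bottom <= top:
        raise ValueError("detection box lies outside the image")
    return image[top:bottom, left:right]
```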

Static Variables in Python

I use this a lot in different projects and would like to keep it here. Python doesn’t have C-style static variables natively. (It supports class variables, which can be used for a similar purpose in OOP.) However, as functions are also objects in Python, it’s possible to embed variables inside the function. An elegant solution on Stack Overflow creates a decorator for static variables. def static_vars(**kwargs): def decorate(func): for k in kwargs: setattr(func, k, kwargs[k]) return func return decorate @static_vars(counter=0) def foo(): foo.
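The decorator from that answer, laid out as runnable code, with a hypothetical count_calls function to show it in use: each call increments an attribute stored on the function object itself, so the value persists between calls like a C static variable.

```python
def static_vars(**kwargs):
    """Attach each keyword argument to the decorated function as an attribute."""
    def decorate(func):
        for name, value in kwargs.items():
            setattr(func, name, value)
        return func
    return decorate


@static_vars(counter=0)
def count_calls():
    # The attribute lives on the function object, so it survives between calls.
    count_calls.counter += 1
    return count_calls.counter
```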

Development Journal, June 9

I began implementing the Ottoman translator using Finite State Transducers via OpenFST. Instead of using ad hoc algorithms to translate Ottoman and Turkish into each other, I’ll be creating FSTs. In the past I used FOMA and TRmorph as building blocks and as a basis for Ottoman conversion. However, I saw that writing something on top of a morphological analyzer to convert Ottoman to Turkish requires almost another morphological analyzer. (This is also true for Turkish to Ottoman conversion, because the spelling rules of Ottoman require another layer of FSTs.

Adding version information to executables in CMake projects

In programming, versioning your code files is of immense importance. Most files need to be updated, renamed and merged constantly. You also need backups, as one learns by losing work to various computer problems. Another problem we face is establishing a connection between an executable file or library and its code. We normally don’t add executable files to version control, as they are produced from code files.

When `y` and `p` commands in IdeaVim don't work

I began using the IdeaVim plugin for Android Studio some time ago. It’s nice, but as a Vim newbie, I wasn’t aware that Vim doesn’t use the system (Windows, macOS, XWindow, etc.) clipboard for copy/paste by default. So when you use the y command in IdeaVim, the yanked text can’t be pasted into other applications. The solution is easy once you spot it: your ~/.ideavimrc file needs the following line: set clipboard+=unnamed

The Sorry State of NDK testing in Android

I’m writing a C library to use in Android, iOS and Python applications. Although the C library has its own unit tests, I wanted to write a few more to ensure that data transfer between the C and Android layers is correct. In Android, one needs to put unit test files in the app/src/test/*module-name* directory. Yesterday I spent a few hours writing tests that check whether the conversion between visenc and Unicode is correct in Android.

Regular and Recurring Tasks in todo.txt

I’m using the todo.txt format to keep some of my daily tasks. It’s a plain text format, and both iOS and Android have apps for it, like SimpleTasks. Emacs and Vim have support for the format too. Actually, you don’t need a special editor for it; the format is so simple that even Notepad may be enough. I have a shell script to add daily recurring tasks, like Drink Water or Pray Maghreb, to this file.
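The author's script is a shell script; here is a rough Python sketch of the same idea, assuming each task is written in todo.txt's plain form (creation date as YYYY-MM-DD, then the description) and that a task is skipped if an identical line for today already exists. The function name add_recurring is hypothetical.

```python
import datetime
from pathlib import Path


def add_recurring(todo_path, tasks, today=None):
    """Append today's recurring tasks to a todo.txt file, skipping duplicates."""
    today = today or datetime.date.today()
    path = Path(todo_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = set(text.splitlines())
    # Assumed line format: "YYYY-MM-DD Task description".
    to_add = [f"{today.isoformat()} {task}" for task in tasks]
    to_add = [line for line in to_add if line not in existing]
    if to_add:
        with path.open("a", encoding="utf-8") as f:
            if text and not text.endswith("\n"):
                f.write("\n")  # keep the last existing task on its own line
            f.write("\n".join(to_add) + "\n")
    return to_add
```

Running it once a day (from cron, for example) gives the same effect as the shell script, and running it twice on the same day adds nothing.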

Progress on Ottoman Translation - 2018-6. Week.

Some of the following posts will be like a TODO list for the coming months: what I’m planning for dervaze and its mobile versions. As I have mostly become a solo developer, I’ll share my experience with the problem here to shed light for those interested. The technology for Ottoman OCR was mostly ready when my interruptions regarding family life began. I’ll need to check what is available, but a more pressing problem for me is the speed of translation.

A Restart

It’s been a while, a few years, since I last updated this site. I had some of my technical writing elsewhere, but I’ve decided that I can restart updating here as well. I’ve moved the site to Pelican and moved older writings here. Much of the content is no longer of interest to me, but I’m keeping it for no concrete reason. I’ll try to update the site daily with my adventures in software development and technology.

CV as of February 2022

Personal Info

Name: Ibadullah Emre Sahin
Birth: 15 July 1979, Ankara, Turkey
Citizenship: Turkish
Gender: Male
Contact: i.emre.sahin@gmail.com, +90 532 261 8985

Work Experience

  • Various small programming projects during high school (1995-1998)
  • Project Manager, DevOps and Software Developer at YD Yazilim, Ankara, Turkey
  • Freelance Developer during university years and after (2000-2015)
  • CEO and Chief Researcher of Teknokrat Yazilim A.S., Bursa, Turkey (November 2013-Present)
  • Technical Writer at iterative.ai (February 2021-Present)

In the past, I have completed research projects on Ottoman language recognition for TUBITAK, Arabic video transliteration/search systems for Carnegie Mellon University, and real-time video face recognition for a company in Ankara, Turkey.

File selection operators in zsh

zsh has somewhat more advanced file selection operators than bash. With them, you can reach all the files in a directory at once and run operations on them. Here are a few short examples:

  • All files in a directory: ls *
  • In the current directory: ls **, ls */**(.mw-2), ls */**(.Lm+100) and ls */**(.R)
  • In /etc: ls /etc/**(.W) and =ls etc/**(.Wmw-1)

In these qualifiers, . selects plain files, mw-2 files modified within the last two weeks, Lm+100 files larger than 100 MB, R world-readable files and W world-writable files.

Besides options like these, zsh also makes it easy to take file names apart. For example, to get the part of a file name before its extension, we write *(:r).

Visual Transliteration for Ottoman

There are already various transliteration systems for representing Arabic-based scripts in Roman letters. However, all of them aim to represent phonemes, without paying attention to different visual elements. When we are manually transcribing these texts, this is fine. However, when we tried to represent visual elements in scanned handwritten documents, we faced some problems with these transliteration systems. Since conventional systems aim to represent phonemes, a correct reading is necessary, and this requires expertise in the represented language.

A Fast Local Descriptor for Dense Matching

Authors: Engin Tola, Vincent Lepetit, Pascal Fua
Keywords: Stereo image descriptor circle quantization formalization binary mask Depth estimation
Q1: How is depth estimation related to object recognition? Objects are located in a 3D environment, and in order to recognize them correctly, we need to be able to recreate their layout in a scene. With such an aid, we can successfully determine the object boundaries.
Q2: What does the descriptor contain? It's a concatenation of vectors.

A Need for Yet Another Transliteration Alphabet for Ottoman

The Ottoman Text Archival Project has its own reversible transcription system. However, for word labels, it's overkill and too much work for experts. I'm looking for a one-to-one mapping between the different visual elements of a word and their representation in UTF-8. The labels should be simple to remember, but varied enough to represent visual variations of words. I'm thinking of creating letter+digit codes: the letter part will reflect the most similar sound, and the digit will reflect the visual variation.

A Regular Conversion Algorithm Between Turkish and Ottoman

Modern Turkish spells all Turkish/Arabic/Farsi rooted words according to their pronunciation. When it comes to converting from one system to the other, this creates a problem that might be solved with the aid of regular expressions. For example, in Ottoman a word is spelled mnwr, with letters corresponding to Arabic letters, but in Turkish the spelling reflects the pronunciation: münevver. Since a 1-1 mapping is not possible between these two writing systems, a set of possible Ottoman spellings must be produced with a regular expression.
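A toy sketch of the idea in Python, writing Ottoman letters with the same Latin letters as mnwr: Turkish consonants map to Ottoman letters, vowels become optional since short vowels are often not written, and a doubled consonant collapses to one letter, as with vv in münevver. The mapping tables and the ottoman_pattern name are a hypothetical fragment for this single example, not a real spelling model.

```python
import re

# Hypothetical fragment of a Turkish -> Ottoman (Latin-transcribed) mapping.
CONSONANTS = {"m": "m", "n": "n", "v": "w", "r": "r"}
VOWELS = {"ü": "w?", "u": "w?", "e": "[ah]?", "a": "a?"}  # vowels may be unwritten


def ottoman_pattern(turkish_word: str) -> re.Pattern:
    """Build a regex matching candidate Ottoman spellings of a Turkish word."""
    parts = []
    previous = None
    for ch in turkish_word:
        if ch in CONSONANTS:
            if ch != previous:  # a doubled consonant is written once
                parts.append(CONSONANTS[ch])
        elif ch in VOWELS:
            parts.append(VOWELS[ch])
        else:
            raise ValueError(f"no mapping for {ch!r}")
        previous = ch
    return re.compile("".join(parts))
```

Candidate spellings from a word list can then be filtered with pattern.fullmatch(candidate).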

Backup Script for Recent Files

I decided to write a script to back up only the recent files. There are solutions based on unison that work periodically on all files, but as I change projects, I need to configure new backups for these projects as well. This is cumbersome and error-prone: it's easy to forget to add new artifacts to backup scripts and lose them in an emergency. Therefore I decided that a small bash script that works with rsync and find works better.
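The author's version is a bash script around find and rsync. As a rough Python sketch of the selection step only: walk the directory tree and keep files modified within the last N days, roughly what find -mtime -N selects. The recent_files name and the day-based cutoff are assumptions.

```python
import os
import time
from pathlib import Path
from typing import Iterator


def recent_files(root, days: float) -> Iterator[Path]:
    """Yield paths (relative to root) of files modified within the last `days` days."""
    root = Path(root)
    cutoff = time.time() - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime >= cutoff:
                    yield path.relative_to(root)
            except FileNotFoundError:
                continue  # the file vanished between listing and stat
```

Writing these paths one per line to a file and running something like rsync -a --files-from=that-file root/ destination/ would then copy only the recent files.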

cat-for-writers

I write every day. Every day I write. I have a quota of words to fill, and after each sleeping session, I sit in front of my keyboard and begin pressing words. I was using Emacs for this. Emacs, the One True Editor. Yet it has one flaw that diverts me from this writing routine. It has too many features, and when I hit a block, or an idea that I'm not big enough to put into words, I begin to play with it.

Converting Latin based Turkish spelling to Ottoman

I’m working on a system to search Ottoman document collections. In order to query a large collection in Ottoman, the user needs to write the query in Ottoman, which uses an Arabic-based alphabet with a completely different set of spelling rules. This limits usability, since most users will not be familiar with the spelling. Experts are, but we can’t assume that only experts will use the system. There are various methods of transcribing Ottoman to modern Turkish.

Copying and pasting with XWindow clipboard from tmux

tmux does not natively support XWindow's clipboard. With two lines in .tmux.conf, you can configure two keys to send and retrieve clipboard content. Traditionally, applications use the PRIMARY selection, which copies the mouse selection and pastes it with the third (middle) button. However, this is becoming less and less common, so I'll configure the CLIPBOARD selection that most newer browsers, applications and Emacs use. Add the following lines to .tmux.conf:
# move x clipboard into tmux paste buffer
bind < run "xsel -ob | tmux load-buffer - ; tmux paste-buffer "
# move tmux copy buffer into x clipboard
bind > run "( tmux show-buffer | xsel -bi ) && tmux display-message \"ok!

Dervaze: A Transliteration System for Ottoman

Dervaze (meaning “the portal”) is a set of tools that aim to transliterate historical Ottoman documents to Modern Turkish. Here, I describe the transliteration system. The system is organized as a pipeline in which the tools at a stage produce the input of the next stage. Input to the system is a set of historical document images. The output is either a search result or a textual representation of these documents.

ggplot2 Elegant Graphics for Data Analysis

The important parts of the book are the grammar of ggplot, qplot for easy plotting, geoms, and linear models in plots.

qplot

qplot() is designed after plot(). The three most important parameters to qplot are x, y and data. If data is specified, it's used as a namespace for variables: qplot(carat, price, data = diamonds), qplot(carat, x * y * z, data = diamonds). color is another argument that can be specified for differentiating.

Midori

My browser of choice was Google Chrome, but its latest versions became resource hogs and I was feeling this on my older machines. I decided to take a look at alternative browsers and settled on Midori. I turned off JavaScript (best JS is dead JS), and turned on ad blocking and keyboard shortcut customization (Ctrl-F to Ctrl-S as in Emacs). It loads noticeably faster, and while using other applications, I can't even tell how many tabs are open in my browser.

mu4e

I used mutt for years. I like it. Its customizability and macros make me feel at home, and I was able to automate most of my tasks with it. I began to use mutt after gnus on Emacs. The reason I left gnus behind is that it was incompatible with offlineimap and slow for IMAP use. I see no point in installing a local IMAP server just because a tool that should work with Maildirs doesn't.

My Emacs Packages

I've been using Emacs for about 7 or 8 years now, maybe a bit less, maybe more. I tried to quit several times for other editors and different workflows, and every time I returned with more enthusiasm. It's hard to explain to those who use their editors with mouse clicks on pretty icons, but once you catch this virus called doing everything from the keyboard, it attaches itself to your digital (from digitus, finger) psyche so firmly that it's impossible to leave behind.

nginx and php-fpm notes

These are a few points that I put here as a reminder to myself. If you host multiple sites, only one of them (the default) should have the listen 80 directive. The rest are defined by server_name directives. Debian's default configuration file comes with Unix socket definitions for php-fpm. Nginx needs to connect via a TCP port, so these should be changed to port directives.

Notes on Computer Vision A Modern Approach 2E

A: What do you want from me? What should I know to consider myself an expert in CV? A: How is an object separated from its background? An object is separated from its background in an image by an occluding contour. A: What would you want from Chapter 1? Chapter 1 is about cameras and their parameters. I don't want to learn much about these at the moment. A: What would you want from Chapter 2?

Paper Review: A practical approximation algorithm for LMS line estimator

Authors: David M. Mount, Nathan S. Netanyahu, Kathleen Romanik, Ruth Silverman, Angela Y. Wu. Keywords: LMS estimator, O(n log n), bracelet, slab, random approximation, quantiles. Q1: What is LMS? LMS (Least Median of Squares): given a set of points $p_0, …, p_n$, LMS finds a line $q_0, q_1$ that minimizes the median of the squared distances of $p_0, …, p_n$ to the line. This is in contrast with OLS (Ordinary Least Squares), which minimizes the sum of all the squared distances.
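To make the median-versus-sum contrast concrete, here is a minimal sketch of a plain randomized LMS fit. It is not the paper's O(n log n) approximation algorithm: it simply samples pairs of points, treats each pair as a candidate line y = ax + b (vertical distances, an assumption), and keeps the line with the smallest median squared residual. The function names are hypothetical.

```python
import random
import statistics

def residuals_sq(points, slope, intercept):
    # Squared vertical residuals of every point w.r.t. y = slope * x + intercept.
    return [(y - (slope * x + intercept)) ** 2 for x, y in points]

def lms_line(points, trials=500, seed=0):
    """Randomized Least Median of Squares line fit (sketch, not the paper's algorithm)."""
    rng = random.Random(seed)
    best = None
    for _ in range(trials):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical candidate; cannot be written as y = ax + b
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        med = statistics.median(residuals_sq(points, slope, intercept))
        if best is None or med < best[0]:
            best = (med, slope, intercept)
    return best[1], best[2]
```

Because only the median matters, up to (almost) half of the points can be arbitrary outliers without moving the fitted line, whereas a single far outlier already pulls an OLS fit away.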

Paper Review: Computerized Paleography: Tools for Historical Manuscripts

Authors: Lior Wolf, Liza Potikha, Nachum Dershowitz, Roni Shweka, Yaacov Choueka. Keywords: handwritten, paleography, fragments, SIFT, sparse coding, dictionaries. Q1: What is the ultimate goal of the authors? There are two main goals: providing tools to bring together the fragments of the same page (from the Cairo Genizah) and classifying handwriting and dates. Q2: How is SIFT used? SIFT is used at (all?) points of a letter to generate descriptors. There are 100.

Paper Review: FREAK: Fast Retina Keypoint

URL: http://www.ivpe.com/papers/freak.pdf Authors: Alexandre Alahi, Raphael Ortiz, Pierre Vandergheynst. Keywords: keypoint, binary descriptor, retina, sampling, saccadic, coarse-to-fine, orientation. Q1: What is the formula for the retina pattern? The one difference from BRISK is that the pattern has overlapping circles; in BRISK they were tangential. This redundancy improves recognition. The circles are arranged log-polarly, so they grow larger further from the center. In this respect it's similar to Shape Context descriptors, but instead of dividing the neighborhood into regions, we place increasingly larger circles along polar lines.
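A minimal sketch of a retina-like sampling pattern, assuming exponentially growing ring radii and circle sizes proportional to the ring radius. The ring count, points per ring, growth factor and `sigma_scale` are hypothetical parameters, not FREAK's published coordinates; the point is only to show overlapping, log-polar circles.

```python
import math

def retina_pattern(n_rings=6, points_per_ring=6, r0=1.0, growth=1.5, sigma_scale=0.6):
    """Hypothetical retina-like pattern: list of (x, y, sigma), sigma = circle radius."""
    fields = [(0.0, 0.0, sigma_scale * r0 / growth)]  # central receptive field
    for k in range(n_rings):
        radius = r0 * growth ** k      # ring radius grows exponentially (log-polar)
        sigma = sigma_scale * radius   # circles get larger away from the center
        # Stagger every other ring by half an angular step.
        offset = math.pi / points_per_ring if k % 2 else 0.0
        for j in range(points_per_ring):
            angle = 2 * math.pi * j / points_per_ring + offset
            fields.append((radius * math.cos(angle), radius * math.sin(angle), sigma))
    return fields
```

Neighboring circles on a ring are `2 * radius * sin(pi / points_per_ring)` apart, so with six points per ring they overlap whenever `sigma_scale > 0.5`; at exactly 0.5 they would be tangential, as in BRISK.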

Paper Review: High Performance Layout Analysis for Arabic and Urdu

Authors: Syed Saqib Bukhari, Faisal Shafait and Thomas M. Breuel. Keywords: ridge, printed text, non-text segmentation, Gaussian filter bank, reading order. Q1: How is line skew determined? There is a $\theta$ parameter in the Gaussian kernel which is used to produce ridges. It may be used in detecting the skew, but since it's constant for an entire page, varying line skew will probably decrease its performance. Q2: How are non-text portions detected?
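To see what the $\theta$ parameter does, here is a sketch of an anisotropic Gaussian kernel rotated by $\theta$, assuming the usual rotated-coordinates form. The kernel size and the two sigmas are illustrative values, not the paper's filter bank settings; smoothing a page with such a kernel blurs text along $\theta$ and produces the ridges.

```python
import numpy as np

def oriented_gaussian_kernel(size, sigma_x, sigma_y, theta):
    """Anisotropic 2D Gaussian whose long axis (sigma_x) points along theta radians."""
    half = size // 2
    ys, xs = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate the coordinate frame by theta.
    u = xs * np.cos(theta) + ys * np.sin(theta)
    v = -xs * np.sin(theta) + ys * np.cos(theta)
    kernel = np.exp(-(u ** 2 / (2 * sigma_x ** 2) + v ** 2 / (2 * sigma_y ** 2)))
    return kernel / kernel.sum()  # normalize so smoothing preserves overall intensity
```

A single $\theta$ per page corresponds to one kernel for the whole image, which is why lines with varying skew would not all align with the ridge direction.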

Paper Review: Polygonal Approximation of Digital Curves to Preserve Original Shapes

Authors: Daeho Lee, Seung Gwan Lee. Keywords: dominant points, consecutive vectors, toothbrush shape, distance metric, smallest perpendicular distance. Q1: How is the distance usually calculated? Minor DPs (dominant points) are deleted during approximation. A minor DP is the DP whose perpendicular distance to the straight line joining its neighboring DPs is minimum. For example, with a DP b lying between two DPs a, b is deleted when its distance to the line a-a is minimum. The perpendicular distance is calculated using
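The usual scheme described above can be sketched as follows, assuming a closed contour and the plain perpendicular distance; this is the baseline the paper improves on, not its modified distance metric. `remove_minor_dps` and the `keep` parameter are hypothetical names.

```python
import math

def perpendicular_distance(p, a, b):
    """Distance from point p to the straight line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    # |cross product| / |ab| is the height of triangle a-b-p over the base ab.
    return abs(dx * (py - ay) - dy * (px - ax)) / length

def remove_minor_dps(points, keep):
    """Repeatedly delete the DP closest to the line joining its two neighbors
    on a closed contour until only `keep` points (at least 3) remain."""
    pts = list(points)
    while len(pts) > max(keep, 3):
        n = len(pts)
        dists = [perpendicular_distance(pts[i], pts[i - 1], pts[(i + 1) % n])
                 for i in range(n)]
        pts.pop(dists.index(min(dists)))  # the minor DP
    return pts
```

On a square sampled at its corners and edge midpoints, the midpoints have zero distance and are removed first, leaving the four corners.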

Paper Review: Shape Classification Using Zernike Moments

A: What is a moment? A moment is defined as $m_{p,q} = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} x^p y^q f(x,y) \, dx \, dy$. In other words, it's the sum (integral) of the image function $f$ over both axes, weighted by $x^p y^q$. A: What are Zernike moments? Zernike moments are computed with Zernike polynomials, a set of complex polynomial functions that we use to sum the elements of a shape. They were first introduced in the 1930s. The higher the order, the more complex the polynomial's shape.
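A sketch of both ideas for a discrete image, assuming pixel indices as coordinates for the raw moment and a simple mapping of the square image onto the unit disk for the Zernike moment (pixels outside the disk are ignored). The radial polynomial follows the standard formula $R_n^m(\rho) = \sum_{k=0}^{(n-|m|)/2} \frac{(-1)^k (n-k)!}{k! \, (\frac{n+|m|}{2}-k)! \, (\frac{n-|m|}{2}-k)!} \rho^{n-2k}$; the function names are hypothetical.

```python
import math
import numpy as np

def raw_moment(img, p, q):
    """Discrete m_{p,q}: sum of x^p y^q f(x, y) over all pixels."""
    ys, xs = np.indices(img.shape)
    return float((xs ** p * ys ** q * img).sum())

def zernike_radial(n, m, rho):
    """Radial polynomial R_n^m(rho) of the Zernike basis."""
    m = abs(m)
    if m > n:
        raise ValueError("need |m| <= n")
    if (n - m) % 2:
        return 0.0  # R_n^m vanishes when n - |m| is odd
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coef = (-1) ** k * math.factorial(n - k) / (
            math.factorial(k)
            * math.factorial((n + m) // 2 - k)
            * math.factorial((n - m) // 2 - k))
        total += coef * rho ** (n - 2 * k)
    return total

def zernike_moment(img, n, m):
    """Approximate A_nm = (n+1)/pi * sum f(x, y) conj(V_nm) dA over the unit disk."""
    h, w = img.shape
    ys, xs = np.indices(img.shape)
    x = (2 * xs - (w - 1)) / (w - 1)  # pixel centers mapped to [-1, 1]
    y = (2 * ys - (h - 1)) / (h - 1)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    inside = rho <= 1
    radial = np.vectorize(lambda r: zernike_radial(n, m, r))(rho[inside])
    basis_conj = radial * np.exp(-1j * m * theta[inside])  # conj(R e^{i m theta})
    pixel_area = (2 / (w - 1)) * (2 / (h - 1))
    return (n + 1) / math.pi * (img[inside] * basis_conj).sum() * pixel_area
```

Rotating the shape only changes the phase of $A_{nm}$, so $|A_{nm}|$ is a rotation-invariant shape feature; higher $n$ gives more oscillating polynomials that capture finer detail.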

Paper Review: Text Line Segmentation of Historical Documents: A Survey

Authors: Laurence Likforman-Sulem, Abderrazak Zahour, Bruno Taconet. URL: http://arxiv.org/pdf/0704.1267.pdf Keywords: page segmentation, overlapping components, image quality, document complexity, preprocessing, projection based, smearing based, grouping based, Hough transform based, repulsive-attractive, stochastic, touching components. Q1: What are the most usable techniques for Ottoman divans? Likforman-Sulem and Faure's technique, which uses Gestalt criteria to associate text elements, might be of use. Feldbach and Tönnies' work, which was tested on church registers, may also be helpful.
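Projection-based methods are the simplest family in the survey's keyword list. A minimal sketch, assuming a binary page where 1 means ink and perfectly horizontal, non-touching lines (exactly what divans often violate): sum the ink per row and cut wherever the profile drops below a threshold. `min_ink` is a hypothetical parameter.

```python
import numpy as np

def projection_lines(binary, min_ink=1):
    """Split a binary page (1 = ink) into text lines from its horizontal
    projection profile. Returns (start_row, end_row) pairs, end exclusive."""
    profile = binary.sum(axis=1)  # ink pixels per row
    is_text = profile >= min_ink
    lines, start = [], None
    for row, flag in enumerate(is_text):
        if flag and start is None:
            start = row                   # a line begins
        elif not flag and start is not None:
            lines.append((start, row))    # a gap ends the current line
            start = None
    if start is not None:
        lines.append((start, len(is_text)))
    return lines
```

Skewed or touching lines leave no empty rows between them, so the profile never drops and lines merge; that is why grouping-based methods such as the Gestalt approach look more promising here.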

Paper Review: Three Things Everyone Should Know to Improve Object Retrieval

Authors: Relja Arandjelovic, Andrew Zisserman. Keywords: large scale image datasets, RootSIFT, image augmentation, query expansion, Paris buildings. Q1: What's RootSIFT and how does it improve over L2? RootSIFT is a modified SIFT descriptor whose elements are the square roots of the L1-normalized SIFT descriptor. Comparing RootSIFT descriptors with the Euclidean (L2) distance is equivalent to comparing the original SIFT descriptors with the Hellinger kernel $H(x, y) = \sum_i \sqrt{x_i y_i}$, since for L1-normalized $x$ and $y$, $d_E(\sqrt{x}, \sqrt{y})^2 = 2 - 2 H(x, y)$. It improves over the Oxford 105k baseline system from 0.
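The RootSIFT transform is only a couple of array operations. A sketch assuming descriptors are stored one per row as non-negative values (as SIFT produces); `eps` is an added guard against all-zero rows.

```python
import numpy as np

def root_sift(descriptors, eps=1e-12):
    """SIFT -> RootSIFT: L1-normalize each row, then take the element-wise square root."""
    d = np.asarray(descriptors, dtype=float)
    d = d / (np.abs(d).sum(axis=1, keepdims=True) + eps)
    return np.sqrt(d)

def hellinger_kernel(x, y):
    """H(x, y) = sum_i sqrt(x_i * y_i) for L1-normalized, non-negative x and y."""
    return float(np.sqrt(x * y).sum())
```

Since each RootSIFT vector has unit L2 norm, existing Euclidean-distance code (nearest-neighbor search, vocabulary building) can be reused unchanged while effectively comparing the original descriptors with the Hellinger kernel.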

Patch Histogram Feature

This post introduces a new feature for binary blobs like connected components in a text. The feature is called patch histogram: it's the histogram of the 3x3 patches of black and white pixels. We collect all 3x3 patches and count their frequencies. A 3x3 binary patch has $2^9 = 512$ possible combinations, and we assign a number to each of them. I wrote the implementation in Python and here is a lookup table that converts all possible 3x3 patches to their ids.
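Instead of a lookup table, the id of a patch can be computed by reading its nine pixels as the bits of a 9-bit number. This sketch assumes row-major order with the top-left pixel as the most significant bit, so its ids will not necessarily match the ones in the lookup table; the histogram it produces has the same 512 bins.

```python
import numpy as np

def patch_histogram(binary):
    """Histogram (512 bins) of all 3x3 patches of a binary (0/1) image.
    Patch id: the 9 pixels read row-major as bits, top-left = most significant."""
    img = np.asarray(binary, dtype=np.int64)
    h, w = img.shape
    if h < 3 or w < 3:
        return np.zeros(512, dtype=np.int64)
    ids = np.zeros((h - 2, w - 2), dtype=np.int64)
    for dy in range(3):
        for dx in range(3):
            bit = 8 - (3 * dy + dx)  # bit position of this pixel inside the id
            ids += img[dy:dy + h - 2, dx:dx + w - 2] << bit
    return np.bincount(ids.ravel(), minlength=512)
```

Shifting whole shifted views of the image avoids a Python loop over every patch position.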

Probabilistic Graphical Models Course Notes

Preliminaries. Distributions: Suppose A has 2, B has 2 and C has 3 possible values. Their joint probability distribution will contain 2x2x3=12 values. We can condition the distribution by setting a variable to a certain value, and we can marginalize it to a single variable (summing out the others) to check the distribution of that variable alone. Factors: A factor $\phi$ is a function that takes values for A, B and C and returns a real value.
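A small sketch of the A, B, C example, storing the joint distribution as a 2 x 2 x 3 array (one axis per variable). The numbers are random placeholders; `condition` and `marginalize` are hypothetical helper names.

```python
import numpy as np

# Hypothetical P(A, B, C) with |A| = 2, |B| = 2, |C| = 3 (axis 0 = A, 1 = B, 2 = C).
rng = np.random.default_rng(42)
joint = rng.random((2, 2, 3))
joint /= joint.sum()  # a valid distribution sums to 1

def condition(joint, axis, value):
    """Distribution of the other variables given variable `axis` = value:
    keep the matching slice, then renormalize."""
    sliced = np.take(joint, value, axis=axis)
    return sliced / sliced.sum()

def marginalize(joint, keep_axis):
    """Distribution of a single variable: sum out every other axis."""
    others = tuple(a for a in range(joint.ndim) if a != keep_axis)
    return joint.sum(axis=others)
```

The `joint` array is itself a factor over A, B and C: it maps each of the 12 assignments to a real value.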

R Notes

These are the notes I took from here and there, including the Coursera Data Analysis course and R’s online help (help.start()). Basics: R objects have attributes, which can be inspected with the attributes() function. <- is the assignment operator. : creates integer sequences: 1:4 gives 1 2 3 4. The c() (concatenate) function creates vectors from different kinds of objects: c(TRUE, FALSE) creates a logical vector.

Randomness Course Notes

Definitions of Randomness. Kolmogorov: the complexity of a sequence is the length of the shortest algorithm that produces it. Martin-Löf: a sequence is random if it passes all statistical tests. A random sequence cannot be produced by a program shorter than itself. The digits of $\pi$ are not random in this sense. Kolmogorov complexity is not just "difficult to compute": there is no general way to compute the shortest algorithm for a sequence. It's impossible to find a way to ensure that a sequence is random.
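Although Kolmogorov complexity itself cannot be computed, the compressed length of a sequence gives a computable upper bound (up to the size of the decompressor). This sketch uses zlib as that proxy, which is an assumption for illustration: a repetitive sequence compresses to almost nothing, while bytes from the operating system's random source do not compress.

```python
import os
import zlib

def compressed_length(data: bytes) -> int:
    """zlib-compressed size: an upper-bound proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

regular = b"01" * 5000         # produced by a very short program
noise = os.urandom(10000)      # no shorter description expected
print(compressed_length(regular), compressed_length(noise))
```

The bound only goes one way: the digits of $\pi$ would barely compress with zlib even though a short program generates them, which is exactly why compression cannot certify that a sequence is random.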

Recurrent Neural Networks

These notes are gathered from various places. When I can, I give credits and links, but even when I don't, they are certainly not original ideas. Sequence learning in RNNs: an example of a sequence is an English sentence, i.e. a sequence of words. Sequence learning and transformation allow computers to translate such a sequence into another language. When no separate target sequence exists, RNNs are trained to predict the next element of the sequence. This prediction setting blurs the line between supervised and unsupervised learning.
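A minimal sketch of next-element prediction with an Elman-style RNN, assuming one-hot inputs over a small vocabulary. The weights are random and untrained; the class name and sizes are hypothetical. For training, the targets would simply be the input sequence shifted by one position, so the data supplies its own labels.

```python
import numpy as np

class TinyRNN:
    """Elman-style RNN forward pass for next-token prediction (untrained)."""

    def __init__(self, vocab_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        self.W_xh = rng.normal(0, 0.1, (hidden_size, vocab_size))
        self.W_hh = rng.normal(0, 0.1, (hidden_size, hidden_size))
        self.W_hy = rng.normal(0, 0.1, (vocab_size, hidden_size))
        self.b_h = np.zeros(hidden_size)
        self.b_y = np.zeros(vocab_size)

    def forward(self, token_ids):
        """One probability distribution over the next token for every position."""
        h = np.zeros(self.W_hh.shape[0])
        outputs = []
        for t in token_ids:
            x = np.zeros(self.W_xh.shape[1])
            x[t] = 1.0  # one-hot input
            # The hidden state carries information about everything seen so far.
            h = np.tanh(self.W_xh @ x + self.W_hh @ h + self.b_h)
            logits = self.W_hy @ h + self.b_y
            p = np.exp(logits - logits.max())
            outputs.append(p / p.sum())  # softmax over the vocabulary
        return np.array(outputs)
```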

Shell (Bash and Zsh) Notes

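Don’t use ~ in scripts, use $HOME. I used ~ several times in scripts and it may or may not work: for example, tilde expansion does not happen inside quotes, so "~/notes" stays literal. Use $HOME to refer to the home directory; it is expanded inside double quotes as well.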

This Site's RSS Generator

Previously, with Pandoc, I was using a simple setup to create RSS. Markdown files were converted to plain, headerless HTML, and these were collected together to build an XML file. The obvious drawback is that all HTML files had to be generated by Pandoc, and anything that didn't fit that route did not appear in the feeds. However, when I began to use Org Mode for data analysis and other tasks, I stopped touching Pandoc.
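One way around the Pandoc-only route is to build the feed from (title, link, HTML body) tuples, whatever tool produced the HTML. This is a hypothetical sketch with Python's standard `xml.etree.ElementTree`, not the site's actual generator; it emits a bare RSS 2.0 skeleton without dates or GUIDs.

```python
import xml.etree.ElementTree as ET

def build_rss(title, link, description, items):
    """Build an RSS 2.0 document from (title, link, html_body) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for item_title, item_link, html_body in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = item_title
        ET.SubElement(item, "link").text = item_link
        # ElementTree escapes the HTML, which is how RSS carries it in <description>.
        ET.SubElement(item, "description").text = html_body
    return ET.tostring(rss, encoding="unicode")
```

Because the input is just HTML strings, Org Mode exports and Pandoc output can feed the same function.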

Turning Ottoman Letters into Graphs (1)

Today's work was about splitting a page into its components and saving them as new images. Instead of artificial boundaries (like word or sentence boundaries), the labeling should rely on connected components. There are two problems here. First, in Arabic-based writing systems dots play a significant role, much more so than in Latin-based scripts, so these dots should be classified correctly. Second, the connected components are not always reliable: some components are wrongly split into pieces that actually belong to a single component.
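A sketch of the splitting step, assuming a binary page (1 = ink) and 8-connectivity: label components with a breadth-first flood fill, then cut each one into its own bounding-box image. Each dot ends up as a separate component, and the merging of wrongly split components is not handled here; the function names are hypothetical.

```python
from collections import deque

import numpy as np

def connected_components(binary, connectivity=8):
    """Label foreground pixels of a binary page; returns (labels, count)."""
    img = np.asarray(binary, dtype=bool)
    labels = np.zeros(img.shape, dtype=np.int32)
    if connectivity == 8:
        steps = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)]
    else:
        steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    h, w = img.shape
    count = 0
    for y in range(h):
        for x in range(w):
            if img[y, x] and labels[y, x] == 0:
                count += 1
                labels[y, x] = count
                queue = deque([(y, x)])
                while queue:  # breadth-first flood fill of one component
                    cy, cx = queue.popleft()
                    for dy, dx in steps:
                        ny, nx = cy + dy, cx + dx
                        if 0 <= ny < h and 0 <= nx < w and img[ny, nx] and labels[ny, nx] == 0:
                            labels[ny, nx] = count
                            queue.append((ny, nx))
    return labels, count

def component_crops(labels, count):
    """One bounding-box image (uint8, 1 = ink) per component."""
    crops = []
    for k in range(1, count + 1):
        ys, xs = np.nonzero(labels == k)
        box = labels[ys.min():ys.max() + 1, xs.min():xs.max() + 1] == k
        crops.append(box.astype(np.uint8))
    return crops
```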

Using a Single Threaded Functor in Multiple Threads with Futures in C++

Multithreaded programming requires a shift of paradigm when it comes to return values of functions. C++11 provides std::async (http://en.cppreference.com/w/cpp/thread/async) to run functions asynchronously, but this is not available in older versions. My current project on word spotting in historical documents is fairly complete in functionality, but I decided that searching word images on page images concurrently is necessary for a speed-up. I'm already using Boost for much of the functionality, and instead of creating a dependency on the not yet mature C++11 support in various compilers, I decided to use boost::thread.