TIL June 11

Creating AWS S3 buckets from command line

If you have the necessary AWS credentials (export AWS_ACCESS_KEY_ID="XXXX" and export AWS_SECRET_ACCESS_KEY="YYYY"), you can install the AWS CLI with pip3 install --user awscli and create an S3 bucket from the command line without opening a web browser: aws s3api create-bucket --acl public-read --bucket my-unique-bucket-name --region eu-central-1 --create-bucket-configuration LocationConstraint=eu-central-1 creates a publicly readable bucket at http://my-unique-bucket-name.s3.amazonaws.com. Then you can copy your files with aws s3 cp my-file.

Translating Ottoman Turkish Spelling to Latin Alphabet using Surface Forms

dervaze is a project I started back in 2015, during my Ph.D. work, to translate Ottoman Turkish to modern Turkish spelling and to provide an OCR/ICR/handwriting recognition engine for the Ottoman language. The reason I had to stop was the lack of data: without a considerable amount of data, statistical methods for both Natural Language Processing and Computer Vision fail. Producing and maintaining data seemed a much heavier burden than finding technical solutions, so I mostly gave up on the idea that a working solution is obtainable with classical OCR techniques.


There is a little Python command line program called telegram-send that sends messages to your Telegram account. First you need to register a new bot with BotFather and get a token. Then you pip3 install --user telegram-send and prepare a config file in ~/.config/telegram-send.conf: [telegram] token = <TOKEN_YOU_GET_FROM_BOT_FATHER> chat_id = <CHAT OR USER ID>. You need to start a conversation with the bot and learn your user id (it's identical to the id of the chat you start with the bot).
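
The config file can also be written from a script. This is a minimal sketch, assuming only the [telegram] section with the token and chat_id keys shown above; the function name write_telegram_send_conf is hypothetical and the values are placeholders.

```python
import configparser
from pathlib import Path


def write_telegram_send_conf(path, token, chat_id):
    """Write a telegram-send style config file (hypothetical helper).

    Uses the [telegram] section with 'token' and 'chat_id' keys, as in the
    example above. Parent directories are created if missing.
    """
    config = configparser.ConfigParser()
    config["telegram"] = {"token": token, "chat_id": str(chat_id)}
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w") as f:
        config.write(f)
    return path
```

Calling write_telegram_send_conf("~/.config/telegram-send.conf", token, chat_id) with your real values produces the same file you would otherwise write by hand.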

aerc and goneovim

aerc: I started to use aerc as a command line email client. At first I used its archived GitHub repository, but the software was buggy. Then the real repository gave me the fastest IMAP client I've ever had. Asynchronous operations make the workflow very smooth, and you don't wait for the server on each deleted/archived mail. goneovim: I also began using a Neovim GUI called goneovim. Formerly I was using Neovim-GTK for this, but somehow (either because of Fira Code's ligatures or some kind of incompatibility) the visuals were ugly and there was some lag.

TIL May 1

Nota seems to be a nice command line calculator. It converts what you type into ASCII art formulas (written here in plain notation):
In[1]: 10 + 10 Out[1]: 20.0
In[2]: √100 Out[2]: 10.0
In[3]: Max(10, 1, 21, -3) Out[3]: 21.0
In[4]: ⟨Emre's Number⟩ ≡ 79 Out[4]: 79.0
In[5]: √(Emre's Number) Out[5]: 8.888194417315589
In[6]: Emre's Number^2 Out[6]: 6241.0
In[7]: Emre's Number^(Emre's Number) Out[7]: 8.

TIL April 29

When I try to use sed for find-and-replace in multiple files, I always remember that perl -pe is better suited for this task (add -i to edit the files in place). Today this happened again. I tried to replace lines starting with # Bla bla with title: Bla bla, and it was easier to use perl -pe 's|^#+ (.*)|title: $1|g' than to figure out which kind of regular expressions sed uses. For Hugo front matter at the beginning of files, it's possible to determine the type but not possible to set the section.
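
The same substitution can be done with Python's re module. A sketch, assuming UTF-8 files; the function names are illustrative. The pattern mirrors the perl one-liner: one or more # at the start of a line, a space, and the rest of the line captured.

```python
import re
from pathlib import Path

# '^' matches at every line start because of MULTILINE; '.*' stops at '\n'.
HEADING = re.compile(r"^#+ (.*)", flags=re.MULTILINE)


def headings_to_title(text):
    """Replace '# Heading' lines with 'title: Heading'."""
    return HEADING.sub(r"title: \1", text)


def rewrite_files(paths):
    """Apply the substitution in place to each file (like perl -pi -e)."""
    for path in map(Path, paths):
        text = path.read_text(encoding="utf-8")
        path.write_text(headings_to_title(text), encoding="utf-8")
```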

TIL April 28

In yesterday's post, I presented a Python script to convert Pelican preamble files to YAML for Hugo. Some UTF-8 files have a BOM marker at the beginning. The script (as a true quick and dirty solution) doesn't check for such a marker, so it cannot detect the Title element when the marker is present. I added an fm = fm.strip('\ufeff') line to remove the BOM marker from a line if it exists.
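
An alternative is to let Python drop the BOM while reading. This sketch uses the standard 'utf-8-sig' codec, which strips a leading BOM and otherwise behaves like UTF-8; the helper names are illustrative, not from the original script.

```python
from pathlib import Path


def read_front_matter(path):
    """Read a text file, dropping a leading UTF-8 BOM if there is one.

    Files without a BOM are read exactly as with plain 'utf-8'.
    """
    return Path(path).read_text(encoding="utf-8-sig")


def strip_bom(line):
    """Line-level variant: remove U+FEFF only at the start of the line."""
    return line.lstrip("\ufeff")
```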

TIL April 27

This blog has now moved to Amazon Amplify. It's connected to a Bitbucket git repository, and AWS pulls it as soon as it's pushed. I was polling the repository manually on a VPS, but this is much quicker. Setting up your domain name for Amplify requires (a) writing a CNAME record to prove ownership. Then (b) you point the ALIAS and CNAME records of @ and www to the CloudFront URL given to you, and your site automatically gets HTTPS.

TIL April 26

Hugo has a Casper theme, but it's not listed in the official themes directory. Hosting static websites on AWS takes 5 minutes of configuration. For some of my books, I think I can use ornamental public domain images. This guy talks about a third way to stop the pandemic: testing everyone. The testing technology that is most proven and ready to scale is based on a technology called LSPR.

Anonymous functions in Dart

Sometimes we need an anonymous function to use only once. Dart allows two similar syntaxes for writing these. The first is for a single expression: (a, b) => a + b. The other is for when you need to write multiple statements in an anonymous function: (a, b) { return a + b; }

Should we expect a software crisis?

I read a blog post titled The Quiet Crisis unfolding in Software Development, which mainly says that current software building practices lead to an accumulation of technical debt, and legacy software becomes unmanageable over time. It warns about highly skilled developers: “These kinds of high performers are actually low performers when TCO is factored in. Unless you’re a startup where time to market is the highest priority, keep these kinds of developers under close scrutiny with extensive design and code reviews.”


When you need a simple dialog to get input from the user, or just to show some piece of information in a GUI dialog, zenity helps. It allows scripts to interact with the user through dialogs: zenity --info --text="Merge complete. Updated 3 of 10 files."

Python Data Science Handbook

The book has chapters on IPython, Numpy, Pandas, Matplotlib and Machine Learning. It looks like it doesn't delve into technical/theoretical aspects but focuses on the Python libraries used in data science. GitHub page for the book.

SSH keys for Multiple Accounts in Github

I have multiple GitHub accounts, and some of them are collaborators on the others. I don't like typing passwords every time I push, so I set up SSH keys for my accounts. But GitHub (understandably) doesn't accept the same key in more than one account (otherwise how could it know which account is pushing?). There is, however, a way to use the .ssh/config file to select different keys for different target URLs. # Personal account, - the default config Host github.

Fixing Pip Timeout Problems

For a large package like tensorflow I was getting the following error from pip: pip._vendor.urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='files.pythonhosted.org', port=443): Read timed out. v = self._sslobj.read(len, buffer) socket.timeout: The read operation timed out. I noticed that pip has a timeout parameter that can be set: pip --default-timeout=1000 install package-name

How to convert Numpy image to QImage?

When writing GUI code in Qt for a deep learning system, a common problem is converting an image (read from disk or a camera using OpenCV) in the form of a Numpy array to a QImage to be shown in a form or widget. There are basically two problems: the Numpy array's data type often has more than 8 bits per channel, and OpenCV reads images in BGR order rather than the more common RGB.
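
Both problems can be handled before the QImage is built. The following is a sketch, assuming an H x W x 3 array and PyQt5; the min..max rescaling of non-uint8 data is an assumption (use a fixed range if your data has one), and the function names are illustrative.

```python
import numpy as np


def to_rgb8(image, bgr=True):
    """Convert an H x W x 3 array to contiguous uint8 RGB.

    Non-uint8 data is rescaled linearly from its own min..max to 0..255.
    With bgr=True the channel order is reversed, since OpenCV loads BGR.
    """
    img = np.asarray(image)
    if img.dtype != np.uint8:
        img = img.astype(np.float64)
        lo, hi = img.min(), img.max()
        scale = 255.0 / (hi - lo) if hi > lo else 0.0
        img = ((img - lo) * scale).round().astype(np.uint8)
    if bgr:
        img = img[..., ::-1]
    # QImage needs a C-contiguous buffer; the reversed slice is a strided view.
    return np.ascontiguousarray(img)


def to_qimage(image, bgr=True):
    """Wrap the converted array in a QImage (requires PyQt5)."""
    from PyQt5.QtGui import QImage  # lazy import: the NumPy part works alone

    rgb = to_rgb8(image, bgr)
    h, w, _ = rgb.shape
    qimg = QImage(rgb.data, w, h, 3 * w, QImage.Format_RGB888)
    # The QImage does not own rgb's memory, so return a deep copy.
    return qimg.copy()
```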

Query Logging in Databases when using Parameters

We don't construct database queries using string formatting, because of security problems: SQL injection attacks stem from missing escapes when queries are built from given strings. Instead we use parameter passing to the database engine, e.g. SELECT * FROM people WHERE name = ?, and pass the parameters /separately/ to the database. Practically all database engines support this kind of query. In Sqlite 3 under Python, we use
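
One way to log such queries is to log the SQL text and its parameters side by side, without ever formatting them together. This sketch uses the standard sqlite3 module; execute_logged is a hypothetical helper, and Connection.set_trace_callback is shown as sqlite3's own hook that receives each statement SQLite executes (how bound values appear in it may vary between Python versions).

```python
import logging
import sqlite3

log = logging.getLogger("sql")


def execute_logged(conn, sql, params=()):
    """Log the SQL and its parameters separately, then execute.

    The parameters are bound to the '?' placeholders by sqlite3; they are
    never pasted into the SQL string.
    """
    log.debug("SQL: %s | params: %r", sql, params)
    return conn.execute(sql, params)


conn = sqlite3.connect(":memory:")
statements = []
conn.set_trace_callback(statements.append)  # called with each executed statement

execute_logged(conn, "CREATE TABLE people (name TEXT)")
name = "Robert'); DROP TABLE people;--"  # stored as plain data, not executed
execute_logged(conn, "INSERT INTO people (name) VALUES (?)", (name,))
rows = execute_logged(conn, "SELECT * FROM people WHERE name = ?", (name,)).fetchall()
```

Enable the log output with logging.basicConfig(level=logging.DEBUG).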

Coursera Deep Learning Specialization Notes

These notes do not contain answers to quizzes or assignments, per the Honor Code. If you are looking for those, look elsewhere. Binary Classification: given a picture, classify it as cat or non-cat. The result is $\hat{y} = P(y=1 | x)$. In other words, given $x$, we calculate the probability that this data represents a cat. Feature Vector from Image: we convert a picture, e.g. a (64, 64, 3) picture, into a (64 * 64 * 3, 1) feature vector.
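
In NumPy the conversion is a single reshape: 64 * 64 * 3 = 12288 features. A sketch with a random stand-in for a real picture; stacking m pictures as columns, giving a (12288, m) matrix, is the usual batch layout.

```python
import numpy as np

# Random stand-in for a 64 x 64 RGB picture (not real data).
image = np.random.default_rng(0).random((64, 64, 3))

# One picture -> column vector of 64 * 64 * 3 = 12288 features.
x = image.reshape(-1, 1)

# m pictures -> one column per picture, shape (12288, m).
batch = np.stack([image, image[::-1]], axis=0)  # m = 2 pictures
X = batch.reshape(batch.shape[0], -1).T
```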

Numpy ValueError while using dlib's face detector

For two days, I was trying to find a bug in my code, because an assertion that uses numpy.max was giving an error like ValueError: zero-size array to reduction operation maximum which has no identity, which didn't seem reasonable. I'm building a face recognizer with dlib's frontal face detector, and today I noticed that some of the face detection results have negative coordinates. This means the detected face is partial (it extends past the image border), although it's a bit of a stretch to use negative coordinates for this.
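
The empty array comes from NumPy slicing: a negative start counts from the end, so img[-10:50] on a 100-row image selects rows 90 to 49, which is nothing. Clamping the rectangle first avoids this. A sketch with a hypothetical clamp_box helper; it treats right/bottom as exclusive slice ends, while dlib rectangles report inclusive right()/bottom(), so you may want to add 1 to those first.

```python
def clamp_box(left, top, right, bottom, width, height):
    """Clip a detector rectangle to the image bounds.

    Negative coordinates mean the face extends past the edge. Returns
    None when no part of the box lies inside the image.
    """
    left, top = max(0, left), max(0, top)
    right, bottom = min(width, right), min(height, bottom)
    if right <= left or bottom <= top:
        return None
    return left, top, right, bottom
```

With a valid result, img[top:bottom, left:right] is never empty, so numpy.max works.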

Static Variables in Python

I use this a lot in different projects and would like to keep it here. Python doesn't natively have C-style static variables. (It does support class variables, which can serve a similar purpose in OOP.) However, as functions are also objects in Python, it's possible to embed variables inside a function. An elegant solution on SO creates a decorator for static variables. def static_vars(**kwargs): def decorate(func): for k in kwargs: setattr(func, k, kwargs[k]) return func return decorate @static_vars(counter=0) def foo(): foo.
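
Here is the decorator laid out as a complete, runnable example. The body of foo is hypothetical (a simple call counter), since the snippet above is cut off.

```python
def static_vars(**kwargs):
    """Attach attributes to a function object.

    The attributes live on the function itself, so they persist between
    calls, like C static variables.
    """
    def decorate(func):
        for name, value in kwargs.items():
            setattr(func, name, value)
        return func
    return decorate


@static_vars(counter=0)
def foo():
    # Hypothetical body: count how many times foo has been called.
    foo.counter += 1
    return foo.counter
```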

Development Journal, June 9

I began implementing the Ottoman translator using Finite State Transducers via OpenFST. Instead of using ad hoc algorithms to translate between Ottoman and Turkish, I'll be creating FSTs. In the past I used FOMA and TRmorph as a building block and basis for Ottoman conversion. However, I saw that writing something on top of a morphological analyzer to convert Ottoman to Turkish almost requires another morphological analyzer. (This is also true for Turkish to Ottoman conversion, because the spelling rules of Ottoman require another layer of FSTs.)

Adding version information to executables in CMake projects

In programming, versioning your code files is of immense importance. Most files need to be updated, renamed and merged constantly. You also need backups, as one learns by losing work to various computer problems. Another problem we face is establishing a connection between an executable file or library and its code. We normally don't add executable files to version control, as they are produced from the code files.

When the `y` and `p` commands in IdeaVim don't work

I began using the IdeaVim plugin for Android Studio some time ago. It's nice, but as a Vim newbie, I wasn't aware that Vim doesn't use the system (Windows, macOS, XWindow, etc.) clipboard for copy/paste by default. So when you use the y command in IdeaVim, the yanked text can't be pasted into other applications. The solution is easy once you spot it: your ~/.ideavimrc file needs the following line: set clipboard+=unnamed

The Sorry State of NDK testing in Android

I'm writing a C library to use in Android, iOS and Python applications. Although the C library has its own unit tests, I wanted to write a few more to ensure that data transfer between the C and Android layers is correct. In Android, one needs to put unit test files in the app/src/test/*module-name* directory. Yesterday I spent a few hours writing tests that check that the conversion between visenc and Unicode is correct in Android.

Regular and Recurring Tasks in todo.txt

I'm using the todo.txt format to keep some of my daily tasks. It's a plain text format, and both iOS and Android have apps for it, like SimpleTasks. Emacs and Vim support the format too. Actually you don't need a special editor for it; the format is so simple that even Notepad may be enough. I have a shell script to add daily recurring tasks, like Drink Water or Pray Maghreb, to this file.
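
A Python version of such a script might look like the following. This is a sketch, not the author's shell script: the task list and function name are hypothetical, and the duplicate check is a simple heuristic (a task is skipped if an existing line ends with the same "date task" text, which also covers lines later marked done with "x" or given a priority).

```python
import datetime
from pathlib import Path

# Hypothetical recurring tasks; replace with your own.
DAILY_TASKS = ("Drink Water", "Pray Maghreb")


def add_daily_tasks(todo_path, tasks=DAILY_TASKS, today=None):
    """Append today's recurring tasks to a todo.txt file.

    Each new line starts with the creation date (YYYY-MM-DD), as in the
    todo.txt format. Returns the lines that were added.
    """
    today = today or datetime.date.today()
    path = Path(todo_path)
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    existing = text.splitlines()
    candidates = [f"{today.isoformat()} {task}" for task in tasks]
    added = [line for line in candidates
             if not any(old.endswith(line) for old in existing)]
    if added:
        # Start on a fresh line if the file lacks a final newline.
        prefix = "" if not text or text.endswith("\n") else "\n"
        with path.open("a", encoding="utf-8") as f:
            f.write(prefix + "\n".join(added) + "\n")
    return added
```

Running it from cron once a day adds each task at most once per day.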

Progress on Ottoman Translation - 2018, Week 6

Some of the following posts will be like a TODO list for the coming months: what I am planning for dervaze and its mobile versions. As I have mostly become a solo developer, I'll share my experience with the problem here to shed light for those interested. The technology for Ottoman OCR was mostly ready when the interruptions from family life began. I'll need to check what is available, but a more pressing problem for me is the speed of translation.

A Restart

It's been a while, a few years, since I last updated this site. I had some of my technical writing elsewhere, but I've decided to restart updating here as well. I've moved the site to Pelican and moved older writings here. Much of the content is no longer among my interests, but I'm keeping it for no concrete reason. I'll try to update the site daily with my adventures in software development and technology.

CV as of February 2019

Personal Info
Name: Ibadullah Emre Sahin
Work Address: Teknokrat Yazilim A.S. Yahsibey Mh. Yahsibey Ck. No: 8 Bursa Turkey
Birth: 15 July 1979, Ankara Turkey
Citizenship: Turkish
Gender: Male
Contact: i.emre.sahin@gmail.com, +90 532 261 8985
Work Experience
Various small programming projects during high school. (1995-1998)
Project Manager, Devops and Software Developer in YD Yazilim, Ankara Turkey.
Freelance Developer during university years and after (2000-2015)

File selection operators in zsh

zsh has somewhat more advanced file selection operators than bash. With them, it's possible to reach all the files in a directory at once and perform operations on them. Here are a few short examples:
All files in a directory: ls *
In the current directory: ls **
In the current directory, plain files (.) modified within the last 2 weeks (mw-2): ls */**(.mw-2)
In the current directory, plain files larger than 100 MB (Lm+100): ls */**(.Lm+100)
In the current directory, world-readable plain files (R): ls */**(.R)
In /etc, world-writable plain files (W): ls /etc/**(.W)
In /etc, world-writable plain files modified within the last week (Wmw-1): ls etc/**(.Wmw-1)
Besides options like these, zsh also makes it easy to split file names. For example, to get the part of a file name before the extension, we write *(:r).
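
For scripts that have to run without zsh, roughly equivalent selections can be written with pathlib. A sketch with illustrative function names; the time and size checks approximate the zsh qualifiers (mw-2: modified less than 2 weeks ago; Lm+100: larger than 100 MB) and search recursively, like zsh's **/ pattern.

```python
import time
from pathlib import Path

WEEK = 7 * 24 * 3600


def recent_plain_files(root, weeks=2):
    """Roughly **/*(.mw-2): regular files under root modified less than
    `weeks` weeks ago."""
    cutoff = time.time() - weeks * WEEK
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.stat().st_mtime > cutoff)


def large_files(root, megabytes=100):
    """Roughly **/*(.Lm+100): regular files larger than `megabytes` MB."""
    limit = megabytes * 1024 * 1024
    return sorted(p for p in Path(root).rglob("*")
                  if p.is_file() and p.stat().st_size > limit)


def without_extension(name):
    """Like the :r modifier: drop the last extension."""
    return str(Path(name).with_suffix(""))
```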

Visual Transliteration for Ottoman

There are already various transliteration systems that represent Arabic-based scripts in Roman letters. However, all of them aim to represent phonemes, without paying attention to different visual elements. When we are manually transcribing these texts, that approach is fine. However, when we tried to represent visual elements in scanned handwritten documents, we faced some problems with these transliteration systems. Since conventional systems aim to represent phonemes, a correct reading is necessary, and this requires expertise in the represented language.

A Fast Local Descriptor for Dense Matching

Authors: Engin Tola, Vincent Lepetit, Pascal Fua Keywords: Stereo image descriptor circle quantization formalization binary mask Depth estimation Q1: How is depth estimation related to object recognition? Objects are located in a 3D environment, and in order to recognize them correctly, we need to be able to recreate their layout in a scene. With such an aid, we can successfully determine the object boundaries. Q2: What does the descriptor contain?

A Need for Yet Another Transliteration Alphabet for Ottoman

The Ottoman Text Archival Project has its own reversible transcription system. However, for word labels, it is overkill and too much work for experts. I'm looking for a one-to-one mapping between the different visual elements of a word and its representation in UTF-8. The labels should be simple to remember, but varied enough to represent the visual variations of words. I'm thinking of creating letter+digit codes: the letter part will reflect the most similar sound, and the digit will reflect the visual variation.

A Regular Conversion Algorithm Between Turkish and Ottoman

Modern Turkish spells all words of Turkish, Arabic and Farsi origin according to their pronunciation. When converting from one system to the other, this creates a problem that might be solved with the aid of regular expressions. For example, in Ottoman a word is spelled mnwr, letter by letter as in Arabic, but in Turkish the spelling reflects the pronunciation: münevver. Since a 1-1 mapping between these two writing systems is not possible, a set of possible Ottoman spellings must be produced with a regular expression.
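
A toy version of this idea: map each Turkish letter to a regex over a Latin transliteration of Ottoman letters, make vowels optional, and write doubled consonants once. Everything in the table below is a hypothetical, heavily simplified scheme (w for vav, h for he, y for ye, capitals for emphatic letters), covering only a handful of letters; it is not dervaze's actual rule set.

```python
import re

# Hypothetical transliteration of a few Ottoman consonants.
CONSONANTS = {"m": "m", "n": "n", "v": "w", "r": "r",
              "k": "[kq]", "s": "[sS]", "t": "[tT]"}
# Vowels are optional because Ottoman spelling often omits them.
VOWELS = {"a": "a?", "e": "[ah]?", "ı": "y?", "i": "y?",
          "o": "w?", "ö": "w?", "u": "w?", "ü": "w?"}


def ottoman_pattern(word):
    """Compile a regex matching candidate Ottoman spellings of a
    lowercase Turkish word (toy rules only)."""
    word = re.sub(r"(.)\1", r"\1", word)  # doubled letter written once (shadda)
    parts = []
    for ch in word:
        if ch in VOWELS:
            parts.append(VOWELS[ch])
        elif ch in CONSONANTS:
            parts.append(CONSONANTS[ch])
        else:
            raise ValueError(f"no mapping for {ch!r} in this sketch")
    return re.compile("^" + "".join(parts) + "$")
```

For münevver this builds ^mw?n[ah]?w[ah]?r$, which accepts mnwr.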

Backup Script for Recent Files

I decided to write a script to back up only the recent files. There are solutions based on unison that work periodically for all files, but as I change projects, I need to configure new backups for these projects as well. This is cumbersome and error prone: it's easy to forget to add new artifacts to backup scripts and lose them in an emergency. Therefore I decided that a small bash script that works with rsync and find is better.
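
The find part of such a script selects files by modification time. A Python sketch of that selection, roughly what find root -type f -mtime -N picks; the relative paths it yields could then be passed to rsync with --files-from. The function name is illustrative.

```python
import os
import time


def recent_files(root, days=1):
    """Yield paths (relative to root) of regular files modified within
    the last `days` days."""
    cutoff = time.time() - days * 86400
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getmtime(path) >= cutoff:
                yield os.path.relpath(path, root)
```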


I write every day. Every day I write. I have a quota of words to fill, and after each sleep I sit in front of my keyboard and begin pressing out words. I was using Emacs for this. Emacs, the One True Editor. Yet it has one flaw that diverts me from this writing routine: it has too many features, and when I hit a block, some idea that I'm not big enough to put into words, I begin to play with it.

Converting Latin based Turkish spelling to Ottoman

I'm working on a system to search Ottoman document collections. In order to query a large collection in Ottoman, the user needs to write the query in Ottoman, which uses an Arabic-based alphabet with a completely different set of spelling rules. This limits usability, since most users will not be familiar with the spelling. Experts are, but we can't assume that only experts will use the system. There are various methods of transcribing Ottoman to modern Turkish.

Copying and pasting with XWindow clipboard from tmux

tmux does not natively support XWindow's clipboard. With two lines in .tmux.conf you can configure two keys to send and retrieve clipboard content. Traditionally, applications use the PRIMARY selection, which copies the mouse selection and pastes it with the third (middle) button. However, this is becoming less and less common, so I'll configure the CLIPBOARD selection that most newer browsers, applications and Emacs use. Add the following lines to .tmux.conf: # move x clipboard into tmux paste buffer bind < run "xsel -ob | tmux load-buffer - ; tmux paste-buffer " # move tmux copy buffer into x clipboard bind > run "( tmux show-buffer | xsel -bi ) && tmux display-message \"ok!

Dervaze: A Transliteration System for Ottoman

/Dervaze/ (meaning "the portal") is a set of tools that aim to transliterate historical Ottoman documents to Modern Turkish. These will be hosted at http://dervaze.com in the near future. In this document, I describe the transliteration system. The system is organized as a pipeline in which the tools at each stage produce the input of the next stage. The input to the system is a set of historical document images, and the output is either a search result or a textual representation of these documents.

ggplot2: Elegant Graphics for Data Analysis

The important parts of the book are: the grammar of ggplot, qplot for easy plotting, geoms, and linear models in plots. qplot: =qplot()= is designed after plot(). The three most important parameters to qplot are x, y and data. If data is specified, it's used as a namespace for variables: =qplot(carat, price, data = diamonds)=, =qplot(carat, x * y * z, data = diamonds)=. =color= is another argument that can be specified for differentiating.


My browser of choice was Google Chrome, but its latest versions became resource hogs and I was feeling this on my older machines. I decided to take a look at alternative browsers and settled on Midori. I turned off JavaScript (best JS is dead JS), turned on ad blocking, and customized keyboard shortcuts (Ctrl-F to Ctrl-S as in Emacs). It loads noticeably faster, and while using other applications I can no longer tell how many tabs are open in my browser.


I used mutt for years. I like it. Its customizability and macros make me feel at home, and I was able to automate most of my tasks with it. I began to use mutt after gnus on Emacs. The reason I left gnus behind is that it was incompatible with offlineimap and slow for IMAP use. I see no point in installing a local IMAP server just because a tool that should work with Maildirs doesn't.

My Emacs Packages

I've been using Emacs for about 7 or 8 years now, maybe a bit less, maybe more. I tried to quit several times for other editors and different workflows, and every time I returned with more enthusiasm. It's hard to explain to those who use their editors with mouse clicks on pretty icons, but once you catch this virus called doing everything from the keyboard, it attaches itself to your digital (from digitus, finger) psyche and becomes impossible to leave behind.

nginx and php-fpm notes

These are a few points that I keep as a reminder to myself. If you host multiple sites, only one of them (the default) should have the listen 80 directive; the rest are distinguished by server_name directives. Debian's default configuration file comes with Unix socket definitions for php-fpm. If Nginx is to connect via a TCP port, these should be changed to port directives.

Notes on Computer Vision A Modern Approach 2E

A: What do you want from me? What should I know to consider myself an expert in CV? A: How is an object separated from its background? An object is separated from its background in an image by an occluding contour. A: What would you want from Chapter 1? Chapter 1 is about cameras and their parameters. I don't want to learn much about these at the moment. A: What would you want from Chapter 2?

Paper Review: A practical approximation algorithm for LMS line estimator

Authors: David M. Mount, Nathan S. Netanyahu, Kathleen Romanik, Ruth Silverman, Angela Y. Wu Keywords: LMS estimator O(n log n) bracelet slab random approximation quantiles Q1: What is LMS? Given a set of points p0, ..., pn, LMS finds a line (q0, q1) that minimizes the median of the squared distances of p0, ..., pn to the line. This is in contrast to summing up all the squared distances and minimizing that sum, as in OLS (Ordinary Least Squares).
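
To make the LMS objective concrete, here is a brute-force baseline, not the paper's approximation algorithm: it only tries lines through pairs of input points (about O(n^3 log n) overall) and measures vertical residuals. The function name is illustrative.

```python
import itertools
import statistics


def lms_line(points):
    """Least-median-of-squares line over candidate lines through pairs of
    points. Returns (slope, intercept)."""
    best = None
    for (x1, y1), (x2, y2) in itertools.combinations(points, 2):
        if x1 == x2:
            continue  # vertical candidates are skipped in this sketch
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        med = statistics.median((y - slope * x - intercept) ** 2
                                for x, y in points)
        if best is None or med < best[0]:
            best = (med, slope, intercept)
    if best is None:
        raise ValueError("need at least two points with distinct x")
    return best[1], best[2]
```

Because only the median residual matters, up to half of the points can be arbitrary outliers without moving the fit, unlike OLS.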

Paper Review: Computerized Paleography: Tools for Historical Manuscripts

Authors: Lior Wolf, Liza Potikha, Nachum Dershowitz, Roni Shweka, Yaacov Choueka Keywords: handwritten paleography fragments SIFT sparse coding dictionaries Q1: What is the ultimate goal of the authors? The two main goals are providing tools to bring together the fragments of the same page (from the Cairo Genizah) and trying to classify handwriting and dates. Q2: How is SIFT used? SIFT is used at (all?) points of a letter to generate descriptors.

Paper Review: FREAK: Fast Retina Keypoint

URL: http://www.ivpe.com/papers/freak.pdf Authors: Alexandre Alahi, Raphael Ortiz, Pierre Vandergheynst Keywords: Keypoint Binary descriptor Retina Sampling Saccadic Coarse-to-fine Orientation Q1: What is the formula for the retina pattern? The one difference from BRISK is that the pattern has overlapping circles; in BRISK they were tangential. Redundancy improves recognition. The circles are log-polar. In this respect it's similar to Shape Context descriptors, but we don't divide into regions; we create increasingly larger circles on polar lines.

Paper Review: High Performance Layout Analysis for Arabic and Urdu

Authors: Syed Saqib Bukhari, Faisal Shafait and Thomas M. Breuel Keywords: ridge printed text non-text segmentation gaussian-filter bank reading order Q1: How is line skew determined? There is a θ parameter in the Gaussian kernel which is used to produce ridges. This may be used to detect the skew, but since it's constant for an entire page, a varying line skew will probably decrease its performance. Q2: How are non-text portions detected?

Paper Review: HMM-Based Alignment of Inaccurate Transcriptions for Historical Documents

Authors: Andreas Fischer, Emanuel Indermühle, Volkmar Frinken and Horst Bunke Keywords: error tolerant DTW HMM inaccurate transcriptions Parzival DoG string alignment keyword spotting Viterbi Q1: What's the measure of success for alignment? The measure is (words - deletions - insertions - substitutions)/words, which gives the accuracy of the alignment. Q2: What features are used in keyword spotting? Q3: How is the Viterbi algorithm employed? Q4: What does the first pass receive and produce?
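
The accuracy formula needs deletion, insertion and substitution counts. A sketch, assuming those counts come from a minimum edit-distance (Levenshtein, unit cost) alignment of the word sequences; the paper's own alignment is HMM-based, so this only illustrates the measure.

```python
def edit_counts(reference, hypothesis):
    """(deletions, insertions, substitutions) of a minimum-cost alignment
    between two word sequences, via standard dynamic programming."""
    n, m = len(reference), len(hypothesis)
    # cost[i][j] = (total, deletions, insertions, substitutions)
    cost = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        cost[i][0] = (i, i, 0, 0)
    for j in range(1, m + 1):
        cost[0][j] = (j, 0, j, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            miss = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            t, d, ins, s = cost[i - 1][j - 1]
            diag = (t + miss, d, ins, s + miss)
            t, d, ins, s = cost[i - 1][j]
            delete = (t + 1, d + 1, ins, s)
            t, d, ins, s = cost[i][j - 1]
            insert = (t + 1, d, ins + 1, s)
            cost[i][j] = min(diag, delete, insert)
    _, d, ins, s = cost[n][m]
    return d, ins, s


def alignment_accuracy(reference, hypothesis):
    """(words - deletions - insertions - substitutions) / words."""
    d, ins, s = edit_counts(reference, hypothesis)
    return (len(reference) - d - ins - s) / len(reference)
```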

Paper Review: Polygonal Approximation of Digital Curves to Preserve Original Shapes

Authors: Daeho Lee, Seung Gwan Lee Keywords: dominant points consecutive vectors toothbrush shape distance metric smallest perpendicular distance Q1: How is the usual distance calculation done? Minor DPs (dominant points) are deleted in approximation. A minor DP is a DP whose perpendicular distance to the straight line through its neighbors is minimum. (Figure: points a, a and b.) Here b is deleted when its distance to the line a-a is minimum.
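
The deletion rule can be sketched as a greedy loop: repeatedly remove the point closest to the line through its two neighbors. This is a generic illustration of the minor-DP idea described above, assuming a closed curve, not the paper's complete method.

```python
import math


def perpendicular_distance(p, a, b):
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length = math.hypot(dx, dy)
    if length == 0:
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / length


def reduce_dominant_points(points, keep):
    """Delete the minor DP (smallest distance to its neighbors' line)
    until `keep` points remain; never goes below 3 points."""
    pts = list(points)
    while len(pts) > max(keep, 3):
        n = len(pts)
        dists = [perpendicular_distance(pts[i], pts[i - 1], pts[(i + 1) % n])
                 for i in range(n)]
        pts.pop(dists.index(min(dists)))
    return pts
```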

Paper Review: Shape Classification Using Zernike Moments

A: What is a moment? A moment is defined as $m_{p,q} = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} x^p y^q f(x, y)\,dx\,dy$. In other words, it's the sum of the figure, weighted by $x^p y^q$, with respect to the function f over both axes. A: What are Zernike moments? Zernike moments are computed with complex polynomial functions (Zernike polynomials) that we use to sum the elements of a shape. The polynomials were first introduced in the 1930s. The higher the order, the more complex the shapes they can represent.
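
For a digital image the integral becomes a sum over pixels. A sketch of the raw moment from the definition above, with x as the column index and y as the row index (an assumed convention); Zernike moments would replace $x^p y^q$ with complex Zernike polynomials on the unit disk, which this sketch does not implement.

```python
def raw_moment(image, p, q):
    """Discrete m_pq = sum over pixels of x^p * y^q * f(x, y).

    image is a list of rows; x is the column index, y the row index.
    """
    return sum(x ** p * y ** q * value
               for y, row in enumerate(image)
               for x, value in enumerate(row))


def centroid(image):
    """Centroid (m10 / m00, m01 / m00) of a non-empty shape."""
    m00 = raw_moment(image, 0, 0)
    return raw_moment(image, 1, 0) / m00, raw_moment(image, 0, 1) / m00
```

For a binary shape, m00 is the area in pixels.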

Paper Review: Text Line Segmentation of Historical Documents: A Survey

Authors: Laurence Likforman-Sulem, Abderrazak Zahour, Bruno Taconet URL: http://arxiv.org/pdf/0704.1267.pdf Keywords: page segmentation overlapping components image quality document complexity preprocessing projection based smearing based grouping based hough transform based repulsive attractive stochastic touching components Q1: What are the most usable techniques for Ottoman divans? Likforman-Sulem and Faure's technique, which uses Gestalt criteria to associate text elements, might be of use. Feldbach and Tönnies' work, which was tried on church registers, may also be helpful.

Paper Review: Three Things Everyone Should Know to Improve Object Retrieval

Authors: Relja Arandjelovic, Andrew Zisserman Keywords: large scale image datasets RootSIFT image augmentation query expansion Paris buildings Q1: What's RootSIFT and how does it improve over L2? /RootSIFT/ is a modified SIFT descriptor whose elements are the square roots of the L1-normalized SIFT descriptor. Comparing RootSIFT descriptors with the Euclidean (L2) distance is equivalent to comparing SIFT descriptors with the Hellinger kernel $H(x, y) = \sum_i \sqrt{x_i y_i}$: for L1-normalized $x$ and $y$, $d_E(\sqrt{x}, \sqrt{y})^2 = 2 - 2H(x, y)$.
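
The equivalence is easy to check numerically. A sketch on plain lists, assuming non-negative descriptors as SIFT produces; the function names are illustrative. Expanding the square gives $\sum_i x_i + \sum_i y_i - 2\sum_i \sqrt{x_i y_i} = 1 + 1 - 2H(x, y)$.

```python
import math


def root_sift(descriptor):
    """L1-normalize a non-negative descriptor, then take element-wise sqrt."""
    total = sum(descriptor)
    if total == 0:
        return [0.0] * len(descriptor)
    return [math.sqrt(v / total) for v in descriptor]


def hellinger(x, y):
    """Hellinger kernel H(x, y) = sum_i sqrt(x_i * y_i) after L1-normalizing."""
    sx, sy = sum(x), sum(y)
    return sum(math.sqrt((a / sx) * (b / sy)) for a, b in zip(x, y))


def squared_euclidean(u, v):
    """Squared L2 distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))
```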

Patch Histogram Feature

This post introduces a new feature for binary blobs like the connected components in a text. The feature is called patch histogram, and it's the histogram of 3x3 patches of black and white pixels: we collect all 3x3 patches and count their frequencies. A 3x3 binary patch has 2^9 = 512 possible combinations, and we assign a number to each. I wrote the implementation in Python and here is a lookup table that converts all possible 3x3 patches to their ids.
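
One natural way to number the 512 patches is to read the nine pixels as bits. The sketch below does that (row-major, first pixel as the most significant bit), which is an assumption and not necessarily the id assignment in the author's lookup table.

```python
from collections import Counter


def patch_id(image, row, col):
    """9-bit id of the 3x3 patch whose top-left corner is (row, col)."""
    value = 0
    for dr in range(3):
        for dc in range(3):
            value = (value << 1) | (1 if image[row + dr][col + dc] else 0)
    return value


def patch_histogram(image):
    """Counts of all 3x3 patch ids (0..511) in a binary image given as a
    list of rows of 0/1 values."""
    height, width = len(image), len(image[0])
    counts = Counter(patch_id(image, r, c)
                     for r in range(height - 2)
                     for c in range(width - 2))
    return [counts.get(i, 0) for i in range(512)]
```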

Probabilistic Graphical Models Course Notes

Preliminaries. Distributions: Suppose A has 2, B has 2 and C has 3 possible values. Their joint probability distribution will contain 2x2x3=12 values. We can condition the values by setting a variable to a certain value. We can also marginalize the values to a certain variable and check the distribution of this single variable. Factors: A factor $\phi$ is a function that takes values for A, B and C and returns a real value.
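
These operations fit in a few lines if a factor is a dict from assignments to real values. A sketch using the cardinalities from the example; the uniform joint distribution is a hypothetical placeholder chosen only for illustration.

```python
import itertools

# Cardinalities from the example: A and B binary, C ternary.
CARD = {"A": 2, "B": 2, "C": 3}


def uniform_joint(card):
    """Joint P(A, B, C) as a dict from assignments to probabilities
    (uniform, purely for illustration)."""
    names = sorted(card)
    assignments = list(itertools.product(*(range(card[n]) for n in names)))
    return names, {a: 1 / len(assignments) for a in assignments}


def condition(names, table, var, value):
    """Keep rows where var == value and renormalize."""
    i = names.index(var)
    kept = {a: p for a, p in table.items() if a[i] == value}
    z = sum(kept.values())
    return {a: p / z for a, p in kept.items()}


def marginalize_to(names, table, var):
    """Sum out every variable except var."""
    i = names.index(var)
    out = {}
    for a, p in table.items():
        out[a[i]] = out.get(a[i], 0.0) + p
    return out
```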

R Notes

These are notes I took from here and there, including the Coursera Data Analysis course and R's online help (help.start). Basics: =R= objects have attributes, which can be observed with the attributes() function. =<-= is the assignment operator. =:= is used to create integer sequences: 1:4 = 1 2 3 4. The =c= (concatenate) function can be used to create vectors from different kinds of objects: c(TRUE, FALSE) creates a logical vector, c(1+3i, 4+8i, 3-5i) creates a complex vector.

Randomness Course Notes

Definitions of Randomness. Kolmogorov complexity of a sequence: the shortest algorithm that produces it. Martin-Löf: a sequence is random if it passes all statistical tests, i.e. it cannot be produced by a program shorter than itself. The digits of $\pi$ are not random in this sense. It's not just "difficult to compute": there is no consistent way to define the shortest algorithm. It's impossible to find a way to ensure that a sequence is random.
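
Kolmogorov complexity itself cannot be computed, but the output length of a fixed compressor is an upper bound on it (up to a constant for the decompressor), so compression gives a practical feel for the idea. A sketch with zlib: a sequence produced by a tiny program compresses to almost nothing, while random bytes do not compress.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed data at maximum compression."""
    return len(zlib.compress(data, 9))


regular = b"01" * 5000            # 10,000 bytes from a very short program
random_bytes = os.urandom(10000)  # no shorter description is expected
```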

Recurrent Neural Networks

These notes are gathered from various places. When I can, I give credits and links, but even if I don't, they are certainly not original ideas. Sequence Learning in RNNs: an example of a sequence is a sentence, i.e. a series of words in English. Sequence learning and transformation allow computers to translate this sequence into another language. If there is no target, RNNs predict the next element in a sequence. This prediction blurs the line between supervised and unsupervised learning.
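
The core of a vanilla (Elman) RNN is one update applied at every step with shared weights, $h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h)$, and an output read from each hidden state. A NumPy sketch of the forward pass only; the weight names and sizes are illustrative, and each output could be read as a score for the next element.

```python
import numpy as np


def rnn_forward(xs, W_xh, W_hh, W_hy, b_h, b_y):
    """Run a vanilla RNN over xs of shape (T, input_dim).

    Returns hidden states (T, hidden_dim) and outputs (T, output_dim).
    """
    h = np.zeros(W_hh.shape[0])
    hs, ys = [], []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)  # same weights at every step
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    return np.array(hs), np.array(ys)
```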

Shell (Bash and Zsh) Notes

Don't use ~ in scripts, use $HOME. I used ~ several times in scripts, and it may or may not work (for example, ~ is not expanded inside quotes). Use $HOME to refer to the home directory; it always works.

This Site's RSS Generator

Previously, with Pandoc, I was using a simple setup to create RSS: Markdown files were converted to plain, headerless HTML and collected together to build an XML file. The obvious drawback is that all HTML files must be generated by Pandoc, and anything that doesn't follow that route doesn't appear in the feeds. However, when I began to use Org Mode for data analysis and other tasks, I stopped using Pandoc.
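
Collecting HTML fragments into a feed can be done with the standard library. A minimal RSS 2.0 sketch using xml.etree.ElementTree; the function name and the example URLs are placeholders, and the HTML body goes into description as escaped text.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Build a minimal RSS 2.0 document from (title, link, html) tuples."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for title, link, html in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
        ET.SubElement(item, "description").text = html  # escaped on output
    return ET.tostring(rss, encoding="unicode")
```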

Turning Ottoman Letters into Graphs (1)

Today's work was about splitting a page into its components and recording them as new images. Instead of artificial boundaries (like word/sentence boundaries), the labeling should rely on connected components. There are two problems here. In Arabic-based writing systems, dots play a significant role, much more so than in Latin-based scripts, so these dots should be classified correctly. The second problem is that connected components are not always reliable: some are wrongly split even though they belong to a single component.
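
Connected components themselves can be found with a breadth-first flood fill. A sketch on a 0/1 image given as a list of rows, using 8-connectivity (an assumption); isolated dots come out as small components of their own, which is where the dot classification problem above starts.

```python
from collections import deque


def connected_components(image):
    """Return 8-connected foreground components as lists of (row, col)."""
    height, width = len(image), len(image[0])
    seen = [[False] * width for _ in range(height)]
    components = []
    for r in range(height):
        for c in range(width):
            if not image[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, pixels = deque([(r, c)]), []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < height and 0 <= nx < width
                                and image[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(pixels)
    return components
```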

Using a Single Threaded Functor in Multiple Threads with Futures in C++

Multithreaded programming requires a shift of paradigm when it comes to the return values of functions. C++11 provides std::async (http://en.cppreference.com/w/cpp/thread/async) to run functions asynchronously, but this is not available in older versions. My current project on word spotting in historical documents is fairly complete in functionality, but I decided that searching word images on page images concurrently is necessary for a speed-up. I'm already using Boost for much of the functionality, and instead of creating a dependency on the not-yet-mature C++11 support in various compilers, I decided to use =boost::thread=s.
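
The same pattern (give each task its own copy of a stateful functor and collect results through futures) looks like this in Python's concurrent.futures. This is an analogue, not the post's boost::thread code; WordMatcher is a hypothetical functor whose scratch buffer makes it unsafe to share between threads.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class WordMatcher:
    """Hypothetical single-threaded functor: each call overwrites
    self.scratch, so one instance must not be shared between threads."""

    def __init__(self, query):
        self.query = query
        self.scratch = []

    def __call__(self, page):
        self.scratch = [w for w in page.split() if w == self.query]
        return len(self.scratch)


def search_pages(matcher, pages, workers=4):
    """Run a deep copy of the functor per page; results come back through
    futures, in page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy.deepcopy(matcher), page) for page in pages]
        return [f.result() for f in futures]
```

The original functor is never called, so its state stays untouched.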